
02 Production data pipeline with data validation & logging

This exercise builds the workflow in the following steps:
Step 1. Connect to the S3 bucket

  1. Connect to the S3 bucket, as in the exercises in Part 2
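
For reference, here is a minimal Python sketch of this extract step. The bucket name, object key, and credential setup are placeholders, not the course values; in the exercise the same thing is done with the Amazon Authenticator, Amazon S3 Connector, and CSV Reader nodes:

    # Sketch of the extract step: read the raw orders CSV from S3 into a data frame.
    # Bucket, key, and credentials below are placeholders.
    import io

    import boto3
    import pandas as pd

    s3 = boto3.client("s3")  # credentials resolved from the environment or an AWS profile

    obj = s3.get_object(Bucket="example-training-bucket", Key="orders/orders_raw.csv")
    orders_raw = pd.read_csv(io.BytesIO(obj["Body"].read()))
    print(orders_raw.head())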


Step 2. Call the data validation workflow and enable logging and email alert for failures

  1. Connect to your user space on KNIME Business Hub with the Space Connector node

  2. Call the data validation workflow uploaded to KNIME Business Hub in the previous exercise with the Call Workflow Service node

    • Select the latest version and the valid execution context

  3. Open the Logging and Send Email - Failure components for further instructions


Step 3. Call the data transformation workflow deployed in the CDDS Production space on the KNIME Business Hub and enable logging and email alert for failures

  1. Connect to the KNIME Business Hub space with the Space Connector node

    • Select the CDDS Production space where your validated and deployed Data Transformation workflow resides

  2. Call the Data Transformation workflow residing in your project folder in the CDDS Production space with the Call Workflow Service node

    • Select the latest version and the valid execution context
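
For reference, below is a minimal Python sketch of the orchestration in Steps 2 and 3. The call_workflow() helper is a hypothetical stub standing in for the Call Workflow Service node, the SMTP host and addresses are placeholders, the branching mirrors the two Switch nodes, and the logging and email parts correspond to the Logging and Send Email - Failure components:

    # Sketch of the orchestration: validate, then transform, with file logging
    # and an email alert on failure. All names below are placeholders.
    import logging
    import smtplib
    from email.message import EmailMessage

    logging.basicConfig(
        filename="pipeline.log",
        level=logging.INFO,
        format="%(asctime)s %(levelname)s %(message)s",
    )
    log = logging.getLogger("production_pipeline")


    def call_workflow(path, data):
        """Hypothetical stand-in for the Call Workflow Service node.

        Only a stub so the sketch is self-contained; replace it with your own
        mechanism for invoking a workflow deployed on KNIME Business Hub.
        """
        raise NotImplementedError


    def send_failure_email(step, detail):
        """Rough equivalent of the Send Email - Failure component (placeholder SMTP host)."""
        msg = EmailMessage()
        msg["Subject"] = f"Production pipeline failed at step: {step}"
        msg["From"] = "pipeline@example.com"
        msg["To"] = "data-team@example.com"
        msg.set_content(detail)
        with smtplib.SMTP("smtp.example.com") as smtp:
            smtp.send_message(msg)


    def run_pipeline(orders_raw):
        # Step 2: call the data validation workflow
        validation = call_workflow("<your user space>/data_validation", data=orders_raw)
        if not validation["passed"]:  # Switch: top = non-empty (errors), bottom = empty
            log.error("Validation failed: %s", validation["errors"])
            send_failure_email("data validation", str(validation["errors"]))
            return None
        log.info("Validation passed")

        # Step 3: call the data transformation workflow deployed in CDDS Production
        try:
            orders_cleaned = call_workflow("CDDS Production/data_transformation", data=orders_raw)
        except Exception as exc:  # Switch: top = success, bottom = failure
            log.error("Transformation failed: %s", exc)
            send_failure_email("data transformation", str(exc))
            return None
        log.info("Transformation finished")
        return orders_cleaned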


Pipeline stages: Data Validation → Transform → Load (production DB)

Learning objective: Learn how to use workflow orchestration in a production data pipeline that extracts, validates, transforms, and loads data


Workflow description: This workflow:

  • Reads and validates the data from real data sources,

  • Calls the developed and tested Data Transformation workflow to transform the raw orders data,

  • Loads the transformed data into the production database, and

  • Logs the validation and transformation steps to a log file and sends an email in case of failure.

The workflow can then be deployed with a schedule to run regularly and update the database table.
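
As an illustration of the final load step, here is a minimal Python sketch that appends a transformed table to a SQLite database. The file path and table name are placeholders; in the workflow this is done with the SQLite Connector and the DB Writer node appending to orders_cleaned. A scheduled deployment would simply rerun the whole extract, validate, transform, and load pipeline at a regular interval:

    # Sketch of the load step: append the cleaned orders to the production SQLite table.
    # Path and table name below are placeholders.
    import sqlite3

    import pandas as pd


    def load_to_production(orders_cleaned: pd.DataFrame, db_path: str = "production.sqlite") -> None:
        with sqlite3.connect(db_path) as con:
            # Append, so each scheduled run adds the newly processed orders
            orders_cleaned.to_sql("orders_cleaned", con, if_exists="append", index=False)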


You'll find the instructions for the exercises in the yellow annotations.

Part 4 - Best Practices to productionize data pipelines

Exercise 02 Production data pipeline with data validation & logging

Nodes used in this workflow:

  • Authenticate against AWS → Amazon Authenticator
  • Secrets Retriever
  • Connect to S3, specify working directory → Amazon S3 Connector
  • Read orders data from Amazon S3 → CSV Reader (Extract raw data)
  • KNIME Hub Authenticator
  • Your user space → Space Connector
  • Call the data validation workflow → Call Workflow Service
  • Top: non-empty / Bottom: empty → Switch
  • Logging and Send Email - Failure components
  • CDDS Production → Space Connector
  • Call data transformation workflow → Call Workflow Service
  • Top: Success / Bottom: Failure → Switch
  • Logging and Send Email - Failure components
  • SQLite Connector
  • Append: orders_cleaned → DB Writer (deprecated)
