
02 Production data pipeline with data validation & logging - Exercise

Step 1. Connect to the S3 bucket

  1. Connect to the S3 bucket as in the exercises in Part 2


Step 2. Call the data validation workflow and enable logging and email alerts for failures

  1. Connect to your user space on KNIME Hub with the Space Connector node

  2. Call the data validation workflow that you uploaded to KNIME Business Hub in the previous exercise with the Call Workflow Service node

    • Select the latest version and a valid execution context

  3. Open the Logging and Send Email - Failure components for further instructions


Step 3. Call the data transformation workflow on KNIME Hub

  1. Connect to your KNIME Hub space with the Space Connector node

  2. Call the Data_Transformation workflow saved on KNIME Hub with the Call Workflow Service node

    • Select the latest version and a valid execution context
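
The control flow behind Steps 2 and 3 (call the validation workflow, branch on success or failure, log the outcome, send an alert on failure, then call the transformation workflow) can be pictured in plain Python. This is only an analogy and not KNIME code: `validate`, `transform`, and `alert` are hypothetical callables standing in for the two Call Workflow Service nodes and the Send Email - Failure component, and the log format is an assumption.

```python
import logging
import os
import tempfile
from typing import Callable, Dict, List, Optional

Rows = List[Dict[str, str]]


def make_file_logger(path: str) -> logging.Logger:
    """Return a logger that appends timestamped lines to the file at `path`."""
    logger = logging.getLogger("pipeline:" + path)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep messages out of the root logger
    if not logger.handlers:  # avoid attaching a second handler on repeated calls
        handler = logging.FileHandler(path, encoding="utf-8")
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger


def validate_then_transform(
    rows: Rows,
    validate: Callable[[Rows], bool],    # stand-in for the validation workflow call
    transform: Callable[[Rows], Rows],   # stand-in for the transformation workflow call
    alert: Callable[[str], None],        # stand-in for the failure email
    logger: logging.Logger,
) -> Optional[Rows]:
    """Validate, then transform; log each step and alert on any failure."""
    if not validate(rows):
        logger.error("validation failed for %d rows", len(rows))
        alert("Data validation failed")
        return None
    logger.info("validation passed for %d rows", len(rows))
    try:
        result = transform(rows)
    except Exception as exc:
        logger.error("transformation failed: %s", exc)
        alert("Data transformation failed: %s" % exc)
        return None
    logger.info("transformation produced %d rows", len(result))
    return result


def demo() -> Dict[str, object]:
    """Run one passing and one failing batch and return what happened."""
    alerts: List[str] = []
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "pipeline.log")
        logger = make_file_logger(path)
        good = validate_then_transform(
            [{"order_id": "1"}], lambda r: True, lambda r: r, alerts.append, logger
        )
        bad = validate_then_transform(
            [], lambda r: len(r) > 0, lambda r: r, alerts.append, logger
        )
        for handler in list(logger.handlers):  # release the file before cleanup
            handler.close()
            logger.removeHandler(handler)
        with open(path, encoding="utf-8") as f:
            log_text = f.read()
    return {"good": good, "bad": bad, "alerts": alerts, "log": log_text}
```

Passing the alert and the two workflow calls in as arguments mirrors how the workflow wires separate nodes together: the branching logic stays the same whether the failure notice goes to email or, as in `demo()`, to a list.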


Data Validation
Transform
Load (production DB)

Learning objective: Learn how to use workflow orchestration in a production data pipeline that extracts, validates, transforms, and loads data


Workflow description: This workflow:

  • Reads and validates the data from real data sources

  • Calls the developed and tested Data Transformation workflow to transform the raw orders data

  • Loads the transformed data into the production database

  • Logs the validation and transformation steps to a log file and sends an email in case of failure

The workflow can then be deployed on a schedule to run regularly and update the database table.
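
For readers who think in code, the load step (appending transformed rows to a table in a SQLite database on every scheduled run) might look roughly like the following sketch. It uses Python's standard `sqlite3` module rather than anything KNIME-specific. The table name `orders_cleaned` comes from the DB Writer label in this workflow; the columns `order_id` and `amount` are hypothetical, and skipping the write for an empty batch is a design choice of this sketch.

```python
import sqlite3
from typing import Iterable, Tuple


def append_orders(conn: sqlite3.Connection, rows: Iterable[Tuple[int, float]]) -> int:
    """Append rows to orders_cleaned and return how many rows were written."""
    batch = list(rows)
    if not batch:  # nothing to load on this run
        return 0
    with conn:  # commits on success, rolls back if an insert fails
        conn.execute(
            "CREATE TABLE IF NOT EXISTS orders_cleaned "
            "(order_id INTEGER, amount REAL)"  # hypothetical columns
        )
        conn.executemany("INSERT INTO orders_cleaned VALUES (?, ?)", batch)
    return len(batch)
```

Because the function only appends, each scheduled run adds its batch on top of the rows already in the table, for example `append_orders(sqlite3.connect(":memory:"), [(1, 9.5)])` on a fresh in-memory database.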


You'll find the instructions for the exercises in the yellow annotations.

Part 3 - Data governance and best practices

Exercise 02 Production data pipeline with data validation & logging

Extract raw data
Alternative: Read raw data from file

If you don't have access to the S3 bucket, use the raw data from the CSV Reader below instead.

Append: orders_cleaned (DB Writer)
Secrets Retriever
Connect to S3, specify working directory (Amazon S3 Connector)
Logging
Send Email - Failure
Top: non-empty, Bottom: empty (Switch)
Call data transformation workflow (Call Workflow Service)
Call the data validation workflow (Call Workflow Service)
Top: Success, Bottom: Failure (Switch)
Read orders data from file (CSV Reader)
Logging
Read orders data from Amazon S3 (CSV Reader)
Authenticate against AWS (Amazon Authenticator)
KNIME Hub Authenticator
Your user space (Space Connector)
Your user space (Space Connector)
SQLite Connector
Send Email - Failure
