
01 Data validation & Statistics Process Control

Input data validation
Step 1. Empty table check

  1. Execute the Extract Table Dimension node. It exports the number of rows as the flow variable Number Rows.

  2. Add the Breakpoint node and enable the breakpoint when the variable Number Rows = 0.
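The logic of Step 1 can be sketched outside KNIME in plain Python. This is not KNIME code: it assumes, for illustration only, that the table is a list of row dictionaries, and it raises an exception where the workflow would stop at the Breakpoint.

```python
from __future__ import annotations


def number_rows(table: list[dict]) -> int:
    """Count the rows, like the Number Rows variable from Extract Table Dimension."""
    return len(table)


def check_not_empty(table: list[dict]) -> None:
    """Raise an error for an empty table, mirroring a Breakpoint on Number Rows = 0."""
    if number_rows(table) == 0:
        raise ValueError("Input table is empty (Number Rows = 0)")


if __name__ == "__main__":
    check_not_empty([{"Price": "12.5"}])  # one row: passes silently
    try:
        check_not_empty([])  # no rows: the check fails
    except ValueError as err:
        print(err)
```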


Learning objective: Learn how to create a workflow for input data validation.


Workflow description: This workflow implements input data validation and checks:

  • whether the input data table is empty,

  • whether the input data has all the required columns,

  • whether the statistical properties of the input data are within the expected limits.

If any of the tests fails, the workflow raises an error.


You'll find the instructions to the exercises in the yellow annotations.

Part 3 - Data governance and best practices

Exercise 01 Data validation & Statistics Process Control

Step 2. Table structure validation

  1. Add the Table Validator node between the Workflow Service Input and the Breakpoint nodes

  2. In the configuration of the Table Validator node, add a new column group by double-clicking a column, then drag all the other columns into this group.

  3. Add the Breakpoint node and enable the breakpoint for an inactive branch.
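The kind of structure check the Table Validator performs (missing columns, unknown columns, wrong data types) can be sketched in plain Python as below. The expected schema is hypothetical: only the Price column appears in this exercise, and "Product" is an illustrative assumption. Price is expected as a string here because the workflow converts it later with String to Number. In this sketch a failure simply raises an exception instead of deactivating a branch.

```python
from __future__ import annotations

# Hypothetical expected schema (column name -> Python type).
# "Product" is an assumption for illustration; only "Price" comes from the exercise.
EXPECTED_SCHEMA: dict[str, type] = {"Product": str, "Price": str}


def validate_structure(table: list[dict], schema: dict[str, type]) -> list[str]:
    """Collect problems: missing columns, unknown columns, and wrong value types."""
    problems: list[str] = []
    for i, row in enumerate(table):
        missing = schema.keys() - row.keys()  # required but absent
        unknown = row.keys() - schema.keys()  # present but not expected
        if missing:
            problems.append(f"row {i}: missing columns {sorted(missing)}")
        if unknown:
            problems.append(f"row {i}: unknown columns {sorted(unknown)}")
        for col, expected in schema.items():
            if col in row and not isinstance(row[col], expected):
                problems.append(
                    f"row {i}: column {col!r} has type {type(row[col]).__name__}, "
                    f"expected {expected.__name__}"
                )
    return problems


def check_structure(table: list[dict], schema: dict[str, type] = EXPECTED_SCHEMA) -> None:
    """Raise one error listing every structural problem found."""
    problems = validate_structure(table, schema)
    if problems:
        raise ValueError("; ".join(problems))
```

Collecting all problems before raising gives one error message that lists every issue, rather than stopping at the first.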


Step 3. Statistical Process Control

Trigger an error if the maximum price is unexpectedly high:

  1. Use the Expression node to calculate the maximum price in the Price column.

  2. Convert the maximum price to a flow variable with the Table Row to Variable node.

  3. Use the Variable Expression node to check whether the maximum price is higher than 1000, and append a new integer variable price_flag with the value 1 if it is higher and 0 otherwise.

  4. Add the Breakpoint node and enable the breakpoint when the variable price_flag = 1.
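The Step 3 check reduces to a maximum over one column and a fixed upper limit. The plain-Python sketch below mirrors the three node steps (maximum, flag, error). It again assumes the table is a list of row dictionaries, with Price stored as a string that is converted to a float.

```python
from __future__ import annotations

PRICE_LIMIT = 1000.0  # upper limit used in Step 3


def max_price(table: list[dict]) -> float:
    """Maximum of the Price column (the Expression node step)."""
    return max(float(row["Price"]) for row in table)


def price_flag(max_value: float, limit: float = PRICE_LIMIT) -> int:
    """1 if the maximum price exceeds the limit, else 0 (the Variable Expression step)."""
    return 1 if max_value > limit else 0


def check_price(table: list[dict]) -> int:
    """Raise an error when price_flag = 1 (the Breakpoint step); otherwise return the flag."""
    flag = price_flag(max_price(table))
    if flag == 1:
        raise ValueError(f"Maximum price exceeds {PRICE_LIMIT:g}")
    return flag
```

Because the comparison is strict, a maximum price of exactly 1000 still passes.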


Step 4. Upload to KNIME Business Hub

Reset the workflow and upload it to your user space on KNIME Business Hub.

Status in case of Success
Variable Creator
Merge Variables
Active Branch Inverter
Column Renamer
Max price
Expression
Capture information related to test execution
Variable to Table Row
Wrong data types, missing or unknown columns > failure
Breakpoint
Example data for workflow development. Will be replaced by the data from the caller
Table Reader
Workflow execution status and metadata
Workflow Output
Status in case of Failure
Variable Creator
Max price > 1000 -> failure
Breakpoint
Export status and metadata
Catch Errors (Var Ports)
Price to double
String to Number
Number of rows
Extract Table Dimension
Column Resorter
Table Row to Variable
Raise error if max price > 1000
Variable Expression
Try (Variable Ports)
Workflow Metadata
Workflow Input
Check data types & column presence
Table Validator
Rows number = 0 => failure
Breakpoint
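The node list above includes Try (Variable Ports), Catch Errors (Var Ports), and Variable Creator nodes annotated "Status in case of Success" and "Status in case of Failure". This suggests the checks are wrapped so that the workflow reports a status instead of simply stopping. A rough plain-Python analogue of that pattern follows. The check functions and the status/message field names are assumptions for illustration, not the workflow's actual output schema.

```python
from __future__ import annotations

from typing import Callable


def not_empty(table: list[dict]) -> None:
    if len(table) == 0:
        raise ValueError("Number Rows = 0")


def has_price(table: list[dict]) -> None:
    if any("Price" not in row for row in table):
        raise ValueError("missing column 'Price'")


def price_in_limit(table: list[dict], limit: float = 1000.0) -> None:
    if max(float(row["Price"]) for row in table) > limit:
        raise ValueError(f"max price > {limit:g}")


# Order matters: later checks assume earlier ones passed (e.g. max() needs rows).
CHECKS: list[Callable[[list[dict]], None]] = [not_empty, has_price, price_in_limit]


def run_checks(table: list[dict], checks: list[Callable[[list[dict]], None]]) -> dict:
    """Run the checks in a try block and return a status record (hypothetical fields)."""
    try:
        for check in checks:
            check(table)
    except ValueError as err:  # analogue of the Catch Errors branch
        return {"status": "failure", "message": str(err)}
    return {"status": "success", "message": ""}
```

For example, `run_checks([], CHECKS)` returns a failure record with the message "Number Rows = 0", while a table with a single price of 12 returns a success record.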
