
00 Data Generation

Generate 10 years of synthetic data with the Synthetic Data Generator component, taking the dependency columns into account.
For the first 5 years, the data are consistent with the input data. During the remaining 5 years, the focus column and its dependent columns drift.
The data generated by this workflow are used to test a model monitoring application that detects data drift.
Learn more about this workflow in the linked blog post.
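The generation scheme above can be sketched outside KNIME in plain Python. This is a minimal sketch, not the workflow's method. The workflow uses Synthetic Data Generator (Numeric) nodes, but here they are swapped for simple bootstrap resampling of input rows, one row per day. The drift is modeled as a constant shift of the focus column. Each dependent column is moved by a least-squares slope times that shift, so the focus/dependent relationship is roughly preserved. The column names (`age`, `hours`), the `shift` value, the slope-based dependency model, and the helper names `linear_slope` and `generate` are all hypothetical.

```python
import random
from datetime import date, timedelta
from statistics import mean


def linear_slope(xs, ys):
    """Least-squares slope of ys on xs (hypothetical stand-in for the
    dependency between the focus column and one dependent column)."""
    mx, my = mean(xs), mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den if den else 0.0


def generate(rows, focus, dependents, start, years=10, drift_years=5,
             shift=10.0, rows_per_day=1, seed=0):
    """Resample input rows day by day for `years` years. During the last
    `drift_years` years, add `shift` to the focus column and slope * shift
    to each dependent column. Assumes `start` is not February 29."""
    rng = random.Random(seed)
    slopes = {d: linear_slope([r[focus] for r in rows], [r[d] for r in rows])
              for d in dependents}
    drift_start = date(start.year + years - drift_years, start.month, start.day)
    end = date(start.year + years, start.month, start.day)
    out = []
    day = start
    while day < end:
        for _ in range(rows_per_day):
            row = dict(rng.choice(rows))  # copy so the input rows stay unchanged
            if day >= drift_start:        # drift period: shift focus and dependents
                row[focus] += shift
                for d in dependents:
                    row[d] += slopes[d] * shift
            row["date"] = day
            out.append(row)
        day += timedelta(days=1)
    return out


# Toy input standing in for adult.csv (hypothetical values).
rows = [{"age": 25, "hours": 30}, {"age": 40, "hours": 45}, {"age": 55, "hours": 60}]
data = generate(rows, focus="age", dependents=["hours"], start=date(2010, 1, 1))
print(len(data))
```

Shifting the dependent columns by the fitted slope is one simple way to keep the columns consistent with each other while the focus column drifts. A real monitoring test would likely need richer, per-column generators.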

Workflow overview (input: adult.csv):
1. Calculate dependency columns
2. Generate multiple training sets (5 years of generated data)
3. Generate multiple test sets with a drift (5 years of shifted data)
Visualization: histograms of the original, generated, and generated (drift) data
