
KN-121 Advanced Data Generation v03

Workflow

[KNIME Nodes] KN-121 Advanced Data Generation
Generates Customer Profile sample data using advanced data generation techniques. All of the nodes used in this example come from KNIME. These nodes can be used as an alternative to the Market Simulation data generation capabilities. A comprehensive description of this workflow with step-by-step instructions can be found at the Scientific Strategy website: https://scientificstrategy.com/kn-121/
This example is originally based upon the KNIME workflow called "Scalable Data Generation" but has been cleaned up with step-by-step explanations. The required Extensions include: (1) KNIME Data Generation, and (2) KNIME JSON-Processing.

See Also: https://hub.knime.com/knime/workflows/Examples/50_Applications/53_Performance_and_Scalability/01_workflows/00_Scalable_Data_Generation

Advanced Data Generation Steps:
1. Generate list of 200 Male / Female Customers
2. Create a Population Age Pyramid using 6 x Pyramid Profiles
3. Assign all Customers an Occupation based upon Age Probability
4. Generate an appropriate Income for all Customers
5. Shred the Income based upon Probability that actively Working
6. Clean up the final Customer Profiles

Step 1: Create 200 Random Customers (48% Male / 52% Female) with Customer RowIDs running from c0 to c199.
Step 2: Use 6 Age Pyramid Profiles to randomly allocate the age of these Customers from between 17 years old and 100 years old.
Step 3: Assign an Occupation to each Customer based upon their Age and an Occupation Probability.
Step 4: Generate an appropriate Income for all Customers based upon their Age Generation and whether they are a Student.
Step 5: Shred the Income of some Customers based upon the Probability that they are not currently working.
Step 6: Clean up the final Customer Profiles by rounding the Income, removing extra columns, and sorting by CustomerID.

Nodes in the workflow, listed as node type (ID): label

Nominal Value Row Filter (#1): Occupation = Non-Student
Nominal Value Row Filter (#2): Occupation = Student
Gamma Distributed Assigner (#3): Generate Incomes by Age Generation
Gamma Distributed Assigner (#4): Generate Random Student Income
Concatenate (#7): Rejoin All Occupations
Conditional Label Assigner (#8): Assign Occupation by Age Generation
Conditional Label Assigner (#9): Assign Family by Age Generation
Double To Int (#12): Round Age
Random Label Assigner (#13): Population Age Pyramid
Conditional Label Assigner (#14): Shred Income by Working Probability
Column Filter (#15): Remove Column Has Income
Double To Int (#16): Round Income
Row Splitter (#17): Top = Has Income, Bottom = No Income
Constant Value Column (#18): Shred Income (Income = 0.0)
Concatenate (#19): Rejoin All Income
Empty Table Creator (#20): 200 Customers in Empty Table
Random Label Assigner (#21): Random Gender (48% Male / 52% Female)
Column Filter (#22): Remove Pyramid Column
Numeric Binner (#23): Bin Customer Age into Generations
Gaussian Distributed Assigner (#24): Define Age From Pyramid
Sorter (#25): Sort By CustomerID
Counter Generation (#26): Generate Counter for CustomerID
Column Rename (#27): Rename CustomerID
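For readers who want to see the six steps outside of KNIME, here is a minimal Python sketch of the same idea. It is not the workflow itself. The page gives only the customer count, the gender split, the age range, and the node types, so every number in the parameter tables below is an assumption chosen for illustration: the profile weights, means and standard deviations, the generation bin edges, the student and working probabilities, and the gamma shape and scale values. The function name generate_customers is likewise hypothetical.

```python
import random

# ASSUMED parameters: the source does not publish the values used in the KNIME nodes.
AGE_PROFILES = [  # (label, weight, mean age, std dev) for 6 hypothetical pyramid profiles
    ("P1", 0.20, 22, 3), ("P2", 0.20, 32, 4), ("P3", 0.20, 45, 5),
    ("P4", 0.15, 58, 5), ("P5", 0.15, 70, 6), ("P6", 0.10, 85, 6),
]
STUDENT_PROB = {"Young": 0.60, "Middle": 0.05, "Senior": 0.00}  # assumed
WORKING_PROB = {"Young": 0.70, "Middle": 0.90, "Senior": 0.30}  # assumed
INCOME_GAMMA = {  # (shape, scale) per generation, assumed
    "Young": (4.0, 8000.0), "Middle": (5.0, 12000.0), "Senior": (4.0, 9000.0),
}
STUDENT_GAMMA = (2.0, 4000.0)  # (shape, scale), assumed


def generation(age):
    """Bin an age into a coarse generation label (bin edges are assumed)."""
    if age < 30:
        return "Young"
    if age < 65:
        return "Middle"
    return "Senior"


def generate_customers(n=200, seed=121):
    """Return n customer profiles as a list of dicts, in CustomerID order."""
    rng = random.Random(seed)
    labels = [p[0] for p in AGE_PROFILES]
    weights = [p[1] for p in AGE_PROFILES]
    params = {p[0]: (p[2], p[3]) for p in AGE_PROFILES}
    customers = []
    for i in range(n):
        # Step 1: random gender, 48% Male / 52% Female.
        gender = "Male" if rng.random() < 0.48 else "Female"
        # Step 2: pick a pyramid profile, draw a Gaussian age, clamp to 17..100, round.
        mean, sd = params[rng.choices(labels, weights=weights)[0]]
        age = round(min(100.0, max(17.0, rng.gauss(mean, sd))))
        gen = generation(age)
        # Step 3: occupation, conditional on the age generation.
        is_student = rng.random() < STUDENT_PROB[gen]
        occupation = "Student" if is_student else "Non-Student"
        # Step 4: gamma-distributed income, with separate parameters for students.
        shape, scale = STUDENT_GAMMA if is_student else INCOME_GAMMA[gen]
        income = rng.gammavariate(shape, scale)
        # Step 5: "shred" (zero out) the income of customers who are not working.
        if rng.random() >= WORKING_PROB[gen]:
            income = 0.0
        # Step 6: round the income and keep only the final columns.
        customers.append({
            "CustomerID": f"c{i}", "Gender": gender, "Age": age,
            "Occupation": occupation, "Income": round(income),
        })
    # Rows are built in ID order, so no final sort is needed here; the KNIME
    # workflow sorts because its split and concatenate steps reorder rows.
    return customers


if __name__ == "__main__":
    for row in generate_customers()[:3]:
        print(row)
```

Passing a fixed seed makes the sample reproducible, which is convenient when the generated profiles feed later examples. The sketch processes one customer at a time instead of splitting and rejoining tables as the workflow does; the resulting columns are the same.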

Download

Get this workflow from the following link: Download

Nodes

KN-121 Advanced Data Generation v03 consists of the following 23 node(s):

Plugins

KN-121 Advanced Data Generation v03 contains nodes provided by the following 2 plugin(s):