Icon

02_​Data_​Manipulation_​Exercise_​1_​C

SOLUTION Manipulating, preparing and cleaning data - Exercise 1 We will create a small report table showing Europe monthly shipments by SMO for our Fabric Care Flagship business(Ariel laundry detergent and Lenor Fabric Enhancers) in the last 12 months. We have the following input files: 1. Monthly shipments for global Fabric Care 2. Shipments data for Russia that was actually missing in file #1 3. Corrections to shipments data that were realeased afterwards 4. Reference dataset in the format we expect our shipments data to be 5. Europe Geographic hierarchy mapping file From the KNIME Explorer, drag anddropFC_Shipments_Monthly_Russia.csvfrom _INPUT subfolder Use Manipulation > Column > Split &Combine > Column Merger node toreplace the value of Shipment (MSU)column with corrected value whenavailable. Use Manipulation > Column > Split &Combine > Joiner node to left outerjoin corrected shipment data as anew column. Use Manipulation > Row > Transform> Concatenate node to merge theRussia data into the shipmentdataset. From the KNIME Explorer, drag anddropFC_Shipments_Monthly_Corrections.csvfrom _INPUT subfolder From the KNIME Explorer, drag anddrop FC_Shipments_Monthly.csvfrom _INPUT subfolder Use Manipulation > Column > Filter >Missing Value node to set the blankshipment entries to 0. Use Manipulation > Column > Split &Combine > Joiner node to both:- join the Geo Hierarchy- subset the data to Europe only(Try different join mode: Left OuterJoin, Inner Join) From the KNIME Explorer, drag anddrop Geo_Hierarchy.csv from _INPUTsubfolder Use Manipulation > Column > Filter >Reference Column Filter node to keeponly the same column as the one inthe reference datastructure.(Connect the reference datasetloaded in part 1 of the exercise) Use Manipulation > Column > Convert& Replace > Constant Value Columnnode to create a new column named'Company' with 'Procter & Gamble' asfixed value.(Connect the dataset loaded in part 1of the exercise) Use Manipulation > Column > TableValidator node to check the structureof the dataset (e.g. all requiredcolumns in the right data domain arepresent). The structure should be thesame as the one fromReference_Format.table (use thelatter as the reference in the TableValidator). Use Manipulation > Column > TableValidator node to check the structureof the dataset (e.g. all requiredcolumns in the right data domain arepresent). Use IO > Read > Table Reader nodeto load the reference data structurefile: Reference_Format.table Manipulating, preparing and cleaning data - Exercise 1 - part B We check if the dataset format is in line with a reference data structure. We reshape the dataset to make it consistent with the reference format. Manipulating, preparing and cleaning data - Exercise 1 - part C We join the Geo hierarchy for Europe. We also handle missing values. Manipulating, preparing and cleaning data - Exercise 1 - part A We load the shipments data (Fabric_Care_Shipments_Monthly.csv), add to the missing data for Russia (Fabric_Care_Shipments_Monthly_Russia.csv) and update some incorrect datapoints (Fabric_Care_Shipments_Monthly_Corrections.csv). Use Manipulation > Column > Filter >Missing Value node to set the blankshipment entries to 0. From the KNIME Explorer, drag anddrop Geo_Hierarchy.csv from _INPUTsubfolder Use Manipulation > Column > Split &Combine > Joiner node to both:- join the Geo Hierarchy- subset the data to Europe only(Try different join mode: Left OuterJoin, Inner Join) Use IO > Read > Table Reader nodeto load the reference data structurefile: Reference_Format.table Use Manipulation > Column > TableValidator node to check the structureof the dataset (e.g. all requiredcolumns in the right data domain arepresent). The structure should be thesame as the one fromReference_Format.table (use thelatter as the reference in the TableValidator). Use Manipulation > Column > Convert& Replace > Constant Value Columnnode to create a new column named'Company' with 'Procter & Gamble' asfixed value.(Connect the dataset loaded in part 1of the exercise) Use Manipulation > Column > Filter >Reference Column Filter node to keeponly the same column as the one inthe reference datastructure.(Connect the reference datasetloaded in part 1 of the exercise) Use Manipulation > Column > TableValidator node to check the structureof the dataset (e.g. all requiredcolumns in the right data domain arepresent). Use Manipulation > Row > Transform> Concatenate node to merge theRussia data into the shipmentdataset. Use Manipulation > Column > Split &Combine > Column Merger node toreplace the value of Shipment (MSU)column with corrected value whenavailable. From the KNIME Explorer, drag anddrop FC_Shipments_Monthly.csvfrom _INPUT subfolder From the KNIME Explorer, drag anddropFC_Shipments_Monthly_Corrections.csvfrom _INPUT subfolder Use Manipulation > Column > Split &Combine > Joiner node to left outerjoin corrected shipment data as anew column. From the KNIME Explorer, drag anddropFC_Shipments_Monthly_Russia.csvfrom _INPUT subfolder Manipulating, preparing and cleaning data - Exercise 1 - part A We load the shipments data (Fabric_Care_Shipments_Monthly.csv), add to the missing data for Russia (Fabric_Care_Shipments_Monthly_Russia.csv) and update some incorrect datapoints (Fabric_Care_Shipments_Monthly_Corrections.csv). Manipulating, preparing and cleaning data - Exercise 1 - part B We check if the dataset format is in line with a reference data structure. We reshape the dataset to make it consistent with the reference format. Manipulating, preparing and cleaning data - Exercise 1 - part C We join the Geo hierarchy for Europe. We also handle missing values. Part A Merge shipments data together Part B Check that merged data is correctly formatted, adjust if needed Part C Join the geographic hierarchy Part D Filter last 12 months of data for FC Flagship business and create the table Geo_Hierarchy.csv(Europe only)Inner JoinGeo HierarchyLeft Outer JoinGeo HierarchyShipments missing valuesas 0.0FC_Shipments_Monthly_Corrections.csvCorrect dataCheck input fileformatAdd Company columnwith 'Procter & Gamble'as valueFC_Shipments_Monthly_Russia.csvReference_Format.tableMerging withRussia datasetKeep onlyspecifiedcolumnCheck input fileformatJoining correctionsif anyFC_Shipments_Monthly.csvCheck input fileformatKeep onlyspecifiedcolumnCheck input fileformatAdd Company columnwith 'Procter & Gamble'as valueReference_Format.tableJoining correctionsif anyFC_Shipments_Monthly.csvFC_Shipments_Monthly_Corrections.csvMerging withRussia datasetFC_Shipments_Monthly_Russia.csvCorrect data File Reader Joiner Joiner Missing Value File Reader Column Merger Table Validator(Reference) ConstantValue Column File Reader Table Reader Concatenate ReferenceColumn Filter Table Validator(Reference) Joiner File Reader Table Validator(Reference) ReferenceColumn Filter Table Validator(Reference) ConstantValue Column Table Reader Joiner File Reader File Reader Concatenate File Reader Column Merger SOLUTION Manipulating, preparing and cleaning data - Exercise 1 We will create a small report table showing Europe monthly shipments by SMO for our Fabric Care Flagship business(Ariel laundry detergent and Lenor Fabric Enhancers) in the last 12 months. We have the following input files: 1. Monthly shipments for global Fabric Care 2. Shipments data for Russia that was actually missing in file #1 3. Corrections to shipments data that were realeased afterwards 4. Reference dataset in the format we expect our shipments data to be 5. Europe Geographic hierarchy mapping file From the KNIME Explorer, drag anddropFC_Shipments_Monthly_Russia.csvfrom _INPUT subfolder Use Manipulation > Column > Split &Combine > Column Merger node toreplace the value of Shipment (MSU)column with corrected value whenavailable. Use Manipulation > Column > Split &Combine > Joiner node to left outerjoin corrected shipment data as anew column. Use Manipulation > Row > Transform> Concatenate node to merge theRussia data into the shipmentdataset. From the KNIME Explorer, drag anddropFC_Shipments_Monthly_Corrections.csvfrom _INPUT subfolder From the KNIME Explorer, drag anddrop FC_Shipments_Monthly.csvfrom _INPUT subfolder Use Manipulation > Column > Filter >Missing Value node to set the blankshipment entries to 0. Use Manipulation > Column > Split &Combine > Joiner node to both:- join the Geo Hierarchy- subset the data to Europe only(Try different join mode: Left OuterJoin, Inner Join) From the KNIME Explorer, drag anddrop Geo_Hierarchy.csv from _INPUTsubfolder Use Manipulation > Column > Filter >Reference Column Filter node to keeponly the same column as the one inthe reference datastructure.(Connect the reference datasetloaded in part 1 of the exercise) Use Manipulation > Column > Convert& Replace > Constant Value Columnnode to create a new column named'Company' with 'Procter & Gamble' asfixed value.(Connect the dataset loaded in part 1of the exercise) Use Manipulation > Column > TableValidator node to check the structureof the dataset (e.g. all requiredcolumns in the right data domain arepresent). The structure should be thesame as the one fromReference_Format.table (use thelatter as the reference in the TableValidator). Use Manipulation > Column > TableValidator node to check the structureof the dataset (e.g. all requiredcolumns in the right data domain arepresent). Use IO > Read > Table Reader nodeto load the reference data structurefile: Reference_Format.table Manipulating, preparing and cleaning data - Exercise 1 - part B We check if the dataset format is in line with a reference data structure. We reshape the dataset to make it consistent with the reference format. Manipulating, preparing and cleaning data - Exercise 1 - part C We join the Geo hierarchy for Europe. We also handle missing values. Manipulating, preparing and cleaning data - Exercise 1 - part A We load the shipments data (Fabric_Care_Shipments_Monthly.csv), add to the missing data for Russia (Fabric_Care_Shipments_Monthly_Russia.csv) and update some incorrect datapoints (Fabric_Care_Shipments_Monthly_Corrections.csv). Use Manipulation > Column > Filter >Missing Value node to set the blankshipment entries to 0. From the KNIME Explorer, drag anddrop Geo_Hierarchy.csv from _INPUTsubfolder Use Manipulation > Column > Split &Combine > Joiner node to both:- join the Geo Hierarchy- subset the data to Europe only(Try different join mode: Left OuterJoin, Inner Join) Use IO > Read > Table Reader nodeto load the reference data structurefile: Reference_Format.table Use Manipulation > Column > TableValidator node to check the structureof the dataset (e.g. all requiredcolumns in the right data domain arepresent). The structure should be thesame as the one fromReference_Format.table (use thelatter as the reference in the TableValidator). Use Manipulation > Column > Convert& Replace > Constant Value Columnnode to create a new column named'Company' with 'Procter & Gamble' asfixed value.(Connect the dataset loaded in part 1of the exercise) Use Manipulation > Column > Filter >Reference Column Filter node to keeponly the same column as the one inthe reference datastructure.(Connect the reference datasetloaded in part 1 of the exercise) Use Manipulation > Column > TableValidator node to check the structureof the dataset (e.g. all requiredcolumns in the right data domain arepresent). Use Manipulation > Row > Transform> Concatenate node to merge theRussia data into the shipmentdataset. Use Manipulation > Column > Split &Combine > Column Merger node toreplace the value of Shipment (MSU)column with corrected value whenavailable. From the KNIME Explorer, drag anddrop FC_Shipments_Monthly.csvfrom _INPUT subfolder From the KNIME Explorer, drag anddropFC_Shipments_Monthly_Corrections.csvfrom _INPUT subfolder Use Manipulation > Column > Split &Combine > Joiner node to left outerjoin corrected shipment data as anew column. From the KNIME Explorer, drag anddropFC_Shipments_Monthly_Russia.csvfrom _INPUT subfolder Manipulating, preparing and cleaning data - Exercise 1 - part A We load the shipments data (Fabric_Care_Shipments_Monthly.csv), add to the missing data for Russia (Fabric_Care_Shipments_Monthly_Russia.csv) and update some incorrect datapoints (Fabric_Care_Shipments_Monthly_Corrections.csv). Manipulating, preparing and cleaning data - Exercise 1 - part B We check if the dataset format is in line with a reference data structure. We reshape the dataset to make it consistent with the reference format. Manipulating, preparing and cleaning data - Exercise 1 - part C We join the Geo hierarchy for Europe. We also handle missing values. Part A Merge shipments data together Part B Check that merged data is correctly formatted, adjust if needed Part C Join the geographic hierarchy Part D Filter last 12 months of data for FC Flagship business and create the table Geo_Hierarchy.csv(Europe only)Inner JoinGeo HierarchyLeft Outer JoinGeo HierarchyShipments missing valuesas 0.0FC_Shipments_Monthly_Corrections.csvCorrect dataCheck input fileformatAdd Company columnwith 'Procter & Gamble'as valueFC_Shipments_Monthly_Russia.csvReference_Format.tableMerging withRussia datasetKeep onlyspecifiedcolumnCheck input fileformatJoining correctionsif anyFC_Shipments_Monthly.csvCheck input fileformatKeep onlyspecifiedcolumnCheck input fileformatAdd Company columnwith 'Procter & Gamble'as valueReference_Format.tableJoining correctionsif anyFC_Shipments_Monthly.csvFC_Shipments_Monthly_Corrections.csvMerging withRussia datasetFC_Shipments_Monthly_Russia.csvCorrect data File Reader Joiner Joiner Missing Value File Reader Column Merger Table Validator(Reference) ConstantValue Column File Reader Table Reader Concatenate ReferenceColumn Filter Table Validator(Reference) Joiner File Reader Table Validator(Reference) ReferenceColumn Filter Table Validator(Reference) ConstantValue Column Table Reader Joiner File Reader File Reader Concatenate File Reader Column Merger

Nodes

Extensions

Links