Icon

06_​Data_​Aggregation_​Solution

Data Aggregation - Solution

Solution to exercise 7 for the KNIME Analytics Platform for Data Wranglers course
- Calculate multiple aggregation using the Pivoting and GroupBy node
- Join grouped data
- Extract date fields from a DateTime value
- Handle missing values
- Manipulate strings
- Calculate new features using the Math Formula node




Exercise 6: Data AggregationIn this exercise you generate new columns by aggregating data by groups and create pivot tables that showaggregated values by two or more groups 2a) Calculate the total purchase amount by a quarter and a transaction type (Pivoting)2b) Set missing numeric values to zero2c) Combine the year and quarter information to a single column. Use "/" as separator.(String Manipulation node) 3a) (optional) Calculate the numbers of orders by a basket size and a transaction type 3.1 Calculate the number of purchased products by an order number. Retain theinformation on the transaction type. 3.2 Create a pivot table that shows the numbers of orders by a basket size and atransaction type3b) (optional) Set missing integers to zero3c) (optional) Calculate the proportions of the different basket sizes among eachtransaction type 4) (optional) Extract the birth year. Birth year is the last4 character of the column Birthday. To extract thesecharacters,4.1. Reverse Birthday with the reverse function, thenextract the first 4 characters with the substr function,then finally reverse again to get the birth year. Appendas a new column Year4.2. Convert Year from string to integer with String toNumber (specify Integer). 1a) Calculate the total purchase amount by a customer ID both in 2019 and earlier 1.1 Calculate the sum of product prices by a customer ID in the data for purchasesin 2019 and retain the information whether or not a customer has signed up for thenewsletter as well as the birthday (GroupBy node) 1.2 Rename the aggregated column to "Basket Sum 2019" 1.3 Calculate the sum of product prices by a customer ID concerning thepurchases earlier than 2019. Rename the aggregated column to "Basket Sum Overall" 1.4 Use an inner join to join the aggregated columns on the CustomerID. (for nowwe are only interested in the customers who bought something in 2019 AND before 2019) Integer: 0Double: 0.0BasketSumper CustomerBasketSumper CustomerInteger: 0Share of column total"Year/Quarter"on Customer IDTop 2019Bottom before 2019Birth yearReverse BirthdayFirst 4 charsReverse backConvertbirth year tonumber Pivoting Missing Value GroupBy GroupBy Column Rename Column Rename GroupBy Pivoting Missing Value Math Formula(Multi Column) String Manipulation Joiner Read joined andpreprocessed data Row Splitter String Manipulation String To Number Exercise 6: Data AggregationIn this exercise you generate new columns by aggregating data by groups and create pivot tables that showaggregated values by two or more groups 2a) Calculate the total purchase amount by a quarter and a transaction type (Pivoting)2b) Set missing numeric values to zero2c) Combine the year and quarter information to a single column. Use "/" as separator.(String Manipulation node) 3a) (optional) Calculate the numbers of orders by a basket size and a transaction type 3.1 Calculate the number of purchased products by an order number. Retain theinformation on the transaction type. 3.2 Create a pivot table that shows the numbers of orders by a basket size and atransaction type3b) (optional) Set missing integers to zero3c) (optional) Calculate the proportions of the different basket sizes among eachtransaction type 4) (optional) Extract the birth year. Birth year is the last4 character of the column Birthday. To extract thesecharacters,4.1. Reverse Birthday with the reverse function, thenextract the first 4 characters with the substr function,then finally reverse again to get the birth year. Appendas a new column Year4.2. Convert Year from string to integer with String toNumber (specify Integer). 1a) Calculate the total purchase amount by a customer ID both in 2019 and earlier 1.1 Calculate the sum of product prices by a customer ID in the data for purchasesin 2019 and retain the information whether or not a customer has signed up for thenewsletter as well as the birthday (GroupBy node) 1.2 Rename the aggregated column to "Basket Sum 2019" 1.3 Calculate the sum of product prices by a customer ID concerning thepurchases earlier than 2019. Rename the aggregated column to "Basket Sum Overall" 1.4 Use an inner join to join the aggregated columns on the CustomerID. (for nowwe are only interested in the customers who bought something in 2019 AND before 2019) Integer: 0Double: 0.0BasketSumper CustomerBasketSumper CustomerInteger: 0Share of column total"Year/Quarter"on Customer IDTop 2019Bottom before 2019Birth yearReverse BirthdayFirst 4 charsReverse backConvertbirth year tonumber Pivoting Missing Value GroupBy GroupBy Column Rename Column Rename GroupBy Pivoting Missing Value Math Formula(Multi Column) String Manipulation Joiner Read joined andpreprocessed data Row Splitter String Manipulation String To Number

Nodes

Extensions

Links