Icon

01_​Transform_​Using_​Group_​By_​and_​Joiner_​nodes

Transform Data using GroupBy and Joiner nodes

In this workflow, a number of ETL operations are performed on the sales2008-2011.csv dataset. Besides showing what ETL features are, the goal of this workflow is to move from a series of contracts with different customers in different countries to a one-row summary description for each one of the customers.
The one-row description includes:
1. the customer's unique ID
2. the total amount of money paid by the customer to the company
3. the countries the customer has been active in
4. the date of the first contract (this is always useful to estimate customer loyalty)
5. the number of days between the first and the last purchase, that is the number of days the customer has been with the company
In the end, each one-row customer summary information is joined together with each contract data row from the original file and the resulting table is written to a CSV file in a "data" folder located in the workflow folder.





Move from a series ofcontracts to a one-rowsummary for eachcustomer. Extract only some customers Find the list of unique customers accordingto Country using Group By nodeStep 1: Double Click on the GroupBy node to open thedialog boxStep 2: In the "Groups" tab. Select the string column"country". In the "Manual aggregation" tab, select the string column"Cust_ID" and aggregation as "Unique concatenate"Step 3 : Select "OK" and close the dialog box, now rightclick on the node and select on execute to get the list ofcustomer Ids according to the country Join the Tables using Joiner Node Step 1: Connect the tables to be joined to the input port ofJoiner nodeStep 2: Open the dialog box and in "Joiner Settings" tabselect the key column in both the tables that will be used tojoin themStep 3: In the "Include in Output" section select the type ofjoinStep 4: Select "OK" and close the dialog box, now right clickon the node and select on execute to get the output of thejoin operation Concatenate the results using Concatenate nodeStep 1: Drag the "Concatenate" node to KNIME AnalyticsPlatformStep 2: Click on the three dots in the bottom corner of thenode, to add one extra input port for the node. Connect theOutputs of the Row Filter nodes to the Input ports of"Concatenate" nodeStep 3: Open the Dialog box and decide on the option forthe "Duplicate row ID handling" and "Column handling"Step 4: Select "OK" and close the dialog box, now right clickon the node and select on execute to get the output of thejoin operation excludecolumn "card"For each customer:- total sum of amount- countries active in- customer since ...- time in days between first and last purchasefind which customersare active in which countrysort byascending datecountry = *Germany*country = *USA*1000 < sum(amount) < 2000sum(amount) > 5000convert dates from String to DateTimeformat dd.MM.yyyysales records 2008-2011inner join of original datawith aggregationson Cust_IDwrite to csv filesum(amount) < 5000country = *Brazil* Column Filter GroupBy GroupBy Sorter Row Filter Row Filter Row Filter Row Filter Concatenate String to Date&Time File Reader Joiner CSV Writer Row Filter Row Filter Move from a series ofcontracts to a one-rowsummary for eachcustomer. Extract only some customers Find the list of unique customers accordingto Country using Group By nodeStep 1: Double Click on the GroupBy node to open thedialog boxStep 2: In the "Groups" tab. Select the string column"country". In the "Manual aggregation" tab, select the string column"Cust_ID" and aggregation as "Unique concatenate"Step 3 : Select "OK" and close the dialog box, now rightclick on the node and select on execute to get the list ofcustomer Ids according to the country Join the Tables using Joiner Node Step 1: Connect the tables to be joined to the input port ofJoiner nodeStep 2: Open the dialog box and in "Joiner Settings" tabselect the key column in both the tables that will be used tojoin themStep 3: In the "Include in Output" section select the type ofjoinStep 4: Select "OK" and close the dialog box, now right clickon the node and select on execute to get the output of thejoin operation Concatenate the results using Concatenate nodeStep 1: Drag the "Concatenate" node to KNIME AnalyticsPlatformStep 2: Click on the three dots in the bottom corner of thenode, to add one extra input port for the node. Connect theOutputs of the Row Filter nodes to the Input ports of"Concatenate" nodeStep 3: Open the Dialog box and decide on the option forthe "Duplicate row ID handling" and "Column handling"Step 4: Select "OK" and close the dialog box, now right clickon the node and select on execute to get the output of thejoin operation excludecolumn "card"For each customer:- total sum of amount- countries active in- customer since ...- time in days between first and last purchasefind which customersare active in which countrysort byascending datecountry = *Germany*country = *USA*1000 < sum(amount) < 2000sum(amount) > 5000convert dates from String to DateTimeformat dd.MM.yyyysales records 2008-2011inner join of original datawith aggregationson Cust_IDwrite to csv filesum(amount) < 5000country = *Brazil* Column Filter GroupBy GroupBy Sorter Row Filter Row Filter Row Filter Row Filter Concatenate String to Date&Time File Reader Joiner CSV Writer Row Filter Row Filter

Nodes

Extensions

Links