Icon

01 Transform using GroupBy and Joiner nodes

<p><strong>Transform Data using GroupBy and Joiner nodes</strong></p><p>In this workflow, a number of ETL operations are performed on the <em>sales_2008-2011.csv</em> dataset. Besides showing what ETL features are, the goal of this workflow is to move from a series of contracts with different customers in different countries to a one-row summary description for each one of the customers. The one-row description includes:</p><ol><li><p>The customer's unique ID</p></li><li><p>The total amount of money spent by the customer</p></li><li><p>The countries the customer has been active in</p></li><li><p>The date of the first purchase (the "Customer since" date - this is always useful to estimate customer loyalty)</p></li></ol><p>In the end, the one-row description table is filtered to some customers only and lastly joined with the original input data (inner join). The resulting table is written to a CSV file in a "data" folder located in the workflow folder.</p>

URL: KNIME Self Paced Course https://www.knime.com/knime-self-paced-courses
URL: KNIME Cheat Sheet : Building a KNIME Workflow for Beginners https://www.knime.com/sites/default/files/2021-07/CheatSheet_Beginner_A3.pdf
URL: What's data aggregation? - KNIME TV -YouTube https://www.youtube.com/watch?v=bDwF-TOMtWw&list=PLz3mQ6OlTI0YzldqzyyUTBDP4S3oOiVy7
URL: Basic Aggregations with the GroupBy node - KNIME TV - YouTube https://www.youtube.com/watch?v=JQ-OWMt48ew&list=PLz3mQ6OlTI0YzldqzyyUTBDP4S3oOiVy7&index=2
URL: ETL with KNIME. What is a Join operation - KNIME TV - YouTube https://www.youtube.com/watch?v=6BigLM6vbhs&list=PLz3mQ6OlTI0YzldqzyyUTBDP4S3oOiVy7&index=11
URL: ETL with KNIME. The Joiner Node - Part I - KNIME TV - YouTube https://www.youtube.com/watch?v=9uV99ByH-TA&list=PLz3mQ6OlTI0YzldqzyyUTBDP4S3oOiVy7&index=12
URL: ETL with KNIME. The Joiner Node - Part II - KNIME TV - YouTube https://www.youtube.com/watch?v=gj66visISa8&list=PLz3mQ6OlTI0YzldqzyyUTBDP4S3oOiVy7&index=13

Aggregate customer data

Instead of a series of contracts we want a one-row summary for each customer.

For each customer, we:

  • Calculate the total amount spent

  • List the number of active countries

  • Derive the "Customer since" date (day of first purchase)

Extract only some customers
Concatenate results using a Concatenate node

Step 1: Drag the "Concatenate" node to KNIME Analytics Platform.

Step 2:Click on the three dots in the bottom corner of the node, to add one extra input port for the node. Connect the Outputs of the Row Filter nodes to the Input ports of "Concatenate" node.

Step 3: Open the Dialog box and decide on the option for the "Duplicate row ID handling" and "Column handling".

List active customers for each country

Step 1:Double Click on the GroupBy node to open the dialog box.

Step 2: In the "Groups"tab. Select the string column "country". In the "Manual aggregation" tab, select the string column "Cust_ID" and aggregation as "Unique concatenate".

Step 3: Select "OK" and close the dialog box, now right click on the node and select on execute to get the list of customer Ids according to the country.

Join tables using a Joiner node

Step 1:Connect the tables to be joined to the input port of Joiner node.

Step 2: Open the dialog box and in "Matching Criteria" select the joining columns for both the Top input ('left' table) and Bottom input ('right' table).

Step 3: In the "Include in Output" section select the type of join.

Step 4: Select "OK" and close the dialog box, now right click on the node and select on execute to get the output of the join operation.

Transform Data using GroupBy and Joiner nodes


In this workflow, a number of ETL operations are performed on the sales_2008-2011.csv dataset. Besides showing what ETL features are, the goal of this workflow is to move from a series of contracts with different customers in different countries to a one-row summary description for each one of the customers. The one-row description includes:

  1. The customer's unique ID

  2. The total amount of money spent by the customer

  3. The countries the customer has been active in

  4. The date of the first purchase (the "Customer since" date - this is always useful to estimate customer loyalty)

In the end, the one-row description table is filtered to some customers only and lastly joined with the original input data (inner join). The resulting table is written to a CSV file in a "data" folder located in the workflow folder.

Workflow complete!

Keep the momentum going by exploring Just KNIME It!on the Hub to challenge yourself and see how these nodes can be integrated into more complex workflows and use cases.

Excludecolumn "card"
Column Filter
Group by each customerand aggregate data
GroupBy
Amount spent < 5000and country = Brazil
Row Filter
Concatenate
Amount spent > 1000 and < 2000and country = Germany
Row Filter
Sort by "date"in ascending order
Sorter
Write result toContracts Summary.csv
CSV Writer
find which customers are active in which country
GroupBy
Readsales_2008-2011.csv
CSV Reader
Inner joinon "Cust_ID"
Joiner
Amount spent > 5000and country = USA
Row Filter
Convert "date" toDate format
String to Date&Time

Nodes

Extensions

Links