Icon

04 Transform Data using GroupBy and Joiner

<p><strong>Transform Data: GroupBy and Joiner</strong></p><p>In this workflow, a number of ETL operations are performed on the <em>sales_2008-2011.csv</em> dataset. Besides showing what ETL features are, the goal of this workflow is to move from a series of contracts with different customers in different countries to a one-row summary description for each one of the customers. The one-row description includes:</p><ol><li><p>The customer's unique ID</p></li><li><p>The total amount of money spent by the customer</p></li><li><p>The countries the customer has been active in</p></li><li><p>The date of the first purchase (the "Customer since" date - this is always useful to estimate customer loyalty)</p></li></ol><p>In the end, the one-row description table is filtered to some customers only and lastly joined with the original input data (inner join). The resulting table is written as a CSV file to the workflow data area (a "data" folder located in the workflow folder).</p>

URL: KNIME Learning Center https://www.knime.com/learning
URL: KNIME Cheat Sheet: Building a KNIME workflow for beginners https://www.knime.com/cheat-sheets/building-knime-workflow-beginners
URL: KNIME Cheat Sheet: Data wrangling with KNIME Analytics Platform https://www.knime.com/files/data-wrangling-with-knime.pdf
URL: YouTube: Data Aggregation with KNIME: Simple Aggregations https://youtu.be/Qyv6v26k0hA?si=x3Cw-kL_BThN8ekw
URL: YouTube: Data Aggregation with KNIME: The GroupBy Node https://youtu.be/ZI7v9SUnBxU?si=3YnnFIyDk8gI4W-c
URL: YouTube: Data Merging with KNIME: The Join Operation https://youtu.be/1Rqb1jxb86o?si=mgC9OhfsFnhdsnm3

Aggregate customer data

Instead of a series of contracts we want a one-row summary for each customer.

For each customer, we:

  • Calculate the total amount spent

  • List the number of active countries

  • Derive the "Customer since" date (day of first purchase)

Extract only some customers
Concatenate results using a Concatenate node

Step 1: Drag the "Concatenate" node into the workflow.

Step 2: Click the + on the left side of the node to add one input port. Connect the three "Row Filter" nodes to the input ports of "Concatenate" node.

Step 3: Click on the node to open the configuration window. Decide, how to combine the inputcolumns, how to handle RowIDs, and what to do in case of duplicate RowIDs.

List active customers for each country

Step 1: Double Click on the GroupBy node to open the dialog box.

Step 2: In the "Groups"tab. Select the string column "country". In the "Manual aggregation" tab, select the string column "Cust_ID" and aggregation as "Unique concatenate".

Step 3: Select "OK" and close the dialog box, now right click on the node and select on execute to get the list of customer Ids according to the country.

Join tables using a Joiner node

Step 1: Connect the tables to be joined to the input port of Joiner node.

Step 2: Open the dialog box and in "Matching Criteria" select the joining columns for both the Top input ('left' table) and Bottom input ('right' table).

Step 3: In the "Include in Output" section select the type of join.

Step 4: Select "OK" and close the dialog box, now right click on the node and select on execute to get the output of the join operation.

Transform Data: GroupBy and Joiner


In this workflow, a number of ETL operations are performed on the sales_2008-2011.csv dataset. Besides showing what ETL features are, the goal of this workflow is to move from a series of contracts with different customers in different countries to a one-row summary description for each one of the customers. The one-row description includes:

  1. The customer's unique ID

  2. The total amount of money spent by the customer

  3. The countries the customer has been active in

  4. The date of the first purchase (the "Customer since" date - this is always useful to estimate customer loyalty)

In the end, the one-row description table is filtered to some customers only and lastly joined with the original input data (inner join). The resulting table is written as a CSV file to the workflow data area (a "data" folder located in the workflow folder).

Workflow complete!

Keep the momentum going by exploring Just KNIME It! on the Hub to challenge yourself and see how these nodes can be integrated into more complex workflows and use cases.

Excludecolumn "card"
Column Filter
Group by each customerand aggregate data
GroupBy
Amount spent < 5000and country = Brazil
Row Filter
Combine the threecustomer summaries
Concatenate
Amount spent > 1000 and < 2000and country = Germany
Row Filter
Sort by "date"in ascending order
Sorter
Write result toContracts Summary.csv
CSV Writer
Find list of activecustomers for each country
GroupBy
Readsales_2008-2011.csv
CSV Reader
Inner joinon "Cust_ID"
Joiner
Amount spent > 5000and country = USA
Row Filter
Convert "date" toDate format
String to Date&Time

Nodes

Extensions

Links