Read CustomerInfoSystem1.xlsx with the Excel Reader node
Read CustomerInfoSystem2.table with the Table Reader node
Read Stores.csv with the CSV Reader node
Connect to the database with the SQLite Connector node
With two DBTable Selector nodes, select the tables
Transactions
ProductNrAndPrice
Read the tables with two DB Reader nodes
Clean the Customer Information System 1 data
Exclude the first row and first column with the Table Cropper node
Remove rows with duplicate CustomerID with the Duplicate Row Filter node
When the age is missing, insert the mean age of the dataset with the Missing Value node
Repeat the steps of Data Cleaning and Data Transformation for the Customer Information System 2 data
Tidy up the workflow in a Metanode
Upload the workflow to your private space of KNIME Community Hub and create a version of the workflow*
*If you don't have one already, create a free account on hub.knime.com
Sum the Price for each CustomerID with the Row Aggregator node
Write the table containing the total price per customer with the Excel Writer node
Convert the BasketSize with the Number to String node
Show in a BarChart the amount of orders for each BasketSize, for both Online and Onsite Stores
Create a component containing the visualization nodes
Use the Layout Editor to rearrange the interactive view of the component
Select Enable Reporting in the Layout Editor
Add a Report Template Creator node before the component and export its view as a PDF report with the Report PDF Writer node
Calculate the Basket Size of each order (number of products in the order) with the GroupBy node. It can be obtained by counting any column. Retain information about the StoreType
Show the number of orders for each Basket Size in the Online Store and Onsite Store respectively with the Pivot node
In a new branch, repeat step 5.1 with a GroupBy node. This time, for each customer, keep
CustomerID
CustomerGroup
AgeGroup
Sum of Price
Mean Price
Unique Count of ProductNr
Assign a color to each row according to the CustomerGroup column with the Color Manager node
In a Scatter Plot show the Sum(Price) and Unique Count(ProductNr)
Show the count of occurrences for CustomerGroup in a Bar Chart
Show a Parallel Coordinates Plot with
Sum(Price)
Unique Count(ProductNr)
Create a title with the Text View node
Note: apply the color when possible. In the Bar Chart, in the Group Bar option select Frequency Dimension.
Learning objectives: Access, clean, transform, merge and aggregate data to report and visualize the results.
Workflow description: This workflow accesses customer and transactions data, transforms the data and generates an interactive view with multiple visualizations.
You’ll find the use case information in the blue annotations and the instructions to the exercises in the yellow annotations.
Pauline from Sales would like to write a report and visualize the insights on the customer transaction data her team collects every month. This includes the total price per customer and the basket size per store type. Since Pauline receives the data from multiple sources each month, which needs blending, cleaning, and transformation before it can be put into a report, she has reached out to your data team for help to automate the process and thereby save time.
Your continuous project will be to help Pauline build a workflow to report the monthly insights, by completing each step of the process she repeats every month below.
Pauline receives the data from multiple sources, including files, and data base tables. Your first task is to start building the workflow by accessing the different data sources.
Now that the data access is established, Pauline asks your help in cleaning the data, since there seems to be duplicates, missing values, and some columns that are not needed.
Next, you need to merge the cleaned data. To do so, Pauline outlines how she normally transforms some of the data in the rows and reshapes some columns to create the desired form.
Pauline is happy that the data is ready to be exported. Her team needs the results from 5.1 as an Excel file, 5.2 as a Bar Chart, and 5.3. in an interactive dashboard that can be exported as a pdf.
With the data cleaned and in the correct shape, you can now help Pauline to merge the different data sources, by stacking tables and matching or joining others to add the necessary columns.
You have now reached the point where the Pauline's data can be aggregated to get the necessary insight concerning the price and basket sizes for customers and stores.
Concatenate the two customer tables with the Concatenate node
Append the StoreType (in Stores.csv) and Price (from the ProductNrAndPrice DB table) to each transaction (stored in the Transactions DB table) with two Value Lookup nodes
Join the Customer and Transaction tables on the CustomerID column with the Joiner node.
Note: The first output port of the Joiner shows the rows with a match in the two tables. Use only these rows in the next steps.
Create a new column AgeGroup with values Adolescent, Adult, Older Adult according to the age value: <18 Adolescents, 18-65 Adults, >65 Older Adults
Replace the character "_" with a space " " from the column country Country
Split the column CustomerID on the character "_" with the Cell Splitter node
Rename the columns created by the split with the Column Renamer node
Merge the columns Email and CorporateEmail with the Column Merger node. Keep the Email when both are available.
Convert the column Newsletter with the Number to String node
Perform instructions 1 and 2 with two Expression nodes. Click Ask K-AI and copy paste the text of the instruction in the prompt.
To use this workflow in KNIME, download it from the below URL and open it in KNIME:
Deploy, schedule, execute, and monitor your KNIME workflows locally, in the cloud or on-premises – with our brand new NodePit Runner.