Icon

03 Date&Time Transformation & In-Database processing - Solution

<p><strong>Learning objective: </strong>In this exercise you will learn how to handle date&amp;time data types and practice in-database data aggregation techniques.</p><p><strong>Workflow description:</strong></p><p>In Activity I, Date&amp;Time nodes are used for filtering and extracting date information. The total sale amount is aggregated for each quarter then the resulting time series data is plotted.</p><p>In Activity II, the data processed in Activity I is written to a database. This data is then read from a database, and the total sale amount is aggregated for each quarter within the database.</p><p>Activity III demonstrates other database-related nodes and focuses on a use case of a <strong>DB Joiner </strong>node to filter data.</p><p><strong>You will find the instructions to the exercises in the yellow annotations.</strong></p>
Activity I: Processing Date&Time data

  1. In the data table from the Excel Reader node, convert a string column Date to Date&Time data type (String to Date&Time node)

  2. In the resulting table, extract Month(number), Quarter and Year to new columns (Date&Time Part Extractor node)

  3. Extract the orders that were submitted on or after January 1, 2019 using the Date&Time-based Row Filter node. Set the Start as January 1, 2019, inclusive. No need to set the End.

  4. Aggregate the quarterly total sale by a Pivot node with Quarter and Year as groupsand StoreType as the pivot. Calculate the Sum of Price as the total sale amount.

  5. Combine Year and Quarter into a single string of the form "Year/Quarter" with Expression node (hint: instruct K-AI to generate an expression, or use string($["Year"]) + "/" + string($["Quarter"]) ). Append the resulting new column as QuarterAndYear.

  6. Use a Sorter node to sort the newly created column is ascending order.

  7. With a Stacked Area Chart node, visualize quarterly total sales of both store types (Online Store and Onsite Store), with QuarterAndYear on the Horizontal dimension.

  8. Optional: analyze seasonality in sales:

    • Let's focus on the output table from the Date&Time-based Row Filter node. Introduce another Pivot node to your workflow to aggregate monthly total sales. Use Month(number) as the group and Year as the pivot. Calculate the Sum of Price as the total sale amount.

    • Plot the resulting monthly total sale data with Line Plot node. Use the column Month (Number) as the Horizontal dimension, and plot the remaining columns, each representing a year.

    • Are there any commonalities (e.g., busy months) across different years?



03 - Date&Time Data Type and In-Database Processing

Learning objective: In this exercise you will learn how to handle date&time data types and practice in-database data aggregation techniques.


Workflow description:

In Activity I, Date&Time nodes are used for filtering and extracting date information. The total sale amount is aggregated for each quarter then the resulting time series data is plotted.

In Activity II, the data processed in Activity I is written to a database. This data is then read from a database, and the total sale amount is aggregated for each quarter within the database.

Activity III demonstrates other database-related nodes and focuses on a use case of a DB Joiner node to filter data.


You will find the instructions to the exercises in the yellow annotations.

Activity II: In-database processing: filtering, pivoting and sorting

  1. Connect to thedatabase.mv.dbdatabase in the data folder with the H2 Connector node*

  2. Write the data from Activity I with extracted Month(number), Quarter and Year fields (i.e., the output table of the Extract Date&Time Fields node) to this database with a DB Writer node as a new table called "CustomerData" with a schema PUBLIC

  3. Add another branch to the workflow by adding a DB Table Selector node and connect it to the H2 Connector node. Select the schema PUBLIC, then the "CustomerData" data table.

  4. From the selected table, extract the orders submitted on or after January 1, 2019 with a DB Row Filter node, by setting a condition Year>=2019.

  5. Aggregate quarterly total sales via a DBPivot node with Quarter and Year as groupsand StoreType as the pivot. Calculate the SUM of the column Price.

  6. Sort the aggregation results by Year andthen by Quarter in ascending order with a DB Sorter node

  7. Read the sorted table as a KNIME data table (DB Reader node)

  8. Use an Expression node to combine Year and Quarter as a single string in the form of "Year/Quarter" (refer to Step 5 of Activity I).

  9. Plot the results with a Stacked Area Chart node and compare the visualization with the one from Activity I. Are they the same?


*Use the KNIME protocol to specify the location to the database file relative to the current workflow (knime://knime.workflow/../../data/database.mv.db).



Activity III(optional): In-database processing with a DB Joiner node

  1. Start with the output of the DB Table Selector node from Activity II. Aggregate data with a DB GroupBy node with the StoreID column as the group, and calculate the SUM of the Price column.

  2. Filter the data such that only the stores with total sale amount above $6000 remain (DB Row Filter node with SUM(Price)>6000)

  3. Filter the original data table CustomerData(the output of the DB Table Selector node, Activity II)to include only the sores with total sale higher than 6000 (DB Joiner node).

    • The original data table should be the input to the top port (i.e., left table) and the table from Step 3 should be the input to the bottom port (i.e., right table).

    • StoreID should be used as the joining column. Use Inner Join as the the Join mode.

  4. Read the filtered database table from the previous step as a KNIME data table (DB Reader node)

Customer Data
Excel Reader
Month, Quarter & Year
Date&Time Part Extractor
Orders after 01.01.2019
Date&Time-based Row Filter
Group: MonthPivot: YearAggregation: sum of Price
Pivot
Total Sales duringthe course of ayear
Line Plot
Group: StoreIDManual Aggregation: Sum(Price)
DB GroupBy
Create a new column as a combination:"Year/Quarter"
Expression
Create a new column as a combination:"Year/Quarter"
Expression
Filter to stores from lower table
DB Joiner
Sort the newly created column
Sorter
Sum(Price) > 6000
DB Row Filter
Select column Date
String to Date&Time
Read as aKNIME table
DB Reader
DB Table Selector
DB Reader
DB Writer
Groups: Quarter, YearPivot: StoreTypeAggregation: sum of Price
Pivot
Connect to thedatabase
H2 Connector
Groups: Quarter, YearPivot: StoreTypeAggregation: sum of Price
DB Pivot
Sort by Year then Quarter
DB Sorter
Total Sales for each StoreTypeplotted quarterly
Stacked Area Chart
Orders after 01.01.2019
DB Row Filter
Total Sales for each StoreTypeplotted quarterly
Stacked Area Chart

Nodes

Extensions

Links