Icon

03_​Date&Time Transformation & In-Database processing - Exercise

Session 3 - Date&Time Data Type and In-Database Processing Learning objective: In this exercise you will learn how to handle date&time data types and practice in-database data aggregation techniques.Workflow description:In Activity I, Date&Time nodes are used for filtering and extracting date information. The total sale amount is aggregated for each quarter then the resulting time series data is plotted.In Activity II, the data processed in Exercise I is written to a database. The data analogous to Exercise I is read from a database, and the total saleamount is aggregated for each quarter within the database.Activity III demonstrates other database-related nodes and focuses on a use case of a DB Joiner node to filter data.You will find the instructions to the exercises in the yellow annotations. Activity I: Processing Date&Time data1. In the data table from the Excel Reader node, convert a string column Date to Date&Time data type (String to Date&Time node).2. In the resulting table, extract Month(number), Quarter and Year to new columns (Extract Date&Time Fields node).3. Extract the orders that were submitted on or after January 1,2019 using the Date&Time-based Row Filter node. Set the Start as January 1, 2019,inclusive. No need to set the End.4. Aggregate the quarterly total sale by a Pivoting node with Quarter and Year as groups and StoreType as the pivot. Calculate the Sum of Price as thetotal sale amount.5. Combine Year and Quarter into a single string of the form "Year/Quarter" with String Manipulation node (hint: use the string function to convert Yearand Quarter into strings, then use the join function to combine them with "/" in between; join(string($Year$),"/",string($Quarter$)) ). Append theresulting new column as QuarterAndYear.6. With a Stacked Area Chart node, visualize quarterly total sales of both store types (Online Store and Onsite Store), with QuarterAndYear on theHorizontal dimension.7. Optional: analyze seasonality in sales:- Let's focus on the output table from the Date&Time-based Row Filter node. Introduce another Pivoting node to your workflow to aggregate monthlytotal sales. Use Month(number) as the group and Year as the pivot. Calculate the Sum of Price as the total sale amount.- Plot the resulting monthly total sale data with Line Plot node. Use the column Month (Number) as the Horizontal dimension, and plot the remainingcolumns, each representing a year.- Are there any commonalities (e.g., busy months) across different years? Activity III(optional): In-database processing with a DB Joiner node1. Start with the output of the DB Table Selector node from Activity II. Aggregate data with a DB GroupBy node with the StoreID column as the group,and calculate the SUM of the Price column.2. Filter the data such that only the stores with total sale amount above $6000 remain (DB Row Filter node with SUM(Price)>6000)3. Filter the original data table CustomerData (the output of the DB Table Selector node, Activity II) to include only the sores with total sale higher than6000 (DB Joiner node).- The original data table should be the input to the top port (i.e., left table) and the table from Step 3 should be the input to the bottom port (i.e., right table).- StoreID should be used as the joining column. Use Inner Join as the the Join mode.4. Read the filtered database table from the previous step as a KNIME data table (DB Reader node) Activity II: In-database processing: filtering, pivoting and sorting1. Connect to the database.mv.db database in the data folder with the H2 Connector node*2. Write the data from Activity I with extracted Month(number), Quarter and Year fields (i.e., the output table of the Extract Date&Time Fields node) tothis database with a DB Writer node as a new table called "CustomerData".3. Add another branch to the workflow by adding a DB Table Selector node and connect it to the H2 Connector node. Select the "CustomerData" datatable.4. From the selected table, extract the orders submitted on or after January 1, 2019 with a DB Row Filter node, by setting a condition Year>=2019.5. Then aggregate quarterly total sales via a DB Pivot node with Quarter and Year as groups and StoreType as the pivot. Calculate the SUM of thecolumn Price.6. Sort the aggregation results by Year then Quarter in ascending order with a DB Sorter node7. Read the sorted table as a KNIME data table (DB Reader node)8. Use a String Manipulation node to combine Year and Quarter as a single string in the form of "Year/Quarter" (refer to Step 5 of Activity I).9. Plot the results with a Stacked Area Chart node and compare the visualization with the one from Activity I. Are they the same?*Use the KNIME protocol to specify the location to the database file relative to the current workflow (knime://knime.workflow/../../data/database.mv.db). Excel Reader Session 3 - Date&Time Data Type and In-Database Processing Learning objective: In this exercise you will learn how to handle date&time data types and practice in-database data aggregation techniques.Workflow description:In Activity I, Date&Time nodes are used for filtering and extracting date information. The total sale amount is aggregated for each quarter then the resulting time series data is plotted.In Activity II, the data processed in Exercise I is written to a database. The data analogous to Exercise I is read from a database, and the total saleamount is aggregated for each quarter within the database.Activity III demonstrates other database-related nodes and focuses on a use case of a DB Joiner node to filter data.You will find the instructions to the exercises in the yellow annotations. Activity I: Processing Date&Time data1. In the data table from the Excel Reader node, convert a string column Date to Date&Time data type (String to Date&Time node).2. In the resulting table, extract Month(number), Quarter and Year to new columns (Extract Date&Time Fields node).3. Extract the orders that were submitted on or after January 1,2019 using the Date&Time-based Row Filter node. Set the Start as January 1, 2019,inclusive. No need to set the End.4. Aggregate the quarterly total sale by a Pivoting node with Quarter and Year as groups and StoreType as the pivot. Calculate the Sum of Price as thetotal sale amount.5. Combine Year and Quarter into a single string of the form "Year/Quarter" with String Manipulation node (hint: use the string function to convert Yearand Quarter into strings, then use the join function to combine them with "/" in between; join(string($Year$),"/",string($Quarter$)) ). Append theresulting new column as QuarterAndYear.6. With a Stacked Area Chart node, visualize quarterly total sales of both store types (Online Store and Onsite Store), with QuarterAndYear on theHorizontal dimension.7. Optional: analyze seasonality in sales:- Let's focus on the output table from the Date&Time-based Row Filter node. Introduce another Pivoting node to your workflow to aggregate monthlytotal sales. Use Month(number) as the group and Year as the pivot. Calculate the Sum of Price as the total sale amount.- Plot the resulting monthly total sale data with Line Plot node. Use the column Month (Number) as the Horizontal dimension, and plot the remainingcolumns, each representing a year.- Are there any commonalities (e.g., busy months) across different years? Activity III(optional): In-database processing with a DB Joiner node1. Start with the output of the DB Table Selector node from Activity II. Aggregate data with a DB GroupBy node with the StoreID column as the group,and calculate the SUM of the Price column.2. Filter the data such that only the stores with total sale amount above $6000 remain (DB Row Filter node with SUM(Price)>6000)3. Filter the original data table CustomerData (the output of the DB Table Selector node, Activity II) to include only the sores with total sale higher than6000 (DB Joiner node).- The original data table should be the input to the top port (i.e., left table) and the table from Step 3 should be the input to the bottom port (i.e., right table).- StoreID should be used as the joining column. Use Inner Join as the the Join mode.4. Read the filtered database table from the previous step as a KNIME data table (DB Reader node) Activity II: In-database processing: filtering, pivoting and sorting1. Connect to the database.mv.db database in the data folder with the H2 Connector node*2. Write the data from Activity I with extracted Month(number), Quarter and Year fields (i.e., the output table of the Extract Date&Time Fields node) tothis database with a DB Writer node as a new table called "CustomerData".3. Add another branch to the workflow by adding a DB Table Selector node and connect it to the H2 Connector node. Select the "CustomerData" datatable.4. From the selected table, extract the orders submitted on or after January 1, 2019 with a DB Row Filter node, by setting a condition Year>=2019.5. Then aggregate quarterly total sales via a DB Pivot node with Quarter and Year as groups and StoreType as the pivot. Calculate the SUM of thecolumn Price.6. Sort the aggregation results by Year then Quarter in ascending order with a DB Sorter node7. Read the sorted table as a KNIME data table (DB Reader node)8. Use a String Manipulation node to combine Year and Quarter as a single string in the form of "Year/Quarter" (refer to Step 5 of Activity I).9. Plot the results with a Stacked Area Chart node and compare the visualization with the one from Activity I. Are they the same?*Use the KNIME protocol to specify the location to the database file relative to the current workflow (knime://knime.workflow/../../data/database.mv.db). Excel Reader

Nodes

Extensions

Links