
Session-4

L2-DE Demo Workflow for Session 4
Accessing Data from Various Sources, In-Database Processing, and Logging

Learning objectives:
- Extract structured, unstructured, and semi-structured data from different sources
- Use different ways to load data into a database
- Perform in-database processing of data
- Log events to a log file

Workflow description: A series of steps in this workflow outlines some basic concepts of data engineering:
- Data is extracted from a collection of PDF documents (Step 1)
- Semi-structured data is loaded from a MongoDB database (Step 2)
- A database is updated with the information from the other sources (Step 3)
- In-database processing is performed (Step 5)
- Events are logged during the execution of the workflow (Steps 6 & 7)

Step 1. Extract Textual Data from PDF Files
1. Locate and list the PDF files with the List Files/Folders node. The folder Store_IT PDFs under the data folder contains all the PDF documents. Specify that folder in Files in folder mode.
2. Convert the paths to URIs so that the PDFs can be accessed by URI in the downstream nodes. Use the Path to URI node, specifying the Path column as the input.
3. Parse the PDFs and read the textual data in them with the Tika Parser URL Input node. Specify the URI column as the File path column. Select PDF and any other file formats of interest. In the output table, the column Content contains the text from the PDF documents.
4. Extract the license status with the Extract License Status metanode. It outputs a table of Store_ID and Renewal Status (approved or rejected).

Step 2. Extract Data from a NoSQL Database
1. Connect to the MongoDB database with the MongoDB Connector node. The details of the connection settings are given in the first exercise of Session 4.
2. Access the MongoDB collection with the MongoDB Reader node. Connect the output of the MongoDB Connector node to the input of the MongoDB Reader node. Select Database: L2-DE_Session4 and Collection: License_Renewal. The output is in JSON format.
3. Convert the JSON format to KNIME table format with the JSON to Table node.
4. Extract the license validity date with the Extract Validity metanode. The output table contains Store_ID and License Validity.
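Outside KNIME, the read-and-flatten pattern of Step 2 can be sketched in a few lines of Python. This is a minimal sketch, assuming a local MongoDB instance and the pymongo and pandas packages; the database and collection names come from the step above, while the connection URI is an assumption.

```python
import pandas as pd
from pymongo import MongoClient

# Connect to MongoDB and open the collection used in Step 2.
# The localhost URI is an assumption; use the connection details
# from the first exercise of Session 4.
client = MongoClient("mongodb://localhost:27017")
collection = client["L2-DE_Session4"]["License_Renewal"]

# find() returns JSON-like documents; json_normalize plays the role
# of the JSON to Table node, flattening nested fields into columns.
docs = list(collection.find({}, {"_id": 0}))
table = pd.json_normalize(docs)
print(table.head())
```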
Step 3. Update the Database Table
1. Join the table with the license status and the table with the license validity with the Joiner node, using Store_ID as the join column.
2. Convert the string column License Validity to Date&Time data with the String to Date&Time node.
3. Connect to the SQLite database with the SQLite Connector node. The database file my_central_database.sqlite, located under the data folder, should be specified under Path using the KNIME protocol*.
4. Update Renewal Status and License Validity in the Inspection_Overall table in the SQLite database with the DB Update node. Select Renewal Status and License Validity as the columns to be updated, and select Store_ID as the identification column.

* You can specify the database file as knime://knime.workflow/../../data/my_central_database.sqlite
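For reference, the DB Update step corresponds to a parameterized SQL UPDATE, with the identification column driving the WHERE clause. Here is a minimal sketch using Python's built-in sqlite3 module, assuming the database file is in the working directory; the example rows are hypothetical stand-ins for the joined table from Steps 1 and 2.

```python
import sqlite3

# Hypothetical (status, validity, store) rows standing in for the
# output of the Joiner and String to Date&Time nodes.
rows = [
    ("approved", "2025-12-31", "S001"),
    ("rejected", "2024-06-30", "S002"),
]

con = sqlite3.connect("my_central_database.sqlite")
with con:  # commits on success, rolls back on error
    con.executemany(
        'UPDATE Inspection_Overall '
        'SET "Renewal Status" = ?, "License Validity" = ? '
        'WHERE Store_ID = ?',  # Store_ID is the identification column
        rows,
    )
con.close()
```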
Step 4. Extract the Context Properties for Logging
1. Use the Extract Context Properties node to read workflow context-related properties, including the workflow name and the logged-in user. This information will be used for logging.
2. Connect the output of the Extract Context Properties node to the input of the Try (Variable Ports) node.

Step 5. Perform In-Database Processing
1. Select the table Orders_table from the SQLite database with the DB Table Selector node.
2. Join the Orders_table data with the Inspection_Overall table using the DB Joiner node. Use the column Store_ID as the joining column and "Left Outer Join" as the joining mode.
3. Calculate the number of orders for each store across different age groups with the DB Pivot node. In addition to Store_ID, include Renewal Status and License Validity under the Group tab. Use AgeGroup as the pivoting column, and aggregate OrderNumber with the SUM method.
4. Read the processed data into KNIME Analytics Platform with the DB Reader node.
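What the DB Joiner and DB Pivot nodes push into the database amounts to a grouped, conditional aggregation. Below is a sketch of that SQL executed through sqlite3; since SQLite has no PIVOT keyword, each pivot value becomes a conditional SUM. The AgeGroup labels used here are assumptions about the demo data, not taken from the source.

```python
import sqlite3

# In-database processing: only the aggregated rows leave the database,
# which is the point of Step 5.
PIVOT_SQL = """
SELECT i.Store_ID,
       i."Renewal Status",
       i."License Validity",
       SUM(CASE WHEN o.AgeGroup = '18-30' THEN o.OrderNumber END) AS "Sum_18_30",
       SUM(CASE WHEN o.AgeGroup = '31-50' THEN o.OrderNumber END) AS "Sum_31_50",
       SUM(CASE WHEN o.AgeGroup = '50+'   THEN o.OrderNumber END) AS "Sum_50_plus"
FROM Orders_table AS o
LEFT OUTER JOIN Inspection_Overall AS i ON o.Store_ID = i.Store_ID
GROUP BY i.Store_ID, i."Renewal Status", i."License Validity"
"""

con = sqlite3.connect("my_central_database.sqlite")
for row in con.execute(PIVOT_SQL):
    print(row)
con.close()
```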
Step 6. Export the Results
1. Treat missing values in the data table with the Replace Missing Values metanode.
2. Write the treated table as a CSV file Store_Report.csv under the current workflow data area.

Step 7. Logging the Events
1. When the execution succeeds: create a success status message with the Variable Creator node. A string flow variable Status carries the value Success.
2. When the execution fails:
- The Active Branch Inverter node activates the alternate branch of the workflow on failure.
- The alternate branch includes a Variable Creator node, producing a string flow variable Status carrying the value Failed.
3. Close the Try/Catch construct. The flow variable connections from both the success and failure branches terminate at the Catch Errors (Var Ports) node.
4. Log the events with the Logging component. The context properties, the execution status, and the execution timestamp are written to a log file Logger.xlsx under the current workflow data area.
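The Try/Catch pattern of Steps 4 and 7 maps naturally onto a try/except block that appends one row per run to an Excel log. Here is a minimal sketch assuming the openpyxl package; run_pipeline is a placeholder for Steps 1-6, and the column layout of Logger.xlsx is an assumption.

```python
from datetime import datetime
import getpass

from openpyxl import Workbook, load_workbook

LOG_FILE = "Logger.xlsx"

def run_pipeline():
    """Placeholder for Steps 1-6 of the workflow."""

try:
    run_pipeline()
    status = "Success"  # success branch: Variable Creator sets Status
except Exception:
    status = "Failed"   # failure branch via the Active Branch Inverter

# Append one log row; create the file with a header row on first use.
try:
    wb = load_workbook(LOG_FILE)
except FileNotFoundError:
    wb = Workbook()
    wb.active.append(["Workflow", "User", "Status", "Timestamp"])

wb.active.append([
    "L2-DE Demo Workflow for Session 4",  # context property: workflow name
    getpass.getuser(),                    # context property: logged-in user
    status,
    datetime.now().isoformat(timespec="seconds"),
])
wb.save(LOG_FILE)
```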
