
03.2_Missing_value_imputation_on_Spark

03.2_Missing_value_imputation_on_Spark_solution

The company tracks the usage of the website and stores the information about each session.
- Various data are collected, e.g., session start, duration, number of clicks, etc., as well as an optional session satisfaction score
- The company calculates aggregated statistics for each customer, e.g., total number of visits, average satisfaction, etc., and updates the "statistics" table in the database
- The session satisfaction score column has missing values, which need to be imputed, e.g., with machine learning predictions
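The imputation idea can be illustrated outside KNIME with a minimal sketch. It assumes a simple least-squares line on session duration as a stand-in for the machine learning model; the workflow itself does this with Spark nodes, and the field names (`customer`, `duration`, `satisfaction`) and sample values here are hypothetical.

```python
from statistics import mean

# Hypothetical session records; names and values are illustrative only.
sessions = [
    {"customer": "c1", "duration": 10.0, "satisfaction": 2.0},
    {"customer": "c1", "duration": 20.0, "satisfaction": 4.0},
    {"customer": "c2", "duration": 30.0, "satisfaction": 6.0},
    {"customer": "c2", "duration": 40.0, "satisfaction": None},  # missing score
]


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return slope, y_bar - slope * x_bar


def impute_satisfaction(rows):
    """Train on rows with a known score, then predict the missing ones."""
    known = [r for r in rows if r["satisfaction"] is not None]
    slope, intercept = fit_line(
        [r["duration"] for r in known],
        [r["satisfaction"] for r in known],
    )
    return [
        r if r["satisfaction"] is not None
        else {**r, "satisfaction": slope * r["duration"] + intercept}
        for r in rows
    ]


imputed = impute_satisfaction(sessions)
```

Rows that already have a score are left untouched; only the missing entries receive a prediction, so the per-customer averages in the "statistics" table can then be computed over complete data.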

We access the usage data from Hive, and the personal data (anonymized & updated in sessions 1 & 2) and contracts data from the PostgreSQL database. We perform in-database processing, read the data into Spark, enrich the usage data with the personal and contract data to better predict the missing values, and continue working with the relatively big usage data on Spark. We export the final status of the workflow. If any process fails, we notify the responsible people via an automated email.
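The enrichment and status-reporting steps can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the workflow performs these steps with KNIME database and Spark nodes, the helper names `enrich` and `run_with_status` and all field names are hypothetical, and no email is actually sent here.

```python
def enrich(usage_rows, personal, contracts):
    """Left-join usage rows with per-customer personal and contract attributes."""
    enriched = []
    for row in usage_rows:
        cid = row["customer_id"]
        # Customers missing from a lookup table simply contribute no extra columns.
        enriched.append({**row, **personal.get(cid, {}), **contracts.get(cid, {})})
    return enriched


def run_with_status(step, *args):
    """Run a step and return (result, status) instead of raising on failure.

    The status string is what would be exported, or sent in a notification.
    """
    try:
        return step(*args), "success"
    except Exception as exc:
        return None, f"failed: {exc!r}"


# Hypothetical sample data standing in for the Hive and PostgreSQL tables.
usage = [
    {"customer_id": 1, "duration": 12.5},
    {"customer_id": 2, "duration": 3.0},
]
personal = {1: {"age_group": "30-39"}}
contracts = {1: {"plan": "basic"}, 2: {"plan": "premium"}}

rows, status = run_with_status(enrich, usage, personal, contracts)
```

Returning a status rather than letting the exception propagate mirrors the idea of exporting the final workflow status and notifying someone only when a step fails.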


Transform Session 3 ELT on Big Data
Exercise 03.2 Missing value imputation on Spark

Exercises
All the tasks can be found in the yellow annotations and the yellow components. Make sure you completed the previous exercises and executed the 03.0_Setup_Local_Big_Data_Environment workflow during your current KNIME session before running this workflow.

2 Impute missing values
- Open the component to find the instructions
- First, connect to the database in the Access & Transformation component
- Provide the credentials to the database
- Execute up-stream before configuration

Components: Access & Transformation, Missing Value Imputation
