

The company tracks website usage and stores information about each session.
- Various data are collected for each session, e.g., session start, duration, and number of clicks, as well as an optional session satisfaction score.
- The company calculates aggregated statistics for each customer, e.g., total number of visits and average satisfaction, and updates the "statistics" table in the database.
- The session satisfaction score column has missing values, which need to be imputed, e.g., with machine learning predictions.
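
To make the imputation and aggregation steps concrete, here is a minimal Python sketch. It is not the workflow's actual implementation: the field names (`customer_id`, `clicks`, `satisfaction`), the sample records, and the choice of a one-feature least-squares model are all assumptions made for illustration. It assumes the known sessions do not all share the same click count, otherwise the fit would divide by zero.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical session records; the field names are assumptions for illustration.
sessions = [
    {"customer_id": 1, "clicks": 10, "satisfaction": 1.0},
    {"customer_id": 1, "clicks": 20, "satisfaction": 2.0},
    {"customer_id": 2, "clicks": 30, "satisfaction": 3.0},
    {"customer_id": 2, "clicks": 40, "satisfaction": None},  # missing score
]


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def impute_satisfaction(rows):
    """Return rows with missing scores replaced by predictions.

    The input rows are not mutated; rows with a missing score are copied.
    """
    known = [r for r in rows if r["satisfaction"] is not None]
    slope, intercept = fit_line(
        [r["clicks"] for r in known], [r["satisfaction"] for r in known]
    )
    return [
        r if r["satisfaction"] is not None
        else {**r, "satisfaction": slope * r["clicks"] + intercept}
        for r in rows
    ]


def customer_statistics(rows):
    """Aggregate total visits and average satisfaction per customer."""
    scores = defaultdict(list)
    for r in rows:
        scores[r["customer_id"]].append(r["satisfaction"])
    return {
        cid: {"visits": len(vals), "avg_satisfaction": mean(vals)}
        for cid, vals in scores.items()
    }


imputed = impute_satisfaction(sessions)
statistics_table = customer_statistics(imputed)
```

Imputing before aggregating means the per-customer averages are computed over all sessions rather than only over those where a score was provided.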

We access the usage data from Hive, and the personal data (anonymized and updated in sessions 1 and 2) and contract data from the PostgreSQL database. We perform in-database processing, read the data into Spark, enrich the usage data with the personal and contract data to improve the prediction of missing values, and continue working with the relatively large usage data in Spark. We export the final status of the workflow. If any process fails, we notify the responsible people via an automated email.
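
The failure-handling pattern described above (run the steps, report a final status, and send an email on failure) can be sketched in Python as follows. This is an illustration only, not how the workflow implements it: the function name `run_workflow`, the recipient address, the subject line, and the injectable `send` callable are hypothetical. In practice `send` might wrap `smtplib.SMTP.send_message`; here it is left abstract so that the sketch runs without a mail server.

```python
from email.message import EmailMessage


def run_workflow(steps, send, recipient="data-team@example.com"):
    """Run steps in order; on the first failure, notify and report the status.

    steps: list of zero-argument callables (hypothetical processing stages).
    send: callable that delivers an EmailMessage (assumed to be provided).
    """
    try:
        for step in steps:
            step()
    except Exception as exc:
        # Build the notification for the responsible people.
        msg = EmailMessage()
        msg["To"] = recipient
        msg["Subject"] = "ELT Usage workflow failed"
        msg.set_content(f"A step failed with: {exc!r}")
        send(msg)
        return {"status": "failure", "error": str(exc)}
    return {"status": "success", "error": None}


def failing_step():
    """Stand-in for a stage that cannot reach its data source."""
    raise RuntimeError("Hive unavailable")


outbox = []  # collects messages instead of sending them
ok = run_workflow([lambda: None], outbox.append)
failed = run_workflow([lambda: None, failing_step], outbox.append)
```

Returning a status dictionary in both branches mirrors the idea of always exporting the final status, whether or not the run succeeded.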

Transform Session 4 Orchestration
Exercise 04.2 ELT Usage

This workflow doesn't have any exercises. It already contains configured error handling and orchestration nodes. Their configuration is analogous to the configuration of the nodes in exercise 04.1_ETL_Customers. You can move to the exercise 04.3_Orchestration. There, you will trigger the execution of this workflow.

NOTE. Credentials and parameters provided here will be used during this workflow development and will be ignored when this workflow is called from the 04.3 Orchestration workflow. The credentials and parameters provided in the caller workflow will be used instead.

Workflow nodes and annotations:
- Status: failure
- Import credentials from the caller
- Status: success
- Try (Variable Ports)
- Catch Errors (Var Ports)
- Variable Creator
- Workflow Service Input
- Variable Creator
- Missing Value Imputation
- Access & Transformation
- Saving
- DB Credentials Configuration
- Parameters Configuration
- Workflow Service Output


