
02.2 ELT on website usage data - transform big data on cloud and Spark

<p>The company tracks website usage and stores information about the actions in each session: login and logout times, opened pages, clicked buttons, and an optional session satisfaction score. It wants to calculate statistics for each customer, e.g., total number of visits and average satisfaction.</p><p><em>Note. The session satisfaction score column has missing values, which can be imputed using machine learning predictions.</em></p><p>We access the website usage data from the local big data environment (set up in exercise workflow 02.1), and the personal data (anonymized &amp; updated in exercise workflow 01) and contracts data from a database. We then perform in-database processing, import the data into Spark, enrich the website usage data with the personal and contract data to predict the missing session satisfaction scores, and save the aggregated data.</p>
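The impute-then-aggregate idea described above can be sketched outside KNIME in a few lines of plain Python. This is only an illustration under stated assumptions: the records, the field names `customer_id` and `satisfaction`, and both helper functions are hypothetical, and a simple mean of the observed scores stands in for the machine learning prediction that the workflow uses on Spark.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical session records; the field names are assumptions,
# not the schema used in the workflow.
sessions = [
    {"customer_id": 1, "satisfaction": 4.0},
    {"customer_id": 1, "satisfaction": None},
    {"customer_id": 2, "satisfaction": 2.0},
    {"customer_id": 2, "satisfaction": 3.0},
    {"customer_id": 3, "satisfaction": None},
]


def impute_satisfaction(rows):
    """Fill missing scores with the mean of the observed scores.

    This mean imputation is a stand-in for the ML prediction step.
    """
    observed = [r["satisfaction"] for r in rows if r["satisfaction"] is not None]
    fallback = mean(observed) if observed else None
    return [
        {**r, "satisfaction": fallback if r["satisfaction"] is None else r["satisfaction"]}
        for r in rows
    ]


def aggregate_by_customer(rows):
    """Compute total visits and average satisfaction per customer."""
    grouped = defaultdict(list)
    for r in rows:
        grouped[r["customer_id"]].append(r["satisfaction"])
    stats = {}
    for customer_id, scores in grouped.items():
        valid = [s for s in scores if s is not None]
        stats[customer_id] = {
            "total_visits": len(scores),
            "avg_satisfaction": mean(valid) if valid else None,
        }
    return stats


customer_stats = aggregate_by_customer(impute_satisfaction(sessions))
for customer_id, values in sorted(customer_stats.items()):
    print(customer_id, values)
```

Imputing before aggregating means that sessions without a score still count toward the visit total and contribute an estimated value to the average, rather than being silently dropped from it.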
