Icon

02_​ETL_​Basics

Example Workflow for ETL Basics Operations

In this workflow, a number of ETL operations are performed on the sales2008-2011.csv dataset. Besides showing what ETL features are, the goal of this workflow is to move from a series of contracts with different customers in different countries to a one-row summary description for each one of the customers.
The one-row description includes:
1. the customer unique ID
2. the total amount of money payed by the customer to the company
3. the countries the customer has been active in
4. the date of the first contract (this is always useful to estimate the customer loyalty)
5. the number of days between the first and the last purchase, that is the number of days the customer has been with the company
At the end, each one-row customer summary information is joined together with each contract data row from the original file and the resulting table is written to a CSV file in a "data" folder located in the workflow folder.



The goal of this workflow is to move from a series of contracts with different customers in different countries toa one-row summary description for each one of the customers.For more information see the workflow metadata. Find it here: View -> Description Move from a series ofcontracts to a one-rowsummary for eachcustomer. Extract only some customers sales records 2008-2011excludecolumn "card"For each customer:- total sum of amount- countries active in- customer since ...- time in days between first and last purchasefind which customersare active in which countrysort byascending dateinner join of original datawith aggregationson Cust_IDcountry = *Germany*country = *USA*1000 < sum(amount) < 2000sum(amount) > 5000write to CSV fileconvert dates from String to DateTimeformat dd.MM.yyyy File Reader Column Filter GroupBy GroupBy Sorter Joiner Row Filter Row Filter Row Filter Row Filter Concatenate CSV Writer String to Date&Time The goal of this workflow is to move from a series of contracts with different customers in different countries toa one-row summary description for each one of the customers.For more information see the workflow metadata. Find it here: View -> Description Move from a series ofcontracts to a one-rowsummary for eachcustomer. Extract only some customers sales records 2008-2011excludecolumn "card"For each customer:- total sum of amount- countries active in- customer since ...- time in days between first and last purchasefind which customersare active in which countrysort byascending dateinner join of original datawith aggregationson Cust_IDcountry = *Germany*country = *USA*1000 < sum(amount) < 2000sum(amount) > 5000write to CSV fileconvert dates from String to DateTimeformat dd.MM.yyyy File Reader Column Filter GroupBy GroupBy Sorter Joiner Row Filter Row Filter Row Filter Row Filter Concatenate CSV Writer String to Date&Time

Nodes

Extensions

Links