Icon

02_​ETL_​Basics Exercicio da internet

Example Workflow for ETL Basics Operations

On sales2008-2011.csv data set a number of ETL operations are performed. Besides showing what ETL features are, the goal of this workflow is to move from a series of contracts with different customers in different countries to a one-row summary description for each one of the customers. The one-row description includes: the customer unique ID; the total amount of money payed by the customer to the company; the countries the customer has been active in; the date of the first contract (this is always useful to estimate the customer loyalty); and the number of days between the first and the last purchase, that is the number of days the customer has been with the company. At the end, each one-row customer summary information will be joined together with each contract data row from the original file and write the resulting table to a CSV file in a "data" folder located in the workflow folder.

Example Workflow for ETL Basics OperationsOn sales2008-2011.csv data set a number of ETL operations are performed. Besides showing what ETL features are, the goal of this workflow is to move from a series of contracts with different customers in different countries to aone-row summary description for each one of the customers. The one-row description includes: 1. the customer unique ID 2. the total amount of money payed by the customer to the company 3. the countries the customer has been active in 4. the date of the first contract (this is always useful to estimate the customer loyalty) 5. the number of days between the first and the last purchase, that is the number of days the customer has been with the company At the end, each one-row customer summary information will be joined together with each contract data row from the original file and write the resulting table to a CSV file in a "data" folder located in the workflow folder. Move from a series ofcontracts to a one-rowsummary for eachcustomer. Extract only some customers sales records 2008-2011excludecolumn "card"For each customer:- total sum of amount- countries active in- customer since ...- time in days between first and last purchasefind which customersare active in which countrysort byascending dateinner join of original datawith aggregationson Cust_IDcountry = *Germany*country = *USA*1000 < sum(amount) < 2000sum(amount) > 5000write to CSV fileconvert dates from String to DateTimeformat dd.MM.yyyyFile Reader Column Filter GroupBy GroupBy Sorter Joiner Row Filter Row Filter Row Filter Row Filter Concatenate CSV Writer String to Date&Time Example Workflow for ETL Basics OperationsOn sales2008-2011.csv data set a number of ETL operations are performed. Besides showing what ETL features are, the goal of this workflow is to move from a series of contracts with different customers in different countries to aone-row summary description for each one of the customers. The one-row description includes: 1. the customer unique ID 2. the total amount of money payed by the customer to the company 3. the countries the customer has been active in 4. the date of the first contract (this is always useful to estimate the customer loyalty) 5. the number of days between the first and the last purchase, that is the number of days the customer has been with the company At the end, each one-row customer summary information will be joined together with each contract data row from the original file and write the resulting table to a CSV file in a "data" folder located in the workflow folder. Move from a series ofcontracts to a one-rowsummary for eachcustomer. Extract only some customers sales records 2008-2011excludecolumn "card"For each customer:- total sum of amount- countries active in- customer since ...- time in days between first and last purchasefind which customersare active in which countrysort byascending dateinner join of original datawith aggregationson Cust_IDcountry = *Germany*country = *USA*1000 < sum(amount) < 2000sum(amount) > 5000write to CSV fileconvert dates from String to DateTimeformat dd.MM.yyyyFile Reader Column Filter GroupBy GroupBy Sorter Joiner Row Filter Row Filter Row Filter Row Filter Concatenate CSV Writer String to Date&Time

Nodes

Extensions

Links