Icon

01_​KNIME_​Workshop_​exercise_​reset

Example Workflow for ETL Basics Operations
Exercise #1 - just execute!Please inspect the workflow and execute it! Example Exercise Workflow for ETL Basics Operations (adopted from KNIME example "02_ETL_Basics")Number of ETL tasks is asked to be performed on sales2008-2011.csv data set. Exercises are increasing in difficulty. Exercise #2 - build the workflow from given pieces!Please make the same flow as in excercise#1 but with only given nodes types below (hint: make copy-paste some already used above). Exercise #3 - Explore the data and transform it to insight you can use for decision making or recomendation.!Firsly prepare additional features (data) you probably will use in further analysis:0. Check the data format and change it to proper one (hint string to date&time)1. Find the price of each product (hint: groupby and math formula) and add this information to the data set (hint: join). does card flag influence the price of product? (hint: new group in groupby)2. Calculate sum of amount and average price paid by each customer - on all products alltogether) (hint: groupby and math formula), rank customers (1 = highest average price) and add thisinformation to data set (total amount, price and rank) already enhanced with task#1 information (product price). can you spot the outling customers?3. Extract the Year from data and calculate amount and quantity for the each following years (hint: extract date&time fileds). Can you explain significant decrease in last year?Now try to use the data you have created for generating valuable insight:4. Pivot the data so you have customer/rank (or avg price per customer)/country/total amount for customers in rows and years in columns - aggregate on sum of quantity / amount (or try even both).Sort if you want (by rank or total amount for example). - Which two customers should you focus on in the first place (which are decreasing in sales mostly in 2011)? - Is the insight you have found also country specific (applies to other customer from given country)?Now you have the insight which could be much in value... what next? sales records 2008-2011for each customer:- total sum of amount- countries active in- customer since ...- time in days between first and last purchasesort byascending dateinner join of original datawith aggregationson Cust_IDcountry = *Germany*1000 < sum(amount) < 2000Node 30write to CSV fileconvert dates from String to DateTimeformat dd.MM.yyyysum(amount) > 5000 ANDcountry = *USA*clean beforeexportsales records 2008-2011sales records 2008-2011Node 75Node 76Node 77Node 78Node 79Node 80 File Reader GroupBy Sorter Joiner Row Filter Row Filter Concatenate CSV Writer String to Date&Time Rule-basedRow Filter Column Filter File Reader File Reader String to Date&Time Sorter GroupBy Rule-basedRow Filter ReferenceRow Filter CSV Writer Exercise #1 - just execute!Please inspect the workflow and execute it! Example Exercise Workflow for ETL Basics Operations (adopted from KNIME example "02_ETL_Basics")Number of ETL tasks is asked to be performed on sales2008-2011.csv data set. Exercises are increasing in difficulty. Exercise #2 - build the workflow from given pieces!Please make the same flow as in excercise#1 but with only given nodes types below (hint: make copy-paste some already used above). Exercise #3 - Explore the data and transform it to insight you can use for decision making or recomendation.!Firsly prepare additional features (data) you probably will use in further analysis:0. Check the data format and change it to proper one (hint string to date&time)1. Find the price of each product (hint: groupby and math formula) and add this information to the data set (hint: join). does card flag influence the price of product? (hint: new group in groupby)2. Calculate sum of amount and average price paid by each customer - on all products alltogether) (hint: groupby and math formula), rank customers (1 = highest average price) and add thisinformation to data set (total amount, price and rank) already enhanced with task#1 information (product price). can you spot the outling customers?3. Extract the Year from data and calculate amount and quantity for the each following years (hint: extract date&time fileds). Can you explain significant decrease in last year?Now try to use the data you have created for generating valuable insight:4. Pivot the data so you have customer/rank (or avg price per customer)/country/total amount for customers in rows and years in columns - aggregate on sum of quantity / amount (or try even both).Sort if you want (by rank or total amount for example). - Which two customers should you focus on in the first place (which are decreasing in sales mostly in 2011)? - Is the insight you have found also country specific (applies to other customer from given country)?Now you have the insight which could be much in value... what next? sales records 2008-2011for each customer:- total sum of amount- countries active in- customer since ...- time in days between first and last purchasesort byascending dateinner join of original datawith aggregationson Cust_IDcountry = *Germany*1000 < sum(amount) < 2000Node 30write to CSV fileconvert dates from String to DateTimeformat dd.MM.yyyysum(amount) > 5000 ANDcountry = *USA*clean beforeexportsales records 2008-2011sales records 2008-2011Node 75Node 76Node 77Node 78Node 79Node 80 File Reader GroupBy Sorter Joiner Row Filter Row Filter Concatenate CSV Writer String to Date&Time Rule-basedRow Filter Column Filter File Reader File Reader String to Date&Time Sorter GroupBy Rule-basedRow Filter ReferenceRow Filter CSV Writer

Nodes

Extensions

Links