Icon

02 Data Cleaning

02 Data Cleaning - Exercise (Solution)

This workflow shows a solution to a hands-on exercise in the L1-DW KNIME Analytics Platform for Data Wranglers: Basics course

Task 1: Filter rows1. Read the population, life expectancy, and travel advisory data by executing the providedreader nodes2. Filter the population data to countries in Europe3. Filter the life expectancy data to countries/regions where the overall life expectancy is atleast 80 years4. Filter the travel advisory data to the safest countries (Level 1: Exercise normalprecautions) Task 2: Filter columns1. Read the life expectancy data from the SQLite database by executing the providedworkflow2. Exclude the Rank column- Manually- By including only string and double type columns Task 3: Manipulate strings1. Read the travel advisories data by executing the provided CSV Reader node2. Replace the colons in the Level column by hyphen like this: "Level 1- Exercise normalprecautions"3. Extract the risk levels as numbers into a separate column. You can use, for example, thesubstr() function.4. Complete the tasks above with the Column Expressions node Task 4: Evaluate mathematical expressions and apply rules1. Read the population data by executing the provided Excel Reader node2. Create a new column "Change" by subtracting the population values in 2018 from thepopulation values in 20193. Convert the absolute population change values into percentages: Divide them by the populationvalues in 2018. Multiply the result by 100.4. Create a categorical column with two values "increasing"/"decreasing" based on whether thechange is positive or negative Demographics.sqliteLifeExpectancyExtract numericrisk level into a separate columndemographics.xlsx(population)demographics.xlsx(life_expectancy)Convert change topercentageAdd Direction columnbased on +/- changeThe same tasks as abovedemographics.xlsx(population)overall life expectancy >=80Countries withLevel 1: Exercise normal precautionsEuropeRemove Rank manuallyRemove Rank by column typetravel_advidories.csvReplace :by -travel_advidories.csvCreate Changecolumn SQLite Connector DB Table Selector String Manipulation Excel Reader Excel Reader Math Formula Rule Engine MISSING ColumnExpressions Excel Reader Row Filter Row Filter Row Filter DB Reader Column Filter Column Filter CSV Reader String Manipulation CSV Reader Math Formula Task 1: Filter rows1. Read the population, life expectancy, and travel advisory data by executing the providedreader nodes2. Filter the population data to countries in Europe3. Filter the life expectancy data to countries/regions where the overall life expectancy is atleast 80 years4. Filter the travel advisory data to the safest countries (Level 1: Exercise normalprecautions) Task 2: Filter columns1. Read the life expectancy data from the SQLite database by executing the providedworkflow2. Exclude the Rank column- Manually- By including only string and double type columns Task 3: Manipulate strings1. Read the travel advisories data by executing the provided CSV Reader node2. Replace the colons in the Level column by hyphen like this: "Level 1- Exercise normalprecautions"3. Extract the risk levels as numbers into a separate column. You can use, for example, thesubstr() function.4. Complete the tasks above with the Column Expressions node Task 4: Evaluate mathematical expressions and apply rules1. Read the population data by executing the provided Excel Reader node2. Create a new column "Change" by subtracting the population values in 2018 from thepopulation values in 20193. Convert the absolute population change values into percentages: Divide them by the populationvalues in 2018. Multiply the result by 100.4. Create a categorical column with two values "increasing"/"decreasing" based on whether thechange is positive or negative Demographics.sqliteLifeExpectancyExtract numericrisk level into a separate columndemographics.xlsx(population)demographics.xlsx(life_expectancy)Convert change topercentageAdd Direction columnbased on +/- changeThe same tasks as abovedemographics.xlsx(population)overall life expectancy >=80Countries withLevel 1: Exercise normal precautionsEuropeRemove Rank manuallyRemove Rank by column typetravel_advidories.csvReplace :by -travel_advidories.csvCreate ChangecolumnSQLite Connector DB Table Selector String Manipulation Excel Reader Excel Reader Math Formula Rule Engine MISSING ColumnExpressions Excel Reader Row Filter Row Filter Row Filter DB Reader Column Filter Column Filter CSV Reader String Manipulation CSV Reader Math Formula

Nodes

Extensions

Links