01 Data Cleaning

Documentation for Data Cleaning:

In workflow 1, I first imported the CSV file that contains data on life expectancies in different regions. Next, I removed rows with a missing life expectancy using a Row Filter. After that, I added separate row filters for life expectancy above 75 and below 50, as well as row filters for specific years (2010 and 2014). One fact highlighted by this workflow is that only the African region had a life expectancy below 50 in the year 2010.

In workflow 2, I used column filters to better understand the dataset: I removed the “Thinness_ten_nineteen_years” and “Thinness_five_nine_years” columns and excluded all columns of type Integer.

In workflow 3, I used a Rule Engine to create a new column, “Life-Expectancy(High/Low/Avg)”, that labels each life expectancy as Low, Average, or High based on specific ranges. Next, I used multiple row and column filters to select the Asia, Africa, and North America regions and keep only the life expectancy, country, and region columns. Overall, the highlights are:
- The Africa region has mostly “Average” life expectancy.
- The Asia region has mostly “Average” and “High” life expectancies.
- The North America region has mostly “High” life expectancy.

Node annotations:
- CSV Reader (3 nodes): Life Expectancy Data
- Workflow 1, Row Filter nodes: filtered out rows with missing Life Expectancy; Life Expectancy > 75; Life Expectancy < 50; Year 2010; Life Expectancy < 50; Year 2014
- Workflow 2, Column Filter nodes: Excluded columns: Thinness_ten_nineteen_years, Thinness_five_nine_years; Excluded type Integer columns
- Workflow 3, Rule Engine node: new column “Life-Expectancy(High/Low/Avg)”
- Workflow 3, Row Filter nodes: Region = Asia; Region = Africa; Region = North America
- Workflow 3, Column Filter nodes (3): Excluded columns except: Life expectancies, country and region
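
The Row Filter and Column Filter steps of workflows 1 and 2 are configured in KNIME dialogs rather than code. The following is a minimal plain-Python sketch of the same logic for readers working outside KNIME; it is not an export of the workflow. Only the two Thinness_* column names come from the workflow. The other headers (Country, Region, Year, Life_expectancy) and the sample rows are hypothetical. The threshold and year filters are chained here, which is also an assumption.

```python
import csv
import io

# Hypothetical headers and made-up rows for illustration only; only the two
# Thinness_* column names are taken from the workflow annotations.
SAMPLE_CSV = """Country,Region,Year,Life_expectancy,Thinness_ten_nineteen_years,Thinness_five_nine_years
Country A,Africa,2010,48.5,7.1,7.0
Country B,Asia,2010,76.2,2.3,2.4
Country C,Africa,2014,,6.0,6.1
Country D,North America,2014,79.9,0.8,0.7
"""
LIFE = "Life_expectancy"  # assumed name of the life expectancy column


def filter_rows(rows, predicate):
    """Row Filter: keep only the rows for which predicate(row) is True."""
    return [r for r in rows if predicate(r)]


def is_missing(value):
    """Treat None or an empty/blank string as a missing value."""
    return value is None or value.strip() == ""


def exclude_columns(rows, excluded):
    """Column Filter: drop the named columns from every row."""
    return [{k: v for k, v in r.items() if k not in excluded} for r in rows]


def is_int(value):
    """True if the string parses as an integer (so "48.5" is not an int)."""
    try:
        int(value)
        return True
    except ValueError:
        return False


def exclude_integer_columns(rows):
    """Column Filter by type: drop columns whose non-empty values all parse as int."""
    if not rows:
        return rows
    integer_columns = set()
    for column in rows[0]:
        values = [r[column] for r in rows if not is_missing(r[column])]
        if values and all(is_int(v) for v in values):
            integer_columns.add(column)
    return exclude_columns(rows, integer_columns)


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))

# Workflow 1: drop rows with a missing life expectancy, then apply threshold and year filters.
clean = filter_rows(rows, lambda r: not is_missing(r[LIFE]))
high = filter_rows(clean, lambda r: float(r[LIFE]) > 75)
low = filter_rows(clean, lambda r: float(r[LIFE]) < 50)
low_2010 = filter_rows(low, lambda r: r["Year"] == "2010")

# Workflow 2: remove the two Thinness_* columns, then every integer-typed column.
no_thinness = exclude_columns(
    clean, {"Thinness_ten_nineteen_years", "Thinness_five_nine_years"}
)
no_integers = exclude_integer_columns(no_thinness)

print([r["Country"] for r in low_2010])   # ['Country A']
print(list(no_integers[0].keys()))        # ['Country', 'Region', 'Life_expectancy']
```

This sketch treats a column as Integer only when every non-empty value parses with int(). That is a rough stand-in for the column type KNIME's CSV Reader infers. With these assumed headers it removes Year, much as a type-based exclusion would likely remove an integer-typed year column in the workflow.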

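KNIME's Rule Engine evaluates its rules from top to bottom, and the first matching rule sets the value of the new column. The Python sketch below imitates workflow 3 in three steps:

1. An ordered if-chain labels each row.
2. Per-region row and column filters select one region and a few columns.
3. A count of the labels per region summarizes the result.

The documentation does not state the ranges used. The cut-offs here (below 50 is Low, above 75 is High, anything else is Average) are an assumption borrowed from the workflow 1 thresholds. The column names and rows are hypothetical as well.

```python
from collections import Counter

# Assumed cut-offs: the documentation only says "specific ranges", so these
# values (borrowed from the > 75 / < 50 filters in workflow 1) are hypothetical.
LOW_BELOW = 50.0
HIGH_ABOVE = 75.0
CATEGORY = "Life-Expectancy(High/Low/Avg)"  # column name from the workflow annotation


def classify(life_expectancy):
    """Ordered rules, first match wins, like a Rule Engine rule list."""
    if life_expectancy < LOW_BELOW:
        return "Low"
    if life_expectancy > HIGH_ABOVE:
        return "High"
    return "Average"  # default rule for everything in between


def add_category(rows, source="Life_expectancy"):
    """Append the new category column to each row (rows are dicts)."""
    return [{**r, CATEGORY: classify(float(r[source]))} for r in rows]


def region_view(rows, region,
                keep=("Country", "Region", "Life_expectancy", CATEGORY)):
    """Row Filter on Region, then Column Filter keeping only the listed columns."""
    return [{k: r[k] for k in keep} for r in rows if r["Region"] == region]


def category_counts(rows):
    """Count how many rows fall into each life expectancy category."""
    return Counter(r[CATEGORY] for r in rows)


# Made-up rows for illustration only; these are not values from the dataset.
rows = [
    {"Country": "Country A", "Region": "Africa", "Life_expectancy": "48.5"},
    {"Country": "Country E", "Region": "Africa", "Life_expectancy": "61.0"},
    {"Country": "Country B", "Region": "Asia", "Life_expectancy": "76.2"},
    {"Country": "Country D", "Region": "North America", "Life_expectancy": "79.9"},
]

labelled = add_category(rows)
for region in ("Asia", "Africa", "North America"):
    print(region, dict(category_counts(region_view(labelled, region))))
```

The column filter keeps both the numeric life expectancy and the new category. This reads the annotation “Excluded columns except: Life expectancies, country and region” as retaining both life expectancy columns, which is an interpretation.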