Icon

Combining Data Sets Solution

There has been no title set for this workflow's metadata.

There has been no description set for this workflow's metadata.

Combining Data PracticeThe African Socio Economic Database is an annual collection of various indicators across the African continent. Indicators are aggregated atthe country, regional, and continent-wide levels. These indicators include such things as the number of tourist arrivals, hydroelectric energyconsumption, and the annual rate of reforestation.The database includes four files:Data 2000-2009.csv and Data 2010-2019.csv: The tables that contain all the measure information. They are in a “tall and skinny” format,which will require reshaping.Country Codes.csv: A table that contains a mapping between country names and codes.Indicators.csv: A table that contains a list of all indicators and their codes.These files are found in the Combining Data Sets Practice folder.The goal of this exercise is to save a CSV file that satisfies the following requirements:• Utilizes these three Code and Indicator_Id values • NY.GDP.PCAP.KD 732 • SP.URB.TOTL 686 • SP.POP.TOTL 688• Utilizes records for individual countries only (i.e. no regional affiliations). A table is provided for use.• Combine the files Data 2000-2009 and Data 2010-2019 into a single table.• Using the above indicators and countries, filter the combined Data tables to show only the desired records.• Extract the year from the Date field.• Reshape the data so each record represents a unique country and year combination. Indicators should be represented as columns.• Calculate the proportion of each record’s population that is urban.• Determine the average urban proportion for each year in the data set• Relate each country and year with each year’s average urban proportion.• If a record’s urban proportion is greater than that year’s average, create a calculation that returns “Above Average.” Otherwise, return “BelowAverage.”• Finally, output a CSV file of the table.There are two versions of this practice problem:1) You are only given the input and output data sets. You must figure out which nodes to use, how to configure them, and how to organize yourworkflow in order to generate the desired output.2) In addition to the input and output data sets, we have provided an outline of which nodes to use at which steps to guide you along. Note thatthis is only one possible solution, as there are many ways a problem can be solved in KNIME. INPUT DATA CONCATENATE TABLES CONVERT NUMBER TO STRING INPUT DATA JOIN TABLES INPUT DATA JOIN TABLES JOIN TABLES JOIN TABLES STRING TO DATE&TIME EXTRACT DATE&TIME FIELDS PIVOTING MATH FORMULA GROUPBY JOINER COLUMN EXPRESSIONS TABLE MANIPULATOR CSV WRITER Country CodesIndicatorsList of indicatorswe want to keepUse the created table to filter Indicators via an inner joinList of countries we wantUse the created tableto filter Country Codesvia an inner joinCreate an inner jointo filter Data bycountryData 2000-2009Data 2010-2019Create an inner jointo filter Data byour indicatorsIndicator_Idstring typeExtract the year fromDateConvert the Date fieldinto a date type so wecan extract its yearPivot the data set, group byCountry and Year,indicators as columnsCalculate the proportionof urban populationto the total populationGroup by Year,average pct urbanRename fieldsJoin on YearCreate a calculate thatcompares pct urbanagainst each year's avg.Node 34CSV Reader CSV Reader Table Creator Joiner Table Creator Joiner Concatenate Joiner CSV Reader CSV Reader Joiner Number To String Extract Date&TimeFields String to Date&Time Pivot Math Formula GroupBy Table Manipulator Joiner Column Expressions CSV Writer Combining Data PracticeThe African Socio Economic Database is an annual collection of various indicators across the African continent. Indicators are aggregated atthe country, regional, and continent-wide levels. These indicators include such things as the number of tourist arrivals, hydroelectric energyconsumption, and the annual rate of reforestation.The database includes four files:Data 2000-2009.csv and Data 2010-2019.csv: The tables that contain all the measure information. They are in a “tall and skinny” format,which will require reshaping.Country Codes.csv: A table that contains a mapping between country names and codes.Indicators.csv: A table that contains a list of all indicators and their codes.These files are found in the Combining Data Sets Practice folder.The goal of this exercise is to save a CSV file that satisfies the following requirements:• Utilizes these three Code and Indicator_Id values • NY.GDP.PCAP.KD 732 • SP.URB.TOTL 686 • SP.POP.TOTL 688• Utilizes records for individual countries only (i.e. no regional affiliations). A table is provided for use.• Combine the files Data 2000-2009 and Data 2010-2019 into a single table.• Using the above indicators and countries, filter the combined Data tables to show only the desired records.• Extract the year from the Date field.• Reshape the data so each record represents a unique country and year combination. Indicators should be represented as columns.• Calculate the proportion of each record’s population that is urban.• Determine the average urban proportion for each year in the data set• Relate each country and year with each year’s average urban proportion.• If a record’s urban proportion is greater than that year’s average, create a calculation that returns “Above Average.” Otherwise, return “BelowAverage.”• Finally, output a CSV file of the table.There are two versions of this practice problem:1) You are only given the input and output data sets. You must figure out which nodes to use, how to configure them, and how to organize yourworkflow in order to generate the desired output.2) In addition to the input and output data sets, we have provided an outline of which nodes to use at which steps to guide you along. Note thatthis is only one possible solution, as there are many ways a problem can be solved in KNIME. INPUT DATA CONCATENATE TABLES CONVERT NUMBER TO STRING INPUT DATA JOIN TABLES INPUT DATA JOIN TABLES JOIN TABLES JOIN TABLES STRING TO DATE&TIME EXTRACT DATE&TIME FIELDS PIVOTING MATH FORMULA GROUPBY JOINER COLUMN EXPRESSIONS TABLE MANIPULATOR CSV WRITER Country CodesIndicatorsList of indicatorswe want to keepUse the created table to filter Indicators via an inner joinList of countries we wantUse the created tableto filter Country Codesvia an inner joinCreate an inner jointo filter Data bycountryData 2000-2009Data 2010-2019Create an inner jointo filter Data byour indicatorsIndicator_Idstring typeExtract the year fromDateConvert the Date fieldinto a date type so wecan extract its yearPivot the data set, group byCountry and Year,indicators as columnsCalculate the proportionof urban populationto the total populationGroup by Year,average pct urbanRename fieldsJoin on YearCreate a calculate thatcompares pct urbanagainst each year's avg.Node 34CSV Reader CSV Reader Table Creator Joiner Table Creator Joiner Concatenate Joiner CSV Reader CSV Reader Joiner Number To String Extract Date&TimeFields String to Date&Time Pivot Math Formula GroupBy Table Manipulator Joiner Column Expressions CSV Writer

Nodes

Extensions

Links