Icon

Calculations and Aggregating Data Solution

There has been no title set for this workflow's metadata.

There has been no description set for this workflow's metadata.

Calculations and Aggregating Data SolutionIn this lesson’s practice problem, we have a data set detailing United States county and state-level population estimates between 2010 and2019. Your end goal is to determine which state had the highest proportion of its counties lose population between 2010 and 2019.To find the solution, you will have to consider several things:• Ensure field types are set appropriately.• Notice that the data set contains records for whole states and counties within each state. Including whole states will cause difficulties.• Locations are given as County Name, State Name. You will need to devise a way to separate this into two separate columns: County andState.• Using the nodes learned in this lesson, you can do this with two String Manipulators. There is, however, a more elegant solution that we havenot introduced in this course.• Reshape the data set to more easily calculate the population change between 2010 and 2019.• Create a rule that indicates if a county’s population increased or decreased.• Summarize each state, counting how many counties were in each change category.• Reshape the data again so each record represents a state and columns count the number of counties in each category.• Calculate the proportion of each state’s counties that decreased in population.• Sort the data by proportion in descending order.There are two versions of this practice problem:1) You are only given the input and output data sets. You must figure out which nodes to use, how to configure them, and how to organize yourworkflow in order to generate the desired output.2) In addition to the input and output data sets, we have provided an outline of which nodes to use at which steps to guide you along. Note thatthis is only one possible solution, as there are many ways a problem can be solved in KNIME. INPUT DATA ROW FILTER STRING MANIPULATION STRING MANIPULATION PIVOTING MATH FORMULA RULE ENGINE GROUPBY PIVOTING MATH FORMULA SORTER The two String Manipulation nodes above use functions to extract Countyand State values from Location. This is relatively easy for simple stringslike Location, but can get vrey challenging for more complex cases.Another option worth researching is the Cell Splitter node, which can splita string into two or more new columns based on a delimiter character—inthis case a comma. Give it a shot! % Change:(2019 - 2010) / (2010) % Decreased Counties:(# Decrease) /(# Decrease + # Increase) Exclude SummaryLevel = 40Extract county from LocationExtract state from LocationGroup by County, StateYears as columnsCalculate 2010 to 2019% changeCreate rule to label counties as "Decrease" or "Increase"Group by State, Change CategoryCount number of countiesPlace Change Category on columnsCalculate each state'sproportion of decreasesSort by proportion in descending orderPopulation EstimatesRow Filter String Manipulation String Manipulation Pivot Math Formula Rule Engine GroupBy Pivot Math Formula Sorter Table Reader Calculations and Aggregating Data SolutionIn this lesson’s practice problem, we have a data set detailing United States county and state-level population estimates between 2010 and2019. Your end goal is to determine which state had the highest proportion of its counties lose population between 2010 and 2019.To find the solution, you will have to consider several things:• Ensure field types are set appropriately.• Notice that the data set contains records for whole states and counties within each state. Including whole states will cause difficulties.• Locations are given as County Name, State Name. You will need to devise a way to separate this into two separate columns: County andState.• Using the nodes learned in this lesson, you can do this with two String Manipulators. There is, however, a more elegant solution that we havenot introduced in this course.• Reshape the data set to more easily calculate the population change between 2010 and 2019.• Create a rule that indicates if a county’s population increased or decreased.• Summarize each state, counting how many counties were in each change category.• Reshape the data again so each record represents a state and columns count the number of counties in each category.• Calculate the proportion of each state’s counties that decreased in population.• Sort the data by proportion in descending order.There are two versions of this practice problem:1) You are only given the input and output data sets. You must figure out which nodes to use, how to configure them, and how to organize yourworkflow in order to generate the desired output.2) In addition to the input and output data sets, we have provided an outline of which nodes to use at which steps to guide you along. Note thatthis is only one possible solution, as there are many ways a problem can be solved in KNIME. INPUT DATA ROW FILTER STRING MANIPULATION STRING MANIPULATION PIVOTING MATH FORMULA RULE ENGINE GROUPBY PIVOTING MATH FORMULA SORTER The two String Manipulation nodes above use functions to extract Countyand State values from Location. This is relatively easy for simple stringslike Location, but can get vrey challenging for more complex cases.Another option worth researching is the Cell Splitter node, which can splita string into two or more new columns based on a delimiter character—inthis case a comma. Give it a shot! % Change:(2019 - 2010) / (2010) % Decreased Counties:(# Decrease) /(# Decrease + # Increase) Exclude SummaryLevel = 40Extract county from LocationExtract state from LocationGroup by County, StateYears as columnsCalculate 2010 to 2019% changeCreate rule to label counties as "Decrease" or "Increase"Group by State, Change CategoryCount number of countiesPlace Change Category on columnsCalculate each state'sproportion of decreasesSort by proportion in descending orderPopulation EstimatesRow Filter String Manipulation String Manipulation Pivot Math Formula Rule Engine GroupBy Pivot Math Formula Sorter Table Reader

Nodes

Extensions

Links