Icon

Calculations and Aggregating Data Practice

There has been no title set for this workflow's metadata.

There has been no description set for this workflow's metadata.

Calculations and Aggregating Data PracticeIn this lesson’s practice problem, we have a data set detailing United States county and state-level population estimates between 2010 and2019. Your end goal is to determine which state had the highest proportion of its counties lose population between 2010 and 2019.To find the solution, you will have to consider several things:• Ensure field types are set appropriately.• Notice that the data set contains records for whole states and counties within each state. Including whole states will cause difficulties.• Locations are given as County Name, State Name. You will need to devise a way to separate this into two separate columns: County andState.• Using the nodes learned in this lesson, you can do this with two String Manipulators. There is, however, a more elegant solution that we havenot introduced in this course.• Reshape the data set to more easily calculate the population change between 2010 and 2019.• Create a rule that indicates if a county’s population increased or decreased.• Summarize each state, counting how many counties were in each change category.• Reshape the data again so each record represents a state and columns count the number of counties in each category.• Calculate the proportion of each state’s counties that decreased in population.• Sort the data by proportion in descending order.There are two versions of this practice problem:1) You are only given the input and output data sets. You must figure out which nodes to use, how to configure them, and how to organize yourworkflow in order to generate the desired output.2) In addition to the input and output data sets, we have provided an outline of which nodes to use at which steps to guide you along. Note thatthis is only one possible solution, as there are many ways a problem can be solved in KNIME. INPUT DATA OUTPUT DATA INPUT DATA ROW FILTERFilter out state-level records STRING MANIPULATIONUse substr and indexOffunctions to extract thecounty name from Location STRING MANIPULATIONUse replace function toextract the state name fromLocation and County PIVOTINGPivot the data to show eachcounty's yearly populationas separate columns MATH FORMULACalculate the % differencebetween each county's2019 and 2010 population RULE ENGINECreate a rule that labelseach county as either apopulation increase ordecrease GROUP BYCount the number ofcounties in each state bypopulation change label PIVOTINGPivot the data so eachrecord represents a singlestate and columns bypopulation change label MATH FORMULACalculate the proportion ofcounties with a decrease inpopulation for each state SORTERSort the data set indescending order byproportion. The first recordwill be the answer! DELETE TO REVEAL STEP-BY-STEP HINTS SolutionPopulation Estimates CSV Reader Table Reader Calculations and Aggregating Data PracticeIn this lesson’s practice problem, we have a data set detailing United States county and state-level population estimates between 2010 and2019. Your end goal is to determine which state had the highest proportion of its counties lose population between 2010 and 2019.To find the solution, you will have to consider several things:• Ensure field types are set appropriately.• Notice that the data set contains records for whole states and counties within each state. Including whole states will cause difficulties.• Locations are given as County Name, State Name. You will need to devise a way to separate this into two separate columns: County andState.• Using the nodes learned in this lesson, you can do this with two String Manipulators. There is, however, a more elegant solution that we havenot introduced in this course.• Reshape the data set to more easily calculate the population change between 2010 and 2019.• Create a rule that indicates if a county’s population increased or decreased.• Summarize each state, counting how many counties were in each change category.• Reshape the data again so each record represents a state and columns count the number of counties in each category.• Calculate the proportion of each state’s counties that decreased in population.• Sort the data by proportion in descending order.There are two versions of this practice problem:1) You are only given the input and output data sets. You must figure out which nodes to use, how to configure them, and how to organize yourworkflow in order to generate the desired output.2) In addition to the input and output data sets, we have provided an outline of which nodes to use at which steps to guide you along. Note thatthis is only one possible solution, as there are many ways a problem can be solved in KNIME. INPUT DATA OUTPUT DATA INPUT DATA ROW FILTERFilter out state-level records STRING MANIPULATIONUse substr and indexOffunctions to extract thecounty name from Location STRING MANIPULATIONUse replace function toextract the state name fromLocation and County PIVOTINGPivot the data to show eachcounty's yearly populationas separate columns MATH FORMULACalculate the % differencebetween each county's2019 and 2010 population RULE ENGINECreate a rule that labelseach county as either apopulation increase ordecrease GROUP BYCount the number ofcounties in each state bypopulation change label PIVOTINGPivot the data so eachrecord represents a singlestate and columns bypopulation change label MATH FORMULACalculate the proportion ofcounties with a decrease inpopulation for each state SORTERSort the data set indescending order byproportion. The first recordwill be the answer! DELETE TO REVEAL STEP-BY-STEP HINTS SolutionPopulation Estimates CSV Reader Table Reader

Nodes

Extensions

Links