Icon

Solution

There has been no title set for this workflow's metadata.

There has been no description set for this workflow's metadata.

PART 1: Split the input into a table with 22 columns and 110,495 rowsThere are many ways a problem can be solved in KNIME. If you solve this part using the nodescovered in the lesson, the first row of data should contain column headers such as playerID andyearID. If your column headers are not in the first row, try again—the metanode in Part 1b won'twork otherwise. This part can be solved using 3 nodes. PART 1b: Rename columns by connectingdata table to metanode. Open the metanodeto learn how this step is performed. Part 2: Calculate the total number of home runs (HR) by player. Filter this aggregation toreturn only the top-50 HR hitters.Then, use that list of 50 players to filter the main data table.This part can be solved using 4 nodes. Part 3: Aggregate the data to show each player's yearlyHR totals. The data table should only include threecolumns: playerID, yearID, and HR. Then, ensure thedata is sorted so that, for each player, the years are inascending order.This part can be solved using 2 nodes. Part 4: This section requires a Group Loop. We have set the loop up, so all you must dois provide the node(s) that go inside the loop.Calculate the cumulative sum of HR for each player throughout his career and how manyseasons he played. Rename these fields to "Cumulative HR" and "Player Season,"respectively.This part can be solved using 2 additional nodes. Part 5: Read in the table People.table. Create a new field called"Player Name" that combines the first and last name fields,separated by a space. Filter the columns to only show playerIDand Player Name.Join in this data set to the primary table.This part can be solved using 4 nodes. Part 6: Pivot the data so that each row is a Player Season andeach column is a Player Name. Sum the field Cumulative HR.When pivoting, make sure the column names are the pivotnames. (That is, the column name should be something like"Player A").This data set will feed into the Line Plot (Plotly) node. You shouldnot need to change any configuration settings if you followedeach step correctly. QUESTION:Which player had the highest cumulative HR total at the end ofhis 10th season? How many HRs had he hit by that point in hiscareer?Move the black box to reveal the answer!ANSWER:Albert Pujols, with 408 home runs. Batting recordsCreate a collection spliton "|"Ungroup thecollection intorowsSplit intocolumnsGet year-by-year HR totalsby playerEnsure yearsare sortedEach iterationcovers asingle playerCumulativeHR totalby player by yearEnd loopFind HR totalby player Sort HRtotal desc.Return top-50HR hittersUse an innerjoin as a filterPivot the dataCreate a line plotReturn onlyplayerID andPlayer Name fieldsJoin datatablesCreate a Player NamefieldPeople Prep Input Cell Splitter Ungroup Cell Splitter Rename Columns GroupBy Sorter Group Loop Start Moving Aggregator Loop End GroupBy Sorter Row Filter Joiner Column Rename(deprecated) Pivot Line Plot (Plotly) Column Filter Joiner String Manipulation Table Reader PART 1: Split the input into a table with 22 columns and 110,495 rowsThere are many ways a problem can be solved in KNIME. If you solve this part using the nodescovered in the lesson, the first row of data should contain column headers such as playerID andyearID. If your column headers are not in the first row, try again—the metanode in Part 1b won'twork otherwise. This part can be solved using 3 nodes. PART 1b: Rename columns by connectingdata table to metanode. Open the metanodeto learn how this step is performed. Part 2: Calculate the total number of home runs (HR) by player. Filter this aggregation toreturn only the top-50 HR hitters.Then, use that list of 50 players to filter the main data table.This part can be solved using 4 nodes. Part 3: Aggregate the data to show each player's yearlyHR totals. The data table should only include threecolumns: playerID, yearID, and HR. Then, ensure thedata is sorted so that, for each player, the years are inascending order.This part can be solved using 2 nodes. Part 4: This section requires a Group Loop. We have set the loop up, so all you must dois provide the node(s) that go inside the loop.Calculate the cumulative sum of HR for each player throughout his career and how manyseasons he played. Rename these fields to "Cumulative HR" and "Player Season,"respectively.This part can be solved using 2 additional nodes. Part 5: Read in the table People.table. Create a new field called"Player Name" that combines the first and last name fields,separated by a space. Filter the columns to only show playerIDand Player Name.Join in this data set to the primary table.This part can be solved using 4 nodes. Part 6: Pivot the data so that each row is a Player Season andeach column is a Player Name. Sum the field Cumulative HR.When pivoting, make sure the column names are the pivotnames. (That is, the column name should be something like"Player A").This data set will feed into the Line Plot (Plotly) node. You shouldnot need to change any configuration settings if you followedeach step correctly. QUESTION:Which player had the highest cumulative HR total at the end ofhis 10th season? How many HRs had he hit by that point in hiscareer?Move the black box to reveal the answer!ANSWER:Albert Pujols, with 408 home runs. Batting recordsCreate a collection spliton "|"Ungroup thecollection intorowsSplit intocolumnsGet year-by-year HR totalsby playerEnsure yearsare sortedEach iterationcovers asingle playerCumulativeHR totalby player by yearEnd loopFind HR totalby player Sort HRtotal desc.Return top-50HR hittersUse an innerjoin as a filterPivot the dataCreate a line plotReturn onlyplayerID andPlayer Name fieldsJoin datatablesCreate a Player NamefieldPeople Prep Input Cell Splitter Ungroup Cell Splitter Rename Columns GroupBy Sorter Group Loop Start Moving Aggregator Loop End GroupBy Sorter Row Filter Joiner Column Rename(deprecated) Pivot Line Plot (Plotly) Column Filter Joiner String Manipulation Table Reader

Nodes

Extensions

Links