Icon

Calculate Totals (Multiple Files)

This workflow shows how to calculate the totals of multiple files using KNIME Analytics Platform.

The aim is to concatenate the input tables (same structure) from five files and calculate the totals of the Olympic medal counts for the countries participating in the Olympic Summer Games from 1896 to 2020.

Let's walk through the different nodes involved in this operation:

To view each node's configuration, double-click or right-click > "Configuration" to open the configuration dialogue.


Excel Reader node:

1 - Since the folder with the data is already included when you download the workflow, in the "File and Sheet" tab, we choose to "Read from" the "Current workflow data area" and also check the option "Mode: Files in folder".

2 - We know that all the files from the target folder have the same structure (standard format, data in the first sheet). By selecting the above configuration, we can easily read and concatenate all of them using the Excel Reader node.

Column Filter node:

We clean the input tables from unnecessary columns, which we don't need for the total calculation.

Row Aggregator node:

1 - We set the "Category column" to "country" to total the medals by country.
2 - We manually select the columns we want to aggregate the rows of, namely "gold", "silver", and "bronze".
3 - As the "Aggregation" method, we use the "Sum" option.
4 - We also want to calculate the grand totals by enabling the "Additional grand totals at second output port" flag.

Sorter node:

We sort the medal totals for each country by the number of gold medals in descending order.

Constant Value Column, Column Resorter node:

We create a row which contains the grand medal totals over all countries.

Concatenate node:

1 - We can control how the two input tables will be concatenated. Since our two input tables have the same structure, we use the intersection mode. There is the option of either using intersection or union columns mode.

2 - We choose the "Append suffix" option to handle duplicate row ids with the suffix "_dup". It means that if we have two rows with the id "Row0", one gets changed to "Row0_dup".

Excel Writer:

We create a new file called "Olympic_games_Medal_Countries_Totals_1896-2020" and save it in the workflow data area.

After executing the node, the file will open automatically.

As you can see from the output, we calculated each country's medal totals and appended the grand medal totals as the last row.

Nodes

Extensions

Links