Icon

Read Data from Different Sources

This workflow shows how to read, merge and manipulate data from different sources (Gsheet, excel file) using KNIME Analytics Platform.

We have prepared a public Google spreadsheet called "Olympic_medal_by_country" to do this. You can read and overwrite this spreadsheet by authenticating it through a Gmail account.

Medal by country data is missing for the 2020 Summer Olympic Games edition. Indeed, it is in an Excel file.

The aim is to have both datasets available in KNIME and then perform some transformation and manipulation tasks to finally update the public spreadsheet "Olympic_medal_by_country" with all the data.

We will see how to do it node by node:

To view the configuration for each node, either Double-click or right-click > "Configuration" to open the configuration dialogue.

Google Authenticator node:

1 -After opening the configuration window, we should choose "Interactive" as the Authentication type.

2 - Our "Scopes" option selection is "Standard". We choose "Google Sheets" from the menu since we only need to read and write to the Google sheet.

3- After filling in our login information, we can proceed by clicking the "Login" button, following the steps, and clicking "OK" to close the configuration window. Afterwards, please execute the node.

While in interactive mode, credentials are kept in memory and discarded upon exiting KNIME.

Google Sheets Connector and Reader nodes:

1 - Execute the Google Sheets Connector node to create a connection to Google Sheets.

2 - Also, execute as is the Google Sheets Reader node. It already has in memory the data from the public spreadsheet that we are using in our example.
If you want to use your own Google sheets or those shared with you, you only need to click "Select" and pick it up.

3 - "Select First Sheet" and "Has Column Header" is selected. After executing the node, we will have available in KNIME the Olympic medal data.

Excel Reader node:

1 - In the "File and Sheet" tab, the file name and the sheet to read from are already set up (the folder with the data is already included when you download the workflow)

2 - Within the "Data Area" tab, we have selected read the "whole sheet" since the spreadsheet already has a standard format (non-empty rows or columns, etc.).

String To Number node:

1 - Since when reading the Google Sheets, all columns are "converted" into strings, we need to convert some of them into numbers (Integer type) to match with the Excel file format.

Concatenate node:

1 - Both tables have the same structure, so the output will be the same regardless of how combined the input columns are.

2 - Same RowIDs are present in both tables. This time we need to add a suffix (_2020) to the RowIDs of the low input port to append them.

Row Aggregator node:

1 - The aggregation method is Sum, including only the "medals" columns ("gold", "silver", "bronze", and "total"), and the category column is the "country" column.

2 -We also flag the option "Additional "grand totals" at the second output port" to append them later in the spreadsheet.

3 - The result is the total medals by each country and the totals.


Google Sheets Updater node:

1 - The node is configurated to overwrite an existing sheet called "medals_updated_by_country" in the public spreadsheet "Olympic_medal_by_country".

2- The Google Sheet will automatically open after the execution of the node.

Nodes

Extensions

Links