Icon

Read Data from different Sources

<p><strong>Read Data from different Sources</strong></p><p>This workflow shows how to read, merge, and manipulate data from different sources (Google Sheets &amp; Excel file) using KNIME Analytics Platform.</p><p>For this example, we have prepared a public Google Sheets spreadsheet called "Olympic_medal_by_country". You can read and overwrite this spreadsheet by authenticating it through a Gmail account. However, the spreadsheet only contains the medal counts of the Summer Olympic Games 1896-2016, the medal counts of the 2020 Summer Olympic Games are saved in a separate Excel file.</p><p>The aim of this workflow is to access both datasets with KNIME, merge all data into one data table, perform some data transformation and manipulation tasks, and finally update the public spreadsheet "Olympic_medal_by_country".</p><p>💡 <em>To view each node's configuration, select the node and see the configuration pane on the right side of the workflow editor.</em></p><p>Let's walk through the different nodes involved in this operation:</p><p><strong>Google Authenticator node:</strong></p><ul><li><p>In the configuration node, we select "Interactive" as Authentication type.</p></li><li><p>Under "Scopes of access" we select: Scope type: Standard; Scope/permission: Google Sheets as we need to read and write to Google Sheets.</p></li><li><p>To authenticate, we click the "Login" button in the configuration window and follow the steps in the pop-up window. While in interactive mode, credentials are kept in memory and discarded upon exiting KNIME.</p></li></ul><p><strong>Google Sheets Connector + Google Sheets Reader nodes:</strong></p><ul><li><p><strong>Google Sheets Connector node:</strong> This node has no configuration dialog. Simply execute the node to create the connection to Google Sheets.</p></li><li><p><strong>Google Sheets Reader node:</strong> Here, we select the spreadsheet we want to access ("Olympic_medal_by_country"). Make sure, "Select First Sheet" (Sheet 1) and "Has Column Header" are selected. If you want to access your own/a different Google Sheets spreadsheet, click "Select" and choose it.</p></li></ul><p>After executing these two nodes, we have the Olympic medal data available as KNIME data table.</p><p><strong>Excel Reader node:</strong></p><ul><li><p>Since the folder with the data is already included when you download the workflow, in the "File and Sheet" tab, we choose to "Read from" the "Current workflow data area" and select the dataset.</p></li><li><p>In the "Data Area" tab, we select to read the "Whole sheet". This configuration allows us to have the sheet as it is. The intent is to respect its original structure.</p></li></ul><p><strong>String To Number node:</strong></p><ul><li><p>Some of the columns of the spreadsheet were parsed as strings when accessing them in KNIME. We convert them back into numbers (Type: integer) to match the table specifications of the Excel file.</p></li></ul><p><strong>Concatenate node:</strong></p><ul><li><p>Now that both tables have the same structure, we concatenate the Google Sheets spreadsheet and the Excel file.</p></li><li><p>To account for duplicate RowIDs, we select to "Append suffix". As suffix we set "_2020" which is appended to the RowIDs of the second input port.</p></li></ul><p><strong>Row Aggregator node:</strong></p><p>For each country, we calculate the total sum of medals won over all years. We define the following settings:</p><ul><li><p>Category column: country</p></li><li><p>Aggregation method: Sum</p></li><li><p>Columns to include in the aggregation: all integer columns</p></li><li><p>Weight column: none</p></li><li><p>Make sure to select "Additional "grand totals" at second output port" as we want to append them later to the spreadsheet</p></li></ul><p><strong>Sorter node:</strong></p><ul><li><p>We sort the aggregated country data (top output port of Row Aggregator node) by the total medal count ("total") in descending order.</p></li></ul><p><strong>Column Renamer node:</strong></p><ul><li><p>We rename the columns of the grand totals table (bottom output port of Row Aggregator node) as we want to append the grand totals to the aggregated country data and want to avoid duplicate column names.</p></li></ul><p><strong>Column Appender node:</strong></p><ul><li><p>We append the grand totals to the country data. We select "Generate new RowIDs".</p></li></ul><p><strong>Google Sheets Updater node:</strong></p><ul><li><p>We append the resulting dataset to a new sheet called "medals_updated_by_country" in the existing "Olympic_medal_by_country" spreadsheet. <strong><em>Note: </em></strong>Make sure to not overwrite the original data in "Sheet 1".</p></li><li><p>After executing the node, the file will open automatically.</p></li></ul>

Read Data from different Sources

This workflow shows how to read, merge, and manipulate data from different sources (Google Sheets & Excel file) using KNIME Analytics Platform.

For this example, we have prepared a public Google Sheets spreadsheet called "Olympic_medal_by_country". You can read and overwrite this spreadsheet by authenticating it through a Gmail account. However, the spreadsheet only contains the medal counts of the Summer Olympic Games 1896-2016, the medal counts of the 2020 Summer Olympic Games are saved in a separate Excel file.

The aim of this workflow is to access both datasets with KNIME, merge all data into one data table, perform some data transformation and manipulation tasks, and finally update the public spreadsheet "Olympic_medal_by_country".

💡 To view each node's configuration, select the node and see the configuration pane on the right side of the workflow editor.

Let's walk through the different nodes involved in this operation:

Google Authenticator node:

  • In the configuration node, we select "Interactive" as Authentication type.

  • Under "Scopes of access" we select: Scope type: Standard; Scope/permission: Google Sheets as we need to read and write to Google Sheets.

  • To authenticate, we click the "Login" button in the configuration window and follow the steps in the pop-up window. While in interactive mode, credentials are kept in memory and discarded upon exiting KNIME.

Google Sheets Connector + Google Sheets Reader nodes:

  • Google Sheets Connector node: This node has no configuration dialog. Simply execute the node to create the connection to Google Sheets.

  • Google Sheets Reader node: Here, we select the spreadsheet we want to access ("Olympic_medal_by_country"). Make sure, "Select First Sheet" (Sheet 1) and "Has Column Header" are selected. If you want to access your own/a different Google Sheets spreadsheet, click "Select" and choose it.

After executing these two nodes, we have the Olympic medal data available as KNIME data table.

Excel Reader node:

  • Since the folder with the data is already included when you download the workflow, in the "File and Sheet" tab, we choose to "Read from" the "Current workflow data area" and select the dataset.

  • In the "Data Area" tab, we select to read the "Whole sheet". This configuration allows us to have the sheet as it is. The intent is to respect its original structure.

String To Number node:

  • Some of the columns of the spreadsheet were parsed as strings when accessing them in KNIME. We convert them back into numbers (Type: integer) to match the table specifications of the Excel file.

Concatenate node:

  • Now that both tables have the same structure, we concatenate the Google Sheets spreadsheet and the Excel file.

  • To account for duplicate RowIDs, we select to "Append suffix". As suffix we set "_2020" which is appended to the RowIDs of the second input port.

Row Aggregator node:

For each country, we calculate the total sum of medals won over all years. We define the following settings:

  • Category column: country

  • Aggregation method: Sum

  • Columns to include in the aggregation: all integer columns

  • Weight column: none

  • Make sure to select "Additional "grand totals" at second output port" as we want to append them later to the spreadsheet

Sorter node:

  • We sort the aggregated country data (top output port of Row Aggregator node) by the total medal count ("total") in descending order.

Column Renamer node:

  • We rename the columns of the grand totals table (bottom output port of Row Aggregator node) as we want to append the grand totals to the aggregated country data and want to avoid duplicate column names.

Column Appender node:

  • We append the grand totals to the country data. We select "Generate new RowIDs".

Google Sheets Updater node:

  • We append the resulting dataset to a new sheet called "medals_updated_by_country" in the existing "Olympic_medal_by_country" spreadsheet. Note: Make sure to not overwrite the original data in "Sheet 1".

  • After executing the node, the file will open automatically.

Data access

  • Authenticate & connect to Google Sheets and read data

  • Read file located in workflow data area

Data export

Append resulting table to new sheet in existing Google Sheet

To authenticate, you must click the "Login" button in the configuration window.

Execute the node without changing its configuration

Data processing

Execute the node without changing its configuration

Read Data from different Sources


This workflow demonstrates how to read, merge, and manipulate data from different data sources (Google Sheets spreadsheet & Excel file) and how to write it back into a remote destination (Google Sheets)

In this example, we access two datasets from two different sources:

  1. The medal counts of the 1896-2016 Summer Olympic Games (Google Sheets spreadsheet).

  2. The medal counts of the 2020 Summer Olympic Games (local Excel file).

After accessing the data, we combine the two datasets and aggregate the data to get the total medal counts by country as well as the grand totals. We merge everything into one table and lastly append the resulting table to a new sheet in the existing Google Sheets spreadsheet.

For a detailed overview of each node in this workflow, refer to the workflow description in the Info panel.


💡 To view each node's configuration, select the node and see the configuration pane on the right side of the workflow editor.

Connect toGoogle Sheets
Google Sheets Connector
Authenticate toGoogle Services
Google Authenticator
Read medal counts of SummerOlympics Games 1896 - 2016
Google Sheets Reader
Read medal counts of2020 Summer Olympics Games
Excel Reader
Calculate total sum ofmedals by country
Row Aggregator
Append grand totals
Column Appender
Append 2020medal counts
Concatenate
Adjust data type of somecolumns to to match tablespecs of Excel file
String to Number
Append new sheet to dataset:"medals_updated_by_country"
Google Sheets Updater
Column Renamer
Sorter

Nodes

Extensions

Links