Icon

VLOOKUP Function - Different Files

<p><strong>VLOOKUP Function - Different Files</strong></p><p>This workflow shows how to perform a VLOOKUP function in KNIME Analytics Platform. In this example we perform the VLOOKUP on one table, using a second table from a different file as the dictionary.</p><p>The goal is to access a dataset that contains information about athletes who participated in the Olympics Games. However, the dataset contains only the athletes' ID number, date of birth, and three-letter country code. We want to replace the athletes' ID number by their full name and the country code by the full country name via a VLOOKUP operation. For this purpose, we access a second dataset that contains the athlete-country dictionary, containing all athletes' ID numbers and their full names as well as all country codes and the full country names.</p><p>💡 <em>To view each node's configuration, select the node and see the configuration pane on the right side of the workflow editor.</em></p><p>Let's walk through the different nodes involved in this operation:</p><p><strong>Excel Reader nodes:</strong></p><ul><li><p>Since the folder with the data is already included when you download the workflow, in the "File and Sheet" tab, we choose to "Read from" the "Current workflow data area" and select the dataset.</p></li><li><p>In the "Data Area" tab, we select to read the "Whole sheet" and unflag to skip "empty rows". This configuration allows us to read the sheet as it is. The intent is to respect its original structure.</p></li></ul><p><strong>Number to String node:</strong></p><ul><li><p>We convert the "athlete_id" column to string. This is required to properly perform the VLOOKUP later because a Value Lookup node requires the two columns that are compared to be of identical column types. </p></li></ul><p><strong>Value Lookup nodes:</strong></p><ul><li><p>With the first Value Lookup node, we add the athletes' names to the input data. As lookup column in the input data table, we define column "A"; as key column in the dictionary table, we use column "athlete_id". In the include/exclude panel, we only include the column "athlete" as this is the column that contains the athletes' full name.</p></li><li><p>With the second Value Lookup node, we add the full country names to the input data. As lookup column in the input data table, we define column "E"; as key column in the dictionary table, we use column "coutry_noc". In the include/exclude panel, we only include the column "country" as this is the column that contains the full country names.</p></li></ul><p><strong>Variable Creator + Cell Updater nodes:</strong></p><p>You might have noticed that the input data table does not contain proper column names. The column header is placed in Row 3. The two columns "athlete" and "country" which were added via the Value Lookup nodes are missing a header value.</p><ul><li><p><strong>Variable Creator node: </strong>We create two flow variables: "athlete-header" with the value "athlete" and "country-header" with the value "country". We will use these flow variables to replace the missing header values in the data table.</p></li><li><p><strong>Cell Updater nodes:</strong> The first Cell Updater node replaces the missing value in column "athlete", Row 3. The second Cell Updater node replaces the missing value in column "country", Row 3.</p></li></ul><p><strong>Table Updater node:</strong></p><ul><li><p>At the top port, we pass our original input data (without the lookup values) with the column "A" renamed into "athlete" and "E" to "country" to match the column names of the dictionary table. At the bottom port, we pass the sheet with the appended athlete names and country names in the columns "athlete" and "country".</p></li><li><p>The Table Updater node compares inputs and updates the content of matching cell in the top input table. A cell is matching if they share the same column name and RowID.</p></li></ul><p>We have now updated our input data table so that the "athlete_id" column was replaced by the athletes' full names and the "country_noc" column was replaced by the full country name.</p><p><strong>Excel Writer node:</strong></p><ul><li><p>We append the dataset to a new sheet called "Sheet_1_modified" in the existing Excel file located in the workflow data area.</p></li><li><p>We unflag the "Write column headers" checkbox to maintain the original table structure.</p></li><li><p>After executing the node, the file will open automatically.</p></li></ul><p>As you can see from the output, we have the same sheet structure but now with country names and athlete names instead of country codes and athlete IDs.</p>

Data access

  1. Bio information of athletes

  2. A dictionary table containing athlete-country information

Data export

Append updated table to new sheet in existing dataset

Data manipulation

VLOOKUP

  1. Add athlete names to input data

  2. Add full country names to input data

Update headers

In the input data, the column headers are not in the first row but in Row 3. We add column values to the "athlete" and "country" columns.

VLOOKUP Function - Different Files


This workflow demonstrates how to perform a VLOOKUP on one table, using a second table from a different file as the dictionary.

In this example, we access two Excel files:

  1. The bio information of athletes who participated in the Olympic Games.

  2. A spreadsheet containing two dictionaries: athlete ID - athlete name and country code - country name.

The bio information sheet does not contain the athletes' full names and only the country codes instead of the full country name. The goal is to replace the athletes' ID numbers by their full name and the country code by the full country name via a VLOOKUP operation. To respect the original structure of the bio information sheet, we also need to update the header column which, in this case, is placed in Row 3 of the table. Lastly, we write the updated table to a new sheet in the existing Excel file.

For a detailed overview of each node in this workflow, refer to the workflow description in the Info panel.


💡 To view each node's configuration, select the node and see the configuration pane on the right side of the workflow editor.

Append athlete namesto input data
Value Lookup
Read athlete-countrydictionary
Excel Reader
New header names
Variable Creator
Replace missing value in"country" with new header
Cell Updater
Read bio informationof athletes
Excel Reader
Convert "athlete_id" columnto string to matchcolumn type of input table
Number to String
Append full countrynames to input data
Value Lookup
Rename "A" to "athleteand "E" to "country"
Column Renamer
Append new sheet to dataset:"Sheet_1_modified"
Excel Writer
Replace missing value in"athlete" with new header
Cell Updater
Update the values incolumn "A (#1)" and "E (#1)"
Table Updater

Nodes

Extensions

Links