Icon

VLOOKUP function (Different Files)

VLOOKUP Function (Different Files)

This workflow shows how to perform a VLOOKUP function on a table using a table from a different file as dictionary, in KNIME Analytics Platform.

The workflow inserts the country names and athlete names on the Olympic Athlete Bio table based on the Olympic country and athlete dictionary table.

Let's walk through the different nodes involved in this operation:

To view the configuration for each node either Double-click or right-click > "Configuration" to open the configuration dialogue.

Input table preparation:

Excel Reader node (Olympic Athlete Bio table):

1 - In the "File and Sheet" tab, the file name and the sheet to read from are already set up (the folder with the data is already included when you download the workflow)

2 - Within the "Data Area" tab, we selected read the "whole sheet" and unflagged the "Use values in row" option as well as the skip "Empty rows" option. This configuration allows us to have the sheet as it is with an alphabetical column naming. The intent is to respect its original structure.

Column Renamer node:

We rename the columns "A" and "E" to "A (#1)" and "E (#1)". Since we later want to update the values in the columns "A" and "E" with the values in the columns "A (#1)" and "E (#1)".

Dictionary preparation:

Excel Reader node (Athlete and country dictionary):

1 - In the "File and Sheet" tab, the file name and the sheet to read from are already set up (when you download the workflow, it's also included the folder with the data).

2 - Within the "Data Area" tab, we have selected read the "whole area" and unflagged the skip "Empty rows" option. This configuration allows us to have the sheet as it is. The intent is to respect its original structure.

Column Renamer:

Since we want to change the values in the columns "A" (athlete_id) to athlete names and "E" (country_noc) to country names in the input table, we need to rename "athlete" to "A" and "country" to "E".

Number To String node:

Since the "athlete_id" column is an integer column we need to convert it to string to match the column type in the input table. We do that by manually selecting only the "athlete_id" column in the configuration dialogue.

Apply LOOKUP functions:

Value Lookup node:

1 - As a Lookup column (data table), we are taking column "A" from the table where we want to append the full athlete names containing the athlete Ids.
2 - Our key column (dictionary table) is "athlete_id". It contains the athlete Ids.

3 - Finally, we select column "A" from the right table of the sheet (our dictionary table) with the full athlete names to be appended.
A new column is appended by default at the end of the table, and since we already have a "A" column, the new column will be named automatically "A (#1)".

Analogously we do the same procedure for the second Value Lookup node.

Variable Creator and Cell Updater nodes:

Since in the original input table the column headers are not in the first row, we use the Cell Updater nodes to update the column with the title "athlete_id" and "country_noc" to "athlete" and "country" respectivley. We do that by defining two variables called "athlete-header" and "country-header" in the Variable Creator node. Than we configure the right cells in the Cell Updater nodes and put the new values inside the flow variable input ports.

Table Updater node:

On top, we pass our original sheet, without the lookup values, with the column "A" renamed into "A (#1)" and "E" to "E (#1)". At the bottom, we pass the sheet with the appended country names in the column "A (#1)" and "E (#1)".

The Table Updater node compares inputs and updates the cells of the top table in column "A (#1)" and "E (#1)".

Excel Writer:

We will append a new sheet called "Sheet_1_modified" in the original excel file (Notice that if you run it several times it will get overwritten). We also unflag the "Write column header" checkbox to maintain the original table structure.
As you can see from the output, we have the same sheet structure but now with country names and athlete names instead of country_noc and athlete_id.

Nodes

Extensions

Links