Joiner

This node combines two tables similar to a join in a database. It combines each row from the top input port with each row from the bottom input port that has identical values in selected columns. Rows that remain unmatched can also be output.

Options

Joiner settings

Join Columns
Select the columns from the top input ('left' table) and the bottom input ('right' table) that should be used for joining. Each pair of columns defines an equality constraint of the form A = B. For two rows to be joined, the row from the left input table must have the same value in column A as the row from the right input table in column B. Row keys can be compared to row keys or regular columns, in which case the row key will be interpreted as a string value.
Match all: If selected, two rows must agree in all selected join columns to be matched.
Match any: If selected, two rows must agree in at least one selected join column to be matched.
Compare values in join columns: If value and type is selected, two rows match only if their join columns agree both in value and type, e.g., integer values will never match long values. If make integer types compatible is selected, cells are compared by their long value representation, if available. For instance, integer cells will will match long cells with equal values. If string representation is selected, the join columns are converted to string prior to comparison.
Include in output
Select which rows are included in the join result.
Matching rows: If selected, the joined rows are included in the output. Deactivating this can be used to find only the rows that do not have a join partner in the other table.
Left unmatched rows: Whether to include the rows from the left table for which no row in the right table has the same values in the selected join columns. For example, including only matches and left unmatched rows corresponds to a left outer join in database terms.
Right unmatched rows: Whether to include unmatched rows from the right input table. For example, including only right unmatched rows corresponds to a right antijoin in database terms.
Output options
Select the format in which the join results are output and whether hiliting is enabled.
Split join result into multiple tables: If selected, the node will produce three output tables instead of one. The top output port contains the joined rows (matches only), the middle output port contains the unmatched rows from the left input table, and the bottom port contains the unmatched rows from the right input table. Note that empty tables will be produced for join result types (i.e., matches, left unmatched rows, right unmatched rows) that are not selected for inclusion in the output.
Merge join columns: If active, the join columns of the right input table are merged into their join partners of the left input table. The merged column is named like the left join column if one of its join partners in the right table has the same name. If the join partners have different names, the merged column is named in the form left column=right column.
For instance, when joining a table with columns A, B, and C as left input table with a table that has columns X, A, and Z using the join predicates A=A, A=X, and C=Z, the resulting output table would have columns A, B, C=Z. Note how the column A in the output table contains the value of the column A in the left table, which is also the value of the column X in the right table, as required by the join conditions A=X.
The value of a merged join column for an unmatched row is taken from whichever row has values. For instance, when outputting an unmatched row from the right table in the above example with values x, a, and z, the resulting row in format A, B, C=Z has values x, ?, z.
When merge join columns is off, the row is instead output as ?, ?, ?, x, a, z.
Hiliting enabled: If selected, hiliting rows in the output will hilite the rows in the left and right input tables that contributed to that row. Equally, when hiliting a row in one of the input tables, all rows that the input row contributed to are hilited.
Disabling this option reduces the memory footprint of the joiner, the disk footprint of the workflow, and may speed up the execution in cases where main memory is scarce.
Row Keys
How to generate the keys of the combined output rows.
Concatenate original row keys with separator: The generated row key concatenates the row keys of the input rows, e.g., a row joining rows with key Row3 and Row17 is assigned the key Row3_Row17.
Assign new row keys sequentially: Combined rows are assigned row keys in the order they are produced, e.g., the first row in the join result is assigned row key Row0, the second row is assigned key Row1, etc.
Keep row keys: Only available if the join criteria guarantee the equality of matching rows' keys. Combined rows are assigned the row key of the input rows. To guarantee equality, a) the join criteria assert row key equality and b) if the output is a single table, it can not contain unmatched rows from both left and right input. Note that this a sufficient but not a necessary condition to guarantee equal and unique row keys. However, for performance reasons, keeping row keys is only enabled when the conditions hold.

Column selection

Column Selection (Top Input (left table) and Bottom Input (right table))
Include: Select columns that are included in the combined rows.
Exclude: Select columns that are discarded in the combined rows.
Duplicate column names
The option configures how column names clashes are handled, in case a column name appears both in the left input table and the right input table.
Do not execute: The node will display a warning and won't execute if there are duplicate column names in the left and right input tables.
Append suffix: Append the specified suffix to the duplicate column names in the right input table.

Performance

Output order
Arbitrary output order: The execution time of the join can be improved if the output does not have to be sorted. The produced order can vary between any two executions of the joiner and depends on how much main memory is available during the join operation.
Sort by row offset in left table, then right table: The combined rows in the output are sorted according to the offsets of the contributing rows. Consider a row R=(L1, R1) in the output that combines rows L1 and R1 from the left and right input tables, respectively. R comes before another row S=(L2, R2) in the output if L1 comes before L2 in the left table. If two rows in the output both involve the same row from the left table, they are ordered according to the order of the right contributing row.
If the results are output in a single port, the matching rows are first output, then the unmatched rows from the left table, and finally the unmatched rows from the right table.
This output order is the same as in the previous joiner implementation.
Miscellaneous
Maximum number of open files: Controls the number of temporary files that can be created during the join operation and possibly subsequent sorting operations. More temporary files may increase performance, but the operating system might impose a limit on the maximum number of open files.

Input Ports

Icon
Left input table
Icon
Right input table

Output Ports

Icon
Either all results or the result of the inner join (if the unmatched rows are output in separate ports)
Icon
Unmatched rows from the left input table (top input port). Inactive if "Output unmatched rows to separate ports" is deactivated.
Icon
Unmatched rows from the right input table (bottom input port). Inactive if "Output unmatched rows to separate ports" is deactivated.

Views

This node has no views

Workflows

Links

Developers

You want to see the source code for this node? Click the following button and we’ll use our super-powers to find it for you.