Duplicate Row Filter

This node identifies duplicate rows, that is, rows with identical values in the selected columns. For each set of duplicates, the node chooses a single row (the "chosen" row). You can either remove all duplicate rows from the input table, keeping only the unique and chosen rows, or keep all rows and mark them with additional information about their duplication status.

Options

Duplicate detection

Choose columns for duplicates detection: Selects the columns whose values identify duplicates. Rows count as duplicates only if they have identical values in all selected columns. Columns that are not selected are handled under "Row selection" in the "Advanced" tab.
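To make the grouping concrete, here is a minimal Python sketch (not KNIME's implementation) that groups rows by their values in the selected columns. It assumes a table represented as a list of dictionaries; the function name group_duplicates and the sample data are hypothetical, and treating missing values (None) as equal to each other is an assumption of this sketch.

```python
from typing import Any, Dict, List, Sequence, Tuple

# Hypothetical table representation: one dict per row, column name -> value.
Row = Dict[str, Any]


def group_duplicates(
    rows: Sequence[Row], key_columns: Sequence[str]
) -> Dict[Tuple[Any, ...], List[int]]:
    """Group row indices by their values in the selected key columns.

    Rows with identical values in every key column share a group; columns
    that are not in key_columns do not affect the grouping. Missing values
    (None) compare equal here, which is an assumption of this sketch.
    """
    groups: Dict[Tuple[Any, ...], List[int]] = {}
    for index, row in enumerate(rows):
        key = tuple(row[column] for column in key_columns)
        # Indices are appended in input order, so each group keeps that order.
        groups.setdefault(key, []).append(index)
    return groups


rows = [
    {"id": 1, "name": "Ann", "city": "Zurich"},
    {"id": 2, "name": "Bob", "city": "Berlin"},
    {"id": 3, "name": "Ann", "city": "Zurich"},
    {"id": 4, "name": "Ann", "city": "Basel"},
]

print(group_duplicates(rows, ["name", "city"]))
# {('Ann', 'Zurich'): [0, 2], ('Bob', 'Berlin'): [1], ('Ann', 'Basel'): [3]}
```

Selecting only "name" would instead put rows 0, 2 and 3 into one group, because "city" would no longer take part in the comparison.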

Duplicate handling

Duplicate rows

Remove duplicate rows: Removes all duplicate rows and keeps only unique and chosen rows.
Keep duplicate rows: Keeps all rows and appends columns with additional information about each row's duplication status.
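The difference between the two modes can be sketched in Python as follows. This is an illustration under stated assumptions, not the node's actual code: the table is a list of dictionaries, the first row of each duplicate set is chosen, each row is labeled "unique", "chosen", or "duplicate", and the appended column name "dup_status" is hypothetical.

```python
from typing import Any, Dict, List, Sequence, Tuple

Row = Dict[str, Any]


def duplicate_status(rows: Sequence[Row], key_columns: Sequence[str]) -> List[str]:
    """Label every row as 'unique', 'chosen', or 'duplicate'.

    Assumption of this sketch: the first row of each duplicate set is chosen.
    """
    groups: Dict[Tuple[Any, ...], List[int]] = {}
    for index, row in enumerate(rows):
        groups.setdefault(tuple(row[c] for c in key_columns), []).append(index)

    status = [""] * len(rows)
    for members in groups.values():
        if len(members) == 1:
            status[members[0]] = "unique"
        else:
            status[members[0]] = "chosen"
            for index in members[1:]:
                status[index] = "duplicate"
    return status


def remove_duplicate_rows(rows: Sequence[Row], key_columns: Sequence[str]) -> List[Row]:
    """'Remove duplicate rows': keep only unique and chosen rows, in input order."""
    status = duplicate_status(rows, key_columns)
    return [row for row, label in zip(rows, status) if label != "duplicate"]


def keep_duplicate_rows(
    rows: Sequence[Row], key_columns: Sequence[str], status_column: str = "dup_status"
) -> List[Row]:
    """'Keep duplicate rows': keep every row and append a hypothetical status column."""
    status = duplicate_status(rows, key_columns)
    return [{**row, status_column: label} for row, label in zip(rows, status)]


rows = [
    {"id": 1, "name": "Ann"},
    {"id": 2, "name": "Bob"},
    {"id": 3, "name": "Ann"},
]

print([row["id"] for row in remove_duplicate_rows(rows, ["name"])])  # [1, 2]
print([row["dup_status"] for row in keep_duplicate_rows(rows, ["name"])])
# ['chosen', 'unique', 'duplicate']
```

Both modes share the same classification step; they differ only in whether duplicate rows are dropped or kept with a label.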

Row chosen in case of duplicate

First: The first row in sequence is chosen.
Last: The last row in sequence is chosen.
Minimum of: The first row with the minimum value in the selected column is chosen. For string columns, values are compared in lexicographical order. Missing values are sorted after the maximum value, so a row with a missing value is not chosen as long as at least one row in the set has a value.
Maximum of: The first row with the maximum value in the selected column is chosen. For string columns, values are compared in lexicographical order. Missing values are sorted before the minimum value, so a row with a missing value is not chosen as long as at least one row in the set has a value.
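The four selection rules can be sketched in Python for a single set of duplicates. This is a hedged illustration, not KNIME's code: the strategy names "first", "last", "min", and "max" and the function choose_row are hypothetical, missing values are represented as None, and strings are compared with Python's default ordering (by Unicode code point); whether the node's lexicographical order matches this exactly, for example for upper and lower case, is not stated in this description.

```python
from typing import Any, Dict, Optional, Sequence

Row = Dict[str, Any]


def choose_row(
    rows: Sequence[Row],
    members: Sequence[int],
    strategy: str,
    column: Optional[str] = None,
) -> int:
    """Return the index of the chosen row within one set of duplicates.

    members lists the row indices of the set in input order. strategy is one of
    'first', 'last', 'min', or 'max' (hypothetical names for the four options).
    """
    if strategy == "first":
        return members[0]
    if strategy == "last":
        return members[-1]
    if strategy not in ("min", "max"):
        raise ValueError(f"unknown strategy: {strategy!r}")
    if column is None:
        raise ValueError("'min' and 'max' require a column")

    # Missing values (None) sort after the maximum for 'min' and before the
    # minimum for 'max', so they lose against any present value.
    present = [i for i in members if rows[i][column] is not None]
    if not present:
        # All values are missing, so they tie and the first row wins.
        return members[0]
    values = [rows[i][column] for i in present]
    target = min(values) if strategy == "min" else max(values)
    # On ties, the first row in sequence holding the extreme value is chosen.
    return next(i for i in present if rows[i][column] == target)


rows = [
    {"name": "Ann", "score": None, "fruit": "pear"},
    {"name": "Ann", "score": 7, "fruit": "apple"},
    {"name": "Ann", "score": 3, "fruit": "plum"},
    {"name": "Ann", "score": 7, "fruit": "fig"},
]
members = [0, 1, 2, 3]

print(choose_row(rows, members, "first"))  # 0
print(choose_row(rows, members, "last"))  # 3
print(choose_row(rows, members, "min", "score"))  # 2
print(choose_row(rows, members, "max", "score"))  # 1 (first of the two 7s)
print(choose_row(rows, members, "max", "fruit"))  # 2 ('plum' sorts last)
```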

Performance

Compute in memory: Advanced setting. If selected, the computation is sped up by using working memory (RAM). This requires more memory than a regular computation, and the amount needed depends on the size of the input data.
Retain row order: Advanced setting. If selected, the rows in the output table are guaranteed to be in the same order as in the input table.
Update domains of all columns: Advanced setting. If selected, the domains of all columns in the output table are recomputed so that the domains' bounds exactly match the bounds of the data in the output table.
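Why recomputing domains can matter is easiest to see with numbers. The following Python sketch computes the lower and upper bounds of a numeric column before and after removing a duplicate; it assumes, without confirmation from this description, that an output table whose domains are not updated may keep bounds derived from the input. The function numeric_bounds and the data are hypothetical, missing values (None) are ignored, and only numeric bounds are covered.

```python
from typing import Any, Dict, Optional, Sequence, Tuple

Row = Dict[str, Any]


def numeric_bounds(rows: Sequence[Row], column: str) -> Optional[Tuple[Any, Any]]:
    """Return (lower, upper) bounds of the present values in a numeric column.

    Missing values (None) are ignored; None is returned if no value is present.
    """
    values = [row[column] for row in rows if row[column] is not None]
    if not values:
        return None
    return (min(values), max(values))


rows = [
    {"name": "Ann", "score": 5},
    {"name": "Bob", "score": 9},
    {"name": "Ann", "score": 1},  # duplicate of the first row by "name"
]

# Stand-in for "Remove duplicate rows" with the first row chosen.
seen = set()
output = []
for row in rows:
    if row["name"] not in seen:
        seen.add(row["name"])
        output.append(row)

print(numeric_bounds(rows, "score"))  # (1, 9): bounds of the input data
print(numeric_bounds(output, "score"))  # (5, 9): bounds of the output data
```

The removed duplicate held the smallest score, so bounds taken from the input would be wider than the data that actually remains.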

Input Ports

The data table containing potential duplicates.

Output Ports

Either the input data without duplicates, or the input data with additional columns identifying duplicates.

Views

This node has no views

Installation

To use this node in KNIME, install the extension KNIME Base nodes from the update site below, following our NodePit Product and Node Installation Guide:

v5.5


Plugin provider: KNIME AG, Zurich, Switzerland

Plugin version: 5.5.0.v202506181431

On NodePit since: 2025-07-02

Last update: 2025-07-06

KNIME versions: Since v4.0
