Sometimes you may find that certain rows in your tables are duplicated one or more times. This can happen for many reasons, including bad source data, joins or concatenations that combine tables, or some other analytic process.
Regardless of the reason, it is often the case that you do not want duplicate records. That is where the Duplicate Row Filter node comes in: it can automatically remove or flag rows whose values are duplicates of another row's.
The Duplicate Row Filter's configuration lets you select which columns to check for duplicates. By default, all columns are selected, but you may include any subset of columns to suit your needs. Two rows count as duplicates when their values match in every selected column.
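To illustrate how the column selection changes what counts as a duplicate, here is a minimal sketch in plain Python. It is not KNIME code and does not reflect how the node is implemented; the function name `duplicate_counts` and the sample data are hypothetical and exist only for this example.

```python
from collections import Counter

def duplicate_counts(rows, columns=None):
    """Count how many rows share each combination of values in `columns`.

    `rows` is a list of dicts. If `columns` is None, all columns are
    compared, mirroring the "all columns selected" default described above.
    (Hypothetical helper for illustration only.)
    """
    if not rows:
        return Counter()
    if columns is None:
        columns = list(rows[0].keys())
    return Counter(tuple(row[c] for c in columns) for row in rows)

# Made-up sample table
rows = [
    {"customer": "Ann", "city": "Oslo", "amount": 10},
    {"customer": "Ann", "city": "Oslo", "amount": 10},
    {"customer": "Ann", "city": "Oslo", "amount": 25},
    {"customer": "Bob", "city": "Rome", "amount": 7},
]

# Comparing all columns: only the first two rows are duplicates of each other.
all_cols = duplicate_counts(rows)

# Comparing a subset: all three "Ann" rows now count as duplicates.
subset = duplicate_counts(rows, columns=["customer", "city"])
```

With all columns compared, the third row is distinct because its amount differs. With only `customer` and `city` compared, it falls into the same duplicate group as the first two.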
On the Advanced tab, you can choose whether to remove duplicate rows or just flag them. There are also options for which row of each duplicate group is chosen to be kept: the first, the last, or the one with the minimum or maximum value in a column you specify. Finally, you can elect to retain the current row order, although this may lead to slower processing.
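The remove-or-flag choice and the first/last/minimum/maximum options can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the node's implementation: the function `filter_duplicates`, its parameters, and the status labels `unique`, `chosen`, and `duplicate` are names invented for this example.

```python
def filter_duplicates(rows, key_columns, keep="first", by=None, flag=False):
    """Remove or flag rows that repeat the values in `key_columns`.

    keep: "first", "last", "min", or "max" selects the row kept per group;
          "min"/"max" compare the values in column `by`.
    flag: if True, keep every row and add a hypothetical "status" column
          instead of removing anything.
    The output always follows the input row order.
    """
    groups = {}  # key values -> indices of the rows sharing them
    for i, row in enumerate(rows):
        key = tuple(row[c] for c in key_columns)
        groups.setdefault(key, []).append(i)

    chosen = set()
    for indices in groups.values():
        if keep == "first":
            chosen.add(indices[0])
        elif keep == "last":
            chosen.add(indices[-1])
        elif keep == "min":
            chosen.add(min(indices, key=lambda i: rows[i][by]))
        elif keep == "max":
            chosen.add(max(indices, key=lambda i: rows[i][by]))
        else:
            raise ValueError(f"unknown keep option: {keep!r}")

    group_size = {i: len(ix) for ix in groups.values() for i in ix}
    result = []
    for i, row in enumerate(rows):
        if flag:
            if group_size[i] == 1:
                status = "unique"
            elif i in chosen:
                status = "chosen"
            else:
                status = "duplicate"
            result.append({**row, "status": status})
        elif i in chosen:
            result.append(dict(row))
    return result

# Made-up sample table
rows = [
    {"id": 1, "customer": "Ann", "amount": 10},
    {"id": 2, "customer": "Bob", "amount": 7},
    {"id": 3, "customer": "Ann", "amount": 25},
    {"id": 4, "customer": "Ann", "amount": 5},
]

kept_first = filter_duplicates(rows, ["customer"])
kept_max = filter_duplicates(rows, ["customer"], keep="max", by="amount")
flagged = filter_duplicates(rows, ["customer"], flag=True)
```

Here `kept_first` keeps rows 1 and 2, `kept_max` keeps rows 2 and 3 (the "Ann" row with the largest amount), and `flagged` returns all four rows with a status attached to each.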
To use this workflow, download it from the link below and open it in KNIME:
Download Workflow