kn_example_duplicates

Remove rows with duplicate values

I set up a workflow to demonstrate how this could be done:

- use group by to count how many duplicates there are (note: KNIME should introduce a generic COUNT(*) function; I had to use a variable)
- if the count is larger than 1, the row is a duplicate
- left join the count back to the original data
- sort the data by ID, and by other variables if you want to control which one of the duplicates is kept
- use the LAG column to identify which line is the 2nd, 3rd, ... occurrence of a duplicate
- make a rule to keep just a single line for each ID
- alternative: just remove all duplicates
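
For readers who want to see the logic outside of KNIME, the steps above can be sketched in plain Python. This is only an illustration of the idea, not the workflow itself: the sample rows, the column names `id` and `value`, and the function names are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical sample rows; "id" and "value" are illustrative column names.
rows = [
    {"id": "A", "value": 3},
    {"id": "B", "value": 1},
    {"id": "A", "value": 1},
    {"id": "C", "value": 7},
    {"id": "A", "value": 2},
]


def flag_duplicates(rows, key="id"):
    """Count rows per key (group by) and attach the count to every row (left join)."""
    counts = Counter(row[key] for row in rows)
    return [
        {**row, "count": counts[row[key]], "is_duplicate": counts[row[key]] > 1}
        for row in rows
    ]


def keep_first(rows, key="id", order_by="value"):
    """Sort, compare each key with the previous row's key (lag), keep one row per key."""
    ordered = sorted(rows, key=lambda r: (r[key], r[order_by]))
    kept = []
    previous_key = object()  # sentinel that never equals a real key
    for row in ordered:
        if row[key] != previous_key:  # first occurrence of this key
            kept.append(row)
        previous_key = row[key]
    return kept


def drop_all_duplicates(rows, key="id"):
    """Alternative: remove every row whose key occurs more than once."""
    counts = Counter(row[key] for row in rows)
    return [row for row in rows if counts[row[key]] == 1]


flagged = flag_duplicates(rows)
deduplicated = keep_first(flagged)
unique_only = drop_all_duplicates(rows)
```

The sort order decides which duplicate survives: here the row with the smallest `value` is kept for each `id`, which corresponds to sorting by ID and other variables before applying the lag comparison.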

Nodes

Rule Engine (2×)
Rule-based Row Filter (2×)
Column Filter (1×)
Column Rename (1×)
GroupBy (1×)
(9 nodes in total)

Extensions

KNIME Base nodes
KNIME Javasnippet
