
3 Duplicate Rows Workflow

Duplicate Row Filter

This workflow is based on the adult.csv data set. Try it out to:
1. Remove duplicates
- keep the first or last appearance of the duplicates
- keep the duplicate row that has the maximum or minimum value in a selected column
2. Flag duplicates
- add a column that flags rows as unique, duplicate or chosen
- add a column that displays the RowID of the (representative) chosen row for each duplicate
- add both columns, combining the two flag types described above
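The Duplicate Row Filter node is configured through its dialog rather than in code. To illustrate what the options above do, here is a minimal plain-Python sketch of the same logic. It is not the node's implementation. The function names, the `status`/`chosen_row` column names and the sample rows are hypothetical. Three further assumptions apply: duplicates are rows with equal values in the chosen key columns, ties in the max/min mode go to the earliest row, and only "duplicate" rows get a chosen-RowID reference.

```python
from typing import Dict, List, Optional, Sequence

Row = Dict[str, object]


def group_duplicates(rows: Sequence[Row], key_cols: Sequence[str]) -> Dict[tuple, List[int]]:
    """Group row indices by their values in the columns that define a duplicate."""
    groups: Dict[tuple, List[int]] = {}
    for i, row in enumerate(rows):
        groups.setdefault(tuple(row[c] for c in key_cols), []).append(i)
    return groups


def pick(rows: Sequence[Row], idxs: List[int], strategy: str, ref_col: Optional[str] = None) -> int:
    """Return the index of the row to keep within one duplicate group."""
    if strategy == "first":
        return idxs[0]
    if strategy == "last":
        return idxs[-1]
    if strategy == "max":
        return max(idxs, key=lambda i: rows[i][ref_col])  # ties -> earliest row
    if strategy == "min":
        return min(idxs, key=lambda i: rows[i][ref_col])  # ties -> earliest row
    raise ValueError(f"unknown strategy: {strategy}")


def remove_duplicates(rows, key_cols, strategy="first", ref_col=None) -> List[Row]:
    """Keep one row per duplicate group, preserving the original row order."""
    groups = group_duplicates(rows, key_cols)
    keep = sorted(pick(rows, idxs, strategy, ref_col) for idxs in groups.values())
    return [rows[i] for i in keep]


def flag_duplicates(rows, key_cols, strategy="first", ref_col=None) -> List[Row]:
    """Return copies of the rows with a status column and a chosen-RowID column."""
    flagged = [dict(r) for r in rows]
    for idxs in group_duplicates(rows, key_cols).values():
        chosen = pick(rows, idxs, strategy, ref_col)
        for i in idxs:
            if len(idxs) == 1:
                status, ref = "unique", None
            elif i == chosen:
                status, ref = "chosen", None
            else:
                status, ref = "duplicate", rows[chosen]["RowID"]
            flagged[i]["status"] = status
            flagged[i]["chosen_row"] = ref
    return flagged


# Hypothetical sample rows in the style of adult.csv; RowID is not a key column.
rows = [
    {"RowID": "Row0", "education": "Bachelors", "sex": "Male", "age": 39},
    {"RowID": "Row1", "education": "HS-grad", "sex": "Female", "age": 50},
    {"RowID": "Row2", "education": "Bachelors", "sex": "Male", "age": 28},
    {"RowID": "Row3", "education": "Bachelors", "sex": "Male", "age": 53},
]
key = ["education", "sex"]

print([r["RowID"] for r in remove_duplicates(rows, key, "first")])  # ['Row0', 'Row1']
print([r["RowID"] for r in remove_duplicates(rows, key, "min", "age")])  # ['Row1', 'Row2']
for r in flag_duplicates(rows, key):
    print(r["RowID"], r["status"], r["chosen_row"])
```

With "first", Row0 is flagged as chosen, Row1 as unique, and Row2 and Row3 as duplicates that point back to Row0. Switching the strategy to "max" with `ref_col="age"` would instead choose Row3, the oldest row in the Bachelors/Male group.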

This workflow shows how to remove or flag duplicates in the data set, and the different options for defining which row to keep, using the Duplicate Row Filter node. For more detailed information, see the workflow metadata (View -> Description).

Workflow layout:
- Simply removing duplicates from dataset: File Reader (Read the adult.csv data) -> Duplicate Row Filter (keep the first appearance of the duplicates)
- Flagging Duplicates: Duplicate Row Filter (Add both flag types)
- Final Output with Doubles Removed (CSV File): CSV Writer
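The File Reader -> Duplicate Row Filter -> CSV Writer path can be approximated outside KNIME with Python's standard `csv` module. The sketch below is a rough approximation under stated assumptions. It assumes the input has a header row and that a duplicate means every column matches an earlier row. It then keeps the first appearance of each duplicate. The function name `dedupe_csv_keep_first` is hypothetical.

```python
import csv
import io
from typing import Set, TextIO, Tuple


def dedupe_csv_keep_first(src: TextIO, dst: TextIO) -> int:
    """Copy CSV rows from src to dst, dropping any row whose values in every
    column repeat an earlier row. Returns the number of rows removed."""
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    writer.writerow(next(reader))  # assumes the first line is a header row
    seen: Set[Tuple[str, ...]] = set()
    removed = 0
    for row in reader:
        key = tuple(row)
        if key in seen:
            removed += 1  # later appearance of a duplicate: drop it
            continue
        seen.add(key)  # first appearance: remember and keep it
        writer.writerow(row)
    return removed


# Hypothetical in-memory sample standing in for adult.csv.
sample = "age,education,sex\n39,Bachelors,Male\n50,HS-grad,Female\n39,Bachelors,Male\n"
out = io.StringIO()
removed = dedupe_csv_keep_first(io.StringIO(sample), out)
print(removed)  # 1
print(out.getvalue())
```

To run it on real files, open both with `newline=""`, for example `open("adult.csv", newline="")` for reading and a hypothetical `open("adult_dedup.csv", "w", newline="")` for writing. A set of row tuples keeps the check linear in the number of rows, at the cost of holding every distinct row in memory.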
