Duplicate Row Filter

Sometimes you may find that certain rows in your tables are duplicated one or more times. This may be due to many reasons, including bad data, combining tables through joins and concatenations, or some other analytic process.

Whatever the reason, you usually do not want duplicate records. That is where the Duplicate Row Filter node comes in: it can automatically remove or flag rows whose values duplicate those of another row.

The Duplicate Row Filter's configuration lets you select which columns to check for duplicates. By default, all columns are selected, so two rows count as duplicates only when every value matches. You may instead include any subset of columns, in which case rows are compared on those columns alone.
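To make the effect of the column selection concrete, here is a minimal plain-Python sketch of the idea. It is an illustration only, not the node's implementation: the function name `remove_duplicates`, the list-of-dictionaries table, and the sample data are all assumptions made for this example.

```python
def remove_duplicates(rows, key_columns=None):
    """Keep the first row for each distinct combination of key column values.

    Hypothetical helper for illustration; rows are dictionaries that map
    column names to hashable values.
    """
    seen = set()
    kept = []
    for row in rows:
        # With no explicit selection, compare on every column (the default case).
        columns = key_columns if key_columns is not None else sorted(row)
        key = tuple(row[column] for column in columns)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


# Made-up sample table: the third row repeats the first one exactly.
rows = [
    {"id": 1, "city": "Oslo"},
    {"id": 2, "city": "Oslo"},
    {"id": 1, "city": "Oslo"},
]

all_columns = remove_duplicates(rows)          # compares "id" and "city"
city_only = remove_duplicates(rows, ["city"])  # compares "city" only
```

Comparing on all columns drops only the exact repeat, while comparing on `city` alone treats all three rows as duplicates of one another and keeps just the first.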

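On the Advanced tab, you can choose whether to remove duplicate rows or just flag them. The Row Selection option then determines which row in each group of duplicates is chosen: the first, the last, or the one with the minimum or maximum value in a column you pick. The other rows in the group are the ones removed or marked as duplicates. Finally, you can elect to retain the current row order, although this may lead to slower processing.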
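The flagging behavior and the row selection can be sketched in plain Python as follows. This is a hedged illustration, not the node's actual code: the function `flag_duplicates`, its `keep` and `value_column` parameters, and the sample data are hypothetical, and the "unique" label for rows that have no duplicates is an assumption of this sketch.

```python
from collections import defaultdict


def flag_duplicates(rows, key_columns, keep="first", value_column=None):
    """Return one flag per row, in the original row order.

    Hypothetical helper for illustration. keep is "first", "last", "min",
    or "max"; "min" and "max" compare the values in value_column.
    """
    # Group row positions by the values in the compared columns.
    groups = defaultdict(list)
    for index, row in enumerate(rows):
        groups[tuple(row[column] for column in key_columns)].append(index)

    flags = [None] * len(rows)
    for indices in groups.values():
        if len(indices) == 1:
            flags[indices[0]] = "unique"  # assumed label for non-duplicates
            continue
        if keep == "first":
            chosen = indices[0]
        elif keep == "last":
            chosen = indices[-1]
        elif keep == "min":
            chosen = min(indices, key=lambda i: rows[i][value_column])
        elif keep == "max":
            chosen = max(indices, key=lambda i: rows[i][value_column])
        else:
            raise ValueError(f"unknown keep option: {keep!r}")
        # One row per group is chosen; the rest are marked as duplicates.
        for i in indices:
            flags[i] = "chosen" if i == chosen else "duplicate"
    return flags


# Made-up sample table with two rows sharing the same city.
rows = [
    {"city": "Oslo", "sales": 5},
    {"city": "Bergen", "sales": 7},
    {"city": "Oslo", "sales": 9},
]

first_flags = flag_duplicates(rows, ["city"])
max_flags = flag_duplicates(rows, ["city"], keep="max", value_column="sales")
```

Because the flags are stored by row position, the input order is preserved. Removing duplicates instead of flagging them amounts to keeping only the rows whose flag is not "duplicate".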

By selecting all columns, only those rows where every value matches the value of some other row are removed. This is the most common configuration of the Duplicate Row Filter.

This example flags duplicate records. Notice how duplicate records are labeled "chosen" or "duplicate." These flags are controlled by the Row Selection option under the Advanced tab. Here, the first row of a duplicate is chosen and all subsequent rows are marked as "duplicate."

Workflow: Table Creator (input some data with duplicates added at the end), Duplicate Row Filter (remove all duplicate rows), Duplicate Row Filter (flag all duplicate rows).

Night Heron Data, 2023
