
Exercise 3


Since KNIME is designed around data tables, loops are rarely needed. Before using a loop, make sure that a dedicated node for what you have in mind does not already exist!

Workflow: Chapter 7/Exercise 3


In this exercise, we implement a Table Row to Variable loop to remove duplicates and missing values from a dataset.

We have access to a dataset (wrong_sales_file.txt) containing sales records. The dataset is flawed: for some sales it contains two entries, an older record with a few missing values and a more recent record with all values correctly filled in. The column "load_date" indicates the date of record creation. The goal of this workflow is to clean the dataset by removing the duplicates and missing values. Hence, for each order number, we want to remove the older record and keep only the most recent one.

Note. There are easier ways to remove duplicates, for example, using a Duplicate Row Filter or GroupBy node. However, in this workflow we implement a Table Row to Variable loop to demonstrate how it works.
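The loop logic described above (iterate over the contract numbers, filter to the current one, sort by "load_date" descending, keep the most recent row) can be sketched in pandas. This is a minimal illustration, not the KNIME workflow itself; the column names `contract_nr` and `amount` and the toy data are assumptions for the example.

```python
import pandas as pd
from io import StringIO

# Toy stand-in for wrong_sales_file.txt (column names assumed)
csv = StringIO("""contract_nr,amount,load_date
1001,,2021-01-05
1001,250,2021-03-10
1002,300,2021-02-01
""")
sales = pd.read_csv(csv, parse_dates=["load_date"])

# Loop over the list of contract numbers
# (analogous to Table Row to Variable Loop Start)
cleaned_chunks = []
for nr in sales["contract_nr"].unique():
    # Filter to the current contract number (Row Filter)
    chunk = sales[sales["contract_nr"] == nr]
    # Sort by "load_date" in descending order (Sorter)
    chunk = chunk.sort_values("load_date", ascending=False)
    # Keep only the first row, i.e. the most recent load date (Row Filter)
    cleaned_chunks.append(chunk.head(1))

# Collect the per-contract results (Loop End)
cleaned = pd.concat(cleaned_chunks, ignore_index=True)
print(cleaned)
```

After the loop, `cleaned` contains exactly one row per contract number: the record with the most recent "load_date", so the older, incomplete entries are gone.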

Reading data

Sales file with duplicated records and missing values.

Data preprocessing

Loop body

Duplicate removal with a single node

Count occurrence of each contract number in the dataset

Count occurrence of each contract number in the dataset

Filter to current contract number
Row Filter
String to Date&Time
Sort by "load_date" in descending order
Sorter
Keep only "contract nr"
Column Filter
Loop End
Check on uniqueness of contract numbers
Value Counter
Loop over list of contract numbers
Table Row to Variable Loop Start
Sort by "load_date" in descending order
Sorter
Keep only first row: most recent load date
Row Filter
Check on uniqueness of contract numbers
Value Counter
wrong_sales_file.txt
CSV Reader
Remove duplicate rows; keep first
Duplicate Row Filter
Remove duplicate contract numbers
Duplicate Row Filter
Group by contract nr and use "First" for aggregation
GroupBy
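The single-node alternatives named above (Duplicate Row Filter, GroupBy with "First" aggregation) have direct pandas analogues, sketched here on assumed column names (`contract_nr`, `amount`) and toy data:

```python
import pandas as pd

# Toy data: contract 1001 has an older, incomplete record (column names assumed)
sales = pd.DataFrame({
    "contract_nr": [1001, 1001, 1002],
    "amount": [None, 250.0, 300.0],
    "load_date": pd.to_datetime(["2021-01-05", "2021-03-10", "2021-02-01"]),
})

# Duplicate Row Filter analogue: sort newest first,
# then keep the first row per contract number
dedup = (sales.sort_values("load_date", ascending=False)
              .drop_duplicates(subset="contract_nr", keep="first"))

# GroupBy analogue: group by contract number and take the "First" aggregate
# (pandas' groupby().first() returns the first non-null value per column)
grouped = (sales.sort_values("load_date", ascending=False)
                .groupby("contract_nr", as_index=False)
                .first())
```

Either one-liner replaces the whole loop body; the loop version in this exercise exists only to demonstrate how a Table Row to Variable loop works.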
