02 Cleaning and Standardization

Cleaning and Standardization - Exercise

This workflow is a hands-on exercise from the course "L1-DS Introduction to KNIME Analytics Platform for Data Scientists - Basics".

Task 1: Row Filtering
1. Read the adult.csv file by executing the CSV Reader node
2. Filter out rows where the marital status is missing
3. Extract rows where
   - the marital status is divorced
   - the marital status is never married and age is between 20 and 40 (both included)
   - the workclass starts with "S"

Task 2: Column Filtering
1. Read the adult_education.table file by executing the Table Reader node
2. Exclude the "education-num" column
   - manually
   - by including only string type columns

Task 3: Data Transformation
1. Work with the adult.csv data again and create a new column "work-status" with the value "full-time" if the weekly working hours are >=40 and "part-time" otherwise
2. Replace the hyphen in "United-States" by a space character in the "native-country" column
3. Create a new column "year-of-birth" by subtracting the age number from 1994, which is the year when the data were collected
4. OPTIONAL: Replicate the tasks 3 & 4 with the Column Expressions node

Workflow nodes: CSV Reader (adult.csv), Table Reader (adult_education.table), Row Filter x5 (marital status missing; Divorced; Never married; age 20-40; Workclass[0] == 'S'), Column Filter x2 (manually; str only)
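
For readers who want to check their KNIME results against plain code, the logic of the three tasks above can be sketched in Python using only the standard library. This is a rough illustration, not the course solution: the sample rows are invented, the column names ("marital-status", "hours-per-week", and so on) and the value spellings ("Divorced", "Never-married") are assumptions about adult.csv, and it assumes that each of the three extractions in Task 1 is applied separately to the rows with a known marital status.

```python
# Hypothetical sample rows; names and values are assumptions, not read from adult.csv.
rows = [
    {"age": 39, "workclass": "State-gov", "marital-status": "Never-married",
     "hours-per-week": 40, "native-country": "United-States"},
    {"age": 50, "workclass": "Private", "marital-status": "Divorced",
     "hours-per-week": 13, "native-country": "United-States"},
    {"age": 45, "workclass": "Self-emp-inc", "marital-status": None,
     "hours-per-week": 60, "native-country": "Cuba"},
    {"age": 19, "workclass": "Private", "marital-status": "Never-married",
     "hours-per-week": 20, "native-country": "United-States"},
]
# Hypothetical stand-in for adult_education.table.
education_rows = [
    {"education": "Bachelors", "education-num": 13},
    {"education": "HS-grad", "education-num": 9},
]
SURVEY_YEAR = 1994  # year the data were collected, per the exercise text

# Task 1.2: drop rows where the marital status is missing.
known = [r for r in rows if r["marital-status"] is not None]

# Task 1.3: three independent extractions (one Row Filter each in KNIME).
divorced = [r for r in known if r["marital-status"] == "Divorced"]
never_married_20_40 = [
    r for r in known
    if r["marital-status"] == "Never-married" and 20 <= r["age"] <= 40
]
workclass_s = [r for r in known if (r["workclass"] or "").startswith("S")]

# Task 2: exclude "education-num" manually ...
manual = [{k: v for k, v in r.items() if k != "education-num"}
          for r in education_rows]
# ... or keep only the columns whose values are all strings.
string_cols = [c for c in education_rows[0]
               if all(isinstance(r[c], str) for r in education_rows)]
strings_only = [{c: r[c] for c in string_cols} for r in education_rows]


def transform(row):
    """Task 3: add work-status and year-of-birth, fix the country name."""
    out = dict(row)
    out["work-status"] = (
        "full-time" if row["hours-per-week"] >= 40 else "part-time"
    )
    out["native-country"] = row["native-country"].replace(
        "United-States", "United States"
    )
    out["year-of-birth"] = SURVEY_YEAR - row["age"]
    return out


transformed = [transform(r) for r in rows]
```

In KNIME itself these steps correspond to the Row Filter and Column Filter nodes listed in the workflow; the exercise text does not name the nodes used for Task 3 beyond the optional Column Expressions node, so the `transform` helper is only a stand-in for whatever nodes you choose.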
