02_Missing_Value_Handling_solution

Missing Value Handling - solution
Exercise: Missing Value Handling

1) The "Garage Yr Blt" column (year when the garage was built) has many missing values. This probably means that these houses don't have a garage. Therefore:
- Create a new feature "Garage" that gets the value 0 if the "Garage Yr Blt" feature is missing and 1 otherwise (Rule Engine node)

2) The dataset contains at least two columns with many missing values: "Lot Frontage" (linear feet of street connected to the property) and "Mas Vnr Area" (masonry veneer area).
- Set the missing values in these columns to zero (Missing Value node)
- Moreover, replace the missing values in the "Garage Yr Blt" column with the column mean (Missing Value node)
- Other missing values are few. Remove the remaining rows containing missing values (Missing Value node)

3) Apply the same missing value handling to the test set (Rule Engine and Missing Value (Apply) nodes)

Solution steps:

Create Garage column (Rule Engine node)

Handle missing values (Missing Value node)
- Remove rows with missing string or integer values
- Set "Lot Frontage" and "Mas Vnr Area" missing values to 0
- Impute "Garage Yr Blt" missing values with the column mean

Apply the above missing value handling to the test set (Rule Engine and Missing Value (Apply) nodes)

Optional:
- Handle the NAs as missing values (Column Auto Type Cast node)
- Remove columns with more than 30% missing values (Missing Value Column Filter node)

Handle remaining missing values (Missing Value node)
- Set the numeric missing values to the column mean and the string missing values to the most frequent value

- Handle the NAs as missing values in the test set (Column Auto Type Cast node)
- Remove the discarded columns from the test set (Reference Column Filter node)

Apply the same missing value handling to the test set (Missing Value (Apply) node)

Node labels in the workflow:
- CSV Reader: Read AmesHousing.csv
- Column Auto Type Cast: NAs as missings
- Missing Value Column Filter: 30 % NAs
- Missing Value: Lot Frontage & Mas Vnr Area: set to 0; Garage Yr Blt: replace with mean; String & Number: remove row
- Missing Value (optional branch): String: mode; Number: mean
- Further nodes: Rule Engine, Missing Value (Apply), Reference Column Filter, Preprocessing
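For readers who want to check the logic of steps 1) to 3) outside KNIME, the following is a minimal pandas sketch, not the workflow itself. It assumes the replacement value for "Garage Yr Blt" is learned on the training set and reused on the test set, which is how the Missing Value and Missing Value (Apply) pair is used here. The function names, the toy data, and the "Neighborhood" column are illustrative assumptions; only the three column names from the exercise come from the workflow description.

```python
import numpy as np
import pandas as pd

ZERO_COLS = ["Lot Frontage", "Mas Vnr Area"]  # missing values set to 0
GARAGE_COL = "Garage Yr Blt"                  # missing values set to the mean


def add_garage_flag(df):
    """Rule Engine step: Garage = 0 if "Garage Yr Blt" is missing, 1 otherwise."""
    out = df.copy()
    out["Garage"] = out[GARAGE_COL].notna().astype(int)
    return out


def fit_missing_value_model(train):
    """Missing Value step: learn the replacement value on the training set only."""
    return {GARAGE_COL: train[GARAGE_COL].mean()}


def apply_missing_value_model(df, model):
    """Missing Value (Apply) step: reuse the learned handling on any table."""
    out = df.copy()
    out[ZERO_COLS] = out[ZERO_COLS].fillna(0)
    out[GARAGE_COL] = out[GARAGE_COL].fillna(model[GARAGE_COL])
    # Other missing values are few: drop the rows that still contain any.
    return out.dropna()


# Toy stand-ins for the training and test partitions (hypothetical values).
train = pd.DataFrame({
    "Lot Frontage": [80.0, np.nan, 60.0, 70.0],
    "Mas Vnr Area": [np.nan, 100.0, 0.0, 50.0],
    "Garage Yr Blt": [2000.0, np.nan, 1990.0, 1980.0],
    "Neighborhood": ["NAmes", "Gilbert", None, "NAmes"],
})
test = pd.DataFrame({
    "Lot Frontage": [np.nan, 50.0],
    "Mas Vnr Area": [10.0, np.nan],
    "Garage Yr Blt": [np.nan, 2010.0],
    "Neighborhood": ["Gilbert", "NAmes"],
})

# The flag must be created before imputation, otherwise the information
# about which houses lacked a garage year is lost.
train_flagged = add_garage_flag(train)
model = fit_missing_value_model(train_flagged)
train_clean = apply_missing_value_model(train_flagged, model)
test_clean = apply_missing_value_model(add_garage_flag(test), model)

print(train_clean)
print(test_clean)
```

Keeping the learned mean in a separate object mirrors the design of the workflow: the test set is imputed with the training mean, so no statistics computed on the test set enter the preprocessing.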
