05_Missing_Value_Handling_solution

Missing Value Handling - Solution

Introduction to Machine Learning Algorithms course - Session 4
Solution to exercise 2
Handle missing values in the data by:
- Setting them to a fixed value (zero)
- Generating a dummy column based on missing values in another column
- Replacing them with the column mean or the most frequent value in the column
- Looking for different missing value patterns in the data
- Filtering out columns that have many missing values
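
For readers who want to see the last few ideas outside of KNIME, here is a minimal pandas sketch of filtering out sparse columns and then imputing the mean or the most frequent value. It is not part of the workflow itself: the function names, the toy table, and the 0.3 default threshold are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd


def drop_sparse_columns(df: pd.DataFrame, max_missing: float = 0.3) -> pd.DataFrame:
    """Keep only columns whose share of missing values is at most max_missing."""
    missing_share = df.isna().mean()  # fraction of missing cells per column
    return df.loc[:, missing_share <= max_missing]


def impute_mean_or_mode(df: pd.DataFrame) -> pd.DataFrame:
    """Fill numeric columns with the mean, all other columns with the most frequent value."""
    out = df.copy()
    for col in out.columns:
        if pd.api.types.is_numeric_dtype(out[col]):
            out[col] = out[col].fillna(out[col].mean())
        else:
            mode = out[col].mode(dropna=True)
            if not mode.empty:  # a column with only missing values has no mode
                out[col] = out[col].fillna(mode.iloc[0])
    return out


# Hypothetical toy data, loosely modelled on housing columns.
toy = pd.DataFrame({
    "Lot Area": [8000.0, np.nan, 10000.0, 9000.0],
    "Alley": [np.nan, np.nan, "Pave", np.nan],  # 75% missing, so it is dropped
    "Garage Type": ["Attchd", None, "Attchd", "Detchd"],
})

cleaned = impute_mean_or_mode(drop_sparse_columns(toy))
print(cleaned)
```

Filtering before imputing avoids filling a column that is mostly empty with a value estimated from only a handful of rows.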



Exercise: Missing Value Handling

1) The "Garage Yr Blt" column (year when the garage was built) has many missing values. This probably means that these houses don't have a garage. Therefore:
- Create a new feature "Garage" that gets the value 0 if the "Garage Yr Blt" feature is missing and 1 otherwise (Rule Engine node)

2) The dataset contains at least two columns with many missing values: "Lot Frontage" (linear feet of street connected to the property) and "Mas Vnr Area" (masonry veneer area).
- Set the missing values in these columns to zero (Missing Value node)
- Moreover, replace the missing values in the "Garage Yr Blt" column with the column mean
- Other missing values are few. Remove the remaining rows containing missing values.

3) Apply the same missing value handling to the test set (Missing Value (Apply) and Rule Engine nodes)

Solution workflow:

Create Garage column (Rule Engine node)

Handle missing values (Missing Value node)
- Remove rows with string or integer missing values
- Set "Lot Frontage" and "Mas Vnr Area" missing values to 0
- Impute "Garage Yr Blt" missing values with the column mean

Apply the above missing value handling to the test set (Missing Value (Apply) and Rule Engine nodes)

Optional:
- Handle the "NA" missing values (Column Auto Type Cast node)
- Remove columns with more than 30% missing values (Missing Value Column Filter node)

Handle remaining missing values (Missing Value node)
- Set the numeric missing values to the column mean and the string missing values to the most frequent value

Remove the discarded columns from the test set (Reference Column Filter node)

Apply the same missing value handling to the test set (Missing Value (Apply) node)

Nodes used in the Preprocessing workflow: CSV Reader (reads AmesHousing.csv), Column Auto Type Cast (NAs as missings), Missing Value Column Filter (30 % NAs), Rule Engine, Missing Value (0 to two columns), Missing Value (Apply), Reference Column Filter
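
The three exercise steps can also be sketched in pandas, which may help clarify why the test set needs an "apply" step. This is an illustration under stated assumptions, not the KNIME implementation: the function names and the toy rows are hypothetical, and only the column names come from the exercise. The "Garage" flag is created before imputation, because afterwards "Garage Yr Blt" no longer contains missing values.

```python
import numpy as np
import pandas as pd

ZERO_COLS = ["Lot Frontage", "Mas Vnr Area"]


def add_garage_flag(df: pd.DataFrame) -> pd.DataFrame:
    """Step 1: Garage = 0 if "Garage Yr Blt" is missing, 1 otherwise."""
    out = df.copy()
    out["Garage"] = out["Garage Yr Blt"].notna().astype(int)
    return out


def fit_missing_value_model(train: pd.DataFrame) -> dict:
    """Learn the replacement value from the training set only."""
    return {"Garage Yr Blt": train["Garage Yr Blt"].mean()}


def apply_missing_value_model(df: pd.DataFrame, model: dict) -> pd.DataFrame:
    """Steps 2 and 3: zero-fill, impute the stored mean, drop the remaining rows."""
    out = df.copy()
    out[ZERO_COLS] = out[ZERO_COLS].fillna(0)
    out = out.fillna(model)  # fills only the columns named in the dict
    return out.dropna()      # remaining missing values are few: drop those rows


# Hypothetical toy rows standing in for the training and test partitions.
train = pd.DataFrame({
    "Lot Frontage": [60.0, np.nan, 80.0, 70.0],
    "Mas Vnr Area": [0.0, 100.0, np.nan, 50.0],
    "Garage Yr Blt": [1980.0, np.nan, 2000.0, 1990.0],
    "Electrical": ["SBrkr", "SBrkr", "SBrkr", None],
})
test = pd.DataFrame({
    "Lot Frontage": [np.nan],
    "Mas Vnr Area": [20.0],
    "Garage Yr Blt": [np.nan],
    "Electrical": ["FuseA"],
})

train_flagged = add_garage_flag(train)
model = fit_missing_value_model(train_flagged)
train_clean = apply_missing_value_model(train_flagged, model)
test_clean = apply_missing_value_model(add_garage_flag(test), model)
print(train_clean)
print(test_clean)
```

Reusing the mean computed on the training set for the test set mirrors the role of the Missing Value (Apply) node: the test data is transformed with values learned from the training data rather than with its own statistics.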
