06_Outlier_Detection_solution

Outlier Detection - Solution
Exercise: Outlier Detection

Some houses might be special cases in terms of size, price, and the year when they were sold or built. Let's clean the data from these houses in order to build a better model!

1) Remove houses that have a sales price lying outside the interquartile range of all sales prices (Numeric Outliers node)
- Select the "SalePrice" column
- Set the interquartile range parameter to 1.5

2) Optional: Remove the 5 % of the houses that are the most extreme in terms of size (Normalizer and Rule-based Row Filter nodes)
- Normalize the "Lot Area" column using z-score
- Filter out houses whose normalized lot size is outside the range [-1.96, 1.96]

Workflow sections: Numeric Outliers; Optional: Outliers in Distribution Tails

Workflow nodes: CSV Reader (Read AmesHousing.csv), Preprocessing, Numeric Outliers (Remove numeric outliers in SalePrice), Numeric Outliers (Apply), Normalizer (Z-score), Normalizer (Apply), Rule-based Row Filter (+/- 1.96 as threshold)
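For readers who want to check the two filtering steps outside KNIME, here is a minimal Python sketch of the same idea. It is not the implementation used by the KNIME nodes: the function names, the toy `houses` rows, the "inclusive" quartile method, and the use of the sample standard deviation are all assumptions made for illustration, so results on the real AmesHousing.csv may differ slightly from the workflow's output.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Return (lower, upper) = (Q1 - k*IQR, Q3 + k*IQR).

    The quartile estimation method is an assumption; other methods
    give slightly different bounds.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def remove_iqr_outliers(rows, column, k=1.5):
    """Keep only rows whose value in `column` lies within the IQR bounds."""
    lower, upper = iqr_bounds([row[column] for row in rows], k)
    return [row for row in rows if lower <= row[column] <= upper]


def remove_zscore_tails(rows, column, threshold=1.96):
    """Keep only rows whose z-score in `column` lies in [-threshold, threshold].

    Uses the sample standard deviation (an assumption).
    """
    values = [row[column] for row in rows]
    mean = statistics.fmean(values)
    std = statistics.stdev(values)
    return [row for row in rows if abs((row[column] - mean) / std) <= threshold]


# Hypothetical toy rows, not taken from AmesHousing.csv.
houses = [
    {"SalePrice": 100_000, "Lot Area": 8_000},
    {"SalePrice": 120_000, "Lot Area": 8_500},
    {"SalePrice": 130_000, "Lot Area": 9_000},
    {"SalePrice": 140_000, "Lot Area": 9_500},
    {"SalePrice": 150_000, "Lot Area": 10_000},
    {"SalePrice": 900_000, "Lot Area": 9_000},
]

# Step 1: drop rows whose SalePrice is outside Q1 - 1.5*IQR .. Q3 + 1.5*IQR.
without_price_outliers = remove_iqr_outliers(houses, "SalePrice", k=1.5)

# Step 2 (optional): drop rows whose normalized Lot Area is outside +/- 1.96.
cleaned = remove_zscore_tails(without_price_outliers, "Lot Area", threshold=1.96)
```

The threshold of 1.96 corresponds to the central 95 % of a normal distribution, so roughly 5 % of the rows are removed only if "Lot Area" is approximately normally distributed.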
