
Linear_Regression_Solution

Linear Regression

Linear regression: predict house prices.

- Partition data into training and test set
- Train a linear regression model
- Apply the trained model to the test set
- Handle missing values
- Evaluate the model performance with the Numeric Scorer node
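The steps listed above can be sketched outside KNIME in plain Python. This is a minimal illustration under stated assumptions, not what the KNIME nodes do internally: the rows are hypothetical (features, price) pairs rather than data from AmesHousing_simple.csv, a missing value is represented as None, the model is fitted by ordinary least squares through the normal equations, training rows with a missing feature are skipped, and the helper names partition, solve, fit and predict are our own.

```python
import random

def partition(rows, train_fraction=0.7, seed=42):
    """Randomly split rows into (train, test); train gets ~70% of the rows."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # draw rows randomly, reproducibly
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit(features, targets):
    """Ordinary least squares via the normal equations; returns [intercept, b1, ..., bp]."""
    design = [[1.0] + list(x) for x in features]  # prepend a constant column for the intercept
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(design, targets)) for i in range(p)]
    return solve(xtx, xty)

def predict(coef, x):
    """Return the predicted price, or None if any feature value is missing."""
    if any(v is None for v in x):
        return None
    return coef[0] + sum(b * v for b, v in zip(coef[1:], x))

# Hypothetical toy rows ((size, rooms), price); NOT taken from the exercise dataset.
rows = [((s, r), 1000 + 50 * s + 20 * r) for s in range(10, 30) for r in (2, 3, 4)]
rows.append(((None, 3), 2500))  # one row with a missing feature value

train, test = partition(rows)                                   # partition 70/30
complete = [(x, y) for x, y in train if None not in x]          # skip incomplete training rows
coef = fit([x for x, _ in complete], [y for _, y in complete])  # train the model
scored = [(y, predict(coef, x)) for x, y in test]               # apply it to the test set
scored = [(y, p) for y, p in scored if p is not None]           # remove rows with missing predictions
```

With real data, every numeric column of the dataset would become one feature. Solving the normal equations directly is fine for a handful of columns but can be numerically fragile when columns are nearly collinear.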

URL: Guide to Intelligent Data Science https://www.datascienceguide.org/

Exercise: Linear Regression

In this exercise we will use linear regression to predict the price of a house in Ames (Iowa, USA) from a number of its features (size, neighborhood, heating, ...).
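In its standard form, linear regression predicts the price as an intercept plus a weighted sum of the numeric features, with the weights chosen to minimize the squared error on the training rows (ordinary least squares). The notation is ours: $x_{ij}$ is the value of numeric column $j$ in row $i$, $y_i$ is the SalePrice of training row $i$, and $n$ and $p$ are the numbers of training rows and feature columns.

```latex
\widehat{\mathrm{SalePrice}}_i = \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij},
\qquad
\hat{\boldsymbol\beta} = \arg\min_{\boldsymbol\beta} \sum_{i=1}^{n} \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \Bigr)^2
```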

  1. Read the dataset AmesHousing_simple.csv. It contains information about houses sold in Ames (numerical columns only) as well as their sale price (SalePrice).

  2. Add a Partitioning node to the output of the reader node.

    • The top output port should receive 70% of the rows.

    • Draw these rows randomly.

  3. Add a Linear Regression Learner node to the top output port of the Partitioning node.

    • Select the price column (SalePrice) as the target column.

    • Execute the node and open its scatter plot view. Which column is most strongly correlated with the price? (Use the column selection tab to switch columns.)

  4. Add a Regression Predictor node.

    • Predict the test set (the remaining 30% of the rows) by connecting the two remaining unconnected output ports: the model port of the Learner and the bottom port of the Partitioning node.

  5. Remove rows with a missing prediction (for example with the Missing Value node).

  6. Add a Numeric Scorer node to the Regression Predictor output (after the rows from step 5 have been removed).

    • Reference column: the column you trained on (SalePrice).

    • Predicted column: the new column created by the predictor node.
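As a rough, non-authoritative sketch of what steps 3 and 6 measure, the helpers below compute standard regression metrics of the kind the Numeric Scorer reports (R², mean absolute error, mean squared error, root mean squared error) and a Pearson correlation coefficient as a numeric stand-in for judging correlation from the scatter plot. The function names and the sample numbers are hypothetical and not taken from the dataset.

```python
import math

def numeric_scores(reference, predicted):
    """Standard regression metrics for paired reference/predicted values."""
    n = len(reference)
    residuals = [y - p for y, p in zip(reference, predicted)]
    mean_y = sum(reference) / n
    ss_res = sum(r * r for r in residuals)                # residual sum of squares
    ss_tot = sum((y - mean_y) ** 2 for y in reference)    # total sum of squares
    mse = ss_res / n
    return {
        "R^2": 1 - ss_res / ss_tot,
        "mean absolute error": sum(abs(r) for r in residuals) / n,
        "mean squared error": mse,
        "root mean squared error": math.sqrt(mse),
    }

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical prices for illustration only.
reference = [200000, 150000, 310000, 125000]
predicted = [190000, 160000, 300000, 130000]
scores = numeric_scores(reference, predicted)
# scores["mean absolute error"] == 8750.0, scores["mean squared error"] == 81250000.0
```

Computing pearson between each feature column and the price column, then picking the largest absolute value, gives a numeric counterpart to inspecting the scatter plot in step 3.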


[Workflow image: nodes Linear Regression Learner, Color Manager, Regression Predictor, Scatter Plot, Numeric Scorer, Missing Value, CSV Reader ("Read AmesHousing_simple.csv"), Table Partitioner]
