
Linear_Regression_Solution

Linear Regression

Linear regression: predict house price.

- Partition data into training and test set
- Train a linear regression model
- Apply the trained model to the test set
- Handle missing values
- Evaluate the model performance with the Numeric Scorer node
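
To make the evaluation step concrete, the following is a minimal Python sketch of the kind of statistics a numeric scorer reports for a regression model (R², mean absolute error, mean squared error, root mean squared error, mean signed difference). The function name `numeric_scores`, the sample prices, and the sign convention for the mean signed difference (predicted minus actual) are assumptions made for illustration; this is not the Numeric Scorer node's own implementation.

```python
import math


def numeric_scores(actual, predicted):
    """Compute common regression error metrics for paired values.

    Hypothetical helper for illustration only.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")

    n = len(actual)
    # Assumed sign convention: error = predicted - actual.
    errors = [p - a for a, p in zip(actual, predicted)]

    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n

    # R^2 compares the model's squared error with that of always
    # predicting the mean of the reference column.
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1.0 - (mse * n) / ss_tot if ss_tot > 0 else float("nan")

    return {
        "r2": r2,
        "mae": mae,
        "mse": mse,
        "rmse": math.sqrt(mse),
        "mean_signed_difference": sum(errors) / n,
    }


# Made-up sale prices and predictions, purely to exercise the function.
scores = numeric_scores(
    [200000.0, 150000.0, 310000.0],
    [210000.0, 140000.0, 300000.0],
)
print(scores)
```

An R² close to 1 together with a small RMSE relative to typical house prices indicates that the predictions track the reference column well.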


Linear Regression: price prediction

Exercise: Linear Regression

In this exercise we will predict the price of a house in Ames (Iowa, USA) from a number of features (size, neighborhood, heating, ...) using linear regression.

1) Read the dataset AmesHousing_simple.csv. It contains information about houses sold in Ames (numerical values only) as well as the SalePrice.
2) Add a Partitioning node to the File Reader output
   - The top port should have 70% of the rows
   - Draw these rows randomly
3) Add a Linear Regression Learner to the top output port of the Partitioning node
   - Select the price column as the column to be learned
   - Execute the node and open its scatter plot view. Which column is most correlated with the price (column selection tab)?
4) Add a Regression Predictor
   - Predict the test set (the remaining 30% of the rows) by simply connecting the remaining unconnected output ports
5) Remove rows with a missing prediction
6) Add a Numeric Scorer to the Regression Predictor output
   - Reference Column: the column you learned
   - Predicted Column: the new column created by the predictor node

Nodes in the workflow: File Reader (AmesHousing_simple.csv dataset), Partitioning, Linear Regression Learner, Regression Predictor, Missing Value, Numeric Scorer
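
For readers who want to see the same steps outside the workflow editor, here is a rough Python sketch of steps 2 to 6 using NumPy. Several things are assumptions: the data is synthetic (the real AmesHousing_simple.csv is not read), the feature names `living_area` and `overall_quality` are hypothetical, and training rows with missing values are simply skipped. It illustrates the idea of the exercise and is not how the KNIME nodes are implemented.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic stand-in for the dataset (assumption: not the real Ames data).
n_rows = 200
living_area = rng.uniform(500, 3000, n_rows)                 # hypothetical feature
overall_quality = rng.integers(1, 11, n_rows).astype(float)  # hypothetical feature
sale_price = (50_000 + 80 * living_area + 12_000 * overall_quality
              + rng.normal(0, 10_000, n_rows))

# Blank out a few feature values to mimic missing cells.
living_area[rng.choice(n_rows, 10, replace=False)] = np.nan

X = np.column_stack([living_area, overall_quality])
y = sale_price

# Step 2: random 70/30 partition.
order = rng.permutation(n_rows)
n_train = int(0.7 * n_rows)
train_idx, test_idx = order[:n_train], order[n_train:]


def fit_linear_regression(features, target):
    """Ordinary least squares with an intercept, skipping incomplete rows."""
    complete = ~np.isnan(features).any(axis=1)
    design = np.column_stack([np.ones(complete.sum()), features[complete]])
    coef, *_ = np.linalg.lstsq(design, target[complete], rcond=None)
    return coef


def predict(coef, features):
    """Apply the model; rows with a missing feature yield NaN."""
    return coef[0] + features @ coef[1:]


# Steps 3 and 4: learn on the training partition, predict the test partition.
coef = fit_linear_regression(X[train_idx], y[train_idx])
predictions = predict(coef, X[test_idx])

# Step 5: remove rows with a missing prediction.
has_prediction = ~np.isnan(predictions)
y_test = y[test_idx][has_prediction]
y_pred = predictions[has_prediction]

# Step 6: score predicted column against the reference column.
residuals = y_test - y_pred
r2 = 1 - np.sum(residuals ** 2) / np.sum((y_test - y_test.mean()) ** 2)
rmse = np.sqrt(np.mean(residuals ** 2))
print(f"scored rows: {len(y_test)}, R^2: {r2:.3f}, RMSE: {rmse:.0f}")
```

Dropping the rows without a prediction before scoring mirrors step 5: error statistics cannot be computed when either value of a pair is missing.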
