P2.1.1 - Linear Regression - Instructions

Regression Models
Practical 2.1.1 - Linear Regression

Learning objective: In this exercise you'll learn how to predict the price of a house in Ames (Iowa, USA) from a number of features, such as size, neighborhood, and heating.


Workflow description: This workflow uses a dataset that describes the sale of individual residential properties in Ames, Iowa from 2006 to 2010. One of the columns is the overall condition ranking, with values between 1 and 10.


You'll find the instructions for the exercises in the yellow annotations.

Step 1. Exploratory analysis
Use the 'Statistics' and 'Statistics View' nodes to explore the data. What are your main observations? Decide how to treat missing values, then handle them with the 'Missing Value' and 'Missing Value (Apply)' nodes.
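As a rough illustration of what Step 1 asks for, the sketch below computes per-column summary statistics and then fills missing cells. It is plain Python, not KNIME's implementation. The column names (`living_area`, `overall_cond`, `price`) and all values are hypothetical toy data, and mean imputation is just one possible strategy. The split into `fit_imputer` and `apply_imputer` mirrors the idea of learning replacement values once ('Missing Value') and reusing them on other data ('Missing Value (Apply)').

```python
import statistics

# Hypothetical toy rows standing in for the housing table; None marks a missing cell.
rows = [
    {"living_area": 100.0, "overall_cond": 5, "price": 210.0},
    {"living_area": 200.0, "overall_cond": 7, "price": 410.0},
    {"living_area": None, "overall_cond": 6, "price": 300.0},
    {"living_area": 300.0, "overall_cond": None, "price": 610.0},
]


def summarize(rows, column):
    """Return count, missing count, min, max and mean of one numeric column."""
    present = [r[column] for r in rows if r[column] is not None]
    return {
        "count": len(present),
        "missing": len(rows) - len(present),
        "min": min(present),
        "max": max(present),
        "mean": statistics.mean(present),
    }


def fit_imputer(rows, columns):
    """Learn one replacement value per column (here: the column mean)."""
    return {c: summarize(rows, c)["mean"] for c in columns}


def apply_imputer(rows, fill_values):
    """Return copies of the rows with missing cells replaced by the learned values."""
    return [
        {c: (fill_values[c] if v is None and c in fill_values else v)
         for c, v in r.items()}
        for r in rows
    ]


stats = summarize(rows, "living_area")
fill_values = fit_imputer(rows, ["living_area", "overall_cond"])
clean_rows = apply_imputer(rows, fill_values)
```

Learning the fill values on one table and applying them to another keeps the same replacement values in use everywhere, which matters once the data is split into training and test sets.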
Step 2. Partitioning
Add a Partitioning node to the CSV Reader output port:

  • The top port should receive 70% of the rows

  • Draw these rows randomly

  • Delete records with missing values first
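The partitioning step can be sketched in plain Python as follows. This is an illustrative approximation rather than what the Partitioning node does internally. The helper names, the `seed` parameter, and the toy rows are assumptions made for the example.

```python
import random


def drop_missing(rows):
    """Keep only rows in which every cell has a value."""
    return [r for r in rows if all(v is not None for v in r.values())]


def partition(rows, train_fraction=0.7, seed=42):
    """Shuffle a copy of the rows and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split repeatable
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical toy rows: ten complete records plus one with a missing price.
rows = [{"id": i, "price": 1000.0 * i} for i in range(1, 11)]
rows.append({"id": 11, "price": None})

train, test = partition(drop_missing(rows))
```

Fixing the seed is optional, but it makes the 70/30 split reproducible, so later results can be compared between runs.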
Data Preparation

Step 4. Regression Predictor
Add a Regression Predictor node and predict the test set (the remaining 30% of the rows) by connecting the remaining unconnected output ports to its inputs.
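Conceptually, the predictor applies an already fitted linear model to the held-out rows and appends the result as a new column. The sketch below shows that idea in plain Python. The coefficients, the intercept, the column names, and the output column name `Prediction (price)` are all assumptions for illustration, not values from the exercise.

```python
def predict(rows, coefficients, intercept, target="price"):
    """Append a prediction column computed as intercept + sum(weight * feature)."""
    scored = []
    for r in rows:
        y_hat = intercept + sum(w * r[c] for c, w in coefficients.items())
        scored.append({**r, f"Prediction ({target})": y_hat})
    return scored


# Hypothetical fitted model (in the workflow this would come from the learner).
coefficients = {"living_area": 2.0, "overall_cond": 5.0}
intercept = 10.0

# Hypothetical test rows (the 30% partition).
test_rows = [
    {"living_area": 90.0, "overall_cond": 4.0, "price": 215.0},
    {"living_area": 130.0, "overall_cond": 6.0, "price": 290.0},
]

scored = predict(test_rows, coefficients, intercept)
```

The original `price` column is kept next to the prediction so that the two can be compared in the evaluation step.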
Step 5. Model evaluation

Add a Numeric Scorer node to the Regression Predictor output port:

  • Reference Column: the target column the model learned (the price)

  • Predicted Column: the new column created by the predictor node
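To make the evaluation concrete, here is a minimal sketch of common regression error measures computed from a reference column and a predicted column. It uses the textbook definitions of R², mean absolute error, mean squared error, and root mean squared error. The numbers are hypothetical, and the output is not claimed to match the Numeric Scorer's in every detail.

```python
import math


def score(actual, predicted):
    """Compare reference values with predictions using standard error measures."""
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)                    # residual sum of squares
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)   # total sum of squares
    mse = ss_res / n
    return {
        "r2": 1 - ss_res / ss_tot,
        "mae": sum(abs(e) for e in errors) / n,
        "mse": mse,
        "rmse": math.sqrt(mse),
    }


# Hypothetical reference and predicted prices.
actual = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]

metrics = score(actual, predicted)
```

An R² close to 1 means the predictions explain most of the variation in price, while MAE and RMSE express the typical error in the same unit as the price itself.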


Step 3. Linear Regression Learner
Add a Linear Regression Learner node to the top output port of the Partitioning node:

  • Select the price column as the column to be learned

  • Execute the node

Which column is most correlated with the price (column selection tab)?
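The question about correlation and the idea of fitting a line can be illustrated with a small sketch. It ranks candidate columns by the absolute Pearson correlation with the price, then fits an ordinary least-squares line on the single best column. This is a simplification: the learner in the workflow can use several columns at once, whereas the sketch uses one predictor only. Column names and values are hypothetical toy data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx


# Hypothetical training columns; price is constructed as 2 * living_area + 10.
train = {
    "living_area": [50.0, 80.0, 100.0, 120.0, 150.0],
    "overall_cond": [5.0, 3.0, 7.0, 4.0, 6.0],
    "price": [110.0, 170.0, 210.0, 250.0, 310.0],
}

features = [c for c in train if c != "price"]
correlations = {c: pearson(train[c], train["price"]) for c in features}
best = max(features, key=lambda c: abs(correlations[c]))
slope, intercept = fit_line(train[best], train["price"])
```

Ranking by the absolute value matters because a strongly negative correlation is just as informative for a linear model as a strongly positive one.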
Workflow nodes: CSV Reader (Housing dataset), Statistics, Statistics View, Missing Value, Missing Value (Apply), Table Partitioner (70% training, 30% testing), Linear Correlation, Linear Regression Learner, Regression Predictor, Numeric Scorer
