08 Regression Model - Solution

Solution to an exercise for training a model for numeric prediction.

Train and apply a linear regression model. Evaluate the performance with numeric scoring metrics.

CHECK YOUR ANSWERS:
a. The model explains about 20% of the variance of the weekly working hours (R-squared)
b. The mean absolute error of the model is about 8 hours
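
As a rough illustration of what these two numbers mean, here is a minimal Python sketch of the textbook definitions of R-squared and mean absolute error. It assumes the Numeric Scorer node reports the standard forms (R^2 = 1 - SS_res / SS_tot). The function names and the four sample values are made up for illustration and do not come from adult_joined.table.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """Share of the variance of `actual` explained by `predicted`."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_actual = sum(actual) / len(actual)
    # Residual sum of squares: the error left over after prediction.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total sum of squares: the spread of the actual values around their mean.
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical weekly working hours (not taken from the exercise data).
actual_hours = [40, 50, 30, 45]
predicted_hours = [42, 44, 36, 43]

mae = mean_absolute_error(actual_hours, predicted_hours)
r2 = r_squared(actual_hours, predicted_hours)
print(f"MAE: {mae:.2f} hours, R^2: {r2:.3f}")
```

Read this way, an R-squared of about 0.2 means the predictions remove roughly a fifth of the squared spread around the mean, and a mean absolute error of about 8 means a typical prediction is off by about 8 hours per week.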



Exercise: Linear Regression and Numeric Scoring Metrics
1) Read the adult_joined.table file by executing the Table Reader and Missing Value nodes.
2) Partition the data into a training set (75 %) and test set (25 %). Draw randomly.
3) Train a linear regression model on the training set to predict the weekly working hours. Use all other columns but the "ID" column for the prediction.
4) Apply the model to the test set.
5) Evaluate the performance of the linear regression model with the Numeric Scorer node. Which proportion of the variance of the weekly working hours does the model explain? How many hours is the mean absolute error of the model?

The proportion of the variance explained is represented by the R^2 metric, here about 20 %. The mean absolute error metric reports the average error in hours, here about 8 hours.

NOTE: due to random partitioning, these values might change slightly at every execution.

Workflow nodes and annotations:
Table Reader, Missing Value: Read data adult_joined.table
Partitioning: Random sampling. Top: train set (75%). Bottom: test set (25%)
Linear Regression Learner: Train the model to predict hours per week
Regression Predictor: Apply the model to the test set
Numeric Scorer: Evaluate model performance
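
The random 75/25 split from step 2 can be sketched in plain Python as follows. This is only an illustration of the idea, not how the Partitioning node is implemented. The `partition` function, its parameters, and the integer row IDs are hypothetical.

```python
import random


def partition(rows, train_fraction=0.75, seed=None):
    """Shuffle the rows and split them into (train, test) lists."""
    rng = random.Random(seed)  # a fixed seed makes the split repeatable
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical row IDs standing in for the rows of the input table.
row_ids = list(range(100))
train_rows, test_rows = partition(row_ids, train_fraction=0.75, seed=1)
print(len(train_rows), len(test_rows))
```

Without a fixed seed, each run draws a different training set, which is why the reported R^2 and mean absolute error can shift slightly between executions.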
