
BH

Boston Housing Excel file:

506 rows, 14 columns

Target column: MEDV

I. Standard regression: one independent variable (predictor), LSTAT, and the dependent variable (target), MEDV. The entire dataset is used as the training set.
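The fit in Section I is ordinary least squares with a single predictor. The pure-Python sketch below illustrates that computation. It is not the node's own code, and the function names `fit_simple_ols` and `r_squared` are hypothetical. It computes the slope and intercept in closed form, then scores the model on the same rows it was trained on, as Section I does. The numbers are made-up stand-ins, not values from the Boston file.

```python
def fit_simple_ols(x, y):
    """Ordinary least squares fit of y = w0 + w1*x; returns (w0, w1)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope = covariance(x, y) / variance(x); the intercept puts the line through the means.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    w1 = sxy / sxx
    w0 = mean_y - w1 * mean_x
    return w0, w1


def r_squared(y, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, y_pred))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Made-up stand-ins for LSTAT and MEDV, NOT values from the Boston file.
    lstat = [5.0, 10.0, 15.0, 20.0, 25.0]
    medv = [30.0, 25.0, 21.0, 17.0, 12.0]
    w0, w1 = fit_simple_ols(lstat, medv)
    # As in Section I, the model is scored on the rows it was trained on.
    fitted = [w0 + w1 * v for v in lstat]
    print(f"MEDV = {w0:.2f} + ({w1:.2f}) * LSTAT, R^2 = {r_squared(medv, fitted):.3f}")
```

For these toy values the fitted line is MEDV ≈ 34.2 - 0.88 * LSTAT.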

Description:

The Boston Housing Dataset

The Boston Housing Dataset is derived from information collected by the U.S. Census Service concerning housing in the area of Boston, MA. The dataset has the following columns:

  • CRIM - per capita crime rate by town

  • ZN - proportion of residential land zoned for lots over 25,000 sq.ft.

  • INDUS - proportion of non-retail business acres per town.

  • CHAS - Charles River dummy variable (1 if tract bounds river; 0 otherwise)

  • NOX - nitric oxides concentration (parts per 10 million)

  • RM - average number of rooms per dwelling

  • AGE - proportion of owner-occupied units built prior to 1940

  • DIS - weighted distances to five Boston employment centres

  • RAD - index of accessibility to radial highways

  • TAX - full-value property-tax rate per $10,000

  • PTRATIO - pupil-teacher ratio by town

  • B - 1000(Bk - 0.63)^2, where Bk is the proportion of Black residents by town

  • LSTAT - % lower status of the population

  • MEDV - Median value of owner-occupied homes in $1000's
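The B column is a quadratic function of Bk. The small sketch below uses a hypothetical helper, `b_from_bk`, that evaluates the formula exactly as given in the column list. It shows that B equals 0 at Bk = 0.63 and rises on both sides:

```python
def b_from_bk(bk):
    """B column as defined in the dataset description: B = 1000 * (Bk - 0.63)**2."""
    if not 0.0 <= bk <= 1.0:
        raise ValueError("Bk is a proportion and must lie in [0, 1]")
    return 1000.0 * (bk - 0.63) ** 2


if __name__ == "__main__":
    # The parabola has its minimum (B = 0) at Bk = 0.63 and rises on both sides.
    for bk in (0.0, 0.3, 0.63, 0.96, 1.0):
        print(f"Bk = {bk:.2f} -> B = {b_from_bk(bk):.1f}")
```

For example, Bk = 0.30 and Bk = 0.96 both give B ≈ 108.9. A given value of B therefore does not always identify Bk uniquely.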

II. Same as above, but this time the data are split into a training set (70%) and a test set (30%). The metanode contains a Table Partitioner node as well as the Linear Regression Learner and Regression Predictor nodes.

To view the contents of the metanode, right-click it, choose Metanode, and then choose Open Metanode (or press Ctrl+Alt+Enter). To return to the workflow view, click BH on the top bar.
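The plain-Python sketch below illustrates the idea behind Section II. It shuffles the rows with a fixed seed, keeps 70% for fitting, and holds out 30% for scoring. The helper names, the seed, and the rounding of the cut point are all choices made for this sketch. The Table Partitioner node has its own sampling options, which may differ. The data are synthetic, with 506 rows to match the file, and are not the Boston values.

```python
import math
import random


def split_rows(rows, train_fraction=0.7, seed=1):
    """Shuffle rows reproducibly and split them into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def fit_simple_ols(pairs):
    """Least-squares fit of y = w0 + w1*x on (x, y) pairs; returns (w0, w1)."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    w1 = sxy / sxx
    return my - w1 * mx, w1


def rmse(pairs, w0, w1):
    """Root mean squared error of the line w0 + w1*x on (x, y) pairs."""
    return math.sqrt(sum((y - (w0 + w1 * x)) ** 2 for x, y in pairs) / len(pairs))


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic data with 506 rows like the Boston file, NOT the real values.
    xs = [rng.uniform(2.0, 38.0) for _ in range(506)]
    data = [(x, 34.0 - 0.9 * x + rng.gauss(0.0, 2.0)) for x in xs]
    train, test = split_rows(data)
    w0, w1 = fit_simple_ols(train)  # learn only from the 70% partition
    print(len(train), len(test))  # 354 152
    print(f"train RMSE {rmse(train, w0, w1):.2f}, test RMSE {rmse(test, w0, w1):.2f}")
```

The split exists to produce the test RMSE. That figure estimates performance on rows the model has not seen, while the training RMSE does not.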

III. Multivariable linear regression

All 13 independent variables are used as predictors.
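With all 13 columns, the model becomes y = w0 + w1*x1 + ... + w13*x13. The minimal sketch below makes no assumption about how the Linear Regression Learner solves this internally. It fits the weights by solving the normal equations (X^T X) w = X^T y with Gauss-Jordan elimination. `fit_multiple_ols` and `predict` are hypothetical names, and the toy example uses three predictors to stay readable.

```python
def fit_multiple_ols(rows, y):
    """Fit y = w0 + w1*x1 + ... + wk*xk by solving the normal equations.

    rows is a list of predictor tuples (13 values per row for Section III);
    returns [w0, w1, ..., wk].
    """
    X = [[1.0] + [float(v) for v in r] for r in rows]  # prepend intercept column
    p = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [
        [sum(row[a] * row[b] for row in X) for b in range(p)]
        + [sum(row[a] * t for row, t in zip(X, y))]
        for a in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda i: abs(A[i][col]))
        A[col], A[pivot] = A[pivot], A[col]
        if abs(A[col][col]) < 1e-12:
            raise ValueError("X^T X is singular: predictors are collinear")
        for i in range(p):
            if i != col:
                factor = A[i][col] / A[col][col]
                A[i] = [a - factor * c for a, c in zip(A[i], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


def predict(w, row):
    """Apply fitted weights [w0, w1, ...] to one row of predictors."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], row))


if __name__ == "__main__":
    # Toy rows with 3 predictors (a real run would use all 13 Boston columns).
    rows = [(1, 2, 3), (2, 1, 0), (3, 5, 1), (0, 1, 4), (4, 0, 2), (2, 2, 2)]
    y = [1 + 2 * a - 3 * b + 0.5 * c for a, b, c in rows]
    w = fit_multiple_ols(rows, y)
    print([round(v, 6) for v in w])  # [1.0, 2.0, -3.0, 0.5]
```

The normal equations are the simplest route, but they can be numerically fragile when columns differ greatly in scale (compare TAX with NOX). Library solvers usually rely on QR or SVD decompositions instead.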

IV. Polynomial regression

Compare the Adjusted R^2 obtained for different values of Polynomial degree (set in the Polynomial Regression Learner node). For each degree, set Number of Predictors in the Numeric Scorer node to the corresponding value.
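Adjusted R^2 penalizes extra terms. Its usual definition is adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - p - 1), where n is the number of rows and p is the number of predictors, excluding the intercept. This is why the number of predictors has to follow the polynomial degree.

The sketch below assumes a single input column, so degree d contributes the d terms x, x^2, ..., x^d. If several columns are expanded, p grows accordingly. The sketch standardizes x before taking powers to keep the normal equations well conditioned. This is a choice of the sketch, and it does not change the fitted values. All helper names are hypothetical, and the data are synthetic.

```python
import random


def solve_normal_equations(X, y):
    """Least-squares weights for design matrix X (each row already starts with 1.0)."""
    p = len(X[0])
    A = [[sum(row[a] * row[b] for row in X) for b in range(p)]
         + [sum(row[a] * t for row, t in zip(X, y))] for a in range(p)]
    for col in range(p):  # Gauss-Jordan elimination with partial pivoting
        pivot = max(range(col, p), key=lambda i: abs(A[i][col]))
        A[col], A[pivot] = A[pivot], A[col]
        if abs(A[col][col]) < 1e-12:
            raise ValueError("design matrix is rank deficient")
        for i in range(p):
            if i != col:
                f = A[i][col] / A[col][col]
                A[i] = [a - f * c for a, c in zip(A[i], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


def poly_design(x, degree):
    """Rows [1, z, z^2, ..., z^degree], where z is x standardized (a choice of this sketch)."""
    mean = sum(x) / len(x)
    std = (sum((v - mean) ** 2 for v in x) / len(x)) ** 0.5
    return [[((v - mean) / std) ** k for k in range(degree + 1)] for v in x]


def r_squared(y, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_pred))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n, p):
    """Penalize R^2 for p predictors (intercept excluded) fitted on n rows."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic curved data, NOT the Boston values.
    x = [rng.uniform(2.0, 38.0) for _ in range(200)]
    y = [45.0 - 2.2 * v + 0.035 * v * v + rng.gauss(0.0, 3.0) for v in x]
    for degree in range(1, 5):
        X = poly_design(x, degree)
        w = solve_normal_equations(X, y)
        fitted = [sum(wk * xk for wk, xk in zip(w, row)) for row in X]
        r2 = r_squared(y, fitted)
        # One input column: degree d means d predictors besides the intercept.
        print(degree, round(r2, 4), round(adjusted_r_squared(r2, len(y), degree), 4))
```

On the training data, plain R^2 never decreases as the degree grows. Adjusted R^2 can drop once the extra terms stop paying for themselves.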

V. Nonlinear transformation

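The Expression node transforms MEDV into LN(MEDV), so the fitted model is ln(MEDV) = w0 + w1*LSTAT. On the original scale, this is the curve MEDV = exp(w0 + w1*LSTAT). MEDV falls as LSTAT rises, so w1 is expected to be negative. With a negative w1, the curve stays positive and levels off toward zero for large values of LSTAT, whereas a straight line would eventually predict negative house values.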
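The sketch below illustrates the log-scale fit, assuming plain least squares on ln(MEDV). The helper names are hypothetical, and the toy numbers are made up, not Boston data.

```python
import math


def fit_log_linear(x, y):
    """Fit ln(y) = w0 + w1*x by ordinary least squares on the log scale."""
    if any(v <= 0 for v in y):
        raise ValueError("ln(y) needs strictly positive target values")
    log_y = [math.log(v) for v in y]  # plays the role of LN(MEDV)
    n = len(x)
    mx = sum(x) / n
    my = sum(log_y) / n
    w1 = sum((a - mx) * (b - my) for a, b in zip(x, log_y)) / sum((a - mx) ** 2 for a in x)
    return my - w1 * mx, w1


def predict_original_scale(w0, w1, x):
    """Back-transform: y = exp(w0 + w1*x), which is always positive."""
    return math.exp(w0 + w1 * x)


if __name__ == "__main__":
    # Toy values shaped like a decaying curve, NOT Boston data.
    lstat = [2.0, 5.0, 10.0, 20.0, 30.0]
    medv = [45.0, 35.0, 25.0, 15.0, 10.0]
    w0, w1 = fit_log_linear(lstat, medv)
    for v in (10.0, 40.0, 60.0):
        # With w1 < 0, predictions decay toward 0 instead of turning negative.
        print(v, round(predict_original_scale(w0, w1, v), 2))
```

Fitting on the log scale minimizes squared errors of ln(MEDV). That is not the same objective as least squares on MEDV itself, so the back-transformed curve can differ somewhat from a direct nonlinear fit of exp(w0 + w1*LSTAT).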

Nodes used in the workflow: Excel Reader, Table Partitioner, Column Filter, Linear Regression Learner, Polynomial Regression Learner, Regression Predictor, Numeric Scorer, Regression Line Plotter, Expression, Math Formula, and a metanode labeled "my meta".
