Linear Regression Learner

This Node Is Deprecated: this version of the node has been replaced by a new and improved version. The old version is kept for backwards compatibility, but for all new workflows we suggest using the version linked below.
Go to Suggested Replacement: Linear Regression Learner

Performs a multivariate linear regression. In the dialog, select a target column (the combo box at the top), i.e. the response. The two lists in the center of the dialog let you include only certain columns as the (independent) variables. Make sure the columns you want to include are in the right-hand "include" list. See the Wikipedia article on linear regression for an overview of the topic.
If the optional PMML input port is connected and contains preprocessing operations in the TransformationDictionary, those operations are added to the learned model.
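
As a rough illustration of what fitting a linear regression with a constant term involves, here is a minimal sketch in Python using NumPy. It is not the node's implementation; the function name `fit_linear_regression` and the toy data are assumptions made for this example.

```python
import numpy as np

def fit_linear_regression(X, y):
    """Ordinary least squares with an intercept; returns (intercept, coefficients)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first fitted weight is the constant term.
    design = np.column_stack([np.ones(len(X)), X])
    weights, *_ = np.linalg.lstsq(design, y, rcond=None)
    return weights[0], weights[1:]

# Toy data generated exactly from y = 1 + 2*x1 - 3*x2 (hypothetical values).
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1]

intercept, coefs = fit_linear_regression(X, y)
print(intercept, coefs)
```

Because the toy response is an exact linear function of the two inputs, the fit recovers the generating intercept and coefficients up to floating-point error.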


Selects the target column. Only columns with numeric data are allowed.
Specifies the independent columns that should be included in the regression model. Numeric and nominal data can be included; for nominal data, dummy variables are created automatically, as described in the section "Categorical variables in regression".
Predefined Offset Value
By default, the regression model includes a constant term. If this option is selected, the given value is used as the constant term instead. The value works like a user-defined intercept.
Missing Values in Input Data
Define whether missing values in the input are ignored or whether the node execution should fail on missing values.
Scatter Plot View
Specify the rows that shall be available as data points in the scatter plot view.
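
The options above can be sketched in Python with NumPy. This is an illustration under stated assumptions, not the node's code: nominal values are encoded with one 0/1 column per category except a reference category (here simply the first in sorted order), "ignored" is taken to mean that rows with a missing value are skipped, and a predefined offset is handled by subtracting it from the response and fitting without a constant column. All function names and data are hypothetical.

```python
import numpy as np

def dummy_encode(values):
    """One 0/1 column per category, leaving out the first (reference) category."""
    categories = sorted(set(values))
    kept = categories[1:]  # assumption: first category in sorted order is the reference
    matrix = np.array([[float(v == c) for c in kept] for v in values])
    return matrix, kept

def handle_missing(X, y, fail_on_missing):
    """Either raise on missing values (NaN) or drop the affected rows."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    missing = np.isnan(X).any(axis=1) | np.isnan(y)
    if fail_on_missing and missing.any():
        raise ValueError("input contains missing values")
    return X[~missing], y[~missing]

def fit_with_fixed_offset(X, y, offset):
    """Least squares where the constant term is fixed to `offset`, not estimated."""
    coefs, *_ = np.linalg.lstsq(X, y - offset, rcond=None)
    return coefs

# Hypothetical data following y = 5 + 2*size + 1*[green] + 4*[red];
# the last row has a missing size and is skipped.
size = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0, float("nan")]
color = ["blue", "blue", "green", "green", "red", "red", "blue"]
y = [7.0, 9.0, 12.0, 8.0, 13.0, 15.0, 10.0]

dummies, dummy_names = dummy_encode(color)
X = np.column_stack([size, dummies])
X_clean, y_clean = handle_missing(X, y, fail_on_missing=False)
coefs = fit_with_fixed_offset(X_clean, y_clean, offset=5.0)
print(dummy_names, coefs)
```

With the offset fixed at 5, the remaining coefficients are estimated for `size` and for the `green` and `red` dummy columns; `blue` acts as the reference category and gets no column of its own.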

Input Ports

Table on which to perform regression.
Optional PMML port object containing preprocessing operations.

Output Ports

Model to connect to a predictor node.
Coefficients and statistics of the linear regression model.
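
The page does not list which statistics the second output contains. As one plausible reading, the sketch below computes the standard errors and t-values that usually accompany least-squares coefficients. The function name `coefficient_statistics` and the data are assumptions for illustration only.

```python
import numpy as np

def coefficient_statistics(X, y):
    """Return (coefficients, standard errors, t-values), intercept first."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])
    n, p = design.shape
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    # Unbiased estimate of the residual variance with n - p degrees of freedom.
    sigma2 = residuals @ residuals / (n - p)
    covariance = sigma2 * np.linalg.inv(design.T @ design)
    std_err = np.sqrt(np.diag(covariance))
    return beta, std_err, beta / std_err

# Hypothetical one-variable data set.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 2.0, 4.0])

beta, std_err, t_values = coefficient_statistics(X, y)
print(beta, std_err, t_values)
```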


Linear Regression Result View
Displays the estimated coefficients and error statistics.
Linear Regression Scatterplot View
Displays the input data along with the regression line in a scatterplot. The y-coordinate is fixed to the response column (the column that has been approximated) while the x-column can be chosen among the independent variables with numerical values. Note: If you have multiple input variables, this view is only an approximation. It will fix the value of each variable that is not shown in the view to its mean. Thus, this view generally only makes sense if you only have a few input variables.
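
The approximation described in the note can be sketched as follows: every variable that is not on the x-axis is replaced by its mean, and its contribution is folded into the intercept of the drawn line. This is a minimal Python sketch; the function name `regression_line_for_view` and the numbers are hypothetical.

```python
def regression_line_for_view(intercept, coefs, rows, shown):
    """Return (slope, effective intercept) of the line plotted against column `shown`."""
    n = len(rows)
    # Mean of every input column over the plotted rows.
    means = [sum(row[j] for row in rows) / n for j in range(len(coefs))]
    # Variables not shown are fixed to their mean and absorbed into the intercept.
    effective_intercept = intercept + sum(
        c * m for j, (c, m) in enumerate(zip(coefs, means)) if j != shown
    )
    return coefs[shown], effective_intercept

# Hypothetical model y = 1 + 2*x1 - 3*x2 and three input rows.
rows = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0]]
slope, effective_intercept = regression_line_for_view(1.0, [2.0, -3.0], rows, shown=0)
print(slope, effective_intercept)
```

Here the mean of the second variable is 2, so the line drawn against the first variable has slope 2 and intercept 1 - 3 * 2 = -5. Data points whose second variable is far from its mean can lie well off this line even when the model fits them closely.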

