Logistic Regression Learner

This node is deprecated: this version of the node has been replaced with a new and improved version. The old version is kept for backward compatibility, but for all new workflows we suggest using the version linked below.
Suggested replacement: Logistic Regression Learner

Performs a multinomial logistic regression. In the dialog, select the target column (combo box at the top), i.e. the response. The two lists in the center of the dialog let you include only certain columns, which represent the (independent) variables; make sure the columns you want to include are in the "include" list on the right. See the Wikipedia article on logistic regression for an overview of the topic. This implementation computes the model with an iterative optimization procedure called Fisher scoring.
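The Fisher scoring procedure mentioned above can be sketched for the binary case (two target categories) with NumPy. This is a minimal illustration of the general technique under that simplifying assumption, not the node's implementation; the function name `fisher_scoring_logit` and its `max_iter`/`tol` parameters are hypothetical. A multinomial version follows the same scheme with one coefficient vector per non-reference target category. For the logit link, Fisher scoring coincides with Newton-Raphson, because the expected and observed information matrices are equal.

```python
import numpy as np

def fisher_scoring_logit(X, y, max_iter=25, tol=1e-8):
    """Fit a binary logistic regression by Fisher scoring (hypothetical helper).

    X: (n, p) matrix of learning columns; an intercept column is prepended.
    y: (n,) array of 0/1 target values.
    Returns the coefficient vector [intercept, b1, ..., bp].
    """
    X = np.column_stack([np.ones(len(X)), np.asarray(X, dtype=float)])
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])                # start from all-zero coefficients
    for _ in range(max_iter):
        mu = 1.0 / (1.0 + np.exp(-(X @ beta)))  # predicted probabilities
        w = mu * (1.0 - mu)                     # Bernoulli variances (weights)
        score = X.T @ (y - mu)                  # gradient of the log-likelihood
        info = X.T @ (X * w[:, None])           # Fisher information X^T W X
        step = np.linalg.solve(info, score)     # scoring update
        beta = beta + step
        if np.max(np.abs(step)) < tol:          # stop once updates are negligible
            break
    return beta

# Example with overlapping classes, so a finite estimate exists.
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([0, 0, 1, 0, 1, 1])
print(fisher_scoring_logit(X, y))
```

If the information matrix is singular (for example because of a constant or perfectly correlated column), `np.linalg.solve` raises an error, which mirrors the data requirements described in the error-handling section.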
If the optional PMML inport is connected and contains preprocessing operations in the TransformationDictionary, those operations are added to the learned model.

Potential Errors and Error Handling

Computing the model is an iterative optimization process that requires the data set to have certain properties: a reasonable distribution of the target values, and non-constant, uncorrelated columns. While some of these properties are checked during node execution, errors can still occur during the computation. The list below describes what might go wrong and how to avoid such situations.
  • Insufficient Information: This is the case when the data does not provide enough information about one or more target categories. Try to get more data, or remove the rows for target categories that may cause the error. If you are only interested in a model for one target category, group the target column beforehand. For instance, if your target categories are "A", "B", ..., "Z" but you only need a model for class "A", you can use a Rule Engine node to convert the target into "A" and "not A".
  • Violation of Independence: Logistic regression assumes that the learning columns are not strongly correlated with each other. A common preprocessing step is to remove highly correlated learning columns: use a "Linear Correlation" node along with a "Correlation Filter" node to remove redundant columns. Computing the correlation model on a subset of the data is often sufficient.
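  • Separation: Separation occurs when a learning column (or a combination of columns) perfectly predicts the target categories; the likelihood then keeps increasing as the coefficients grow, so no finite estimate exists. Please see this article about separation for more information.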
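The target grouping suggested under "Insufficient Information" amounts to a simple relabeling. In a workflow this is done with a Rule Engine node; the Python function below, `group_target`, is only a hypothetical illustration of the same mapping.

```python
def group_target(labels, keep="A"):
    """Collapse a multi-class target into `keep` vs. "not <keep>" (hypothetical helper)."""
    other = f"not {keep}"
    return [label if label == keep else other for label in labels]

print(group_target(["A", "B", "Z", "A"]))
# ['A', 'not A', 'not A', 'A']
```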
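The idea behind removing highly correlated learning columns can be sketched as a greedy filter over the Pearson correlation matrix. This is an assumption-laden illustration, not the algorithm used by the Correlation Filter node: the greedy keep-first strategy, the function name `correlation_filter`, and the default `threshold=0.9` are all hypothetical choices.

```python
import numpy as np

def correlation_filter(data, names, threshold=0.9):
    """Greedy sketch: keep a column only if its absolute Pearson correlation
    with every already-kept column is below `threshold`.

    data: (n_rows, n_cols) array; constant columns should be removed first,
    because their correlation is undefined (NaN).
    """
    data = np.asarray(data, dtype=float)
    corr = np.corrcoef(data, rowvar=False)  # columns are the variables
    kept = []
    for j in range(data.shape[1]):
        if all(abs(corr[j, k]) < threshold for k in kept):
            kept.append(j)
    return [names[j] for j in kept]

x1 = [1, 2, 3, 4, 5]
x2 = [3, 5, 7, 9, 11]  # exactly 2 * x1 + 1, so perfectly correlated with x1
x3 = [2, 1, 4, 3, 5]   # correlation with x1 is 0.8
data = np.column_stack([x1, x2, x3])
print(correlation_filter(data, ["x1", "x2", "x3"]))
# ['x1', 'x3']
```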
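Why separation prevents a finite estimate can be checked numerically. The sketch below assumes, for simplicity, a model with a single learning column and no intercept, p = sigmoid(b * x); the helper `log_likelihood` and the data are hypothetical. On completely separated data the log-likelihood increases strictly with the slope `b` and only approaches 0, so an iterative optimizer keeps enlarging the coefficient instead of converging.

```python
import math

def log_likelihood(b, xs, ys):
    """Log-likelihood of a no-intercept logistic model p = sigmoid(b * x)."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = 1.0 / (1.0 + math.exp(-b * x))
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total

# Complete separation: every x < 0 has y = 0 and every x > 0 has y = 1.
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]
lls = [log_likelihood(b, xs, ys) for b in (1, 2, 4, 8)]
print(lls)  # strictly increasing, approaching 0 from below
```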

Options

Target
Select the target column. Only columns with nominal data are allowed. The reference category is empty if the domain of the target column is not available. In this case, the node determines the domain values right before computing the logistic regression model and chooses the last domain value as the target's reference category.
By default, the target domain values are sorted lexicographically in the output, but you can preserve the order of the target column's domain by checking the box.
Note that if a target reference category is selected in the dropdown, the checkbox has no influence on the model coefficients; only the output representation (e.g. the order of rows in the coefficient table) may vary.
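In multinomial logistic regression with a reference category, the standard formulation (baseline-category logits) gives each non-reference category k its own linear predictor eta_k, while the reference category's predictor is fixed at 0. The sketch below illustrates this, together with picking the last domain value as reference. Two assumptions are ours, not confirmed by this page: that "last domain value" refers to the lexicographically sorted order unless the domain order is preserved, and that the node's coefficients follow exactly this parameterization. The function names are hypothetical.

```python
import math

def choose_reference(domain, preserve_order=False):
    """Pick the last domain value as reference (assumes lexicographic sort by default)."""
    values = list(domain) if preserve_order else sorted(domain)
    return values[-1]

def class_probabilities(etas, reference):
    """Baseline-category logit: etas maps each non-reference category k to
    eta_k = x . beta_k; the reference category's linear predictor is fixed at 0."""
    scores = dict(etas)
    scores[reference] = 0.0
    denom = sum(math.exp(v) for v in scores.values())
    return {k: math.exp(v) / denom for k, v in scores.items()}

ref = choose_reference(["b", "c", "a"])  # "c"
print(class_probabilities({"a": 0.0, "b": math.log(2.0)}, ref))
# approximately {'a': 0.25, 'b': 0.5, 'c': 0.25}
```

Changing which category serves as reference changes the coefficients (they are always relative to the reference) but not the predicted probabilities.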
Values
Specify the independent columns that should be included in the regression model. Numeric and nominal data can be included.
By default, the domain values (categories) of nominal columns are sorted lexicographically, but you can check the box to use the order from the column domain instead. Please note that the first category is used as the reference when creating the dummy variables.
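Dummy (reference) coding of a nominal column, with the first category as reference, can be sketched as follows. The function `dummy_encode` is a hypothetical illustration: each category except the reference gets a 0/1 column, and rows of the reference category are all zeros. Passing `categories` explicitly stands in for using the column domain order instead of the sorted order.

```python
def dummy_encode(values, categories=None):
    """Reference coding: one 0/1 column per category except the first.

    categories: optional explicit order (e.g. the column domain);
    defaults to the lexicographically sorted distinct values.
    """
    cats = sorted(set(values)) if categories is None else list(categories)
    reference, others = cats[0], cats[1:]
    rows = [[1 if v == c else 0 for c in others] for v in values]
    return reference, others, rows

ref, cols, rows = dummy_encode(["red", "green", "blue", "green"])
print(ref, cols, rows)
# blue ['green', 'red'] [[0, 1], [1, 0], [0, 0], [1, 0]]
```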

Input Ports

Table on which to perform the regression. The input must not contain missing values; fix them beforehand, e.g. with the Missing Values node.
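One possible way to fix missing values in a numeric column is mean imputation. The sketch below is a hypothetical helper, `mean_impute`, that represents missing cells as `None`; it illustrates the idea only and is not how the Missing Values node is implemented. Whether mean imputation is appropriate depends on the data.

```python
def mean_impute(column):
    """Replace None entries with the mean of the observed values (hypothetical helper)."""
    observed = [v for v in column if v is not None]
    if not observed:
        raise ValueError("column has no observed values")
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]

print(mean_impute([1.0, None, 3.0]))
# [1.0, 2.0, 3.0]
```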

Output Ports

Model to connect to a predictor node.
Coefficients and statistics of the logistic regression model.


Views

Logistic Regression Result View
Displays the estimated coefficients and error statistics. Note that an estimated coefficient is not reliable when its standard error is high relative to the coefficient itself.
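A common way to judge whether a coefficient is reliable relative to its standard error is the Wald test: z = coefficient / standard error, with a two-sided p-value from the standard normal distribution. The helper `wald_test` below is a hypothetical sketch of that calculation; it is not taken from this node, and it assumes the coefficient and standard error are read from the coefficient table.

```python
import math

def wald_test(coef, std_err):
    """Wald z statistic and two-sided p-value for a single coefficient."""
    z = coef / std_err
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p

print(wald_test(1.0, 0.5))  # z = 2.0, p is about 0.0455
print(wald_test(1.0, 2.0))  # z = 0.5, p is about 0.617: same coefficient, unreliable
```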
