Performs a multinomial logistic regression. In the dialog, select a
target column (combo box on top), i.e. the response.
The solver combo box allows you to select which solver should be used for the problem
(see below for details on the different solvers).
The lists in the center of the dialog allow you to include only certain
columns, which represent the (independent) variables.
Make sure the columns you want to include are in the right list.
See the Wikipedia article on logistic regression
for an overview of the topic.
Important Note on Normalization
The SAG solver works best with z-score normalized data.
That means that the columns are normalized to have zero mean and a standard deviation of one.
This can be achieved by using a normalizer node before learning.
If you have very sparse data (lots of zero values), this normalization will destroy the sparsity.
In this case it is recommended to only normalize the dense features to exploit the sparsity during
the calculations (SAG solver with lazy calculation).
Note, however, that normalization leads to different coefficients and different coefficient statistics (standard error, z-score, etc.).
Hence, if you want to use the learner for statistics (obtaining the mentioned statistics) rather than machine learning (obtaining a classifier),
you should carefully consider whether normalization makes sense for the task at hand.
If the node outputs missing values for the parameter statistics, this is very likely caused by insufficient normalization.
If you cannot normalize your data, you will have to use the IRLS solver.
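The normalization described above can be sketched outside the workflow as follows. This is a minimal illustration, not the normalizer node's implementation: the helper `zscore_columns`, the example table, and the use of the population standard deviation are assumptions made for the example. Only the dense column is normalized, so the zeros in the sparse column are preserved.

```python
import numpy as np

def zscore_columns(X, columns):
    """Return a copy of X in which the given columns have zero mean and unit standard deviation."""
    X = np.asarray(X, dtype=float).copy()
    for j in columns:
        mean = X[:, j].mean()
        std = X[:, j].std()  # population standard deviation (an assumption of this sketch)
        # A constant column has std 0; only center it to avoid dividing by zero.
        X[:, j] = (X[:, j] - mean) / std if std > 0 else X[:, j] - mean
    return X

# Hypothetical table: column 0 is dense, column 1 is sparse (mostly zeros).
X = np.array([[10.0, 0.0],
              [12.0, 0.0],
              [14.0, 3.0],
              [16.0, 0.0]])

# Normalize the dense column only, so the sparse column keeps its zeros.
X_norm = zscore_columns(X, columns=[0])
```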
The solver is the most important choice you make as it will dictate which algorithm is used to solve the problem.
Iteratively reweighted least squares (IRLS): This solver uses an iterative optimization approach, sometimes also
called Fisher scoring, to calculate the model. It works well for small tables with only a few columns
but fails on larger tables. Note that it is the most error-prone solver because it cannot calculate a model if the
data is linearly separable (see Potential Errors and Error Handling for more information).
This solver also cannot handle tables with more columns than rows because it does not
support regularization.
Stochastic average gradient (SAG): This solver implements a variant of stochastic gradient descent that tends to
converge considerably faster than vanilla stochastic gradient descent. For more information on the algorithm, see
the following paper. It works well for large tables and also for tables with
more columns than rows. Note that in the latter case a regularization prior other than "uniform" must be selected.
The default learning rate of 0.1 was selected because it often works well, but ultimately the optimal learning rate
depends on the data and should be treated as a hyperparameter.
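To make the idea behind SAG more concrete, here is a minimal sketch for the binary case without regularization. It is not the node's implementation: the function `sag_logistic`, the simulated data, and the coefficients `true_w` are assumptions for illustration. The key point is that the solver remembers the last gradient computed for each row and steps along the average of all stored gradients, refreshing only one row's gradient per iteration.

```python
import numpy as np

def sag_logistic(X, y, learning_rate=0.1, epochs=50, seed=0):
    """Simplified stochastic average gradient for binary logistic regression (no prior)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    grad_memory = np.zeros((n, d))  # last gradient seen for each row
    grad_sum = np.zeros(d)          # running sum of the stored gradients
    for _ in range(epochs * n):
        i = rng.integers(n)                      # pick one row at random
        p = 1.0 / (1.0 + np.exp(-X[i] @ w))      # predicted probability of class 1
        g = (p - y[i]) * X[i]                    # log-loss gradient for row i
        grad_sum += g - grad_memory[i]           # replace the old gradient of row i
        grad_memory[i] = g
        w -= learning_rate * grad_sum / n        # step along the average gradient
    return w

# Simulated, z-score-like data; true_w is a hypothetical ground truth.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 2))
true_w = np.array([2.0, -1.0])
y = (rng.random(200) < 1.0 / (1.0 + np.exp(-X @ true_w))).astype(float)

w = sag_logistic(X, y, learning_rate=0.1, epochs=50)
```

The per-row gradient memory is what distinguishes this from vanilla stochastic gradient descent, which would step along the single gradient `g` alone.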
Learning Rate/Step Size Strategy
Only relevant for the SAG solver.
The learning rate strategy provides the learning rates for the gradient descent.
When selecting a learning rate strategy and an initial learning rate, keep in mind that there is always a trade-off
between the size of the learning rate and the number of epochs required to converge to a solution.
With a smaller learning rate the solver takes longer to find a solution, but if the learning rate is too large
it might skip over the optimal solution and, in the worst case, diverge.
Fixed: The provided step size is used for the complete training. This strategy works well for the SAG solver,
even if relatively large learning rates are used.
Line Search: An experimental learning rate strategy that tries to find the optimal learning rate for the SAG solver.
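The trade-off between step size and convergence can be illustrated with plain gradient descent on a toy problem. This sketch is not the SAG solver; the function `fixed_step_descent`, the objective f(w) = w^2, and the three step sizes are assumptions chosen only to show the slow, fast, and diverging regimes of a fixed learning rate.

```python
def fixed_step_descent(learning_rate, steps=50, start=1.0):
    """Minimize f(w) = w**2 with a fixed step size and return the final w."""
    w = start
    for _ in range(steps):
        gradient = 2.0 * w            # derivative of w**2
        w -= learning_rate * gradient
    return w

w_small = fixed_step_descent(0.01)  # converges, but is still far from 0 after 50 steps
w_good = fixed_step_descent(0.4)    # converges quickly towards the optimum at 0
w_large = fixed_step_descent(1.1)   # overshoots further on every step and diverges
```

Each update multiplies `w` by `1 - 2 * learning_rate`, so the iterate shrinks only when that factor has an absolute value below one.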
The SAG solver optimizes the problem using
maximum a posteriori estimation,
which allows you to specify a prior distribution for the coefficients of the resulting model.
This form of regularization is the Bayesian version of other regularization approaches such as Ridge or LASSO.
Currently the following priors are supported:
Uniform: This prior corresponds to no regularization at all and is the default. It essentially means that all values
are equally likely for the coefficients.
Gauss: The coefficients are assumed to be normally distributed. This prior keeps the coefficients from becoming
too large but does not force them to be zero. Using this prior is equivalent to using ridge regression (L2) with
a lambda of 1/prior_variance.
Laplace: The coefficients are assumed to follow a Laplace or double exponential distribution. It tends to produce
sparse solutions by forcing unimportant coefficients to be zero. It is therefore related to the LASSO (also known as
L1 regularization).
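The link between the priors and the familiar penalties can be checked numerically. In maximum a posteriori estimation the negative log prior is added to the loss, so each prior acts as a penalty term. The sketch below assumes zero-mean priors, drops additive constants, uses the ridge convention (lambda / 2) * sum(beta^2), and parameterizes the Laplace prior by its scale; the function names are hypothetical and the node's own parameterization may differ.

```python
import numpy as np

def neg_log_uniform_prior(beta):
    """A uniform prior adds nothing to the objective (no regularization)."""
    return 0.0

def neg_log_gauss_prior(beta, variance):
    """Negative log density of independent N(0, variance) priors, constants dropped."""
    return np.sum(beta ** 2) / (2.0 * variance)

def neg_log_laplace_prior(beta, scale):
    """Negative log density of independent Laplace(0, scale) priors, constants dropped."""
    return np.sum(np.abs(beta)) / scale

def ridge_penalty(beta, lam):
    """Ridge (L2) penalty in the (lambda / 2) * ||beta||^2 convention (assumed here)."""
    return 0.5 * lam * np.sum(beta ** 2)

beta = np.array([0.5, -2.0, 0.0])   # hypothetical coefficient vector
variance = 4.0

gauss = neg_log_gauss_prior(beta, variance)
ridge = ridge_penalty(beta, lam=1.0 / variance)   # lambda = 1 / prior_variance
laplace = neg_log_laplace_prior(beta, scale=2.0)  # proportional to the L1 norm of beta
```

Under these assumptions `gauss` and `ridge` coincide, while `laplace` grows with the absolute values of the coefficients, which is what pushes small coefficients to exactly zero.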
Potential Errors and Error Handling
The computation of the model is an iterative optimization process that requires some properties of the data set.
In particular, it requires a reasonable distribution of the target values and non-constant, uncorrelated columns. While
some of these properties are checked during node execution, you may still run into errors during the
computation. The list below gives some ideas of what might go wrong and how to avoid such situations.
Insufficient Information: This is the case when the data does not provide enough information about
one or more target categories. Try to get more data or remove rows for target categories that may cause
the error. If you are interested in a model for only one target category, make sure to group the target
column beforehand. For instance, if your data contains the target categories "A", "B", ..., "Z" but
you are only interested in getting a model for class "A", you can use a rule engine node to convert your
target into "A" and "not A".
Violation of Independence: Logistic regression is based on the assumption of statistical independence.
A common preprocessing step is to use a correlation filter to remove highly correlated learning columns.
Use a "Linear Correlation" node along with a "Correlation Filter" node to remove redundant columns. Often
it is sufficient to compute the correlation model on a subset of the data only.
Separation: Please see this article
about separation for more information.
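The correlation-based preprocessing described under Violation of Independence can be sketched as follows. This is a simplified greedy filter and not the algorithm of the "Correlation Filter" node: the function `correlation_filter`, the threshold of 0.9, and the simulated table are assumptions for illustration. As suggested above, the correlations are computed on a subset of the rows only.

```python
import numpy as np

def correlation_filter(X, threshold=0.9):
    """Return indices of columns to keep, dropping any column that is highly
    correlated (absolute Pearson correlation >= threshold) with a kept one."""
    corr = np.corrcoef(X, rowvar=False)   # column-by-column correlation matrix
    keep = []
    for j in range(X.shape[1]):
        if all(abs(corr[j, k]) < threshold for k in keep):
            keep.append(j)
    return keep

# Simulated table: column 1 is almost a multiple of column 0, column 2 is independent.
rng = np.random.default_rng(0)
x0 = rng.normal(size=200)
x1 = 2.0 * x0 + rng.normal(scale=0.01, size=200)
x2 = rng.normal(size=200)
X = np.column_stack([x0, x1, x2])

# Compute the correlation model on the first 50 rows only, then filter all rows.
keep = correlation_filter(X[:50], threshold=0.9)
X_filtered = X[:, keep]
```

The filter keeps the first column of each correlated group, so the column order influences which of the redundant columns survives.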