Naive Bayes Learner

The node creates a Bayesian model from the given training data. It calculates the number of rows per attribute value per class for nominal attributes and the Gaussian distribution per class for numerical attributes. The created model can be used in the Naive Bayes Predictor to predict the class membership of unclassified data. The node displays a warning message if any columns are ignored due to unsupported data types. For example, bit vector columns are ignored when the PMML compatibility flag is enabled, since they are not supported by the PMML standard.
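The statistics described above can be illustrated with a minimal Python sketch. This is not the node's implementation; the function name `learn_naive_bayes`, the list-of-dicts row format, and the use of the sample standard deviation are assumptions made for illustration only.

```python
import math
from collections import Counter, defaultdict


def learn_naive_bayes(rows, class_col, nominal_cols, numeric_cols):
    """Collect per-class row counts and Gaussian parameters (hypothetical helper)."""
    class_counts = Counter(row[class_col] for row in rows)

    # nominal_counts[column][class][value] -> number of rows
    nominal_counts = {c: defaultdict(Counter) for c in nominal_cols}
    # values[column][class] -> list of observed numbers
    values = {c: defaultdict(list) for c in numeric_cols}

    for row in rows:
        label = row[class_col]
        for c in nominal_cols:
            nominal_counts[c][label][row[c]] += 1
        for c in numeric_cols:
            values[c][label].append(row[c])

    # gaussians[column][class] -> (mean, standard deviation)
    gaussians = {}
    for c in numeric_cols:
        gaussians[c] = {}
        for label, xs in values[c].items():
            mean = sum(xs) / len(xs)
            # Assumption: sample variance; 0.0 when only one row is available.
            var = sum((x - mean) ** 2 for x in xs) / (len(xs) - 1) if len(xs) > 1 else 0.0
            gaussians[c][label] = (mean, math.sqrt(var))

    return class_counts, nominal_counts, gaussians
```

The three returned structures correspond loosely to the class counts, the nominal attribute counts, and the per-class Gaussian parameters that the node reports.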

Options

Classification Column
The class value column.
Default probability
A probability of zero for a given attribute/class value pair requires special attention. Without adjustment, a probability of zero would exercise an absolute veto over a likelihood in which that probability appears as a factor. Therefore, the Bayes model incorporates a default probability parameter that specifies a default (usually very small) probability to use in lieu of zero probability for a given attribute/class value pair. The default probability is used if the attribute is:
  • nominal and was not seen by the learner
  • continuous and its probability is smaller than the default probability
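The two cases above can be sketched in Python as follows. The function names and the example default of `1e-4` are hypothetical and chosen only for illustration; the node's actual default value and internal calculation may differ.

```python
import math


def nominal_probability(count, class_total, default_probability=1e-4):
    """P(value | class) for a nominal attribute (hypothetical helper)."""
    if count == 0:
        # Value was not seen by the learner for this class: avoid a hard zero.
        return default_probability
    return count / class_total


def gaussian_probability(x, mean, std, default_probability=1e-4):
    """Gaussian density for a continuous attribute, floored at the default."""
    density = math.exp(-((x - mean) ** 2) / (2 * std ** 2)) / (std * math.sqrt(2 * math.pi))
    # Use the default when the density is smaller than the default probability.
    return max(density, default_probability)
```

Because the class likelihood is a product of such factors, replacing a zero with a small positive number keeps one unseen value from vetoing the class outright.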
Minimum standard deviation
Specify the minimum standard deviation to use for observations without enough (diverse) data. The value must be at least 1e-10.
Threshold standard deviation
Specify the threshold for the standard deviation. The value must be positive. If the computed standard deviation does not reach this threshold, the minimum standard deviation is used instead.
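One way to read the interaction between the two standard deviation options is sketched below. The function name `effective_std`, the example values, and the strict "less than" comparison are assumptions; the text above does not specify how a value exactly equal to the threshold is treated.

```python
def effective_std(std, threshold=1e-3, minimum_std=1e-4):
    """Return the standard deviation to use for a Gaussian (hypothetical helper)."""
    if minimum_std < 1e-10:
        raise ValueError("minimum standard deviation must be at least 1e-10")
    if threshold <= 0:
        raise ValueError("threshold standard deviation must be positive")
    # Assumption: a computed value below the threshold counts as "not met".
    if std < threshold:
        return minimum_std
    return std
```

This guards against a zero or near-zero standard deviation, which would otherwise make the Gaussian density undefined or extremely peaked.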
Maximum number of unique nominal values per attribute
All nominal columns with more unique values than the defined number will be skipped during learning.
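As a rough illustration of this filter, the following sketch splits nominal columns into kept and skipped ones. The function name, the row format, and the example limit of 20 are hypothetical and not taken from the node.

```python
def select_nominal_columns(rows, nominal_cols, max_unique=20):
    """Split nominal columns by their number of distinct values (hypothetical helper)."""
    kept, skipped = [], []
    for col in nominal_cols:
        unique_values = {row[col] for row in rows}
        # Columns with more unique values than the limit are skipped.
        if len(unique_values) > max_unique:
            skipped.append(col)
        else:
            kept.append(col)
    return kept, skipped
```

A column such as a row identifier, where almost every value is unique, would land in `skipped` and contribute nothing to the model.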
Ignore missing values
By default, the node uses missing value information to improve the prediction result. The PMML standard does not support this and ignores missing values, so when the PMML compatibility option is selected, this option is disabled and missing values are ignored.
Create PMML 4.2 compatible model
Select this option to create a model that is compliant with the PMML 4.2 standard. The PMML 4.2 standard ignores missing values and does not support bit vectors. Therefore, bit vector columns and missing values are ignored during learning and prediction if this option is selected.

Even if this option is not selected, the node creates a valid PMML model. However, the model contains KNIME-specific information to store missing value and bit vector information. This information is used by the KNIME Naive Bayes Predictor to improve the prediction result, but it is ignored by any other PMML-compatible predictor, which might lead to different prediction results.

Input Ports

Training data

Output Ports

Learned naive Bayes model. The model can be used to classify data with unknown target (class) attribute. To do so, connect the model out port to the "Naive Bayes Predictor" node.
Data table with attribute statistics, e.g. counts per attribute/class pair, mean, and standard deviation.

Views

Naive Bayes Learner View
The view displays the learned model with the number of rows per class. It also shows the number of rows per attribute value per class for nominal attributes and the Gaussian distribution per class for numerical attributes.
