Nominal Probability Distribution Creator

Creates a column containing a probability distribution either from numeric columns or a single string column. In case of numeric columns, one or more columns that contain probability values can be picked. The probability values must be non-negative and must sum up to 1.
In case of a string column, a single column can be selected. The resulting probability distribution is a one-hot encoding of the string column. This requires the column to have a valid domain, i.e., the possible values of the column must be known. You can use a Domain Calculator to calculate these values if they are not present. Each possible value is treated as a separate class, i.e., the number of distinct values in the string column is the number of classes in the created probability distribution. The string value of a cell gets a probability of 1, whereas all other possible values of the column get a probability of 0. The same output can be achieved by applying the One to Many node to the string column and creating a probability distribution from its numeric output columns.
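The one-hot behavior for a string column can be illustrated with a small Python sketch. This is not the node's implementation; the function name `one_hot_distribution` and the example domain are hypothetical and only mirror the description above.

```python
def one_hot_distribution(value, domain):
    """Map a string value to a one-hot distribution over the column domain.

    `domain` stands in for the known possible values of the string column
    (what a Domain Calculator would provide). Hypothetical helper.
    """
    if value not in domain:
        raise ValueError(f"{value!r} is not part of the column domain")
    # The cell's own value gets probability 1, every other class gets 0.
    return {cls: (1.0 if cls == value else 0.0) for cls in domain}


# Hypothetical domain with three distinct values, i.e. three classes.
domain = ["red", "green", "blue"]
print(one_hot_distribution("green", domain))
```

The number of keys in the result equals the number of distinct values in the domain, which matches the number of classes described above.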

Options

Numeric Columns

Numeric Column Selection: Move the columns that contain the probability values to the "Include" list.
Allow probabilities that sum up to 1 imprecisely: If enabled, the probabilities need not sum up to exactly 1. This can be helpful if the probability values contain, e.g., rounding errors. The number of decimal digits that defines the precision can be specified as explained below.
Precision (number of decimal digits): Defines how precisely the probabilities must sum up to 1 by specifying the number of decimal digits that must be precise. The sum is accepted if abs(sum - 1) <= 10^(-precision), e.g., if the sum is 0.999, it is only accepted with a precision of <=2. The lower the specified number, the higher the tolerance.
Invalid Probability Distribution Handling: Specify how to treat invalid probabilities. Invalid means, e.g., negative probabilities or probabilities that do not sum up to 1 (with respect to the specified precision). If Fail is selected, the node will fail. Otherwise, the node just gives a warning and puts missing values in the output for the corresponding rows.
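The validation rules for numeric columns can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the node's actual code: the function names and the `fail_on_invalid` flag are hypothetical, `None` stands in for a missing value, and the strict equality check used when no precision is given is an assumption about how the exact mode behaves.

```python
def is_valid_distribution(probs, precision=None):
    """Check one row of probability values (hypothetical helper).

    precision=None models the case where imprecise sums are not allowed
    (assumed here to mean strict equality with 1).
    """
    # Negative probabilities are always invalid.
    if any(p < 0 for p in probs):
        return False
    total = sum(probs)
    if precision is None:
        return total == 1.0
    # Acceptance rule from the option description.
    return abs(total - 1.0) <= 10 ** (-precision)


def create_distribution(probs, precision=None, fail_on_invalid=True):
    """Return the row as a distribution, fail, or return None (missing)."""
    if is_valid_distribution(probs, precision):
        return list(probs)
    if fail_on_invalid:
        raise ValueError(f"invalid probability distribution: {probs}")
    # Otherwise the row gets a missing value in the output.
    return None


print(is_valid_distribution([0.5, 0.49], precision=1))  # tolerance 0.1
print(is_valid_distribution([0.5, 0.49], precision=4))  # tolerance 0.0001
print(create_distribution([0.7, 0.7], fail_on_invalid=False))
```

A lower precision value widens the tolerance, so the same row can pass with a precision of 1 and be rejected with a precision of 4.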

String Columns

String Column Selection: A single string column can be picked from the dropdown menu.

General

Output column name: Specify the name of the created column.
Remove included columns: If selected, the included numeric columns or the picked string column will be removed from the output.
Missing Value Handling: Specify how to treat a missing value in one of the input columns. If Fail is selected, the node will fail. If Ignore is selected, the node just gives a warning and puts missing values in the output for the corresponding rows. If Treat as zero is selected, the missing value will be treated as 0.
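The three missing value strategies can be sketched in Python as well. Again, this is only an illustration of the described behavior: the function name, the strategy strings, and the use of `None` for a missing value are assumptions made for this example.

```python
def handle_missing(row, strategy):
    """Apply a missing value strategy to one row of input values.

    `None` represents a missing cell. Returns the row to use for building
    the distribution, or None if the output cell should be missing.
    Hypothetical helper, not the node's API.
    """
    if all(v is not None for v in row):
        return list(row)
    if strategy == "fail":
        raise ValueError("missing value in input columns")
    if strategy == "ignore":
        # The output for this row becomes a missing value.
        return None
    if strategy == "treat_as_zero":
        return [0.0 if v is None else v for v in row]
    raise ValueError(f"unknown strategy: {strategy!r}")


print(handle_missing([0.4, None, 0.6], "treat_as_zero"))
print(handle_missing([0.4, None, 0.6], "ignore"))
```

With the treat-as-zero strategy, the remaining values still have to form a valid distribution on their own, since the replaced cell contributes 0 to the sum.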

Input Ports

Data with columns containing probability values or a column containing string values.

Output Ports

Input data with an appended column that contains the nominal probability distribution.

Views

This node has no views

Workflows

05_Weak_Supervision_on_the_Adult_dataset (KNIME Hub)

Installation

To use this node in KNIME, install the extension KNIME Base nodes from the v5.5 update site.

Plugin provider: KNIME AG, Zurich, Switzerland

Plugin version: 5.5.0.v202506181431

On NodePit since: 2025-07-02

Last update: 2025-07-17

Tags: Streamable

KNIME versions: Since v4.1
