Nominal Probability Distribution Creator

Creates a column containing a probability distribution either from numeric columns or a single string column. In case of numeric columns, one or more columns that contain probability values can be picked. The probability values must be non-negative and must sum up to 1.
In case of a string column, one single column can be selected. The probability distribution of the string column produces a one-hot encoding of the string column. In order to do this, the column must have a valid domain, i.e., the possible values of the column must be known. You can use a Domain Calculator to calculate these values if they are not present. Each of the possible values will be treated as a separate class, i.e., the number of distinct values in the string column will be the number of classes in the created probability distribution. The string value of a cell will have a probability of 1 whereby all the other possible string values of the column will have a probability of 0. The same output can be achieved by creating a probability distribution of the numeric output columns of the One to Many node applied to the same string column.

Options

Numeric Columns

Numeric Column Selection
Move the columns that contain the probability values to the "Include" list.
Allow probabilities that sum up to 1 imprecisely
If enabled, the probabilities must not sum up to 1 precisely. This might be helpful if there are, e.g., some rounding errors in the probability values. A number of decimal digits can be specified that defines the precision as explained below.
Precision (number of decimal digits)
Defines the precision that the sum of the probabilities must have by restricting the number of decimal digits that must be precise. The sum is accepted if abs(sum - 1) <= 10^(-precision) , e.g., if the sum is 0.999, it is only accepted with a precision of <=2. The lower the specified number, the higher is the tolerance.
Invalid Probability Distribution Handling
Specify how to treat invalid probabilities. Invalid means, e.g., negative probabilities or probabilities that do not sum up to 1 (with respect to the specified precision). If Fail is selected, the node will fail. Otherwise, the node just gives a warning and puts missing values in the output for the corresponding rows.

String Columns

String Column Selection
A single string column can be picked from the dropdown menu.

General

Output column name
Specify the name of the created column.
Remove included columns
If selected, the included numeric columns or the picked string column will be removed from the output.
Missing Value Handling
Specify how to treat a missing value in one of the input columns. If Fail is selected, the node will fail. If Ignore is selected, the node just gives a warning and puts missing values in the output for the corresponding rows. If Treat as zero is selected, the missing value will be treated as 0.

Input Ports

Icon
Data with columns containing probability values or a column containing string values.

Output Ports

Icon
Input data with an appended column that contains the nominal probability distribution.

Views

This node has no views

Workflows

Links

Developers

You want to see the source code for this node? Click the following button and we’ll use our super-powers to find it for you.