Creates a column containing a probability distribution either from
numeric columns or a single string column.
In case of numeric columns, one or more columns that contain
probability values can be picked. The probability values must be
non-negative and must sum up to 1.

In case of a string column, a single column can be selected. The
resulting probability distribution is a one-hot encoding of the
string column. For this to work, the column must
have a valid domain, i.e., the possible values of the column must be known.
You can use a *Domain Calculator* to calculate these values if they
are not present. Each of the possible values will be treated as a separate class,
i.e., the number of distinct values in the string column will be the
number of classes in the created probability distribution. The
string value of a cell will have a probability of 1, whereas all the
other possible string values of the column will have a probability of 0. The
same output can be achieved by creating a probability distribution of the numeric
output columns of the *One to Many* node applied to the same string
column.
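
The one-hot behaviour described above can be illustrated with a small Python sketch. It is not the node's implementation: the function name `one_hot_distribution` and the sample data are hypothetical, and the domain is approximated by collecting the distinct values of the column, which is roughly what a *Domain Calculator* would provide.

```python
def one_hot_distribution(value, domain):
    """Map every class in `domain` to a probability: 1 for `value`, 0 otherwise."""
    if value not in domain:
        # Assumption: a value outside the known domain is treated as an error here.
        raise ValueError(f"{value!r} is not in the column domain {domain!r}")
    return {cls: 1.0 if cls == value else 0.0 for cls in domain}


# Hypothetical string column.
column = ["red", "green", "red", "blue"]

# Stand-in for the column domain: distinct values in order of first appearance.
domain = list(dict.fromkeys(column))

# One distribution per row; the number of classes equals the number of distinct values.
distributions = [one_hot_distribution(value, domain) for value in column]
```

Each resulting distribution has as many entries as the column has distinct values, and its probabilities sum to 1.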

- Numeric Column Selection
- Move the columns that contain the probability values to the "Include" list.
- Allow probabilities that sum up to 1 imprecisely
- If enabled, the probabilities need not sum up to exactly 1. This can be helpful if, e.g., the probability values contain rounding errors. The number of decimal digits that defines the precision can be specified as explained below.
- Precision (number of decimal digits)
- Defines the precision that the sum of the probabilities must have by restricting the number of decimal digits that must be precise. The sum is accepted if *abs(sum - 1) <= 10^(-precision)*, e.g., if the sum is 0.999, it is only accepted with a precision of <=2. The lower the specified number, the higher the tolerance.
- Invalid Probability Distribution Handling
- Specify how to treat invalid probabilities. Invalid means, e.g., negative probabilities or probabilities that do not sum up to 1 (with respect to the specified precision). If *Fail* is selected, the node will fail. Otherwise, the node just gives a warning and puts missing values in the output for the corresponding rows.
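
The acceptance rule and the two ways of handling invalid rows can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names, the `on_invalid` values `"fail"` and `"missing"`, the use of `None` as a stand-in for a missing cell, and the exact comparison used when no precision is given are all hypothetical and not taken from the node.

```python
import math


def is_valid_distribution(probs, precision=None):
    """Check non-negativity and the rule abs(sum - 1) <= 10^(-precision)."""
    if any(p < 0 for p in probs):
        return False
    total = math.fsum(probs)
    if precision is None:
        # Assumption: without a precision, the sum must equal 1 exactly.
        return total == 1.0
    return abs(total - 1.0) <= 10.0 ** (-precision)


def create_distribution(probs, precision=None, on_invalid="fail"):
    """Return the row as a distribution, or handle it as invalid."""
    if is_valid_distribution(probs, precision):
        return list(probs)
    if on_invalid == "fail":
        raise ValueError(f"invalid probability distribution: {probs!r}")
    # Hypothetical "missing" mode: None stands in for a missing output cell.
    return None
```

With this sketch, a row summing to 0.999 passes with `precision=2` and is rejected with `precision=4`, which matches the statement that a lower number means a higher tolerance.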

- String Column Selection
- A single string column can be picked from the dropdown menu.

- Output column name
- Specify the name of the created column.
- Remove included columns
- If selected, the included numeric columns or the picked string column will be removed from the output.
- Missing Value Handling
- Specify how to treat a missing value in one of the input columns. If *Fail* is selected, the node will fail. If *Ignore* is selected, the node just gives a warning and puts missing values in the output for the corresponding rows. If *Treat as zero* is selected, the missing value will be treated as 0.
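
The three missing value strategies can be pictured with the following Python sketch. The function name `resolve_missing`, the strategy strings, and the use of `None` for a missing cell are hypothetical. The sketch only covers the handling of missing inputs; presumably a row completed with zeros still has to pass the validity check on its sum afterwards.

```python
def resolve_missing(row, strategy="fail"):
    """Apply a missing value strategy to one row of probability values.

    `None` marks a missing input cell. Returns the row to use, or None
    when the output cell for this row should itself be missing.
    """
    if all(value is not None for value in row):
        return list(row)
    if strategy == "fail":
        raise ValueError(f"row contains a missing value: {row!r}")
    if strategy == "ignore":
        # The output cell for this row becomes missing.
        return None
    if strategy == "zero":
        # Replace each missing input by a probability of 0.
        return [0.0 if value is None else value for value in row]
    raise ValueError(f"unknown strategy: {strategy!r}")
```

For example, `resolve_missing([0.6, None, 0.4], "zero")` yields a row whose probabilities still sum to 1, whereas the same row with `"ignore"` produces a missing output.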

- This node has no views

To use this node in KNIME, install the extension KNIME Base nodes from the v4.7 update site.
