Missing Value

This node handles missing values found in cells of the input table. The first tab in the dialog (labeled "Default") provides default handling options for all columns of a given type. These settings apply to all columns in the input table that are not explicitly configured in the second tab, labeled "Individual". This second tab permits individual settings for each available column, overriding the default. To use this second approach, select a column or a list of columns that needs extra handling, click "Add", and set the parameters. Clicking the label with the column name(s) selects all covered columns in the column list. To remove this extra handling (and use the default handling instead), click the "Remove" button for this column.
Options marked with an asterisk (*) produce non-standard PMML, which uses extensions that cannot be read by tools other than KNIME.
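The precedence between the two tabs can be pictured as a lookup in which an individual column setting wins over the default for the column's type. This is a minimal Python illustration, not KNIME code; the helper `resolve_handler`, the dictionary layout, and the type labels are hypothetical.

```python
from typing import Dict, Optional


def resolve_handler(column: str,
                    column_type: str,
                    individual: Dict[str, str],
                    defaults: Dict[str, str]) -> Optional[str]:
    """Pick the handler for one column (hypothetical helper)."""
    # A setting from the "Individual" tab overrides the default.
    if column in individual:
        return individual[column]
    # Otherwise use the "Default" tab entry for the column's type;
    # None stands for "no handler configured" in this sketch.
    return defaults.get(column_type)


defaults = {"Number (double)": "Mean", "String": "Most Frequent Value"}
individual = {"price": "Median"}
print(resolve_handler("price", "Number (double)", individual, defaults))   # Median
print(resolve_handler("weight", "Number (double)", individual, defaults))  # Mean
```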

Options

Missing Value Handler Selection
Select and configure the missing value handler to be used for data types or columns. Handlers that do not produce valid PMML 4.2 are marked with an asterisk (*).

Average Interpolation*
This missing value handler replaces missing values with the average of the previous and next non-missing values in the column it is configured for. For tables with a large number of rows but only a few columns that need missing value replacement, the option to use disk-backed statistics keeps the statistics from filling up main memory. Use this option with caution, as it is generally much slower than in-memory statistics. This missing value handler does not produce standard PMML 4.2!
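A minimal Python sketch of average interpolation, assuming a column is a list with None marking missing cells. It is not KNIME's implementation; the function name is hypothetical, and leaving gaps at the start or end of the column missing is an assumption about boundary handling. Every cell in a run of missing values gets the same average.

```python
from typing import List, Optional


def average_interpolation(values: List[Optional[float]]) -> List[Optional[float]]:
    """Replace each None with the mean of the nearest non-missing
    values before and after it (hypothetical helper)."""
    n = len(values)
    # next_valid[i] is the index of the first non-missing value after i.
    next_valid: List[Optional[int]] = [None] * n
    nxt: Optional[int] = None
    for i in range(n - 1, -1, -1):
        next_valid[i] = nxt
        if values[i] is not None:
            nxt = i

    result = list(values)
    prev: Optional[int] = None
    for i, v in enumerate(values):
        if v is not None:
            prev = i
            continue
        after = next_valid[i]
        # Assumption: a gap without a neighbor on both sides stays missing.
        if prev is not None and after is not None:
            result[i] = (values[prev] + values[after]) / 2
    return result


print(average_interpolation([1.0, None, None, 4.0]))  # [1.0, 2.5, 2.5, 4.0]
```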

Fix Value (Double)
Replaces missing values with a double given by the user. This missing value handler produces valid PMML 4.2.

Fix Value (Integer)
Replaces missing values with an integer number given by the user. This missing value handler produces valid PMML 4.2.

Fix Value (String)
Replaces missing values with a string given by the user. This missing value handler produces valid PMML 4.2.

Fix Value (Long)
Replaces missing values with a long given by the user. This missing value handler produces valid PMML 4.2.

Fix Value
No description provided.

Linear Interpolation*
This missing value handler replaces missing values with the linear interpolation between the previous and next non-missing values in the column it is configured for. For tables with a large number of rows but only a few columns that need missing value replacement, the option to use disk-backed statistics keeps the statistics from filling up main memory. Use this option with caution, as it is generally much slower than in-memory statistics. This missing value handler does not produce standard PMML 4.2!
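Unlike average interpolation, linear interpolation places each missing cell on the straight line between its two non-missing neighbors, weighted by row distance. A minimal Python sketch, assuming a list with None for missing cells and row index as the interpolation axis; the function name and the choice to leave boundary gaps missing are assumptions, not KNIME behavior.

```python
from typing import List, Optional


def linear_interpolation(values: List[Optional[float]]) -> List[Optional[float]]:
    """Fill each None on the line between the surrounding non-missing
    values (hypothetical helper)."""
    n = len(values)
    # next_valid[i] is the index of the first non-missing value after i.
    next_valid: List[Optional[int]] = [None] * n
    nxt: Optional[int] = None
    for i in range(n - 1, -1, -1):
        next_valid[i] = nxt
        if values[i] is not None:
            nxt = i

    result = list(values)
    prev: Optional[int] = None
    for i, v in enumerate(values):
        if v is not None:
            prev = i
            continue
        after = next_valid[i]
        # Assumption: a gap without a neighbor on both sides stays missing.
        if prev is not None and after is not None:
            slope = (values[after] - values[prev]) / (after - prev)
            result[i] = values[prev] + slope * (i - prev)
    return result


print(linear_interpolation([0.0, None, None, None, 8.0]))  # [0.0, 2.0, 4.0, 6.0, 8.0]
```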

Maximum
Finds the column's largest value and replaces all missing values with it. This missing value handler produces valid PMML 4.2.

Mean
Calculates the mean value of all non-missing cells in a column and replaces the missing values with this mean. This missing value handler produces valid PMML 4.2.
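A minimal Python sketch of mean replacement, assuming a column is a list with None for missing cells. The function name is hypothetical, and returning an all-missing column unchanged is an assumption.

```python
from typing import List, Optional


def mean_impute(values: List[Optional[float]]) -> List[Optional[float]]:
    """Replace every None with the mean of the non-missing cells."""
    present = [v for v in values if v is not None]
    if not present:
        # Assumption: nothing to average, so the column is left as-is.
        return list(values)
    mean = sum(present) / len(present)
    return [mean if v is None else v for v in values]


print(mean_impute([1.0, None, 3.0]))  # [1.0, 2.0, 3.0]
```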

Median
Finds the column's median value and replaces all missing values with it. For large tables this might be computationally expensive because the table needs to be sorted to find the median. This missing value handler produces valid PMML 4.2.
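The sorting cost mentioned above is visible in a direct sketch: the non-missing values are sorted and the middle element is taken. This Python version assumes a list with None for missing cells; the function name is hypothetical, and averaging the two middle values for an even count is a common convention assumed here, not confirmed KNIME behavior.

```python
from typing import List, Optional


def median_impute(values: List[Optional[float]]) -> List[Optional[float]]:
    """Replace every None with the median of the non-missing cells."""
    present = sorted(v for v in values if v is not None)  # O(n log n)
    if not present:
        return list(values)
    mid = len(present) // 2
    if len(present) % 2 == 1:
        median = present[mid]
    else:
        # Assumption: even count -> mean of the two middle values.
        median = (present[mid - 1] + present[mid]) / 2
    return [median if v is None else v for v in values]


print(median_impute([5.0, None, 1.0, 3.0]))  # [5.0, 3.0, 1.0, 3.0]
```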

Minimum
Finds the column's smallest value and replaces all missing values with it. This missing value handler produces valid PMML 4.2.

Most Frequent Value
Calculates the most frequent value in a column and replaces the missing values with it. This missing value handler produces valid PMML 4.2.
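Because the most frequent value is defined for any data type, this handler also works for string columns. A minimal Python sketch using `collections.Counter`, assuming None marks missing cells; the function name is hypothetical, and resolving ties in favor of the value seen first is an assumption (it is how `Counter.most_common` orders equal counts).

```python
from collections import Counter
from typing import Hashable, List, Optional


def most_frequent_impute(values: List[Optional[Hashable]]) -> List[Optional[Hashable]]:
    """Replace every None with the most frequent non-missing value."""
    counts = Counter(v for v in values if v is not None)
    if not counts:
        return list(values)
    # Ties go to the value encountered first (assumed tie-breaking rule).
    mode = counts.most_common(1)[0][0]
    return [mode if v is None else v for v in values]


print(most_frequent_impute(["a", None, "b", "a", None]))  # ['a', 'a', 'b', 'a', 'a']
```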

Moving Average*
Calculates the mean of all values within the window defined by the lookbehind and lookahead options and replaces missing values with this mean. The lookbehind and lookahead options set the number of cells before and after the current cell, respectively, that are taken into account. This missing value handler does not produce standard PMML 4.2!
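A minimal Python sketch of the windowed mean, assuming a list with None for missing cells. The function name is hypothetical; averaging only the non-missing original values in the window (not values filled earlier in the same pass) and leaving a cell missing when its window is empty are assumptions, not documented KNIME behavior.

```python
from typing import List, Optional


def moving_average(values: List[Optional[float]],
                   lookbehind: int,
                   lookahead: int) -> List[Optional[float]]:
    """Replace each None with the mean of the non-missing values in the
    window [i - lookbehind, i + lookahead] (hypothetical helper)."""
    n = len(values)
    result = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        lo = max(0, i - lookbehind)
        hi = min(n, i + lookahead + 1)
        window = [x for x in values[lo:hi] if x is not None]
        if window:  # Assumption: an empty window leaves the cell missing.
            result[i] = sum(window) / len(window)
    return result


print(moving_average([1.0, 2.0, None, 4.0, 6.0], 1, 1))  # [1.0, 2.0, 3.0, 4.0, 6.0]
```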

Next*
This missing value handler replaces missing values with the next non-missing value in the column it is configured for. For tables with a large number of rows but only a few columns that need missing value replacement, the option to use disk-backed statistics keeps the statistics from filling up main memory. Use this option with caution, as it is generally much slower than in-memory statistics. This missing value handler does not produce standard PMML 4.2!
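Replacing with the next value amounts to a backward fill: scan the column from the end and carry the most recent non-missing value upward. A minimal Python sketch, assuming None marks missing cells; the function name is hypothetical, and trailing gaps with no later value stay missing in this sketch.

```python
from typing import Any, List, Optional


def next_value_fill(values: List[Optional[Any]]) -> List[Optional[Any]]:
    """Backward fill: each None takes the next non-missing value."""
    result = list(values)
    nxt: Optional[Any] = None
    for i in range(len(values) - 1, -1, -1):
        if values[i] is not None:
            nxt = values[i]
        else:
            result[i] = nxt  # stays None if no later value exists
    return result


print(next_value_fill([None, 1, None, None, 3, None]))  # [1, 1, 3, 3, 3, None]
```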

Previous*
This missing value handler replaces missing values with the last encountered non-missing value in the column it is configured for. This missing value handler does not produce standard PMML 4.2!
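The previous-value handler is the mirror image: a forward fill that carries the last non-missing value down the column. A minimal Python sketch, assuming None marks missing cells; the function name is hypothetical, and leading gaps with no earlier value stay missing in this sketch.

```python
from typing import Any, List, Optional


def previous_value_fill(values: List[Optional[Any]]) -> List[Optional[Any]]:
    """Forward fill: each None takes the last non-missing value."""
    result = list(values)
    last: Optional[Any] = None
    for i, v in enumerate(values):
        if v is not None:
            last = v
        else:
            result[i] = last  # stays None if no earlier value exists
    return result


print(previous_value_fill([None, 1, None, 3, None]))  # [None, 1, 1, 3, 3]
```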

Remove Row*
This missing value handler removes rows that have a missing value in the column it is configured for. This missing value handler does not produce standard PMML 4.2!
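Note that only the configured column is checked: a row with a missing value in some other column is kept. A minimal Python sketch, assuming rows are dictionaries with None for missing cells; the function name and row layout are hypothetical.

```python
from typing import Any, Dict, List


def remove_rows_with_missing(rows: List[Dict[str, Any]],
                             column: str) -> List[Dict[str, Any]]:
    """Keep only rows whose cell in `column` is present (not None)."""
    return [row for row in rows if row[column] is not None]


rows = [
    {"id": 1, "age": 30},
    {"id": 2, "age": None},    # dropped: missing in the configured column
    {"id": None, "age": 41},   # kept: the missing cell is in another column
]
print(remove_rows_with_missing(rows, "age"))
```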

Rounded Mean
Calculates the mean value of all non-missing cells in a column, rounds it to an integer, and replaces the missing values with this rounded mean. This missing value handler produces valid PMML 4.2.
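A rounded mean is useful for integer columns, where a fractional replacement would not fit the column type. A minimal Python sketch, assuming None marks missing cells; the function name is hypothetical, and rounding half up via `math.floor(mean + 0.5)` is an assumed rounding mode (Python's built-in `round` would instead round halves to the nearest even number).

```python
import math
from typing import List, Optional


def rounded_mean_impute(values: List[Optional[int]]) -> List[Optional[int]]:
    """Replace every None with the mean of the non-missing cells,
    rounded half up to an integer (assumed rounding mode)."""
    present = [v for v in values if v is not None]
    if not present:
        return list(values)
    mean = sum(present) / len(present)
    rounded = math.floor(mean + 0.5)
    return [rounded if v is None else v for v in values]


print(rounded_mean_impute([1, None, 2]))  # [1, 2, 2]
```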

Input Ports

Table with missing values

Output Ports

Table with replaced missing values
Table with PMML documenting the missing value replacement

Views

This node has no views
