Spark H2O MOJO Predictor (Isolation Forest)

This node applies an Isolation Forest MOJO to an incoming Spark DataFrame/RDD to detect anomalies (outliers). The output consists of the input and, depending on the settings, one or two appended columns. The first is the prediction column, which contains the normalized anomaly score: the higher the score, the more likely the observation is an anomaly. The second, optional column contains the mean length of the predicted decision tree paths for each observation: the shorter the mean length, the more likely the observation is an anomaly.
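The inverse relationship between path length and anomaly score can be illustrated with a small sketch. It assumes a min-max normalization with hypothetical bounds `min_len` and `max_len`; this page does not state the formula the MOJO uses, so the function below is an illustration rather than the node's implementation.

```python
def normalized_score(path_length: float, min_len: float, max_len: float) -> float:
    """Map a path length to a score where shorter paths score higher.

    Assumption: min-max normalization. min_len and max_len are hypothetical
    bounds, for example the extremes observed while training the model.
    """
    if max_len <= min_len:
        # Degenerate range: observations cannot be ranked against each other.
        return 1.0
    return (max_len - path_length) / (max_len - min_len)


# An observation isolated after few splits versus one that needs many splits.
short_path = normalized_score(2.0, min_len=1.0, max_len=9.0)
long_path = normalized_score(8.0, min_len=1.0, max_len=9.0)
```

Under these assumptions the observation with the shorter path gets the higher score, which matches the reading of the two output columns described above.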

Options

General Settings

Enforce presence of all feature columns
If checked, the node fails if any of the feature columns used to train the MOJO is missing. Otherwise, a warning is displayed and the missing columns are treated as NA by the MOJO predictor.
Fail if a prediction exception occurs
If checked, the node fails if the prediction of a row fails. Otherwise, a missing value is output for that row.
Treat unknown categorical values as missing values
By default, H2O does not handle the case where a categorical feature column contains a value that was not present during model training. If this option is enabled, H2O converts these values to NA, i.e. treats them as missing values. If this option is disabled, the node either fails or outputs missing values, depending on the setting "Fail if a prediction exception occurs".
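How the three general settings interact can be sketched in plain Python. This is a simplified model of the behavior described above, not the node's code: `prepare_row`, `predict_row`, `PredictionError`, and the `score` callback are hypothetical names, and the column check is done per row here only to keep the example short.

```python
NA = None  # stand-in for H2O's NA / a missing value


class PredictionError(Exception):
    """Hypothetical error raised when a single row cannot be scored."""


def prepare_row(row, feature_columns, known_levels, enforce_presence, unknown_as_na):
    """Apply the column and categorical-level rules to one input row."""
    prepared = {}
    for col in feature_columns:
        if col not in row:
            if enforce_presence:
                # "Enforce presence of all feature columns" is checked.
                raise KeyError(f"missing feature column: {col}")
            prepared[col] = NA  # unchecked: the column is treated as NA
            continue
        value = row[col]
        levels = known_levels.get(col)  # None means the column is not categorical
        if levels is not None and value is not NA and value not in levels:
            if unknown_as_na:
                value = NA  # "Treat unknown categorical values as missing values"
            else:
                raise PredictionError(f"unknown level {value!r} in column {col}")
        prepared[col] = value
    return prepared


def predict_row(row, score, fail_on_exception, **settings):
    """Score one row; return NA on failure unless failures are fatal."""
    try:
        return score(prepare_row(row, **settings))
    except PredictionError:
        if fail_on_exception:
            raise  # "Fail if a prediction exception occurs" is checked
        return NA


settings = dict(
    feature_columns=["color", "size"],
    known_levels={"color": {"red", "blue"}},
    enforce_presence=False,
    unknown_as_na=False,
)
# "green" was not seen in training and is not converted to NA, so scoring
# fails; with fail_on_exception=False the row gets a missing value instead.
result = predict_row(
    {"color": "green", "size": 3.0},
    score=lambda prepared: 0.5,
    fail_on_exception=False,
    **settings,
)
```

With `unknown_as_na=True` the same row would be scored with `color` set to NA, so no prediction exception would occur in this sketch.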

Anomaly Detection Settings

Prediction column name
Change the name of the prediction column.
Append column containing mean length
Select to append an extra column that contains the mean length of the predicted decision tree paths for each observation.
Mean length column name
Change the name of the created column that contains the mean length.
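The effect of these settings on the output schema can be sketched as follows. The function and the column names `anomaly_score` and `mean_length` are hypothetical, and the order of the appended columns is an assumption; the node's actual default names are not given here.

```python
def output_columns(input_columns, prediction_name, mean_length_name=None):
    """Return the output column names for the given settings.

    mean_length_name=None models "Append column containing mean length"
    being unchecked, so only the prediction column is appended.
    """
    columns = list(input_columns) + [prediction_name]
    if mean_length_name is not None:
        columns.append(mean_length_name)
    return columns


with_mean = output_columns(["a", "b"], "anomaly_score", "mean_length")
without_mean = output_columns(["a", "b"], "anomaly_score")
```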

Spark Settings

Upload MOJO dependency
If checked, the MOJO dependency (genmodel jar file) is uploaded to the cluster. Otherwise, the node relies on a dependency provided on the cluster side.

Input Ports

The MOJO. Its model category must be anomaly detection.
Spark DataFrame/RDD for prediction. Missing values will be treated as NA.

Output Ports

Spark DataFrame/RDD containing the predicted (normalized) anomaly score and, if selected, the mean length.

Views

This node has no views