Spark Scorer

Compares two columns by their attribute value pairs and shows the confusion matrix, i.e. the number of rows for each combination of values in the two columns. In the dialog, select the two columns to compare: the values from the first selected column label the rows of the confusion matrix, and the values from the second column label its columns. The first output port holds the confusion matrix, with the number of matching rows in each cell. The second output port reports accuracy statistics: true positives, false positives, true negatives, false negatives, recall, precision, sensitivity, specificity, and F-measure, as well as the overall accuracy and Cohen's kappa.
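To make the row and column convention concrete, the following is a minimal plain-Python sketch of how such a confusion matrix can be counted. It is not the node's Spark implementation; the function name confusion_matrix and the sample labels are assumptions made for illustration only.

```python
from collections import Counter

def confusion_matrix(actual, predicted):
    """Count (actual, predicted) pairs.

    Rows follow the first column (actual classes), columns follow the
    second column (predicted classes). Hypothetical helper, not node code.
    """
    labels = sorted(set(actual) | set(predicted))
    counts = Counter(zip(actual, predicted))
    matrix = [[counts[(a, p)] for p in labels] for a in labels]
    return labels, matrix

# Illustrative data: first column = real classes, second = predictions.
actual = ["cat", "cat", "dog", "dog", "dog", "bird"]
predicted = ["cat", "dog", "dog", "dog", "cat", "bird"]

labels, matrix = confusion_matrix(actual, predicted)
for label, row in zip(labels, matrix):
    print(label, row)
```

Each diagonal cell counts rows where both columns agree; every off-diagonal cell counts one kind of misclassification.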


First column
The first column represents the real classes of the data.
Second column
The second column represents the predicted classes of the data.
Sorting strategy
Whether to sort the labels lexically or numerically.
Reverse order
Reverses the order of the labels produced by the sorting strategy.
Use name prefix
The scores (i.e. accuracy, error rate, and the number of correct and wrong classifications) are exported as flow variables with hard-coded names. This option lets you define a prefix for these variable names to avoid name conflicts.
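The difference between the two sorting strategies matters when class labels look like numbers. The sketch below illustrates it in plain Python; the function sort_labels and its parameters are hypothetical, and the node's exact comparison rules are not described here, so treat this as an assumption about typical lexical and numeric ordering.

```python
def sort_labels(labels, strategy="lexical", reverse=False):
    """Order class labels for the confusion matrix axes.

    Hypothetical helper: 'lexical' compares labels as strings,
    'numeric' compares them by their numeric value.
    """
    key = float if strategy == "numeric" else str
    return sorted(labels, key=key, reverse=reverse)

labels = ["10", "9", "100", "1"]

lexical = sort_labels(labels)                            # string comparison
numeric = sort_labels(labels, "numeric")                 # value comparison
reversed_numeric = sort_labels(labels, "numeric", True)  # reverse order

print(lexical)
print(numeric)
print(reversed_numeric)
```

With lexical sorting "10" and "100" come before "9", which is rarely what is wanted for numeric class labels.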

Input Ports

Arbitrary input Spark DataFrame/RDD with at least two columns to compare.

Output Ports

The confusion matrix.
The accuracy statistics table.
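As a rough illustration of how the accuracy statistics relate to the confusion matrix, the sketch below derives them with the standard textbook definitions (recall and sensitivity are the same quantity). The function names, the sample matrix, and the use of NaN for undefined ratios are assumptions; the node's own handling of such cases is not stated here.

```python
import math

def ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the ratio is undefined."""
    return numerator / denominator if denominator else math.nan

def class_statistics(labels, matrix):
    """Per-class statistics; matrix[i][j] counts rows with actual i, predicted j."""
    total = sum(sum(row) for row in matrix)
    stats = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        fn = sum(matrix[i]) - tp                 # rest of the row
        fp = sum(row[i] for row in matrix) - tp  # rest of the column
        tn = total - tp - fn - fp
        recall = ratio(tp, tp + fn)              # also called sensitivity
        precision = ratio(tp, tp + fp)
        stats[label] = {
            "TP": tp, "FP": fp, "TN": tn, "FN": fn,
            "recall": recall,
            "precision": precision,
            "specificity": ratio(tn, tn + fp),
            "f_measure": ratio(2 * precision * recall, precision + recall),
        }
    return stats

def overall_statistics(matrix):
    """Overall accuracy and Cohen's kappa from a square confusion matrix."""
    total = sum(sum(row) for row in matrix)
    size = len(matrix)
    observed = ratio(sum(matrix[i][i] for i in range(size)), total)
    # Agreement expected by chance, from the row and column totals.
    chance = sum(sum(matrix[i]) * sum(row[i] for row in matrix)
                 for i in range(size))
    expected = ratio(chance, total * total)
    kappa = ratio(observed - expected, 1 - expected)
    return {"accuracy": observed, "cohens_kappa": kappa}

# Illustrative 3-class matrix (rows: actual, columns: predicted).
labels = ["bird", "cat", "dog"]
matrix = [[1, 0, 0],
          [0, 1, 1],
          [0, 1, 2]]

per_class = class_statistics(labels, matrix)
overall = overall_statistics(matrix)
print(per_class["cat"])
print(overall)
```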


Confusion Matrix
Displays the confusion matrix in a table view.



