Conformal Scorer

Compares predictions made by Conformal Classifier with actual values. There are two types of scoring:

  • Efficiency – the fraction of predictions that resulted in a single label (whether right or wrong). Calculated as Single class predictions / Total.
    See the Additional prediction information option section for a description of the parameters.
  • Validity – the fraction of correct predictions. A prediction is considered correct if its class set (including a mixed class) contains the true value. Calculated as Total match / Total.
    See the Additional prediction information option section for a description of the parameters.
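The two scores can be sketched as follows. This is a minimal illustration, not the node's implementation; it assumes each prediction arrives as a set of candidate labels, where a set with more than one label is a "mixed class" and an empty set is a null prediction.

```python
def score(predictions, targets):
    """Compute efficiency and validity for set-valued predictions.

    predictions: list of label sets (one set per record)
    targets:     list of true labels
    """
    total = len(targets)
    # Efficiency: fraction of single-label predictions, right or wrong
    single = sum(1 for p in predictions if len(p) == 1)
    # Validity: fraction of prediction sets containing the true label
    total_match = sum(1 for p, t in zip(predictions, targets) if t in p)
    return single / total, total_match / total

preds = [{"a"}, {"a", "b"}, {"b"}, set()]
targets = ["a", "a", "a", "b"]
efficiency, validity = score(preds, targets)  # 0.5, 0.5
```

Here two of four predictions are single-label (efficiency 0.5), and two of four sets contain the true label (validity 0.5).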

Options

Target column
A column that contains the real classes of the data.
Classes
A column that contains predictions produced by Conformal Classifier. Can be of either a collection or a string column type.
Additional prediction information
Includes additional columns with some prediction metrics.
  • Exact match – number of correct predictions that belong to exactly one class and do not belong to any mixed class.
  • Soft match - number of correct predictions that belong to one of the mixed classes.
  • Total match – Exact_match + Soft_match.
  • Error – number of predictions that do not include the target class.
  • Total – total number of records that belong to the current target class.
  • Single class predictions – number of predictions that resulted in a single class.
  • Null predictions – number of empty (null) predictions.
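The per-class counters above can be sketched like this. A hypothetical helper, again assuming set-valued predictions; the function name and the dictionary keys mirror the column names listed above but are otherwise an assumption.

```python
def prediction_info(predictions, targets, cls):
    """Per-class prediction counters for set-valued predictions."""
    # Restrict to records whose true class is `cls`
    rows = [(p, t) for p, t in zip(predictions, targets) if t == cls]
    exact = sum(1 for p, t in rows if p == {t})               # single correct class
    soft = sum(1 for p, t in rows if len(p) > 1 and t in p)   # correct via mixed class
    return {
        "Exact match": exact,
        "Soft match": soft,
        "Total match": exact + soft,
        "Error": sum(1 for p, t in rows if t not in p),
        "Total": len(rows),
        "Single class predictions": sum(1 for p, _ in rows if len(p) == 1),
        "Null predictions": sum(1 for p, _ in rows if len(p) == 0),
    }

preds = [{"a"}, {"a", "b"}, {"b"}, set()]
targets = ["a", "a", "a", "b"]
info = prediction_info(preds, targets, "a")
```

For class "a" this yields Exact match 1 ({"a"}), Soft match 1 ({"a", "b"}), Error 1 ({"b"}), and Total 3.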
Additional efficiency metrics
Adds additional columns with efficiency metrics. The metrics are taken from the paper "Criteria of efficiency for set-valued classification" by Vovk et al.
  • The S (“sum”) criterion measures efficiency by the average of the sum of p-values. Smaller values are preferable.
  • The N (“number”) criterion uses the average size of the prediction sets. Smaller values are preferable.
  • The U (“unconfidence”) criterion uses the average unconfidence over the test sequence, where the unconfidence for a test object is the second largest p-value. Smaller values are preferable.
  • The F (“fuzziness”) criterion uses the average fuzziness, where the fuzziness for a test object is defined as the sum of all p-values apart from the largest one. Smaller values are preferable.
  • The M (“multiple”) criterion uses the percentage of objects in the test sequence for which the prediction set at the given significance level is multiple, i.e., contains more than one label. Smaller values are preferable.
  • The E (“excess”) criterion uses the average (over the test sequence, as usual) amount the size of the prediction set exceeds 1. Smaller values are preferable.
  • The OU (“observed unconfidence”) criterion uses the average observed unconfidence over the test sequence, where the observed unconfidence for a test example is the largest p-value for the false labels. Smaller values are preferable for this test.
  • The OF (“observed fuzziness”) criterion uses the average sum of the p-values for the false labels, smaller values are preferable.
  • The OM (“observed multiple”) criterion uses the percentage of observed multiple predictions in the test sequence, where an observed multiple prediction is defined to be a prediction set including a false label. Smaller values are preferable.
  • The OE (“observed excess”) criterion uses the average number of false labels included in the prediction sets at the given significance level; smaller values are preferable.
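A few of these criteria can be sketched directly from a per-object list of p-values. A minimal illustration under stated assumptions: each test object has one p-value per label, the index of the true label is known, and prediction sets at a significance level contain every label whose p-value exceeds it. The function name and signature are hypothetical.

```python
def efficiency_criteria(p_values, true_idx, significance=0.05):
    """Compute the S, N, U, F, and OF criteria from p-values.

    p_values: list of per-label p-value lists, one list per test object
    true_idx: list of true-label indices, one per test object
    """
    n = len(p_values)
    # S: average sum of p-values
    s = sum(sum(ps) for ps in p_values) / n
    # N: average prediction-set size at the given significance level
    sizes = [sum(1 for p in ps if p > significance) for ps in p_values]
    n_crit = sum(sizes) / n
    # U: average second-largest p-value (unconfidence)
    u = sum(sorted(ps)[-2] for ps in p_values) / n
    # F: average sum of p-values apart from the largest one (fuzziness)
    f = sum(sum(ps) - max(ps) for ps in p_values) / n
    # OF: average sum of p-values for the false labels (observed fuzziness)
    of = sum(sum(p for j, p in enumerate(ps) if j != k)
             for ps, k in zip(p_values, true_idx)) / n
    return {"S": s, "N": n_crit, "U": u, "F": f, "OF": of}

crit = efficiency_criteria([[0.9, 0.1], [0.2, 0.8]], true_idx=[0, 1])
```

Note that S, U, F, and OF depend only on the p-values, while N (like M, E, OM, and OE) also depends on the chosen significance level.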

Input Ports

Table with ranked predictions and classes.

Output Ports

Table with accuracy statistics.

Views

This node has no views
