Rule Engine (Dictionary)

Applies rules from a rules table to a data table. The rules use the same syntax as the Rule Engine node, although stricter restrictions apply to PMML RuleSets (no column references in the outcome, no regular expressions, and three-valued logic). If no rule matches, the default value specified in the PMML tab is used, or a missing value if no default value was specified.
It takes the user-defined rules from the selected column(s) of the second input table and tries to match them against each row of the first input table. If a rule matches, its outcome value is written to a new column. The first matching rule, in order of definition, determines the outcome.
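The first-match behaviour can be sketched in Python. This is a minimal model, assuming each rule is a (predicate, outcome) pair and that `None` stands in for a missing value; `apply_rules` is a hypothetical helper, not part of KNIME.

```python
from typing import Any, Callable, Optional, Sequence, Tuple

# Illustrative rule model: a predicate over a row dict plus an outcome value.
Rule = Tuple[Callable[[dict], bool], Any]


def apply_rules(row: dict, rules: Sequence[Rule], default: Optional[Any] = None) -> Optional[Any]:
    """Return the outcome of the first rule whose condition holds for `row`.

    Rules are tried in definition order and evaluation stops at the first match.
    If nothing matches, `default` is returned (None models a missing value).
    """
    for condition, outcome in rules:
        if condition(row):
            return outcome
    return default


rules = [
    (lambda r: r["Col0"] > 0, "Positive"),
    (lambda r: r["Col0"] > 10, "Large"),  # never wins: any Col0 > 10 already matched the rule above
]

print(apply_rules({"Col0": 42}, rules))  # Positive
print(apply_rules({"Col0": -1}, rules))  # None (missing)
```

Rule order matters: swapping the two rules above would make values greater than 10 produce "Large".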

Each rule is represented by a row; new line characters are replaced by spaces, even in string constants. To add a comment, start a line in a (condition) cell with //; comments cannot be placed on the same line as a rule. Anything after // is not interpreted as a rule. Rules consist of a condition part (antecedent), which must evaluate to true or false, and an outcome (consequent, after the => symbol), which is written to the new column if the rule matches.
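How a single rules-table cell might be split into its parts is sketched below. This is an illustrative parser under stated assumptions, not KNIME's implementation: newlines become spaces, a cell starting with // is a comment, and the first => outside a string constant (delimited by " or /) separates condition from outcome. `parse_rule_row` is a hypothetical name.

```python
from typing import Optional, Tuple


def parse_rule_row(cell: str) -> Optional[Tuple[str, str]]:
    """Split one rule cell into (condition, outcome); return None for comments."""
    # New line characters are replaced by spaces, even inside string constants.
    text = cell.replace("\r\n", " ").replace("\n", " ").replace("\r", " ")
    if text.lstrip().startswith("//"):
        return None  # comment line: not interpreted as a rule
    quote = None  # the currently open string delimiter, if any
    for i, ch in enumerate(text):
        if quote:
            if ch == quote:
                quote = None  # string constant closed
        elif ch in ('"', "/"):
            quote = ch  # string constant opened
        elif text.startswith("=>", i):
            return text[:i].strip(), text[i + 2:].strip()
    raise ValueError(f"no => found outside string constants: {cell!r}")


print(parse_rule_row('$Col0$ > 0 => "Positive"'))  # ('$Col0$ > 0', '"Positive"')
print(parse_rule_row("// This is a comment"))      # None
```

Tracking the open delimiter is what keeps an arrow inside a string constant, such as "a=>b", from being mistaken for the separator.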

The outcome of a rule may be any of the following: a string (between quotes " or /), a number, a boolean constant, a reference to another column, or the value of a flow variable. The type of the outcome column is the common super type of all possible outcomes (including those of rules that can never match). If no rule matches, the outcome is a missing value unless a default value is specified.
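The "common super type" idea can be illustrated with a deliberately simplified type model. The type names and the widening rule below are assumptions for illustration only (integer widens to double, identical types stay, any other mix falls back to a hypothetical "generic" type); KNIME's actual type hierarchy is richer.

```python
from typing import Iterable, Optional

# Hypothetical numeric widening order used only in this sketch.
NUMERIC_RANK = {"integer": 0, "double": 1}


def common_super_type(outcome_types: Iterable[str]) -> Optional[str]:
    """Fold the outcome types of all rules, matchable or not, into one column type."""
    result: Optional[str] = None
    for t in outcome_types:
        if result is None or result == t:
            result = t
        elif result in NUMERIC_RANK and t in NUMERIC_RANK:
            result = max(result, t, key=NUMERIC_RANK.__getitem__)  # widen integer to double
        else:
            result = "generic"  # assumed fallback for unrelated types
    return result


print(common_super_type(["integer", "double"]))  # double
print(common_super_type(["integer", "string"]))  # generic
```

Because every rule's outcome is folded in, a single rule that can never match but has a double outcome still turns an otherwise integer column into a double column.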

Columns are referenced by their name surrounded by $, and numbers are written in the usual decimal notation. Note that strings must not contain (double) quotes. Flow variables are referenced as $${TypeCharacterAndFlowVarName}$$, where the TypeCharacter is 'D' for double (real) values, 'I' for integer values and 'S' for strings. (Column references are not supported for PMML outputs.)
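The flow variable reference format can be recognised with a regular expression. This sketch assumes flow variable names contain no closing brace and maps the type characters onto Python types for illustration; `flow_var_refs` and `resolve` are hypothetical helpers.

```python
import re
from typing import Dict, List, Tuple

# $${<TypeCharacter><FlowVarName>}$$ with TypeCharacter D, I or S.
FLOW_VAR_REF = re.compile(r"\$\$\{([DIS])([^}]+)\}\$\$")

# Illustrative mapping of type characters to Python types.
TYPE_CHARS = {"D": float, "I": int, "S": str}


def flow_var_refs(rule: str) -> List[Tuple[str, type]]:
    """List (name, expected type) for every flow variable referenced in `rule`."""
    return [(name, TYPE_CHARS[t]) for t, name in FLOW_VAR_REF.findall(rule)]


def resolve(reference: str, flow_vars: Dict[str, object]) -> object:
    """Look up one $${...}$$ reference and check its value against the declared type."""
    m = FLOW_VAR_REF.fullmatch(reference)
    if m is None:
        raise ValueError(f"not a flow variable reference: {reference!r}")
    type_char, name = m.groups()
    value = flow_vars[name]
    expected = TYPE_CHARS[type_char]
    if not isinstance(value, expected):
        raise TypeError(f"{name} is not of type {expected.__name__}")
    return value


print(flow_var_refs("$Col0$ MATCHES $${SFlowVar0}$$ OR $$ROWINDEX$$ < $${IFlowVar1}$$ => $Col0$"))
# [('FlowVar0', <class 'str'>), ('FlowVar1', <class 'int'>)]
```

Note that $$ROWINDEX$$ is not picked up: it has no braces or type character, which is how table properties differ from flow variables.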

Logical expressions can be grouped with parentheses. Precedence is as follows: NOT binds most tightly, followed by AND, then XOR, and finally OR, which binds least tightly. Comparison operators always take precedence over logical connectives. All operators (and their names) are case-sensitive.
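The precedence order can be made concrete with a small recursive-descent evaluator. This is a sketch, not the node's parser: it assumes comparisons have already been evaluated to TRUE/FALSE atoms (they bind tighter anyway) and that the input is pre-tokenized; `evaluate` is a hypothetical name.

```python
from typing import List, Optional


def evaluate(tokens: List[str]) -> bool:
    """Evaluate TRUE/FALSE atoms joined by NOT, AND, XOR, OR and parentheses.

    Precedence, tightest first: NOT, AND, XOR, OR. Tokens are case-sensitive.
    """
    pos = 0

    def peek() -> Optional[str]:
        return tokens[pos] if pos < len(tokens) else None

    def take(expected: Optional[str] = None) -> str:
        nonlocal pos
        tok = peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        pos += 1
        return tok

    def parse_or() -> bool:
        value = parse_xor()
        while peek() == "OR":
            take()
            rhs = parse_xor()  # parse first so the token stream is always consumed
            value = value or rhs
        return value

    def parse_xor() -> bool:
        value = parse_and()
        while peek() == "XOR":
            take()
            value = value != parse_and()
        return value

    def parse_and() -> bool:
        value = parse_not()
        while peek() == "AND":
            take()
            rhs = parse_not()
            value = value and rhs
        return value

    def parse_not() -> bool:
        if peek() == "NOT":
            take()
            return not parse_not()
        return parse_atom()

    def parse_atom() -> bool:
        tok = take()
        if tok == "(":
            value = parse_or()
            take(")")
            return value
        if tok == "TRUE":
            return True
        if tok == "FALSE":
            return False
        raise SyntaxError(f"unexpected token {tok!r}")

    result = parse_or()
    if pos != len(tokens):
        raise SyntaxError(f"trailing tokens: {tokens[pos:]}")
    return result


print(evaluate("TRUE XOR TRUE AND FALSE".split()))  # True: AND is applied before XOR
```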

$$ROWID$$ represents the row key string, $$ROWINDEX$$ is the index of the row (the first row has index 0), and $$ROWCOUNT$$ stands for the number of rows in the table. (These are not available for PMML.)
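For illustration, the three table properties can be derived per row as follows, assuming the table is represented only by its ordered list of row keys; `row_properties` is a hypothetical helper.

```python
from typing import Dict, Iterator, List


def row_properties(row_keys: List[str]) -> Iterator[Dict[str, object]]:
    """Yield the values of ROWID, ROWINDEX and ROWCOUNT for each row."""
    count = len(row_keys)  # the same for every row
    for index, key in enumerate(row_keys):  # indices start at 0
        yield {"ROWID": key, "ROWINDEX": index, "ROWCOUNT": count}


for props in row_properties(["Row0", "Row1"]):
    print(props)
# {'ROWID': 'Row0', 'ROWINDEX': 0, 'ROWCOUNT': 2}
# {'ROWID': 'Row1', 'ROWINDEX': 1, 'ROWCOUNT': 2}
```

With these values, a condition such as $$ROWINDEX$$ < 10 matches only the first ten rows.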

Some example rules (each should be in one row):

// This is a comment
$Col0$ > 0 => "Positive"
When the value in Col0 is greater than 0, Positive is assigned to the result column (unless a previous rule already matched).
$Col0$ = "Active" AND $Col1$ <= 5 => "Outlier"
You can combine conditions.
$Col0$ LIKE "Market Street*" AND 
    ($Col1$ IN ("married", "divorced") 
        OR $Col2$ > 40) => "Strange"
$Col0$ MATCHES $${SFlowVar0}$$ OR $$ROWINDEX$$ < $${IFlowVar1}$$ =>
    $Col0$
Parentheses let you group conditions. In the second of these examples, the outcome is taken from a column ($Col0$).
$Col0$ > 5 => $${SCol1}$$
The result can also come from a flow variable.
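The "Strange" example rule can be rendered in Python to show what its operators do. The semantics here are assumptions for this sketch: LIKE treats * as any sequence of characters and ? as a single character, and MATCHES requires the whole value to match the regular expression (as Java's String.matches does). `like`, `matches` and `strange_rule` are hypothetical helpers.

```python
import re


def like(value: str, pattern: str) -> bool:
    # Assumed wildcards: * = any sequence, ? = one character; everything else is literal.
    regex = "".join(".*" if c == "*" else "." if c == "?" else re.escape(c) for c in pattern)
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


def matches(value: str, regex: str) -> bool:
    # Assumed whole-value match rather than a substring search.
    return re.fullmatch(regex, value) is not None


def strange_rule(row: dict):
    """$Col0$ LIKE "Market Street*" AND ($Col1$ IN ("married", "divorced") OR $Col2$ > 40)."""
    if like(row["Col0"], "Market Street*") and (
        row["Col1"] in ("married", "divorced") or row["Col2"] > 40
    ):
        return "Strange"
    return None  # no match: the next rule (or the default) decides


print(strange_rule({"Col0": "Market Street 5", "Col1": "single", "Col2": 50}))  # Strange
```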

The following comparisons evaluate to true, where ? denotes a missing value (any other value is neither less than, greater than, nor equal to a missing value or NaN):

  • ? =, <=, >= ?
  • NaN =, <=, >= NaN
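These comparison rules can be expressed as a small Python function, assuming `None` models a missing value (?) and `float("nan")` models NaN; `compare` is a hypothetical helper.

```python
import math
import operator
from typing import Optional

OPS = {"=": operator.eq, "<": operator.lt, "<=": operator.le, ">": operator.gt, ">=": operator.ge}


def _kind(x: Optional[float]) -> str:
    if x is None:
        return "missing"
    if isinstance(x, float) and math.isnan(x):
        return "nan"
    return "value"


def compare(op: str, a: Optional[float], b: Optional[float]) -> bool:
    ka, kb = _kind(a), _kind(b)
    if ka == kb and ka != "value":
        # ? vs ? or NaN vs NaN: only =, <= and >= are true.
        return op in ("=", "<=", ">=")
    if ka != "value" or kb != "value":
        # Anything else involving ? or NaN is neither less, greater nor equal.
        return False
    return OPS[op](a, b)


print(compare("=", None, None), compare("<", 1.0, None))  # True False
```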

Options

Settings

Rules column
Name of the column in the second input table containing the rules, or only the conditions if an outcome column is also selected. If the rules are available during configuration, the type of the outcome column is shown at the right end of this configuration row.
=>
If your rules are split across two columns (condition and outcome), select the column containing the outcome values here. This column is ignored for comment conditions (those starting with //). If it contains a missing value for a non-comment condition, the output type will be String. For String columns, the outcome should not be quoted (with " or /).
Treat values starting with $ as references
When checked, values in the String outcome column that start with $ are not treated as string constants; instead, they are parsed as references to flow variables, columns or table properties.
Append column
Name of the newly appended column, which contains the outcome of the rules.
Replace column
The column to replace
Errors
Errors found while parsing the rules column of the rules table
Warnings
Warnings found while parsing the rules column of the rules table
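The two-column setup (condition column plus String outcome column) can be modelled by turning each pair back into single-cell rule text. This is a sketch under assumptions: unquoted String outcomes become quoted string constants, comment conditions are skipped, and with "Treat values starting with $ as references" enabled an outcome starting with $ is kept as a reference. `combine` and `outcome_expression` are hypothetical, and missing outcomes are not modelled.

```python
from typing import List, Optional, Tuple


def outcome_expression(outcome: str, treat_dollar_as_reference: bool) -> str:
    """Turn an unquoted String-column outcome into rule-syntax outcome text."""
    if treat_dollar_as_reference and outcome.startswith("$"):
        return outcome  # kept as a column, flow variable or table property reference
    if '"' in outcome:
        raise ValueError("string constants must not contain double quotes")
    return f'"{outcome}"'  # plain value becomes a string constant


def combine(rows: List[Tuple[str, Optional[str]]], treat_dollar_as_reference: bool) -> List[str]:
    rules = []
    for condition, outcome in rows:
        if condition.lstrip().startswith("//"):
            continue  # comment condition: the outcome column is ignored
        if outcome is None:
            raise ValueError("missing outcomes are not modelled in this sketch")
        rules.append(f"{condition} => {outcome_expression(outcome, treat_dollar_as_reference)}")
    return rules


print(combine([("// note", None), ("$Col0$ > 0", "Positive"), ("$Col0$ < 0", "$Col1$")], True))
# ['$Col0$ > 0 => "Positive"', '$Col0$ < 0 => $Col1$']
```

With the option unchecked, the same "$Col1$" outcome would instead be the literal string "$Col1$".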

PMML

Enable PMML RuleSet generation
When checked, PMML mode evaluation is used, and the node fails if the input cannot be translated to PMML.
Hit selection
Possible values:
  • First hit - the outcome of the first matching rule will be used
  • Maximal matching weighted sum - select all matching rules, sum their weights per outcome, and use the outcome with the highest sum
  • Highest matching weight - among all matching rules, select the one with the highest weight (regardless of order) and use its outcome
Default value
The default value (default score) to be applied when no rule matches.
Default confidence value
The default confidence of a rule when it is not specified by the confidence column of the rules table.
Rule confidence column
Specifies confidence of the rules based on the values in the selected column.
Default weight value
The default rule weight to be used when it is not specified by the weight column.
Rule weight column
Specifies weight values for the rules based on the values in the selected column.
Confidence column name
Name of the column in the output table that holds the computed confidence values.
Provide statistics
When checked, recordCount (and, if a Validation column is selected, also nbCorrect) is computed for the PMML output.
Validation column
The column containing the correct prediction for the input (test/validation) table. (When <none> is selected, no nbCorrect value is computed.)
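The three hit selection strategies, together with the default weight, can be sketched as follows. Assumptions: each matching rule is given as (outcome, weight) with `None` meaning "use the default weight", ties are broken by rule order, and the method keys are hypothetical labels for the three options.

```python
from collections import defaultdict
from typing import Any, List, Optional, Tuple

# (outcome, weight) for each rule that matched, in definition order.
Match = Tuple[Any, Optional[float]]


def select(matches: List[Match], method: str,
           default_weight: float = 1.0, default_value: Any = None) -> Any:
    if not matches:
        return default_value  # no rule matched: default score
    weighted = [(outcome, default_weight if w is None else w) for outcome, w in matches]
    if method == "first_hit":
        return weighted[0][0]
    if method == "weighted_sum":
        totals = defaultdict(float)
        for outcome, w in weighted:
            totals[outcome] += w  # sum the weights per outcome
        return max(totals, key=totals.__getitem__)  # ties: first outcome seen
    if method == "highest_weight":
        return max(weighted, key=lambda m: m[1])[0]  # ties: earliest rule
    raise ValueError(f"unknown hit selection method: {method}")


m = [("A", 0.2), ("B", 0.5), ("A", 0.4)]
print(select(m, "first_hit"), select(m, "weighted_sum"), select(m, "highest_weight"))  # A A B
```

The same three matches give different answers: "A" wins by order and by total weight (0.6 against 0.5), while "B" has the single highest weight.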

Input Ports

Input data
Rules to apply

Output Ports

Table containing the computed column
Possibly missing PMML port containing the rules in PMML RuleSet format
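To give an idea of what the PMML output contains, the sketch below builds a RuleSet fragment with Python's standard `xml.etree.ElementTree`. The element and attribute names follow the PMML RuleSet model, but the mapping from this node's hit selection options to PMML criteria is an assumption, the rule tuple shape is hypothetical, and a complete document would also need the enclosing RuleSetModel and MiningSchema, which are omitted here.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Assumed mapping from the "Hit selection" options to PMML RuleSelectionMethod criteria.
HIT_SELECTION_CRITERION = {
    "First hit": "firstHit",
    "Maximal matching weighted sum": "weightedSum",
    "Highest matching weight": "weightedMax",
}

# (field, PMML operator, value, score, weight, confidence): illustrative rule shape.
SimpleRuleSpec = Tuple[str, str, str, str, float, float]


def rule_set_element(rules: List[SimpleRuleSpec], hit_selection: str,
                     default_score: str, default_confidence: float) -> ET.Element:
    rule_set = ET.Element("RuleSet", defaultScore=default_score,
                          defaultConfidence=str(default_confidence))
    ET.SubElement(rule_set, "RuleSelectionMethod",
                  criterion=HIT_SELECTION_CRITERION[hit_selection])
    for i, (field, op, value, score, weight, confidence) in enumerate(rules):
        rule = ET.SubElement(rule_set, "SimpleRule", id=str(i), score=score,
                             weight=str(weight), confidence=str(confidence))
        ET.SubElement(rule, "SimplePredicate", field=field, operator=op, value=value)
    return rule_set


# Roughly corresponds to the rule: $Col0$ > 0 => "Positive"
element = rule_set_element([("Col0", "greaterThan", "0", "Positive", 1.0, 0.9)],
                           "First hit", "Negative", 0.5)
print(ET.tostring(element, encoding="unicode"))
```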

Views

This node has no views
