Create Hierarchy

Node for creating generalization hierarchies. There are four different types of hierarchies:

  • Date-based;
  • Interval-based;
  • Order-based;
  • Masking-based.
The hierarchy type to use depends on the data type of the attribute.

Options

Masking-based hierarchies

Masking is a flexible mechanism that can be applied to many types of attributes and is especially suitable for alphanumeric codes, such as ZIP codes.

Alignment
Alignment direction: values are aligned to the left or right side of the string.
Masking
Masking direction: masking starts from the left or right side of the string.
Padding character
All values are adjusted to the same length by adding padding characters.
Masking character
The character that replaces masked characters.
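
The effect of these options can be sketched in a few lines of Python. This is only an illustration, not the node's implementation: the function `masking_levels` and its parameter names are hypothetical, and it assumes that left alignment pads on the right and right alignment pads on the left.

```python
def masking_levels(value, length, align="left", mask_from="right",
                   pad_char=" ", mask_char="*"):
    """Return the padded value followed by progressively more masked versions.

    Hypothetical helper; the alignment and padding semantics are assumptions.
    """
    # Adjust the value to the common length with the padding character.
    if align == "left":
        padded = value.ljust(length, pad_char)
    else:
        padded = value.rjust(length, pad_char)

    levels = [padded]
    for n in range(1, length + 1):
        if mask_from == "right":
            # Mask the last n characters.
            levels.append(padded[:length - n] + mask_char * n)
        else:
            # Mask the first n characters.
            levels.append(mask_char * n + padded[n:])
    return levels


print(masking_levels("81667", 5))
# ['81667', '8166*', '816**', '81***', '8****', '*****']
```

Each list entry corresponds to one generalization level, from the original value to the fully masked value.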

Interval-based hierarchies

Intervals are a natural method of generalization for values with a ratio scale, such as integers or decimals. First, a sequence of intervals is defined on the left side of the view. In the next step, subsequent levels are specified, each consisting of groups of intervals from the previous level. Each group combines a given number of elements from the previous level. Any sequence of intervals or groups is automatically repeated to cover the complete range of the attribute.

Aggregate Function
In order to create labels for intervals, each element must be associated with an aggregation function. The following aggregation functions are supported:
  • Set: a set representation of input values is returned.
  • Prefix: a set of prefixes of the input values is returned. There is a parameter that allows defining the length of these prefixes.
  • Common-prefix: returns the longest common prefix.
  • Bounds: returns the first and the last elements of the set.
  • Interval: an interval between the minimum and maximum values is returned.
  • Constant: returns a pre-defined constant value.
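
To make the differences between these functions concrete, here is a minimal Python sketch. The function names, the default parameters and the exact label formatting (braces and brackets) are assumptions chosen for illustration and may differ from the labels the node produces.

```python
import os


def agg_set(values):
    # Set: a set representation of the input values.
    return "{" + ", ".join(values) + "}"


def agg_prefix(values, length=1):
    # Prefix: the distinct prefixes of the given length, in order of appearance.
    prefixes = list(dict.fromkeys(v[:length] for v in values))
    return "{" + ", ".join(prefixes) + "}"


def agg_common_prefix(values):
    # Common-prefix: the longest prefix shared by all values.
    return os.path.commonprefix(list(values))


def agg_bounds(values):
    # Bounds: the first and the last element of the (ordered) input.
    return f"[{values[0]}, {values[-1]}]"


def agg_interval(values, key=float):
    # Interval: from the minimum to the maximum value, compared numerically.
    return f"[{min(values, key=key)}, {max(values, key=key)}]"


def agg_constant(values, constant="*"):
    # Constant: a pre-defined value, independent of the input.
    return constant


codes = ["81667", "81675", "81925"]
print(agg_set(codes))            # {81667, 81675, 81925}
print(agg_prefix(codes, 3))      # {816, 819}
print(agg_common_prefix(codes))  # 81
```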
Range
Interval-based hierarchies can define a range for the bins. Any value in the input table that lies outside the range defined by the "minimum value" and the "maximum value" produces an error message. This can be used to implement sanity checks. Any value between the minimum value and the "bottom coding" limit is bottom-coded, and any value between the "top coding" limit and the maximum value is top-coded. If a value falls into an interval stretching from the bottom coding or top coding limit to the "snap" limit, that interval is extended to the bottom or top coding limit. Within the remaining range, intervals are repeated.
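
The sanity check and the bottom and top coding can be sketched as follows. This is a simplified illustration under stated assumptions: the function `code_value` is hypothetical, the labels `<...` and `>=...` as well as the treatment of values exactly on a limit are guesses, and the "snap" limits are not modelled at all.

```python
def code_value(value, minimum, bottom, top, maximum):
    """Classify a value against the range limits (hypothetical helper).

    Assumes minimum <= bottom <= top <= maximum. Snap limits are ignored.
    """
    if value < minimum or value > maximum:
        # Sanity check: values outside the overall range are rejected.
        raise ValueError(f"{value} is outside [{minimum}, {maximum}]")
    if value < bottom:
        return f"<{bottom}"   # bottom-coded
    if value >= top:
        return f">={top}"     # top-coded
    return None               # handled by the regular, repeated intervals


print(code_value(3, 0, 18, 90, 120))   # <18
print(code_value(95, 0, 18, 90, 120))  # >=90
print(code_value(40, 0, 18, 90, 120))  # None
```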
Interval
Intervals are defined by a minimum (inclusive) and maximum (exclusive) bound.
Group
Groups are defined by their size. Bins at the previous levels are created automatically to fit the number of groups at the next levels.
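
The interplay of repeated intervals and groups can be illustrated with a short Python sketch. It assumes the simplest case, a single interval of fixed width that is repeated and one group size per level; `interval_label` and `hierarchy_row` are hypothetical names, not part of the node.

```python
def interval_label(value, lower, width):
    """Label of the repeated interval [lo, hi) that contains the value."""
    index = (value - lower) // width
    lo = lower + index * width
    # Minimum bound is inclusive, maximum bound is exclusive.
    return f"[{lo}, {lo + width})"


def hierarchy_row(value, lower, width, group_sizes):
    """Generalizations of a value: base interval, then one level per group size."""
    row = [interval_label(value, lower, width)]
    for size in group_sizes:
        # Each group merges `size` elements of the previous level.
        width *= size
        row.append(interval_label(value, lower, width))
    return row


print(hierarchy_row(37, 0, 10, [2, 2]))
# ['[30, 40)', '[20, 40)', '[0, 40)']
```

With a base width of 10 and two levels of groups of size 2, the value 37 is generalized to intervals of width 10, 20 and 40.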

Order-based hierarchies

Order-based hierarchies follow a similar idea to interval-based hierarchies, but they can be applied to attributes with an ordinal scale. In addition to the types of attributes covered by interval-based hierarchies, this includes strings (using their lexicographical order) and ordinals.

Order
First, the values within the attribute's domain are ordered as defined by the user or the data type. The ordered values can then be grouped using a mechanism similar to the one used for interval-based hierarchies. This mechanism can be used to create semantic hierarchies from a pre-defined, meaningful ordering of the domain of a discrete variable.
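
A minimal Python sketch of this grouping, assuming a user-defined order and set-style labels; the function `order_based_levels` is a hypothetical helper for illustration only.

```python
def order_based_levels(ordered_values, group_sizes):
    """Map each value to one label per level by grouping neighbours in order."""
    # Start with every value in its own group.
    groups = [[v] for v in ordered_values]
    rows = {v: [] for v in ordered_values}
    for size in group_sizes:
        # Merge `size` consecutive groups of the previous level.
        groups = [sum(groups[i:i + size], [])
                  for i in range(0, len(groups), size)]
        for group in groups:
            label = "{" + ", ".join(group) + "}"
            for v in group:
                rows[v].append(label)
    return rows


order = ["low", "medium", "high", "very high"]
rows = order_based_levels(order, [2, 2])
print(rows["medium"])
# ['{low, medium}', '{low, medium, high, very high}']
```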

Date-based hierarchies

Date-based hierarchies are defined by specifying the granularity of the output data at increasing generalization levels.

Bottom coding from
Any value below this date will be bottom-coded.
Top coding from
Any value above this date will be top-coded.
Granularity
Granularity of the output data. Note that it is important to specify granularity levels that form a hierarchy (e.g. day-of-week typically cannot be followed by week-of-year, because the same day-of-week can be generalized to different weeks of a year). When this constraint is violated, an error is raised during the anonymization process in the Hierarchical Anonymization node.
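
The hierarchy constraint can be checked on sample data with a small Python sketch. This is not the check performed by the node: the granularity names in `GRANULARITIES` and the function `forms_hierarchy` are assumptions made for illustration. The idea is that two dates with the same label at one level must also share a label at the next level.

```python
from datetime import date

# Hypothetical granularity functions mapping a date to a label.
GRANULARITIES = {
    "day-of-week": lambda d: d.strftime("%A"),
    "week-of-year": lambda d: f"week {d.isocalendar()[1]}",
    "month/year": lambda d: d.strftime("%m/%Y"),
    "year": lambda d: str(d.year),
}


def forms_hierarchy(dates, levels):
    """Check that equal labels at one level never split at the next level."""
    for lower, upper in zip(levels, levels[1:]):
        seen = {}
        for d in dates:
            low = GRANULARITIES[lower](d)
            up = GRANULARITIES[upper](d)
            # Remember the first upper label seen for each lower label.
            if seen.setdefault(low, up) != up:
                return False
    return True


dates = [date(2024, 1, 1), date(2024, 1, 8), date(2024, 2, 5)]
print(forms_hierarchy(dates, ["month/year", "year"]))          # True
print(forms_hierarchy(dates, ["day-of-week", "week-of-year"]))  # False
```

The second call fails because the first two dates fall on the same day of the week but in different weeks of the year.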
Format
Format pattern. Refer to the Java SimpleDateFormat documentation for the available options.

Input Ports

Input table
Hierarchy Configuration

Output Ports

Input data table unchanged
Hierarchy preview
Hierarchy Configuration

Views

This node has no views.
