Hierarchical Anonymization

Node for anonymizing sensitive personal data. The node is based on the ARX Data Anonymization Tool.

Options

Columns

Type

Attribute type. Possible options:

0 | Identifying
1 | Quasi-identifying
2 | Sensitive
3 | Insensitive

(Either the index or the name can be used in flow variables.)
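As an illustration of how an option list that accepts either an index or a name could be resolved, here is a minimal Python sketch. The helper `resolve_option` is hypothetical and not part of the node; exact, case-sensitive name matching is an assumption.

```python
# Option names as listed above; the list position is the option index.
ATTRIBUTE_TYPES = ["Identifying", "Quasi-identifying", "Sensitive", "Insensitive"]


def resolve_option(value, options):
    """Return the option name for an index (int or digit string) or a name.

    Hypothetical helper: the node's own parsing rules are not documented here.
    """
    text = str(value).strip()
    if text.isdigit():
        index = int(text)
        if index >= len(options):
            raise ValueError(f"index out of range: {index}")
        return options[index]
    if text in options:
        return text
    raise ValueError(f"unknown option: {value!r}")
```

For example, `resolve_option(1, ATTRIBUTE_TYPES)` and `resolve_option("Quasi-identifying", ATTRIBUTE_TYPES)` both select the same attribute type.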

File

Hierarchy file (*.ahs).

Mode

Transformation mode options:

0 | Generalization
1 | Microaggregation
2 | Clustering and microaggregation

(Either the index or the name can be used in flow variables.)

Weight

Attribute weight. Value in the range [0.0, 1.0]. Default is 0.5. Attributes with lower weights are anonymized more strongly, and attributes with higher weights are anonymized less.

Minimum

Minimum fixed generalization level.

Maximum

Maximum fixed generalization level.
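To make the notion of generalization levels concrete, the following Python sketch uses a hypothetical masking hierarchy for postal codes, where level 0 is the original value and each further level masks one more trailing character. The function names and the masking rule are assumptions for illustration; real hierarchies come from the hierarchy file.

```python
def generalize_code(value, level):
    """Mask the last `level` characters of a code with '*'."""
    level = min(level, len(value))
    return value[: len(value) - level] + "*" * level


def allowed_levels(minimum, maximum, height):
    """Levels a search may use, restricted to [minimum, maximum].

    `height` is the top level of the hierarchy (5 for a 5-character code here).
    """
    if minimum > maximum:
        raise ValueError("minimum level must not exceed maximum level")
    return list(range(max(0, minimum), min(maximum, height) + 1))
```

With a minimum of 1 and a maximum of 3, a value such as `12345` could only appear as `1234*`, `123**`, or `12***` in this sketch.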

Function

Attribute processing function. Possible options:

0 | Arithmetic mean
1 | Geometric mean
2 | Median
3 | Interval
4 | Mode

(Either the index or the name can be used in flow variables.)

Ignore Missing Data

Defines if the generalization function ignores missing data or not.
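The processing functions listed above can be sketched in Python with the standard `statistics` module. This is only an illustration of what each function computes for the values of one group of records. The function name `aggregate`, the use of `None` for missing cells, the string form of the interval, and the behavior when missing data is not ignored are all assumptions.

```python
import statistics


def aggregate(values, function, ignore_missing=True):
    """Aggregate the numeric values of one group with the named function."""
    if ignore_missing:
        values = [v for v in values if v is not None]
    elif any(v is None for v in values):
        return None  # assumption: a missing input yields a missing output
    if not values:
        return None
    if function == "Arithmetic mean":
        return statistics.fmean(values)
    if function == "Geometric mean":
        return statistics.geometric_mean(values)
    if function == "Median":
        return statistics.median(values)
    if function == "Interval":
        return f"[{min(values)}, {max(values)}]"
    if function == "Mode":
        return statistics.mode(values)
    raise ValueError(f"unknown function: {function}")
```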

Anonymization Config

Number of threads

Number of partitions (threads). The input data is split into this number of partitions, which are processed simultaneously in separate threads. This may reduce the anonymization time, but it may also affect the quality of the anonymization.

Partition by column

Partition the table by the specified column. When unchecked, the table is partitioned into Number of threads parts of equal size. For string columns, the table is partitioned by the distinct values of the column; an error is raised if there are more distinct values than the specified Number of threads. For decimal and Date&Time columns, the range of possible values is split into Number of threads intervals of equal length.
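The three partitioning behaviors described above can be sketched in Python as follows. This is a simplified illustration, not the node's implementation: rows are plain dictionaries, the function names are hypothetical, and details such as how a remainder is distributed across equal-size parts are assumptions.

```python
def partition_equal(rows, parts):
    """Split rows into `parts` contiguous chunks of near-equal size."""
    size, extra = divmod(len(rows), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(rows[start:end])
        start = end
    return chunks


def partition_by_string(rows, column, parts):
    """One partition per distinct value; fail if there are too many values."""
    groups = {}
    for row in rows:
        groups.setdefault(row[column], []).append(row)
    if len(groups) > parts:
        raise ValueError("more distinct values than partitions")
    return list(groups.values())


def partition_by_number(rows, column, parts):
    """Split the value range into `parts` intervals of equal length."""
    values = [row[column] for row in rows]
    low, high = min(values), max(values)
    width = (high - low) / parts or 1  # avoid zero width if all values match
    chunks = [[] for _ in range(parts)]
    for row in rows:
        # The maximum value falls on the upper edge, so clamp the index.
        index = min(int((row[column] - low) / width), parts - 1)
        chunks[index].append(row)
    return chunks
```

Note that partitioning by value range can produce partitions of very different sizes when the values are unevenly distributed.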

Suppression limit

Defines the suppression limit, which is the maximum fraction of records that can be removed from the input dataset. Value between 0.0 and 1.0.
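As a small worked example of the limit as a fraction, the check below tests whether a given number of suppressed records stays within the limit. The function name is hypothetical, and treating the boundary as inclusive is an assumption.

```python
def within_suppression_limit(suppressed, total, limit):
    """Return True if `suppressed` records out of `total` respect the limit."""
    if not 0.0 <= limit <= 1.0:
        raise ValueError("limit must be between 0.0 and 1.0")
    return suppressed <= limit * total
```

With a limit of 0.05 and 100 input records, removing 4 records is acceptable in this sketch, while removing 6 is not.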

Approximate: assume practical monotonicity

The option "Approximate" can be enabled to compute an approximate solution with potentially significantly reduced execution times. The solution is guaranteed to fulfill the given privacy settings, but it might not be optimal with regard to the specified data utility model.

Re-identification Risk Threshold

Thresholds for the highest risk of any record. Used for measuring re-identification risks for three different attacker models: (1) the prosecutor scenario, (2) the journalist scenario and (3) the marketer scenario.

Add Class column to output table

Option for including an additional column representing the equivalence class: a set of records that are indistinguishable with regard to the specified quasi-identifying variables.
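Equivalence classes and the highest per-record risk are closely related. In the prosecutor scenario, the re-identification risk of a record is commonly taken as one divided by the size of its equivalence class, so the highest risk is determined by the smallest class. The Python sketch below illustrates this under the assumption that rows are dictionaries; the function names are hypothetical.

```python
from collections import Counter


def equivalence_classes(rows, quasi_identifiers):
    """Count records per combination of quasi-identifier values."""
    return Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)


def highest_prosecutor_risk(rows, quasi_identifiers):
    """Highest record risk: 1 / size of the smallest equivalence class."""
    classes = equivalence_classes(rows, quasi_identifiers)
    return 1.0 / min(classes.values())
```

A risk threshold of 0.2 for this model would therefore require every equivalence class to contain at least five records.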

Omit rows with missing cells

Exclude rows with missing cells from the input table. When this option is disabled, an error is thrown if the table contains a missing cell.

Omit identifying columns

Exclude 'identifying' columns from the result table.

Heuristic Search Enabled

Defines whether a heuristic search strategy is used.

Limited number of steps

The heuristic search algorithm will terminate after the given number of transformations have been checked.

Limited time [ms]

The heuristic search algorithm will terminate after the given number of milliseconds.

Utility measure

The model for quantifying data quality which will be used as an optimization function during the anonymization process.

Measure

Possible options:

0 | Average equivalence class size
1 | Discernability
2 | Height
3 | Loss
4 | Non-uniform entropy
5 | Precision
6 | Ambiguity
7 | Normalized non-uniform entropy
8 | KL-Divergence
9 | Publisher payout (prosecutor)
10 | Publisher payout (journalist)
11 | Entropy-based information loss
12| Classification accuracy

(Either the index or the name can be used in flow variables.)

Generalization/Suppression Factor

Value between 0 (generalization) and 1 (suppression) specifying whether generalization or suppression should be preferred when transforming data.

Enable precomputation

Precomputation is switched on when, for each quasi-identifier, the number of distinct data values divided by the total number of records in the dataset is lower than the configured Precomputation threshold.

Precomputation threshold

Value between 0.0 and 1.0.
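The precomputation condition can be expressed as a short check: every quasi-identifier must have a ratio of distinct values to total records below the threshold. The following Python sketch assumes rows are dictionaries; the function name is hypothetical.

```python
def precomputation_enabled(rows, quasi_identifiers, threshold):
    """True if distinct/total is below the threshold for every quasi-identifier."""
    total = len(rows)
    if total == 0:
        return False  # assumption: nothing to precompute for an empty table
    return all(
        len({row[q] for row in rows}) / total < threshold
        for q in quasi_identifiers
    )
```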

Aggregate Function

The aggregation function used to compile the estimates obtained for the individual attributes of a dataset into a global value. Possible options:

0 | SUM
1 | MAX
2 | ARITHMETIC_MEAN
3 | GEOMETRIC_MEAN
4 | RANK

(Either the index or the name can be used in flow variables.)
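To show how per-attribute estimates are combined into one global value, the sketch below implements the textbook definitions of four of the listed functions in Python. The tool's exact normalization may differ, and RANK is deliberately left out because its definition is not given here. The function name `aggregate_losses` is hypothetical.

```python
import math


def aggregate_losses(losses, function):
    """Combine per-attribute loss estimates into a single global value."""
    if function == "SUM":
        return sum(losses)
    if function == "MAX":
        return max(losses)
    if function == "ARITHMETIC_MEAN":
        return sum(losses) / len(losses)
    if function == "GEOMETRIC_MEAN":
        return math.prod(losses) ** (1.0 / len(losses))
    raise ValueError(f"unsupported function in this sketch: {function}")
```

MAX reflects only the worst-affected attribute, while SUM and the means let a small loss in one attribute offset a larger loss in another.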

Population

The population model is used by the K-Map privacy model and for estimating re-identification risks. Note: Privacy models based on population uniqueness assume that the dataset is a uniform sample of the population. If this is not the case, results may be inaccurate.

Region

One of the regions with predefined population size.

Population size

The population size can be entered manually.

Privacy Models

Configure privacy models. Refer to the documentation for details.

Research sample

None

Do not specify research sample.

All

Use entire input table as sample subset.

Random selection

Selecting records by random sampling.

Probability

Random sampling probability. Value between 0.0 and 1.0.
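Random selection with a probability can be understood as an independent draw per record. The Python sketch below illustrates that idea; whether the node draws per record in exactly this way is an assumption, and the `seed` parameter is added only to make the example reproducible.

```python
import random


def random_sample(rows, probability, seed=None):
    """Keep each row independently with the given probability."""
    rng = random.Random(seed)
    return [row for row in rows if rng.random() < probability]
```

With this approach the sample size varies from run to run; only its expected value equals the probability times the number of rows.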

Query selection

Selecting records by querying the dataset.

Query

The query syntax is as follows: fields and constants must be enclosed in single quotes. The following operators are supported: >, >=, <, <=, =, or, and, ( and ). Example:

'age'<'40' and 'gender'='M'
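To clarify what the example query selects, here is an equivalent filter written in Python. It assumes that rows are dictionaries and that `age` is compared numerically; how the node itself compares quoted constants is not stated here. The function names are hypothetical.

```python
def matches_example_query(row):
    """Python equivalent of: 'age'<'40' and 'gender'='M'"""
    return row["age"] < 40 and row["gender"] == "M"


def select_sample(rows, predicate):
    """Return the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]
```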

mode

Flow variable holding sample selection mode. Possible values:

0 | NONE
1 | ALL
2 | RANDOM
3 | QUERY

Input Ports

: Input data table
: Hierarchy Configuration

Output Ports

: Result table with anonymized data
: Statistics table
: Suppressed records
: Attribute Risks
: Statistics converted to flow variables. If partitioning is enabled, only the first row of the statistics table is used.

Views

Interactive View: Transformation View (JS): Select transformation from available options.

Installation

To use this node in KNIME, install the extension Redfield Privacy Nodes from the below update site following our NodePit Product and Node Installation Guide:

v5.5

Plugin provider: Redfield AB

Plugin version: 0.3.26

On NodePit since: 2025-07-02

Last update: 2025-07-06

KNIME versions: v5.5, v5.4, v5.3, v5.2, v5.1, v4.7, v4.6, v4.5, v4.4, v4.3, v4.2, v4.1, v4.0
