Presidio Anonymizer

This node implements Presidio's Anonymizer, which anonymizes English text data. It uses pseudonymization, so the personal information can later be reinserted into the anonymized data with the Presidio Deanonymizer node.

The node anonymizes the data in a specified string column of the input table by replacing all occurrences of the selected PII entity types with abstract placeholders. For entity types that support it, the information can instead be replaced with randomly generated information of the same type. You can choose whether the anonymized data replaces the original data or is appended in a new column.
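The placeholder replacement described above can be sketched in plain Python. This is a minimal illustration, not the node's or Presidio's implementation: the function name `replace_with_placeholders`, the placeholder format `<TYPE_n>`, and the treatment of `end` as an exclusive index are all assumptions made for the example.

```python
def replace_with_placeholders(text, spans):
    """Replace each (start, end, entity_type) span with an abstract placeholder.

    Assumption: `end` is exclusive, as in Python slicing.
    Returns the anonymized text and a placeholder -> original mapping.
    """
    mapping = {}   # placeholder -> original value
    seen = {}      # (entity_type, original value) -> placeholder
    counters = {}  # entity_type -> next running number

    # Assign placeholders left to right so numbering follows reading order.
    # Repeated values of the same type reuse the same placeholder.
    for start, end, entity_type in sorted(spans):
        original = text[start:end]
        key = (entity_type, original)
        if key not in seen:
            index = counters.get(entity_type, 0)
            counters[entity_type] = index + 1
            seen[key] = f"<{entity_type}_{index}>"  # hypothetical format
            mapping[seen[key]] = original

    # Substitute right to left so earlier offsets stay valid.
    result = text
    for start, end, entity_type in sorted(spans, reverse=True):
        placeholder = seen[(entity_type, text[start:end])]
        result = result[:start] + placeholder + result[end:]
    return result, mapping


text = "Alice emailed Bob. Bob replied to Alice."
spans = [(0, 5, "PERSON"), (14, 17, "PERSON"), (19, 22, "PERSON"), (34, 39, "PERSON")]
anonymized, mapping = replace_with_placeholders(text, spans)
print(anonymized)
print(mapping)
```

Keeping the mapping alongside the anonymized text is what makes this pseudonymization rather than irreversible anonymization: the original values can be restored later by whoever holds the mapping.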

By default, this node detects the PII entities before anonymizing them. Since Presidio may mistakenly detect words as PII, you can connect a table that has the output columns of the Presidio Analyzer node to the dynamic port. The Presidio Anonymizer will then anonymize only the entities stored in that table.

Warning: Presidio can help identify sensitive/PII data in structured and unstructured text. However, because it uses automated detection mechanisms, there is no guarantee that Presidio will find all sensitive information. Therefore, always evaluate the quality of detections and take appropriate measures if necessary.

Options

Data

Input column

Select the string column that contains the data for PII anonymization.

Entity type column

Select the column that contains the types of the entities that will be anonymized.

Start column

Select the column that contains the index of the first character of each entity.

End column

Select the column that contains the index of the last character of each entity.

Score column

Select the column that contains Presidio's confidence score for each detection.

Row column

Select the column that contains the row of the original table in which the PII entity was detected.
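To make the role of these columns concrete, the following sketch applies a table of precomputed detections to a column of texts. It is an illustration under stated assumptions, not the node's code: the function `anonymize_rows`, the dictionary keys, the row identifiers `Row0` and `Row1`, and the plain `<TYPE>` placeholder are hypothetical, and `end` is treated as an exclusive index as in Python slicing. If the end column stores the inclusive index of the last character, add one before slicing.

```python
from collections import defaultdict


def anonymize_rows(texts, detections):
    """Anonymize only the entities listed in `detections`.

    texts:      dict mapping a row identifier to the text in that row
    detections: list of dicts with keys row, entity_type, start, end, score
                (assumption: `end` is exclusive)
    """
    # Group detections by the row they refer to.
    by_row = defaultdict(list)
    for detection in detections:
        by_row[detection["row"]].append(detection)

    output = {}
    for row_id, text in texts.items():
        # Work right to left so earlier offsets are not shifted by replacements.
        for d in sorted(by_row.get(row_id, []), key=lambda d: d["start"], reverse=True):
            text = text[:d["start"]] + f"<{d['entity_type']}>" + text[d["end"]:]
        output[row_id] = text
    return output


texts = {
    "Row0": "Call Jane at 555-0100.",
    "Row1": "No personal data here.",
}
detections = [
    {"row": "Row0", "entity_type": "PERSON", "start": 5, "end": 9, "score": 0.85},
    {"row": "Row0", "entity_type": "PHONE_NUMBER", "start": 13, "end": 21, "score": 0.75},
]
result = anonymize_rows(texts, detections)
print(result)
```

Rows without any matching detection pass through unchanged, which is how removing a false positive from the detection table keeps that text out of the anonymization.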

PII Anonymization

Entity types

Select the PII entity types that will be anonymized.

Available options:

  • Credit card: recognizes credit card numbers
  • Crypto wallet: recognizes crypto wallet numbers
  • Date & time: recognizes absolute or relative dates, periods, or times shorter than a day
  • Email address: recognizes email addresses
  • IBAN code: recognizes IBAN codes
  • IP address: recognizes IP addresses
  • Location: recognizes names of locations
  • Medical license: recognizes common medical license numbers
  • Nationality, Religion, Political Orientation (NRP): recognizes nationality, religion, and political group affiliation of a person
  • Person: recognizes full person names
  • Phone number: recognizes telephone numbers
  • URL: recognizes URLs

Anonymization mode

Select whether the anonymizer should use abstract placeholders or randomly generated information to replace PII entities.

Available options:

  • Abstract: replaces PII entities with an abstract identifier
  • Random: replaces PII entities with random information of the same entity type

Random seed

Provide the random seed used to generate replacement values.
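The effect of a fixed seed can be illustrated with Python's standard `random` module. This is only a sketch of the general idea of seeded generation; how the node actually produces replacement values is not specified here, and the helper names `random_phone_number` and `generate` are hypothetical.

```python
import random


def random_phone_number(rng):
    """Build a made-up phone-like string from the given random generator."""
    return "555-" + "".join(str(rng.randint(0, 9)) for _ in range(4))


def generate(seed, count):
    # A private generator keeps the result independent of global random state.
    rng = random.Random(seed)
    return [random_phone_number(rng) for _ in range(count)]


first = generate(seed=42, count=3)
second = generate(seed=42, count=3)
print(first == second)  # same seed, same sequence of replacement values
```

Reusing the same seed makes a run reproducible, which helps when anonymized outputs from repeated executions need to be compared.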

Output

Output column

Select whether the anonymized data should replace the original data or be appended to the table in a new column.

Available options:

  • Replace: replaces original data with the resulting data
  • Append: places the resulting data in a new column

Output column name

Provide the name of the new column containing the anonymized data.

Input Ports

The input table containing a string column.

If a table with the output columns of the Presidio Analyzer is provided, only the PII entities in that table will be anonymized in the input table.

Output Ports

The output table containing the anonymized data.

A mapping between original PII entities and replacements.
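As a rough illustration of why such a mapping makes the anonymization reversible, the sketch below restores original values from a replacement-to-original dictionary. It does not reflect how the Presidio Deanonymizer node is implemented; the function `restore`, the mapping direction, and the placeholder strings are assumptions for the example.

```python
import re


def restore(anonymized_text, mapping):
    """Replace every known replacement string with its original value.

    mapping: dict of replacement -> original (hypothetical layout)
    """
    if not mapping:
        return anonymized_text
    # Longest keys first, so a replacement that is a prefix of another one
    # cannot shadow it. A single regex pass avoids re-replacing restored text.
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(key) for key in keys))
    return pattern.sub(lambda match: mapping[match.group(0)], anonymized_text)


mapping = {
    "<PERSON_0>": "Alice",
    "<EMAIL_ADDRESS_0>": "alice@example.com",
}
restored = restore("<PERSON_0> can be reached at <EMAIL_ADDRESS_0>.", mapping)
print(restored)
```

Because the mapping contains the original PII, it should be stored and shared as carefully as the original data.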

Views

This node has no views
