Presidio Analyzer

This node implements Presidio's Analyzer, which detects personally identifiable information (PII) in English text data.

The node scans a selected string column of the input table for the selected PII entity types. It adds the detected entities to the input table by appending the following columns:

  • Entity Type: the entity type of the detected PII entity
  • Entity: the piece of text that is recognized as PII
  • Start: the index of the first character of the entity in the text
  • End: the index of the last character of the entity in the text
  • Score: Presidio's confidence in the detection, a value between 0 and 1
  • Row: the original row ID from the input table

If several entities are detected in one input row, that row is ungrouped so that each output row contains exactly one entity.
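The ungrouping described above can be sketched in plain Python. This is a minimal illustration, not the node's actual implementation. It assumes detections are objects with `entity_type`, `start`, `end`, and `score` attributes, as Presidio's `RecognizerResult` provides. The names `Detection`, `to_rows`, and `fake_email_analyzer` are hypothetical, and the regex-based analyzer is only a stand-in so the sketch runs without Presidio installed. Rows without detections are dropped here, which is an assumption about the node's behavior. Presidio's own `end` offset is exclusive (`text[start:end]` yields the entity), so the sketch passes it through unchanged. Whether the node adjusts it to point at the last character is not verified here.

```python
import re
from collections import namedtuple

# Hypothetical stand-in with the same fields as Presidio's RecognizerResult.
Detection = namedtuple("Detection", "entity_type start end score")


def analyze_with_presidio(text, entities):
    """Call the real analyzer; needs presidio-analyzer and a spaCy English model."""
    from presidio_analyzer import AnalyzerEngine  # imported lazily on purpose

    analyzer = AnalyzerEngine()
    return analyzer.analyze(text=text, entities=entities, language="en")


def fake_email_analyzer(text):
    """Stand-in detector (NOT Presidio): finds email-like strings with a regex."""
    pattern = r"[\w.+-]+@[\w-]+\.[\w.-]+"
    return [
        Detection("EMAIL_ADDRESS", m.start(), m.end(), 1.0)  # score is made up
        for m in re.finditer(pattern, text)
    ]


def to_rows(table, analyze):
    """Emit one output row per detected entity, tagged with the source row ID."""
    rows = []
    for row_id, text in table:
        for det in analyze(text):
            rows.append({
                "Entity Type": det.entity_type,
                "Entity": text[det.start:det.end],
                "Start": det.start,
                "End": det.end,  # Presidio convention: exclusive offset
                "Score": det.score,
                "Row": row_id,
            })
    return rows


table = [("Row0", "Mail a@b.com or c@d.org"), ("Row1", "nothing here")]
rows = to_rows(table, fake_email_analyzer)
for row in rows:
    print(row)
```

To run against Presidio itself, `fake_email_analyzer` could be swapped for something like `lambda text: analyze_with_presidio(text, ["EMAIL_ADDRESS"])`.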

Further information on the Presidio Analyzer can be found on the Microsoft Presidio website.

Warning: Presidio can help identify sensitive data and PII in structured and unstructured text. However, because it relies on automated detection mechanisms, there is no guarantee that Presidio will find all sensitive information. Therefore, always evaluate the quality of the detections and take appropriate measures if necessary.
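One simple way to start evaluating detections is to separate high-confidence results from those that deserve a manual look, using the Score column. The sketch below assumes output rows are available as dictionaries with a `Score` key. The function name `triage`, the sample rows, and the threshold of 0.6 are hypothetical choices for illustration, not recommendations from Presidio or the node. Note that this only reviews what was detected. It cannot reveal PII that the analyzer missed.

```python
def triage(rows, threshold=0.6):
    """Split detections into accepted and needs-review by confidence score."""
    accepted = [r for r in rows if r["Score"] >= threshold]
    needs_review = [r for r in rows if r["Score"] < threshold]
    return accepted, needs_review


# Made-up sample rows in the shape of the node's output columns.
sample = [
    {"Entity Type": "EMAIL_ADDRESS", "Entity": "a@b.com", "Score": 1.0, "Row": "Row0"},
    {"Entity Type": "PERSON", "Entity": "Alex", "Score": 0.4, "Row": "Row1"},
]
accepted, needs_review = triage(sample)
print(len(accepted), "accepted,", len(needs_review), "to review")
```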

Options

Input column

Select the string column that contains the data for PII detection.

Entity types

Select the PII entity types that will be detected.

Available options:

  • Credit card: recognizes credit card numbers
  • Crypto wallet: recognizes crypto wallet numbers
  • Date & time: recognizes absolute or relative dates, periods, and times shorter than a day
  • Email address: recognizes email addresses
  • IBAN code: recognizes IBAN codes
  • IP address: recognizes IP addresses
  • Location: recognizes names of locations
  • Medical license: recognizes common medical license numbers
  • Nationality, Religion, Political Orientation (NRP): recognizes nationality, religion, and political group affiliation of a person
  • Person: recognizes full person names
  • Phone number: recognizes telephone numbers
  • URL: recognizes URLs
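For readers who also use Presidio directly, the options above appear to correspond to Presidio's built-in entity identifiers. The mapping below is an assumption based on matching names. The identifiers on the right are Presidio's documented entity names, but the node's internal mapping has not been checked, and `LABEL_TO_ENTITY` and `selected_entities` are hypothetical names.

```python
# Assumed mapping from the node's option labels to Presidio entity identifiers.
LABEL_TO_ENTITY = {
    "Credit card": "CREDIT_CARD",
    "Crypto wallet": "CRYPTO",
    "Date & time": "DATE_TIME",
    "Email address": "EMAIL_ADDRESS",
    "IBAN code": "IBAN_CODE",
    "IP address": "IP_ADDRESS",
    "Location": "LOCATION",
    "Medical license": "MEDICAL_LICENSE",
    "Nationality, Religion, Political Orientation (NRP)": "NRP",
    "Person": "PERSON",
    "Phone number": "PHONE_NUMBER",
    "URL": "URL",
}


def selected_entities(labels):
    """Translate selected option labels into a list of entity identifiers."""
    unknown = [label for label in labels if label not in LABEL_TO_ENTITY]
    if unknown:
        raise ValueError(f"Unknown entity type labels: {unknown}")
    return [LABEL_TO_ENTITY[label] for label in labels]


print(selected_entities(["Email address", "Person"]))
```

A list built this way could be passed as the `entities` argument of Presidio's `AnalyzerEngine.analyze` to restrict detection to the chosen types.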

Input Ports

The input table containing a string column.

Output Ports

The output table containing six additional columns for the detection results.

Views

This node has no views
