PII Masking Processor

The PII Masking Processor automatically detects and masks Personally Identifiable Information (PII) in data.

Below are the default types of PII that will be identified, along with examples:

Default Masked PII Types

  • Email: john.doe@example.com → *****@*****.com

  • US Phone (Formatted): (123) 456-7890 → (***) ***-****

  • UK Phone Number: +44 20 7946 0958 → +44 ** **** ****

  • URL: https://www.example.com/path → https://*****.com/****

  • Hostname: server.example.com → *****.example.com

  • Street Address: 123 Main St, Springfield, IL → *** Main St, *******, **

  • Zip Code: 60601 → *****

  • IPv4: 192.168.1.1 → ***.***.*.*

  • IPv6: 2001:db8::ff00:42:8329 → ****:****::****:****:****

  • SSN (Spaces): 123 45 6789 → *** ** ****

  • SSN (Dashes): 123-45-6789 → ***-**-****

By default, these PII types will be masked to ensure data privacy and security.
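The numeric examples above (SSN, IPv4, US phone, zip code) all follow the same visible pattern: every digit becomes a mask character while separators stay in place. The following is a minimal sketch of that behavior, not the processor's actual implementation. The names `PATTERNS`, `detect_pii_type`, and `mask_digits` are hypothetical, and the regular expressions are simplified assumptions that cover only a few of the listed types.

```python
import re

# Hypothetical, simplified patterns for a few of the default PII types.
# The product's real detection rules are not documented here.
PATTERNS = {
    "ssn_dashes": r"\d{3}-\d{2}-\d{4}",
    "ssn_spaces": r"\d{3} \d{2} \d{4}",
    "ipv4": r"(?:\d{1,3}\.){3}\d{1,3}",
    "zip_code": r"\d{5}",
}


def detect_pii_type(value: str):
    """Return the name of the first pattern that matches the whole value, or None."""
    for name, pattern in PATTERNS.items():
        if re.fullmatch(pattern, value):
            return name
    return None


def mask_digits(value: str, mask_char: str = "*") -> str:
    """Replace every digit with mask_char and leave separators untouched."""
    return re.sub(r"\d", mask_char, value)


if __name__ == "__main__":
    for sample in ["123-45-6789", "192.168.1.1", "60601", "hello"]:
        kind = detect_pii_type(sample)
        # Only mask values that were recognized as one of the sketched types.
        print(sample, kind, mask_digits(sample) if kind else sample)
```

Types such as email, URL, and street address are masked with fixed-width placeholders in the examples, so they would need separate rules that this sketch does not attempt.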


Enable the PII Masking functionality under the Schema Type tab in any data source.

Enable masking on source component

Once masking is enabled, a PII Masking processor is automatically added to the pipeline flow on the canvas. The processor lists the columns selected for PII masking from the incoming data of the source file.

Notes:

  • Columns detected for PII masking are highlighted in blue under the detected current schema tab.

  • The option to enable or disable PII masking for a column is available under that column's gear icon.

  • Supported file formats for PII Masking are CSV, Parquet, JSON, Avro, and XML.


PII Masking Processor Configuration

Under Select Output Field, the columns that have been enabled for PII masking in the schema are available in the drop-down list.

Select an output field and provide a character under the Add Masking Character column to mask the values in that column. The Mask Type options are described below:

  • All: Selects all the characters for masking.

  • Alternate Character: Selects alternate characters for masking.

  • Head Characters: Selects characters from the beginning of the data in the selected column for masking. The user must provide the number of characters to mask from the beginning.

  • Trailing Characters: Selects characters from the end of the string (the rightmost part) of the data in the selected column for masking. The user must provide the number of characters to mask from the end of the string.
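The four mask types can be illustrated with a short sketch. This is an assumption-based illustration rather than the processor's own code: the function name `apply_mask` and its parameters are hypothetical, and "Alternate Character" is interpreted here as masking every other character starting with the first, which the description above does not confirm.

```python
def apply_mask(value: str, mask_type: str, mask_char: str = "*", count: int = 0) -> str:
    """Mask `value` according to one of the four mask types (hypothetical helper)."""
    # Clamp the requested count to the valid range for this value.
    n = min(max(count, 0), len(value))

    if mask_type == "all":
        # Every character is replaced.
        return mask_char * len(value)
    if mask_type == "alternate":
        # Assumption: mask characters at even positions (0, 2, 4, ...).
        return "".join(mask_char if i % 2 == 0 else ch for i, ch in enumerate(value))
    if mask_type == "head":
        # Mask the first n characters.
        return mask_char * n + value[n:]
    if mask_type == "trailing":
        # Mask the last n characters.
        return value[:len(value) - n] + mask_char * n
    raise ValueError(f"unknown mask type: {mask_type}")


if __name__ == "__main__":
    ssn = "123-45-6789"
    print(apply_mask(ssn, "all"))                # ***********
    print(apply_mask(ssn, "alternate"))          # *2*-*5*6*8*
    print(apply_mask(ssn, "head", count=3))      # ***-45-6789
    print(apply_mask(ssn, "trailing", count=4))  # 123-45-****
```

Unlike the default masking shown earlier, this sketch treats separators such as dashes as ordinary characters, so "All" masks them too.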