PII Detection

A guide to understanding strategies to detect when PII is shared with 3rd parties.

Written by Luiza Gircoveanu
Updated this week

Overview

PII (Personally Identifiable Information) is any information that can be used to identify an individual, either on its own or in combination with other data. Examples include full name, personal identification code, passport number, and email address.

Various laws and regulations protect PII to prevent misuse and ensure privacy compliance.

It is not uncommon for IP addresses, geolocation, phone numbers, and other sensitive data to be passed to analytics platforms, ad platforms, and other vendors through tags and variables. For example, a value may appear in a page URL, and an analytics tag that records the full URL to track page visits captures that value along with it.
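To illustrate the URL case, the sketch below checks whether any query parameter of an outgoing request carries an email address, including one embedded in a captured page URL. It is a minimal Python example and not part of ObservePoint: the `find_emails_in_url` helper, the `dl` parameter name, and the example.com URLs are all assumptions made up for the demonstration.

```python
import re
from urllib.parse import parse_qsl, quote, urlsplit

# Email pattern taken from the example regular expressions later in this article
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_emails_in_url(url):
    """Return {parameter name: decoded value} for query parameters containing an email."""
    query = urlsplit(url).query
    # parse_qsl percent-decodes each value, so an encoded page URL becomes readable
    return {name: value for name, value in parse_qsl(query) if EMAIL_RE.search(value)}


# Hypothetical page URL that exposes an email address in its query string
page_url = "https://shop.example.com/confirm?email=jane.doe@example.com&order=1234"
# Hypothetical analytics request that records the full page URL in a "dl" parameter
request_url = (
    "https://analytics.example.com/collect?t=pageview&dl=" + quote(page_url, safe="")
)

print(find_emails_in_url(request_url))
```

Decoding the parameter values before matching matters here, because the email address only becomes recognizable once the embedded page URL is percent-decoded.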

ObservePoint can identify when some types of PII are shared with 3rd parties because we capture all network requests sent from a user's browser.

Strategy

To strategically approach the problem of detecting technologies that are collecting PII, we need to understand when, and on which pages, your website may have access to sensitive data.

There are 2 common sources for sensitive data sharing in 3rd party tags or network requests that we need to consider:

  • website form interactions

  • authenticated user interactions

Website Forms

With website forms, it's easy to see how unauthorized PII collection can happen. Forms prompt website visitors to provide information to a company, and as the visitor completes different form milestones, the data is sent through a network request. Oftentimes one of the destinations for that data is a platform that marketers use to target and personalize their messaging to users.

ObservePoint can help you identify form inputs that request sensitive information and identify which technologies reference the form inputs upon submission.
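For intuition about what "form inputs that request sensitive information" can look like in markup, here is a rough Python sketch that flags inputs whose attributes hint at PII. This is not how ObservePoint performs the detection; the `SENSITIVE_HINTS` keyword list, the `find_sensitive_inputs` helper, and the sample form are hypothetical and would need tuning for a real site.

```python
from html.parser import HTMLParser

# Hypothetical keyword hints; extend this list to suit your own forms
SENSITIVE_HINTS = ("email", "phone", "tel", "ssn", "birth", "address", "card")


class SensitiveInputFinder(HTMLParser):
    """Collects the name (or id) of form fields whose attributes suggest PII."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("input", "select", "textarea"):
            return
        attributes = dict(attrs)
        # Look at the attribute values most likely to describe the field's purpose
        described_by = " ".join(
            str(attributes.get(key) or "")
            for key in ("name", "id", "type", "autocomplete")
        ).lower()
        if any(hint in described_by for hint in SENSITIVE_HINTS):
            self.found.append(attributes.get("name") or attributes.get("id") or "")


def find_sensitive_inputs(html):
    finder = SensitiveInputFinder()
    finder.feed(html)
    return finder.found


sample_form = (
    '<form><input name="q">'
    '<input type="email" name="contact">'
    '<input name="ssn_last4" /></form>'
)
print(find_sensitive_inputs(sample_form))
```

A keyword match like this only suggests which fields deserve attention; whether a tag actually transmits the submitted value still has to be confirmed in the captured network requests.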

Authenticated Users

As authenticated (logged in) users interact with a website, the website uses their authenticated state to personalize their experience and grant them access to information that is stored in its database. In industries like financial services, healthcare, e-commerce, and others, user data needs to be referenced to provide value to customers.

The challenge here is that this data is rendered in the page for the user, so 3rd party JavaScript can read it and pass it along in HTTP requests.

If you can provide credentials to ObservePoint, we can authenticate and crawl pages and report on sensitive information passed to URL query parameters, Tags, and other network requests.

Implementation

Audits

Creating a basic Audit will allow you to test for unauthorized data captured from anonymous users, but only two independently identifiable forms of data can be shared in this instance: geolocation (latitude and longitude coordinates) and IP address.

This is already a valuable insight because your website probably has many anonymous visitors. If you can acquire test credentials, we also recommend creating authenticated Audits because technologies have access to much more sensitive information when a user is logged in.

Then you can take the Audit results and analyze URL, tag variable, and other network request data in our reporting.

Reports

Users can create a custom Tag Variable report and filter the Tag Variable Value column using several operators including contains, equals, and a regular expression.

This allows you to target specific values representing your authenticated test user in the reporting, such as first name, last name, social security number, phone number, and other PII. It also allows you to search for IP address and geolocation patterns.
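One way to prepare such a filter is to combine the known values of your test user into a single regular expression. The Python sketch below shows the idea under stated assumptions: the test user values and the `build_test_user_pattern` helper are hypothetical, and the regular expression flavor accepted by the report filter may differ from Python's, so treat the output as a starting point to verify.

```python
import re

# Hypothetical values for a dedicated test account; replace with your own
TEST_USER_VALUES = ["Jordan", "Testerson", "555-010-4477", "078-05-1120"]


def build_test_user_pattern(values):
    """Join literal values into one alternation, e.g. (?:value1|value2)."""
    # re.escape keeps characters such as "." or "+" from acting as regex syntax;
    # longer values go first so they are preferred over shorter overlapping ones
    escaped = sorted((re.escape(value) for value in values), key=len, reverse=True)
    return "(?:" + "|".join(escaped) + ")"


pattern = build_test_user_pattern(TEST_USER_VALUES)

# Sample tag variable values (made up) to check the pattern locally
for variable_value in ["uid=abc&fn=jordan", "page=home", "tel=555-010-4477"]:
    matched = re.search(pattern, variable_value, re.IGNORECASE) is not None
    print(variable_value, "->", matched)
```

Matching is done case-insensitively here because tags often lowercase or otherwise normalize values before sending them.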

To help you get started, we have created a report in our template gallery called Tags Capturing IP Addresses and Geolocation that identifies all Tags that capture geolocation and IP Address data from ObservePoint traffic. This allows us to simulate users and determine which technologies are referencing and sending this sensitive data over HTTP requests, potentially to unauthorized 3rd parties.
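If you want to experiment with similar checks outside the template, the Python sketch below shows what IP address and geolocation patterns might look like. These expressions are illustrative assumptions, not the filters used by the report template: `IPV4_RE` matches dotted IPv4 addresses only, and `GEO_RE` matches a comma-separated latitude/longitude pair with at least three decimal places.

```python
import re

# Dotted IPv4 address with each octet limited to 0-255 (assumed pattern)
OCTET = r"(?:25[0-5]|2[0-4]\d|1?\d?\d)"
IPV4_RE = re.compile(r"\b(?:" + OCTET + r"\.){3}" + OCTET + r"\b")

# "latitude,longitude" pair with 3 or more decimal places each (assumed pattern)
GEO_RE = re.compile(r"-?\d{1,2}\.\d{3,}\s*,\s*-?\d{1,3}\.\d{3,}")


def find_ip_and_geo(value):
    """Return the IPv4 addresses and coordinate pairs found in a variable value."""
    return {
        "ip": [m.group() for m in IPV4_RE.finditer(value)],
        "geo": [m.group() for m in GEO_RE.finditer(value)],
    }


# Made-up variable values; 203.0.113.42 is from a documentation-only IP range
print(find_ip_and_geo("uip=203.0.113.42"))
print(find_ip_and_geo("geo=40.2338,-111.6585"))
print(find_ip_and_geo("price=19.99,20.50"))
```

Four-part version strings such as `1.2.3.4` also look like IPv4 addresses to a pattern like this, so matches should be reviewed before they are treated as findings.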


This report template can be expanded to filter for either specific text like a test user's first name, last name, phone number, or social security number or more generic PII patterns.

We recommend you use the authenticated user data collected in the data layer to decide which filters and regular expressions to apply.

Example PII Pattern Regular Expressions:

  • Email: [A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}

  • Phone (US): (?:\+1[-.\s]?)?(?:\(?\d{3}\)?[-.\s]?)\d{3}[-.\s]?\d{4}

  • SSN (US): \b\d{3}-?\d{2}-?\d{4}\b

  • Driver’s License (generic alphanumeric 6–12 chars): \b[A-Z0-9]{6,12}\b

  • Street Address (simple pattern): \b\d{1,5}\s+[A-Za-z0-9.\s]+(?:Street|St|Avenue|Ave|Road|Rd|Boulevard|Blvd|Lane|Ln|Drive|Dr)\b

  • ICD-10 Code: \b[A-TV-Z][0-9][A-Z0-9](?:\.[A-Z0-9]{1,4})?\b

  • Medical Record Number (6–10 digits): \b\d{6,10}\b

  • Health Insurance Claim Number (HICN): \b[A-Za-z0-9]{1,11}\b

  • NPI (National Provider Identifier): \b[1-9]\d{9}\b

  • Credit Card (13–16 digits, flexible separators): \b(?:\d[ -]*?){13,16}\b
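Before pasting patterns like these into a report filter, it can help to try them against a few sample values locally. The Python sketch below applies four of the patterns above to made-up values and lists which ones match; the `classify` helper and the sample data are assumptions for illustration only.

```python
import re

# A subset of the example patterns above, copied verbatim
PATTERNS = {
    "email": r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}",
    "phone_us": r"(?:\+1[-.\s]?)?(?:\(?\d{3}\)?[-.\s]?)\d{3}[-.\s]?\d{4}",
    "ssn_us": r"\b\d{3}-?\d{2}-?\d{4}\b",
    "mrn": r"\b\d{6,10}\b",
}
COMPILED = {name: re.compile(pattern) for name, pattern in PATTERNS.items()}


def classify(value):
    """Return the sorted names of every pattern that matches somewhere in value."""
    return sorted(name for name, regex in COMPILED.items() if regex.search(value))


# Made-up sample values; the last one is a Unix timestamp, not PII
samples = ["jane.doe@example.com", "(801) 555-0142", "123-45-6789", "1718236800"]
for sample in samples:
    print(sample, "->", classify(sample))
```

In this run the timestamp `1718236800` is reported as both a phone number and a medical record number, which shows how readily the broader digit-based patterns match unrelated data.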

Note: PII regular expression patterns can also match other data passed in Tag variables, which is why it's important to analyze the results and refine the regular expressions to reduce false positives.
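For credit card matches specifically, one common refinement is to run the Luhn checksum on each candidate, since most random digit runs fail it. The Python sketch below applies that check to matches of the credit card pattern above when reviewing exported results; it is an optional post-processing idea rather than an ObservePoint feature, and the helper names are hypothetical.

```python
import re

# Credit card pattern from the list above
CARD_RE = re.compile(r"\b(?:\d[ -]*?){13,16}\b")


def luhn_valid(digits):
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        digit = int(char)
        if index % 2 == 1:  # double every second digit, counting from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def likely_card_numbers(value):
    """Return digit strings that match the card pattern and pass the Luhn check."""
    hits = []
    for match in CARD_RE.finditer(value):
        digits = re.sub(r"\D", "", match.group())  # drop spaces and dashes
        if luhn_valid(digits):
            hits.append(digits)
    return hits


# 4111 1111 1111 1111 is a widely used test card number, not a real account
print(likely_card_numbers("cc=4111 1111 1111 1111"))
print(likely_card_numbers("id=4111 1111 1111 1112"))
```

A passing checksum does not prove that a value is a real card number, but a failing one is a strong sign that the match is a false positive.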

We strongly recommend you use an LLM to build and test regular expressions against real data sets.

Journeys

We recommend creating Journeys for any forms that ask for sensitive information.

These forms are one of the most likely sources of sensitive information being shared with third parties upon submission. Since Tag and variable data from Journeys is not yet available in our new account-wide reporting, you'll need to apply Tag and Variable Rules on form actions with conditions for PII values.

If you have additional questions about how to effectively leverage ObservePoint to detect personally identifiable information being shared with 3rd parties, contact your success manager or our support team.
