Overview
This check scans your Google Analytics network requests to ensure they do not contain sensitive data such as email addresses, phone numbers, or health-related identifiers. Google’s terms of service strictly prohibit the collection of PII (e.g., names, SSNs, physical addresses) and PHI (e.g., medical record numbers, specific diagnoses).
Accidental collection often occurs when a website includes this information in URL query parameters. Google Analytics "page_view" events automatically capture the full page URL, query string included, as the page location (sent in hits as the dl parameter).
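As a minimal illustration, the Python sketch below assumes a hypothetical registration form submitted with GET to a hypothetical example.com thank-you page. It shows that the submitted fields land in the query string of the resulting page URL, which is the value a page_view records as the page location.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

# Hypothetical fields submitted by a registration form that uses method="GET".
form_fields = {"name": "Jane Doe", "email": "jane@example.com", "plan": "pro"}

# With GET, the browser appends the fields to the form's action URL, so the
# resulting page URL (what a page_view reports) contains every one of them.
page_url = "https://www.example.com/thank-you?" + urlencode(form_fields)


def query_params(url):
    """Return the decoded key/value pairs in a URL's query string."""
    return parse_qsl(urlsplit(url).query, keep_blank_values=True)


print(page_url)
# https://www.example.com/thank-you?name=Jane+Doe&email=jane%40example.com&plan=pro
for key, value in query_params(page_url):
    print(f"{key} = {value}")
```

Note that the email address appears percent-encoded (%40 in place of @) in the raw URL, which matters when you later search hit values with regular expressions.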
Why it is important
Collecting PII or PHI is one of the most serious risks to an analytics implementation:
Account Termination: Google reserves the right to terminate an entire Google Analytics property—and all its historical data—if it detects the collection of PII.
Legal and Financial Liability: Exposure of PII/PHI can lead to massive fines under regulations like GDPR (Europe), CCPA (California), and HIPAA (United States).
Data Contamination: Once PII is recorded in Google Analytics, it cannot be easily "deleted" from specific reports without deleting large chunks of your data history.
Brand Reputation: Privacy breaches erode user trust and can lead to significant PR damage.
Implementation
The most effective way to identify leaks is to analyze the specific data being sent in your tags using Tag-Variable Reports combined with Regex filters.
Run a Comprehensive Audit: Configure an audit to crawl your site, with a specific focus on high-risk areas like checkout flows, registration forms, and user-profile pages.
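Utilize Tag-Variable Reports: Instead of looking at tags in isolation, use Tag-Variable Reports to aggregate every value sent to Google Analytics across your entire site. This provides a bird’s-eye view of all keys (e.g., ep.user_email, dl, dt) and their associated values.
Apply Regex Filters: To catch leaks, apply Regular Expression (Regex) filters to the variable values in your reports. Use specific patterns defined by your legal and compliance teams to flag sensitive data. Common starting points include:
Email Addresses: [a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,} (leave off the ^ and $ anchors so that an address embedded in a longer value, such as a page URL in dl, still matches; inside URLs the @ may also appear percent-encoded as %40)
Credit Card Patterns: \b(?:\d[ -]*?){13,16}\b (matches runs of 13 to 16 digits with optional spaces or hyphens; it does not validate the Luhn checksum, so expect false positives such as order IDs or 13-digit millisecond timestamps)
US Phone Numbers: \(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4} (matches any 10-digit run with common separators, so review matches manually)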
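The filtering step can be prototyped offline. The Python sketch below is not ObservePoint's implementation; it assumes you already have a list of captured GA4 hit URLs (/g/collect requests with their parameters in the query string), groups every value by key in the spirit of a Tag-Variable Report, and applies the patterns listed above. The sample hits and the helper names aggregate and scan are hypothetical. Real GA4 requests can also batch events in the POST body, which this sketch ignores. Each value is percent-decoded once more before matching because a page URL nested inside dl keeps its own encoding.

```python
import re
from collections import defaultdict
from urllib.parse import parse_qsl, unquote, urlsplit

# Patterns from the list above. The email pattern has no ^/$ anchors so it can
# find an address inside a longer value such as a page URL.
PATTERNS = {
    "email": re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"),
    "credit_card": re.compile(r"\b(?:\d[ -]*?){13,16}\b"),
    "us_phone": re.compile(r"\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}"),
}

# Hypothetical GA4 hits captured during a crawl (query-string parameters only).
HITS = [
    "https://www.google-analytics.com/g/collect?v=2&en=page_view"
    "&dl=https%3A%2F%2Fwww.example.com%2Fthank-you%3Femail%3Djane%2540example.com"
    "&dt=Thank%20you",
    "https://www.google-analytics.com/g/collect?v=2&en=sign_up"
    "&dl=https%3A%2F%2Fwww.example.com%2Fregister&dt=Register"
    "&ep.user_email=jane%40example.com",
]


def aggregate(hit_urls):
    """Collect every distinct value sent under each parameter key."""
    values_by_key = defaultdict(set)
    for url in hit_urls:
        for key, value in parse_qsl(urlsplit(url).query, keep_blank_values=True):
            values_by_key[key].add(value)
    return values_by_key


def scan(values_by_key):
    """Return (key, value, pattern_name) for each value that matches a pattern."""
    findings = []
    for key in sorted(values_by_key):
        for value in sorted(values_by_key[key]):
            # Decode once more: a URL nested in dl keeps its own %-encoding.
            decoded = unquote(value)
            for name, pattern in PATTERNS.items():
                if pattern.search(decoded):
                    findings.append((key, value, name))
    return findings


if __name__ == "__main__":
    for key, value, name in scan(aggregate(HITS)):
        print(f"[{name}] {key} = {value}")
```

On the sample hits this flags two values: the dl page URL carrying an email query parameter and the ep.user_email event parameter. Without the extra decoding step, the address inside dl would stay hidden behind %40 and the email pattern would miss it.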
Remediation
If PII or PHI is discovered in your analytics hits, take the following steps to secure your data:
Cleanse the URL: Work with developers to move user data out of query parameters (e.g., a parameter carrying the visitor's email address) and into the POST body or an internal database.
Redact in GTM: Use Custom JavaScript Variables, or URL Variables that return only the components you need (such as the path rather than the full URL), to remove sensitive keys before they reach the Google Analytics tag.
Update Google Analytics Data Redaction: Enable the built-in "Data Redaction" feature in the Google Analytics Admin panel (under Data Streams) to automatically mask email addresses and specific URL parameters.
Fix Form Method: Ensure that website forms use the POST method instead of GET. The GET method appends form data directly to the URL, making it visible to Google Analytics.
Audit "Thank You" Pages: Ensure that confirmation page URLs do not contain a user's full name or address.
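For the GTM redaction step, the logic a Custom JavaScript Variable would apply is simple: parse the page URL, drop any query parameter whose key is on a blocklist, and rebuild the URL. The sketch below shows that logic in Python so it can be tested outside the browser; it is not GTM code, and the SENSITIVE_KEYS blocklist and the strip_sensitive_params helper are hypothetical examples that your own compliance requirements would replace.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical blocklist; replace with the keys your compliance team flags.
SENSITIVE_KEYS = {"email", "name", "phone", "address"}


def strip_sensitive_params(url, sensitive_keys=SENSITIVE_KEYS):
    """Return url with every query parameter whose key is blocklisted removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in sensitive_keys  # case-insensitive key match
    ]
    # urlunsplit omits the "?" automatically when the rebuilt query is empty.
    return urlunsplit(parts._replace(query=urlencode(kept)))


print(strip_sensitive_params(
    "https://www.example.com/thank-you?email=jane%40example.com&plan=pro"
))
# https://www.example.com/thank-you?plan=pro
```

Because urlencode re-encodes the parameters it keeps, their encoding can change slightly (a space sent as %20 comes back as +). A blocklist is also weaker than an allowlist of the parameters you actually need: a new sensitive key slips through until someone adds it to the list.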
Conclusion
Protecting user privacy is a foundational responsibility of any data analyst. PII and PHI leaks are often unintentional, but the consequences, from account deletion to legal action, are severe. By using ObservePoint to proactively monitor your Google Analytics tags for sensitive data, you can maintain compliance and ensure your analytics property remains a safe and reliable business asset.
