In the EU, digital analytics data is practically always personal data.
Even if you do not store names or email addresses, analytics systems rely on cookies, device identifiers, or IP-based signals. Under GDPR, that is enough to make the data personal.
So the compliance question is not whether you process personal data. You almost certainly do.
The real question is different:
- Are you sending more information, in more detail, than the receiving systems need?
- And are you unintentionally processing special categories of personal data?
In most organisations, data leakage in analytics is not deliberate. It results from default configurations, unreviewed website design decisions, and overly broad tag management setups.
These risks usually originate from three structural issues:
- Website design that exposes personal data in URLs and titles
- Careless tag management and DOM scraping
- Overly generous dataLayer implementations
Below are ten common leakage patterns observed in practice.
1. Personal data in URL parameters
The most frequent problem is personal data embedded directly in URLs. I have encountered this numerous times, in different industries.
Examples:
- ?name=John+Smith
- ?email=john.smith@example.com
- ?phone=0401234567
Most analytics tools collect full URLs by default. Once stored, these parameters spread to reports, dashboards, exports, and data warehouses.
If the URL contains personal data, your analytics system stores it. And not only your analytics tools: the same URLs can also be seen and logged by network infrastructure, browser extensions, and advertising platforms.
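A minimal sketch of redacting such values before a URL is sent to analytics. The parameter names in PERSONAL_PARAMS and the simple email pattern are assumptions for illustration, not a complete detection rule:

```python
import re
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical parameter names; adjust to what your own site actually uses.
PERSONAL_PARAMS = {"name", "email", "phone"}
# Deliberately simple email pattern; it will not catch every variant.
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")

def redact_url(url: str) -> str:
    """Replace personal-looking query values with a fixed placeholder."""
    parts = urlsplit(url)
    cleaned = []
    for key, value in parse_qsl(parts.query, keep_blank_values=True):
        if key.lower() in PERSONAL_PARAMS or EMAIL_RE.search(value):
            value = "redacted"
        cleaned.append((key, value))
    return urlunsplit(parts._replace(query=urlencode(cleaned)))
```

This only protects the analytics payload. The original URL is still visible to everything else that sees the request, so the durable fix remains keeping personal data out of URLs in the first place.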
2. Pseudonymous identifiers and click IDs in URLs
Even when explicit identifiers are avoided, URLs often contain:
- Customer IDs
- Contract or case numbers
- CRM lead IDs
- gclid, fbclid, msclkid
- HubSpot tracking parameters
These are pseudonymous identifiers. On their own, they may not directly identify someone. Combined with other systems, they often can.
Click IDs such as gclid or fbclid are meaningful only within their original advertising ecosystems. A gclid belongs in Google Ads. A fbclid belongs in Meta.
Outside their origin platform, they are usually just noise.
For example, GA4 cannot meaningfully use a fbclid. It simply stores it unless you explicitly filter it out. The same applies to many other analytics platforms.
Under GDPR, pseudonymous data remains personal data if it can be linked back to an individual using additional information.
If a parameter provides no analytical value in your platform, storing it increases risk without improving insight. On the contrary: storing it often makes analysis more difficult.
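One way to apply this is an allowlist: keep only the parameters you actually report on and drop everything else. The sketch below assumes a hypothetical setup that only needs three UTM parameters:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumed allowlist: parameters this (hypothetical) analytics setup reports on.
ALLOWED_PARAMS = {"utm_source", "utm_medium", "utm_campaign"}

def keep_allowed_params(url: str) -> str:
    """Drop every query parameter that is not explicitly allowed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key in ALLOWED_PARAMS
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

An allowlist fails safe: a new click ID or CRM parameter introduced next year is dropped by default instead of being stored until someone notices it.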
3. Special categories hidden in URLs
Some websites encode sensitive information directly in their URL structure:
- /conditions/depression-treatment
- /support/hiv-testing
- /membership/political-party-x
- URLs describing financial distress or addiction support
If these page views are tied to cookies or device identifiers, you may be processing special categories of personal data.
You may not think of this as “health data processing.”
But if an identifiable browser is consistently linked to specific sensitive content, the dataset becomes high-risk.
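One possible mitigation is to report sensitive sections at a coarser level. The sketch below collapses any path under an assumed list of sensitive prefixes (taken from the examples above) to the prefix itself. Whether that level of generalisation is sufficient is a legal judgement, not a technical one:

```python
from urllib.parse import urlsplit

# Hypothetical path prefixes; a real list needs a content review.
SENSITIVE_PREFIXES = ("/conditions/", "/support/", "/membership/")

def generalise_path(url: str) -> str:
    """Return the path to report; sensitive sections collapse to their prefix."""
    path = urlsplit(url).path
    for prefix in SENSITIVE_PREFIXES:
        if path.startswith(prefix):
            return prefix
    return path
```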
4. Personal data in page titles
Analytics platforms usually store page titles alongside URLs.
If titles include:
- Patient names
- Property addresses
- Case numbers
- Logged-in users’ names
…that information is transmitted and stored automatically.
Titles are rarely reviewed from a privacy perspective, yet they are part of the standard analytics payload.
5. Referrer URLs leaking information
Referrer fields can contain personal data from previous pages.
Examples:
- Internal search result pages with query parameters
- Redirect URLs containing identifiers
- Third-party sites embedding user information in links
Unless explicitly filtered, referrer URLs are collected and stored automatically.
You may therefore collect personal data that did not originate on the current page.
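A simple filter is to keep only the scheme and host of the referrer. This is a sketch under the assumption that you only need to know which site the visitor came from, not which page:

```python
from urllib.parse import urlsplit

def trim_referrer(referrer: str) -> str:
    """Keep only scheme and host, dropping path, query, and fragment."""
    if not referrer:
        return ""
    parts = urlsplit(referrer)
    if not parts.scheme or not parts.hostname:
        return ""
    return f"{parts.scheme}://{parts.hostname}/"
```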
6. Internal site search queries
Users type highly personal information into internal search fields:
- Their own name
- Order numbers
- Email addresses
- Medical symptoms
- Financial problems
If raw search terms are captured without filtering, your analytics system accumulates unstructured personal data at scale.
In some cases, this may also include special category data.
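A rough sketch of filtering search terms before they are recorded. The six-digit threshold is an assumption. Pattern matching catches emails and long numbers, but it cannot recognise names or symptoms, so treat it as a first line of defence only:

```python
import re

EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")
LONG_NUMBER_RE = re.compile(r"\d{6,}")  # assumed threshold for order or phone numbers

def safe_search_term(term: str) -> str:
    """Normalise a search term, or replace it if it looks like personal data."""
    term = term.strip().lower()
    if EMAIL_RE.search(term) or LONG_NUMBER_RE.search(term):
        return "(redacted)"
    return term
```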
7. Automatic capture of form values
Modern tag managers and some analytics scripts can automatically track:
- Form interactions
- Input field names
- Input field values
- Element attributes
If misconfigured, this may result in collecting:
- Email addresses
- Phone numbers
- Free-text messages
- Complaint descriptions
Often this happens because enhanced measurement or auto-event tracking was enabled without a detailed audit of what is actually being sent.
The result is excessive and unnecessary data collection.
8. Overexposing user data via the dataLayer
Developers sometimes push full user objects into the dataLayer for convenience:
- Name
- Phone number
- Internal identifiers
Once this data is present in the dataLayer, every tag on the page can access it. They probably don’t process the data, but they can access it.
Most of the time, these identifiers are hashed using SHA-256. But the same input always produces the same hash, so the hashed values can still be used as keys to identify people.
Because of this, analytics tools, advertising pixels, and marketing automation scripts may all receive more personal data than required.
This conflicts with GDPR principles of data minimisation and purpose limitation.
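Instead of pushing the full user object, build the dataLayer payload from an explicit allowlist. The sketch below assumes the payload is assembled server-side before rendering, and the field names are hypothetical:

```python
# Hypothetical field names; the allowlist should come from your dataLayer specification.
ALLOWED_USER_FIELDS = ("login_status", "customer_segment")

def build_datalayer_user(user: dict) -> dict:
    """Return only the fields that the dataLayer specification allows."""
    return {key: user[key] for key in ALLOWED_USER_FIELDS if key in user}
```

With this in place, adding a new field to the user object does not expose it to every tag on the page. Someone has to add it to the allowlist deliberately.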
9. Integrations multiply exposure
Analytics tools are rarely isolated.
They connect to:
- Advertising platforms
- Marketing automation systems
- Data warehouses
If detailed personal data enters analytics, integrations may distribute it further.
A small implementation mistake can therefore propagate across multiple systems, increasing legal, operational, and reputational risk.
10. Click tracking and DOM scraping
Tag managers allow you to capture:
- Click text
- Link URLs
- Visible labels
- Surrounding DOM content
This often bypasses structured governance.
If click tracking sends:
- “Call John Smith” as link text
- mailto:john.smith@example.com links
- Download links containing invoice numbers
- Profile links containing personal names
…you are scraping personal data directly from the page and transmitting it to analytics tools.
The issue is not the click event itself. It is the uncontrolled extraction of detailed content.
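A controlled alternative is to describe the click by its type and a stable element ID, and never forward the link text or full link target. The sketch below is one way to do that. The event shape and the element_id convention are assumptions:

```python
from urllib.parse import urlsplit

def describe_click(href: str, element_id: str) -> dict:
    """Build a click event from the link type and an element ID, not page content."""
    parts = urlsplit(href)
    scheme = parts.scheme.lower()
    if scheme in ("mailto", "tel"):
        link_type = scheme  # record that it was an email or phone link, not the address
    elif parts.path.lower().endswith(".pdf"):
        link_type = "download"  # record the download, not the file name
    else:
        link_type = "link"
    return {"event": "click", "link_type": link_type, "element_id": element_id}
```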
Why this matters
Because analytics data is usually personal data in the EU, the key compliance question is proportionality.
Are you collecting only what is necessary for defined analytical purposes?
Or are you transmitting detailed information simply because your tools allow it?
More critically:
Are you storing and processing special categories of personal data without realising it?
If health-related pages, political content, or religious affiliations can be linked to persistent identifiers, your analytics implementation may qualify as high-risk processing.
Analytics is often described as “just statistics.” In reality, it is structured behavioural data tied to identifiable users.
Not knowing what flows into your analytics systems does not reduce responsibility.
What to do next
Start with what is already visible in your data:
- Export full URLs, referrers, and event parameters from your analytics platform.
- Search for email patterns, long numeric identifiers, names, and sensitive keywords.
- Review which URL parameters are collected and whether they are genuinely needed.
- Audit click tracking and DOM-based variables in your tag manager.
- Examine your dataLayer specification and remove direct or unnecessary identifiers.
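The first two steps can be sketched as a small scan over exported values. The patterns and keywords below are assumptions and only a starting point. Names in particular cannot be found reliably with patterns and need manual review:

```python
import re
from urllib.parse import unquote

# Assumed patterns and keywords; extend them for your own languages and content.
PATTERNS = {
    "email": re.compile(r"[^@\s/?&=]+@[^@\s/?&=]+\.[a-z]{2,}", re.IGNORECASE),
    "long_number": re.compile(r"\d{8,}"),
    "sensitive_keyword": re.compile(r"\b(hiv|depression|addiction|debt)\b", re.IGNORECASE),
}

def audit_values(values):
    """Return (value, finding) pairs for every exported value that matches a pattern."""
    findings = []
    for value in values:
        text = unquote(value)  # decode %40 and similar before matching
        for label, pattern in PATTERNS.items():
            if pattern.search(text):
                findings.append((value, label))
    return findings
```

Run it over the exported URLs, referrers, and event parameters, then review the findings by hand. Expect false positives; the goal is to find where to look, not to produce a verdict.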
In many cases, the root cause lies in website architecture.
- If personal data appears in URLs, titles, or search parameters, the long-term fix is to redesign those elements.
- If tags are scraping uncontrolled page content, stop relying on raw DOM extraction and tighten governance.
- If too much information is exposed in the dataLayer, reduce it to strictly defined, purpose-specific fields.
Then address infrastructure.
Moving to server-side tracking does not automatically solve privacy issues, but it gives you control over what is forwarded to third parties. It allows you to strip unnecessary parameters and enforce data minimisation rules before data leaves your environment.
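As an illustration, a server-side endpoint can apply a per-destination allowlist before anything is forwarded. The destination names and parameter names below are hypothetical, and real server-side tagging products have their own configuration for this:

```python
# Hypothetical destinations and parameter names, for illustration only.
FORWARDING_RULES = {
    "analytics": {"event", "page_path", "utm_source"},
    "ads": {"event", "gclid"},
}

def payload_for(destination: str, event: dict) -> dict:
    """Return the subset of an incoming event that one destination may receive."""
    allowed = FORWARDING_RULES.get(destination, set())
    return {key: value for key, value in event.items() if key in allowed}
```

Each platform receives only the fields it can use. An unknown destination receives nothing, so a new integration has to be given fields explicitly.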
In summary:
- Fix the website design so personal data is not exposed in URLs or titles.
- Migrate to server-side tracking to gain control over outgoing data flows.
- Minimise personal data at every step of the analytics pipeline.
Effective analytics is not about collecting more data. It is about deliberately collecting less, and only what is necessary for clearly defined purposes.