10 common ways your digital analytics implementation leaks personal data

In the EU, digital analytics data is practically always personal data.

Even if you do not store names or email addresses, analytics systems rely on cookies, device identifiers, or IP-based signals. Under GDPR, that is enough to make the data personal.

So the compliance question is not whether you process personal data. You almost certainly do.

The real question is different:

Are you sending more information, in more detail, than the receiving systems need?
And are you unintentionally processing special categories of personal data?

In most organisations, data leakage in analytics is not deliberate. It results from default configurations, unreviewed website design decisions, and overly broad tag management setups.

These risks usually originate from three structural issues:

Website design that exposes personal data in URLs and titles
Careless tag management and DOM scraping
Overly generous dataLayer implementations

Below are ten common leakage patterns observed in practice.

1. Personal data in URL parameters

The most frequent problem is personal data embedded directly in URLs. I have encountered this numerous times, in different industries.

Examples:

?name=John+Smith
?email=john.smith@example.com
?phone=0401234567

Most analytics tools collect full URLs by default. Once stored, these parameters spread to reports, dashboards, exports, and data warehouses.

If the URL contains personal data, your analytics system stores it. And analytics is not the only recipient: URLs are also seen and logged by internet infrastructure, browser extensions, and advertising platforms, among others.
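Until the URLs themselves are fixed, one way to reduce exposure is to redact suspicious query values before a URL is sent anywhere you control, for example in a server-side pipeline. The Python sketch below assumes a hypothetical denylist of parameter names and deliberately rough email and phone patterns. It is an illustration, not a complete detector.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical denylist of parameter names that often carry personal data.
PII_PARAM_NAMES = {"name", "email", "phone", "firstname", "lastname"}

# Deliberately rough patterns; tune them against your own URLs.
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[a-z]{2,}", re.IGNORECASE)
PHONE_RE = re.compile(r"^\+?[\d\s-]{7,}$")


def redact_url(url: str) -> str:
    """Replace query values that look like personal data with 'redacted'."""
    parts = urlsplit(url)
    cleaned = []
    # parse_qsl decodes values, so 'john.smith%40example.com' is checked as an email
    for key, value in parse_qsl(parts.query, keep_blank_values=True):
        if (key.lower() in PII_PARAM_NAMES
                or EMAIL_RE.search(value)
                or PHONE_RE.match(value)):
            value = "redacted"
        cleaned.append((key, value))
    return urlunsplit(parts._replace(query=urlencode(cleaned)))


print(redact_url("https://example.com/thanks?name=John+Smith&phone=0401234567&step=2"))
# https://example.com/thanks?name=redacted&phone=redacted&step=2
```

The phone pattern will also catch some harmless values, such as dates written as 2024-01-15, so treat its output as something to review.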

2. Pseudonymous identifiers and click IDs in URLs

Even when explicit identifiers are avoided, URLs often contain:

Customer IDs
Contract or case numbers
CRM lead IDs
gclid, fbclid, msclkid
HubSpot tracking parameters

These are pseudonymous identifiers. On their own, they may not directly identify someone. Combined with data held in other systems, they often can.

Click IDs such as gclid or fbclid are meaningful only within their original advertising ecosystems. A gclid belongs in Google Ads. An fbclid belongs in Meta.

Outside their origin platform, they are usually just noise.

For example, GA4 cannot meaningfully use an fbclid. It simply stores it as part of the page URL unless you explicitly filter it out. The same applies to many other analytics platforms.

Under GDPR, pseudonymous data remains personal data if it can be linked back to an individual using additional information.

If a parameter provides no analytical value in your platform, storing it increases risk without improving insight. It often makes analysis harder, too: in reports based on the full URL, every unique click ID turns the same landing page into a separate row.
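An allowlist is usually easier to maintain than a denylist: keep only the parameters your analysis actually uses and drop everything else, including click IDs and CRM identifiers. A minimal Python sketch, assuming a hypothetical ALLOWED_PARAMS set. Which parameters belong on the list depends on the destination; if you rely on Google Ads attribution in GA4, for instance, gclid may need separate handling.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical allowlist: the only query parameters this analysis uses.
ALLOWED_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "page"}


def keep_allowed_params(url: str) -> str:
    """Drop every query parameter that is not explicitly allowed.

    Click IDs (gclid, fbclid, msclkid), CRM lead IDs and HubSpot parameters
    disappear simply because they are not on the list.
    """
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k.lower() in ALLOWED_PARAMS]
    return urlunsplit(parts._replace(query=urlencode(kept)))


print(keep_allowed_params(
    "https://example.com/offer?utm_source=newsletter&fbclid=IwAR0abc&lead_id=98765"))
# https://example.com/offer?utm_source=newsletter
```

With an allowlist, a new parameter introduced by another team or tool is dropped by default instead of silently flowing into reports.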

3. Special categories hidden in URLs

Some websites encode sensitive information directly in their URL structure:

/conditions/depression-treatment
/support/hiv-testing
/membership/political-party-x
URLs describing financial distress or addiction support

If these page views are tied to cookies or device identifiers, you may be processing special categories of personal data.

You may not think of this as “health data processing.”

But if an identifiable browser is consistently linked to specific sensitive content, the dataset becomes high-risk.
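If the URL structure cannot be changed quickly, one stopgap is to collapse pages in sensitive sections to a placeholder before they are sent. The section prefixes and placeholders in this Python sketch are hypothetical and would need to be agreed with the people who own the content.

```python
# Hypothetical sensitive sections, each mapped to a generic placeholder path.
SENSITIVE_SECTIONS = {
    "/conditions/": "/conditions/[topic]",
    "/support/": "/support/[topic]",
    "/membership/": "/membership/[topic]",
}


def generalise_path(path: str) -> str:
    """Replace the specific page in a sensitive section with a placeholder."""
    for prefix, placeholder in SENSITIVE_SECTIONS.items():
        if path.startswith(prefix):
            return placeholder
    return path


print(generalise_path("/conditions/depression-treatment"))  # /conditions/[topic]
print(generalise_path("/products/shoes"))                   # /products/shoes
```

This only lowers granularity. If visiting a section at all is sensitive (a political party's membership pages, for example), collapsing the path is not enough.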

4. Personal data in page titles

Analytics platforms usually store page titles alongside URLs.

If titles include:

Patient names
Property addresses
Case numbers
Logged-in users’ names

…that information is transmitted and stored automatically.

Titles are rarely reviewed from a privacy perspective, yet they are part of the standard analytics payload.
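Instead of forwarding the page's own title, page templates known to render personal data into the title can be given a fixed, reviewed title. A Python sketch, assuming hypothetical URL patterns for patient records and account pages:

```python
import re

# Hypothetical page templates known to put personal data into <title>,
# each mapped to a fixed title that has been reviewed.
TITLE_TEMPLATES = [
    (re.compile(r"^/patients/\d+$"), "Patient record"),
    (re.compile(r"^/account(/.*)?$"), "Account"),
]


def safe_title(path: str, document_title: str) -> str:
    """Return the reviewed template title where one exists, else the page title."""
    for pattern, title in TITLE_TEMPLATES:
        if pattern.match(path):
            return title
    return document_title


print(safe_title("/patients/1234", "Jane Doe | Patient record"))  # Patient record
print(safe_title("/about", "About us"))                           # About us
```

Falling back to the page's own title means a new template leaks until someone adds it to the list; a stricter variant sends only reviewed titles and nothing else.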

5. Referrer URLs leaking information

Referrer fields can contain personal data from previous pages.

Examples:

Internal search result pages with query parameters
Redirect URLs containing identifiers
Third-party sites embedding user information in links

Unless explicitly filtered, referrer URLs are collected and stored automatically.

You may therefore collect personal data that did not originate on the current page.
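If the analysis only needs to know which site sent a visitor, the referrer can be reduced to its origin before storage. This is roughly what the strict-origin-when-cross-origin referrer policy does for cross-origin requests; the Python sketch below applies the same idea to every referrer, including same-origin ones, which that policy sends in full. The function name is illustrative.

```python
from urllib.parse import urlsplit


def referrer_origin(referrer: str) -> str:
    """Reduce a referrer URL to scheme, host and port; path and query are dropped."""
    parts = urlsplit(referrer)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return ""
    # .hostname also strips any user:password@ prefix and lower-cases the host
    port = f":{parts.port}" if parts.port else ""
    return f"{parts.scheme}://{parts.hostname}{port}"


print(referrer_origin("https://shop.example.com/search?q=john+smith"))
# https://shop.example.com
```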

6. Internal site search queries

Users type highly personal information into internal search fields:

Their own name
Order numbers
Email addresses
Medical symptoms
Financial problems

If raw search terms are captured without filtering, your analytics system accumulates unstructured personal data at scale.

In some cases, this may also include special category data.
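A partial mitigation is to redact obvious patterns and drop terms containing sensitive vocabulary before the search term is sent. A Python sketch, assuming a hypothetical keyword list. Regular expressions will not catch names, so this reduces the risk rather than removing it.

```python
import re
from typing import Optional

EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[a-z]{2,}", re.IGNORECASE)
LONG_NUMBER_RE = re.compile(r"\d{5,}")
# Hypothetical sensitive vocabulary; in practice maintained per site and language.
SENSITIVE_TERMS = {"depression", "hiv", "addiction", "debt", "bankruptcy"}


def sanitise_search_term(term: str) -> Optional[str]:
    """Return a redacted search term, or None if the term should not be sent."""
    if any(word in SENSITIVE_TERMS for word in term.lower().split()):
        return None  # drop the whole term rather than record a sensitive topic
    term = EMAIL_RE.sub("[email]", term)
    return LONG_NUMBER_RE.sub("[number]", term)


print(sanitise_search_term("order 123456 status"))  # order [number] status
print(sanitise_search_term("HIV testing"))          # None
```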

7. Automatic capture of form values

Modern tag managers and some analytics scripts can automatically track:

Form interactions
Input field names
Input field values
Element attributes

If misconfigured, this may result in collecting:

Email addresses
Phone numbers
Free-text messages
Complaint descriptions

Often this happens because enhanced measurement or auto-event tracking was enabled without a detailed audit of what is actually being sent.

The result is excessive and unnecessary data collection.
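For most form analysis, the values are not needed at all: knowing which fields were filled in or left empty answers the usual questions about abandonment. A Python sketch with a hypothetical FormField type, summarising a submission without any entered text:

```python
from dataclasses import dataclass


@dataclass
class FormField:
    """Hypothetical representation of a form field at submit time."""
    name: str
    value: str


def form_summary_event(form_id: str, fields: list[FormField]) -> dict:
    """Summarise a form for analytics using field names only, never values."""
    return {
        "form_id": form_id,
        "fields_filled": sorted(f.name for f in fields if f.value.strip()),
        "fields_empty": sorted(f.name for f in fields if not f.value.strip()),
    }


print(form_summary_event("contact", [
    FormField("email", "john.smith@example.com"),
    FormField("phone", ""),
    FormField("message", "My invoice is wrong"),
]))
# {'form_id': 'contact', 'fields_filled': ['email', 'message'], 'fields_empty': ['phone']}
```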

8. Overexposing user data via the dataLayer

Developers sometimes push full user objects into the dataLayer for convenience:

Email
Name
Phone number
Internal identifiers

Once this data is in the dataLayer, every script on the page can read it. Most tags probably never use it, but nothing prevents them from doing so.

These identifiers are often hashed with SHA-256. But hashing does not anonymise them: the same email address always produces the same hash, so it can still be used as a key to identify and match people.

As a result, analytics tools, advertising pixels, and marketing automation scripts may all receive more personal data than they require.

This conflicts with GDPR principles of data minimisation and purpose limitation.
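Both points can be sketched in a few lines of Python. The normalisation (trim, lower-case) mirrors what ad platforms commonly require before hashing; datalayer_user and its customer_segment field are hypothetical examples of a purpose-specific object.

```python
import hashlib


def normalised_sha256(email: str) -> str:
    """Trim, lower-case and SHA-256 hash an email address (hex digest)."""
    return hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()


# Deterministic: the same address always produces the same digest, so anyone
# who already knows an address can recompute and match it.
print(normalised_sha256("John.Smith@example.com")
      == normalised_sha256("  john.smith@example.com "))  # True


def datalayer_user(user: dict) -> dict:
    """Expose only purpose-specific fields instead of the full user record."""
    return {
        "logged_in": True,
        "customer_segment": user.get("segment", "unknown"),
    }


full_user = {"email": "John.Smith@example.com", "name": "John Smith",
             "phone": "0401234567", "crm_id": "L-99812", "segment": "b2b"}
print(datalayer_user(full_user))  # {'logged_in': True, 'customer_segment': 'b2b'}
```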

9. Integrations multiply exposure

Analytics tools are rarely isolated.

They connect to:

Advertising platforms
Marketing automation systems
Data warehouses

If detailed personal data enters analytics, integrations may distribute it further.

A small implementation mistake can therefore propagate across multiple systems, increasing legal, operational, and reputational risk.

10. Click tracking and DOM scraping

Tag managers allow you to capture:

Click text
Link URLs
Visible labels
Surrounding DOM content

This often bypasses structured governance: the values come straight from the page, not from a reviewed data specification.

If click tracking sends:

“Call John Smith” as link text
mailto:john.smith@example.com links
Download links containing invoice numbers
Profile links containing personal names

…you are scraping personal data directly from the page and transmitting it to analytics tools.

The issue is not the click event itself. It is the uncontrolled extraction of detailed content.
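A more controlled alternative is to describe a click by link type and destination host, and never copy link text or the full href into the payload. A Python sketch, with illustrative labels and file extensions:

```python
from urllib.parse import urlsplit

# Illustrative list of file extensions treated as downloads.
DOWNLOAD_EXTENSIONS = (".pdf", ".docx", ".xlsx", ".zip")


def click_event(href: str) -> dict:
    """Describe a click by link type and destination host only.

    Link text, mailto/tel addresses and download file names never enter
    the payload.
    """
    parts = urlsplit(href)
    if parts.scheme == "mailto":
        return {"link_type": "email"}
    if parts.scheme == "tel":
        return {"link_type": "phone"}
    path = parts.path.lower()
    if path.endswith(DOWNLOAD_EXTENSIONS):
        return {"link_type": "download", "file_type": path.rsplit(".", 1)[-1]}
    return {"link_type": "link", "link_host": parts.hostname or "relative"}


print(click_event("mailto:john.smith@example.com"))
# {'link_type': 'email'}
print(click_event("https://example.com/invoices/INV-2024-00123.pdf"))
# {'link_type': 'download', 'file_type': 'pdf'}
```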

Why this matters

Because analytics data is usually personal data in the EU, the key compliance question is proportionality.

Are you collecting only what is necessary for defined analytical purposes?

Or are you transmitting detailed information simply because your tools allow it?

More critically:

Are you storing and processing special categories of personal data without realising it?

If health-related pages, political content, or religious affiliations can be linked to persistent identifiers, your analytics implementation may qualify as high-risk processing.

Analytics is often described as “just statistics.” In reality, it is structured behavioural data tied to identifiable users.

Not knowing what flows into your analytics systems does not reduce responsibility.

What to do next

Start with what is visible:

Export full URLs, referrers, and event parameters from your analytics platform.
Search for email patterns, long numeric identifiers, names, and sensitive keywords.
Review which URL parameters are collected and whether they are genuinely needed.
Audit click tracking and DOM-based variables in your tag manager.
Examine your dataLayer specification and remove direct or unnecessary identifiers.
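For the first two steps, a simple scan over exported values often finds problems quickly. A Python sketch, assuming the export is available as plain strings. The patterns are deliberately rough and the keyword list is a hypothetical example to replace with terms relevant to your site; expect false positives such as timestamps or product codes matching the long-number rule.

```python
import re
from collections.abc import Iterable
from urllib.parse import unquote_plus

# Deliberately rough patterns; every hit needs a human review.
PATTERNS = {
    "email": re.compile(r"[^@\s/?=&]+@[^@\s/?=&]+\.[a-z]{2,}", re.IGNORECASE),
    "long_number": re.compile(r"\d{6,}"),
    # Hypothetical keyword list; replace with terms relevant to your own site.
    "sensitive_keyword": re.compile(r"\b(depression|hiv|addiction|debt)\b", re.IGNORECASE),
}


def scan_values(values: Iterable[str]) -> list[tuple[str, str]]:
    """Return (label, original value) pairs for every pattern that matches."""
    findings = []
    for value in values:
        text = unquote_plus(value)  # exported URLs often contain %40 instead of @
        for label, pattern in PATTERNS.items():
            if pattern.search(text):
                findings.append((label, value))
    return findings


exported = [
    "https://example.com/thanks?email=john.smith%40example.com",
    "https://example.com/orders/12345678",
    "https://example.com/conditions/depression-treatment",
    "https://example.com/about",
]
for label, value in scan_values(exported):
    print(label, value)
```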

In many cases, the root cause lies in website architecture.

If personal data appears in URLs, titles, or search parameters, the long-term fix is to redesign those elements.
If tags are scraping uncontrolled page content, stop relying on raw DOM extraction and tighten governance.
If too much information is exposed in the dataLayer, reduce it to strictly defined, purpose-specific fields.

Then address infrastructure.

Moving to server-side tracking does not automatically solve privacy issues, but it gives you control over what is forwarded to third parties. It allows you to strip unnecessary parameters and enforce data minimisation rules before data leaves your environment.
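Once events pass through a server you control, forwarding can be restricted per destination. A minimal Python sketch, assuming a hypothetical DESTINATION_FIELDS map maintained alongside your tracking plan; destinations that are not listed receive nothing by default.

```python
from typing import Any

# Hypothetical per-destination allowlists, kept in the tracking plan.
DESTINATION_FIELDS: dict[str, set[str]] = {
    "analytics": {"event_name", "page_path", "referrer_origin", "form_id"},
    "advertising": {"event_name", "conversion_value", "currency"},
}


def payload_for(destination: str, event: dict[str, Any]) -> dict[str, Any]:
    """Forward only the fields explicitly allowed for this destination."""
    allowed = DESTINATION_FIELDS.get(destination, set())
    return {key: value for key, value in event.items() if key in allowed}


event = {"event_name": "purchase", "page_path": "/checkout/complete",
         "conversion_value": 120.0, "currency": "EUR",
         "email": "john.smith@example.com", "ip_address": "203.0.113.7"}
print(payload_for("advertising", event))
# {'event_name': 'purchase', 'conversion_value': 120.0, 'currency': 'EUR'}
```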

In summary:

Fix the website design so personal data is not exposed in URLs or titles.
Migrate to server-side tracking to gain control over outgoing data flows.
Minimise personal data at every step of the analytics pipeline.

Effective analytics is not about collecting more data. It is about deliberately collecting less, and only what is necessary for clearly defined purposes.