Connecting dirty GA4 data to Claude is like playing with matches in a gas station

Many companies are now experimenting with AI-assisted analytics. The idea is simple: connect GA4 data to Claude, ChatGPT, Gemini, or an internal AI assistant, and ask it to analyse customer journeys, explain conversion problems, or find marketing opportunities.

On paper, this sounds useful. In practice, it can become a serious privacy and governance risk very quickly, especially if the analytics implementation has never been properly audited.

The problem is not AI itself. The problem is that many GA4 implementations are dirty, and AI can make existing data quality and privacy problems much more visible.

I have seen very sensitive data in analytics tools

Over the years, I have seen GA4 and other analytics implementations containing data that should never have been collected there. In many cases, the companies were completely unaware that the data even existed inside their analytics platform.

For example, I have seen GA4 reports with:

social security numbers
names and addresses
email addresses
phone numbers
passwords
car registration numbers
customer IDs linked to real individuals
free-text form inputs
internal support messages
search queries revealing medical concerns
data that can be used to infer health information

In some situations, analytics data has also revealed financial difficulties, political interests, religious affiliations, or other highly sensitive details.

Usually, nobody collected this data intentionally. It leaked into analytics through URLs, form fields, internal site search, custom dimensions, dataLayer implementations, CRM integrations, or poorly designed event tracking.
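A first pass at finding this kind of leak can be automated. The following is a minimal sketch, assuming you have already pulled a sample of page URLs, search terms, or parameter values into a list of strings. The pattern set and the function names `find_pii` and `scan_values` are illustrative assumptions, not part of any GA4 tooling, and the patterns are deliberately crude: they will miss URL-encoded values and national ID formats, and the phone pattern will also flag long numeric IDs.

```python
import re

# Illustrative patterns only. A real audit needs locale-specific rules
# (for example national ID formats) and manual review of every match.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "phone": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def find_pii(value):
    """Return the sorted names of all patterns that match the given string."""
    return sorted(
        name for name, pattern in PII_PATTERNS.items() if pattern.search(value)
    )


def scan_values(values):
    """Map each suspicious value to the pattern names it matched."""
    hits = {}
    for value in values:
        matched = find_pii(value)
        if matched:
            hits[value] = matched
    return hits
```

Running `scan_values` over a sample of page locations and site search terms gives a shortlist for manual review. It does not prove that the rest of the data is clean.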

The reports still looked normal, and the dashboards still worked. Marketing teams continued using the data, so everyone assumed the implementation was under control.

AI changes the risk profile

Dirty analytics data has always been a problem, but AI makes the problem significantly bigger.

When people connect GA4 to AI tools, whether through an MCP server or by pointing the tool directly at BigQuery export tables, they often expose much more raw data than they realise.
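One way to see how much an export actually carries is to list every event parameter key together with a few sample values. The sketch below assumes rows shaped like the GA4 BigQuery export, where `event_params` is a repeated record of `key` and `value.string_value`, already loaded as Python dictionaries. The function name `inventory_params` and the sample limit are hypothetical.

```python
from collections import defaultdict


def inventory_params(rows, max_samples=3):
    """Collect, per event parameter key, a few distinct sample string values."""
    samples = defaultdict(list)
    for row in rows:
        for param in row.get("event_params", []):
            value = param.get("value", {}).get("string_value")
            # Touching the bucket registers the key even without string values.
            bucket = samples[param["key"]]
            if value is not None and value not in bucket and len(bucket) < max_samples:
                bucket.append(value)
    return dict(samples)
```

Keys that nobody on the team recognises, or sample values that look like free text, are the ones to investigate first.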

AI systems encourage people to explore data freely. Users ask broad questions, upload datasets, and copy results into documents, Slack threads, emails, and presentations.

Sensitive details that were previously hidden inside URLs or event parameters can suddenly surface in summaries and answers. Along the way, they are also processed by the AI tool itself.

Now, organisations are experimenting with:

AI-generated customer journey analysis
AI summaries of user behaviour
automated segmentation and profiling
AI copilots connected directly to marketing data warehouses

The technology itself is not the problem. The real issue is that many analytics implementations were never designed properly.

Without anyone realising it, personal data such as health information could end up being used for customer segmentation and user profiling.

This is why connecting dirty GA4 data directly to Claude is like playing with matches in a gas station. Everything may look calm until one overlooked field creates a serious privacy problem.

Do not connect raw analytics exports blindly

AI can absolutely improve analytics work. It can help analysts find patterns, summarise customer journeys, explain anomalies, and generate hypotheses faster than before.

A marketing data warehouse can create a safe layer between raw data collection and AI-assisted analysis.
Mikko Piippo

But the foundation has to be clean. Before connecting GA4 or analytics data to AI systems, companies should first review what kind of data they are actually collecting and storing.

At minimum, organisations should:

audit the analytics implementation carefully
review URLs, events, parameters, and custom dimensions
remove identifiable and sensitive data
build a proper marketing data warehouse instead of exposing raw analytics exports directly
review closely which tables, datasets, and fields AI systems are allowed to access
restrict access to raw exports
create clear governance rules
involve privacy and legal stakeholders early
prefer aggregated or curated datasets whenever possible
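As an illustration of the "remove identifiable and sensitive data" step, the sketch below strips every query parameter that is not on an explicit allowlist before a URL is stored or passed on. The allowlist contents and the function name `strip_query_params` are assumptions for the example. It only handles the query string and fragment, so personal data embedded in the path itself would still need separate treatment.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical allowlist: only these query parameters survive.
ALLOWED_QUERY_KEYS = {"utm_source", "utm_medium", "utm_campaign", "page"}


def strip_query_params(url, allowed=ALLOWED_QUERY_KEYS):
    """Return the URL with non-allowlisted query parameters and the fragment removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key in allowed
    ]
    # The fragment is dropped on purpose, since it can carry data as well.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))
```

An allowlist is used instead of a blocklist because new parameters appear all the time, and unknown ones should be dropped by default.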

A marketing data warehouse can create a safe layer between raw data collection and AI-assisted analysis. But it only helps if the warehouse itself is designed carefully and if someone actively reviews which datasets AI systems are allowed to use.
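A curated layer of this kind could, for example, expose only counts over a small set of approved dimensions and suppress groups that are too small to share safely. The following is a minimal sketch under those assumptions. The threshold of 10, the field names, and the function name `aggregate_events` are hypothetical, and a suitable threshold depends on your own data and legal advice.

```python
from collections import Counter

MIN_GROUP_SIZE = 10  # hypothetical threshold; choose one with privacy stakeholders


def aggregate_events(events, dimensions, min_group_size=MIN_GROUP_SIZE):
    """Count events per combination of approved dimensions.

    Fields not listed in `dimensions` never reach the output, and groups
    smaller than `min_group_size` are suppressed entirely.
    """
    counts = Counter(tuple(event.get(dim) for dim in dimensions) for event in events)
    ordered = sorted(counts.items(), key=lambda item: (-item[1], str(item[0])))
    return [
        {**dict(zip(dimensions, key)), "event_count": count}
        for key, count in ordered
        if count >= min_group_size
    ]
```

The AI tool would then be given access only to the output of a function like this, not to the event-level rows behind it.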

AI does not fix bad data collection

AI does not magically sanitise analytics data. It does not automatically understand legal risks, and it does not know which fields should never have been collected in the first place.

If your GA4 implementation contains social security numbers, passwords, health-related search queries, or identifiable customer information, connecting it to AI will not make the situation better. In many cases, it will simply make the exposure wider, faster, and more difficult to control.


The lesson is simple: before you let AI analyse your marketing data, make sure the data is safe, necessary, and properly governed.

Otherwise, you are not scaling intelligence.

You are scaling exposure.