Very often, I use content language as a custom dimension in GA4 and other digital analytics tools. It is very easy to do as long as your HTML code defines the language correctly.
Understanding how users interact with different language versions of your site is still one of the most practical analyses you can do. This is particularly relevant in markets like Finland, where many sites operate in two or more languages.
In Universal Analytics, this was typically handled with content groupings or custom dimensions. In GA4, the same outcome is achieved using event parameters and custom dimensions. The logic is familiar, but the implementation is different.
Start with identifying the language
You need a reliable way to determine the language of each page. There are two common approaches.
If your site structure includes language in the URL, that is usually the simplest option. For example, /fi/ and /en/ paths can be used to extract the language directly. In Google Tag Manager, a small Custom JavaScript variable is enough:
function() {
  // First path segment, e.g. "fi" for /fi/page
  return window.location.pathname.split("/")[1];
}
This works well when your URL structure is consistent and enforced.
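The variable above returns whatever the first path segment happens to be, so a URL such as /about would be reported as the language "about". A slightly more defensive version might look like the sketch below. The function name languageFromPath, the list of supported codes, and the "unknown" fallback are assumptions for illustration, so adapt them to your site. In GTM, the same logic would go inside the anonymous function of a Custom JavaScript variable, with window.location.pathname as the input.

```javascript
// Hypothetical allowlist: replace with the language codes your site actually uses.
var SUPPORTED_LANGUAGES = ["fi", "en", "sv"];

// Returns the language code found in the first path segment,
// or "unknown" when that segment is not in the allowlist.
function languageFromPath(pathname) {
  var firstSegment = String(pathname || "").split("/")[1] || "";
  var candidate = firstSegment.toLowerCase();
  return SUPPORTED_LANGUAGES.indexOf(candidate) !== -1 ? candidate : "unknown";
}
```

An explicit fallback value keeps stray paths from showing up as language values in your reports.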
If the language cannot be derived from the URL, the HTML lang attribute is the best alternative. Most well-implemented websites define it at the document level:
<html lang="en">
In this case, you do not need custom JavaScript. A DOM Element variable in GTM is cleaner:
- Selector: html
- Attribute: lang
This approach is generally more maintainable and aligns with accessibility and SEO standards.
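If you ever need the same value outside a DOM Element variable, for instance in a Custom JavaScript variable that falls back to another source, the lookup can be sketched as follows. The function name languageFromDocument and the fallback argument are assumptions for illustration, not part of GTM.

```javascript
// Reads the lang attribute from the root <html> element of the given document.
// Returns the fallback when the attribute is missing or empty.
function languageFromDocument(doc, fallback) {
  var root = doc && doc.documentElement;
  var lang = root && typeof root.getAttribute === "function"
    ? root.getAttribute("lang")
    : null;
  lang = (lang || "").trim();
  return lang !== "" ? lang : fallback;
}

// In a browser: languageFromDocument(document, "unknown")
```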
Send the language to GA4
GA4 does not have the same content grouping configuration as Universal Analytics. Instead, you send the language as an event parameter.
There are two ways to do this.
You can use the built-in parameter content_group. If you send your language value with this parameter, it will populate the “Content group” dimension in GA4 reports. This is quick to implement, but it behaves like a simple label rather than a structured grouping system.
The more robust approach is to define your own parameter, for example page_language. This keeps your data model explicit and avoids mixing different types of grouping logic into one field.
In Google Tag Manager, add the parameter to your Google tag (previously called the GA4 Configuration tag) so it applies to all events:
- Parameter name: page_language
- Value: your GTM variable (from URL or HTML)
If you want, you can also send both:
- content_group = language
- page_language = language
This gives you flexibility in reporting while keeping your data model clean.
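For sites that load gtag.js directly instead of using GTM, sending both parameters could look roughly like this. The helper languageParams is a hypothetical name. The set command is the standard gtag.js way to attach values to subsequent events on the page. The call is guarded, so the snippet does nothing where gtag.js is not loaded.

```javascript
// Builds the two language parameters from one value (hypothetical helper).
function languageParams(language) {
  return {
    content_group: language,
    page_language: language
  };
}

// Assumes gtag.js is already loaded on the page; skipped otherwise.
if (typeof gtag === "function") {
  gtag("set", languageParams(document.documentElement.lang));
}
```

If the initial page_view should carry the values too, place the set call before the config command.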
Side note on naming conventions
I prefer to group all parameters and dimensions related to a specific page using a consistent page_* naming pattern.
For example:
- page_language
- page_author
- page_id
This makes it immediately clear what the dimension describes and helps keep the data model understandable, especially when multiple teams work with the same analytics setup.
Why a custom dimension is usually preferable
Although content_group is convenient, it has limitations.
It is not a true grouping feature in GA4. It does not support rules or multiple groupings. It simply stores whatever value you send. In practice, it behaves like a generic text field with a predefined name.
This often leads to overloading. The same field might be used for language, page type, or content category. Over time, this creates confusion and makes analysis harder.
A custom dimension such as page_language avoids this. It has a single, clearly defined purpose. This improves readability in reports and makes collaboration easier.
Custom dimensions also scale better. You can define separate dimensions for language, page type, and content category without conflicts. This aligns well with how GA4 Explorations are typically used.
Register the custom dimension
If you use a custom parameter like page_language, you must register it in GA4 before it appears in reports.
Go to Admin, then Custom definitions, and create a new dimension:
- Name: Page language
- Scope: Event
- Event parameter: page_language
This step is required. Without it, GA4 will not show the parameter in reports. Registration is also not retroactive: the dimension is populated only from events collected after you create it.
No registration is needed for content_group.
Analyse language performance
Once the data is flowing, you can use it in both standard reports and explorations.
In the “Pages and screens” report, you can switch the primary dimension to Content group, or add Page language as a secondary dimension (or as a primary dimension by customizing the report). This allows you to compare engagement, traffic, and conversions across languages.
Explorations provide more flexibility. You can combine language with traffic source, campaign, or device to understand how different audiences behave across language versions.
Keep the data consistent
Small implementation details have a large impact on usability.
Decide early whether you use short language codes like fi and en, or full locale codes like fi-FI. Mixing formats will make analysis harder. In many cases, simplifying everything to a single format is the best approach.
If needed, you can standardize values in GTM using a Lookup Table or Regex Table before sending them to GA4.
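What a Lookup Table or Regex Table does can also be expressed in a few lines of JavaScript, which is handy if you prefer to normalize inside a Custom JavaScript variable. This sketch assumes you want short, lowercase codes. The function name normalizeLanguage and the "unknown" fallback are illustrative choices.

```javascript
// Reduces locale codes such as "fi-FI" or "en_US" to a short lowercase code.
function normalizeLanguage(value) {
  var raw = String(value || "").trim().toLowerCase();
  // Take the part before the first hyphen or underscore.
  var primary = raw.split(/[-_]/)[0];
  // Accept only two- or three-letter codes; anything else becomes "unknown".
  return /^[a-z]{2,3}$/.test(primary) ? primary : "unknown";
}
```

With this in place, fi, FI, and fi-FI all end up as the same value in your reports.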
It is also worth sending the language parameter with more than just page views if you plan to analyse conversions by language.
The same approach can be used in Piwik PRO and Matomo
The exact same logic applies outside GA4.
In Piwik PRO and Matomo, you can capture the document language using either the URL or the HTML lang attribute and send it as a custom dimension. Both tools support custom dimensions at the visit or action level (Piwik PRO calls these session and event scope), which makes implementation straightforward.
In practice:
- Extract the language using JavaScript or a tag manager
- Store it in a custom dimension (e.g. page_language)
- Use it in reports to compare behaviour across languages
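In Matomo, for example, the steps above can be sketched with the JavaScript tracker's setCustomDimension command. The dimension ID 1 is a placeholder for the ID Matomo assigns when you create the dimension, and queueLanguageDimension is a hypothetical helper name. Piwik PRO's tracker uses a similar command queue, but check its documentation for the exact command name.

```javascript
// Queues the language as a custom dimension on a Matomo-style command queue.
// Run this before the trackPageView command so the page view includes the value.
function queueLanguageDimension(queue, dimensionId, language) {
  queue.push(["setCustomDimension", dimensionId, language]);
  return queue;
}

// In a browser with the Matomo tracking code:
//   window._paq = window._paq || [];
//   queueLanguageDimension(window._paq, 1, document.documentElement.lang);
```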
The main difference is that these tools still have a more traditional reporting model, closer to Universal Analytics. However, the underlying idea remains exactly the same.
Final thoughts
The core idea has not changed. You identify the language, capture it, and use it to segment your data.
In GA4 and other modern analytics tools, this is implemented through parameters and custom dimensions. If done cleanly, it becomes one of the most useful ways to understand how your content performs across different audiences.