
Why Cross Domain Analytics is Difficult

It seems on the surface that it should be easy. You want to know if the user on your site is the same as the user you saw on another site you also own. Your developers may be pushing back, or you may have a solution that simply doesn’t work in certain scenarios.

Without going into the legal concerns, let us review why this is technically difficult to get right.

Why is it so hard?

What may not be evident at first glance is that the technical architecture of the web is changing. In particular, browsers are taking steps to make cross-domain tracking more difficult in response to abuse by certain advertising and social media companies. In seeking to stop those abuses, more benign use cases get caught in the crossfire.

In effect, trying to do this means fighting a battle against something that browsers such as Safari and Firefox are actively trying to eliminate.

What we end up with is a complex, misconfiguration-prone process with edge cases that may not be considered at all when addressing the question of whether the user on sitea.com is the same user as on siteb.com.

Current Client Side Solutions

Both Adobe Analytics and Google Analytics have solutions that can be implemented as an extra step when configuring their respective analytics products.

These solutions work by using JavaScript to read a storage location (typically a first-party cookie) and then appending the relevant IDs to the outbound destination URL.

For example, the outbound link to siteb.com would become:

siteb.com?clientId=A1234B
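As a rough illustration of the link-decoration step, the TypeScript sketch below reads a client ID from a first-party cookie string and appends it as a `clientId` parameter to links that point at another tracked domain. The cookie name `_cid`, the domain list, and the helper names are hypothetical; Adobe’s and Google’s libraries use their own names and formats.

```typescript
// Hypothetical names: the "_cid" cookie and the domain list are illustrative only.
const CLIENT_ID_COOKIE = "_cid";
const LINKER_PARAM = "clientId";
const LINKED_DOMAINS = ["sitea.com", "siteb.com"];

// Read one cookie value from a document.cookie-style string ("a=1; b=2").
function readCookie(cookieString: string, name: string): string | null {
  for (const part of cookieString.split(";")) {
    const [key, ...rest] = part.trim().split("=");
    if (key === name) {
      return decodeURIComponent(rest.join("="));
    }
  }
  return null;
}

// Return the tracked domain a hostname belongs to (including subdomains), if any.
function linkedDomainOf(hostname: string): string | undefined {
  return LINKED_DOMAINS.find(d => hostname === d || hostname.endsWith("." + d));
}

// Append the client ID to an outbound URL when it crosses to another tracked domain.
function decorateOutboundUrl(href: string, cookieString: string, currentHost: string): string {
  const url = new URL(href);
  const target = linkedDomainOf(url.hostname);
  const clientId = readCookie(cookieString, CLIENT_ID_COOKIE);
  // Leave unrelated links, same-domain links, and users without an ID untouched.
  if (target === undefined || target === linkedDomainOf(currentHost) || clientId === null) {
    return href;
  }
  url.searchParams.set(LINKER_PARAM, clientId);
  return url.toString();
}
```

In a browser this could be wired into a click handler, for example `anchor.href = decorateOutboundUrl(anchor.href, document.cookie, location.hostname)`, so the ID is read at the moment of navigation rather than when the page first loads.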

The destination domain (siteb.com) then reads the clientId from the URL and rebuilds the state as a 1st party cookie on siteb.com before sending the analytics hit to the vendor. Because the ID matches on sitea.com and siteb.com, the active session is preserved as a cross-domain session.
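The receiving side can be sketched the same way. This hypothetical helper takes the current URL, pulls out the `clientId` parameter, and returns a first-party cookie string for the destination domain along with a copy of the URL with the parameter removed. The cookie name, attributes, and lifetime are assumptions for illustration, not any vendor’s actual behavior.

```typescript
// Hypothetical names, matching the link-decoration sketch.
const LINKER_PARAM = "clientId";
const CLIENT_ID_COOKIE = "_cid";

interface LinkerResult {
  clientId: string | null; // ID carried over from the other domain, if present
  cleanUrl: string;        // URL with the linker parameter removed
  cookie: string | null;   // string to assign to document.cookie, if any
}

// Read the linker parameter and build a first-party cookie for this domain.
function consumeLinker(href: string, cookieDomain: string, maxAgeSeconds: number): LinkerResult {
  const url = new URL(href);
  const clientId = url.searchParams.get(LINKER_PARAM);
  if (clientId === null || clientId === "") {
    return { clientId: null, cleanUrl: href, cookie: null };
  }
  url.searchParams.delete(LINKER_PARAM);
  const cookie =
    `${CLIENT_ID_COOKIE}=${encodeURIComponent(clientId)}; ` +
    `Domain=${cookieDomain}; Path=/; Max-Age=${maxAgeSeconds}; SameSite=Lax; Secure`;
  return { clientId, cleanUrl: url.toString(), cookie };
}
```

In the browser you would assign `result.cookie` to `document.cookie`, and could pass `result.cleanUrl` to `history.replaceState`, before the first analytics hit is sent. Stripping the parameter is a choice made for this sketch: it makes the ID less likely to be bookmarked or copied from the address bar, but it does not help once the decorated link itself has been shared.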

In most cases the solution also needs to work in reverse, with siteb.com undertaking the same process when directing outbound links to sitea.com.

So why not use 3rd party cookies?

Traditionally, 3rd party cookies were used for cross-domain tracking. This is no longer viable for large market segments and will likely cease to be viable entirely by the end of 2022.

WebKit, the engine that powers Safari as well as every browser on iPhone and iPad, has blocked 3rd party cookies entirely by default since Safari 13.1 and iOS/iPadOS 13.4. Any solution that relies on 3rd party cookies will see the user as two different people, and the session will break at the cross-domain jump when using Safari or one of these operating systems.

Chrome has pledged to eliminate 3rd party cookies by the end of 2022.

Given that WebKit controls a full half of the mobile ecosystem in North America (and roughly 80% of all active Apple devices run iOS 14), and that Chrome holds upwards of 69% market share, any solution built on a technology that browsers are actively phasing out is of questionable value today and will stop working entirely for the majority of users.

What edge cases are there?

What’s in a URL?

The primary issue I see is that the URL solution described above is the only mechanism deployed to address the tracking requirements. When the solution depended on cookies, it was deterministic, as the chances of a user having a specific cookie without having visited a specific domain were very low.

Now that cookies are no longer the bridge between domains, the only remaining link is probabilistic in nature. Chances are the visitor came from sitea.com on their way to siteb.com, but the URL may also have been shared, in which case the session now contains multiple people, potentially in a number of locations, all of whom can corrupt the data collected for the user you were actually interested in.

Further, the URL that contains the client ID may be bookmarked, which is its own special problem if, for example, you wanted to force a new client ID into use.

Addressing this requires non-standard logic, often based on timestamps, which bring their own special array of issues.
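One common shape for the timestamp approach is to embed the time the link was decorated alongside the ID, then reject the link on arrival if it is too old. The sketch below assumes a hypothetical `<clientId>.<issuedAtSeconds>` format and an arbitrary two-minute window; it is a heuristic, not a guarantee.

```typescript
// Hypothetical linker value format: "<clientId>.<issuedAtSeconds>".
const MAX_LINK_AGE_SECONDS = 120; // assumed window; tune to your own traffic

// Called on the source domain at the moment the link is decorated.
function encodeLinkerValue(clientId: string, nowMs: number): string {
  return `${clientId}.${Math.floor(nowMs / 1000)}`;
}

// Called on the destination domain; returns the client ID only for fresh links.
function decodeLinkerValue(value: string, nowMs: number): string | null {
  const dot = value.lastIndexOf(".");
  if (dot <= 0) return null; // no timestamp, or no ID before it
  const clientId = value.slice(0, dot);
  const issuedAt = Number(value.slice(dot + 1));
  if (!Number.isInteger(issuedAt)) return null;
  const age = Math.floor(nowMs / 1000) - issuedAt;
  // Reject stale links (likely bookmarked or shared later) and links issued
  // "in the future", which points to a different device's clock or tampering.
  if (age < 0 || age > MAX_LINK_AGE_SECONDS) return null;
  return clientId;
}
```

Because `lastIndexOf` is used, client IDs that themselves contain dots still decode correctly. The plain-text timestamp can be edited by anyone, and a shared link opened within the window is still accepted, so this narrows the problem rather than solving it.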

Privacy-focused browsers such as Brave also strip certain known tracking parameters from URLs during navigation, so it is worth confirming that the linker parameter you rely on is not among them.

What’s with Referrals?

Some analytics platforms treat a referral from a domain that is not first party as the start of a new session. To avoid this, you need to tell those platforms explicitly that siteb.com should not start a new session when it appears as the referrer for a hit. Failing to do so will fragment sessions whenever the user passes between domains.

You have to do this for every domain that cross-domain tracking is intended to cover.
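The referral rule amounts to a domain check like the one sketched here. In practice this is usually configuration in the analytics product (Google Analytics calls it a referral exclusion list) rather than code you write; the function and list are hypothetical and only illustrate the decision being made for each hit.

```typescript
// Hypothetical exclusion list: referrers from these domains must not start a new session.
const REFERRAL_EXCLUSIONS = ["sitea.com", "siteb.com"];

// True if hostname is the domain itself or one of its subdomains.
function matchesDomain(hostname: string, domain: string): boolean {
  return hostname === domain || hostname.endsWith("." + domain);
}

// Decide whether a hit's referrer counts as an external referral (new session).
function isExternalReferral(referrer: string, exclusions: readonly string[]): boolean {
  if (referrer === "") return false; // no referrer at all: nothing to evaluate
  let host: string;
  try {
    host = new URL(referrer).hostname;
  } catch {
    return false; // malformed referrer: do not start a session on it
  }
  return !exclusions.some(d => matchesDomain(host, d));
}
```

Note that a plain suffix match on `"sitea.com"` would wrongly exclude `notsitea.com`; requiring either an exact match or a leading dot avoids that.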

What about iFrames?

Using iFrames whose source is a different domain creates an entire class of problems: they may have to deal with the URL and referrer issues above, as well as account for potential race conditions between the parent page and the iFrame.

The Google documentation for this is fairly good. However, failing to account for every scenario here (such as the analytics logic loading slowly, or failing to load at all, on the parent page) may result in a different session on the iFrame domain.
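For the iFrame case, one common pattern is for the parent page to pass its client ID to the iFrame with `window.postMessage`, with the iFrame validating the sender’s origin and giving up after a timeout. The message shape, trusted origin, and timeout are assumptions, and `IncomingMessage` is a minimal stand-in for the browser’s `MessageEvent` so the logic can be exercised outside a browser.

```typescript
// Hypothetical message shape sent from the parent page to the iFrame.
interface ClientIdMessage {
  type: "clientId";
  clientId: string;
}

// Minimal subset of MessageEvent used by this sketch.
interface IncomingMessage {
  origin: string;
  data: unknown;
}

const TRUSTED_PARENT_ORIGINS = ["https://sitea.com"]; // assumption

function isClientIdMessage(data: unknown): data is ClientIdMessage {
  return typeof data === "object" && data !== null &&
    (data as { type?: unknown }).type === "clientId" &&
    typeof (data as { clientId?: unknown }).clientId === "string";
}

// Accept a client ID only from a trusted parent origin with the expected shape.
function acceptClientId(msg: IncomingMessage): string | null {
  if (!TRUSTED_PARENT_ORIGINS.includes(msg.origin)) return null;
  return isClientIdMessage(msg.data) ? msg.data.clientId : null;
}

// Race the parent's reply against a timeout, so a parent whose analytics loads
// slowly (or not at all) does not leave the iFrame waiting forever.
function waitForClientId(
  subscribe: (handler: (msg: IncomingMessage) => void) => void,
  timeoutMs: number,
): Promise<string | null> {
  return new Promise(resolve => {
    const timer = setTimeout(() => resolve(null), timeoutMs);
    subscribe(msg => {
      const id = acceptClientId(msg);
      if (id !== null) {
        clearTimeout(timer);
        resolve(id);
      }
    });
  });
}
```

In a browser the iFrame might call `waitForClientId(h => window.addEventListener("message", e => h(e)), 2000)`. A `null` result is exactly the edge case described above: the iFrame has to fall back to its own ID, and therefore its own session. Because a message posted before the iFrame’s listener is attached is simply lost, a more robust variant has the iFrame request the ID from the parent once its listener is ready.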

What about Intelligent Tracking Prevention?

In scenarios where sitea.com is classified as a known tracker by ITP’s machine learning, the query-string linker may cause cookies created via JavaScript to be capped at a maximum lifetime of 24 hours. So if the intention is to track the user across sessions as well as across domains, this edge case needs to be handled by pairing the linker with a custom cookie persistence layer.
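One form such a persistence layer can take is having your own first-party server re-issue the client ID in a `Set-Cookie` response header, since the ITP lifetime caps described above target cookies written through `document.cookie`. This is a sketch under that assumption: the cookie name and lifetime are hypothetical, and ITP revisions also cap some server-set cookies (for example, those set by CNAME-cloaked third-party servers), so check the current WebKit rules before relying on it.

```typescript
const CLIENT_ID_COOKIE = "_cid"; // hypothetical cookie name
const DAY_SECONDS = 24 * 60 * 60;

// Build a Set-Cookie header value for the client ID, to be sent by your own
// first-party server (for example, in the response to the page request).
// Deliberately not HttpOnly: the analytics JavaScript still needs to read it.
function buildClientIdSetCookie(clientId: string, domain: string, lifetimeDays: number): string {
  // Refuse values that could inject extra cookie attributes.
  if (!/^[A-Za-z0-9._-]+$/.test(clientId)) {
    throw new Error("unexpected characters in client ID");
  }
  return [
    `${CLIENT_ID_COOKIE}=${clientId}`,
    `Domain=${domain}`,
    "Path=/",
    `Max-Age=${lifetimeDays * DAY_SECONDS}`,
    "Secure",
    "SameSite=Lax",
  ].join("; ");
}
```

With Node’s `http` module, for example, the value could be sent via `res.setHeader("Set-Cookie", buildClientIdSetCookie(id, "siteb.com", 365))`.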

Are there plans to make this easier for benign use cases?

You could say that, until today (4/7/21), there were. Google proposed to the W3C the concept of First Party Sets, under which a company could allow cross-domain tracking between different domains it owns.

This proposal underwent W3C TAG review today and was rejected, with the TAG stating:

For the reasons outlined here, we consider the First Party Sets proposal harmful to the web in its current form. This proposal undermines the concept of origin, and we see origin as a load-bearing structural pillar of web architecture. There are strong objections by other implementers. See https://github.com/mozilla/standards-positions/issues/350 / Mozilla standards position and Webkit-dev position. Without strong multi-implementer consensus, we think making such a change to a piece of fundamental web architecture will additionally fragment the web platform.

We believe the pushback from other implementors is a strong message that reinforces our concerns that this proposal can result in detrimental effects to the greater web ecosystem. It is likely that this proposal only benefits powerful, large entities that control both an implementation and services.

https://github.com/w3ctag/design-reviews/blob/main/reviews/first_party_sets_feedback.md

Google can now either rework the proposal or attempt to go it alone, at the risk of fragmenting the web at large.

What should be taken from the above is that no easy solution for cross-domain tracking is on the horizon that can be depended on to make this scenario easier, and in all likelihood it will continue to get harder as time goes on.

Updated 4/8/21: to remove a mistaken reference that the Brave browser would strip the linker parameters.

Published in Browser Updates, Privacy