Is Server-Side Google Analytics the Answer? : Development & Analytics

Note: I am not a lawyer – consult legal counsel before making any changes, or for assistance in assessing compliance or liability.

Warning: I walk through a lot of legal documents in this post and explain how I feel they relate to the technical design of using server side tag management. If you don’t care about the details of the implementation or decision I would recommend skipping it. If you want the TLDR: Scroll to the Conclusion at the bottom.

Previously, I’ve discussed that Google Analytics has been found unlawful in Austria and France. Since then I’ve been seeing a lot of claims that deploying Google Analytics via a server side platform (such as Google Tag Manager’s Server Side container) will make data collection for Google Analytics compliant with GDPR. I am very skeptical about this, based on the below assessment of the Austrian decision and relevant German court case.

The Decision

The decision begins to explore the concept of personal data (to determine if GDPR applies) on page 27. Through the rest of the document, unless otherwise referenced, quoted text is taken from the decision.

Unique online identifiers (“unique identifier”), which both the browser or the device of the complainant and the first respondent (through the Google Analytics Account Identify the ID of the first respondent as the website operator); the address and the HTML title of the website and the sub-pages that the complainant visited; Information about the browser, operating system, screen resolution, language selection as well as the date and time of the website visit; the IP address of the device that the complainant used

The next several pages go on to evaluate the above against Art. 4 No. 1 GDPR.

It begins with the various IDs which are part of the default request:

With regard to the online identifiers, it should be recalled that the cookies “_ga” or “cid” (Client ID) and “_gid” (User ID) contain unique Google Analytics identification numbers and are stored on the device…

The rest of page 28 goes through the reasoning for determining if such IDs are personal data, but ultimately finds:

As an interim result, it should be noted that the Google Analytics identification numbers in question here can be personal data (in the form of an online identifier) in accordance with Art. 4 No. 1 GDPR.

So now we know that the identifiers stored in the cookies placed by the JavaScript can be personal data under GDPR. We’ll come back to this later.

The next few pages discuss how these IDs are combined with other data, and find that the combination of the IDs with the other data makes users more identifiable, not less.

The fulfillment of the requirements of Art. 4 Z 1 GDPR becomes even more clearly recognizable if one takes into account that the identification numbers can be combined with other elements:
By a combination With all of these elements – i.e. unique identification numbers and the other information listed above, such as browser data or IP address – it is all the more likely that the complainant can be identified (see recital 30 GDPR). Such a combination makes the complainant’s “digital footprint” even more unique.

The IP Address anonymization issue isn’t actually settled by the decision, as it wasn’t a factor: the site targeted by the complaint did not have it enabled.

The respondents’ submissions relating to the “anonymization function of the IP address” can remain open since the respondents have admitted that this function was not implemented correctly (at the time at which the complaint was made) (cf., for example, the respondent’s statement of June 18, 2021 ).

Likewise, the question of whether an IP address isolated is considered a personal date, remain open, as this – as mentioned – with further elements (in particular the Google Analytics identification number) combined can be. In this context, it should be noted that according to the case law of the European Court of Justice, the IP address can represent a personal date (see the judgments of the European Court of Justice of June 17, 2021, C 597/19, margin no. 102, as well as of October 19, 2016, C 582 / 14, margin no.49) and this does not lose its character as personal data simply because the means of identifying it are with a third party.

Since the IP Address issue isn’t a factor in this case – one could reasonably assume that leveraging the anonymization feature isn’t going to change the ultimate outcome of the decision, because by default the various IDs and other browser fingerprint data are personal data that make a person more identifiable, not less.

Pages 29 – 31 explore if the personal data is specific enough to make someone actually identifiable.

In the present case, however, there is now particular Actors who have specialist knowledge that makes it possible, in the sense of the above, to establish a reference to the complainant and therefore to identify him.
bottom of page 29

It determines, based on Google’s response (Question 9), that Google can ultimately identify him unless specific Google Account features are disabled. However, the decision goes on to say that it need only be possible to identify him, not that the identification has actually taken place.

In this context, reference should be made expressly to the unambiguous wording of Art. 4 Z 1 GDPR, which is addressed to a Be able is linked to (“can be identified”) and not to whether an identification is ultimately also carried out.

Pages 30 and 31 specifically mention that US intelligence services may also be able to identify the user. Google was unable to prove this wasn’t the case, and the decision judged from the transparency reports Google publishes that it does indeed provide information to the Government.

For the sake of completeness, this references the decision to end the Privacy Shield, which was based on an evaluation of the US FISA Section 702, Executive Order 12333 and Presidential Policy Directive 28.

The rest of the Austrian decision goes on to evaluate the data transfer to the USA (which Google detailed in their response) and whether that was legal under GDPR. It found that the transfer wasn’t legal, based on its analysis and again referencing the US laws.

Now that we know what we’re dealing with – let us examine whether the claim that server side data collection is compliant with GDPR holds up, based on what we’ve established above.

Server Side

Routing Google Analytics through a Server Side container is possible and gives you a lot of control over the specific pieces of data you send to Google. Since you can’t make Google Analytics legally compliant due to Chapter V of the GDPR governing international data transfers, you have to adjust the request so dramatically that GDPR doesn’t apply because the data isn’t personal data (which, as you can see from the above, can be a high bar).

With that being said, there are specific pieces of information which have to be included in the request, which we explore below.

Universal Analytics

For Universal Analytics there are specific fields required as part of an analytics hit. In an exchange over Twitter, Simo Ahava was kind enough to point out the fields for page view tracking.

?v=1&t=pageview&dl=&cid=&tid=UA-12345-1

When we look at the Measurement Protocol Reference for Universal Analytics we see this relates to the following fields:

v relates to Protocol version
t relates to Hit Type
dl relates to Document Location URL (technically optional, but required for page view tracking)
cid relates to Client ID
tid relates to Tracking ID
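As a rough illustration of how small that minimum hit is, here is a sketch that assembles the five fields into a query string. The tracking ID is the one from the example above; the page URL and the client ID value are placeholders I made up, and nothing is actually sent anywhere.

```python
from urllib.parse import urlencode

# Placeholder values for illustration only. The page URL and client ID
# are invented; the tracking ID is the one from the example hit above.
params = {
    "v": "1",                             # Protocol version
    "t": "pageview",                      # Hit type
    "dl": "https://example.com/pricing",  # Document location URL
    "cid": "555.1234567890",              # Client ID
    "tid": "UA-12345-1",                  # Tracking ID
}

# urlencode percent-encodes the URL so it is safe inside a query string.
query = urlencode(params)
print("?" + query)
```

Even in this stripped-down form, the cid value travels with every hit.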

However we know from the decision that the critical component here is the cid parameter. Thus we can reasonably conclude that while IP Address anonymization is likely good it doesn’t on its own address the finding of the decision.

Could we hash the client id? Sure. However that only turns the client id into pseudonymous data, which likely isn’t going to change the ultimate outcome of a determination under Art 4 of the GDPR.
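To make the pseudonymous point concrete, here is a minimal sketch of hashing a client id with SHA-256. The salt and the id values are hypothetical, and this is not how any particular server side container does it.

```python
import hashlib


def hash_client_id(client_id: str, salt: str = "example-salt") -> str:
    """Return the SHA-256 hex digest of a salted client id (salt is hypothetical)."""
    return hashlib.sha256((salt + client_id).encode("utf-8")).hexdigest()


first_visit = hash_client_id("555.1234567890")
second_visit = hash_client_id("555.1234567890")
other_visitor = hash_client_id("555.0987654321")

# The same browser always maps to the same hash, so the hashed value
# still singles out one visitor across hits.
print(first_visit == second_visit)   # True
print(first_visit == other_visitor)  # False
```

The hash hides the original number, but it is still a stable, unique identifier for the same browser, which is exactly what the decision was concerned with.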

Could we fully randomize the id or provide a static id based on the server? Yes. However, as soon as you do that you destroy the session reporting of Google Analytics. Since Universal Analytics is built around the concept of a User Session, doing this is, let me say, sub-optimal.
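Here is a small sketch of why randomizing breaks session reporting. It assumes a fresh UUID is generated for every hit, and the grouping step is only my stand-in for how hits get joined into users and sessions, not Google's actual processing.

```python
import uuid


def random_client_id() -> str:
    """Generate a fresh, unlinkable id for every single hit."""
    return str(uuid.uuid4())


# Two page views from the same visitor during the same visit.
hits = [
    {"t": "pageview", "dl": "https://example.com/", "cid": random_client_id()},
    {"t": "pageview", "dl": "https://example.com/pricing", "cid": random_client_id()},
]

# Group hits by client id, as a stand-in for session stitching.
by_client = {}
for hit in hits:
    by_client.setdefault(hit["cid"], []).append(hit["dl"])

# One visitor now looks like two separate one-page visitors.
print(len(by_client))
```

A static id has the opposite problem: every visitor collapses into a single "user".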

Could we capture other data such as browser data or campaign performance? The more data you add, the less likely this is compliant, and later in the post I go on to show that the effort (or indeed the request) may not matter.

Google Analytics 4

GA4 wasn’t evaluated in the decision, but using the decision as a basis for what may ultimately be acceptable we can reference the Measurement Protocol spec for GA4 to see what the minimum parameters required are for collection. For the Web channel these are:

Client ID / App Instance ID
Server Data
Event Data
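For a sense of what that looks like on the wire, here is a sketch of a minimal JSON body for a web hit. The field names (client_id, events, name, params) reflect my understanding of the GA4 Measurement Protocol and should be checked against the spec; the values are placeholders. As I read it, the remaining required values travel as query parameters on the request rather than in the body, so they are left out here.

```python
import json

# Placeholder values; field names are my reading of the GA4 Measurement
# Protocol and should be verified against the official reference.
payload = {
    "client_id": "555.1234567890",  # Client ID (web streams)
    "events": [                     # Event data
        {
            "name": "page_view",
            "params": {"page_location": "https://example.com/pricing"},
        }
    ],
}

body = json.dumps(payload)
print(body)
```

As with Universal Analytics, the identifier is part of the required minimum.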

If we pull apart the required fields, it’s likely the first bullet which is going to be relevant. We discussed the Client ID above under Universal Analytics, but let’s look at the App Instance ID.

The App Instance ID fulfills several functions, but the critical one in my opinion is:

Instance ID is unique across all app instances across the world, so your database can use it to uniquely identify and track app instances
https://developers.google.com/instance-id

With this being the case, I am skeptical it would be viable in light of the decision. I have to believe that this would be declared personal data under GDPR if it was evaluated by a DPA.

So we’re back where we left off with Universal Analytics. I have to believe that even if you somehow made the client id not personal data and modified the request to the point that the entire collection was anonymous enough, you’d lose so much of the feature set promised by the solution that it likely wouldn’t be viable.

The CLOUD Act

Now that we’ve gone through the decision and what modifying the request for a server-side connection could look like, I need to mention the US CLOUD Act. You may remember above I referenced a German court case. In this case a University was barred from loading a consent manager because the Cloud hardware was owned by a US Company, and thus could be subject to the CLOUD Act.

Regarding the service “C[xxx]bot”, a consent manager of the Danish provider Cy. A/S, the applicant states that the same data as for the ‘G. Tag Manager” would be sent to C[xxx]bot. It is true that this service is offered by a company established in Denmark. However, the target domain consent.c[xxx]bot.com points to a server with an IP address pointing to the US-based cloud hosting company Ak. Technologies Inc. was registered. Although the server may be in the EU, the US company has access to it, so the US Cloud Act applies.
https://rewis.io/urteile/urteil/2tj-01-12-2021-6-l-73821wi/

The court decided that under the act the US intelligence services could compel the US company to turn over the data regardless of the fact that the data (and indeed the server) sit inside of the EU.

Under the Cloud Act, US government agencies could request personal data from US companies unilaterally, without a court order and without a mutual legal assistance treaty. This contradicts Articles 7, 8, 11 and 52 (1) GrCh and the interpretation of these norms by the ECJ, according to which official access to traffic data is only permitted if there is a suspicion of serious crime and is subject to the reservation of the judge or an independent authority. In contrast, the US legal situation allows the initial suspicion of any criminal offense to suffice. Thus, the respondent, as the person responsible, exposes the applicant’s personal data to the risk of unauthorized access, which constitutes a breach of confidentiality pursuant to Article 32(1)(b) GDPR.
https://rewis.io/urteile/urteil/2tj-01-12-2021-6-l-73821wi/

It then determined if Akamai would need to comply with the act.

Because Ak. As a US company, Technologies Inc. is subject to the US Cloud Act, a US federal law of February 6, 2018. According to this, US providers of electronic communications or remote computing services are obliged to disclose all data in their possession, custody or control (“possession, custody or control”), regardless of whether the data is inside or outside are stored in the USA (Title 18 USC § 2713) (cf. Kühling/Buchner/Schröder, 3rd edition 2020, GDPR Art. 48 para. 25).

So while the rest of this case goes on to explain what that ultimately means in the context of the case, it found that loading or sending data to servers located in the EU may violate GDPR if the servers are also subject to the CLOUD Act by virtue of the ownership of the hardware.

While this law wasn’t applied in the Austrian ruling, I have to wonder if we’re going to ultimately see the existence of the CLOUD Act used to justify GDPR decisions going forward on a broader level.

Logically, modifying the request via a server side container prior to transmission to Google Analytics wouldn’t be sufficient if Google (or another US based Cloud Company) got the original client based request (and thus had access to information such as the IP Address, cookies, and browser fingerprint) on connection to the server container.

Under this scenario the only solution would be to leverage a Docker container of Google Tag Manager Server Side, deploy it to an EU-owned cloud service, and modify the request so dramatically that the data flowing out to Google Analytics isn’t subject to GDPR because it doesn’t contain enough data to be considered personal data per Art 4 of the GDPR. It is not possible to change the endpoint of the data collection, and even if you could, the endpoint would also have to be on an EU-owned cloud provider.
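To give a feel for how dramatic that modification would have to be, here is a sketch of an allowlist filter applied before a hit is forwarded. It is plain Python for illustration, not Google Tag Manager template code. The parameter names (uip, ua, sr, ul) are Universal Analytics Measurement Protocol fields as I understand them, the allowlist itself is my own hypothetical choice, and because cid is required the sketch swaps in a fresh random value.

```python
import uuid

# Hypothetical allowlist: everything not named here is dropped.
ALLOWED = {"v", "t", "dl", "tid"}


def strip_hit(incoming: dict) -> dict:
    """Keep only allowlisted fields and replace the client id with a random one."""
    outgoing = {key: value for key, value in incoming.items() if key in ALLOWED}
    outgoing["cid"] = str(uuid.uuid4())  # required field, but no longer tied to the visitor
    return outgoing


incoming = {
    "v": "1",
    "t": "pageview",
    "dl": "https://example.com/pricing",
    "cid": "555.1234567890",  # original client id from the cookie
    "tid": "UA-12345-1",
    "uip": "203.0.113.7",     # visitor IP address
    "ua": "Mozilla/5.0",      # user agent
    "sr": "1920x1080",        # screen resolution
    "ul": "en-us",            # user language
}

outgoing = strip_hit(incoming)
print(sorted(outgoing))
```

Note that the filter only helps for what leaves the container. Whoever hosts the container still receives the original request with the IP address, cookies and browser details intact.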

In Conclusion

What I’ve shown is that while with enough effort you may be able to get a Google Analytics request anonymous enough to not be subject to GDPR, you forfeit a lot of the functionality of the report suite in the process. Ultimately all the effort may not matter once you additionally consider the CLOUD Act and what that means.

For an Enterprise spending six figures on a GA360 license that’s a lot of hoops to jump through in order to keep an analytics package compliant in the middle of an international law dispute. Presumably they are spending that money because they want to leverage the Google Analytics feature set, much of which you end up giving up in the process of making the data flow compliant with the GDPR. I could easily see that as a deal breaker for a lot of companies, considering there are EU regional analytics packages which don’t require nearly as much effort or cost to get up and running.

I personally just don’t see a way to make Google Analytics work where the result justifies the effort in light of these cases. Even if Google is working on additional controls to help enable users, I don’t see how they can avoid being subject to the CLOUD Act as a US based business. It may be my lack of legal knowledge showing, but I really think we’re at an impasse until the EU and USA reach an agreement regarding data transfers.

I also need to point out this isn’t just a Google Analytics problem. It applies to a number of products and companies operating out of the USA, or operating on a US Company owned cloud provider in Europe. These first cases have been about Google Analytics, but if the French decision is being truthful, we’re going to see this expand to other vendors in the near future. At this point I think it’s just a matter of time.