Analytics in 2019 : Development & Analytics

This is a mirror of my original post on LinkedIn

It’s only July of 2019, but what a year it’s been for analytics. Some of the major highlights follow.

On the Technical Front..

In April, Safari revealed several changes (https://webkit.org/blog/category/privacy/), including Intelligent Tracking Prevention, which, unlike Firefox’s use of Disconnect’s services, happens entirely in the browser via a machine learning model. The downside is that it breaks the existing JavaScript specification, and as a result Safari and other browsers can behave very differently depending on how cookies are set and what the rest of the environment looks like. Is your cookie going to last 2 years, 7 days or 24 hours? There’s no easy way to tell at a glance without running some tests to see what Safari thinks about it. I have written a lot about what this could mean here; my quick take, however, is that this is a maintenance nightmare for developers who may have legitimate reasons for setting cookies on the client, such as shared hosting. This style of prevention absolutely can break same-domain features. Safari is the default browser on Mac OS X and iOS.
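To make the "2 years, 7 days or 24 hours" question concrete, here is a rough sketch of how the lifetime you ask for can differ from the lifetime you get. The 7-day and 24-hour figures are the ones mentioned above; the conditions that trigger each cap (cookie written by script, arrival via a decorated link from a domain Safari has classified as a tracker) are a simplified reading of the WebKit posts and should be treated as assumptions, not as Safari's actual logic. The function and parameter names are made up for illustration.

```python
from datetime import timedelta

# Assumed caps, taken from the figures quoted in the post.
SCRIPT_COOKIE_CAP = timedelta(days=7)
DECORATED_LINK_CAP = timedelta(hours=24)


def effective_lifetime(requested, set_by_script, via_decorated_tracker_link=False):
    """Estimate how long a cookie might survive under the assumed caps.

    requested: the lifetime the site asked for.
    set_by_script: True if the cookie was written client-side (assumption:
        only these cookies are capped).
    via_decorated_tracker_link: True if the visit arrived through a link with
        query parameters from a domain classified as a tracker (assumption).
    """
    if not set_by_script:
        return requested
    cap = DECORATED_LINK_CAP if via_decorated_tracker_link else SCRIPT_COOKIE_CAP
    return min(requested, cap)


two_years = timedelta(days=730)
print(effective_lifetime(two_years, set_by_script=False))
print(effective_lifetime(two_years, set_by_script=True))
print(effective_lifetime(two_years, set_by_script=True, via_decorated_tracker_link=True))
```

The only reliable check is still to set the cookie in Safari and inspect its expiry, since the classification itself is not visible to the page.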

In May, Google announced plans to embrace privacy with its SameSite cookie changes (SameSite cookies explained | web.dev). I think its advertising business means too much for it to just outright disable tracking entirely. Google’s solution is attractive in two ways. First, it prevents unintended data leakage (which is the current state today), and second, it removes the primary attack vector for Cross-Site Request Forgery attacks (https://www.owasp.org/index.php/Cross-Site_Request_Forgery_(CSRF)). I am also happy to see that they have submitted proposals to the W3C working groups to have this expanded cookie functionality added to the specification. The solution shouldn’t break same-domain features, but may break cross-domain features unless vendors update their cookie attributes. This more fine-grained control will enable users to limit which cookies are set and processed, which can affect attribution, segmentation and the like. Chrome is the default browser on Chrome OS and Android.
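As an illustration of what "updating cookie attributes" could look like on the server, here is a minimal sketch using Python's standard `http.cookies` module (the `samesite` attribute needs Python 3.8 or later). The cookie name and value are hypothetical, and pairing `SameSite=None` with `Secure` reflects my understanding of Chrome's announced requirement rather than anything a particular vendor has documented.

```python
from http.cookies import SimpleCookie


def build_cookie(name, value, cross_site):
    """Return the value of a Set-Cookie header with an explicit SameSite policy."""
    cookie = SimpleCookie()
    cookie[name] = value
    morsel = cookie[name]
    morsel["path"] = "/"
    if cross_site:
        # Cookie is meant to be sent in third-party contexts.
        morsel["samesite"] = "None"
        morsel["secure"] = True
    else:
        # Cookie stays first-party; Lax still allows top-level navigations.
        morsel["samesite"] = "Lax"
    return morsel.OutputString()


print(build_cookie("vendor_id", "abc123", cross_site=True))
print(build_cookie("session", "xyz789", cross_site=False))
```

A same-domain cookie only needs the `Lax` (or `Strict`) branch, which is why this change should leave same-domain features alone.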

In June, Firefox announced plans to enable Content Blocking by default (When it comes to privacy, default settings matter! – The Mozilla Blog), and as a result any entity (disconnectme/disconnect-tracking-protection) or service (https://github.com/disconnectme/disconnect-tracking-protection/blob/master/services.json) which matches those files is blocked in the browser. This style of tracking prevention can legitimately break a lot of functionality on websites, even when the resource is used for non-tracking purposes, simply because it’s hosted on a blacklisted domain. This could break same-domain features. The traffic exists but is effectively ‘unseen’ by the blocked services, generating a ‘Phantom’ load on the website that those services cannot observe.
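The effect of list-based blocking can be sketched in a few lines. This is not Firefox's implementation and does not parse the real Disconnect files; the domain set below is hypothetical, and suffix matching on the hostname is an assumption about how such a list would typically be applied.

```python
from urllib.parse import urlsplit

# Hypothetical blocklist; a real one would be loaded from the list files.
BLOCKED_DOMAINS = {"tracker.example", "ads.example"}


def is_blocked(url, blocked=BLOCKED_DOMAINS):
    """Return True if the URL's host, or any parent domain of it, is listed."""
    host = (urlsplit(url).hostname or "").lower()
    labels = host.split(".")
    # Check "cdn.tracker.example", then "tracker.example", then "example".
    return any(".".join(labels[i:]) in blocked for i in range(len(labels)))


print(is_blocked("https://cdn.tracker.example/lib.js"))   # True
print(is_blocked("https://shop.example.org/cart"))        # False
```

Note that the check looks only at where the resource is hosted, not at what it does, which is exactly why a harmless script on a listed domain gets blocked along with the trackers.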

Also in June, Microsoft announced plans (https://blogs.windows.com/msedgedev/2019/06/27/tracking-prevention-microsoft-edge-preview/) to use a list of domains to block resource loading and storage access as well. The exact list is not yet public and the features are still in development. Like Firefox, depending on how it works and the exact defaults in place, this can break same-domain features, and the traffic would exist but not be reflected in analytics, generating a ‘Phantom’ load. Edge is the default browser on Microsoft Windows.

Since April, all major browser vendors have announced different plans for addressing privacy, each creating or failing to address one of the others’ edge cases, so that a universal client-side solution is what I will call a Hard Problem™ on the verge of becoming unsolvable. Given this, I expect server-side tracking to once again take hold, because to retain capability in attribution all of the communication to 3rd parties has to take place out of view of the client.
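A minimal sketch of the server-side idea: the site's own backend records the event and relays it to the vendor, so the browser never talks to the 3rd party directly. The endpoint URL, field names and function name are all hypothetical; a real vendor will define its own API and payload format.

```python
import json
from urllib.request import Request

# Hypothetical vendor collection endpoint.
VENDOR_ENDPOINT = "https://collector.vendor.example/events"


def build_forward_request(event_name, visitor_id, page_url, endpoint=VENDOR_ENDPOINT):
    """Build (but do not send) the server-to-server request for one event."""
    payload = {"event": event_name, "visitor_id": visitor_id, "page": page_url}
    body = json.dumps(payload).encode("utf-8")
    return Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_forward_request("purchase", "v-123", "https://shop.example/checkout")
print(req.get_method(), req.full_url)
```

Sending it would be a call to `urllib.request.urlopen(req)` from the server. The trade-off is that the site now owns the identifier, the relay and the legal responsibility for what gets forwarded.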

On the Legal Front..

Completely sidestepping all the ways that analytics and attribution will break, we also have legal concerns which directly affect how the tech is built and used.

The General Data Protection Regulation is still in effect in the EU, which places many conditions on what data can be tracked and how. This is of particular concern to companies selling to EU customers.

Starting on January 1st, 2020, the California Consumer Privacy Act is also going to take effect, putting into place data collection rules for residents of California. California is one of the largest economies in the world, and thus the possible impact of this law cannot be overstated.

Our friends in the United Kingdom also have to deal with the Privacy and Electronic Communications Regulations (PECR), for which the ICO (Information Commissioner’s Office) has released some guidance (https://ico.org.uk/for-organisations/guide-to-pecr/guidance-on-the-use-of-cookies-and-similar-technologies). Two things I found notable as an American: first, analytics are not considered essential for running your business (https://ico.org.uk/for-organisations/guide-to-pecr/guidance-on-the-use-of-cookies-and-similar-technologies/how-do-we-comply-with-the-cookie-rules/#comply15), so you have to ask for consent; second, if you are on a social media platform, and that platform does things with cookies you are unaware of, you are still jointly liable (https://ico.org.uk/for-organisations/guide-to-pecr/guidance-on-the-use-of-cookies-and-similar-technologies/what-else-do-we-need-to-consider/).

In addition to the GDPR and PECR, the UK also has the Data Protection Act 2018.

The above list does not cover all the technical changes and laws which a company may be subject to. It is, however, a good taste of all the different things you now have to consider when you are collecting data, depending on who your customers may be, where they reside, and what they are accessing your site with. The bar for accurate data and attribution is getting raised ever higher, and companies will have to make serious efforts to retain their accuracy levels while also staying in legal compliance. I feel one thing is evident given this state of flux: accurate data will get more expensive in terms of time and staffing, and the timelines for deploying a solution will likely increase accordingly. I expect browsers will continue to tweak things, and until they align, development time will keep growing with all the different edge cases each one has introduced.

If I had one piece of advice to give, it’d be to get a good development / analyst / legal team in place, and don’t collect any data your lawyer is unwilling to defend on your behalf in a court of law. It’s easier and cheaper to never collect or build something than it is to rip it out later.