
Why am I being asked to pay for development work to keep A/B testing?

This is a mirror of my original LinkedIn post

It has likely been a rough week for many people as they come to terms with the changing landscape under Safari’s Intelligent Tracking Prevention (ITP) 2.3. Among the hardest hit are those caught in the crossfire of Safari’s crusade. A prime example is the array of A/B testing platforms that sites use for optimization and improving the user experience.

To understand why, we have to back up a bit.

Earlier in 2019, most client side test vendors looked something like this.

The back end serves the website, and then, in the user’s browser, the page contacts the test vendor for segmentation. The front end then modifies the experience according to that segmentation and records analytics. To ensure that the device sees the same variation for the duration of the test, the platform commonly sets a cookie with a long expiry, say 2 years. The important distinction here is that it’s a client side cookie, which matters later.
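To make that flow concrete, here is a minimal sketch of what the vendor's script roughly does on each page load. The cookie name `ab_cell`, the function names, and the uniform random assignment are all my assumptions for illustration, not any specific vendor's API.

```typescript
// Illustrative sketch only: every name here is hypothetical, not a vendor API.
const TWO_YEARS_IN_SECONDS = 2 * 365 * 24 * 60 * 60;

/** Pick a cell uniformly at random; `rand` is injectable so tests can be deterministic. */
function pickCell(cells: string[], rand: () => number = Math.random): string {
  if (cells.length === 0) {
    throw new Error("at least one cell is required");
  }
  const index = Math.min(Math.floor(rand() * cells.length), cells.length - 1);
  return cells[index];
}

/** Build the string a script would assign to `document.cookie`. */
function buildClientCookie(name: string, value: string, maxAgeSeconds: number): string {
  return encodeURIComponent(name) + "=" + encodeURIComponent(value) +
    "; Max-Age=" + maxAgeSeconds + "; Path=/";
}

/** Read one cookie value out of a `document.cookie`-style string. */
function readClientCookie(cookieString: string, name: string): string | undefined {
  const pairs = cookieString.split(";");
  for (let i = 0; i < pairs.length; i++) {
    const pair = pairs[i].trim();
    const separator = pair.indexOf("=");
    if (separator === -1) continue;
    if (decodeURIComponent(pair.slice(0, separator)) === name) {
      return decodeURIComponent(pair.slice(separator + 1));
    }
  }
  return undefined;
}

/** Reuse the stored cell if there is one; otherwise assign a cell and return the cookie to write. */
function segmentVisitor(
  cookieString: string,
  cells: string[],
  rand: () => number = Math.random
): { cell: string; cookieToWrite?: string } {
  const existing = readClientCookie(cookieString, "ab_cell");
  if (existing !== undefined) {
    return { cell: existing };
  }
  const cell = pickCell(cells, rand);
  return { cell: cell, cookieToWrite: buildClientCookie("ab_cell", cell, TWO_YEARS_IN_SECONDS) };
}
```

In a browser you would pass `document.cookie` in and, when `cookieToWrite` is present, assign it back to `document.cookie`. Everything happens in JavaScript on the page, which is exactly why the expiry is at the browser's mercy.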

In the above diagram everything below the black line isn’t viewable by the front end, but it’s not doing anything special other than serving the web pages. This enables vendors to market their products as being easy to install (just one line of JavaScript on every page) because all of the critical functionality takes place above the black line, and the test vendor controls most of how that works via JavaScript.

So what changed?

Safari introduced ITP 2.1 in early 2019, and with that change, they limited any cookie set by the test vendor’s functionality above the black line to a 7 day max expiry. The impact to the above diagram would be as follows:

After 7 days from last visit, the user is considered ‘New’ and upon finding a new user, the platform segments them. This could be different from the segmentation originally assigned to them on the previous visit. If your test is still running, there’s an increased chance that the person is now being counted in multiple cells. Since they are ‘new’ they also count as a ‘new’ person in the test, throwing off your sample counts.
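A rough way to size the problem is below. This sketch assumes equally weighted cells, an independent random re-assignment once the cookie is gone, and a cookie that is refreshed on every visit so the cap counts from the last visit. Your platform may differ on any of those, and the function names are mine.

```typescript
/** Chance that an independent, uniform re-assignment lands in a different cell. */
function reassignmentProbability(cellCount: number): number {
  if (cellCount < 1) {
    throw new Error("cellCount must be at least 1");
  }
  return (cellCount - 1) / cellCount;
}

/** Assumption: the cookie is refreshed on each visit, so only the gap since the last visit matters. */
function cookieSurvivesGap(daysSinceLastVisit: number, capDays: number): boolean {
  return daysSinceLastVisit < capDays;
}

/** Summarize how many returning visitors look 'new' and how many likely sit in two cells. */
function summarizeReturningVisitors(
  gapsInDays: number[],
  capDays: number,
  cellCount: number
): { countedAsNew: number; expectedInMultipleCells: number } {
  let expired = 0;
  for (let i = 0; i < gapsInDays.length; i++) {
    if (!cookieSurvivesGap(gapsInDays[i], capDays)) {
      expired += 1;
    }
  }
  return {
    countedAsNew: expired,
    expectedInMultipleCells: expired * reassignmentProbability(cellCount),
  };
}
```

For example, with visit gaps of 1, 3, 6, 8, 10 and 30 days, a 7 day cap and a two cell test, three of the six returning visitors get counted as new, and on average one and a half of them end up in both cells.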

Some noise in a test is normal because of situations like cross-device usage, deletion of cookies, and so on. It’s quite something else to have an entire, sizable segment of your traffic behave this way. It affects runways, sample sizes, and analysis. If your test also needs to keep state for the experience, that state would get reset too, possibly causing unintended behavior in that browser segment.

In May, Safari introduced ITP 2.2, doubling down on cross-site tracking prevention: if the visit met specific criteria, the cookie would be set for only 24 hours. So while the above design still “technically” works, in that users come to the site and get segmented, that experience is only assured for 24 hours from the time they stop visiting. One of the things worse than having a different experience every week is having one every 24 hours and 1 minute. Clearly not ideal.

Well, the client side test vendors put a workaround in place. They had relied on cookies, and those were no longer ‘safe’, but maybe they could use something else. Most vendors landed on an alternative location in the browser to prevent people from jumping cells: localStorage. It is still completely above the black line, and it can still be handled entirely by the testing platform with no additional client development required. That’s important to remember: in the several months since ITP 2.1 launched, end clients were largely shielded from any further implementation costs to keep the platform working.
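The workaround looks something like the sketch below. I'm assuming a key format (`ab:<testId>`) and helper names of my own, and I've used a tiny in-memory store so the example runs outside a browser. In a real page, `window.localStorage` has the same `getItem` and `setItem` methods and could be passed in directly.

```typescript
/** The subset of the Web Storage API this sketch relies on. */
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

/** In-memory stand-in for localStorage, used here only so the example can run anywhere. */
class MemoryStore implements KeyValueStore {
  private data: { [key: string]: string } = {};

  getItem(key: string): string | null {
    return Object.prototype.hasOwnProperty.call(this.data, key) ? this.data[key] : null;
  }

  setItem(key: string, value: string): void {
    this.data[key] = value;
  }
}

/** Return the stored cell for a test, or assign one and remember it. */
function getOrAssignStoredCell(
  store: KeyValueStore,
  testId: string,
  cells: string[],
  rand: () => number = Math.random
): string {
  if (cells.length === 0) {
    throw new Error("at least one cell is required");
  }
  const key = "ab:" + testId; // hypothetical key format
  const saved = store.getItem(key);
  if (saved !== null && cells.indexOf(saved) !== -1) {
    return saved; // same visitor, same experience
  }
  const fresh = cells[Math.min(Math.floor(rand() * cells.length), cells.length - 1)];
  store.setItem(key, fresh);
  return fresh;
}
```

Note that once the browser wipes the store, the next call behaves exactly like a first visit, which is the weakness the next ITP release exposes.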

Now September rolls around…And everything breaks.

Safari’s ITP 2.3 now also affects localStorage and every other non-cookie, client-writable storage location when the previous domain was a known tracker (as is the case for most social platforms) and the URL contained query strings or fragments (which a social site will append if you didn’t already have them). Facebook, for example, does this with an ‘fbclid’ name/value pair on the end of the URL. Under these conditions Safari will wipe that storage after a maximum of 7 days, and I fully expect that to come down.
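As a simplified model of the two conditions described above, consider the sketch below. Safari does its own classification of tracking domains and doesn't expose it to the page, so the domain list here is purely a stand-in I made up for illustration, as are the function names and the `shop.example` URLs.

```typescript
/** True when the landing URL carries a query string or a fragment. */
function hasLinkDecoration(landingUrl: string): boolean {
  return /\?[^#]+/.test(landingUrl) || /#.+/.test(landingUrl);
}

/** Extract the host from an absolute URL, or null if there isn't one. */
function hostOf(url: string): string | null {
  const match = /^[a-z][a-z0-9+.-]*:\/\/([^\/?#:]+)/i.exec(url);
  return match ? match[1].toLowerCase() : null;
}

/** True when the referrer host is one of the given domains or a subdomain of one. */
function referrerIsClassified(referrer: string, classifiedDomains: string[]): boolean {
  const host = hostOf(referrer);
  if (host === null) {
    return false;
  }
  for (let i = 0; i < classifiedDomains.length; i++) {
    const domain = classifiedDomains[i].toLowerCase();
    if (host === domain || host.slice(-(domain.length + 1)) === "." + domain) {
      return true;
    }
  }
  return false;
}

/** Simplified model of the conditions in the text: tracker referrer plus a decorated URL. */
function storageLikelyCapped(referrer: string, landingUrl: string, classifiedDomains: string[]): boolean {
  return referrerIsClassified(referrer, classifiedDomains) && hasLinkDecoration(landingUrl);
}
```

So a click from a Facebook link landing on `https://shop.example/sale?fbclid=abc123` would meet both conditions under this model, while typing the same address without the query string would not.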

So now the design just breaks on a number of edge cases to the point it’s likely not viable.

So what the client side vendors will need to do (and some have already started) is recommend something like this.

Hybrid design

Well that looks different eh?  

OK, so what’s happening here is this: the back end loads the site on the front end. The front end calls the testing platform and proceeds to modify the page. Then, instead of setting a client side cookie or using localStorage (both of which are destined to be destroyed by ITP), the front end asks the server to set an HTTP cookie via a response containing a Set-Cookie header.
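For the server half of that handshake, a minimal sketch might look like the following. The request shape, the `ab_` cookie prefix, the one year lifetime, and the function names are all my assumptions, and the endpoint would need to live on your own site's domain for the cookie to count as first party.

```typescript
const ONE_YEAR_IN_SECONDS = 365 * 24 * 60 * 60; // assumed lifetime, pick your own

/** What the front end posts to the (hypothetical) endpoint. */
interface PersistRequest {
  testId: string;
  cell: string;
}

/** A framework-neutral description of the HTTP response to send back. */
interface PersistResponse {
  status: number;
  headers: { [name: string]: string };
}

/** Build the Set-Cookie header value. No HttpOnly, so the vendor script can still read it. */
function buildSetCookieHeader(name: string, value: string, maxAgeSeconds: number): string {
  return name + "=" + encodeURIComponent(value) +
    "; Max-Age=" + maxAgeSeconds + "; Path=/; Secure; SameSite=Lax";
}

/** Validate the request and answer with a Set-Cookie header, or reject it. */
function handlePersistCell(request: PersistRequest, allowedCells: string[]): PersistResponse {
  const validTestId = /^[A-Za-z0-9_-]+$/.test(request.testId);
  const validCell = allowedCells.indexOf(request.cell) !== -1;
  if (!validTestId || !validCell) {
    // Refuse anything unexpected so the endpoint can't be used to set arbitrary cookies.
    return { status: 400, headers: {} };
  }
  const header = buildSetCookieHeader("ab_" + request.testId, request.cell, ONE_YEAR_IN_SECONDS);
  return { status: 204, headers: { "Set-Cookie": header } };
}
```

You would wire `handlePersistCell` into whatever server, cloud function, or CDN worker you have available. The point is that the cookie arrives in an HTTP response rather than being written by JavaScript.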

That gets kind of in the weeds, so I won’t go too far down that road.

Why this is important:

  1. Your web server won’t automatically do this, but it’s required to prevent cell-reassignment.
  2. You may not have control over your server to enable this so you have to make a choice:
  • You could build an endpoint out in the cloud somewhere that does it.
  • You could use your Content Delivery Network to set segmentation for some tools.
  • You could find a new host and move the entire site to gain the capability to do so.
  • You could decide just to exclude Safari traffic from your tests.
  • You could decide to close down the entire testing program.
  • You could decide to change testing platforms to a server side solution.

So of the seven choices above (doing it on your own web server, plus the six alternatives), five are going to need additional investment by the company paying for the platform. You’ll need developers to build out a solution, and that’s just to keep the system working as it did in August. The contract holder of the testing software has to do the development because of how cookies have to be set in order to be 1st party and follow the Same-Origin Policy.

The other two either leave you without a program, so it’s no longer a problem, or have you exclude Safari and deal with longer times to confidence as well as a non-representative sample.

I need to stress this: the platform isn’t broken. This is a limitation of the architecture for client side exclusive tools that need to set long term state. However, I can understand that what you get under the browser’s new restrictions isn’t what you intended.

Does that address all of the things browsers could do? No, it very well could break again and cost you additional development resourcing.

Maybe you don’t want to deal with that?

You could consider a server side testing solution. They often work like this:

In this scenario, before the web page loads for the user, the server talks to the service and gets a segment, and as part of loading the web page, sets an HTTP cookie. Due to this design, Safari does not typically interfere with the test operation and additional investment isn’t needed because the platform in most cases is not impacted.
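Sketched out, that request flow could look like this. The `fetchSegment` callback stands in for the call to the testing service (a real one would almost certainly be asynchronous), and the `ab_segment` cookie name and one year lifetime are assumptions of mine.

```typescript
/** What the server needs in order to render the page and answer the request. */
interface PageResult {
  segment: string;
  setCookieHeader?: string; // only present when a new segment was assigned
}

/** Read one cookie from the request's Cookie header. */
function readRequestCookie(cookieHeader: string | undefined, name: string): string | undefined {
  if (cookieHeader === undefined) {
    return undefined;
  }
  const pairs = cookieHeader.split(";");
  for (let i = 0; i < pairs.length; i++) {
    const pair = pairs[i].trim();
    const separator = pair.indexOf("=");
    if (separator !== -1 && pair.slice(0, separator) === name) {
      return decodeURIComponent(pair.slice(separator + 1));
    }
  }
  return undefined;
}

/** Decide the segment before the page is rendered, asking the service only for new visitors. */
function prepareServerSidePage(cookieHeader: string | undefined, fetchSegment: () => string): PageResult {
  const existing = readRequestCookie(cookieHeader, "ab_segment");
  if (existing !== undefined) {
    return { segment: existing };
  }
  const segment = fetchSegment(); // hypothetical call to the testing service
  return {
    segment: segment,
    // HttpOnly is fine here because only the server ever needs to read this cookie.
    setCookieHeader: "ab_segment=" + encodeURIComponent(segment) +
      "; Max-Age=31536000; Path=/; Secure; HttpOnly; SameSite=Lax",
  };
}
```

The server renders the variation for `segment` and, when `setCookieHeader` is present, sends it along as the Set-Cookie header on the same response. Nothing in the browser has to write state at all.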

The downside here (because there always is one) is that every test requires development resources to tinker with the server side code, and that commonly comes with a deployment, so your testing timelines increase. You also likely can’t get the platform’s development team to build anything for you, for security reasons.

So in short – your platform doesn’t break, but it requires a special architecture and workflow to be viable.

The important lessons I’ve tried to explain over the last five pages are:

  • If you run an A/B testing program you need to evaluate how the platform works from a segmentation / state point of view. You want to be sure people are getting the experiences you think they are.
  • If you are running an exclusively client side program, there’s a high chance it’s broken, and you should look into that.
  • Expect more of the same as the privacy war continues.
  • Read the browser patch notes to be aware of changes.
  • Work with engineering to ensure whatever work is required to keep your platform running in light of the changes gets prioritized and done.

I hope this explained the situation and why the conversation around testing platforms will be changing and starting to favor server side tools.

Published in: A/B Testing, Browser Updates