Video: Why Don't A/B Tests Add Up?

April 25, 2023

Most companies use A/B tests to help them make product decisions. This three-minute clip from my Mind the Gap presentation looks at why the cumulative results of these kinds of tests often don't add up to more significant impact.


In this sign-up flow, we can see there's a promo to try their subscription service. However, looking at the design, there doesn't seem to be any way not to take them up on their offer. Tapping "try it free" goes to two paid plan options.

But it turns out if you tap the little arrow in the upper left, you get taken to a map where you can unlock a bike and ride without the subscription plan. That's not very clear in the design.

I have no insider information, but I suspect this was a pretty well-performing A/B test. Lots of people hit that "try it free" button.

You've probably heard a lot of people talk about the importance of A/B testing and the impact it can have on conversion. But once again, we need to think about what we are measuring.

The classic A/B testing example is changing the color of a button and seeing results: in this example, 9% more clicks. When test results come back showing one item outperformed the other for a specific metric, it's pretty natural to want to implement that. So we make a product design choice because the data made us do it.
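
To make "9% more clicks" concrete, here is a minimal sketch of how such a result is typically computed. The click and user counts are hypothetical, chosen only so the lift comes out to 9%, and the function name `compare_click_rates` is made up for illustration. It uses a standard pooled two-proportion z-test, which may not be the method any particular testing tool applies.

```python
from math import erfc, sqrt


def compare_click_rates(clicks_a, users_a, clicks_b, users_b):
    """Return (relative_lift, z, two_sided_p) for variant B versus control A."""
    rate_a = clicks_a / users_a
    rate_b = clicks_b / users_b
    relative_lift = rate_b / rate_a - 1.0

    # Pooled two-proportion z-test under the null of equal click rates.
    pooled = (clicks_a + clicks_b) / (users_a + users_b)
    std_err = sqrt(pooled * (1.0 - pooled) * (1.0 / users_a + 1.0 / users_b))
    z = (rate_b - rate_a) / std_err

    # Two-sided p-value for a standard normal test statistic.
    p_value = erfc(abs(z) / sqrt(2.0))
    return relative_lift, z, p_value


# Hypothetical counts: 10,000 users per arm, 1,000 vs. 1,090 clicks.
lift, z, p = compare_click_rates(1000, 10_000, 1090, 10_000)
print(f"relative lift in clicks: {lift:.1%}, z = {z:.2f}, p = {p:.3f}")
```

Note what the function takes as input: clicks on one button. Nothing in it says whether those extra clicks turned into completed purchases or retained customers, which is the "what are we measuring" question.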

Isn't this how we improve user experiences: by testing and seeing how user behavior improves? Yes, but it matters how you define and measure "improves." Many companies have results that look like the button color example. In isolation, they show great short-term gains. But when you look at the long-term impact, the numbers tell a different story.

You'd think multiple successful A/B tests would give you cumulative results much larger than what most companies end up seeing.
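
As a rough illustration of that expectation, the sketch below compounds a set of reported lifts as if each one applied fully and independently to the same bottom-line metric. The five 9% lifts are hypothetical, and the helper name `naive_cumulative_lift` is made up for this example.

```python
from math import prod


def naive_cumulative_lift(lifts):
    """Compound relative lifts as if each applied fully to the same metric."""
    return prod(1.0 + lift for lift in lifts) - 1.0


# Hypothetical: five shipped tests, each reporting a 9% lift on its own metric.
reported_lifts = [0.09, 0.09, 0.09, 0.09, 0.09]
implied = naive_cumulative_lift(reported_lifts)
print(f"implied cumulative lift: {implied:.1%}")  # prints 53.9%
```

Under those assumptions the tests would imply a gain of roughly 54%. The assumption that a lift in clicks on one button carries through in full to a long-term outcome is exactly the part that tends not to hold.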

One of the most common reasons behind this is that we're not using tests with enough contrast. Looking at the impact of a button color change is a pretty low contrast comparison. A more significant contrast would be to change the action altogether, to do something like promoting a native payment solution by default on specific platforms.

The reason the button change is a low contrast change is it doesn't really impact what happens after someone clicks on it. They still go into the same checkout flow, the same forms.

The payment method change is higher contrast because it can completely alter the buying flow, in this case shifting it from a multi-step, form-based process to a single double tap with biometric authentication. So one way of making good use of testing is to try bigger, bolder ideas, ones with higher risk but higher potential reward.

The other way of using testing is basic good hygiene in product launches, using experiments to check outcomes when making changes, adding new features, and even fixing bugs. This gives you a way to measurably vet any updates and avoid causing problems by monitoring and being able to turn off new changes.
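
One way to picture this kind of launch hygiene is a guardrail check that compares a new change against control and flags it for turning off when a key outcome drops too far. This is a minimal sketch under stated assumptions: the metric names, thresholds, and the `Guardrail` and `breached_guardrails` names are all hypothetical, and a real system would also account for statistical noise before acting.

```python
from dataclasses import dataclass


@dataclass
class Guardrail:
    metric: str
    max_relative_drop: float  # 0.02 tolerates up to a 2% drop versus control


def breached_guardrails(control, treatment, guardrails):
    """Return the names of guardrail metrics where treatment dropped too far."""
    breached = []
    for guardrail in guardrails:
        baseline = control[guardrail.metric]
        if baseline <= 0:
            continue  # no meaningful baseline to compare against
        drop = (baseline - treatment[guardrail.metric]) / baseline
        if drop > guardrail.max_relative_drop:
            breached.append(guardrail.metric)
    return breached


# Hypothetical outcome metrics for a change rolled out to part of the audience.
control = {"checkout_completion": 0.40, "day7_retention": 0.30}
treatment = {"checkout_completion": 0.41, "day7_retention": 0.27}
guardrails = [
    Guardrail("checkout_completion", 0.02),
    Guardrail("day7_retention", 0.02),
]

breached = breached_guardrails(control, treatment, guardrails)
if breached:
    print("turn the change off; breached:", breached)
```

In this made-up data the change slightly improves checkout completion but costs a tenth of day-7 retention, so the check flags it even though the short-term metric looks like a win.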