TheShed

Most A/B Tests are Illusory

Category: Library
#Experiment #Data

Paper

MOST WINNING A/B TEST RESULTS ARE ILLUSORY, Martin Goodson (DPhil), Jan 2014

Summary

Demonstrates that standard statistical techniques are equally valid when applied to A/B testing, and that neglecting them can result in erroneous conclusions being drawn from A/B test results.

Statistical Power

Simply put, increasing the size of the sample you measure increases the power of the result, where power is the probability that the test detects a difference when there really is a difference.

For A/B testing this means you need to run an experiment for long enough that what you're measuring is actually a difference. The paper includes a methodology for calculating sample size.
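The paper's sample-size methodology isn't reproduced here, but a minimal sketch of a standard power calculation for comparing two conversion rates, using the normal approximation, looks like this (the function name and default alpha/power values are illustrative, not from the paper):

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_group(p1, p2, alpha=0.05, power=0.8):
    """Normal-approximation sample size per arm for detecting a
    change in conversion rate from p1 to p2."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)           # desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

# Detecting a lift from a 10% to a 12% conversion rate needs
# roughly 3,800 visitors per arm at alpha = 0.05 and 80% power.
n = sample_size_per_group(0.10, 0.12)
```

Small lifts on small base rates require surprisingly large samples, which is why underpowered tests so often "detect" effects that aren't there.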

Multiple testing

• Performing many tests, not necessarily concurrently, will multiply the probability of encountering a false positive.

• False positives also increase if you stop a test when you see a positive result.
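The inflation from running many tests is easy to quantify under the simplifying assumption that the tests are independent:

```python
def family_wise_error_rate(num_tests, alpha=0.05):
    """Probability of at least one false positive across
    num_tests independent tests, each run at level alpha."""
    return 1 - (1 - alpha) ** num_tests

# A single test at alpha = 0.05 carries a 5% false-positive risk;
# twenty independent tests push that risk to roughly 64%.
risk = family_wise_error_rate(20)
```

So a team running twenty variant tests at the 5% level should expect at least one spurious "winner" more often than not.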

Regression to the mean

Over a period of time even random results will regress to the mean. If you use a smaller time window you may identify early winners that are in fact random winners. Look out for the trends over time — if an initial uplift in A/B tests falls you may be observing regression to the mean.
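A small simulation illustrates the effect. All variants below share the same true conversion rate, yet picking the "winner" from a short early window selects for luck, and its later performance drifts back to the true rate (the rates and window sizes are arbitrary choices for illustration):

```python
import random

random.seed(42)
TRUE_RATE = 0.10  # every variant converts at the same true rate

def conversions(n, rate):
    """Count conversions among n simulated visitors."""
    return sum(random.random() < rate for _ in range(n))

# 200 identical variants, each observed over a short early window
early_rates = [conversions(100, TRUE_RATE) / 100 for _ in range(200)]
winner = max(range(200), key=lambda i: early_rates[i])

# The early "winner" beat the true rate purely by chance;
# over a much longer window it regresses back towards it.
late_rate = conversions(10_000, TRUE_RATE) / 10_000
```

The early winner's measured uplift is pure selection bias: with 200 tickets in the lottery, some variant was bound to look good over 100 visitors.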

Final quote

You can increase the robustness of your testing process by following this statistical standard practice:

• Use a valid hypothesis - don’t use a scattergun approach

• Do a power calculation first to estimate sample size

• Do not stop the test early if you use ‘classical methods’ of testing

• Perform a second ‘validation’ test repeating your original test to check that the effect is real
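Putting the classical pieces together, a minimal sketch of a fixed-horizon analysis is a two-proportion z-test run once, after the pre-computed sample size has been reached (the function and the counts below are illustrative, not from the paper):

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test p-value for a difference between two
    conversion rates, using the pooled-variance normal approximation."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = abs(p_b - p_a) / se
    return 2 * (1 - NormalDist().cdf(z))

# A 20% vs 23% split over 1,000 visitors per arm is not
# significant at the 5% level, despite the apparent uplift.
p = two_proportion_p_value(200, 1000, 230, 1000)
```

Under the checklist above, a result like this would not be stopped early or declared a winner, and even a significant result would be re-run on a fresh sample as a validation test.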

Related