Why "number of tests" is a terrible success metric


“How many tests are you running?” You may have a client or director breathing down your neck about this right now. “Number of tests launched per quarter” is a common metric for growth teams at Fortune 500 companies. It’s all wrong. Let’s look at why, and what you should measure instead.

What’s wrong with measuring the number of tests launched?

A couple of things. Most glaringly, it incentivizes the wrong kind of testing.

If your success is determined by how many tests you run, will you ever run a test with 5 variations? Why would you? You’re much too busy. But a single experiment with 5 variations, run on a high-impact page (or element), is better than 5 individual A/B tests.

First off, the single A/B/C/D/E test will almost certainly yield a winner, and a high-impact one at that. From your 5 individual A/B tests you’ll most likely get 1 or 2 wins, and what will be their impact?

With a single variation, you will have improved, but not optimized. And the pressure to run so many tests means your winner might land on a thin slice of your total audience, or on a lower-value page. So your win is less of a win, two times over.
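
To make this concrete, here’s a back-of-the-envelope sketch (mine, not the article’s) that assumes each challenger variation independently has a hypothetical 30% chance of beating the control:

```python
# Hypothetical illustration: compare one A/B test (1 challenger) with
# one A/B/C/D/E test (4 challengers against the same control).
p_win = 0.30  # assumed chance that any single variation beats control

for k in (1, 4):
    p_at_least_one = 1 - (1 - p_win) ** k
    print(f"{k} challenger(s): P(at least one winner) = {p_at_least_one:.0%}")

# Output:
# 1 challenger(s): P(at least one winner) = 30%
# 4 challenger(s): P(at least one winner) = 76%
```

Under these made-up odds, 5 separate A/B tests still average only 1 or 2 winners (5 × 0.3 = 1.5), and each win lands on whatever page that test happened to target, not necessarily your highest-value one.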

So why is it such a common metric?

It’s really easy to measure!

It’s a nice, round whole number. Anybody can log in to a testing platform and count how many tests say “Running.” And anybody can set meaningless goals around this metric. (“Hmm, 8 tests last quarter. Let’s make it 10 in Q3.”)

Moreover, it captures a well-founded but slightly misguided intention: to test aggressively. That drive comes from a good place; your team wants great results, and they want them as quickly as possible. It’s easy to go from “we want to make a huge difference” to “we need to test aggressively” to “we need to run 14 tests per quarter.” But don’t!

What should we measure instead?

For simply measuring activity within a testing program, “number of experiences” is a better metric than “number of tests.” The difference is subtle, but it matters: an A/B/C/D/E test counts as one test yet five experiences, so the metric empowers you to run that 5-variation test instead of punishing you for it.

Still, activity isn’t value; busyness isn’t productivity. The ultimate metric of your testing program is impact on revenue. If you can’t measure that directly, impact on conversions that are directly tied to revenue is a fine proxy.

Express this using whole numbers: “This quarter we put experiences into production that we expect will lead to an additional 300 conversions in Q3, for a total of 3,200.” You’re free to note the percentage lift, too, as long as you don’t only talk in percents. (“… That’s a 10.3% lift over the previous benchmark of 2,900 conversions per quarter.”)

Why? Percents sound impressive, but they don’t communicate actual impact.

If you just say “10.3% lift,” you haven’t said anything helpful. Lift in what? Clickthroughs for mobile visitors coming from paid Instagram campaigns? Cool, but how much are we paying you to get us these clicks, and how much money do they make us? Anticipate hard questions like these now, so that if a test can’t answer them, you can back out preemptively and run a different one.
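
And if you want to sanity-check the arithmetic from the earlier example, here’s a minimal sketch; the $150 average revenue per conversion is a made-up figure for illustration:

```python
# Translate a lift into absolute conversions and (assumed) revenue.
baseline = 2900          # conversions per quarter before the change
new_total = 3200         # expected conversions per quarter after
revenue_per_conv = 150   # assumed average revenue per conversion, in $

additional = new_total - baseline
print(f"Additional conversions: {additional}")                        # 300
print(f"Lift: {additional / baseline:.1%}")                           # 10.3%
print(f"Expected added revenue: ${additional * revenue_per_conv:,}")  # $45,000
```

Those absolute numbers are the ones that answer hard questions, and the ones worth putting in your quarterly report.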

If you’re on a team that measures success by the number of tests launched, a great first step is to broach the subject of counting experiences instead of tests. This can shift the discussion from “where else can we test?” to “where is the most important place to be testing?” And that should steer you toward actually having some impressive impact to communicate.

