It might seem unfair, but it’s better to leave small audience segments out of your experiments entirely than to include them arbitrarily.
Let’s imagine you’re optimizing a landing page for paid search visitors who search for branded terms. There’s a ton of them, because your brand is so well known 🏆. You can run a test on this audience and get results in 2 weeks.
You’re doing a bit of non-brand SEM too - and bringing those visitors to a similar set of landing pages. There aren’t as many of them; it’d take you 8-12 weeks to run a test on this audience. Should you include the non-brand audience in your brand experiments?
It seems obvious that more testing is better, bigger audiences get results faster, and besides, what’s the alternative - not testing non-brand at all?
But it all depends on how it maths out.
Since your branded search audience is substantially bigger than non-brand, the former will dominate test results and the latter will just be along for the ride.
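To see what "along for the ride" looks like in numbers, here's a minimal sketch. The visitor counts and lifts are made up for illustration, and it assumes the control/variant split is the same in both segments, so the blended lift is just a traffic-weighted average of the segment lifts.

```python
def pooled_lift(segments):
    """Traffic-weighted average of per-segment lifts (in conversion-rate points)."""
    total = sum(visitors for visitors, _ in segments.values())
    return sum(visitors / total * lift for visitors, lift in segments.values())

# Hypothetical numbers: (visitors in the test, true lift in conversion rate)
segments = {
    "brand": (18_000, 0.020),       # 90% of traffic, variant helps
    "non_brand": (2_000, -0.010),   # 10% of traffic, variant hurts
}

blended = pooled_lift(segments)
print(f"blended lift: {blended:+.3f}")
```

In this made-up case the blended lift is clearly positive even though the variant makes things worse for non-brand visitors. The small segment barely moves the total.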
You’ll get “significant” results that you apply to both groups, when in reality they’re only significant with respect to the branded search visitors.
You’ll be making changes to the non-brand visitor experience with the confidence that comes from a conclusive test … except you didn’t really get conclusive results.
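Here's a rough way to check this on your own data: run the same significance test on the combined audience and on each segment separately. This sketch uses a plain two-proportion z-test from the standard library. The function name and all the conversion counts are hypothetical, and your testing tool may use a different method.

```python
from math import erf, sqrt

def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for a pooled two-proportion z-test."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Standard normal CDF expressed with the error function
    cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    return 2 * (1 - cdf)

# Hypothetical results: (control conversions, control visitors,
#                        variant conversions, variant visitors)
brand = (1000, 20_000, 1120, 20_000)
non_brand = (100, 2_000, 104, 2_000)
combined = tuple(b + n for b, n in zip(brand, non_brand))

p_combined = two_proportion_p_value(*combined)
p_brand = two_proportion_p_value(*brand)
p_non_brand = two_proportion_p_value(*non_brand)

for name, p in [("combined", p_combined), ("brand", p_brand), ("non-brand", p_non_brand)]:
    print(f"{name:>9}: p = {p:.3f}")
```

With these made-up counts, the combined test and the brand-only test both come in under the usual 0.05 threshold, while the non-brand test is nowhere close. The combined "win" is really just the brand result with some extra noise added.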
A better approach is to keep the test simple, run it for the larger branded search audience, draw conclusions for that audience, and move on with your life. No testing on non-brand - until that channel has sufficient traffic.
This applies not just to different paid search audiences, but to any situation where audiences of different sizes have different experiences.
(Should you include tablet visitors in your desktop-only navigation redesign test? Not if they’re only 7% of traffic - either you’ll change the tablet experience based on insufficient data, or you’ll let the experiment run forever in order to get valid tablet results … wasting time you could’ve spent iterating on desktop.)
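If you want to put a number on "run forever", a back-of-the-envelope duration estimate helps. The sketch below uses the standard sample-size approximation for comparing two conversion rates. Apart from the 7% tablet share, everything is an assumption for illustration: weekly traffic, baseline conversion rate, the lift you hope to detect, and the usual 5% significance level and 80% power.

```python
from math import ceil
from statistics import NormalDist

def weeks_to_run(weekly_visitors, baseline, relative_lift, alpha=0.05, power=0.80):
    """Approximate weeks needed for a 50/50 two-variant test on one segment."""
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    # Visitors needed per variant to detect the difference between p1 and p2
    n_per_variant = ceil(
        (z_alpha + z_beta) ** 2 * (p1 * (1 - p1) + p2 * (1 - p2)) / (p1 - p2) ** 2
    )
    return ceil(2 * n_per_variant / weekly_visitors)

# Hypothetical site: 50,000 visitors a week, 5% baseline conversion rate,
# hoping to detect a 10% relative lift.
total_weekly = 50_000
tablet_weeks = weeks_to_run(total_weekly * 0.07, baseline=0.05, relative_lift=0.10)
rest_weeks = weeks_to_run(total_weekly * 0.93, baseline=0.05, relative_lift=0.10)

print(f"tablet only: ~{tablet_weeks} weeks")
print(f"everyone else: ~{rest_weeks} weeks")
```

Under these assumptions the tablet segment needs several times as long as the rest of your traffic to reach the same confidence. Plug in your own traffic and conversion numbers before deciding which segments are worth testing.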
So don’t feel bad about leaving out segments, channels, or devices that don’t get enough traffic for testing. No test is better than a fake test.