A client asks:
“For our next email campaign, we have 6,000 names available. Should we test 3,000 names first with two subject lines and then roll out the remaining 3,000 using the winning message? Or just send out all 6,000 and do an A/B test on the entire list?”
My answer: Option B (send the entire list). Here’s why:
Normally, I’m all for the “test, then roll out” strategy, but here the client didn’t really have the numbers to pull it off. With a larger list, testing the message on a sample cell first is a great way to maximize overall response and minimize risk. For example, when broadcasting to a combination of in-house and rented lists, it often pays to test the message (or offer, or whatever) on the free, in-house names first, then roll out the winning version to the more costly rented names. (Note: since the in-house list isn’t a sample of the rented file, this isn’t a pure test, technically speaking.)
In the client scenario above, however, even if the email campaign in question were to generate a response rate of, say, 4 percent, the two test cells would need to differ by a wide margin (much more than 1 percentage point) to achieve statistical significance. For example:
1,500 names x 4 percent = 60 responses
1,500 names x 5 percent = 75 responses
A difference of 15 responses isn’t large enough to be confidently attributed to anything other than sampling error. I reach that conclusion based on more than just a hunch; you can check statistical significance easily with this simple online calculator.
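If you’d rather check the arithmetic yourself, a quick script tells the same story. This is only a sketch, and it assumes the calculator runs a pooled two-proportion z-test at 95 percent confidence (I don’t know its exact method), but the conclusion holds either way:

```python
# Rough check of the 1,500-name split: is 60 vs. 75 responses a real difference?
# Assumes a pooled two-proportion z-test; the online calculator may differ slightly.
from math import sqrt, erf

def two_proportion_p_value(resp_a, n_a, resp_b, n_b):
    """Two-sided p-value for the difference between two response rates."""
    p_a, p_b = resp_a / n_a, resp_b / n_b
    pooled = (resp_a + resp_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = abs(p_a - p_b) / se
    # Normal CDF via the error function: Phi(z) = 0.5 * (1 + erf(z / sqrt(2)))
    return 2 * (1 - 0.5 * (1 + erf(z / sqrt(2))))

# 4 percent vs. 5 percent on 1,500 names each (60 vs. 75 responses)
print(two_proportion_p_value(60, 1500, 75, 1500))  # ~0.19, nowhere near the usual 0.05 cutoff
```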
According to the calculator, in the scenario above, the second test cell would need to generate a response rate of 6 percent (a full 50 percent higher than the control) for the difference to be statistically significant. Put another way, for a 1 percentage point difference in response to be significant, the sample sizes would need to be in excess of 3,000 names each.
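That 3,000-plus figure is easy to sanity-check with a little algebra. Solving the same (assumed) pooled z-test for the smallest cell size at which a 4 percent vs. 5 percent split clears the 95 percent confidence bar gives roughly 3,300 names per cell; note this is bare significance, with no allowance for statistical power:

```python
# Back-of-the-envelope: smallest cell size at which 4% vs. 5% clears 95% confidence.
# Assumes equal cells and a pooled two-proportion z-test (significance only, no power).
p_a, p_b = 0.04, 0.05          # control vs. test response rates
z_crit = 1.96                  # two-sided 95 percent confidence
pooled = (p_a + p_b) / 2       # equal cells, so the pooled rate is the simple average
n_per_cell = z_crit**2 * 2 * pooled * (1 - pooled) / (p_a - p_b)**2
print(round(n_per_cell))       # ~3,302 names in each cell
```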
Nice post, Howard. Your points are right on, but I’d like to add one thought.
I believe it is worth considering that part of the problem lies in what is being tested. If offer A were to drive a 4% response rate and offer B were to drive a 6% response rate, then even at 1,500 names per cell the results would represent a statistically significant difference. There wouldn’t be any issue determining a clear winner.
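For what it’s worth, that 4% vs. 6% scenario checks out under the same kind of pooled z-test sketch as above, even on the original 1,500-name cells (again, just a rough approximation of whatever the calculator does):

```python
# Quick check: 4% vs. 6% on the same 1,500-name cells (60 vs. 90 responses).
# Same assumed pooled two-proportion z-test as the sketches above.
from math import sqrt, erf

n = 1500
resp_a, resp_b = 60, 90                          # 4% and 6% of 1,500
pooled = (resp_a + resp_b) / (2 * n)
se = sqrt(pooled * (1 - pooled) * (2 / n))
z = abs(resp_b - resp_a) / n / se
p_value = 2 * (1 - 0.5 * (1 + erf(z / sqrt(2))))
print(round(p_value, 3))                         # ~0.012, a clear winner at 95% confidence
```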
Often it seems that marketers put the cart before the horse by making statistical significance their goal in testing. It isn’t. The goal is to learn things that help us make our businesses better. Sometimes, if the results from a test aren’t statistically significant, it just means you need to go back to the drawing board and find something better to test.