Understand Control Groups and Random Sampling in Data Driven Marketing
Measuring results is a fundamental part of data driven marketing. You are in a unique position to be quite precise in quantifying the effectiveness of your campaigns. You know exactly whom you contacted, when, and how. And you know who responded. This allows you to conduct your campaigns the same way a scientist would conduct an experiment.
How to use control groups in data driven marketing
When a drug company wants to test the effectiveness of a new drug, they don’t just give it to a bunch of people and see if they respond. They design an experiment where some people get the drug and some people get a benign, neutral substance that has no effect, called a placebo. This placebo group is known as a control group.
The basic idea is that they want to isolate and measure the effects of the drug and only the drug. It might happen that 5 percent of those taking the drug develop a rash. If 5 percent of the people taking the placebo also develop a rash, then it is not likely that the experimental drug was the cause.
Control groups play a central role in your measurement process. The idea is the same as in the drug experiment. Once you have identified your target audience for a particular campaign, you need to send some of them a placebo. Actually, you need to send some of them nothing at all. You just need to flag them in your database as members of the control group for this campaign.
When it comes time to analyze responses, you check to see how many customers from the control group responded without being contacted. This may sound silly. How would they respond if you didn’t even send them the offer? But remember that your company has other marketing initiatives out there designed to drive sales.
You then compare the response rate of the control group with that of the group you actually mailed. The difference between the two rates, applied to the number of customers you mailed, is an estimate of how many responses can reasonably be attributed to your campaign.
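The comparison comes down to a little arithmetic, sketched here in Python. The function name, its parameters, and the counts are all made up for illustration. The sketch also assumes the control group is representative of the customers you mailed.

```python
def incremental_response(mailed, mailed_responders, control, control_responders):
    """Estimate how many responses the campaign itself produced.

    All four arguments are plain counts. The names are illustrative only.
    """
    if mailed <= 0 or control <= 0:
        raise ValueError("both groups must contain at least one customer")

    mail_rate = mailed_responders / mailed
    # Baseline: customers who responded without being contacted
    control_rate = control_responders / control

    # Part of the mail group's response rate the baseline does not explain
    lift = mail_rate - control_rate

    # Scale the lift back up to the number of customers mailed
    incremental = lift * mailed
    return mail_rate, control_rate, incremental


# Hypothetical campaign: 80,000 mailed, 20,000 held out as the control group
mail_rate, control_rate, incremental = incremental_response(
    mailed=80_000, mailed_responders=2_400,
    control=20_000, control_responders=400,
)
print(f"mail rate: {mail_rate:.2%}, control rate: {control_rate:.2%}")
print(f"responses attributable to the campaign: {incremental:.0f}")
```

With these made-up counts, the mail group responds at 3 percent and the control group at 2 percent, so about 800 of the 2,400 responses are credited to the campaign. The rest would likely have come in anyway.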
How to select customers at random in data driven marketing
Years ago, an analyst who had just started a new job in database marketing was given, as one of the first assignments, a response analysis on a fairly large marketing campaign. Taking everything the company said at face value, the analyst compared response rates between the mail and control groups. Surprisingly, the control group outperformed the mail group, and by a large margin.
One thing stood out: response rates varied significantly by geography. The brand was more established in some places than others. The analyst started asking about the control group selection. Who did it? How was it done?
It turned out that the company had recently hired a new vendor to execute its mailings. The previous vendor had always pulled the company’s control groups, so the company asked the new vendor to do so as well. But the new vendor didn’t really understand control groups. Asked to hold out 20,000 names as a control group, it simply peeled the first 20,000 names off the list.
Now, one thing that mail vendors do is prepare mail for bulk rates from the USPS. This involves, among other things, sorting the mail file by ZIP code. The entire control group came from the top of a sorted list. Everyone in it lived in a small number of ZIP codes in a region that had an unusually high response rate. This made meaningful measurement of the campaign’s success impossible.
Your control group needs to accurately reflect your target audience. If it doesn’t, then your experiment is flawed, and your measurements will be suspect or meaningless. The best way to ensure that your control group is representative of your target audience is to select its members randomly.
Selecting a group of customers at random is called random sampling. Creating random samples is a job for your technical team. Almost every list that’s pulled out of a database is sorted by some customer trait or other. If the control group is simply taken from one end of the list, that sorting can render your measurement plan completely ineffective.
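To see why the sort order matters, here is a small Python sketch on synthetic data. The customer list, the ZIP codes, and the helper name distinct_zips are all invented for the illustration. It compares a holdout peeled off the top of a ZIP-sorted list with a holdout drawn by simple random sampling.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Synthetic list: 100,000 customers spread over 100 made-up ZIP codes,
# sorted by ZIP code the way a bulk-mail file would be.
zips = [f"{10000 + i:05d}" for i in range(100)]
customers = sorted(
    ({"id": n, "zip": rng.choice(zips)} for n in range(100_000)),
    key=lambda c: c["zip"],
)


def distinct_zips(group):
    """Count how many different ZIP codes appear in a group of customers."""
    return len({c["zip"] for c in group})


top_of_list = customers[:20_000]                # first 20,000 names off the list
random_holdout = rng.sample(customers, 20_000)  # simple random sample

print("ZIP codes in top-of-list holdout:", distinct_zips(top_of_list))
print("ZIP codes in random holdout:     ", distinct_zips(random_holdout))
```

In this made-up file, the top-of-list holdout covers only about a fifth of the ZIP codes, while the random holdout draws from across the whole list.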
It’s a good idea to have at least a general sense of how your technical team is selecting your control group. Database and analytic software, even spreadsheets, have the ability to generate random numbers. These numbers typically fall between 0 and 1, with every value in that range equally likely.
To split the file roughly in half, you simply generate a random number for each record. If the number is less than 0.5, you put the record in the mail group. Otherwise, you put it in the control group.
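The split described above might look something like this in Python. This is a minimal sketch, not a description of how any particular database or vendor does it. The function name, the mail_share parameter, and the customer IDs are hypothetical.

```python
import random


def split_mail_and_control(customer_ids, mail_share=0.5, seed=None):
    """Assign each customer to the mail group or the control group at random.

    mail_share is the approximate fraction of customers to mail.
    """
    rng = random.Random(seed)  # pass a seed to make the split repeatable
    mail, control = [], []
    for customer_id in customer_ids:
        # rng.random() returns a number from 0.0 up to, but not including, 1.0
        if rng.random() < mail_share:
            mail.append(customer_id)
        else:
            control.append(customer_id)
    return mail, control


# Hypothetical file of 100,000 customer IDs
mail, control = split_mail_and_control(range(1, 100_001), mail_share=0.5, seed=7)
print(len(mail), len(control))
```

Because each record is assigned independently, the two groups come out close to the requested proportions rather than exactly equal. If you need a holdout of an exact size, such as 20,000 names, one alternative is to draw it with random.sample from the full list. Either way, the position of a record in the file has no influence on which group it lands in.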