How to Look for Relationships in Your Data Driven Marketing
In data driven marketing, customer data is interrelated. At first glance, age and income may seem to represent two completely different aspects of a customer. But when you look across your database as a whole, a relationship emerges: as customers age, their incomes tend to go up as well, at least through most of their working years.
This tendency for two traits to move together is known as correlation. Correlations can be strong, weak, or nonexistent. People’s heights are likely to be strongly correlated with their mothers’ heights, but probably much less strongly with their great-grandmothers’. And they probably have nothing at all to do with the day of the year on which people were born.
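Correlations can also be positive or negative. Age and income are positively correlated: they tend to rise together. People’s total debt, on the other hand, tends to go down as they get older and pay off mortgages and other loans. This is an example of a negative correlation: as one value rises, the other falls.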
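The most common way to put a number on the strength and direction of a correlation is the Pearson correlation coefficient, r, which runs from -1 (perfect negative) through 0 (no linear relationship) to +1 (perfect positive). The sketch below computes it from scratch. The helper name `pearson_r` and the five customer records are made up for illustration and are not real data. (Python 3.10 and later also provide `statistics.correlation` for the same calculation.)

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 values each")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Numerator: do the two variables sit above/below their means at the same time?
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator: each variable's spread, which scales r into the range -1..+1.
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / sqrt(spread_x * spread_y)

# Made-up records for five customers, for illustration only.
age = [25, 35, 45, 55, 65]
income = [38_000, 52_000, 61_000, 70_000, 74_000]     # tends to rise with age
debt = [180_000, 150_000, 110_000, 60_000, 20_000]    # tends to fall with age
birth_day_of_year = [140, 75, 310, 75, 140]           # unrelated to age

print(f"age vs income:   {pearson_r(age, income):+.2f}")             # strong positive
print(f"age vs debt:     {pearson_r(age, debt):+.2f}")               # strong negative
print(f"age vs birthday: {pearson_r(age, birth_day_of_year):+.2f}")  # zero for this data
```

Note that r captures only linear relationships; two traits can be related in a curved way and still show an r near zero.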
Cause and effect in data driven marketing
The existence of a statistical tendency does not, by itself, imply that one thing causes another. There is a correlation between the number of cigarette lighters a person buys and their risk of lung cancer. But it’s the cigarettes they also buy, not the lighters, that explain this tendency: smoking drives both lighter purchases and cancer risk. The connection between lighters and lung cancer is known as a spurious correlation.
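One standard way to expose a spurious correlation is to stratify: compare the groups again within each level of the suspected common cause. The sketch below uses invented counts (not real epidemiological figures) in which smoking alone drives both lighter purchases and cancer risk. The `COUNTS` table and the `cancer_rate` helper are hypothetical.

```python
from fractions import Fraction

# Invented counts for illustration (not real epidemiology):
# (is_smoker, buys_lighters) -> (number of people, lung cancer cases)
COUNTS = {
    (True, True): (800, 160),
    (True, False): (200, 40),
    (False, True): (100, 2),
    (False, False): (900, 18),
}

def cancer_rate(counts, buys_lighters, smoker=None):
    """Cancer rate for lighter buyers (or non-buyers); if smoker is given, only within that group."""
    people = cases = 0
    for (is_smoker, buys), (n, c) in counts.items():
        if buys == buys_lighters and (smoker is None or is_smoker == smoker):
            people += n
            cases += c
    return Fraction(cases, people)

# Ignoring smoking, lighter buyers look far riskier than non-buyers...
print(f"everyone: buyers {float(cancer_rate(COUNTS, True)):.1%}, "
      f"non-buyers {float(cancer_rate(COUNTS, False)):.1%}")
# ...but within each smoking group, buying lighters makes no difference.
for smoker in (True, False):
    label = "smokers" if smoker else "non-smokers"
    print(f"{label}: buyers {float(cancer_rate(COUNTS, True, smoker)):.1%}, "
          f"non-buyers {float(cancer_rate(COUNTS, False, smoker)):.1%}")
```

With these invented counts, the overall rates are 18.0% for buyers versus 5.3% for non-buyers, yet the rates are identical within each group (20.0% for smokers, 2.0% for non-smokers). The gap comes entirely from the fact that most lighter buyers are smokers.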
Consider a bank that ran a marketing program designed to increase deposits in CD (certificate of deposit) accounts. After the program had been in market for a while, the bank’s marketing team began analyzing its results. Initially, they noted that the number of CDs sold since the launch had jumped significantly. Great news! The campaign was working.
But when they tried to calculate the profit that had been generated by this wonderful campaign, they ran into a problem. Despite the fact that they were opening all these new accounts, the overall dollar volume hadn’t changed much.
After digging around a bit, they discovered that, to support the CD campaign, the branch network had put an incentive program in place for tellers. Not surprisingly, this program rewarded tellers for opening CD accounts. But the rewards were based on the number of accounts opened, not on the dollars deposited.
The team went back through the data and looked at the customers who were opening new CD accounts. It turned out that this wasn’t new business at all. The volume came from expiring CDs that tellers were rolling over into multiple new accounts: a single $20,000 CD, for example, would be rolled over into four $5,000 accounts.
The team’s initial excitement over the success of the marketing program turned out to be unjustified. They had mistaken the spurious correlation between the campaign and the new accounts for cause and effect. The actual cause was the teller incentive program.
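The arithmetic behind the bank’s puzzle is easy to reproduce. The sketch below assumes a hypothetical record layout in which each new CD is flagged according to whether it was funded by a maturing CD at the same bank; the names `CdOpening` and `campaign_metrics` are illustrative and do not come from any real banking system. Counting accounts rewards splitting, while counting only money that didn’t come from a rollover does not.

```python
from dataclasses import dataclass

@dataclass
class CdOpening:
    """One CD account opened during the campaign (hypothetical record layout)."""
    customer_id: str
    amount: int               # dollars deposited into the new CD
    funded_by_rollover: bool  # True if the money came from a maturing CD at this bank

def campaign_metrics(openings):
    """Return (accounts opened, dollars in new accounts, dollars of genuinely new money)."""
    accounts = len(openings)
    dollars = sum(o.amount for o in openings)
    new_money = sum(o.amount for o in openings if not o.funded_by_rollover)
    return accounts, dollars, new_money

# Made-up example: one maturing $20,000 CD split into four $5,000 CDs,
# plus one customer who brings in $10,000 from another bank.
openings = [CdOpening("A", 5_000, True) for _ in range(4)] + [CdOpening("B", 10_000, False)]
print(campaign_metrics(openings))  # (5, 30000, 10000)
```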
You need to be careful about attributing cause and effect to correlations, especially when you’re evaluating the success of your marketing campaigns. The best way to avoid this trap is to design your campaigns the way scientific experiments are designed, for example by holding out a randomly selected control group that doesn’t receive the campaign.
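Here is a minimal sketch of that experimental design, assuming you can randomly hold back a share of customers from the campaign. Everyone else gets the campaign, and its effect is measured as the difference in response rates between the two groups. The function names, the 20% control share, and the seed are illustrative choices, not a standard. If something outside the campaign, such as a teller incentive, raises CD openings everywhere, it raises them in the control group too, so it cancels out of the difference.

```python
import random

def split_test_control(customer_ids, control_share=0.2, seed=42):
    """Randomly hold out a share of customers as a control group that gets no campaign."""
    ids = list(customer_ids)
    rng = random.Random(seed)  # fixed seed so the same split can be reproduced later
    rng.shuffle(ids)
    n_control = round(len(ids) * control_share)
    return ids[n_control:], ids[:n_control]  # (test group, control group)

def incremental_lift(test_responders, test_size, control_responders, control_size):
    """Response rate in the test group minus response rate in the control group."""
    return test_responders / test_size - control_responders / control_size

test_group, control_group = split_test_control(range(1000))
print(len(test_group), len(control_group))  # 800 200

# Hypothetical results: 12% of each group opened a CD, so the campaign added nothing
# beyond whatever (for example, a teller incentive) was affecting everyone.
print(incremental_lift(96, 800, 24, 200))  # 0.0
```

A real analysis would also check whether an observed difference is large enough, given the group sizes, to be statistically significant rather than noise.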
Spurious correlations can be useful in data driven marketing
Spurious or not, you can take advantage of statistical tendencies to enhance the power of your marketing database. You will run into situations where you know or suspect that a particular customer trait is central to understanding customer behavior. The problem is that you don’t carry that trait in your database.
Here’s where correlations come in. You may very well have a variable in your database that is correlated with the trait you’re interested in. Survey research often uncovers these kinds of correlations. There is also a great deal of demographic research in the public domain, much of it based on census data, that analyzes connections between variables.
By substituting a correlated variable, called a proxy variable, for the one you’re missing, you can essentially make use of information that you don’t actually have. A proxy variable certainly won’t be as good as having the information you want. But finding one that’s highly correlated with the trait you’re interested in is the next best thing.
In the lighters-and-cigarettes example, it’s clear that trying to reduce lung cancer rates by targeting the sale of lighters is misguided. The lighters aren’t the source of the problem. But if all you want is to identify people who are at risk of lung cancer, then lighter purchases would make a reasonable proxy.