Statistics Conundrums: Dealing with Survey Nonresponders

Statistics All-in-One For Dummies

Nonresponders are always a problem when you're calculating the results of a survey. Before you can crunch the numbers in all the surveys you get back, you have to decide what to do about the surveys you didn't get back.

A newspaper article on the latest survey says that 50 percent of the respondents said blah blah blah. The fine print says that the results are based on a survey of 1,000 adults in the United States. But wait — is 1,000 the actual number of people selected for the sample, or is it the final number of respondents? You may need to take a second look; those two numbers hardly ever match.

For example, Jenny wants to know what percentage of people in the United States have ever knowingly cheated on their taxes. In her statistics class, she found out that if she gets a sample of 1,000 people, the margin of error for her survey is only plus or minus 3 percentage points, which she thinks is groovy. So she sets out to achieve the goal of 1,000 responses to her survey. She knows that it's hard to get people to respond to surveys these days, and she's worried about losing a large chunk of her sample that way, so she has an idea: Why not send out more surveys than she needs so that she gets 1,000 surveys back?
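Where does "plus or minus 3" come from? Here's a minimal sketch using the usual normal-approximation formula for the margin of error of a sample proportion. It assumes a simple random sample, 95 percent confidence (z = 1.96), and the worst-case proportion of 0.5, which gives the largest possible margin. The function name `margin_of_error` is just an illustrative choice.

```python
import math

def margin_of_error(p_hat: float, n: int, z: float = 1.96) -> float:
    """Approximate margin of error for a sample proportion (normal approximation)."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

# p_hat = 0.5 is the most conservative choice: it maximizes p * (1 - p)
moe = margin_of_error(0.5, 1000)
print(f"n = 1000: +/- {moe:.1%}")  # n = 1000: +/- 3.1%
```

The result, about 3.1 percentage points, is what gets rounded to "plus or minus 3." Keep in mind that this formula covers only random sampling error, nothing else.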

Jenny looks at several survey results in newspapers, in magazines, and on the Internet, and she finds that the response rate (the percentage of people who actually respond to a survey) is typically around 25 percent. (Believe it or not, that figure is generous by real-world standards. But think about it: How many surveys have you thrown away lately?) So Jenny figures that if she sends out 4,000 surveys and gets 25 percent of them back, she has the 1,000 surveys she needs to do her analysis, answer her question, and have that small margin of error of plus or minus 3 percentage points.
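Jenny's planning arithmetic boils down to dividing the number of responses she wants by the response rate she expects. This sketch (the helper name `surveys_to_send` is hypothetical) rounds up so that the *expected* number of returns meets the target. Note that it says nothing about *who* sends the surveys back, which is exactly the weak spot in this plan.

```python
import math

def surveys_to_send(target_responses: int, response_rate: float) -> int:
    # Round up so the expected number of returned surveys is at least the target
    return math.ceil(target_responses / response_rate)

print(surveys_to_send(1000, 0.25))  # 4000
```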

Jenny conducts her survey, and just like clockwork, out of the 4,000 surveys she sends out, 1,000 come back. She goes ahead with her analysis and finds that 400 of those people reported cheating on their taxes (40 percent). She adds her margin of error and reports, "Based on my survey data, 40 percent of Americans cheat on their taxes, plus or minus 3 percentage points."
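Here's a short sketch of the calculation Jenny runs on her 1,000 returned surveys, again assuming the normal-approximation formula at 95 percent confidence. Using her observed proportion of 0.40 instead of 0.5 still gives roughly 3 percentage points.

```python
import math

respondents = 1000
cheaters = 400
p_hat = cheaters / respondents  # 0.40

# Normal-approximation margin of error at 95% confidence (z = 1.96)
moe = 1.96 * math.sqrt(p_hat * (1 - p_hat) / respondents)
low, high = p_hat - moe, p_hat + moe
print(f"{p_hat:.0%} +/- {moe:.1%}  ->  ({low:.1%}, {high:.1%})")
# 40% +/- 3.0%  ->  (37.0%, 43.0%)
```

Notice that the only inputs are the 1,000 respondents and their answers; the 3,000 surveys that never came back don't enter the calculation at all.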

Now hold the phone. Jenny knows only what the 1,000 people who returned the survey said; she has no idea how the other 3,000 people would have answered. And here's the kicker: Whether someone responds to a survey is often related to the topic of the survey. It's not a random thing. Those nonrespondents (people who don't respond to a survey) carry a lot of weight in terms of what they're not taking the time to tell you.

For the sake of argument, suppose that 2,000 of the people who originally got the survey were uncomfortable with the question because they do cheat on their taxes and didn't want anyone to know about it, so they threw the survey in the trash. Suppose that the other 1,000 nonrespondents don't cheat on their taxes, so they didn't think the survey was worth returning. If both of these suppositions were true, the count of cheaters would look like this:

Cheaters = 400 (respondents) + 2,000 (nonrespondents) = 2,400

That raises the overall percentage of cheaters to 2,400 divided by 4,000, or 60 percent, compared with the 40 percent Jenny reported. That's a huge difference!
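The same what-if scenario, worked as a quick sketch. The 2,000 cheating nonrespondents are the hypothetical number from the text, not data; the point is to compare the result with the 37-to-43 percent range implied by Jenny's "40 percent, plus or minus 3."

```python
mailed = 4000
respondent_cheaters = 400
nonrespondent_cheaters = 2000  # hypothetical scenario from the text, not observed data

true_rate = (respondent_cheaters + nonrespondent_cheaters) / mailed
print(f"{true_rate:.0%}")  # 60%

# Jenny's interval from the respondents alone ran from about 37% to 43%
print(0.37 <= true_rate <= 0.43)  # False
```

A 20-point gap is more than six times her reported margin of error, so the interval isn't just a little off; it misses completely.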

You could also go completely the other way and suppose that none of the 3,000 nonrespondents cheat; they just didn't take the time to say so. In that case, you would have 600 (respondents) + 3,000 (nonrespondents) = 3,600 noncheaters. Out of the 4,000 people who received the survey, that's 90 percent noncheaters, leaving only 10 percent cheaters. The truth is probably somewhere in between, but without hearing from the nonrespondents, there's no way to tell where.
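You can pin down just how wide "somewhere in between" is by taking the two most extreme assumptions. If none of the 3,000 nonrespondents cheat, you get the lowest possible rate. If every one of them cheats, you get the highest. This sketch computes those worst-case bounds from Jenny's counts. It assumes only that each of the 4,000 people is either a cheater or not, and makes no guess about how the nonrespondents would have answered.

```python
mailed = 4000
returned = 1000
cheaters_among_respondents = 400
nonrespondents = mailed - returned  # 3000

# Lowest possible rate: no nonrespondent cheats
lower = cheaters_among_respondents / mailed
# Highest possible rate: every nonrespondent cheats
upper = (cheaters_among_respondents + nonrespondents) / mailed

print(f"cheating rate is somewhere between {lower:.0%} and {upper:.0%}")
# cheating rate is somewhere between 10% and 85%
```

With a 25 percent response rate, the data alone can't narrow the answer down to anything tighter than a 75-point range, no matter what the margin-of-error formula says.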

And the worst part is that the margin-of-error formula Jenny uses doesn't know that the data she put into it are biased. That formula accounts only for random sampling error, not for bias caused by nonresponse, so her reported margin of error of plus or minus 3 percentage points is wrong. The formulas happily crank out results no matter what. It's up to you to make sure that what you put into the formulas is good, clean info.
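One way to see why the formula is fooled: in the hypothetical scenario from the text, cheaters and noncheaters returned the survey at very different rates. This sketch splits the 4,000 people into the two groups that scenario implies (2,400 cheaters and 1,600 noncheaters) and compares the true rate with what the respondents show. All the group counts come from that made-up scenario, not from real data.

```python
# Hypothetical scenario from the text: who received the survey and who returned it
cheaters_mailed, honest_mailed = 2400, 1600
cheaters_returned, honest_returned = 400, 600

# Response rate within each group
rr_cheaters = cheaters_returned / cheaters_mailed  # about 1 in 6 cheaters responded
rr_honest = honest_returned / honest_mailed        # 3 in 8 noncheaters responded

true_rate = cheaters_mailed / (cheaters_mailed + honest_mailed)
observed_rate = cheaters_returned / (cheaters_returned + honest_returned)
bias = observed_rate - true_rate

print(f"response rates: cheaters {rr_cheaters:.1%}, noncheaters {rr_honest:.1%}")
print(f"true {true_rate:.0%}, observed {observed_rate:.0%}, bias {bias:+.0%}")
```

Because cheaters were less likely to respond, the respondents underrepresent them by 20 points. The margin-of-error formula only ever sees the 1,000 respondents and their 40 percent, so it has no way to detect that gap.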

Getting 1,000 results when you send out 4,000 surveys is nowhere near as good as getting 1,000 results when you send out 1,000 surveys (or even 100 results from 100 surveys). Plan your survey based on how much follow-up you can do to get responses from the people you selected, and if that means a smaller sample size, so be it. At least the results have a much better chance of being unbiased.
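To put numbers on that trade-off, here's a sketch comparing a hypothetical 100-out-of-100 survey with Jenny's 1,000-out-of-4,000 survey. It uses the same normal-approximation margin of error at 95 percent confidence with the worst-case proportion of 0.5, and reports the share of selected people whose answers are unknown.

```python
import math

def margin_of_error(p_hat, n, z=1.96):
    # Normal-approximation margin of error for a sample proportion
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

# Hypothetical: 100 surveys mailed, all 100 returned
small_moe = margin_of_error(0.5, 100)

# Jenny: 4,000 mailed, 1,000 returned, 3,000 answers unknown
big_moe = margin_of_error(0.5, 1000)
unknown_share = (4000 - 1000) / 4000

print(f"100 of 100:   +/- {small_moe:.1%}, unknown share 0%")
print(f"1000 of 4000: +/- {big_moe:.1%}, unknown share {unknown_share:.0%}")
```

The small survey's margin of error is wider (about 9.8 points versus 3.1), but it describes everyone who was selected. Jenny's narrower margin applies only to the quarter of her sample who answered.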