# Identifying Bad Statistical Samples

After a statistical study has been designed, be it a survey or an experiment, you need to select a sample of individuals who represent a cross-section of the entire population. This is critical to producing credible data in the end.

Statisticians have a saying, “Garbage in equals garbage out.” If you select your subjects (the individuals who will participate in your study) in a way that is biased — that is, favoring certain individuals or groups of individuals — then your results will also be biased. It’s that simple.

Suppose Bob wants to know the opinions of people in your city regarding a proposed casino. Bob goes to the mall with his clipboard and asks people who walk by to give their opinions. What’s wrong with that? Well, Bob is only going to get the opinions of people who a) shop at that mall, b) are there on that particular day, c) at that particular time, and d) take the time to respond.

Those circumstances are too restrictive: those folks don’t represent a cross-section of the city. Similarly, Bob could put up a Web site survey and ask people to use it to vote. However, only people who know about the site, have Internet access, and want to respond will give him data, and typically only those with strong opinions will go to such trouble. In the end, all Bob has is a bunch of biased data on individuals who likely don’t represent the city at all.
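To see how far a mall-only sample can drift from the truth, here is a minimal sketch in Python. Every number in it is made up for illustration: the city size, the share of mall shoppers, and the support rates are assumptions, not data from any real survey.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical city (all figures are illustrative assumptions):
# 2,000 mall shoppers, 70% of whom support the casino,
# and 8,000 other residents, 40% of whom support it.
shoppers = [("shopper", i < 1400) for i in range(2000)]
others = [("other", i < 3200) for i in range(8000)]
city = shoppers + others

def support_rate(people):
    """Fraction of the given people who support the casino."""
    return sum(supports for _, supports in people) / len(people)

true_rate = support_rate(city)  # 4,600 supporters out of 10,000 residents

# Bob's clipboard survey can only ever reach mall shoppers.
mall_sample = rng.sample(shoppers, 500)
mall_estimate = support_rate(mall_sample)

print(f"true support in the city: {true_rate:.2f}")
print(f"mall-only estimate:       {mall_estimate:.2f}")
```

In this invented city, the mall sample overstates support because shoppers happen to like the casino more than everyone else does. Talking to more shoppers doesn't help, because the people Bob never meets are exactly the ones who differ.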

To minimize bias in a survey, the key word is random. You need to select your sample of individuals randomly — that is, with some type of “draw names out of a hat” process.
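If you have a list of everyone in the population, the “names out of a hat” draw can be sketched in a few lines of Python. The resident list and the function name `draw_simple_random_sample` are hypothetical, made up for this example. In practice, getting a complete list of the population is often the hard part.

```python
import random

def draw_simple_random_sample(population, sample_size, seed=None):
    """Pick sample_size distinct members, each with an equal chance.

    The optional seed only makes the draw repeatable for checking.
    """
    rng = random.Random(seed)
    return rng.sample(population, sample_size)

# Hypothetical list of city residents: one ID per person.
residents = [f"resident_{i}" for i in range(1, 10001)]

# Draw 200 residents without replacement.
sample = draw_simple_random_sample(residents, 200, seed=42)
print(len(sample), "residents selected")
```

Because `random.sample` draws without replacement, no one lands in the sample twice, and every resident on the list has the same chance of being picked.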

Note that in designing an experiment, collecting a random sample of people and asking them to participate often isn’t ethical (particularly when the study is related to health issues) because experiments impose a treatment on the subjects. Instead, you send out requests for volunteers to come to you. Then you make sure that the volunteers you select from that group represent the population of interest and that the data on those individuals is collected carefully, so the results can be projected to a larger group.