Determining Causation with Customer Analytics - dummies

By Jeff Sauro

Correlation alone isn't causation, but there are ways to determine and show causation between customer variables. How much faith you can have in a claim of causation depends on the method used to collect the data. For example, you may think that a new web page design resulted in more page views, but page views might already have been increasing before the redesign.

There are five methods you can use to support claims about causation. They're presented here from strongest to weakest.

Randomized experimental study

Randomly assigning participants to different design treatments and/or a control condition in a research study is an experimental design. For example, if you want to know which of three check-out page designs customers understand best, you could create the three designs and randomly assign each participant to one of them. In that study:

  • The dependent variable (what you measure) could be something like:

    • Accuracy in answering questions

    • Difficulty in checking out

    • Confidence in checking out

    • Time to check out

  • The independent variable (what you manipulate) is the design, with three variations.

The hallmark of experimental research is randomly assigning participants to different treatments. After collecting the data, you identify the design with which users answered most accurately and were most confident in their selections.

There are all sorts of variables you can't control for, or aren't even aware of, that could affect results. But by randomly assigning participants to different designs or treatment conditions, you spread those nuisance variables roughly evenly across the designs, so they can't systematically favor one design over another. This increases the internal validity of the findings.
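Random assignment itself is simple to carry out. The following is a minimal Python sketch, assuming a hypothetical pool of 300 participants who each carry a made-up nuisance attribute (years of online-shopping experience). It shuffles the pool, deals participants out evenly to three designs, and then checks that the nuisance attribute averages out to similar values in every group, even though assignment never looked at it.

```python
import random
import statistics

# Hypothetical participant pool: each participant has a nuisance attribute
# (years of online-shopping experience) that could affect check-out results.
rng = random.Random(42)  # fixed seed so the assignment is reproducible
participants = [
    {"id": i, "years_shopping_online": rng.randint(0, 20)} for i in range(300)
]

designs = ["A", "B", "C"]

# Random assignment: shuffle the pool, then deal participants out in turn
# so every design gets the same number of people.
shuffled = participants[:]
rng.shuffle(shuffled)
groups = {d: [] for d in designs}
for position, person in enumerate(shuffled):
    groups[designs[position % len(designs)]].append(person)

# Assignment ignored the nuisance attribute, so its average should be similar
# across designs; any remaining differences are due to chance alone.
for design, members in groups.items():
    avg = statistics.mean(p["years_shopping_online"] for p in members)
    print(f"Design {design}: n={len(members)}, mean experience={avg:.1f} years")
```

The same balancing happens for attributes you never measured, which is what gives randomized experiments their edge over the designs described next.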

As another example, researchers in Europe conducted an experiment in which they manipulated both the usability and visual appeal of an online e-commerce website. They essentially took one website, made the navigation intuitive or not intuitive, and then changed the colors and contrast to be appealing or unattractive.

They found that customers rated the more usable versions of the website as more attractive. The researchers concluded that better usability increases perceived attractiveness. Their conclusion is well substantiated because they used a randomized experimental design.

Experiments (with random assignment) provide the strongest controls against extraneous variables and provide the highest levels of internal validity. These generate the strongest types of research results. But what happens if you cannot randomly assign participants?

Quasi-experimental design

If you want to test different conditions, but you cannot randomly assign participants to the different conditions, then the study is quasi-experimental. For example, you might want to know if customers find the beta version of a software product more usable than an existing version. Customers of beta software usually volunteer to use the software during the beta-test period.

This self-selection (non-random) assignment introduces a potential source of bias into the results. The design has higher external validity because these groups are naturally segmented, but it has lower internal validity.

When you compare attitudes toward usability (say, from the SUS or SUPR-Q) between beta customers and existing-version customers and find a difference, that difference could be due to differences in the types of people using each version rather than to actual differences in usability. This problem is called confounding, and it makes quasi-experimental designs less internally valid than experimental designs.

The weakness of quasi-experimental studies is that you can never be as sure as you can with random assignment that a difference is attributable to the variable you changed (in this case, the software version) rather than to nuisance variables (in this case, differences between the kinds of customers who do and don't volunteer for a beta).
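A small made-up example shows how self-selection can produce a difference that has nothing to do with the software. In this Python sketch, every number is hypothetical: SUS scores depend only on whether a customer is tech-savvy, beta volunteers are disproportionately tech-savvy, and the software version has no effect at all. The naive comparison still favors the beta; comparing versions within each savvy level (stratifying on the confounder) makes the gap vanish.

```python
import statistics

# Hypothetical data: (version, tech_savvy, sus_score) for 200 customers.
# By construction, SUS depends ONLY on tech-savviness (savvy users score 80,
# others 65); the software version itself has no effect.
def sus_for(tech_savvy):
    return 80.0 if tech_savvy else 65.0

records = []
# Beta volunteers self-select: 80 of 100 are tech-savvy.
records += [("beta", True, sus_for(True))] * 80
records += [("beta", False, sus_for(False))] * 20
# Existing-version customers: only 30 of 100 are tech-savvy.
records += [("existing", True, sus_for(True))] * 30
records += [("existing", False, sus_for(False))] * 70

def mean_sus(version, savvy=None):
    """Mean SUS for a version, optionally restricted to one savvy level."""
    scores = [
        score
        for v, is_savvy, score in records
        if v == version and (savvy is None or is_savvy == savvy)
    ]
    return statistics.mean(scores)

# Naive comparison: the beta appears more usable.
naive_gap = mean_sus("beta") - mean_sus("existing")

# Compare versions within each level of the confounder: the gap disappears.
stratified_gaps = {
    savvy: mean_sus("beta", savvy) - mean_sus("existing", savvy)
    for savvy in (True, False)
}

print(f"Naive gap: {naive_gap:.1f} SUS points")
print(f"Gap within each savvy level: {stratified_gaps}")
```

Stratifying only helps with confounders you know about and measured. Random assignment, by contrast, balances unmeasured ones as well, which is why it remains the stronger design.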

Correlational study

A correlational study, as the name suggests, examines the relationship between two variables and reports the correlation. For example, product usability and likelihood to recommend have a strong positive correlation, meaning that ease of use is strongly associated with, and likely accounts for much of, whether users do or don't recommend a product.

While correlational studies provide valuable results, they don’t have random assignment and the independent variables aren’t manipulated, which lessens the internal validity of the findings and weakens the case for causation.

The next time you hear that one customer metric causes another metric, look to identify how that was determined. Chances are it was done with either a correlational study or a quasi-experimental design. That doesn’t mean one variable doesn’t cause another; it just means you can’t be as confident.

Single-subjects study

It’s often the case that getting access to customers is extremely difficult. For example, you might be interested in whether a new interface to a PET scanner reduces the time it takes attending radiologists to adjust a setting on the scanner.

If you had access to one of these customers, you could ask her to perform a task on the existing software version three times, record how long it took to complete, have her attempt the same task three times on the new software, and finally, have her attempt it again three times on the old version. The figure shows how this data looks on a scatterplot.


This type of single-subject study uses what’s called an ABA condition (where A is the existing software and B is the new software). The repeated trials help establish stability in the measures and increase the internal validity of the finding (as much as you can from a single subject).
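The ABA logic can be summarized by comparing the mean task time in each phase. This Python sketch uses made-up times for one radiologist (they are hypothetical, not values read from the figure). The effect is credible when times drop under B and rise again once A is reinstated.

```python
import statistics

# Hypothetical task times (seconds) for one radiologist, three trials per phase.
phases = {
    "A1 (existing)": [62, 58, 60],
    "B (new)": [41, 39, 40],
    "A2 (existing)": [59, 61, 60],
}

means = {phase: statistics.mean(times) for phase, times in phases.items()}
for phase, avg in means.items():
    print(f"{phase}: mean {avg:.1f} s")

# ABA check: B must be faster than the first A phase, and times must go back
# up when the existing software is reinstated in the second A phase.
a1, b, a2 = means.values()  # dicts keep insertion order (Python 3.7+)
reversal_pattern = b < a1 and b < a2
print("Faster under B, slower again under A2:", reversal_pattern)
```

The reversal in the second A phase is what rules out simple practice effects: if the radiologist were just getting better with repetition, times would stay low after switching back.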

The obvious limitation with the single-subject design is generalizability. All you know is that when you manipulate an independent variable (the software), task time goes down for one user. There could be a number of variables you’re not accounting for. For this reason, single-subject designs aren’t used very often in customer research.

You can actually use more than one participant in a single-subject design (for example, two or three radiologists) and use the same technique to establish the pattern. To be more sophisticated in your analysis, you can also use time series analysis to examine trends over time and by condition for each user or the data in aggregate.


Anecdotal evidence

Unfortunately, many business decisions are made based on opinion or on hearing from a vocal customer or sales rep. While a good story of a successful product strategy can be emotionally convincing, it carries little weight when establishing causation.