How to Deal with Missing Data from a Clinical Trial

By John Pezzullo

Most clinical trials have incomplete data for one or more variables, which can be a real headache when analyzing your data. The statistical aspects of missing data are quite complicated, so you should consult a statistician if you have more than just occasional, isolated missing values. Here are some commonly used approaches to coping with missing data:

  • Exclude a case from an analysis if any of the required variables for that analysis is missing. This approach can reduce the number of analyzable cases, sometimes quite severely (especially in multiple regression, where the whole case must be thrown out, even if only one of the variables in the regression is missing).

    And if the result is missing for a reason that’s related to treatment efficacy, excluding the case can bias your results.

  • Replace (impute) a missing value with the mean (or median) of all the available values for that variable. This approach is quite common, but it artificially shrinks the variability of that variable and weakens its correlations with other variables, which distorts standard errors and p values, so it’s not a good technique to use.

  • If one of a series of sequential measurements on a subject is missing (like the third of a series of weekly glucose values), use the previous value in the series. This technique is called Last Observation Carried Forward (LOCF) and is one of the most widely used strategies. LOCF is often considered “conservative,” making it more difficult to prove efficacy, but that isn’t guaranteed: if a subject’s condition would have worsened after the last observed value, carrying that value forward can make a treatment look better than it really is.

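    This approach has historically been popular with regulators, who want to put the burden of proof on the drug, although more recent regulatory guidance tends to favor methods that make less restrictive assumptions about the missing values.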
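
The three approaches above can be sketched in a few lines of Python. This is only an illustration, not a recommended analysis pipeline: the subject IDs, the glucose values, and the function names (complete_cases, mean_impute, locf) are all hypothetical, and the mean is taken separately for each weekly visit on the assumption that each visit is treated as its own variable.

```python
from statistics import mean

# Hypothetical weekly glucose values (mg/dL) for three subjects.
# None marks a missing measurement.
glucose = {
    "S01": [110, 105, None, 98],
    "S02": [120, 118, 115, 111],
    "S03": [130, None, None, 119],
}


def complete_cases(data):
    """Case exclusion: keep only subjects with no missing values."""
    return {sid: vals for sid, vals in data.items() if None not in vals}


def mean_impute(data):
    """Replace each missing value with the mean of the values observed
    at the same visit across subjects.

    Raises statistics.StatisticsError if a visit has no observed values.
    """
    n_visits = len(next(iter(data.values())))
    visit_means = []
    for week in range(n_visits):
        observed = [vals[week] for vals in data.values() if vals[week] is not None]
        visit_means.append(mean(observed))
    return {
        sid: [visit_means[w] if v is None else v for w, v in enumerate(vals)]
        for sid, vals in data.items()
    }


def locf(data):
    """Last Observation Carried Forward: replace each missing value with
    the same subject's most recent observed value."""
    filled = {}
    for sid, vals in data.items():
        out, last = [], None
        for v in vals:
            if v is not None:
                last = v
            out.append(last)  # stays None if nothing has been observed yet
        filled[sid] = out
    return filled


print(complete_cases(glucose))
print(mean_impute(glucose))
print(locf(glucose))
```

Notice how much the choice matters even in this tiny example: case exclusion leaves only one of the three subjects, mean imputation gives S01 and S03 the same made-up week 3 value, and LOCF freezes S03 at the first reading for two visits in a row.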

More complicated methods can also be used, such as estimating the missing value of a variable based on the relationship between that variable and other variables in the data set, or using an analytical method like mixed-model repeated measures (MMRM) analysis, which uses all available data and doesn’t reject a case just because one variable is missing.
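
As a rough sketch of the first idea (estimating a missing value from its relationship with another variable), the snippet below fits a straight line to the subjects who have both measurements and uses it to predict the missing one. The data, the variable names, and the function regression_impute are hypothetical. A single predicted value like this understates the uncertainty in the imputed data, which is why statisticians usually prefer multiple imputation; MMRM requires a mixed-model routine from a statistical package and isn’t shown here.

```python
# Hypothetical data: baseline and week 4 glucose (mg/dL) for four subjects.
# None marks a missing week 4 value.
baseline = [100, 110, 120, 130]
week4 = [90, 98, None, 114]


def regression_impute(x, y):
    """Fill missing y values with predictions from a least-squares line
    fitted to the cases where y was observed.

    Assumes at least two observed cases with differing x values.
    """
    pairs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    mean_x = sum(xi for xi, _ in pairs) / len(pairs)
    mean_y = sum(yi for _, yi in pairs) / len(pairs)
    # Ordinary least-squares slope and intercept
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in pairs)
    sxx = sum((xi - mean_x) ** 2 for xi, _ in pairs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Keep observed values; predict only the missing ones
    return [intercept + slope * xi if yi is None else yi for xi, yi in zip(x, y)]


print(regression_impute(baseline, week4))
```

Unlike mean imputation, the filled-in value here depends on the subject’s own baseline, so a subject who started high gets a higher imputed value than one who started low.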