Biostatistics For Dummies
To estimate sample size in biostatistics, you must state the effect size of importance, or the effect size worth knowing about. If the true effect size is less than the “important” size, you don’t care if the test comes out nonsignificant. With a few shortcuts, you can pick an important effect size and find out how many participants you need, based on that effect size, for several common statistical tests.

All the graphs, tables, and rules of thumb here are for 80 percent power and α = 0.05. In other words, they give the sample size you need in order to have an 80 percent chance of getting a p value that’s less than or equal to 0.05 when the true effect is as large as the important effect size. If you want sample sizes for other values of power and α, use these simple scale-up rules:

• For 90 percent power instead of 80 percent: Increase N by a third (multiply N by 1.33).
• For α = 0.01 instead of 0.05: Increase N by a half (multiply N by 1.5).
• For 90 percent power and α = 0.01: Double N (multiply N by 2).
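
These multipliers can be roughly checked with a normal approximation. The sketch below assumes (this is not stated above) that the required N is proportional to the square of the sum of two standard normal quantiles, one for a two-sided α and one for power. The function name scale_factor is illustrative only.

```python
from statistics import NormalDist


def scale_factor(power, alpha, base_power=0.80, base_alpha=0.05):
    """Approximate multiplier for N when moving away from the baseline
    of 80 percent power and alpha = 0.05.

    Assumption: N is proportional to (z_alpha + z_power) ** 2 with a
    two-sided alpha (normal approximation).
    """
    z = NormalDist().inv_cdf  # standard normal quantile function
    new = (z(1 - alpha / 2) + z(power)) ** 2
    base = (z(1 - base_alpha / 2) + z(base_power)) ** 2
    return new / base


for power, alpha in [(0.90, 0.05), (0.80, 0.01), (0.90, 0.01)]:
    factor = scale_factor(power, alpha)
    print(f"power={power:.2f}, alpha={alpha:.2f}: multiply N by about {factor:.2f}")
```

Under this approximation the three factors land close to the rounded multipliers above, with the combined case coming out a little under 2.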

Estimating sample size for correlation tests

For a correlation test in biostatistics (such as the Pearson or Spearman test), pick the scatter chart that looks like an important amount of correlation. Each chart shows the value of r (the correlation coefficient) and the required number of participants who provide complete data, meaning they each provide an x and a y value. For example, if the scatter chart in the lower-left corner (corresponding to r = 0.6) appears to show an important amount of correlation, you’ll need analyzable data from about 20 participants.

Credit: Illustration by Wiley, Composition Services Graphics

For other r values that aren’t in the preceding scatter charts, use this rule of thumb to estimate sample size: You need about 8/r² − 3 analyzable participants.
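
As a quick check, the correlation rule of thumb (8 divided by r squared, minus 3) can be written as a small function. This is a minimal sketch for 80 percent power and α = 0.05. The function name is hypothetical, and rounding up to a whole participant is a choice made here, not something the rule specifies.

```python
import math


def n_for_correlation(r):
    """Rule-of-thumb number of analyzable participants for a
    correlation test (80 percent power, alpha = 0.05)."""
    if not 0 < abs(r) < 1:
        raise ValueError("r must be nonzero and strictly between -1 and 1")
    return 8 / r**2 - 3


n = n_for_correlation(0.6)
print(math.ceil(n))  # 20, in line with the r = 0.6 example above
```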

Sample size estimation for unpaired Student t tests

In biostatistics, when comparing the means of two independent groups of participants using an unpaired Student t test, the effect size is expressed as the ratio of Δ (delta, the difference between the means of two groups) divided by σ (sigma, the within-group standard deviation).

Each chart in the following figure shows overlapping bell curves that indicate the amount of separation between two groups, along with the effect size (Δ/σ) and the required number of analyzable participants in each group. Pick the chart that looks like an important amount of separation between the two groups. Notice that the middle chart corresponds to a between-group Δ that is three-fourths as large as the within-group σ. If you think the middle chart looks like an important amount of separation, then you need analyzable data from at least 29 participants per group (for a total of 58 participants contributing analyzable data).

For other Δ/σ values, use this rule of thumb to estimate sample size: You need about 16/(Δ/σ)² analyzable subjects in each group.
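
The unpaired rule of thumb (16 divided by the squared effect size) can be sketched the same way. The function name and the example values Δ = 3 and σ = 4 are made up for illustration. They give Δ/σ = 0.75, the middle-chart case described above.

```python
import math


def n_per_group_unpaired(delta, sigma):
    """Rule-of-thumb analyzable participants per group for an unpaired
    Student t test (80 percent power, alpha = 0.05)."""
    if sigma <= 0 or delta == 0:
        raise ValueError("sigma must be positive and delta must be nonzero")
    effect_size = delta / sigma
    return 16 / effect_size**2


per_group = math.ceil(n_per_group_unpaired(delta=3, sigma=4))
print(per_group, 2 * per_group)  # 29 58
```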

Sample size estimation for paired Student t tests

In biostatistics, when comparing paired measurements (such as changes between two time points for the same participant) using a paired Student t test, the effect size is expressed as the ratio of Δ (delta, the mean change) divided by σ (sigma, the standard deviation of the changes). Another, perhaps easier, way to express the effect size is by the relative number of expected participants with positive versus negative changes.

Each chart in the following figure shows a bell curve indicating the spread of changes, along with the effect size (Δ/σ), the ratio of positive to negative differences, and the required number of participants contributing analyzable data (a pair of measurements each). On the charts, the ratio of positive to negative differences is shown below each curve, and the vertical line represents no change (Δ = 0). To calculate sample size, select the chart that looks like an important amount of change (relative to the vertical line representing no change). For example, the middle chart corresponds to a mean Δ that is three-fourths as large as the σ of the changes. It shows about 3.4 times as many participants on the positive side compared to the negative. If this looks like an important amount of change, then you need 16 pairs of measurements (such as 16 participants, each with a pre-treatment and a post-treatment value).

For other Δ/σ values, use this rule of thumb to estimate sample size: You need about 8/(Δ/σ)² + 2 pairs of measurements.
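
The paired rule of thumb, and the ratio of positive to negative changes, can both be computed from Δ/σ. The sketch below assumes the changes follow a normal distribution (the bell curve described above) and that Δ/σ is positive. Both function names are hypothetical.

```python
from statistics import NormalDist


def pairs_for_paired_t(effect_size):
    """Rule-of-thumb number of measurement pairs for a paired
    Student t test (80 percent power, alpha = 0.05)."""
    if effect_size == 0:
        raise ValueError("effect_size must be nonzero")
    return 8 / effect_size**2 + 2


def positive_to_negative_ratio(effect_size):
    """Expected ratio of participants with positive changes to those
    with negative changes, assuming normally distributed changes."""
    p_positive = NormalDist().cdf(effect_size)  # P(change > 0)
    return p_positive / (1 - p_positive)


d = 0.75  # the middle-chart example above
print(round(pairs_for_paired_t(d)))              # about 16 pairs
print(round(positive_to_negative_ratio(d), 1))   # about 3.4
```

The rule gives a little over 16 for Δ/σ = 0.75. Rounding up to 17 instead would be the more cautious choice.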

Estimating sample size when comparing two proportions

The proportion of participants having some attribute (such as responding to treatment) can be compared between two groups of participants by creating a cross-tab from the data, where the two rows represent the two groups, and the two columns represent the presence or absence of the attribute. In biostatistics, this cross-tab can be analyzed with a chi-square or Fisher exact test.

To estimate the required sample size, you need to provide the expected proportions in the two groups. Look up the two proportions you want to compare at the left and top of the following table. It doesn’t matter which proportion you look up on which side. The number in the cell of the table is the number of participants who would be required to provide complete data in each group. Please note that the total required sample size is twice this number.

For example, if you expect 30 percent of patients with an untreated condition to have it resolve on its own, but you expect 40 percent of patients to have it resolve if treated with your new drug, look for the cell at the intersection of the 0.30 row and the 0.40 column (or vice versa), which contains the number 376. This means that you need 376 participants in each group to contribute analyzable data, for a total of 752 participants. Because some participants drop out or are lost to follow-up, enroll more than this number so that you still have analyzable data on enough of them to conduct your final analysis.
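
The text does not say which formula generated the table. As a sketch, the standard normal-approximation formula for comparing two proportions, with a continuity correction, gives the same 376 per group for the 0.30 versus 0.40 example. Whether every cell of the table was computed this way is an assumption, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group_two_proportions(p1, p2, power=0.80, alpha=0.05):
    """Participants per group for comparing two proportions, using the
    normal approximation with a continuity correction.

    Assumptions: two-sided alpha and equal group sizes.
    """
    if not (0 < p1 < 1 and 0 < p2 < 1) or p1 == p2:
        raise ValueError("p1 and p2 must differ and lie strictly between 0 and 1")
    z = NormalDist().inv_cdf  # standard normal quantile function
    z_alpha = z(1 - alpha / 2)
    z_power = z(power)
    diff = abs(p1 - p2)
    p_bar = (p1 + p2) / 2

    # Uncorrected sample size per group.
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_power * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    n = numerator / diff**2

    # Continuity correction, then round up to whole participants.
    n_corrected = n / 4 * (1 + math.sqrt(1 + 4 / (n * diff))) ** 2
    return math.ceil(n_corrected)


per_group = n_per_group_two_proportions(0.30, 0.40)
print(per_group, 2 * per_group)  # 376 752
```

The order of the two proportions does not matter here, just as it does not matter which side of the table you look them up on.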