How to Describe a Data Set Statistically for the GED Science Test

By Murray Shukyn, Achim K. Krull

The GED Science test will ask questions related to descriptive statistics. You can often summarize a collection of data (from an experiment, observations, or surveys, for example) by using descriptive statistics, numbers used to summarize and analyze the data and draw conclusions from it. Descriptive statistics for a collection of data include the following:

  • Frequency: The number of individuals in a group or the number of times a value occurs in a data set. For example, in a community of 360 children, 240 of them have brown, curly hair, so the frequency is 240.

    • Relative frequency: The number of individuals in a group or the number of times a value occurs in a data set relative to the total number of individuals in the group or the total number of values in the data set. For example, the relative frequency of children with brown, curly hair from the preceding bullet would be 240/360 = 2/3.

    • Cumulative frequency: The running total of frequencies, which is often represented in a line graph. For example, if you’re tracking the appearance of a full moon, you have 1 occurrence roughly every 29.5 days, so at the end of 29.5 days, the cumulative frequency would be 1. At the end of 59 days, it would be 2; at the end of 88.5 days, it would be 3; and so on.

  • Measures of the center: The midpoint of the data set, which may be any of the following:

    • Mean is the average. To calculate the mean, total the values and divide by the number of values; for example, the mean of 3, 4, and 5 is (3 + 4 + 5) ÷ 3 = 12 ÷ 3 = 4.

    • Median is the middle value in the set when the values are arranged sequentially. Half of the numbers in a data set lie below the median and half lie above the median. If a data set contains an even number of values, average the two in the middle to find the median. For example, the median of 3, 4, 5, and 6 is (4 + 5) ÷ 2 = 9 ÷ 2 = 4.5.

    • Mode is the value that appears most often in the set.

  • Measures of the spread: How spread out the values are in a data set, which includes the following:

    • Range: The difference between the highest and the lowest value in the data set.

    • Interquartile range: The range of the middle 50 percent of the values in the data set, found by subtracting the first quartile from the third quartile.
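If you want to check these definitions on a computer, the following is a minimal Python sketch using only the standard library. It reuses the hair, full-moon, mean, and median examples from the list above. The `values` list is a made-up data set (not from the GED materials) used only to illustrate frequency of a value, mode, range, and interquartile range. Quartile conventions vary between textbooks and tools, so treat the interquartile range computed here as one reasonable convention rather than the only one.

```python
from collections import Counter
from fractions import Fraction
from itertools import accumulate
import statistics

# Frequency and relative frequency, using the hair example from the list.
total_children = 360
brown_curly = 240                                # frequency
relative_frequency = Fraction(brown_curly, total_children)  # reduces to 2/3

# Cumulative frequency: a running total of per-period counts.
# One full moon per 29.5-day period, tracked over three periods.
full_moons_per_period = [1, 1, 1]
cumulative = list(accumulate(full_moons_per_period))        # [1, 2, 3]

# Measures of the center, using the examples from the list.
mean_value = statistics.mean([3, 4, 5])          # 4
median_value = statistics.median([3, 4, 5, 6])   # 4.5

# Hypothetical data set (an assumption for illustration only).
values = [2, 4, 4, 5, 7, 9, 10, 12]
frequency_of_4 = Counter(values)[4]              # the value 4 occurs twice
mode_value = statistics.mode(values)             # 4, the most common value

# Measures of the spread.
value_range = max(values) - min(values)          # 12 - 2 = 10
q1, _, q3 = statistics.quantiles(values, n=4)    # quartile cut points
iqr = q3 - q1                                    # spread of the middle half

print(relative_frequency, cumulative, mean_value, median_value)
print(frequency_of_4, mode_value, value_range, iqr)
```

On the test itself you work these out by hand or with the on-screen calculator; the sketch is only a way to confirm your arithmetic while you practice.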

Here are a few sample questions to help you warm up for statistics questions you’re likely to encounter on the GED Science test.

Researchers tested soil samples to estimate levels of soil compaction on a farm. The data are shown in the following table. To answer the questions, note that density = mass/volume.

Sample   Soil Mass (grams)   Soil Volume (cm³)
1        8.9                 15.9
2        7.4                 11.54
3        12.2                20.3
4        11.7                19.7
5        9.3                 16.5

  1. The average soil density for all the samples is closest to which of the following?

    • (A) 5.9

    • (B) 0.06

    • (C) 0.59

    • (D) 1.696

  2. The median soil density for all the samples is closest to which of the following (rounded to 3 decimal places)?

    • (A) 0.59

    • (B) 0.594

    • (C) 0.564

    • (D) 0.6

  3. The range of soil density is closest to which of the following?

    • (A) 0.814

    • (B) 8.76

    • (C) 4.8

    • (D) 0.081

Now check your answers:

  1. To calculate the average soil density, total the mass for all soil samples, total the volume of all soil samples, and then divide the mass total by the volume total:

    (8.9 + 7.4 + 12.2 + 11.7 + 9.3) ÷ (15.9 + 11.54 + 20.3 + 19.7 + 16.5) = 49.5 ÷ 83.94 ≈ 0.5897

    which rounds up to 0.59, Choice (C). Another way to find the answer is to calculate the density of each soil sample and then average those densities, which also gives about 0.59.

  2. To find the median soil density, calculate the soil density for each sample, arrange the soil densities from smallest to largest (about 0.560, 0.564, 0.594, 0.601, and 0.641), and choose the one in the middle, Choice (B), 0.594.

  3. The range of soil densities is the difference between the greatest and smallest soil density, so calculate the soil density for each sample and subtract the smallest from the largest: 0.641 – 0.560 ≈ 0.081, Choice (D).
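To double-check the three answers above, here is a short Python sketch that repeats the same arithmetic on the table's values. The variable names are illustrative choices, not anything defined by the test.

```python
import statistics

# (mass in grams, volume in cm^3) for samples 1 through 5, copied from the table.
samples = [(8.9, 15.9), (7.4, 11.54), (12.2, 20.3), (11.7, 19.7), (9.3, 16.5)]

# density = mass / volume for each sample
densities = [mass / volume for mass, volume in samples]

# Question 1: total mass divided by total volume.
total_mass = sum(mass for mass, _ in samples)
total_volume = sum(volume for _, volume in samples)
pooled_density = total_mass / total_volume

# The alternative from the answer key: average the per-sample densities.
mean_of_densities = statistics.mean(densities)

# Question 2: the middle value once the densities are sorted.
median_density = statistics.median(densities)

# Question 3: largest density minus smallest density.
density_range = max(densities) - min(densities)

print(round(pooled_density, 2), round(mean_of_densities, 2))  # 0.59 0.59
print(round(median_density, 3))                               # 0.594
print(round(density_range, 3))                                # 0.081
```

The two averaging methods agree here to two decimal places, but they are not the same calculation in general: dividing the totals weights larger samples more heavily than averaging the individual densities does.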

Graphs, especially line and bar graphs, are often used to display data. In most cases, when you see a question with a graph, the task of describing the data statistically has been done for you. The graph displays the data in a meaningful format, so you can visualize the mean, median, mode, and distribution of data. However, even if a question includes a graph, you may be asked to identify a statistical aspect of the data displayed.

To answer such questions, you may need to convert the visual data into an actual value. Here are a couple of questions for practice.

  1. The Centers for Disease Control and Prevention (CDC) released the graph shown here:

    Credit: Source material courtesy of Centers for Disease Control

    Which of the following statements best summarizes the data displayed in the graph?

    • (A) Rabies in raccoons is a growing problem.

    • (B) The incidence of rabies overall has been declining since 1993.

    • (C) Raccoons are primarily responsible for infecting people with rabies.

    • (D) Bats pose the greatest rabies risk to humans.

  2. Which of the plants in the following graph grows best with an average amount of sunshine?


    • (A) geranium

    • (B) fuchsia

    • (C) impatiens

    • (D) trillium

Check your answers:

  1. You can rule out Choices (C) and (D) because the graph shows no correlation between rabies in animals and in humans. You can rule out Choice (A) because the incidence of rabies in raccoons actually declined from 1993 to 2010, which is also the reason that Choice (B) is the correct answer.

  2. The average (mean) amount of sunshine is between Full Sun and Full Shade, which is labeled Partial Shade on the graph. The plant shown to grow best in partial shade is impatiens, Choice (C).