How to Describe the Center of Continuous Data in R

You have the dataset and you’ve formatted it to fit your needs in R, so now you’re ready for the real work. Analyzing your data always starts with describing it. This way you can detect errors in the data, and you can decide which models are appropriate to get the information you need from the data you have.

Which descriptive statistics you use depends on the nature of your data, of course.

Sometimes you’re more interested in the general picture of your data than you are in the individual values. You may be interested not in the mileage of every car, but in the average mileage of all cars in that dataset. For this, you calculate the mean using the mean() function, like this:

> mean(cars$mpg)
[1] 20.09062
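
If you want to reproduce this result in a fresh R session, here is a minimal sketch. It assumes that cars is a working copy of the built-in mtcars data frame; that is an assumption, not something stated above, but the mean of mtcars$mpg matches the output shown. The cars dataset that ships with R has no mpg column.

```r
# Assumption: `cars` is a working copy of the built-in mtcars data frame.
# (R's own built-in `cars` dataset has only speed and dist columns.)
cars <- mtcars

# Average mileage across all cars in the data frame
avg_mpg <- mean(cars$mpg)
print(avg_mpg)
```

If your own data contains missing values, mean() returns NA unless you pass na.rm = TRUE.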

You also could calculate the average number of cylinders those cars have, but this doesn’t really make sense. The average would be 6.1875 cylinders, and no car drives with a fraction of a cylinder. In this case, the median, which is the middle value of your data once it is sorted, makes more sense. You get the median by using the median() function, like this:

> median(cars$cyl)
[1] 6
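
To see where that value comes from, the following sketch puts the mean and the median of the cylinder counts side by side and then works out the median by hand. It again assumes the data comes from the built-in mtcars data frame; the variable names are made up for illustration.

```r
# Assumption: the cylinder counts come from the built-in mtcars data frame.
cyl <- mtcars$cyl

cyl_mean   <- mean(cyl)     # 6.1875, a value no actual car has
cyl_median <- median(cyl)   # 6

# The median by hand: sort the values, then take the middle one
# (or the average of the two middle values when the count is even).
sorted_cyl <- sort(cyl)
n <- length(sorted_cyl)
manual_median <- if (n %% 2 == 1) {
  sorted_cyl[(n + 1) / 2]
} else {
  mean(sorted_cyl[c(n / 2, n / 2 + 1)])
}

print(c(mean = cyl_mean, median = cyl_median, manual = manual_median))
```

The hand calculation only illustrates the definition; in practice median() is all you need.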

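There are numerous other reasons for calculating the median instead of the mean, or even both together. The two statistics describe different properties of your data, and the combination can tell you something: the mean is pulled toward extreme values while the median is not, so a large gap between the two points to skewed data or outliers.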
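
A small sketch shows how the two statistics react differently to a single extreme value. The numbers below are hypothetical mileage figures invented for this example; they are not taken from the dataset.

```r
# Hypothetical mileage values, invented purely for illustration.
typical      <- c(18, 20, 21, 22, 24)
with_outlier <- c(typical, 120)   # add one extreme value

mean_typical   <- mean(typical)          # 21
median_typical <- median(typical)        # 21
mean_outlier   <- mean(with_outlier)     # 37.5
median_outlier <- median(with_outlier)   # 21.5

print(c(mean = mean_typical, median = median_typical))
print(c(mean = mean_outlier, median = median_outlier))
```

One extreme value moves the mean from 21 to 37.5, while the median barely shifts. That is why reporting both is often worthwhile.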
