How to Describe the Center of Continuous Data in R

By Andrie de Vries, Joris Meys

You have the dataset and you’ve formatted it to fit your needs in R, so now you’re ready for the real work. Analyzing your data always starts with describing it. This way you can detect errors in the data, and you can decide which models are appropriate to get the information you need from the data you have.

Which descriptive statistics you use depends on the nature of your data, of course.

Sometimes you’re more interested in the general picture of your data than in the individual values. You may be interested not in the mileage of every car, but in the average mileage of all cars in the dataset. For this, you calculate the mean using the mean() function, like this:

> mean(cars$mpg)
[1] 20.09062
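To make clear what mean() does with this column, here is a minimal sketch. It assumes that cars is a data frame built from R’s built-in mtcars data; the column selection below is only an illustration, not necessarily how the dataset was prepared. The names avg_mpg, manual_avg, and mpg_with_gap are made up for this example.

```r
# Assumption: `cars` holds columns taken from the built-in mtcars data.
# The exact column selection here is illustrative only.
cars <- mtcars[c("mpg", "cyl", "am", "gear")]

# mean() adds up the values and divides by how many there are
avg_mpg <- mean(cars$mpg)
manual_avg <- sum(cars$mpg) / length(cars$mpg)
print(avg_mpg)
print(all.equal(avg_mpg, manual_avg))

# A single missing value makes mean() return NA unless you drop it
mpg_with_gap <- c(cars$mpg, NA)   # hypothetical vector with one gap
print(mean(mpg_with_gap))
print(mean(mpg_with_gap, na.rm = TRUE))
```

If your own data contains missing values, na.rm = TRUE is usually what you want, but check first why the values are missing.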

You could also calculate the average number of cylinders those cars have, but this doesn’t really make sense. The average would be 6.1875 cylinders, and no car drives with a fraction of a cylinder. In this case, the median (the most central value in your data) makes more sense. You get the median by using the median() function, like this:

> median(cars$cyl)
[1] 6
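The idea of a “most central value” can be sketched by hand: sort the values and take the middle one, or the average of the two middle ones when the count is even. The helper manual_median() below is a hypothetical function written for this illustration, and the sketch assumes the cylinder counts come from the built-in mtcars data.

```r
# Hypothetical helper, for illustration only: computes the median by hand
manual_median <- function(x) {
  s <- sort(x)
  n <- length(s)
  if (n %% 2 == 1) {
    s[(n + 1) / 2]                  # odd count: the single middle value
  } else {
    mean(s[c(n / 2, n / 2 + 1)])    # even count: average the two middle values
  }
}

cyl <- mtcars$cyl                   # assumption: same values as cars$cyl
print(manual_median(cyl))
print(median(cyl))                  # the built-in function should agree

# Counting how often each value occurs shows why an average of
# cylinder counts is a poor summary here
print(table(cyl))
```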

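There are several other reasons to calculate the median instead of the mean, or even both together. Each statistic describes a different property of your data, and comparing the two can tell you something: for example, a mean well above the median usually points to a few unusually large values pulling the mean upward.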
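One way to see the difference between the two statistics is to add a single extreme value and watch how each one reacts. The sketch below assumes the mileage values come from the built-in mtcars data; the extra value of 100 is invented purely for the demonstration, and the variable names are hypothetical.

```r
mpg <- mtcars$mpg                   # assumption: same values as cars$mpg

# Both statistics on the data as it is
print(mean(mpg))
print(median(mpg))

# Add one made-up, unrealistically high mileage value
mpg_outlier <- c(mpg, 100)

# How far does each statistic move?
mean_shift <- abs(mean(mpg_outlier) - mean(mpg))
median_shift <- abs(median(mpg_outlier) - median(mpg))
print(mean_shift)
print(median_shift)
```

The mean moves noticeably because every value contributes to the sum, while the median depends only on the values in the middle of the sorted data.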