What's the Center of the Data?

Alan Anderson

David Semmelroth

Updated

2016-03-26 07:28:26

From the book

Statistics for Big Data For Dummies

You identify the center of a dataset with several different summary measures. These include the big three: mean, median, and mode. You calculate the mean of a dataset by adding up the values of all the elements and dividing by the total number of elements. For example, suppose a small dataset consists of the number of days required to receive a package by the residents of an apartment complex:

1, 2, 2, 4, 7, 9, 10

The mean of this dataset is the sum of the seven values divided by 7:

$$\bar{x} = \frac{1 + 2 + 2 + 4 + 7 + 9 + 10}{7} = \frac{35}{7} = 5$$

The average length of time for the residents to receive a package is 5 days.
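The calculation above can be sketched in a few lines of Python. The `mean` function is a hypothetical helper written for illustration, not code from the book. It adds up the values and divides by how many there are, using the package delivery data from the example.

```python
# Days required to receive a package (the apartment-complex example).
days = [1, 2, 2, 4, 7, 9, 10]


def mean(values):
    """Arithmetic mean: sum of all values divided by the number of values."""
    if not values:
        raise ValueError("mean requires at least one value")
    return sum(values) / len(values)


print(mean(days))  # 5.0
```

Python's standard library offers the same calculation as `statistics.mean`.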

The median of a dataset is the value that divides the sorted data in half. The first half contains the smallest elements, and the second half contains the largest elements. The data in the previous example are already sorted and consist of seven observations, so the median is the fourth smallest value:

1, 2, 2, 4, 7, 9, 10

The median is 4, because three of the observations are less than 4 and three are greater than 4.
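Here is a minimal sketch of the median in Python, again using a hypothetical helper. It sorts the data and takes the middle element. The article covers only an odd number of observations. For an even count, this sketch assumes the common convention of averaging the two middle values.

```python
def median(values):
    """Value that splits the sorted data into two equal halves.

    Odd count: the middle element. Even count: this sketch averages the two
    middle elements (a common convention, assumed here).
    """
    if not values:
        raise ValueError("median requires at least one value")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


days = [1, 2, 2, 4, 7, 9, 10]
print(median(days))  # 4 (the fourth of seven sorted values)
```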

The mode of a dataset is simply the most frequently occurring value. In the package delivery example, the mode is 2, the only value that appears more than once.
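Counting occurrences makes the mode straightforward to compute. This sketch uses `collections.Counter` and a hypothetical `modes` helper. It returns every value tied for the highest count, because a dataset can have more than one mode.

```python
from collections import Counter


def modes(values):
    """Return all values that occur most often (ties are kept)."""
    counts = Counter(values)
    if not counts:
        raise ValueError("mode requires at least one value")
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]


days = [1, 2, 2, 4, 7, 9, 10]
print(modes(days))  # [2]
```

The standard library's `statistics.multimode` behaves similarly.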

For a real-world example, this figure shows a histogram for daily returns to ExxonMobil stock in 2013.

Histogram of daily returns to ExxonMobil stock for 2013.

Each bar represents a range of values; the width of each interval is 0.005. The height of each bar shows how many returns fell within that interval. The histogram makes it easy to see which ranges of values occurred most frequently and which occurred least frequently.

The histogram shows that most of the returns are close to the mean, which is 0.000632 (0.0632 percent). The median is −0.000118, and the mode can be taken to be the interval from −0.005 to 0, the range of values that occurred most frequently.
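The ExxonMobil returns are not reproduced here. This minimal sketch therefore applies the same idea to the package delivery data: group values into fixed-width intervals, count each interval, and treat the tallest bar as the modal interval. It uses a bin width of 3 days instead of 0.005. The helpers `histogram` and `modal_interval` are hypothetical. The sketch also assumes each interval includes its left endpoint and excludes its right one, because the article does not say how boundary values are assigned.

```python
import math
from collections import Counter


def histogram(values, width):
    """Count values in each interval [k*width, (k+1)*width), keyed by interval."""
    # Note: with non-integer widths such as 0.005, floating-point rounding can
    # place a value lying exactly on a boundary into the neighboring bin.
    counts = Counter(math.floor(v / width) for v in values)
    return {(k * width, (k + 1) * width): counts[k] for k in sorted(counts)}


def modal_interval(values, width):
    """Interval with the highest count (the lowest such interval on a tie)."""
    bins = histogram(values, width)
    return max(bins, key=bins.get)


days = [1, 2, 2, 4, 7, 9, 10]
print(histogram(days, 3))       # {(0, 3): 3, (3, 6): 1, (6, 9): 1, (9, 12): 2}
print(modal_interval(days, 3))  # (0, 3)
```

For real return data you would call `histogram(returns, 0.005)`. Libraries such as NumPy (`numpy.histogram`) provide the same binning with explicit control over bin edges.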

About This Article

About the book author:

Alan Anderson, PhD is a teacher of finance, economics, statistics, and math at Fordham and Fairfield universities as well as at Manhattanville and Purchase colleges. Outside of the academic environment he has many years of experience working as an economist, risk manager, and fixed income analyst. Alan received his PhD in economics from Fordham University, and an M.S. in financial engineering from Polytechnic University.

David Semmelroth has two decades of experience translating customer data into actionable insights across the financial services, travel, and entertainment industries. David has consulted for Cedar Fair, Wachovia, National City, and TD Bank.

This article can be found in the category:

Big Data