Predictive Analytics For Dummies
There are two main challenges of big data as it applies to predictive analytics: velocity and volume. These are (respectively) the rate at which data is being generated, received, and analyzed, and the growing mass of data.

Data velocity

In physics, velocity is the speed of an object moving in a specific direction. Data velocity, by analogy, is the rate at which data is being generated, captured, or delivered. The challenge is figuring out how to keep up.

Think of the data being generated by a cellphone provider. It includes all customers’ cellphone numbers, call durations, and GPS locations (for openers). This data is growing all the time, making the task of capturing smart data from big data even more challenging.

So how do you overcome this challenge? There isn’t one simple solution available yet. However, your team can (in fact, must) decide:

  • How often you can capture data
  • What you can afford in resources and finances
  • Which type of data you're going to model (for example, streaming or one-time data)
  • Whether you're building a model on streaming data or only deriving prediction scores for one or more records

If (for example) you own a supercomputer and have the funds, you should capture as much data as you can. Even then, you may need to take into account how often that data changes.
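
The difference between scoring a single record and scoring a stream of records can be sketched in Python. This is a minimal illustration under stated assumptions, not a method from the book: the `call_seconds` field, the `score_record` rule, and the batch size are all hypothetical placeholders standing in for a real data feed and a trained model.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List


def micro_batches(stream: Iterable[Dict], batch_size: int) -> Iterator[List[Dict]]:
    """Group a (possibly unbounded) stream of records into fixed-size batches."""
    iterator = iter(stream)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return  # the stream is exhausted
        yield batch


def score_record(record: Dict) -> float:
    """Hypothetical stand-in for a trained model.

    Assumption: longer calls get a higher score, capped at 1.0.
    """
    return min(record["call_seconds"] / 600.0, 1.0)


def score_stream(stream: Iterable[Dict], batch_size: int) -> Iterator[List[float]]:
    """Score records batch by batch, so memory use stays bounded by batch_size."""
    for batch in micro_batches(stream, batch_size):
        yield [score_record(record) for record in batch]


# One-time scoring of a single record
single_score = score_record({"call_seconds": 300})

# Streaming-style scoring of several records, two at a time
records = [{"call_seconds": 60}, {"call_seconds": 300}, {"call_seconds": 1200}]
batched_scores = list(score_stream(records, batch_size=2))
```

The batch size is where the resource question shows up: a smaller batch keeps scores fresher but costs more processing overhead per record, while a larger batch is cheaper but lags further behind the incoming data.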

High volume of data

A common mistake that people make when they talk about big data is to define it as merely a large amount of data. Big data isn't just about large volumes of data; it’s more about a wide variety of data (yes, in huge amounts) generated at high speed and frequency. Big data grows along three dimensions at once, and that growth is exponential; pictured as a spiral, it looks like a tornado.

Figure: Big data as a tornado.

Big data is “big” not only because of its large volume (such as the number of rows or columns, or its comprehensiveness); it’s also, and mainly, about all three dimensions taken together: volume, velocity, and variety.

About This Article

This article is from the book: Predictive Analytics For Dummies

About the book authors:

Anasse Bari, Ph.D. is a data science expert and a university professor who has many years of predictive modeling and data analytics experience.

Mohamed Chaouchi is a veteran software engineer who has conducted extensive research using data mining methods.

Tommy Jung is a software engineer with expertise in enterprise web applications and analytics.
