How to Deal with Constantly Changing Data in Predictive Analytics
Big data poses two main challenges for predictive analytics: velocity and volume. These are, respectively, the rate at which data is being generated, received, and analyzed, and the sheer, growing mass of data.
High velocity of data
In physics, velocity is the speed of an object moving in a specific direction. Data velocity refers to the rate at which data is being generated, captured, or delivered. The challenge is figuring out how to keep up.
Think of the data being generated by a cellphone provider. It includes all customers’ cellphone numbers, call durations, and GPS locations (for openers). This data is growing all the time, making the task of capturing smart data from big data even more challenging.
So how do you overcome this challenge? There isn’t one simple solution available yet. However, your team can (in fact, must) decide
- How often you can capture data
- What you can afford in resources and finances
- Which type of data you’re going to model (for example, streaming or one-time data)
- Whether you’re building models on streaming data or only deriving prediction scores for one or more records at a time
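The distinction in that last bullet can be sketched in Python. The record fields and the scoring rule below are illustrative assumptions, not from the source: a hypothetical churn-risk model scores a fixed batch of customer records once, or scores each record as it arrives from a live feed.

```python
# Hypothetical scoring model: rates churn risk from average call
# duration. The 60-minute normalizer is illustrative only.
def score(record):
    """Return a risk score in [0, 1] for one customer record."""
    return min(record["avg_call_minutes"] / 60.0, 1.0)

# One-time (batch) scoring: the data set is fixed when you score it.
batch = [
    {"customer": "A", "avg_call_minutes": 12},
    {"customer": "B", "avg_call_minutes": 45},
]
batch_scores = {r["customer"]: score(r) for r in batch}

# Streaming scoring: records arrive continuously, and each one is
# scored as soon as it is captured.
def stream_scores(stream):
    for record in stream:  # in practice: a queue, socket, or log tail
        yield record["customer"], score(record)

incoming = iter(batch)  # stand-in for a live feed
streamed = list(stream_scores(incoming))
```

The batch dictionary is complete the moment it is built; the generator, by contrast, produces scores for as long as the feed keeps delivering records.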
If, for example, you own a supercomputer and have the funds, then capture as much data as you can. Even then, you might need to take into consideration how often that data is changing.
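When you can't afford to keep everything, one simple way to trade capture frequency against resources is to sample the feed rather than store every record. A minimal sketch, assuming it is acceptable to keep only every nth record:

```python
def sample_every(stream, n):
    """Keep every n-th record from a (potentially endless) feed."""
    for i, record in enumerate(stream):
        if i % n == 0:
            yield record

feed = range(100)  # stand-in for a high-velocity feed of records
kept = list(sample_every(feed, 10))
# 100 records in, 10 kept: a tenfold reduction in storage and compute
```

How aggressively you can sample depends on how fast the underlying data changes; a feed that shifts by the minute tolerates far less thinning than one that shifts by the week.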
High volume of data
A common mistake that people make when they talk about big data is to define it as merely a large amount of data. Big data isn’t just about large volumes of data; it’s more about a wide variety of data (yes, in huge amounts) generated at high speed and frequency. Big data grows along all three of those dimensions at once, spiraling outward exponentially; picture a tornado.
Big data is “big” not only because of its large volume (such as numbers of rows, or columns, or comprehensiveness); it’s also, and mainly, about the other two dimensions: velocity and variety.