Basics of Static and Streamed Data in Predictive Analytics

By Anasse Bari, Mohamed Chaouchi, Tommy Jung

Data in predictive analytics can be classified as streamed, static, or a mix of the two. Streamed data changes continuously; examples include the constant flow of Facebook updates, tweets on Twitter, and stock prices that keep changing while the market is open.

Static data, by contrast, is self-contained and fixed: it doesn't change while you analyze it. The problems associated with static data include gaps, outliers, and incorrect values, all of which may require some cleansing, preparation, and preprocessing before you can use the data for an analysis.
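As a rough illustration of that kind of preprocessing, the sketch below fills gaps and drops outliers in a small static dataset. The function name clean_static, the cutoff k, and the sample readings are assumptions made for this example; the median-based rule is just one common choice, not a method prescribed here.

```python
from statistics import median

def clean_static(values, k=3.0):
    """Fill gaps (None) with the median, then drop points far from it.

    The cutoff k is an illustrative assumption, not a recommended setting.
    """
    present = [v for v in values if v is not None]
    med = median(present)
    # Median absolute deviation: a spread measure that outliers barely affect.
    mad = median(abs(v - med) for v in present)
    filled = [med if v is None else v for v in values]
    if mad == 0:
        return filled  # no spread to judge outliers against
    return [v for v in filled if abs(v - med) <= k * mad]

# Hypothetical readings with one gap (None) and one outlier (250.0).
readings = [10.0, 11.0, None, 9.0, 250.0, 10.0]
cleaned = clean_static(readings)
print(cleaned)
```

Because the whole dataset is available up front, the cleaning step can look at every value before deciding what counts as unusual, which is not possible when data arrives as a stream.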

With streamed data, other problems arise. Volume is one: the sheer amount of data arriving nonstop can be overwhelming. And the faster the data streams in, the harder it is for the analysis to keep up.

The two main models for analyzing streamed data are as follows:

  • Examine only the newest data points and make a decision about the state of the model and its next move. This approach is incremental — essentially building up a picture of the data as it arrives.

  • Evaluate the entire dataset, or a subset of it, to make a decision each time new data points arrive. This approach is inclusive of more data points in the analysis — what constitutes the “entire” dataset changes every time new data is added.
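The two approaches above can be sketched side by side using a running average as a stand-in for the model. The class names IncrementalMean and WindowMean, the window size, and the sample stream are hypothetical and chosen only for illustration.

```python
from collections import deque

class IncrementalMean:
    """First approach: fold each new point into the current state."""
    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def update(self, x):
        self.count += 1
        # Adjust the stored mean using only the newest point.
        self.mean += (x - self.mean) / self.count
        return self.mean

class WindowMean:
    """Second approach: re-evaluate the stored data on every arrival.

    size=None keeps the entire dataset; an integer keeps a recent subset.
    """
    def __init__(self, size=None):
        self.window = deque(maxlen=size)

    def update(self, x):
        self.window.append(x)
        # Recompute from all retained points each time.
        return sum(self.window) / len(self.window)

stream = [10.0, 12.0, 11.0, 50.0]  # hypothetical data points
inc, win = IncrementalMean(), WindowMean(size=2)
for x in stream:
    inc_mean = inc.update(x)
    win_mean = win.update(x)
print(inc_mean, win_mean)
```

The incremental version stores only a count and a mean, so its cost per point stays constant. The window version keeps the raw points and recomputes, which costs more but lets the analysis change what it asks of the retained data.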

Depending on the nature of your business and the anticipated impact of the decision, one model may be preferable to the other.

Some business domains, such as the analysis of environmental, market, or intelligence data, prize new data that arrives in real time. All this data must be analyzed as it’s being streamed — and interpreted not only correctly but right away.

Based on the newly available information, the model redraws its entire internal representation of the outside world. Doing so provides you with the most up-to-date basis for a decision you may need to make and act upon quickly.

For example, a predictive analytics model may process a stock price as a data feed, even while the data is rapidly changing, analyze the data in the context of immediate market conditions existing in real time, and then decide whether to trade a particular stock.
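A toy version of that loop might look like the following sketch. The decide function, the three-price window, the 2 percent threshold, and the buy/sell rule are all assumptions made for illustration; a real model would be far more involved, and the prices are made up.

```python
from collections import deque

def decide(prices, window=3, threshold=0.02):
    """Yield (price, action) for each price as it arrives from the feed.

    The rule and its parameters are hypothetical, not a trading strategy.
    """
    recent = deque(maxlen=window)  # immediate market context
    for price in prices:
        if len(recent) == window:
            avg = sum(recent) / window
            change = (price - avg) / avg
            if change > threshold:
                action = "sell"
            elif change < -threshold:
                action = "buy"
            else:
                action = "hold"
        else:
            action = "hold"  # not enough context yet
        yield price, action
        recent.append(price)

# An iterator stands in for a live feed; each price is seen exactly once.
feed = iter([100.0, 100.0, 100.0, 103.0, 101.0, 97.0])
actions = [action for _, action in decide(feed)]
print(actions)
```

Each decision is made the moment a price arrives, using only the few prices that came just before it, which mirrors the requirement to interpret streamed data right away.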

Clearly, analyzing streamed data differs from analyzing static data. Analyzing a mix of both data types can be even more challenging.