How to Use Data Smoothing in Predictive Analytics

By Anasse Bari, Mohamed Chaouchi, Tommy Jung

Data smoothing in predictive analytics is, essentially, trying to find the “signal” in the “noise” by discarding data points that are considered “noisy”. The idea is to sharpen the patterns in the data and highlight trends the data is pointing to.


The implication behind data smoothing is that the data consists of two parts: one part (consisting of the core data points) that signifies overall trends or real trends, and another part that consists mostly of deviations (noise) — some fluctuating points that result from some volatility in the data. Data smoothing seeks to eliminate that second part.

How to turn down the noise

Data smoothing operates on several assumptions:

  • That fluctuation in data is likeliest to be noise.

  • That the noisy part of the data is of short duration.

  • That the data’s fluctuation, regardless of how varied it may be, won’t affect the underlying trends represented by the core data points.

Noise in data tends to be random; its fluctuations should not affect the overall trends drawn from examining the rest of the data. So reducing or eliminating noisy data points can clarify real trends and patterns in the data — in effect, improving the data’s “signal-to-noise ratio.”
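The effect on the signal-to-noise ratio can be illustrated with synthetic data. The sketch below is only an illustration under stated assumptions: the straight-line "signal", the Gaussian noise, the window size of 5, and the helper names `centered_moving_average` and `rmse` are all made up for this example. It averages neighboring points rather than discarding any, then checks how far the raw and smoothed series sit from the known signal.

```python
import random
from typing import List


def centered_moving_average(values: List[float], window: int) -> List[float]:
    """Mean of `window` points centered on each interior position (odd window only)."""
    if window < 1 or window % 2 == 0 or window > len(values):
        raise ValueError("window must be odd and no longer than the series")
    half = window // 2
    return [
        sum(values[i - half : i + half + 1]) / window
        for i in range(half, len(values) - half)
    ]


def rmse(a: List[float], b: List[float]) -> float:
    """Root-mean-square difference between two equally long series."""
    return (sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)) ** 0.5


random.seed(0)  # fixed seed so the run is repeatable
signal = [0.5 * t for t in range(100)]               # assumed underlying trend
noisy = [s + random.gauss(0, 2.0) for s in signal]   # trend plus random noise

window = 5
half = window // 2
smoothed = centered_moving_average(noisy, window)

# Compare both series against the true signal on the same interior points.
interior_signal = signal[half : len(signal) - half]
raw_error = rmse(noisy[half : len(noisy) - half], interior_signal)
smoothed_error = rmse(smoothed, interior_signal)

print(f"error before smoothing: {raw_error:.2f}")
print(f"error after smoothing:  {smoothed_error:.2f}")
```

In real data the true signal is unknown, so this kind of check is only possible on simulated series; it is meant to show the mechanism, not to validate a particular window size.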

Provided you’ve identified the noise correctly and then reduced it, data smoothing can help you predict the next observed data point simply by following the major trends you’ve detected within the data.

Data smoothing concerns itself with the majority of the data points, their positions in a graph, and what the resulting patterns predict about the general trend of (say) a stock price, whether its general direction is up, down, or sideways.

This technique won’t accurately predict the exact price of the next trade for a given stock — but predicting a general trend can yield more powerful insights than knowing the actual price or its fluctuations.

A forecast based on a general trend deduced from smoothed data assumes that whatever direction the data has followed thus far will continue into the future in a way consistent with the trend. In the stock market, for example, past performance is no definite indication of future performance, but it certainly can be a general guide to future movement of the stock price.
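One simple way to turn "continue in the same direction" into a number is to fit a straight line to the smoothed values and extend it one period ahead. The sketch below assumes that approach; the helper names `fit_line` and `trend_direction`, the tolerance of 0.05, and the sample values are hypothetical choices for illustration, not a method prescribed by the article.

```python
from typing import List, Tuple


def fit_line(values: List[float]) -> Tuple[float, float]:
    """Least-squares line through the points (0, v0), (1, v1), ...

    Returns (slope, intercept).
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points to fit a line")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxy = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    sxx = sum((i - mean_x) ** 2 for i in range(n))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def trend_direction(slope: float, tolerance: float = 0.05) -> str:
    """Label the general direction; `tolerance` is an arbitrary cutoff for 'sideways'."""
    if slope > tolerance:
        return "up"
    if slope < -tolerance:
        return "down"
    return "sideways"


smoothed = [11.0, 12.0, 12.0, 13.0]   # assumed output of an earlier smoothing step
slope, intercept = fit_line(smoothed)
next_value = intercept + slope * len(smoothed)   # extend the line one period ahead

print(trend_direction(slope), round(next_value, 2))
```

For these sample values the fitted slope is 0.6 per period, so the trend is labeled "up" and the extrapolated next value is about 13.5. The extrapolation is only as good as the assumption that the trend persists.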

Methods, advantages, and downsides of data smoothing

Data smoothing is not to be confused with fitting a model, which is a part of data analysis that consists of two steps:

  1. Find a suitable model that represents the data.

  2. Make sure that the model fits the data effectively.

Data smoothing focuses on establishing a fundamental direction for the core data points by (1) ignoring any noisy data points and (2) drawing a smoother curve through the data points that skips the wriggling ones and emphasizes primary patterns — trends — in the data, no matter how slow their emergence. Accordingly, in a numerical time series, data smoothing serves as a form of filtering.

Data smoothing can use any of the following methods:

  • Random walk is based on the idea that the next outcome, or future data point, is a random deviation from the last known, or present, data point.

  • Moving average is a running average of consecutive, equally spaced periods. An example would be the calculation of a 200-day moving average of a stock price.

  • Exponential smoothing assigns weights that decrease exponentially with the age of a data point, so recent data points count for more than older ones. It comes in three variants:

    • Simple: This method should be used when the time series data has no trend and no seasonality.

    • Linear: This method should be used when the time series data has a trend line.

    • Seasonal: This method should be used when the time series data has seasonality but no trend.
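The methods listed above can be sketched in a few lines of plain Python. This is a minimal illustration, not a production implementation: the function names, the sample prices, the window of 3, and the smoothing factor `alpha = 0.5` are assumptions chosen for the example. Only the simple variant of exponential smoothing is shown, and the random walk is represented by its usual one-step forecast (the last observed value), which assumes the random steps average out to zero.

```python
from typing import List


def moving_average(values: List[float], window: int) -> List[float]:
    """Trailing moving average: mean of each run of `window` consecutive points."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


def simple_exponential_smoothing(values: List[float], alpha: float) -> List[float]:
    """s[0] = x[0]; s[t] = alpha * x[t] + (1 - alpha) * s[t - 1]."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = [values[0]]
    for x in values[1:]:
        # Each step blends the new point with the running smoothed value,
        # so older points fade by a factor of (1 - alpha) per period.
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


def random_walk_forecast(values: List[float]) -> float:
    """One-step forecast under a zero-mean random walk: the last known point."""
    return values[-1]


prices = [10.0, 12.0, 11.0, 13.0, 12.0, 14.0]   # made-up sample data

print(moving_average(prices, window=3))
# [11.0, 12.0, 12.0, 13.0]
print(simple_exponential_smoothing(prices, alpha=0.5))
# [10.0, 11.0, 11.0, 12.0, 12.0, 13.0]
print(random_walk_forecast(prices))
# 14.0
```

A larger window or a smaller `alpha` gives a smoother curve that reacts more slowly to new data; the right setting depends on the data at hand.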

What these smoothing methods all have in common is that they carry out some kind of averaging process on several data points. Such averaging of adjacent data points is the essential way to zero in on underlying trends or patterns.

The advantages of data smoothing are

  • It’s easy to implement.

  • It helps identify trends.

  • It helps expose patterns in the data.

  • It eliminates data points that you’ve decided are not of interest.

  • It helps predict the general direction of the next observed data points.

  • It generates nice smooth graphs.

But everything has a downside. The disadvantages of data smoothing are

  • It may eliminate valid data points that result from extreme events.

  • It may lead to inaccurate predictions if the test data is only seasonal and not fully representative of the reality that generated the data points.

  • It may shift or skew the data, especially the peaks, resulting in a distorted picture of what’s going on.

  • It may be vulnerable to significant disruption from outliers within the data.

  • It may result in a major deviation from the original data.

If data smoothing does no more than give the data a mere facelift, it can draw a fundamentally wrong picture in the following ways:

  • It can introduce errors through distortions that treat the smoothed data as if it were identical to the original data.

  • It can skew interpretation by ignoring — and hiding — risks embedded within the data.

  • It can lead to a loss of detail within your data — which is one way that a smoothed curve may deviate greatly from that of the original data.

How seriously data smoothing may affect your data depends on the nature of the data at hand and on which smoothing technique was applied to it. For example, if the original data has many peaks, data smoothing can shift and flatten those peaks in the smoothed graphs, which is most likely a distortion.
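The peak distortion is easy to reproduce on a small example. The sketch below assumes a trailing moving average with a window of 3 and a made-up series with a single spike; the helper names are hypothetical. It reports where the peak sits, and how tall it is, before and after smoothing.

```python
from typing import List


def trailing_moving_average(values: List[float], window: int) -> List[float]:
    """Mean of the current point and up to (window - 1) points before it.

    Early positions average whatever points are available, so the output
    has the same length as the input.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1) : i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def peak_index(values: List[float]) -> int:
    """Position of the largest value (first one if there are ties)."""
    return max(range(len(values)), key=lambda i: values[i])


original = [0.0, 1.0, 2.0, 9.0, 2.0, 1.0, 0.0, 0.0]   # made-up series with one spike
smoothed = trailing_moving_average(original, window=3)

print("original peak at index", peak_index(original), "height", max(original))
print("smoothed peak at index", peak_index(smoothed), "height", round(max(smoothed), 2))
```

Here the peak moves from index 3 to index 4 and drops from 9 to roughly 4.33. A wider window would delay and flatten it further, which is why a smoothed chart should not be read as if it showed the original peak times and magnitudes.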

Here are some cautionary points to keep in mind as you approach data smoothing:

  • It’s a good idea to compare smoothed graphs to untouched graphs that plot the original data.

  • Data points removed during data smoothing may not be noise; they could be valid, real data points that result from rare-but-real events.

  • Data smoothing can be helpful in moderation, but its overuse can lead to a misrepresentation of your data.
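Comparing smoothed and original data does not have to be done by eye alone. As a rough sketch, the code below lists the points where the smoothed curve departs most from the original so they can be reviewed before being written off as noise. The helper names, the sample series, and the choice of exponential smoothing with `alpha = 0.5` are assumptions made for this example.

```python
from typing import List, Tuple


def exponential_smoothing(values: List[float], alpha: float) -> List[float]:
    """Simple exponential smoothing, started at the first observation."""
    if not values:
        raise ValueError("values must not be empty")
    smoothed = [values[0]]
    for x in values[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


def largest_deviations(
    original: List[float], smoothed: List[float], top_n: int = 3
) -> List[Tuple[int, float, float, float]]:
    """Return (index, original, smoothed, residual) rows with the largest |residual|."""
    rows = [
        (i, o, s, o - s)
        for i, (o, s) in enumerate(zip(original, smoothed))
    ]
    return sorted(rows, key=lambda row: abs(row[3]), reverse=True)[:top_n]


original = [10.0, 10.0, 10.0, 30.0, 10.0, 10.0]   # made-up data with one jump
smoothed = exponential_smoothing(original, alpha=0.5)

for index, raw, smooth, residual in largest_deviations(original, smoothed, top_n=2):
    print(f"index {index}: original={raw}, smoothed={smooth}, residual={residual}")
```

In this example the largest gap is at index 3, where the original value of 30 is smoothed down to 20. Whether that point is noise or a rare but real event is a judgment the code cannot make; it only points to where to look.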

By applying your professional judgment and your business knowledge, you can use data smoothing effectively. Removing noise from your data without negatively affecting the accuracy and usefulness of the original data is at least as much an art as a science.