You’ll find a nuance about big data analysis. It’s really about small data. While this may seem confusing and counter to the whole premise, small data is the product of big data analysis. This is not a new concept, nor is it unfamiliar to people who have been doing data analysis for any length of time. The overall working space is larger, but the answers lie somewhere in the “small.”
Traditional data analysis began with databases filled with customer information, product information, transactions, telemetry data, and so on. Even then, more data was available than could be analyzed efficiently. Systems, networks, and software didn’t have the performance or capacity to address the scale. The industry addressed these shortcomings by creating smaller data sets.
These smaller data sets were still fairly substantial, and other shortcomings were quickly discovered; the most glaring was the mismatch between the data and the working context. If you worked in Accounts Payable, you had to look at a large amount of unrelated data to do your job. Again, the industry responded by creating smaller, contextually relevant data sets: big to small to smaller still.
You may recognize this as the migration from databases to data warehouses to data marts. More often than not, the data for the warehouses and the marts was chosen based on arbitrary or experimental parameters, resulting in a great deal of trial and error. Businesses weren’t getting the perspectives they needed, or the ones that were possible, because the capacity reductions weren’t based on computational fact.
Enter big data, with all its volumes, velocities, and varieties, and the problem remains or perhaps worsens. The shortcomings of the infrastructure have been addressed, and it can now store and process huge amounts of additional data, but new technologies were needed specifically to help manage big data.
Despite the outward appearances, this is a wonderful thing. Today and in the future, companies will have more data than they can imagine and they’ll have the means to capture and manage it. What is more necessary than ever is the capability to analyze the right data in a timely enough fashion to make decisions and take actions.
Businesses will still shrink the data sets into “fighting trim,” but they can do so computationally. They process the big data and turn it into small data so that it’s easier to comprehend. The resulting small data is more precise and, because it was derived from a much larger starting point, more contextually relevant.
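As a rough illustration of shrinking a data set computationally, the sketch below reduces a list of raw transaction records to one summary row per department. The records, the department names, and the `summarize` helper are all hypothetical, invented only for this example; a real pipeline would read from a far larger store and use whatever aggregation fits the business question.

```python
from collections import defaultdict

# Hypothetical raw records as (department, amount) pairs. These values are
# made up for illustration; real input would be much larger.
transactions = [
    ("accounts_payable", 120.0),
    ("sales", 75.5),
    ("accounts_payable", 30.0),
    ("sales", 24.5),
    ("support", 10.0),
]


def summarize(records):
    """Reduce raw records to one row per department: a count and a total."""
    summary = defaultdict(lambda: {"count": 0, "total": 0.0})
    for department, amount in records:
        summary[department]["count"] += 1
        summary[department]["total"] += amount
    return dict(summary)


# The "small data": one entry per department instead of one per transaction.
small_data = summarize(transactions)

# Someone in Accounts Payable can now look only at the relevant slice.
print(small_data["accounts_payable"])
```

The point of the sketch is that the reduction is derived from every record rather than from a hand-picked subset, which is the difference between computational and arbitrary downsizing described above.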