Layer 4 of the Big Data Stack: Analytical Data Warehouses
The data warehouse, layer 4 of the big data stack, and its companion the data mart, have long been the primary techniques that organizations use to optimize data to help decision makers. Typically, data warehouses and marts contain normalized data gathered from a variety of sources and assembled to facilitate analysis of the business.
Data warehouses and marts simplify the creation of reports and the visualization of disparate data items. They are generally created from relational databases, multidimensional databases, flat files, and object databases — essentially any storage architecture. In a traditional environment, where performance may not be the highest priority, the choice of the underlying technology is driven by the requirements for the analysis, reporting, and visualization of the company data.
As the organization of the data and its readiness for analysis are key, most data warehouse implementations are kept current via batch processing. The problem is that batch-loaded data warehouses and data marts may be insufficient for many big data applications. The stress imposed by high-velocity data streams will likely require a more real-time approach to big data warehouses.
This doesn’t mean that you won’t be creating and feeding an analytical data warehouse or a data mart with batch processes. Rather, you may end up having multiple data warehouses or data marts, and the performance and scale will reflect the time requirements of the analysts and decision makers.
Because many data warehouses and data marts are comprised of data gathered from various sources within a company, the costs associated with the cleansing and normalizing of the data must also be addressed. With big data, you find some key differences:
Traditional data streams (from transactions, applications, and so on) can produce a lot of disparate data.
Dozens of new data sources also exist, each of them needing some degree of manipulation before it can be timely and useful to the business.
Content sources will also need to be cleansed, and these may require different techniques than you might use with structured data.
Historically, the contents of data warehouses and data marts were organized and delivered to business leaders in charge of strategy and planning. With big data, a new set of teams are leveraging data for decision making.
Many big data implementations provide real-time capabilities, so businesses should be able to deliver content to enable individuals with operational roles to address issues such as customer support, sales opportunities, and service outages in near real time. In this way, big data helps move action from the back office to the front office.
Existing analytics tools and techniques will be very helpful in making sense of big data. However, there is a catch. The algorithms that are part of these tools have to be able to work with large amounts of potentially real-time and disparate data. The infrastructure will need to be in place to support this.
And, vendors providing analytics tools will also need to ensure that their algorithms work across distributed implementations. Because of these complexities, expect a new class of tools to help make sense of big data.
There are three classes of tools in this layer of the reference architecture. They can be used independently or collectively by decision makers to help steer the business. The three classes of tools are as follows:
Reporting and dashboards: These tools provide a “user-friendly” representation of the information from various sources. Although a mainstay in the traditional data world, this area is still evolving for big data. Some of the tools that are being used are traditional ones that can now access the new kinds of databases collectively called NoSQL (Not Only SQL).
Visualization: These tools are the next step in the evolution of reporting. The output tends to be highly interactive and dynamic in nature. Another important distinction between reports and visualized output is animation. Business users can watch the changes in the data utilizing a variety of different visualization techniques, including mind maps, heat maps, infographics, and connection diagrams. Reporting and visualization occur at the end of the business activity.
Analytics and advanced analytics: These tools reach into the data warehouse and process the data for human consumption. Advanced analytics should explicate trends or events that are transformative, unique, or revolutionary to existing business practice. Predictive analytics and sentiment analytics are good examples of this science.