The Data Warehouse Lite
A data warehouse lite is a no-frills, bare-bones, low-tech approach to providing data that can help with some of your business decision-making. No-frills means that you put together, wherever possible, proven capabilities and tools already within your organization to build your system.
Subject areas and data content of a data warehouse lite
A data warehouse lite is focused on the reporting or analysis of only one or possibly two subject areas. Suppose that in your job at a wireless division of a telephone company, you analyze the sales of services such as in-network minutes, out-of-network minutes, text messaging, Internet access, and other mobile usage to consumer households.
If you build a data warehouse lite exclusively for this purpose, you have all the necessary information to support your analysis and reporting for the consumer market. You don’t have any information about business users and payment history, however, because that information is part of a different subject area, as shown in this figure.
Based on the subject area limitation, a data warehouse lite has just enough data content to satisfy the primary purpose of the environment, but not enough for many unstructured what-if scenarios its users might create.
You must choose carefully, therefore, from among the set of all possible data elements and select a manageable subset — elements that, without a doubt, are important to have. This process is the same for any data warehouse implementation, except that you must be extremely disciplined when you’re making decisions about what content to include.
Use standard reports, particularly those that currently require a great deal of manual preparation, as one of your primary guides to determine data content in a data warehouse lite.
A data warehouse lite has a limited set of data sources — typically, one to a handful. As part of an overall single-application environment, for example, the data warehouse lite acts as the restructuring agent for the application’s data to make it more query- and report-friendly.
The most common means of restructuring a single application’s data is to denormalize the contents of the application’s relational database tables to eliminate as many relational join operations (the process of bringing together data from more than one database table) as possible when users run reports or do simple querying.
Denormalization is the opposite of the relational database concept of normalization, a somewhat complex set of guidelines that tells you which data elements should be in which tables in a database.
When you denormalize a database, you don’t worry about duplicated data; you try to create rows of data in a single table that most likely mirrors the reports and queries that users run. This figure shows an example of a single-source data warehouse lite built on denormalization.
Although you can use externally provided data in a data warehouse lite implementation, the data you use is rarely newly acquired. You’re more likely to incorporate data that you already use for analysis (perhaps in a stand-alone manner).
Business intelligence tools
The users of a data warehouse lite usually ask questions and create reports that reflect a “Tell me what happened” perspective. Because those users don’t do much heavy-duty analytical processing, the products they use to access the data warehouse should be easy for them to use.
Data extraction, movement, and loading
Simplicity is the name of the game in a data warehouse lite. Therefore, make the process of extracting data from sources and performing all the functions necessary to prepare that data for loading as straightforward as possible by using these two elements:
Simple file extracts from the run-the-business systems and file transfers that allow you to move data from its sources to the data warehouse lite
Straightforward custom code (or perhaps an easy-to-use tool) that can extract and move the data
If the data source for your data warehouse lite is built on a relational database and you’re planning to use the same database product for your data warehouse, use SQL to easily handle data extraction and movement. These steps — as shown in the figure — provide a standard procedure for this process (you’ll want to tailor these steps to your particular environment, of course):
On the system that houses your warehouse, use the SQL CREATE TABLE statement to create the definition for each table in your data warehouse lite.
Create a database backup that contains copies of all tables from the source that provide data to the warehouse, and then reload those tables into a staging area on the system where you plan to locate your data warehouse.
You should ensure that the network bandwidth and time window are adequate to copy all the source tables to the system by using a file-transfer program.
Use the SQL INSERT statement, with a nested SELECT statement specifying the source tables and their respective columns that will populate the data warehouse table (and how the tables will be joined), to load the data into your data warehouse lite.
Run a series of quality assurance (QA) routines to verify that all data has been loaded properly.
Check row counts, numeric totals, and whatever else you can.
The architecture of a data warehouse lite is composed of the database used to store the data, the front-end business intelligence tools used to access the data, the way the data is moved, and the number of subject areas. The watchword of this environment is minimalist: no bells, no whistles, nothing fancy — just enough technology applied to the environment to give users access to data they need.
The architecture of a data warehouse lite, as shown in this figure, contains these major component types:
A single database contains the warehouse’s data.
That database is fed directly from each of the sources providing data to the warehouse.
Users access data directly from the warehouse.