Does Your Company Have Existing Data Structures?
Your organization has overwhelmingly favorable odds of having at least one sort-of data warehouse — a reporting system that provides informational capabilities and, sometimes, analytical capabilities to one or more groups of users.
What is an extract file?
Your users probably use the term extract file to describe this type of environment because it’s populated by extracts of data from production systems, rather than by users being forced to execute their queries or receive their reports from the operational production databases or files. Still interested in playing the odds? Here are a few more examples of types of data environments that might be described as sort-of data warehouses:
Although the extracted data is almost always housed in a single file or database, a merge process probably combines extracted data from more than one application source.
Only selected elements, not all elements from all tables or files, from each data source are usually extracted and copied to the extract file.
Some sort of data quality assurance process is usually going on each step of the way, from the initial extract to loading the data into the extract file.
Some power users probably can execute queries or create statistical programs (in SAS or SPSS, for example) against the data, but many users aren’t likely to touch the data directly. Instead, they probably regularly receive reports generated either automatically or in response to their requests.
Sure sounds like a data warehouse, doesn’t it? The reality is that these sort-of data warehouses are typically serving a very small population and are not done in a standard manner to support the broader needs of the enterprise. You might also call them wanna-be data warehouses.
Here’s the dichotomy of most organizations access to data:
Data analysis have-nots: Organizations and individuals who have few (and more likely no) capabilities to do the type of analysis that can bring about information-driven decision-making
Data analysis haves: Organizations and individuals who might not have a data warehouse up and running, but are doing something with data that they’re getting from somewhere. In many cases, it’s suiting their business needs just fine.
Why aren’t extract files considered to be data warehouses?
They are, sort of. Extract files, whether in the 1970s, 1980s, 1990s, or still in use today, exist for the same basic reasons that a full-fledged data warehouse or a data mart does: to provide information delivery despite a variety of barriers, such as hard-to-understand data structures, don’t touch the production system rules, and the lack of multi-file or multi-database cross-reference.
Some data warehousing proponents argue that combining and reconfiguring data simply for the purpose of generating reports or to perform statistical analysis is hardly a data warehouse in the modern sense of the term. Extract files aren’t equipped with multidimensional or business-analysis capabilities, such as drill-down and data pivoting.
If you separate the data-warehouse side (what it takes to gather, move, and reconfigure data from one or more sources) from the business-intelligence side (what you do with the data after you have it available), the picture becomes much clearer.
Extract files, or whatever you want to call them, are very much part of the barrier-breaking philosophy of a data warehouse. Many of what users refer to as extract files are file-based systems (rather than built on databases), and they probably aren’t flexible enough to support ad hoc querying and dimensional analysis. In a real sense, however, these environments serve the purpose of warehousing data for subsequent use.
To many users, business-analysis capabilities, such as drill-down and data pivoting, have little or no use — at least, not in the context of their current job definitions. The users’ jobs call for functionality that these extract files can deliver, as well as the static reports and statistical analysis accomplished with that data.
The moral of the story: Don’t go into an organization that effectively uses data through extract files and expound on the wonders of data warehousing. Instead, be cautious about proposing any data warehousing solution that can be viewed as a step backward. If you do make this kind of proposition, you’re in for a long, bumpy ride.