Middleware for Data Warehousing - dummies

By Thomas C. Hammergren

Middleware is computer software that connects software components. In a data warehousing environment, the middleware services are the set of programs and routines that do the following:

  • Pull data from the source (or sources).

  • Make sure that the data’s correct.

  • Move the data around the environment from platform to platform, as necessary.

  • Handle any necessary data transformations.

  • Load the data into the data warehouse’s database (or databases).

In a more formal sense, the items in the preceding list are handled by these middleware services:

  • Data selection and extraction

  • Data quality assurance, part I (at the component level)

  • Data movement, part I (also at the component level)

  • Data mapping and transformation

  • Data quality assurance, part II (after transformation has occurred)

  • Data movement, part II (into the data warehouse’s platform environment)

  • Data loading (into the data warehouse)

This figure illustrates how these middleware services flow together in a moderately sized data warehousing environment.


But your data warehousing environment might differ from the one shown in this figure, particularly in the area of the data-movement services. A data-movement service is necessary every time data crosses system boundaries. Your conceptual picture differs, depending on the details of your particular end-to-end environment.

You absolutely, positively need to plan, design, and otherwise think about data warehouse middleware in terms of the individual services in the preceding list, rather than in generic terms, such as “extraction tools.” Many different vendors provide some, many, or all these services as part of a single product or a suite of products.

But a tool that has strong mapping and transformation services, for example, might be weak in data-loading services, or a tool that provides a rich set of extraction services might be less effective in the mapping and transformation space, in addition to data quality assurance.

Before selecting a tool for your data warehousing project (if that’s the route you take, rather than custom coding), make sure that you have a good idea about the particular challenges in your environment. If you have relatively straightforward data-extraction needs, for example, but challenging data-quality problems, concentrate on finding the best quality-assurance tool available, even if it has only so-so extraction capabilities.

(This advice applies even if the tool has no extraction capabilities, in which case, you have to combine it with another tool.)