
Middleware Services: Data Mapping and Transformation

This figure shows an environment in which data is extracted from three data sources, each on a different platform, for inclusion in a data warehouse. At some point in the middleware process, these quality-assured (QA’d) extracts must be brought together for a combined mapping and transformation process.


The mapping and transformation service handles classical data warehousing problems. Suppose that one data source stores customers by using a five-character customer ID, and another source uses a six-digit numeric customer identifier.

To enable comparisons and other data warehouse processing, you need a common method of customer identification: Either one of the identification schemes must be converted to the other, or both must be converted to a third, neutral identification system, depending on the environment’s characteristics.
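The neutral-identifier approach can be sketched as a simple cross-reference lookup. This is a minimal illustration, not a prescribed design: the source names (`billing`, `orders`), the sample IDs, and the `to_warehouse_key` helper are all hypothetical, and a real environment would typically hold the cross-reference in a table rather than in code.

```python
# Hypothetical cross-reference: (source system, native ID) -> neutral warehouse key.
# "billing" uses five-character IDs; "orders" uses six-digit numeric IDs.
CROSSWALK = {
    ("billing", "AB123"): 1001,
    ("orders", "004512"): 1001,  # same customer as billing AB123
    ("orders", "009901"): 1002,
}


def to_warehouse_key(source: str, native_id: str) -> int:
    """Translate a source-specific customer ID into the neutral warehouse key."""
    # Normalize whitespace and letter case before the lookup.
    key = (source, native_id.strip().upper())
    try:
        return CROSSWALK[key]
    except KeyError:
        # Surface unmapped IDs instead of silently loading them.
        raise LookupError(f"no mapping for {source} ID {native_id!r}") from None
```

With this mapping, `to_warehouse_key("billing", "ab123")` and `to_warehouse_key("orders", "004512")` resolve to the same key, so records from both sources can be compared.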

Beyond handling cross-system incompatibilities, transformations might include

  • Data summary: Detail records are rolled up into summarized values. Depending on the peculiarities of your specific data warehousing environment, the summary can instead be performed earlier in the process, before cross-system movement.

  • Selective inclusion of data: You might include a record from only one data source when, for example, you get a comparable record from another extract. Until you converge all the data sources’ contributions, you can’t know how the selective inclusion rules apply.

  • Data convergence: Certain elements from one data source are combined with elements from another source to create one unified record for each customer, product, contract, or whatever type of data you’re dealing with.
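The three transformations above can be sketched in a few lines. This is an illustration under stated assumptions: the field names (`customer_key`, `amount`), the source names (CRM and billing), and the rule that CRM values win on overlapping fields are all hypothetical choices made for the example.

```python
from collections import defaultdict


def summarize(order_rows):
    """Data summary: total order amount per customer key."""
    totals = defaultdict(float)
    for row in order_rows:
        totals[row["customer_key"]] += row["amount"]
    return dict(totals)


def converge(crm_rows, billing_rows):
    """Data convergence: build one unified record per customer key."""
    unified = {}
    # Load billing first so CRM values can overwrite overlapping fields.
    for row in billing_rows:
        unified[row["customer_key"]] = dict(row)
    for row in crm_rows:
        # Selective inclusion (assumed rule): where both sources supply
        # the same field, only the CRM value is kept.
        unified.setdefault(row["customer_key"], {}).update(row)
    return unified
```

Note that the selective inclusion rule can be applied only inside `converge`, after both extracts are available, which matches the point made in the list above.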

The main point to remember about the mapping and transformation service is that you should have, at its conclusion, a unified set of data that’s ready to load into the data warehouse — as soon as you complete a few more steps.

In complex data warehousing environments, you might want to consider multiple transformation processes. As shown in this figure, for example, data extracts converge at several different levels of transformation before moving farther down the middleware pipeline, enabling you to apply more horsepower to the transformation process by using multiple servers early in the flow.
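A two-level version of this idea might look like the following sketch. It is only an analogy: worker threads stand in for the separate servers, and the function names, field names, and the name-normalizing cleanup step are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def clean_extract(rows):
    """Level 1: per-source transformation (here, just normalizing names)."""
    return [{**row, "name": row["name"].strip().title()} for row in rows]


def merge_extracts(cleaned_extracts):
    """Level 2: converge the cleaned extracts into one record per key."""
    unified = {}
    for rows in cleaned_extracts:
        for row in rows:
            unified.setdefault(row["customer_key"], {}).update(row)
    return unified


def run_pipeline(extracts):
    # Each extract is cleaned independently; threads here are a stand-in
    # for the multiple servers used early in the flow.
    with ThreadPoolExecutor() as pool:
        cleaned = list(pool.map(clean_extract, extracts))
    # Only the final convergence step needs all the extracts in one place.
    return merge_extracts(cleaned)
```

The design point is that the level 1 work has no cross-source dependencies, so it can be spread across machines, while the level 2 step runs once on the already cleaned data.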
