Middleware Services: Data Mapping and Transformation
This figure shows an environment in which data is being extracted from three different data sources for inclusion in a data warehouse, and each of the three sources is on a different platform. At some point in the middleware process, these QA’d extracts must be brought together for a combined mapping and transformation process.
The mapping and transformation service handles classical data warehousing problems. Suppose that one data source identifies customers by a five-character customer ID, and another source uses a six-digit numeric customer identifier.
To enable comparisons and other data warehouse processing, you need a common method of customer identification: One of the identification schemes must be converted to the other, or perhaps both must be converted to a third, neutral identification system, depending on the environment’s characteristics.
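As a minimal sketch of the "third, neutral identification system" option, the following Python maps each source's customer ID to a shared warehouse key through a cross-reference table. The source names (`billing`, `orders`), the table contents, and the function names are all hypothetical and chosen only for illustration; a real environment would keep this mapping as maintained reference data.

```python
# Hypothetical cross-reference table: (source system, source ID) -> neutral key.
# All values here are invented for illustration.
CUSTOMER_XREF = {
    ("billing", "AB12C"): 1001,   # five-character ID from one source
    ("orders", "004217"): 1001,   # six-digit numeric ID for the same customer
    ("orders", "009950"): 1002,
}


def to_warehouse_key(source, source_id):
    """Return the neutral customer key, or None when no mapping exists."""
    return CUSTOMER_XREF.get((source, source_id.strip().upper()))


def map_records(source, records):
    """Split records into mapped (with a 'customer_key') and unmapped lists."""
    mapped, unmapped = [], []
    for record in records:
        key = to_warehouse_key(source, record["customer_id"])
        if key is None:
            unmapped.append(record)   # left for follow-up rather than guessed
        else:
            mapped.append({**record, "customer_key": key})
    return mapped, unmapped
```

Keeping the unmapped records in a separate list, rather than dropping them, makes it possible to review identifiers that have no counterpart yet.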
Beyond handling cross-system incompatibilities, transformations might include the following:
Data summary: Detail records are rolled up into summarized form. This summary can also be performed earlier in the process, before cross-system movement, depending on the peculiarities of your specific data warehousing environment.
Selective inclusion of data: You might include records from one data source only if, for example, you get a comparable record from another extract. Until you converge all the data sources’ contributions, you don’t know how the selective inclusion rules apply.
Data convergence: Certain elements from one data source are combined with elements from another source to create one unified record for each customer, product, contract, or whatever type of data you’re dealing with.
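The three transformations above can be sketched together in a few lines of Python. This is only an illustration under stated assumptions: the two extracts (`crm_rows`, `order_rows`), their field names, and the inclusion rule (keep a customer only when the orders extract has a matching record) are hypothetical, and both extracts are assumed to already share a common `customer_key`.

```python
from collections import defaultdict


def summarize(order_rows):
    """Data summary: total order amount per customer key."""
    totals = defaultdict(float)
    for order in order_rows:
        totals[order["customer_key"]] += order["amount"]
    return dict(totals)


def converge(crm_rows, order_rows):
    """Apply selective inclusion and convergence in one pass.

    A CRM row is kept only if the orders extract holds a comparable record
    (same customer_key). Kept rows are merged with the summarized orders
    into one unified record per customer.
    """
    totals = summarize(order_rows)
    unified = []
    for row in crm_rows:
        key = row["customer_key"]
        if key not in totals:   # assumed selective inclusion rule
            continue
        unified.append(
            {"customer_key": key, "name": row["name"], "total_amount": totals[key]}
        )
    return unified
```

Note that the inclusion rule can be evaluated only after both extracts are available in the same place, which is why it runs inside the convergence step.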
The main point to remember about the mapping and transformation service is that you should have, at its conclusion, a unified set of data that’s ready to load into the data warehouse — as soon as you complete a few more steps.
In complex data warehousing environments, you might want to consider multiple transformation processes. As shown in this figure, for example, data extracts converge at several different levels of transformation before moving further down the middleware pipeline. Running the early transformations on multiple servers lets you apply more horsepower to the transformation process.
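A two-level flow of this kind might look like the following Python sketch. It is an assumption-laden illustration, not a description of any particular middleware product: worker threads stand in for separate transformation servers, and the function names and record layout are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def clean_extract(extract):
    """First-level transformation, applied to one source's extract on its own."""
    source, rows = extract
    return [
        {"source": source, "customer_id": row["customer_id"].strip().upper()}
        for row in rows
    ]


def combine(cleaned_extracts):
    """Second-level transformation: converge the cleaned extracts into one list."""
    return [row for rows in cleaned_extracts for row in rows]


def run_pipeline(extracts):
    """Run the per-source stage in parallel, then the combined stage."""
    # Worker threads stand in for separate transformation servers.
    with ThreadPoolExecutor() as pool:
        cleaned = list(pool.map(clean_extract, extracts))  # keeps input order
    return combine(cleaned)
```

The per-source stage has no dependency between extracts, so it is the natural place to spread work across machines; the combined stage has to wait until every extract has arrived.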