Replication Services for Data Warehousing - dummies

Replication Services for Data Warehousing

By Thomas C. Hammergren

Replication middleware services combine selection and extraction, movement, and loading from one database to one or more others, usually managed by a single DBMS product. (The source database and all the targets are all Oracle, all Sybase, or all Microsoft SQL Server, for example.)

Although replication service capabilities vary among DBMS products, traditionally, they’ve been snapshot-oriented: A snapshot of either an entire database or the changes since the last replication occurred are extracted, at a predetermined time, from the source and copied over a networked environment to the intended targets.

The data is then transmitted and loaded as-is (no transformation occurs). Many database vendors have implemented their replication by reading the log files for changes. This style of replication is very efficient because it doesn’t increase the overhead of your transactional systems to replicate the data.

But replication doesn’t replace the long list of data warehousing middleware services. You want to use replication in a data warehousing environment primarily when capturing changes in the source database, often called change data capture (CDC), or after you load the data into your data warehouse and then extract data and send it to data marts, as shown in this figure.

RDBMS-based replication services capture changes in the source system to optimize the data selection and extraction process.