In the early days of data warehousing, most organizations handled middleware services through custom coding, rather than with the few tools available at the time, as shown in this example:

  1. An organization writes a program in a programming language such as COBOL, or perhaps in an environment such as SAS, to handle the data extracts from a mainframe data source and then do the quality assurance checking and the transformation.

  2. A file-transfer service, such as standard FTP (File Transfer Protocol), is used to copy the transformed and “cleansed” data to the machine on which the data warehouse will reside.

  3. Plain old SQL, or a bulk loading utility, is used to load a relational database with the new (or updated) contents of the data warehouse.
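The three steps above can be sketched in a few lines of Python standing in for the COBOL or SAS programs of the era. This is a minimal illustration, not a real implementation: the file names and cleansing rule are invented, a local file copy stands in for FTP, and SQLite stands in for the warehouse's relational database.

```python
# Sketch of the three-step programmatic approach. All names, the QA rule,
# and the transformation are illustrative assumptions; a local copy replaces
# FTP and SQLite replaces the warehouse RDBMS.
import csv
import shutil
import sqlite3

def extract_and_transform(source_path, staging_path):
    """Step 1: extract rows from the source, then QA-check and transform them."""
    with open(source_path, newline="") as src, \
         open(staging_path, "w", newline="") as out:
        writer = csv.writer(out)
        for row in csv.reader(src):
            # Quality assurance: skip rows missing a customer ID or an amount.
            if len(row) < 2 or not row[0].strip() or not row[1].strip():
                continue
            # Transformation: normalize the ID and convert dollars to cents.
            writer.writerow([row[0].strip().upper(), int(float(row[1]) * 100)])

def transfer(staging_path, landing_path):
    """Step 2: copy the cleansed file to the warehouse machine (stand-in for FTP)."""
    shutil.copyfile(staging_path, landing_path)

def load(landing_path, db_path):
    """Step 3: bulk-load the landed file into the warehouse table with plain SQL."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS sales "
                 "(cust_id TEXT, amount_cents INTEGER)")
    with open(landing_path, newline="") as f:
        conn.executemany("INSERT INTO sales VALUES (?, ?)", csv.reader(f))
    conn.commit()
    return conn
```

Chaining `extract_and_transform`, `transfer`, and `load` over a source file reproduces the whole pipeline; every piece of reusable logic (QA rules, type conversions, restart handling) would have to be written and maintained by hand, which is exactly the cost the middleware tools discussed below absorb.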

Nothing’s wrong with this programmatic approach. Always determine for your specific environment whether custom coding or tools are the “right” way to go. Don’t automatically assume that you should implement your data warehouse by using middleware tools.

But your team might find it very expensive to replicate the reusable logic built into most middleware tools. Furthermore, open-source (free) middleware tools are available, making the “we’ll save money by using internal resources” argument a difficult one to justify. Therefore, most implementations today are done using Extract, Transform, and Load (ETL) tools.