Third Generation ETL: Delivering the Best Performance (Part 1)
Selecting an ETL (extract, transform, load) software solution is typically a complex process in which many features need to be evaluated. One of the most critical criteria is that the tool perform well in a given environment and configuration.
Many ETL vendors conclude their sales pitch with performance numbers for their solution, and the numbers are always very impressive. They are almost invariably stated in the form: this tool can transfer so many rows per unit of time.
Many users, however, have been misled by impressive-looking performance numbers that turned out to be far less impressive in real life. Why? Because performance is one of the most difficult characteristics to judge without conducting a full-scale evaluation. Indeed, performance in the production environment is significantly affected by the overall architecture of the information system and by the flow of data during the ETL process.
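One way to make a "rows per unit of time" figure meaningful is to measure it yourself, on your own data and your own processing steps. The following is only a minimal sketch of such a measurement. The function names, the sample rows, and the two stand-in load routines are hypothetical and do not come from any particular ETL product.

```python
import time
from typing import Callable, Dict, List


def measure_throughput(load: Callable[[List[Dict]], int], rows: List[Dict]) -> float:
    """Run `load` once over `rows` and return the observed rows per second."""
    start = time.perf_counter()
    loaded = load(rows)
    # Guard against a zero interval on very fast runs.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return loaded / elapsed


def plain_load(rows: List[Dict]) -> int:
    # Hypothetical stand-in for an ETL job that only moves rows.
    count = 0
    for _ in rows:
        count += 1
    return count


def transforming_load(rows: List[Dict]) -> int:
    # Same job with a per-row transformation, to mimic heavier processing.
    count = 0
    for row in rows:
        _ = {key: str(value).upper() for key, value in row.items()}
        count += 1
    return count


# Made-up sample data; a real evaluation would use production-like rows.
rows = [{"id": i, "name": f"customer-{i}"} for i in range(10_000)]

if __name__ == "__main__":
    print("plain load:        ", round(measure_throughput(plain_load, rows)), "rows/s")
    print("transforming load: ", round(measure_throughput(transforming_load, rows)), "rows/s")
```

The two figures will generally differ even though the row count is identical, which is the point: a single rows-per-second number says little unless the workload behind it resembles your own.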
This article addresses a number of elements that affect ETL performance. This month, we will review the different generations of ETL software and the architecture of third generation ETL products. Next month, we will examine in more detail some of the essential characteristics of this architecture.
Historical Background
As computer systems started to evolve from monolithic mainframes to distributed computing systems, and as business intelligence made its debut, the first ETL solutions were introduced. Since that time, several generations of ETL have been produced.
First Generation: the Origin of ETL and the Legacy Code Generators. The original ETL tools generated native code for the operating system of the platform on which the ETL processes were to run. Most of these products actually generated COBOL, since at that time data was largely stored on mainframes. They made ETL easier than it had been: a centralized tool generated the ETL processes and propagated the code to the appropriate platforms, so programs no longer had to be written by hand for each one. Performance was very good because of the inherent speed of native compiled code, but these tools required in-depth knowledge of programming on each of the platforms involved. Maintenance was also difficult because the code was scattered across different platforms and varied with the types of sources and targets. At the time, this architecture provided the best performance possible, since data was stored in flat files and hierarchical databases, where record-level access was fast. Although this worked well on mainframes, the same record-by-record approach has proven less successful on relational databases when managing large data volumes.
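To illustrate the difference between record-level processing and the set-oriented processing that relational databases favor, here is a small sketch. It is an assumption-laden illustration and not code from any first generation tool: SQLite stands in for a relational database, and the tables `src`, `tgt_rows`, and `tgt_set` and the dollars-to-cents transformation are hypothetical.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with one source and two target tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE src (id INTEGER PRIMARY KEY, amount INTEGER)")
    conn.execute("CREATE TABLE tgt_rows (id INTEGER PRIMARY KEY, amount_cents INTEGER)")
    conn.execute("CREATE TABLE tgt_set (id INTEGER PRIMARY KEY, amount_cents INTEGER)")
    # Made-up sample data: 1,000 rows with whole-unit amounts.
    conn.executemany(
        "INSERT INTO src VALUES (?, ?)",
        [(i, i * 3) for i in range(1, 1001)],
    )
    return conn


def load_record_by_record(conn: sqlite3.Connection) -> None:
    # Record-level style: read each row, transform it in the program,
    # then issue one INSERT per row.
    source_rows = conn.execute("SELECT id, amount FROM src").fetchall()
    for row_id, amount in source_rows:
        conn.execute(
            "INSERT INTO tgt_rows VALUES (?, ?)",
            (row_id, amount * 100),
        )


def load_set_based(conn: sqlite3.Connection) -> None:
    # Set-oriented style: a single statement lets the database engine
    # transform and load all rows itself.
    conn.execute("INSERT INTO tgt_set SELECT id, amount * 100 FROM src")


conn = make_db()
load_record_by_record(conn)
load_set_based(conn)
rows_result = conn.execute("SELECT * FROM tgt_rows ORDER BY id").fetchall()
set_result = conn.execute("SELECT * FROM tgt_set ORDER BY id").fetchall()
```

Both routines produce the same target data. The record-level version issues one statement per row, while the set-oriented version issues a single statement regardless of volume, which is why the former tends to scale poorly on relational systems.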