|
ETL: Why abandon SQL when it will give you the best performance?
(Continued from Page 1)
|
Spreadsheets have been around since the late 1970s. They were an instant sensation. Over the years, spreadsheets have evolved and matured, but the basic form and substance of spreadsheets has hardly changed. While new features and capabilities continue to be added to spreadsheets, for the most part spreadsheet technology has reached a plateau. This is typical of a highly successful product. However, it is becoming clear that new approaches and paradigms should and are beginning to emerge.
|
|
Data Update
You are transferring data from some database to your ODS (online data store) and of course, you wish to keep the ODS up to date. For this, you need to update changes in the ODS dynamically. If the tables in your ODS are quite large, this operation quickly becomes a struggle. Because the ETL engine is the only way to perform transformations, you are typically left with a very time consuming solution: take all incoming rows one by one, compare each row with your target table to identify whether you have to insert it, or whether it is just an update to an existing record or even, perhaps, the duplicate of an existing record. One…row… at a time. Of course, you can optimize the process. So you will spend hours optimizing the processes in the tool itself then do what makes the most sense in terms of optimization: write the incoming data into a temporary table in your target environment then go on into the target database (I mean: physically go to that box!), log in to the box, type the necessary SQL code manually (it’s so easy to identify whether it will be an insert or an update in SQL. It will be done in just a few queries!) Now, repeat for all of the tables of your ODS… and you will quickly find yourself back to square one, with your manageability issues, traceability issues, etc.
You acquired an expensive solution that, at the most critical time, could keep up with the required performance, and forced you to go back to square one: manual coding.
Quite embarrassing…
Star Schema
For those of you not familiar with the concept of a star schema, here is a quick introduction. A star schema is a database architecture made of Dimension tables and Fact tables. Dimension tables will represent various axes of interest in your environment: time dimension, geography dimension, product line dimension, customer information dimension, etc. These usually consist of a surrogate key, a natural key, and the data representing the dimension. The surrogate key, also known as the artificial key, has only one purpose: to uniquely identify the record in the database containing the star schema.
The natural key will allow you to refer to the original record in the source database, hence its name. In other words, it’s the original key.
The fact table will contain the main data of interest (units sold, company purchases, etc.) and will reference your dimensions with foreign keys. The foreign key is typically the dimension table’s surrogate key.
|
|
|
Oracle #1 in Business Analytics According to IDC Research
|
The Business Intelligence Search Engine has all the answers.
|
Find all you need on The Business Intelligence Search Engine.
|
|
|
|
|
|
 |
|
Other
Articles by this Author
|
|
|