Building the Data Warehouse, 3rd Edition - Chapter 1

by Bill Inmon

We are told that the hieroglyphics in Egypt are primarily the work of an accountant declaring how much grain is owed to the Pharaoh. Some of the streets in Rome were laid out by engineers more than 2,000 years ago. Examination of bones found in archeological excavations shows that medicine, at least in a rudimentary form, was practiced as long as 10,000 years ago. Other professions have roots that can be traced back to antiquity. From this perspective, the profession and practice of information systems and processing is certainly immature, because it has existed only since the early 1960s.

Information processing shows this immaturity in many ways, such as its tendency to dwell on detail. There is the notion that if we get the details right, the end result will somehow take care of itself and we will achieve success. It’s like saying that if we know how to lay concrete, how to drill, and how to install nuts and bolts, we don’t have to worry about the shape or use of the bridge we are building. Such an attitude would drive a more professionally mature civil engineer crazy. Getting all the details right does not necessarily bring more success.

The data warehouse requires an architecture that begins by looking at the whole and then works down to the particulars. Certainly, details are important throughout the data warehouse. But details are important only when viewed in a broader context.

The story of the data warehouse begins with the evolution of information and decision support systems. This broad view should help put data warehousing into clearer perspective.

The Evolution

The origins of decision support system (DSS) processing hark back to the very early days of computers and information systems. Interestingly, DSS processing developed out of a long and complex evolution of information technology, and that evolution continues today.

In the early 1960s, the world of computation consisted of creating individual applications that were run using master files. The applications featured reports and programs, usually built in COBOL. Punched cards were common. The master files were housed on magnetic tape, which was good for storing a large volume of data cheaply; the drawback was that tape had to be accessed sequentially. In a given pass of a magnetic tape file, where 100 percent of the records have to be accessed, typically only 5 percent or fewer of the records are actually needed. In addition, accessing an entire tape file may take as long as 20 to 30 minutes, depending on the data on the file and the processing that is done.

Around the mid-1960s, the growth of master files and magnetic tape exploded. And with that growth came huge amounts of redundant data. The proliferation of master files and redundant data presented some very insidious problems:

  • The need to synchronize data upon update
  • The complexity of maintaining programs
  • The complexity of developing new programs
  • The need for extensive amounts of hardware to support all the master files
In short order, the problems of master files—problems inherent to the medium itself—became stifling.
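To make the first of these problems concrete, here is a minimal Python sketch of redundant master files. The file names (`billing`, `shipping`, `marketing`), the customer record, and the helpers `build_files` and `inconsistent_keys` are hypothetical; they only illustrate what happens when the same data is stored in several places and only one copy is updated.

```python
from dataclasses import dataclass, field


@dataclass
class MasterFile:
    """A hypothetical application master file: customer id -> address."""
    name: str
    records: dict[str, str] = field(default_factory=dict)


def build_files() -> list[MasterFile]:
    """Each application keeps its own redundant copy of the same customer."""
    files = [MasterFile("billing"), MasterFile("shipping"), MasterFile("marketing")]
    for f in files:
        f.records["C001"] = "12 Elm St"
    return files


def inconsistent_keys(files: list[MasterFile]) -> set[str]:
    """Return customer ids whose address differs from one master file to another."""
    keys = set().union(*(f.records.keys() for f in files))
    return {k for k in keys if len({f.records.get(k) for f in files}) > 1}


files = build_files()
# Only the billing application applies the customer's change of address.
files[0].records["C001"] = "98 Oak Ave"
print(inconsistent_keys(files))  # {'C001'}
```

Nothing in the data itself says which copy is correct; keeping the copies in step is work that every update program has to do, for every master file that holds the data.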

It is interesting to speculate what the world of information processing would look like if the only medium for storing data had been the magnetic tape. If there had never been anything to store bulk data on other than magnetic tape files, the world would never have had large, fast reservations systems, ATM systems, and the like. Indeed, the ability to store and manage data on new kinds of media opened up the way for a more powerful type of processing that brought the technician and the businessperson together as never before.

The Advent of DASD

By 1970, the day of a new technology for the storage and access of data had dawned. The 1970s saw the advent of disk storage, or the direct access storage device (DASD). Disk storage was fundamentally different from magnetic tape storage in that data could be accessed directly on DASD. There was no need to go through records 1, 2, 3, . . . n to get to record n + 1. Once the address of record n + 1 was known, it was a simple matter to go to record n + 1 directly. Furthermore, the time required to go to record n + 1 was significantly less than the time required to scan a tape. In fact, the time to locate a record on DASD could be measured in milliseconds.
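The difference between tape-style and DASD-style access can be sketched in Python, assuming fixed-length records so that a record's address can be computed from its number. An in-memory `io.BytesIO` buffer stands in for the storage medium, and `RECORD_SIZE`, `make_file`, `read_sequential`, and `read_direct` are illustrative names, not any real system's API. The sequential reader must pass over records 1 through n to reach record n + 1; the direct reader seeks straight to it.

```python
import io

RECORD_SIZE = 16  # assumed fixed record length, in bytes


def make_file(n_records: int) -> io.BytesIO:
    """Build an in-memory stand-in for a storage medium holding n fixed-length records."""
    buf = io.BytesIO()
    for i in range(1, n_records + 1):
        buf.write(f"REC{i:06d}".encode().ljust(RECORD_SIZE, b" "))
    buf.seek(0)
    return buf


def read_sequential(f: io.BytesIO, target: int) -> tuple[bytes, int]:
    """Tape-style access: read records 1, 2, 3, ... until record `target` (1-based)."""
    f.seek(0)
    reads = 0
    while True:
        rec = f.read(RECORD_SIZE)
        if len(rec) < RECORD_SIZE:
            raise IndexError(f"record {target} not found")
        reads += 1
        if reads == target:
            return rec, reads


def read_direct(f: io.BytesIO, target: int) -> tuple[bytes, int]:
    """DASD-style access: compute the record's address and go straight to it."""
    f.seek((target - 1) * RECORD_SIZE)
    rec = f.read(RECORD_SIZE)
    if len(rec) < RECORD_SIZE:
        raise IndexError(f"record {target} not found")
    return rec, 1


f = make_file(10_000)
n = 5_000
rec_seq, seq_reads = read_sequential(f, n + 1)
rec_dir, dir_reads = read_direct(f, n + 1)
print(rec_seq.strip(), seq_reads)  # b'REC005001' 5001
print(rec_dir.strip(), dir_reads)  # b'REC005001' 1
```

Both calls return the same record, but the sequential read touched 5,001 records while the direct read touched one. On tape, every one of those reads actually happens, no matter how few records are wanted.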

With DASD came a new type of system software known as a database management system (DBMS). The purpose of the DBMS was to make it easy for the programmer to store and access data on DASD. In addition, the DBMS took care of such tasks as storing data on DASD, indexing data, and so forth. With DASD and DBMS came a technological solution to the problems of master files. And with the DBMS came the notion of a “database.” In looking at the mess that was created by master files and the masses of redundant data aggregated on them, it is no wonder that in the 1970s a database was defined as a single source of data for all processing.
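Indexing is what lets a DBMS know a record's address before going to fetch it. The sketch below is a simplification, assuming a Python `dict` (a hash map) in place of the on-disk structures, such as B-trees, that real DBMSs typically use; the records and the names `build_index` and `lookup` are hypothetical.

```python
from typing import Optional

# Hypothetical records in the order they happen to sit on disk; the list
# position plays the role of the record's address.
records = [
    ("ACCT-9", "Lopez", 150),
    ("ACCT-2", "Chen", 980),
    ("ACCT-7", "Okafor", 45),
]


def build_index(rows: list[tuple]) -> dict[str, int]:
    """Map each record's key to its address, computed once up front."""
    return {row[0]: address for address, row in enumerate(rows)}


def lookup(rows: list[tuple], index: dict[str, int], key: str) -> Optional[tuple]:
    """Consult the index for the address, then fetch only that record."""
    address = index.get(key)
    return None if address is None else rows[address]


idx = build_index(records)
print(lookup(records, idx, "ACCT-7"))  # ('ACCT-7', 'Okafor', 45)
print(lookup(records, idx, "ACCT-1"))  # None
```

The index is built once and maintained by the DBMS, so individual programs no longer have to know where a record lives or scan for it.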

By the mid-1970s, online transaction processing (OLTP) made even faster access to data possible, opening whole new vistas for business and processing. The computer could now be used for tasks not previously possible, including driving reservations systems, bank teller systems, manufacturing control systems, and the like. Had the world remained in a magnetic-tape-file state, most of the systems that we take for granted today would not have been possible.

PC/4GL Technology

By the 1980s, more new technologies, such as PCs and fourth-generation languages (4GLs), began to surface. The end user began to assume a role previously unfathomed: directly controlling data and systems, a role once reserved for the data processor. With PCs and 4GL technology came the notion that more could be done with data than simply processing online transactions. MIS (management information systems), as it was called in the early days, could also be implemented. Today known as DSS, MIS was processing used to drive management decisions. Previously, data and technology were used exclusively to drive detailed operational decisions. No single database could serve both operational transaction processing and analytical processing at the same time. Figure 1.1 shows the single-database paradigm.

Enter the Extract Program

Shortly after the advent of massive OLTP systems, an innocuous program for “extract” processing began to appear (see Figure 1.2).

The extract program is the simplest of all programs. It rummages through a file or database, uses some criteria for selecting data, and, on finding qualified data, transports the data to another file or database.
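The extract program's three steps (scan, select by criteria, transport) fit in a few lines. This is a minimal Python sketch that assumes the source and target are CSV data; the `extract` function, its parameters, and the sample order data are hypothetical, and a real extract would read from an operational file or database.

```python
import csv
import io
from typing import Callable, Iterable, TextIO

Row = dict[str, str]


def extract(
    source: Iterable[Row],
    criteria: Callable[[Row], bool],
    target: TextIO,
    fieldnames: list[str],
) -> int:
    """Scan every source row, keep those meeting `criteria`, and write them to `target`.

    Returns the number of rows extracted.
    """
    writer = csv.DictWriter(target, fieldnames=fieldnames)
    writer.writeheader()
    count = 0
    for row in source:  # rummage through the whole source
        if criteria(row):  # apply the selection criteria
            writer.writerow({k: row[k] for k in fieldnames})  # transport the qualifying data
            count += 1
    return count


operational = io.StringIO(
    "order_id,region,amount\n"
    "1,EAST,120\n"
    "2,WEST,75\n"
    "3,EAST,300\n"
)
out = io.StringIO()
n = extract(csv.DictReader(operational), lambda r: r["region"] == "EAST", out, ["order_id", "amount"])
print(n)  # 2
print(out.getvalue())
```

Once the rows are in `out`, the operational source is no longer involved: whoever runs the extract can analyze, reshape, or extract from the copy at will.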

Figure 1.2 The nature of extract processing

The extract program became very popular, for at least two reasons:

  • Because extract processing can move data out of the way of high-performance online processing, there is no conflict in terms of performance when the data needs to be analyzed en masse.
  • When data is moved out of the operational, transaction-processing domain with an extract program, a shift in control of the data occurs. The end user then owns the data once he or she takes control of it.
For these (and probably a host of other) reasons, extract processing was soon found everywhere.

The Spider Web

As illustrated in Figure 1.3, a “spider web” of extract processing began to form. First, there were extracts; then there were extracts of extracts; then extracts of extracts of extracts; and so forth. It was not unusual for a large company to perform as many as 45,000 extracts per day.
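The spider web can be modeled as a lineage graph in which each extract records what it was taken from. This hypothetical Python sketch (the dataset names and the `generations` helper are invented for illustration) counts how many extract steps separate each dataset from its original source, following the extract-of-an-extract chain described above. It assumes each extract has a single parent and rejects cyclic lineage.

```python
# Hypothetical lineage: each extract maps to the file or database it was taken from.
extracted_from = {
    "sales_extract": "orders_db",
    "regional_sales": "sales_extract",  # an extract of an extract
    "east_summary": "regional_sales",   # an extract of an extract of an extract
    "marketing_copy": "sales_extract",
    "inventory_extract": "inventory_db",
}


def generations(lineage: dict[str, str]) -> dict[str, int]:
    """Count the extract steps between each extract and its original source."""
    depth = {}
    for name in lineage:
        steps, node, seen = 0, name, set()
        while node in lineage:  # keep walking until we reach a non-extract source
            if node in seen:
                raise ValueError(f"cyclic lineage at {node!r}")
            seen.add(node)
            node = lineage[node]
            steps += 1
        depth[name] = steps
    return depth


print(generations(extracted_from))
# {'sales_extract': 1, 'regional_sales': 2, 'east_summary': 3, 'marketing_copy': 2, 'inventory_extract': 1}
```

With thousands of extracts a day and no central record of this lineage, answering even "where did this number come from?" means walking such chains by hand.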

This pattern of out-of-control extract processing across the organization became so commonplace that it was given its own name: the “naturally evolving architecture.” It occurs when an organization handles the whole process of hardware and software architecture with a laissez-faire attitude. The larger and more mature the organization, the worse the problems of the naturally evolving architecture become.

The naturally evolving architecture presents many challenges, such as:

  • Data credibility
  • Productivity
  • Inability to transform data into information


All product names are trademarks of their respective companies.
Copyright © ITNetwork365 - All Rights Reserved