Lack of Data Credibility
The lack of data credibility is illustrated in Figure 1.3. Say two departments are delivering a report to management—one department claims that activity is down 15 percent, the other says that activity is up 10 percent. Not only are the two departments not in sync with each other, they are off by very large margins. In addition, trying to reconcile the departments is difficult.
Unless very careful documentation has been done, reconciliation is, for all practical purposes, impossible.

Figure 1.3 Lack of data credibility in the naturally evolving architecture
When management receives the conflicting reports, it is forced to make decisions based on politics and personalities because neither source is more or less credible. This is an example of the crisis of data credibility in the naturally evolving architecture.
This crisis is widespread and predictable. Why? As depicted in Figure 1.3, there are five reasons:
- No time basis of data
- The algorithmic differential of data
- The levels of extraction
- The problem of external data
- No common source of data from the beginning
The first reason for the predictability of the crisis is that there is no time basis for the data. Figure 1.4 shows such a time discrepancy. One department has extracted its data for analysis on a Sunday evening, and the other department extracted on a Wednesday afternoon. Is there any reason to believe that analysis done on one sample of data taken on one day will be the same as the analysis for a sample of data taken on another day? Of course not! Data is always changing within the corporation. Any correlation between analyzed sets of data that are taken at different points in time is only coincidental.
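The timing problem can be made concrete with a minimal sketch. The transaction records, account identifiers, and dates below are hypothetical and chosen only for illustration. Both departments apply the same calculation to the same operational data, and the only difference is the moment of extraction.

```python
from datetime import datetime

# Hypothetical operational records: (timestamp, account_id, amount).
TRANSACTIONS = [
    (datetime(2024, 3, 1, 9, 0), "A-100", 500.0),
    (datetime(2024, 3, 3, 14, 0), "A-200", 300.0),
    (datetime(2024, 3, 4, 11, 0), "A-100", -200.0),
    (datetime(2024, 3, 5, 16, 30), "A-300", 900.0),
    (datetime(2024, 3, 6, 10, 15), "A-200", -250.0),
]


def total_activity(as_of):
    """Sum every amount recorded at or before the extraction time."""
    return sum(amount for ts, _, amount in TRANSACTIONS if ts <= as_of)


# Two extractions of the same data, taken at different moments.
sunday_evening = datetime(2024, 3, 3, 20, 0)        # a Sunday
wednesday_afternoon = datetime(2024, 3, 6, 15, 0)   # a Wednesday

sunday_total = total_activity(sunday_evening)
wednesday_total = total_activity(wednesday_afternoon)

print(f"Sunday extract:    {sunday_total}")
print(f"Wednesday extract: {wednesday_total}")
```

Neither total is wrong. Each is correct as of its own extraction time, which is why the two cannot be compared unless that time is recorded alongside the result.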

Figure 1.4 The reasons for the predictability of the crisis in data credibility in the naturally evolving architecture
The second reason is the algorithmic differential. For example, one department has chosen to analyze all old accounts. Another department has chosen to analyze all large accounts. Is there any necessary correlation between the characteristics of customers who have old accounts and customers who have large accounts? Probably not. So why should a very different result surprise anyone?
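A small sketch can show how the selection rule alone drives the result. The account records, the cutoff year, and the balance threshold are all invented for illustration, and averaging per-account percentage changes is a deliberate simplification of whatever each department would really compute.

```python
from statistics import mean

# Hypothetical accounts: (account_id, year_opened, balance, activity_change_pct).
ACCOUNTS = [
    ("A-100", 1995, 1_200.0, -20.0),
    ("A-200", 1998, 90_000.0, -10.0),
    ("A-300", 2019, 150_000.0, 25.0),
    ("A-400", 2021, 80_000.0, 15.0),
    ("A-500", 2022, 500.0, 5.0),
]


def average_change(accounts):
    """Average the activity change across the selected accounts."""
    return mean(change for _, _, _, change in accounts)


# Department A analyzes old accounts (assumed cutoff: opened before 2000).
old_accounts = [a for a in ACCOUNTS if a[1] < 2000]

# Department B analyzes large accounts (assumed threshold: balance >= 50,000).
large_accounts = [a for a in ACCOUNTS if a[2] >= 50_000.0]

old_view = average_change(old_accounts)
large_view = average_change(large_accounts)

print(f"Old accounts:   {old_view:+.1f} percent")
print(f"Large accounts: {large_view:+.1f} percent")
```

With these made-up figures, the first selection reports a decline and the second reports growth, even though both read the same five accounts.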
The third reason is one that merely magnifies the first two. Every time a new extraction is done, the probability of a discrepancy rises because of the timing or the algorithmic differential. And it is not unusual for a corporation to have eight or nine levels of extraction between the time the data enters the corporation’s system and the time analysis is prepared for management. There are extracts, extracts of extracts, extracts of extracts of extracts, and so on. Each new level of extraction exaggerates the other problems that occur.
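The compounding effect can be sketched with a simple model. It rests on an assumption that is not in the text: each level of extraction independently introduces a discrepancy with the same probability. The 10 percent figure is purely illustrative.

```python
def chance_of_discrepancy(per_level, levels):
    """Probability that at least one of `levels` extractions introduces
    a discrepancy, assuming each level is independent (an assumption
    made for this sketch, not a claim from the text)."""
    return 1.0 - (1.0 - per_level) ** levels


# Illustrative only: a 10 percent chance of discrepancy at each level.
PER_LEVEL = 0.10

for levels in (1, 2, 4, 8, 9):
    chance = chance_of_discrepancy(PER_LEVEL, levels)
    print(f"{levels} level(s): {chance:.1%}")
```

Under this assumption, a modest per-level risk becomes more likely than not by the eighth level of extraction.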
The fourth reason for the lack of credibility is the problem posed by external data. With today’s technologies at the PC level, it is very easy to bring in data from outside sources. For example, Figure 1.5 shows one analyst bringing data into the mainstream of analysis from the Wall Street Journal, and another analyst bringing data in from Business Week. However, when the analyst brings data in, he or she strips the identity of the external data. Because the origin of the data is not captured, it becomes generic data that could have come from any source.
Furthermore, the analyst who brings in data from the Wall Street Journal knows nothing about the data being entered from Business Week, and vice versa. No wonder, then, that external data contributes to the lack of credibility of data in the naturally evolving architecture.
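The loss of identity described above can be illustrated with a short sketch. The `ExternalFigure` record, its field names, and the numeric values are hypothetical, and the publications appear only as source labels. The point is the difference between a figure that carries its origin and one that has been stripped of it.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ExternalFigure:
    """An externally sourced number that keeps its origin (hypothetical type)."""
    name: str
    value: float
    source: str
    retrieved_on: date


def strip_identity(figure):
    """Mimic what the analyst does: keep the number, drop where it came from."""
    return (figure.name, figure.value)


def sources_for(figures, name):
    """List the origins of every tagged figure with the given name."""
    return sorted(f.source for f in figures if f.name == name)


# Made-up values, for illustration only.
wsj = ExternalFigure("industry_growth_pct", 4.1, "Wall Street Journal", date(2024, 3, 4))
bw = ExternalFigure("industry_growth_pct", 3.2, "Business Week", date(2024, 3, 6))

generic = [strip_identity(wsj), strip_identity(bw)]
tagged = [wsj, bw]

print(generic)  # two conflicting numbers with no way to tell them apart
print(sources_for(tagged, "industry_growth_pct"))
```

Once the source is stripped, the two values simply conflict. With the source and retrieval date retained, the disagreement can at least be traced to its origin.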
The last contributing factor to the lack of credibility is that often there is no common source of data to begin with. Analysis for department A originates from file XYZ. Analysis for department B originates from database ABC. There is no synchronization or sharing of data whatsoever between file XYZ and database ABC.
Given these reasons, it is small wonder that there is a crisis of credibility brewing in every organization that allows its legacy of hardware, software, and data to evolve naturally into the spider web.