“Most enterprises don’t fathom the magnitude of the impact that data quality problems can have,” said Ted Friedman, principal analyst for Gartner, in the May 17, 2004 edition of InformationWeek. Friedman was commenting on the results of his research on data quality published that week, which revealed that a quarter of major international companies are working with poor-quality data.
Friedman went on to say that bad or incomplete data can have an enormous impact on major Information Technology initiatives, including Business Intelligence (BI) rollouts. BI, he said, “is the best example of the old adage, ‘garbage in, garbage out.’ How can they (executives) have a high confidence in the decisions when the data is bad?”
Beyond causing bad head-office decisions, poor-quality data has been responsible for countless lower-level disasters in direct marketing, accounting and shipping departments. These never involved executives, but they nonetheless did grievous damage to companies’ bottom-line results and customer credibility. The pain points are as numerous as the departments within each enterprise that depend upon high-quality data to succeed.
Gartner’s recent report on this topic shines a spotlight on what had often been dismissed as a minor annoyance, or as something that could be cleaned up after BI or other important enterprise data initiatives were deployed and tested. But that amounts to closing the proverbial barn door well after the livestock has run loose.
That philosophy has also created a climate in which IT and business managers often don’t even know how bad their data quality is. More importantly, those managers don’t have the means of knowing what they don’t know, or what it’s ultimately costing their company.
One attempt at solving this problem came from early data profiling theories and the tools built on them. These rudimentary profiling tools act like radar scanning the sky for large flying objects. Given foreknowledge of what each record within a suspect database should contain – say, seven completed fields listing PRODUCT ID, PRODUCT NAME, PRODUCT DESCRIPTION, STOCK ID, ORDER-QUANTITY, UNIT PRICE and SUPPLIER-ID – the profiling tool scans the database for blank fields or incongruous data (like a UNIT PRICE value in the ORDER-QUANTITY field), and merely reports the results.
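The scan-and-report behavior described above can be illustrated with a short Python sketch. It is not taken from any actual profiling product: the field names come from the example in the text, while the function name `profile`, the `EXPECTED_FIELDS` table and the format patterns (whole numbers for ORDER-QUANTITY, two decimal places for UNIT PRICE) are assumptions made purely for illustration.

```python
import re

# Field names are from the article's example; the patterns are
# hypothetical rules chosen for this sketch, not a real product's rules.
EXPECTED_FIELDS = {
    "PRODUCT ID": re.compile(r"\S+"),
    "PRODUCT NAME": re.compile(r".+"),
    "PRODUCT DESCRIPTION": re.compile(r".+"),
    "STOCK ID": re.compile(r"\S+"),
    "ORDER-QUANTITY": re.compile(r"\d+"),      # assumed: whole units only
    "UNIT PRICE": re.compile(r"\d+\.\d{2}"),   # assumed: e.g. 19.99
    "SUPPLIER-ID": re.compile(r"\S+"),
}


def profile(records):
    """Scan records and report problems; nothing is corrected."""
    findings = []
    for row, record in enumerate(records):
        for field, pattern in EXPECTED_FIELDS.items():
            raw = record.get(field)
            value = "" if raw is None else str(raw).strip()
            if not value:
                findings.append((row, field, "blank"))
            elif not pattern.fullmatch(value):
                findings.append((row, field, "incongruous"))
    return findings


# Two made-up records: the second has a blank name and a price-like
# value sitting in the ORDER-QUANTITY field.
sample = [
    {"PRODUCT ID": "P-100", "PRODUCT NAME": "Widget",
     "PRODUCT DESCRIPTION": "Steel widget", "STOCK ID": "S-7",
     "ORDER-QUANTITY": "12", "UNIT PRICE": "19.99", "SUPPLIER-ID": "SUP-3"},
    {"PRODUCT ID": "P-101", "PRODUCT NAME": "",
     "PRODUCT DESCRIPTION": "Brass gear", "STOCK ID": "S-8",
     "ORDER-QUANTITY": "4.50", "UNIT PRICE": "4.50", "SUPPLIER-ID": "SUP-3"},
]

findings = profile(sample)
for row, field, problem in findings:
    print(f"row {row}: {field} is {problem}")
```

As in the tools the text describes, the sketch only flags the blank PRODUCT NAME and the price-like ORDER-QUANTITY in the second record. It makes no attempt to repair them or to estimate what they cost the business.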