- Incomplete data—During system requirements definition, we rarely bother to gather the data requirements of downstream information consumers, such as the marketing department. For example, if we build a system for the lending department of a financial institution, the users of that department will most likely list Initial Loan Amount, Monthly Payment Amount, and Loan Interest Rate as some of the most critical data elements. However, the most important data elements for users of the marketing department are probably Gender Code, Customer Age, or Zip Code of the borrower. Thus, in a system built for the lending department, data elements such as Gender Code, Customer Age, and Zip Code might not be captured at all, or might be captured only haphazardly. This is often why so many data elements in operational systems have missing values or default values.
- Nonintegrated data—Most organizations store data redundantly and inconsistently across many systems that were never designed with integration in mind. Primary keys often don’t match or are not unique, and in some cases, they don’t even exist. More and more frequently, the development or maintenance of systems is outsourced and even offshored, which puts data consistency and data quality at risk. For example, customer data can exist on two or more outsourced systems under different customer numbers, with different spellings of the customer name and even different phone numbers or addresses. Integrating data from such systems is a challenge.
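The missing and default values described under incomplete data can be measured before any cleansing starts. The following is a minimal Python sketch, not a prescribed method. The function name, the sample loan records, and the placeholder values ("U", "00000", and the empty string) are all assumptions made for illustration.

```python
def missing_or_default_rate(rows, column, placeholders):
    """Return the fraction of rows whose value is absent, NULL, or a placeholder."""
    if not rows:
        return 0.0
    flagged = 0
    for row in rows:
        value = row.get(column)  # None when the column was never captured
        if value is None or value in placeholders:
            flagged += 1
    return flagged / len(rows)


# Hypothetical placeholder (default) values; real systems use their own conventions.
PLACEHOLDERS = {"", "U", "00000"}

# Hypothetical records from a lending system: loan data is complete,
# but the demographic elements marketing cares about are spotty.
loans = [
    {"loan_id": 1, "initial_loan_amount": 20000, "gender_code": "F", "zip_code": "30301"},
    {"loan_id": 2, "initial_loan_amount": 15000, "gender_code": "U", "zip_code": "00000"},
    {"loan_id": 3, "initial_loan_amount": 32000, "gender_code": None, "zip_code": "60614"},
    {"loan_id": 4, "initial_loan_amount": 8000, "zip_code": ""},
]

for column in ("initial_loan_amount", "gender_code", "zip_code"):
    print(column, missing_or_default_rate(loans, column, PLACEHOLDERS))
```

For these sample rows, the loan amount is always present, while Gender Code is missing or defaulted in three of four records and Zip Code in two of four.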
DATA QUALITY RULES
There are four categories of data quality rules. The first category contains rules about business objects or business entities. The second category contains rules about data elements or business attributes. The third category of rules pertains to various types of dependencies between business entities or business attributes, and the fourth category relates to data validity rules.
Business Entity Rules
Business entities are subject to three data quality rules: uniqueness, cardinality, and optionality. These rules have the following properties:
Uniqueness—There are four basic rules to business entity uniqueness:
- Every instance of a business entity has its own unique identifier. This is equivalent to saying that every record must have a unique primary key.
- In addition to being unique, the identifier must always be known. This is equivalent to saying that a primary key can never be NULL.
- The third rule applies only to composite (concatenated) keys. A composite key is a unique identifier that consists of more than one business attribute, which is equivalent to a primary key made up of several columns. The rule states that a unique identifier must be minimal: it may contain only the columns needed to make each value unique, no more and no fewer.
- The fourth rule also applies to composite keys only. It declares that one, several, or all of the business attributes that make up the unique identifier can be data relationships between two business entities. This is equivalent to saying that a composite primary key can contain one or more foreign keys.
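The first three uniqueness rules can be checked mechanically against a set of records. The sketch below is one possible way to do so in Python, under stated assumptions: records are plain dictionaries, NULL is represented by None, and the LOAN_PAYMENT sample data and all function names are hypothetical. Minimality here is tested only against the sample rows, so it can show that a key is not minimal for that data but cannot prove minimality as a business rule.

```python
from itertools import combinations


def is_unique(rows, key_cols):
    """Rule 1: no two records share the same identifier value."""
    keys = [tuple(row[c] for c in key_cols) for row in rows]
    return len(keys) == len(set(keys))


def is_fully_known(rows, key_cols):
    """Rule 2: no part of the identifier is ever NULL (None here)."""
    return all(row[c] is not None for row in rows for c in key_cols)


def is_minimal(rows, key_cols):
    """Rule 3: no proper subset of the key columns is unique on its own."""
    for size in range(1, len(key_cols)):
        for subset in combinations(key_cols, size):
            if is_unique(rows, subset):
                return False
    return True


# Hypothetical LOAN_PAYMENT records. Under rule 4, loan_id could also be
# a foreign key that points to a LOAN entity.
payments = [
    {"loan_id": 101, "payment_no": 1, "amount": 250.0},
    {"loan_id": 101, "payment_no": 2, "amount": 250.0},
    {"loan_id": 102, "payment_no": 1, "amount": 400.0},
]

key = ("loan_id", "payment_no")
print(is_unique(payments, key), is_fully_known(payments, key), is_minimal(payments, key))

# Adding amount keeps the key unique but violates minimality,
# because (loan_id, payment_no) is already unique by itself.
padded_key = ("loan_id", "payment_no", "amount")
print(is_unique(payments, padded_key), is_minimal(payments, padded_key))
```

In a relational database, the first two rules are normally enforced by declaring a primary key, and the fourth by declaring a foreign key on the relevant key column. Minimality is a design decision that no constraint enforces.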






