6 Dimensions of Data Quality

Omar Nousseir

12 Jun 2022

Data quality is crucial for data analysis and insights. Many business decisions are data-driven, so poor data quality leads to poor decision-making and, in turn, to companies losing trust in their insights. For example, Gartner estimates that companies lose on average $15 million a year due to poor data quality.

The purpose of this blog is to outline the 6 key dimensions of data quality to use when assessing the quality of data in your organisation.

Accuracy

Accuracy means that the recorded value is the correct value. For example, when checking whether an employee's address is correct, ask yourself the following questions:

Are there any incorrect spellings in the address?
Is the address up to date?

The last question highlights the link between accuracy and time. The accuracy of data decreases over time: for example, a person can move to a different address, which then needs to be updated in the relevant database management systems.

Completeness

A dataset is complete if it satisfies the business definition of what it means to be comprehensive. This largely depends on what the data is needed for, as it can determine which fields are mandatory or optional.

For example, if an electoral register is missing eligible voters, this can lead to a biased election. Similarly, a customer database that is missing email addresses is incomplete if those addresses are needed for sending marketing emails.

When checking for completeness of your data, ask yourself the following question:

Is all the required data available?
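
As a rough illustration, this question can be turned into a simple check. The sketch below assumes records are plain Python dictionaries and that `name` and `email` are the mandatory fields; both the field names and the sample customers are hypothetical and would come from your own business definition.

```python
# Hypothetical mandatory fields; the real list depends on what the data is needed for.
REQUIRED_FIELDS = ("name", "email")


def missing_required(record, required=REQUIRED_FIELDS):
    """Return the required fields that are absent or blank in a record."""
    missing = []
    for field in required:
        value = record.get(field)
        if value is None or str(value).strip() == "":
            missing.append(field)
    return missing


# Made-up sample data for illustration only.
customers = [
    {"name": "Ada", "email": "ada@example.com", "phone": None},  # phone is optional
    {"name": "Ben", "email": ""},                                # blank email
    {"name": "Cy"},                                              # no email field
]

# Map each incomplete customer to the fields it is missing.
incomplete = {c["name"]: missing_required(c) for c in customers if missing_required(c)}

# Share of records that have every required field populated.
completeness = 1 - len(incomplete) / len(customers)
```

Here an optional field such as `phone` can be empty without affecting the result, which mirrors the distinction between mandatory and optional fields.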

Consistency

Consistency means that all instances of a real-world object or event agree across multiple business systems.

For example, a consumer goods company may store different information about a particular customer, such as invoices, loyalty schemes or subscriptions, in different systems. Hence, it is important to have a single source of truth, so that an update in one system (for example, to a customer address) is reflected in all other systems.
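
A minimal sketch of such a cross-system check is shown below. It assumes each system can be exported as a dictionary keyed by a shared customer ID; the system names (`invoicing`, `loyalty`), the IDs and the addresses are all made up for illustration.

```python
def find_inconsistencies(systems, field):
    """Return {customer_id: {system_name: value}} for customers whose
    value of `field` differs between systems."""
    values_by_customer = {}
    for system_name, records in systems.items():
        for customer_id, record in records.items():
            values_by_customer.setdefault(customer_id, {})[system_name] = record.get(field)
    # Keep only customers for whom the systems hold more than one distinct value.
    return {
        customer_id: values
        for customer_id, values in values_by_customer.items()
        if len(set(values.values())) > 1
    }


# Hypothetical extracts from two business systems.
systems = {
    "invoicing": {101: {"address": "1 High St"}, 102: {"address": "9 Mill Ln"}},
    "loyalty": {101: {"address": "1 High St"}, 102: {"address": "4 Oak Rd"}},
}

conflicts = find_inconsistencies(systems, "address")
```

In this sample, customer 102 is flagged because the two systems hold different addresses, while customer 101 is consistent.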

Timeliness

Is the data available when needed?

For example, when forecasting revenue growth for the next quarter, it may be sufficient to use the previous quarter's results. However, when a doctor needs to diagnose a patient and decide on treatment, they need up-to-date blood test results.
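
One way to express this is as a maximum acceptable age per use case. The sketch below is only illustrative: the thresholds (one day for blood tests, 92 days for quarterly results) and the timestamps are assumptions, not recommendations.

```python
from datetime import datetime, timedelta


def is_timely(recorded_at, max_age, now):
    """Return True if the data was recorded no longer than max_age before now."""
    return now - recorded_at <= max_age


# Fixed "current" time so the example is reproducible.
now = datetime(2022, 6, 12, 9, 0)

blood_test_taken = datetime(2022, 6, 10, 9, 0)   # two days old
quarter_results = datetime(2022, 3, 31, 17, 0)   # roughly ten weeks old

# Hypothetical tolerances: clinical data must be fresh, quarterly data less so.
blood_test_ok = is_timely(blood_test_taken, timedelta(days=1), now)
revenue_ok = is_timely(quarter_results, timedelta(days=92), now)
```

With these assumed thresholds the older quarterly figures still count as timely, while the more recent blood test does not, because timeliness depends on what the data is needed for.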

Validity

Does the data comply with the specified data type, format and range?

An example of each:

Data Type - a name field must contain only letters, not numbers or special characters.
Format - a mobile number field must be exactly 11 digits in length.
Range - a credit card expiry date field cannot be in the past.
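
The three rules above can be sketched as small validation functions. This is a minimal example using only the Python standard library; the function names are hypothetical, and allowing spaces between the words of a name and treating a card as valid through the end of its expiry month are assumptions made for the sake of the example.

```python
import re
from datetime import date


def is_valid_name(name):
    """Data type: every word in the name consists of letters only."""
    parts = name.split()
    return bool(parts) and all(part.isalpha() for part in parts)


def is_valid_mobile(number):
    """Format: the value is exactly 11 digits."""
    return re.fullmatch(r"\d{11}", number) is not None


def is_valid_expiry(expiry_year, expiry_month, today):
    """Range: the expiry month is not earlier than the current month."""
    return (expiry_year, expiry_month) >= (today.year, today.month)


# Fixed date so the example is reproducible.
today = date(2022, 6, 12)

name_ok = is_valid_name("John Dorie")           # letters and a space
name_bad = is_valid_name("J0hn Dorie")          # contains a digit
mobile_ok = is_valid_mobile("07123456789")      # 11 digits
mobile_bad = is_valid_mobile("0712345678")      # only 10 digits
expiry_ok = is_valid_expiry(2022, 6, today)     # expires this month
expiry_bad = is_valid_expiry(2021, 12, today)   # already expired
```

Checks like these are usually cheapest to apply at the point of data entry, before invalid values reach downstream systems.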

Uniqueness

Uniqueness means that every real-world object or event is represented only once in a dataset. For example, in a customer dataset there are transactional records for John Dorie and Johnny Dorie. Although the names are different, they are in reality the same person. A unique key, often called a surrogate key, is therefore required so that all of these records resolve to a single customer.
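
A minimal sketch of assigning such a key is shown below. It assumes that two records belong to the same customer when their email addresses match after trimming whitespace and lower-casing; that matching rule, the `customer_key` field name and the sample transactions are all hypothetical, and real entity matching is often considerably more involved.

```python
from itertools import count


def assign_surrogate_keys(records, match_field="email"):
    """Return copies of the records with a `customer_key` added, where records
    sharing the same normalised match_field receive the same key."""
    next_key = count(1)
    key_by_match = {}
    keyed = []
    for record in records:
        # Normalise so that case and stray whitespace do not create false duplicates.
        match_value = record[match_field].strip().lower()
        if match_value not in key_by_match:
            key_by_match[match_value] = next(next_key)
        keyed.append({**record, "customer_key": key_by_match[match_value]})
    return keyed


# Made-up transactional records for illustration only.
transactions = [
    {"name": "John Dorie", "email": "j.dorie@example.com", "amount": 20},
    {"name": "Johnny Dorie", "email": "J.Dorie@example.com ", "amount": 35},
    {"name": "Jane Roe", "email": "jane@example.com", "amount": 12},
]

keyed = assign_surrogate_keys(transactions)
unique_customers = len({record["customer_key"] for record in keyed})
```

Under this assumed rule, the John Dorie and Johnny Dorie records share one key, so the three transactions map to two unique customers.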