Why is Data Quality important?

A decision can be totally wrong if the quality of the underlying data is bad. Businesses can lose customers, miss opportunities, and see their reputation and revenue decline. With the growing variety of data sources and intermediate processing steps, quality checks are becoming increasingly relevant for both research and business.

What happened to Hawaiian Airlines?

Let's start with an incident from 2019 that massively downgraded an airline's reputation. Due to poor data management and quality, customers of Hawaiian Airlines were charged in US dollars instead of loyalty points, billing some customers tens of thousands of dollars (in one case up to 650,000 USD). To make things right, the business not only lost reputation but also faced significant financial losses, as it was forced to issue large goodwill payments to those affected.


Data quality can also affect us directly. Our spending habits, travel records, and social media information are all fed into the algorithms that calculate Schufa or credit scores. However, much of this data is collected from third-party agencies, and these agencies often work for volume rather than quality. Bad data can therefore ultimately affect us through miscalculated credit scores.

Identifying Data Quality issues

Identifying data quality issues is not always straightforward. However, there are several measures you can use to assess the quality of data.


Accuracy of sources and Reliability
Most data comes from multiple sources, and it can be very difficult to tell which sources are good and which are suspicious. Identifying good sources is the first step to ensuring quality data. External market research firms and data brokers tend to be the most vulnerable sources.


Reliability can be assessed by checking the rating of each source and by cross-checking data that arrives from different sources. Many organizations cannot trust their data, which reportedly leaves a huge share (more than 70%) of their acquired data unanalyzed!
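Cross-checking between sources can be automated. The sketch below compares a single numeric field reported by two hypothetical vendors and flags records that disagree; the field and key names are made up for illustration.

```python
# Sketch: cross-check one field reported by two (hypothetical) sources
# and flag records whose values disagree beyond a tolerance.

def cross_check(source_a, source_b, key, field, tolerance=0.0):
    """Return keys whose `field` differs between the two sources."""
    index_b = {row[key]: row for row in source_b}
    mismatches = []
    for row in source_a:
        other = index_b.get(row[key])
        if other is None:
            continue  # present in only one source; a completeness issue, not a mismatch
        if abs(row[field] - other[field]) > tolerance:
            mismatches.append(row[key])
    return mismatches

# Hypothetical example: revenue figures from two data vendors
vendor_a = [{"id": 1, "revenue": 100.0}, {"id": 2, "revenue": 250.0}]
vendor_b = [{"id": 1, "revenue": 100.0}, {"id": 2, "revenue": 310.0}]

print(cross_check(vendor_a, vendor_b, key="id", field="revenue"))  # [2]
```

Records flagged this way are exactly the ones worth escalating to a domain expert or a more trusted third source.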


Data Availability

Once you have verified your data sources, you have to ensure the data is complete and, if it is not, decide how to complete it so that missing information does not impact your customers.

A very common technique is to define what is needed and what will enhance the analysis; data acquisition or data transformation is then carried out to meet those demands.
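A simple way to quantify completeness is a per-field fill rate. The sketch below assumes records are plain dictionaries and treats `None` and empty strings as missing; the field names are hypothetical.

```python
# Sketch: per-field completeness report for a list of records,
# assuming None and "" count as missing values.

def completeness(records, fields):
    """Return the fraction of non-missing values per field."""
    report = {}
    for field in fields:
        filled = sum(1 for r in records if r.get(field) not in (None, ""))
        report[field] = filled / len(records)
    return report

customers = [
    {"name": "Ada", "email": "ada@example.com"},
    {"name": "Bo", "email": None},
    {"name": "Cy", "email": ""},
]
print(completeness(customers, ["name", "email"]))  # name: 1.0, email: ~0.33
```

Fields that fall below an agreed threshold become candidates for further acquisition or transformation work.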


Data Patterns

Suspicious data deviates from good data in distinguishable ways. Records with mismatched patterns should be re-assessed by domain experts, who can make an informed decision about their acceptability. Data visualization tools play a major role in identifying patterns and eliminating bad data.
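Before involving a domain expert, an automated pass can shortlist the suspicious values. One common (assumed here, not prescribed by the text) approach is a z-score rule: flag values that sit far from the mean of the column.

```python
import statistics

# Sketch: flag values that deviate strongly from the rest of a column
# using a simple z-score rule; the threshold choice is an assumption.

def flag_outliers(values, threshold=3.0):
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)
    if stdev == 0:
        return []  # all values identical, nothing to flag
    return [v for v in values if abs(v - mean) / stdev > threshold]

ages = [34, 29, 41, 37, 33, 420]  # 420 is likely a data-entry error
print(flag_outliers(ages, threshold=2.0))  # [420]
```

The flagged values are not automatically deleted; they are handed to a domain expert for the acceptability decision described above.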


Useful information

The same data can be ideal for some analyses and useless for others. It is therefore the data practitioner's duty to extract information based on usability. Statistical modeling and hypothesis testing play crucial roles in identifying correlations and separating relevant information from the rest.
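As one illustration of such a relevance check, a Pearson correlation can show whether a field carries any signal for the target of an analysis. The variables and figures below are made up for the sketch.

```python
import statistics

# Sketch: Pearson correlation between a candidate field and an
# analysis target; the data and variable names are hypothetical.

def pearson(xs, ys):
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

monthly_spend = [120, 150, 90, 200, 170]
credit_score = [650, 670, 640, 700, 690]
print(round(pearson(monthly_spend, credit_score), 2))  # 0.99
```

A field with near-zero correlation to everything the analysis cares about is a candidate for exclusion, keeping only usable information in scope.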


At the same time, data loses its usefulness over time, so recent data is always weighted more heavily than older data, which in turn improves data quality.
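One common way (assumed here) to implement this recency weighting is exponential decay, where a record's weight halves after a chosen half-life. The 180-day half-life below is an arbitrary illustrative choice.

```python
from datetime import date

# Sketch: exponential time-decay weights, so recent records contribute
# more than older ones; the half-life value is an assumption.

def recency_weight(record_date, today, half_life_days=180):
    age = (today - record_date).days
    return 0.5 ** (age / half_life_days)

today = date(2024, 1, 1)
print(round(recency_weight(date(2023, 7, 5), today), 2))  # 0.5 (one half-life old)
print(round(recency_weight(date(2022, 1, 1), today), 2))  # 0.06 (two years old)
```

These weights can then multiply each record's contribution in an aggregate or a model, so stale data fades out instead of being dropped abruptly.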


Lack of Uniqueness

Duplicate information can have a negative impact on the overall analysis. It not only requires extra storage and effort but also poses serious maintainability risks, as it causes direct data integrity issues. This is why data governance is becoming an integral part of all data-driven organisations.
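A basic deduplication pass can remove exact repeats before analysis. The sketch below keeps the first occurrence per key; the field names are hypothetical, and real pipelines usually also need fuzzy matching for near-duplicates.

```python
# Sketch: drop duplicate records by a chosen key, keeping the
# first occurrence; field names are made up for illustration.

def deduplicate(records, key):
    seen = set()
    unique = []
    for record in records:
        k = record[key]
        if k not in seen:
            seen.add(k)
            unique.append(record)
    return unique

rows = [
    {"email": "ada@example.com", "plan": "gold"},
    {"email": "bo@example.com", "plan": "basic"},
    {"email": "ada@example.com", "plan": "gold"},  # duplicate
]
print(len(deduplicate(rows, key="email")))  # 2
```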

Consequences of bad Data

We already saw what could happen to businesses.


Poor predictions.

Financial loss.

Wasted time.

Reduced efficiency.

Reputational damage.


Therefore, data auditing, data literacy, and internal training are now a must for employees of organizations that want to stay up to date with the market. Data quality checks at every phase of the development cycle help reduce poor performance at later stages.


Every company will be a data company one day, and it's never too late to start learning and become part of the bigger data community.