Are your data and system owners confident about the quality of their data, or do they base their judgement on a combination of anecdotal evidence and finger-in-the-air measurement?
If you are ignoring the quality of data that lies beneath the surface and are simply relying on what is staring you in the face, chances are you’re facing a data quality reality gap.
This post explores a real-life example of a data quality gap and what it could mean for the financial, operational and strategic health of your business.
I often ask data managers or data domain owners how good their data is and it often surprises me how confident most of the responses are.
“Very good, we have very few reported issues, I would say about [pick a figure from 90-100%] of our data is [spot on | perfect | excellent | correct | accurate | right]”
(Misplaced) Data Quality Confidence
Many years ago I met with a data owner of a large inventory system who provided a typically confident response:
“We know the data is excellent because we never get any bad reports back from the field staff. If the data was poor quality we would know about it so I was surprised to hear you’re doing an audit actually, I don’t quite see the point.”
I was performing a baseline data quality assessment prior to a migration and the data owner was confused, and somewhat defensive, about my presence.
In the eyes of this data steward, her data was perfect. By her benchmark.
Her metric for defective data was the number of irate calls she received every week. As the phone never rang, she assumed that the data was of high quality. The occasional visual inspection and routine maintenance also confirmed her belief that this data was of excellent quality for the job in hand.
And in many ways, she was right. The data was fit for purpose.
Localised Data Quality
Engineers would come back from their site visits and update the records, often omitting vital information. However, for their purposes the quality of data was absolutely fine.
It didn’t matter whether they recorded the exact location of the asset. They had been servicing it for years, so as long as they knew which building to go to and which floor to search, they could easily find it.
It didn’t matter what the power rating of certain equipment was because that didn’t really relate to their job anyway.
It didn’t matter what format the unique equipment identifiers were stored in because they could still figure out which equipment related to which ID simply by looking at the free-form text that had been written. Sometimes the last 3 digits were all they needed, so that was all that was stored.
So the phone never rang for the data steward because there were no issues.
Until we examined the data.
What Lies Beneath
In one site we found that a system error had replicated a piece of equipment several hundred times. The steward admitted that she’d seen something similar before but that the field force obviously knew it was a mistake so ignored it.
There were over 7,000 different formats for recording one type of identifier.
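One common way to surface a format explosion like this is pattern profiling: reduce every value to its "shape" and count the distinct shapes. The following is a minimal sketch of that idea, not the tooling used on this project. The sample identifiers and the function names `format_pattern` and `profile_formats` are invented for illustration.

```python
from collections import Counter


def format_pattern(value: str) -> str:
    """Reduce a value to its shape: digits become 9, letters become A."""
    shape = []
    for ch in value.strip():
        if ch.isdigit():
            shape.append("9")
        elif ch.isalpha():
            shape.append("A")
        else:
            shape.append(ch)  # keep punctuation and spaces as they are
    return "".join(shape)


def profile_formats(values):
    """Count how many values share each shape, most common first."""
    return Counter(format_pattern(v) for v in values).most_common()


# Hypothetical identifiers, invented for illustration only.
sample_ids = ["EQ-00123", "EQ-00456", "eq 123", "123", "EQ/00789", "456"]

for pattern, count in profile_formats(sample_ids):
    print(f"{pattern:10} {count}")
```

Six values already produce four distinct shapes here. Run against a real identifier column, the length of that list is a quick, objective measure of how far the data has drifted from any single standard.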
A field accuracy survey of one site found that 30% of the equipment was inaccurately recorded. Expensive items of equipment were simply not registered in the database and equipment that had been decommissioned and transferred offsite was still stored as being actively in service.
All of these issues clearly had a marked impact on the future progress of the migration, but they were undoubtedly having an even bigger impact on the business, and had been for many, many years.
There were other issues but the point of this post is to demonstrate the gap between data quality perception and data quality reality.
Clearly there are several factors at play here that enabled long-term defective data:
- The system allowed free-form text so a wide variety of formats could be entered
- Data quality and business rules were not in place or monitored
- The data steward was adopting an incorrect set of criteria for data quality measurement
- Field workers were not trained or educated on the impact of their data entry habits
- There was no formal, simple and measurable way for the field force to report data defects while they were onsite or back at base
- There was no incentive for the field force to improve data quality levels; they were instead measured on their speed and productivity in the field, a conflicting measure because administrative chores were non-value-added work in the eyes of the engineers
The cost and effort to implement an improvement in each one of these areas is actually quite minimal.
1) Free-form text entry
Change the system to accept a defined domain of formats: 1 man-day.
2) Data quality rules
We built a set of data quality rules and a simple monitoring engine for this process in 2 days. The rules were, in fact, very simple.
3) Adopting the wrong measurement criteria
Using 2), this would have been resolved easily.
4) Training field workers
Only 15 fields were really critical to the business. An online training course with follow-on measurement would have been relatively easy to create, though admittedly tougher to administer.
5) Reporting data defects
A simple web-based report would have been easy to create. We also considered an automated message service where the engineer could simply leave a message describing the issue, which is then sent via email to the support centre. This costs about £200 per month in the UK, plus additional costs for a support team to resolve the issues.
6) Incentivisation to improve
Each time data was changed, the engineer's name was logged, so penalties or rewards could be invoked. Obviously a penalty system was preferable, but given the vast sums of money wasted every year there could be cash incentives.
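To give a feel for how little is needed for the rules and monitoring mentioned above, here is a minimal sketch of a rule set with pass rates and a duplicate check. It is not the engine we built; the field names (`equipment_id`, `location`, `power_kw`), the `EQ-` identifier format and the sample records are all assumptions made up for illustration.

```python
import re
from collections import Counter

# Hypothetical target format for the equipment identifier.
ID_FORMAT = re.compile(r"EQ-\d{5}")

# Each rule is a name plus a check that returns True when a record passes.
RULES = {
    "id_matches_format": lambda r: bool(ID_FORMAT.fullmatch(r.get("equipment_id") or "")),
    "location_present": lambda r: bool((r.get("location") or "").strip()),
    "power_rating_positive": lambda r: isinstance(r.get("power_kw"), (int, float))
    and r["power_kw"] > 0,
}


def pass_rates(records):
    """Return the percentage of records passing each rule."""
    passed = Counter()
    for record in records:
        for name, check in RULES.items():
            if check(record):
                passed[name] += 1
    total = len(records)
    return {
        name: round(100 * passed[name] / total, 1) if total else 0.0
        for name in RULES
    }


def duplicate_ids(records):
    """Return identifiers that appear on more than one record."""
    counts = Counter(r.get("equipment_id") for r in records)
    return sorted(i for i, n in counts.items() if i and n > 1)


# Hypothetical records, invented for illustration only.
records = [
    {"equipment_id": "EQ-00123", "location": "Building A, floor 2", "power_kw": 7.5},
    {"equipment_id": "123", "location": "", "power_kw": None},
    {"equipment_id": "EQ-00123", "location": "Building A", "power_kw": 7.5},
    {"equipment_id": "EQ-00456", "location": "Building C, floor 1", "power_kw": 0},
]

for rule, rate in pass_rates(records).items():
    print(f"{rule:25} {rate}%")
print("duplicates:", duplicate_ids(records))
```

Scheduling something like this to run daily and tracking the pass rates over time gives the data steward a measure that does not depend on whether the phone rings.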
I’m not playing down the importance of change management here. Getting several thousand field engineers to change their data habits is no mean feat, and that is a topic for another day. However, the main lesson of this post is to look at areas of your business where a reality gap is starting to form.
- Which parts of your information landscape have ageing legacy data that underpins your business model and is not currently measured?
- Which systems are due for decommission and will need to be migrated?
- Which areas of your workforce input business-critical data but have received no training in basic data quality habits?
- Which of your systems have no data governance or stewardship and are measured on outdated or inappropriate criteria?
There are data quality reality gaps in every business. They should not be feared but embraced.
They may just offer a goldmine of financial and productivity improvements.
What are your views? Are there data quality reality gaps in your business? What techniques are you adopting to address them?