Thursday, Oct 22, 2009

Does Your Business Suffer From a Data Quality Reality Gap?

Are your data and system owners confident about the quality of their data, or do they base their judgement on a combination of anecdotal evidence and finger-in-the-air measurement?

If you are ignoring the quality of data that lies beneath the surface and are simply relying on what is staring you in the face, chances are you're facing a data quality reality gap.

This article explores a real-life example of a data quality reality gap and what it could mean for the financial, operational and strategic health of your business.

 


I often ask data managers or data domain owners how good their data is and it often surprises me how confident most of the responses are.

"Very good, we have very few reported issues, I would say about [pick a figure from 90-100%] of our data is [spot on | perfect | excellent | correct | accurate | right]"

Many years ago I met with a data owner of a large inventory system who provided a typically confident response:

"We know the data is excellent because we never get any bad reports back from the field staff. If the data was poor quality we would know about it so I was surprised to hear you're doing an audit actually, I don't quite see the point."

(I was performing a baseline data quality assessment prior to a migration and the data owner was confused, and somewhat defensive, about my presence).

In the eyes of this data steward their data was perfect. By her benchmark.

Her metric for defective data was the number of irate calls she received every week. As the phone never rang, she assumed that the data was of high quality. The occasional visual inspection and routine maintenance also confirmed her belief that this data was of excellent quality for the job in hand.

And in many ways, she was right. The data was fit for purpose.

Engineers would come back from their site visits and update the records, often omitting vital information. However, for their purposes the quality of data was absolutely fine.

It didn't matter that they had not recorded the exact location of the asset. They had been servicing it for years, so as long as they knew which building to go to and which floor to search they could easily find it.

It didn't matter what the power rating of certain equipment was because that didn't really relate to their job anyway.

It didn't matter what format the unique equipment identifiers were stored in because they could still figure out which equipment related to which ID simply by looking at the free-form text that had been written. Sometimes the last 3 digits were all they needed, so that was all that was stored.

So the phone never rang for the data steward because there were no issues.

Until we examined the data.

 

 

In one site we found that a system error had replicated a piece of equipment several hundred times. The steward admitted that she'd seen something similar before but that the field force obviously knew it was a mistake so ignored it.

There were over 7,000 different formats for recording one type of identifier.
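A count like this typically comes from pattern profiling: reduce every value to its "shape" and count the distinct shapes. The snippet below is only a minimal sketch of that idea, not the tooling used on this project. The sample identifiers and the function names are invented for illustration.

```python
import re
from collections import Counter


def format_pattern(value: str) -> str:
    """Reduce a raw identifier to its shape: letters become A, digits become 9."""
    shape = re.sub(r"[A-Za-z]", "A", value.strip())
    return re.sub(r"[0-9]", "9", shape)


def profile_formats(values):
    """Count how many records share each identifier shape."""
    return Counter(format_pattern(v) for v in values)


# Hypothetical identifiers, invented for illustration only.
sample_ids = ["EQ-004217", "eq004218", "217", "EQ 4219", "EQ-004220", "218"]

profile = profile_formats(sample_ids)
for pattern, count in profile.most_common():
    print(f"{pattern!r}: {count}")
```

A long tail of shapes that each occur only a handful of times is usually the first visible sign that a field accepts free-form text.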

A field accuracy survey of one site found that 30% of the equipment was inaccurately recorded. Expensive items of equipment were simply not registered in the database and equipment that had been decommissioned and transferred offsite was still stored as being actively in service.
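To make the idea of a field accuracy survey concrete, here is a rough sketch of how survey findings could be reconciled against the database. It assumes both sides can be expressed as a mapping from equipment ID to recorded status. The IDs, statuses and the reconcile function are hypothetical and do not reproduce the figures above.

```python
def reconcile(db_records: dict, surveyed: dict) -> dict:
    """Compare database records with what a site survey actually found.

    Both arguments map an equipment ID to its recorded status.
    """
    db_ids, field_ids = set(db_records), set(surveyed)
    unregistered = field_ids - db_ids  # found on site, missing from the database
    ghosts = db_ids - field_ids        # in the database, not found on site
    mismatched = {i for i in db_ids & field_ids if db_records[i] != surveyed[i]}
    total = len(db_ids | field_ids)
    defects = len(unregistered) + len(ghosts) + len(mismatched)
    return {
        "unregistered": sorted(unregistered),
        "ghosts": sorted(ghosts),
        "mismatched": sorted(mismatched),
        "defect_rate": defects / total if total else 0.0,
    }


# Hypothetical data, invented for illustration only.
database = {"A1": "in service", "A2": "in service", "A3": "in service"}
survey = {"A1": "in service", "A2": "out of order", "A4": "in service"}

report = reconcile(database, survey)
print(report)
```

Equipment that was decommissioned but is still listed shows up under "ghosts", and expensive items that were never registered show up under "unregistered".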

All of these issues clearly had a marked impact on the future progress of the migration but undoubtedly were having an even bigger impact on the business. For many, many years.

There were other issues but the point of this post is to demonstrate the gap between data quality perception and data quality reality.

Clearly there are several factors at play here that enabled long-term defective data:

  • The system allowed free-form text so a wide variety of formats could be entered
  • Data quality and business rules were not in place or monitored
  • The data steward was adopting an incorrect set of criteria for data quality measurement
  • Field workers were not trained or educated on the impact of their data entry habits
  • There was no formal, simple and measurable way for the field force to report data defects while they were onsite or back at base
  • There was no incentive for the field force to improve data quality levels. They were instead measured on their speed and productivity in the field, a conflicting measure because administrative chores were non-value-added work in the eyes of the engineers

The cost and effort to implement an improvement in each one of these areas is actually quite minimal.

  1. Free-form text = Change the system to accept a defined domain of formats, 1 man day
  2. Data Quality Rules = We built a set of data quality rules and a simple monitoring engine for this process in 2 days. The rules were in fact very simple
  3. Adopting the wrong measurement criteria = The rules and monitoring from 2) would have resolved this easily
  4. Training field workers = There were only 15 fields that were really critical to the business. Creating an online training course and follow-on measurement would have been relatively easy, though admittedly tougher to administer
  5. Defect reporting = Creating a simple web-based report would have been easy. We also considered an automated message service where the engineer could simply leave a message describing the issue, which is then sent via email to the support centre. This costs about £200 per month in the UK, plus additional costs for a support team to resolve the issues
  6. Incentivisation to improve = Each time data was changed the engineer's name was logged, so penalties or rewards could be invoked. Obviously a penalty system was preferable but, based on the vast sums of money wasted every year, there could be cash incentives
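To show how small such a rules engine can be, here is a minimal sketch covering points 1, 2 and 6: a defined identifier format, a handful of named rules, and failures attributed to whoever last changed the record. It is not the engine we built. The field names (equipment_id, location, power_kw, updated_by), the "EQ-" plus six digits format and the sample records are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    name: str
    check: Callable[[dict], bool]


# Assumed identifier domain: "EQ-" followed by six digits (hypothetical).
ID_FORMAT = re.compile(r"EQ-\d{6}")

RULES = [
    Rule("id matches defined format",
         lambda r: bool(ID_FORMAT.fullmatch(r.get("equipment_id") or ""))),
    Rule("location is recorded",
         lambda r: bool((r.get("location") or "").strip())),
    Rule("power rating is positive",
         lambda r: isinstance(r.get("power_kw"), (int, float)) and r["power_kw"] > 0),
]


def run_rules(records, rules=RULES):
    """Return the pass rate per rule and the failure count per engineer."""
    pass_rates, failures_by_engineer = {}, {}
    for rule in rules:
        passed = 0
        for rec in records:
            if rule.check(rec):
                passed += 1
            else:
                who = rec.get("updated_by", "unknown")
                failures_by_engineer[who] = failures_by_engineer.get(who, 0) + 1
        pass_rates[rule.name] = passed / len(records) if records else 1.0
    return pass_rates, failures_by_engineer


# Hypothetical records, invented for illustration only.
records = [
    {"equipment_id": "EQ-004217", "location": "Building 2, floor 3",
     "power_kw": 7.5, "updated_by": "engineer_a"},
    {"equipment_id": "217", "location": "",
     "power_kw": None, "updated_by": "engineer_b"},
    {"equipment_id": "EQ-004219", "location": "Building 2",
     "power_kw": 0, "updated_by": "engineer_b"},
]

pass_rates, failures = run_rules(records)
for name, rate in pass_rates.items():
    print(f"{name}: {rate:.0%}")
print(failures)
```

Run on a schedule, the pass rates give the data steward a measure that does not depend on whether the phone rings, and the per-engineer counts provide the raw material for any reward or penalty scheme.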

I'm not playing down the importance of change management here. To get several thousand field engineers to change their data habits is no mean feat, and that is a topic for another day. The main lesson of this post is to look for areas of your business where a reality gap is starting to form.

  • Which parts of your information landscape have ageing legacy data that underpins your business model and is not currently measured?
  • Which systems are due for decommission and will need to be migrated?
  • What areas of your workforce input business-critical data but have received no training in basic data quality habits?
  • Which of your systems have no data governance or stewardship and are measured on outdated or inappropriate criteria?

There are data quality reality gaps in every business. They should not be feared but embraced.

They may just offer a goldmine of financial and productivity improvements.

 

What are your views? Are there data quality reality gaps in your business? What techniques are you adopting to address them?

 

 

Useful Resources

 

How to set data quality goals any business can achieve

15 Tips for transforming knowledge-workers into a data quality task force

How to create a data quality framework or data quality methodology: Essential resources to get you started

Lean techniques to help your data quality improvement initiative (Part 1: Time Value Maps)

Lean techniques to help your data quality improvement initiative (Part 2: Little's Law)

How To Create A Data Issue Assessment Process: Expert Interview With Ken O'Connor

Introduction to Guerilla Data Governance: An interview with Mike Meier

Data quality survey of senior management highlights the gulf between intention and action

DEBATE: Should Business Workers Use Data Quality Tools?

Creating An Internal Data Quality Community: Introduction (Part 1 of 4)

Reader Comments (4)

This is indeed a major problem, Dylan - ever since I've been working in data quality, the first issue has always been persuading people that there is a problem with their data quality. If they don't realise that there is a problem, no action will be taken to resolve it.

The eternal optimism many companies have about their data was demonstrated clearly in a survey commissioned by Capscan last year, and about which I wrote a white paper called "Data Quality: Perception versus Reality". It can be downloaded free from http://www.grcdi.nl/whitepapers.htm

Oct 22, 2009 | Registered Commenter Graham Rhind

Yes, I remember that well, I've actually included it in the resources section above, it clearly adds some statistics to my "gut feel" that this is endemic in practically every business, recommended reading without a doubt.

Thanks for adding your comments Graham, surveys like this can be incredibly useful for spreading awareness.

Oct 22, 2009 | Registered Commenter Dylan Jones (Editor)

I work in local government and I see this a lot.

We have a lot of field workers, some are excellent and diligent with their data entry, others are very sloppy and it creates a lot of work for us on the data management side to regularly keep on top of. Unlike your example above we do get a lot of support calls but there is still a head-in-the-sand mentality on resolving this, lots of quick fixes and back-end changes mostly. There are rumblings of a software purchase to look at resolving some of the problems because we'll be integrating with other systems soon.

Like the point you make that none of this really costs a great deal cash-wise, but we're really lacking leadership, it's the same old problem, no-one is prepared to stand up and take the reins, probably for fear of failure.

Great article, though, can really relate to the message.

Oct 22, 2009 | Unregistered Commenter J. Lewes

I think your point is well taken that among the rank and file, folks imagine just one purpose for data and therefore don't consider the ramifications that their shortcuts and inconsistencies can cause. There's a whole data governance education process that has to take place - and it's not an easy sell.

Seems maybe though that at the senior management level, data quality isn't necessarily taken for granted any more - here's an excerpt from a blog post on a recent study that leads me to believe that the problem at least is getting some acknowledgment:

"A new Information Difference Research Study, The State of Data Quality Today, released in July provides a stark picture of the data quality dilemma still faced today. This study, sponsored by Pitney Bowes Business Insight and Silver Creek Systems, found, in a survey of 193 businesses across Europe and North America, that fully one-third of respondents rate their data quality as “poor at best” – and only 4% indicated that it was “excellent”"

The post did, however, go on to say that the study also found:
"* 42% have made no effort to measure or monitor the quality of their data
* 63% have no idea what poor data quality may be costing them"

So there is more educating to do at all levels...

Source: http://ebs.pbbiblogs.com/2009/08/05/is-data-quality-old-news/

Oct 25, 2009 | Unregistered Commenter M Ellard
