
Are You Afraid to Say Goodbye to Your Data?

22 July 2010
Posted by: Dylan Jones

An interesting figure on the percentage of valueless data in organisations came out of the MIT Information Quality (MIT IQ) conference last week, picked up in Mark Goloboy's blog post Five New Ideas From 2010 MIT Information Quality.

According to Mark, the figure quoted at the conference was that 60-90% of data is valueless. But is this a realistic estimate?

Are Companies Afraid to Say Goodbye to Their Data?

Based on a number of data migration projects I've been involved with, I would wholeheartedly agree with the kind of figures quoted in Mark's article.

As an example, when identifying data in scope for a migration, I typically start from the premise that ALL data is out of scope unless someone can justify its existence. (This puts the onus back on the business to justify its use of the data.)

In most cases, at least half of the information was found to have limited value and could be cut from the target system, typically yielding significant cost savings, because every item of data incurs modelling, data quality, data mapping, transfer coding and extensive validation effort.
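The "out of scope unless justified" rule is simple enough to sketch in a few lines. This is a minimal illustration only, assuming a hypothetical inventory of source data items and a hypothetical register of business justifications; the effort-per-item figure is a made-up placeholder, not a measured cost.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataItem:
    """One source data item (e.g. a table or attribute) considered for migration."""
    name: str


def scope_migration(items: list[DataItem], justifications: dict[str, str]) -> tuple[list[DataItem], list[DataItem]]:
    """Split items into (in_scope, out_of_scope).

    Every item starts out of scope; it moves into scope only if the
    business has recorded a non-blank justification for it.
    """
    in_scope: list[DataItem] = []
    out_of_scope: list[DataItem] = []
    for item in items:
        reason = justifications.get(item.name, "").strip()
        (in_scope if reason else out_of_scope).append(item)
    return in_scope, out_of_scope


def effort_avoided(out_of_scope: list[DataItem], days_per_item: float) -> float:
    """Rough effort saved: each dropped item skips modelling, data quality work,
    mapping, transfer coding and validation. days_per_item is an assumed figure."""
    return len(out_of_scope) * days_per_item


if __name__ == "__main__":
    # Hypothetical inventory and justification register, for illustration only.
    items = [DataItem(n) for n in ["customer", "invoice", "legacy_fax_no", "promo_2003", "tariff"]]
    justifications = {
        "customer": "Needed for billing and service",
        "invoice": "Statutory retention",
        "tariff": "Drives current pricing",
        "legacy_fax_no": "   ",  # a blank justification does not count
    }
    keep, drop = scope_migration(items, justifications)
    print([i.name for i in keep])  # ['customer', 'invoice', 'tariff']
    print([i.name for i in drop])  # ['legacy_fax_no', 'promo_2003']
    print(effort_avoided(drop, days_per_item=3.0))  # 6.0
```

The point of the design is that silence counts as "no": anything nobody speaks up for is left behind by default.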

The Causes of Growth

Why have data volumes grown so excessively? There are plenty of reasons:

  1. Storage is affordable and accessible
  2. Data warehouses need feeding with data (but how much of that data is turned into action?)
  3. Applications are designed with bloated attributes and data structures, many of them redundant
  4. System silos lead to replication of corporate data
  5. Mergers and acquisitions are commonplace, and data often comes with the deal
  6. There is no archive strategy

I believe, though, that the main reason data volumes keep growing is the last point: organisations are not very good at developing an archive strategy to remove stale data.

The impacts of this growth are numerous:

  • Increased staffing costs to tune and manage the data
  • Additional cooling/infrastructure costs
  • Reduced query performance
  • Backup windows become compromised
  • Data integration and data migration become far more complex and costly
  • New IT projects take longer and are more prone to failure
  • Slower performance lowers knowledge worker productivity and increases costs

The Impact of Stale Data on Data Quality Management

There is a danger in assessing data quality across stale data, as it can dramatically skew your findings.

For example, if data quality was poor historically (perhaps completeness was low in the past, but there are far fewer data "gaps" now), we may wrongly conclude that our improvement process is working.
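One way to avoid this trap is to profile historical and current records separately instead of relying on a single overall figure. The sketch below is illustrative only: the records, the `postcode` field and the cut-off date are hypothetical, and completeness is taken to mean "value present and non-empty".

```python
from datetime import date


def completeness(records: list[dict], field: str) -> float:
    """Fraction of records whose `field` is present and non-empty (0.0 for no records)."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records)


def completeness_split(records: list[dict], field: str, date_field: str, cutoff: date) -> dict[str, float]:
    """Completeness overall, for historical records (before cutoff)
    and for current records (on or after cutoff)."""
    historical = [r for r in records if r[date_field] < cutoff]
    current = [r for r in records if r[date_field] >= cutoff]
    return {
        "overall": completeness(records, field),
        "historical": completeness(historical, field),
        "current": completeness(current, field),
    }


if __name__ == "__main__":
    # Hypothetical records: old ones are full of gaps, recent ones are complete.
    records = [
        {"created": date(2004, 3, 1), "postcode": ""},
        {"created": date(2005, 6, 1), "postcode": None},
        {"created": date(2006, 1, 9), "postcode": ""},
        {"created": date(2009, 11, 2), "postcode": "AB1 2CD"},
        {"created": date(2010, 2, 14), "postcode": "EF3 4GH"},
        {"created": date(2010, 5, 30), "postcode": "IJ5 6KL"},
    ]
    print(completeness_split(records, "postcode", "created", date(2009, 1, 1)))
    # {'overall': 0.5, 'historical': 0.0, 'current': 1.0}
```

The overall 50% figure describes neither the old data nor the data the business relies on today; the split makes that visible.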

I recall an organisation that, on receiving its new data profiling tool, pointed it at its billing system.

They were horrified to find thousands of historical errors in tariff coding, product code allocation and many other areas. The problem was that the company had shifted its business model from offering products to focusing far more on services. In addition, many of the customers incorrectly billed in the past had since terminated their accounts. By assessing the quality of this historical data, the company was in fact gaining no real insight into data quality under its current business model.

Yes, they discovered they had badly designed processes, but a workshop with the knowledge workers confirmed the same insight within a few minutes. What they should have focused on from the start is how data quality impacts their business TODAY.

Designing an Archive Strategy - Getting re-use from the Data Quality Team

There are a number of techniques common to the data quality practitioner that can play a useful role in the decommissioning of your corporate data:

  • Information chain mapping: Helps identify the flow of information across the enterprise so that any downstream data consumers can be assessed for potential impact from decommissioned data
  • Data profiling: Analysing the statistics of data elements (records/attributes) can help identify redundant data that can be eliminated
  • Data matching/relationship discovery: Can help identify dependent data in disparate systems so that a synchronised process of data removal can take place
  • CRUD analysis: Identifying which applications Create, Read, Update or Delete data is of great importance when determining which datasets can be archived
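To make the CRUD analysis and information chain mapping ideas concrete, here is a minimal sketch. It assumes a hypothetical CRUD matrix (application to dataset to operations) and a hypothetical information chain (dataset to the datasets fed from it). The rule that a dataset no application still creates or updates is an archive candidate is an illustrative assumption, not a standard; every candidate's readers and downstream consumers still need to be assessed before anything is moved.

```python
from collections import deque

# Hypothetical CRUD matrix: application -> {dataset: operations it performs}.
CRUD = {
    "billing_app": {"invoices": "CRU", "tariffs_v1": "R"},
    "crm": {"customers": "CRUD", "invoices": "R"},
    "legacy_reports": {"tariffs_v1": "R", "promo_2003": "R"},
}

# Hypothetical information chain: dataset -> datasets populated from it.
FEEDS = {
    "tariffs_v1": ["finance_mart"],
    "finance_mart": ["board_pack"],
    "promo_2003": [],
}


def archive_candidates(crud: dict[str, dict[str, str]]) -> set[str]:
    """Datasets that no application creates or updates (read-only or untouched)."""
    seen: set[str] = set()
    written: set[str] = set()
    for ops_by_dataset in crud.values():
        for dataset, ops in ops_by_dataset.items():
            seen.add(dataset)
            if "C" in ops or "U" in ops:
                written.add(dataset)
    return seen - written


def readers(dataset: str, crud: dict[str, dict[str, str]]) -> list[str]:
    """Applications that still read the dataset and would need consulting."""
    return sorted(app for app, ops in crud.items() if "R" in ops.get(dataset, ""))


def downstream(dataset: str, feeds: dict[str, list[str]]) -> list[str]:
    """Breadth-first walk of the information chain, returning every downstream consumer once."""
    found: list[str] = []
    visited: set[str] = set()
    queue = deque(feeds.get(dataset, []))
    while queue:
        node = queue.popleft()
        if node in visited:
            continue
        visited.add(node)
        found.append(node)
        queue.extend(feeds.get(node, []))
    return found


if __name__ == "__main__":
    for ds in sorted(archive_candidates(CRUD)):
        print(ds, "| read by:", readers(ds, CRUD), "| feeds:", downstream(ds, FEEDS))
    # promo_2003 | read by: ['legacy_reports'] | feeds: []
    # tariffs_v1 | read by: ['billing_app', 'legacy_reports'] | feeds: ['finance_mart', 'board_pack']
```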

So What Next for Data Quality and Data Growth?

Archiving data is nearly always initiated by IT. If you're on the business side, start the discussions now and play your part: the business community gains significant benefits from archiving data, and waiting for something magical to happen without your involvement means it will simply never get done.

Note that we're talking about archiving, not deleting, typically to a readily accessible medium.

The data can still be maintained in an offline store for compliance or reporting requirements. But if you want to reduce the costs of your data quality management efforts and create a more effective workforce, it may be time to collaborate with your IT colleagues and begin the essential activity of creating an archive process.

If, after several months, you find that you never needed to dip into the archive to retrieve past information, it may be time to archive to tape, store it offsite and cut it loose for good.
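The "several months without a retrieval" rule of thumb can be written down as a simple check. This sketch assumes a hypothetical log of retrieval dates for the archive and uses 182 days (roughly six months) as a stand-in for "several months"; both are assumptions to tune to your own policy.

```python
from datetime import date, timedelta


def ready_for_tape(archived_on: date, retrievals: list[date], today: date, quiet_days: int = 182) -> bool:
    """True when the archive has existed for at least `quiet_days`
    and nothing has been retrieved from it since it was created."""
    if today - archived_on < timedelta(days=quiet_days):
        return False  # too early to judge
    # Any retrieval on or after the archive date means someone still needs it.
    return not any(d >= archived_on for d in retrievals)


if __name__ == "__main__":
    print(ready_for_tape(date(2010, 1, 1), [], date(2010, 3, 1)))                  # False (too early)
    print(ready_for_tape(date(2010, 1, 1), [], date(2010, 8, 1)))                  # True
    print(ready_for_tape(date(2010, 1, 1), [date(2010, 4, 12)], date(2010, 8, 1)))  # False (was used)
```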

Comments...

Dylan Jones says...
Posted 11 October 2011
@Larisa, it's a good point you make about data governance: archiving strategies shouldn't be tactical; they should be built into the fabric of the organisation. Every item of data must have a defined life cycle, and of course that life cycle must include a fond farewell. Great comment, thanks for dropping by. @Jim, what an incredibly insightful comment! Thanks for really extending this. I particularly liked your point that the data requirements of an organisation are in constant flux. Just as we need to eliminate stale data, we also have to be open to adapting our systems to new information; when we do this, it's probably a good indication that some legacy data is past its sell-by date. Great comments, folks, thanks again.
Jim Harris (OCDQBlog.com) says...
Posted 11 October 2011
Great post, Dylan. One of my favorite Tom Redman quotes is: “It is a waste of effort to improve the quality of data no one ever uses.” Timeliness, as a data quality dimension, is defined by Danette McGilvray as: “A measure of the degree to which data are current and available for use as specified and in the time frame in which they are expected.” Data is an abstract description of the real world. Not only is there a digital distance between data and the entity or event that it describes, but there is also a time lag between when the real world changes and when the data is updated. Danette McGilvray calls this information float, which can be manual or electronic: “Manual float is the delay from when a fact becomes known to when it is first captured electronically.” “Electronic float is the time from when a fact is first captured electronically to when it is moved or copied to various databases that make it available to those interested in accessing it.” Information
Larisa Bedgood says...
Posted 11 October 2011
This is a great point you bring up. So much data gets "swept under the rug", so to speak. The repercussion, however, is that this data still exists and clogs up the database. Data validation initiatives should certainly be the first step in separating the good from the bad. By implementing a data governance system and archiving old data, businesses will be pleasantly surprised by how much time and money can be saved. Thanks for sharing.
