Accelerate your data quality, data governance or MDM career with a featured interview on Data Quality ProTIP: Want to be featured on Data Quality Pro (and get to the top of Google)?

Print to Page   |   Contact Us   |   Report Abuse   |   Sign In   |   Become a member
How to transform your ETL tool into a data quality toolkit
Share |

How to transform your ETL tool into a data quality toolkit




ETL Data Quality

Many organisations have some form of ETL (Extract-Transform-Load) tool in their business but how many are really exploiting their potential?

ETL products have been around for many years and have more than proven their worth for shunting data around the enterprise but it is surprising how many organisations have failed to recognise their potential as a data quality tool.

In this post we will present the case for using ETL tools to help get your data quality initiative off the ground.

We're not claiming they will be a direct replacement for the high-end data quality platforms but they might just give you an interim solution that will help you identify some quick wins that could pay for that more expensive data quality suite you've been struggling to get budget for.



Author: Dylan Jones
Published: May 29, 2009

Editorial Categories: [TB.3] Data Quality Rules and Requirements



Transforming your ETL tool into a data quality toolkit

In order to determine whether ETL tools can deliver a useful data quality function we need to come up with some form of benchmark. We need to classify the typical data quality tasks a business can undertake and assess whether a typical ETL tool can deliver against these needs.

To help us classify these needs we're going to use the "Data Quality Assessment" framework created by Arkady Maydanchik in the book of the same name.

Aside from being an essential read for any data quality practitioner, this book provides an exhaustive list of data quality rules that organisations should ideally be monitoring on a regular basis.

In short, this provides an excellent framework for our ETL benchmark.

After assessing each rule against the capabilities of known ETL tools we created a simple compliancy rating against each rule. Compliant, partial compliance, non-compliant. We discovered that nearly 70% of all the rules Arkady covers in his framework can be supported by ETL tools. 23% are partially compliant and only 7% are non-compliant.


What next?

1. Understand the underlying principles of data quality assessment

Clearly, building a data quality framework with ETL is feasible but you must build out from a robust data quality assessment framework. We recommend the framework developed by Arkady Maydanchik as there is nothing that we're aware of that comes close for depth and completeness.

If you want to learn more, here are some options:

  1. Read some of Arkady's work here on Data Quality Pro
  2. Get the book on Amazon
  3. Complete the online Data Quality Assessment tutorial provided by Arkady at eLearningCurve


2. Trial data quality improvement on high payoff areas of the business

If your organisation has ETL licenses, spare development resources and a reasonable understanding of data quality rules management we can now begin to trial some data quality initiatives.

But where to start? The very place that ETL tools are typically deployed is a great place to go hunting.

I've lost count of how many horror stories we have heard on our sister site Data Migration Pro where teams ignore the need for sound data quality management and data discovery, opting to dive straight into ETL development.

Our advice is to use the ETL tools earmarked for the integration or migration development to instead discover, validate, improve and control the data first.

Already finished your data migration or data integration? Why not use your ETL tool to monitor ongoing data quality health in the new target environment?

There are literally hundreds of scenarios where you can deploy ETL tools as a data quality workhorse in any organisation. Monitoring data quality rules along an entire customer service fulfillment chain for example is a great place to start.

Some of these articles should help inspire you to action:


3. Focus on creating capital to re-invest into ongoing data quality improvements

At the start of this post we highlighted the fact that long-term, it really does pay to invest in a full-blown data quality platform. The total universe of data quality rules you can manage with specialist tools is clearly greater than those of ETL tools.

What we have prescribed here is a simple "stepping-stone" approach.

By leveraging ETL tools already in your organisation you can target the low-hanging fruit which can provide much needed cash for additional staff and specialist technology to help you expand your data quality initiatives.

Case in point: One telecommunications company in the UK did exactly what was prescribed in this post. They looked at some of their most business critical processes and identified several key processes that were experiencing data quality failures at different stages of the service chain.

They implemented an improvement programme using an ETL tool that they were familiar with and had spare licenses. They modelled data quality rules and business process rules across the full life cycle of expensive fibre provisioning and created a management reporting dashboard that highlighted anomalies.

The cost-savings were dramatic. Not only did the customer retention increase as they eliminated process defects, they slashed scrap and rework costs and increased the amount of spare equipment that would have previously been re-purchased at considerable cost. All of this created significant capital that could be re-invested in other improvement initiatives.



Useful Resources Related to this Feature:

[TB.3] Data Quality Rules and Requirements  
DQ and Business Rules Explained with Ronald G.Ross Link  more ] Administration 04/11/2011
Data Quality Rules Process for Data Migration Link  more ] Administration 07/10/2011
Data Quality-Centric Data Migration, John Morris-2 Link  more ] Administration 07/10/2011
Data Quality-Centric Data Migration, John Morris-1 Link  more ] Administration 07/10/2011
Data Quality Rules: General Attribute Dependencies Link  more ] Administration 07/10/2011
Data Quality Rules: Rules for Historical Data Link  more ] Administration 07/10/2011
Data Quality Rules: Attribute Domain Constraints Link  more ] Administration 07/10/2011
Data Quality Rules: Integrity Constraints Link  more ] Administration 07/10/2011
Data Quality Rules: State-Dependent Objects Link  more ] Administration 07/10/2011
Search Data Quality Pro
Please sign in here >
Data Quality Journal
Event Calendar

06/02/2012 » 07/02/2012
Gartner BI Summit, London, 6-7 Feb

08/02/2012 » 09/02/2012
Gartner MDM Summit, London, 8-9 Feb

08/02/2012
Online DQ Business Bootcamp: Practical techniques & support for DQ business owners and professionals

15/02/2012
Online DQ Business Bootcamp: Practical techniques & support for DQ business owners and professionals

22/02/2012
Online DQ Business Bootcamp: Practical techniques & support for DQ business owners and professionals

Online Surveys
Popular Demos