Thursday, Jun 25, 2009

Rethinking Data Quality: The Need for a Data Quality Profession

In this guest post Arkady Maydanchik and David Wells, founders of eLearningCurve, discuss the strong need for a data quality profession to create a sustainable improvement in the quality of data in modern organisations.

 


Data quality is a persistent problem. Quality has been an issue since the dawn of the IT profession and becomes increasingly challenging as the volumes of data increase and the uses for data expand.

Information technology has advanced in so many areas. Why is it that we can’t overcome the data quality problem? Maybe the barriers lie in the way that we think about data quality.

Common approaches to data quality problems include processes, projects, and software products. Data quality processes involve activities such as data quality assessment, root cause analysis, and data cleansing. Quality improvement projects employ processes within a structure of business justification, planning, prioritization, resource allocation, execution, and monitoring. Data quality products primarily perform the relatively basic tasks of matching, de-duplication, address standardization, etc.

Processes, projects, products – each of these contributes to the efforts to improve data quality. But they haven’t solved the problems individually or collectively. To really make substantial and sustainable differences in the quality of data we need to take a different approach.

We need to think of data quality as a profession. We need to have data quality professionals.


A Data Quality Profession

You might argue that we already have IT professionals doing data quality work. True, but therein lies the problem. IT professionals doing data quality work is not the same as data quality professionals doing that work. That may appear to be a subtle difference, but it is really quite significant. The significance becomes apparent when you consider the nature of professions.

Wikipedia describes a profession as "a vocation founded upon specialized educational training" – a good place to start understanding the significance of a data quality profession.

How many data quality training classes have you attended?

How many have your peers attended?

Many will answer one or two, but they may have to search their memories to recall the data quality class they took back in the 1990s. For most IT professionals, the state of data quality education hardly represents extensive learning and depth of knowledge.

A profession is more than simply a category of employment. Professions are distinguished by several characteristics including a defined body of knowledge, best practices, and mistakes to avoid. Let’s examine each of the characteristics to consider the nature of a data quality profession.

 

Data Quality Body of Knowledge

The data quality body of knowledge is still evolving, because the field itself is still emerging as a profession.

Much work remains to be done to organize an agreed-upon body of knowledge. Even more must be done to create comprehensive education programs that give data quality professionals the strong fundamental knowledge and practical skills necessary to face the multitude of data quality challenges. It is apparent, however, from the literature and the efforts to date, that data quality is a broad and deep field.

Its body of knowledge encompasses topics such as:

  • Quality principles – TQM, Deming, Six Sigma, etc.
  • DQ processes – profiling, assessment, correction, repair, prevention, etc.
  • DQ methods – procedural and rule-based data quality
  • DQ technology – data cleansing, matching, de-duplication, standardization, etc.
  • DQ teams – collaboration, communication, roles, responsibilities, accountability, etc.
  • DQ projects – assessment, data cleansing, process improvement, etc.
  • DQ in IT projects – data warehousing, application development, data migration, master data management, ERP implementation, business intelligence, etc.


Data Quality Best Practices

It is impractical to attempt an exhaustive list of best practices in data quality. Many experienced practitioners will agree that the following items belong on the list:

  • Data quality problems usually result from systemic and global causes. Local fixes don’t work.
  • Fix problems, not symptoms. Root cause analysis is important.
  • Measurement and monitoring are essential parts of any data quality program.
  • People, processes, and technology are all sources of data quality defects.
  • Data quality management is a multi-disciplinary field that demands both business and technical knowledge and skills.
  • Designated roles, responsibilities, and accountabilities are fundamental to data quality management.
  • Delegate authority to make decisions and take actions, commensurate with the level of responsibility and accountability.
  • Distinguish between data ownership, stewardship, and custodianship and recognize the importance of each role.
  • The business must be actively engaged, involved, and participative for data quality efforts to succeed.
  • Sustained data quality is achieved with long-term programs, not short-term projects.


Mistakes to Avoid

As with best practices, the mistakes to avoid are many. It is impractical to attempt a complete listing here. Among the common mistakes are:

  • Saying that data quality is “everybody’s responsibility” – that is the same as saying nobody is responsible.
  • Treating data quality as an IT problem.
  • Lack of a business case for data quality efforts.
  • Seeking the easy fix or the “silver bullet.”
  • Reactive data quality management using a repair-only approach.
  • Lack of data quality standards – especially for master data and shared reference data.
  • Absence of measurement or lack of targets that give the measures context.
  • Lack of expertise, both business and technical.
  • Believing that data knowledge or data modeling skill is an acceptable proxy for data quality skills.
  • Insufficient measures – counting defects but failing to quantify cost or impact.
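
The two measurement mistakes above (measures without targets, and defect counts without cost) can be illustrated with a minimal Python sketch. It is not taken from the authors; the class name `QualityMeasure`, the attribute names, the target rates, and the cost-per-defect figures are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class QualityMeasure:
    """One data quality metric plus the context needed to interpret it.

    All field names and figures here are illustrative assumptions.
    """
    attribute: str
    total_records: int
    defective_records: int
    target_rate: float      # required share of defect-free records
    cost_per_defect: float  # assumed average business cost of one defect

    @property
    def quality_rate(self) -> float:
        # Share of records without a defect; an empty set counts as clean.
        if self.total_records == 0:
            return 1.0
        return 1 - self.defective_records / self.total_records

    @property
    def meets_target(self) -> bool:
        # The target gives the raw rate its context.
        return self.quality_rate >= self.target_rate

    @property
    def estimated_cost(self) -> float:
        # Turns a defect count into an impact figure the business can weigh.
        return self.defective_records * self.cost_per_defect


def report(measures):
    """Return one summary line per measure, worst estimated cost first."""
    ranked = sorted(measures, key=lambda m: m.estimated_cost, reverse=True)
    return [
        f"{m.attribute}: rate={m.quality_rate:.3f} "
        f"target={m.target_rate:.3f} "
        f"{'OK' if m.meets_target else 'BELOW TARGET'} "
        f"cost={m.estimated_cost:.2f}"
        for m in ranked
    ]


# Hypothetical sample figures.
measures = [
    QualityMeasure("postal_code", 10_000, 50, 0.99, 4.0),
    QualityMeasure("customer_email", 10_000, 300, 0.98, 2.5),
]

for line in report(measures):
    print(line)
```

Ranking by estimated cost rather than by defect count is a design choice in this sketch: it puts the attributes that hurt the business most at the top, which a bare defect count cannot do.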


In Conclusion

This short post cannot possibly describe all of the complexities and challenges of managing data quality.

A quick look at the scope – body of knowledge, best practices, and common mistakes – makes it apparent that data quality is a challenging and multi-faceted field. And it needs to be recognized as a distinct field that overlaps with but is not contained in any of IT, data warehousing, business systems, database development, or information systems.

It is time that data quality becomes a profession and skilled practitioners make the leap to becoming data quality professionals.

 

What do you think of this post? Do you agree with the authors? Why not add your comments in the section below and add to the debate?

 

Author Profiles

Dave Wells is a consultant, author and educator with nearly four decades of IT and data management experience. He is a co-founder and Director of Education at eLearningCurve LLC, a provider of on-demand technical training via the Internet. You can contact Dave by email at david.wells@eLearningcurve.com

Arkady Maydanchik is a co-founder and Managing Director at eLearningCurve LLC. He is a recognized practitioner, author, and educator in the field of data quality and information integration.

(Please note, this article first appeared in TDWI Flashpoint, March 2009).

 

Useful Resources

 

eLearning Curve Website

Data Quality Rules: General Attribute Dependencies by Arkady Maydanchik

Data Quality Rules Tutorial: Rules for State-Dependent Objects by Arkady Maydanchik

Data Quality Rules by Arkady Maydanchik: Rules for Event Histories

Data Quality Rules by Arkady Maydanchik: Rules for Historical Data

Data Quality Rules by Arkady Maydanchik: Relational Integrity Constraints

Data Quality Rules by Arkady Maydanchik: Attribute Domain Constraints

Reader Comments (10)

I would like the authors to expand on their assertion that it is a mistake: "Saying that data quality is “everybody’s responsibility” – that is the same as saying nobody is responsible.", with which I profoundly disagree.

This may be down to semantics. An assertion that not everybody is ULTIMATELY responsible for data quality ENTERPRISE-WIDE I could agree with. But if we don't inculcate the idea that each person within a company has ultimate responsibility for the quality of the data they deal with, during the period that they are dealing with it, then we will always be busy with "reactive data quality management using a repair-only approach."

A sales director in a company might be ultimately responsible for sales enterprise-wide, but each of the sales staff understands that they are largely responsible for the sales that they make - they affect the bottom line of the company and their jobs. It should be no different with data quality. A call-centre worker should be made to understand that data quality affects the health of the company they are working for, and, taking into account the tools that they have at their disposal, they are responsible for the quality of the data being gathered by them from the customer and entered into systems by them, during the time that they are dealing with it. That this is not the case is often down to conflicting requirements, such as that as many customers as possible are dealt with in the shortest possible time, and that that is how the effectiveness of the agent will be measured.

I cannot see how this approach, where each person is made to understand the importance of, and their responsibility for, data quality, could ever make people feel that "nobody is responsible".

Jun 25, 2009 | Registered CommenterGraham Rhind

I would like the authors to expand on their assertion that it is a mistake: "Saying that data quality is “everybody’s responsibility” – that is the same as saying nobody is responsible." I agree. If you do not assign "responsibility" or accountability for something, nobody will step up to the task. I would make the statement that Data Quality is the responsibility of data creators and data consumers. Those application and business processes that create/modify/read the data must also be held accountable for its quality as measured against a defined set of requirements.

Jun 25, 2009 | Unregistered Commenterwalt zaremba

Good article - thanks! I think there's an element of truth in the responding comment on responsibility - but that the issue is one of semantics - and in actuality both the article and response are dead on.

Yes, everyone has a part to play in data quality - it's akin to saying that within a society, everyone has a responsibility to uphold the law and to point out its flaws, if any. Data quality depends on everyone who touches it - how they input it, update it, and use it.

However, the article's right, too. At the end of the day, using that society analogy again, we need designated professionals who specialize in setting DQ rules (akin to judges), and advocates of the rules (lawyers), and enforcers of the rules...and bringing in people who are specifically focused on the DATA rather than the tech part of things makes a ton of sense...

Jun 25, 2009 | Unregistered CommenterM Ellard

I agree that there is a need for a dedicated Data Quality professional. However, how does one go about becoming a Data Quality professional?

I've been a data integration professional for many years and have run into data quality issues over and over again. However, any attempts at the time to cleanse the data didn't fit into the project plan, and so we integrated bad data (I did some cleansing in the ETL, but by no means was it comprehensive).

From my experience I would like to become a Data Quality professional and feel that my data integration background is a good start. However, where do I go from here?

There are articles, and books, and now eLearningCurve, but will that qualify me as a Data Quality professional, or is there some other step to be "certified"?

Jul 4, 2009 | Registered CommenterCharles Burleigh

Let me address the statement, "Saying that data quality is “everybody’s responsibility...”. When management "says" such a statement it is like a mission statement or slogan of the week. It will be ignored by everybody including the management that said it because it is not actionable. Nobody in the organization has any idea how they can have data quality and since nobody has been fired, the data must be good enough.

Notice the statement doesn't say anything about the state of data quality. It doesn't say whether to improve or keep the status quo, nor does it define what data quality is. An ambiguous phrase like this will be ignored by most people.

Another aspect is that while everybody is responsible, nobody has the authority to enforce DQ nor the resources to be effective.

Beyond the need for a Data Quality profession, I think organizations need to establish formal Data Quality departments modeled after the ISO 9001 standards or something similar. The primary organizational feature is that the department head reports directly to the CEO. The DQ department is NOT under the CIO, the CFO, or any other department. This helps ensure that quality-related activities and decisions are independent and unbiased. Such a department has a (reasonable) budget and the requisite authority, in addition to the responsibility, to audit and enforce any policies and procedures.

Jul 9, 2009 | Unregistered CommenterRuss Ingersoll

In response to Charles' note about how do I become a data quality professional?

It's a great point Charles and I think it is really no different to any other profession.

Education is certainly improving in our field. We've obviously been posting regular tutorials here and passing on techniques which directly help the profession and of course Arkady and his partners at eLearningCurve are completely changing the game when it comes to more formal, structured education using the online format and training from recognised experts in our field. I know they're signing up other data quality trainers too.

I actually spoke to Arkady recently and he did point out that their examination system is tough. I was incredibly relieved to hear that because it obviously dictates the need for real-world experience and an application of the skills he teaches.

I know the IAIDQ have planned to launch a certification scheme since last year, and I believe it is still moving forward.

I used to be responsible for interviewing and hiring data quality professionals and I found a massive disparity in the quality of people's skills. Lots of people had done some basic analysis work and were calling themselves DQ professionals (gurus in some cases!) and yet when I posed simple problems and asked for a demonstration of fundamental techniques, most really struggled.

So, I would say a DQ professional must possess both an extensive personal toolkit and demonstrable experience of applying those skills.

Anyone can read a book or complete a course, but to be professional you must have applied your trade. It is no different to any other sector in that regard.

My advice is to work with a progressive organisation that has a real vision for data quality, not just something like a data warehouse cleansing project that can leverage your ETL skills.

If you look at some of the past data quality conferences, there are scores of organisations that have presented; I have a list of them if you want it. I would reach out to them, get their opinion, and find out what skills they would expect of you to work in their team. I know your background so I know you would transition very well indeed and would be a good asset from day 1.

Get your blog moving, and write about what you have learned and the journey you're on. The community will help you get your message out there and hopefully you'll attract the attention of an organisation looking for those types of skills in a data quality context.

In the meantime I would go back through Data Quality Pro and complete every tutorial we have posted, use the free tools we have created and those of others we list, and start to get your head around what a data quality initiative really looks like. Don't be afraid to reach out to other members for help.

Use our community marketplace: why not post an OFFERED or WANTED ad for your services? I will put that in a prominent place in the newsletter and new magazine we're launching. Heck, I'll even do a mailshot if it will help!

One parting point, you've clearly got the right mindset and the motivation, you know it's not just about cleansing and sophisticated tools, this is very important and hopefully a progressive organisation will help you develop your skills moving forward.

Jul 10, 2009 | Registered CommenterDylan Jones (Editor)

Any business process starts from the best-quality data. Based on my experience in the Oil and Gas business, I personally agree that data quality is “everybody’s responsibility”, meaning that anybody who knows there is wrong data must, at the least, inform the accountable person (data administrator, data manager, or user) so that the problem(s) can be fixed.

Why the data administrator/manager:
He/she has the responsibility to administer or manage the data, acting as the data quality controller in the database.

Why the user(s):
He/she knows the data well, technically and professionally, and acts as the data quality assurer. Geologists should know about their well data, geophysicists must know about their seismic data, librarians must know about their catalogue or physical data, etc.

It is well known that in some companies the data administrator is the place to complain if there is something wrong with the data. Too bad!!

Jul 10, 2009 | Unregistered CommenterPrajuto

I fully agree that data quality professionals are needed to support sustainable data quality initiatives. Some of the key business benefits cited by customers include increased confidence in data, improved reporting, and improved efficiency through process improvement based on higher quality data (http://www.informatica.com/products_services/data_quality/Pages/data_quality_benefits.aspx ).

So given the importance of data to a successful business, it is critical to have business and IT roles within the organization who are certified to design, implement and maintain data quality processes. In fact, the International Association for Information and Data Quality (IAIDQ) is currently in the process of defining a professional certification (see http://www.iaidq.com/main/ciqp.shtml), including roles and processes.

Responsibilities, roles, processes and structures have evolved over time. The natural barriers to progress have included lack of awareness, lack of business buy-in, lack of recognized data quality processes and lack of availability of relevant tools for the business and IT. Over the last few years, many of these barriers have been overcome. Business-IT collaborative processes have been defined (See Data Quality Demo: http://vip.informatica.com/?elqPURLPage=1426 ). The business is responsible for understanding the source data, defining data quality targets, specifying data quality business rules, managing exceptions (e.g. the failed records as part of a batch process), monitoring data quality (e.g. > 98% for key attributes within an enterprise process) and performing root cause analysis. IT is responsible for implementing data quality rules and deploying those rules within the IT data infrastructure (Informatica Data Quality and Data Integration Brochure: http://www.informatica.com/INFA_Resources/br_dqanddi_6787.pdf).

So we can see that the business-IT collaboration process is very important for the sustainability of data quality processes, which in all cases involves a shift to a “get it right first time” culture. A data quality certification program is a potential catalyst to drive that cultural change. So I look forward to the availability of IAIDQ data quality certification programs to support sustainable data quality initiatives, thereby ensuring ongoing business value generation.

All

The IAIDQ's Certified Information Quality Professional certification programme is still moving forward towards final delivery of a holistic syllabus that addresses the broad scope and depth of the various skills required to be an effective information (or data) quality professional by the end of this year.

The bottleneck we have at the moment is primarily one of funding, which we are working through at the moment. We have received massive support from individual practitioners and have raised a significant amount of funds in that way. The generosity of spirit and strong words of support we received have motivated the team working on the certification still further.

We are now waiting on further sponsorship to help us cover the not insignificant costs of producing a globally applicable robust certification syllabus with an internationally recognised certification partner. We are hopeful that we will be clearing that bottleneck soon (and Tommy - Informatica are more than welcome to throw their weight behind the wheel here).

We have a number of research reports in the pipeline as by-products of the extensive and intensive research that was done in the early part of 2009 as part of the design of the certification framework. We intend to publish these soon as follow-ons to our Salary Survey and Job Satisfaction Report.

The key point about the IAIDQ's CIQP syllabus/framework is that it will not replace existing books or courses but will provide a structured benchmark for skills and knowledge that we intend will be the benchmark to aim for. It will also require more than just "book learning" to become a CIQP.

Sep 1, 2009 | Unregistered CommenterDaragh O Brien

A great article. I fully understand why the author is saying what he is saying.

However, I believe that founding a data quality profession is the one measure that is bound to perpetuate data quality issues.

Why?

Because the ultimate aim of a data quality professional is to eliminate data quality problems - i.e. to make themselves redundant! Do you know of any profession, or can you even imagine one, who would be committed to this? I thought not.

The existing problems with data have their roots in IT, when data got separated from function. This gave rise to business analysts who modelled function (which they incorrectly termed 'process' - but that is another problem) and an entirely separate set of people who modelled data.

Data only exists in an enterprise to support function. No other reason.

So, following the principles of TQM, if we assure the quality of the data being created and transformed by functions, we assure the quality of all data.

More succinctly, manage the functions and you manage the data.

Function and data are inextricably linked, so the only way to eliminate data quality issues is to eliminate the separation of the roles of business analyst and data analyst.

These roles must be combined.

May 17, 2010 | Unregistered CommenterJohn Owens
