DEBATE: How should data governance and data quality work together?
Many organisations have started to recognise the need for corporate data governance. However, there are still a number of grey areas that surround the interconnect between data quality management and data governance initiatives.
In this post we want to open a debate on the relationship between these two disciplines.
Information quality expert C. Lwanga Yonke has responded to a similar debate started on the IAIDQ member forum, providing a detailed account of how he sees the relationship playing out.
We welcome further debate as this is clearly an issue that many organisations are going to face as they increase in information management maturity.
I originally posted the following message to the IAIDQ member forum to get this debate moving:
There appears to be some confusion over the breakdown of responsibilities between data quality and data governance so I'd like to open it out for debate.
What are your views?
- Should they be integrated?
- Can they be run separately?
- Is there a master and servant relationship? If so - which is the master?
- Does it vary for different organisations based on their maturity?
- Have you implemented both in unison - what are your experiences?
-----------------------------------------------------------------------
(C. Lwanga Yonke starts this debate with the following comments):
(a) The 2008 IAIDQ State of Data Governance Report dealt directly with these topics. Here is a sample of the survey findings:
Overall Objective of DG effort (p. 18)
· Improve data quality: 80% of respondents
Comment: this clearly shows that most DG programs out there are aimed at improving data quality.
Relationship between DG and DQ leadership (p. 28)
· DG and DQ are led by the same person: 37% of respondents
· DG and DQ are led by different people who report to the same manager: 17.5%
· DG and DQ are led by different people who report to different managers: 19%
· There is no specific person in charge of DQ: 17.5%
· Other: 9%
Comment: I am not sure but I guess that this is a function of organization size. The survey analysis did not explore this angle.
Primary activities of DG efforts (p. 20)
· Standardize data definitions across the organization: 71%
· Define and standardize common business rules across the organization: 54%
· Select and charter DQ improvement projects: 50% of respondents
· Measure the cost of low quality data: 25%
· Measure the value of high quality data: 23%
The DG report and the Ask-The-Expert webinar slides are still available on the IAIDQ webpage http://iaidq.org/publications/
(b) My own thoughts:
I think DQ cannot exist without some DG. In fact, one of my first tasks as a DQ Manager was to put in place a data governance/data stewardship program.
That program put in place a structure for decision authority and rules about matters impacting data quality and data management (definitions, data quality requirements and rules, etc.)
As for the servant/master relationship, here DG would be servant to DQ. (As another data point, our CIO believes and proclaims to everyone that the most important function of his department is to deliver quality information to his business customers; that’s actually the formal text of the IM&T Department’s mission.) Indeed we are rather DQ-centric here.
I guess for larger organizations, the two functions can be separated, but IMO they must closely work together.
Dylan also asked: “does it vary for different organizations based on their maturity?” Great question. I would imagine that the more mature the organization, the greater the realization that DQ and DG must be combined under the same roof.
What the DG report also highlights is the absolute lack of consensus out there about what DG is and is not, and about which activities are covered by DG and which are not. There is still plenty of work to do in this area.
-----------------------------------------------------------------------
What do you think about the relationship between data quality and data governance and the questions we raised above? Why not add your comments below?
C. Lwanga Yonke is a seasoned information quality and information management expert and leader. He has successfully designed and implemented projects in multiple areas including information quality, data governance, business intelligence, data warehousing and data architecture. Lwanga Yonke is a founding member of the International Association for Information and Data Quality (IAIDQ) and currently serves as an Advisor to the IAIDQ Board.


Reader Comments (15)
First of all, I really appreciate the fact that this question is even up for debate. It seems to me that Data Quality (DQ) is not talked about much directly in enterprise initiatives, but instead remains in the silent background of projects focused on Business Intelligence (BI), Master Data Management (MDM) or Data Governance (DG).
I personally gained a much more comprehensive understanding of DG at the Data Governance Summit at TDWI World Conference Chicago 2009.
To briefly answer the questions above:
I believe that DQ and DG should (and must) be integrated. DQ can be run separately from DG (for example, cleaning up a customer list for a mass mailing program), however DG can NEVER be run separately from DQ.
As for the master/servant relationship, I see the data and its users as the master and both DG and DQ as their servants. However, I could understand an argument being made for DG as the master because of the organizational (i.e. non-data related) aspects of DG that are necessary for coordinating business processes and communicating effectively with the people involved.
As for does it vary based on organization maturity, well obviously yes. DG (again because of its non-data related aspects) requires more organization maturity in order to be successful. Both DQ and DG can be negatively affected by data silos and organizational fiefdoms, however DG can NEVER be successful without breaking through those barriers, whereas DQ (although not recommended) can be somewhat successful in isolation.
I agree with Lwanga (and the IAIDQ report) that there remains a lack of consensus out there about what DG is and is not. However, I am encouraged by the growing frequency of these discussions.
Personally, I think that the two initiatives can be run separately, but it sounds logical to me that an organization needs Data Governance in place before a Data Quality journey can commence.
With DQ, you need a sound organization to be in place that really owns the data and has the authority to make decisions. With a DG program, an organization can establish roles like data owner and data steward and assign people to data objects. In my opinion, DQ has a bigger chance of success when this is all in place.
If for example there is a DQ issue with credit limits for customers, who is the owner of that data: Marketing&Sales or Finance? This needs to be clear before DQ can start.
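The ownership question above can be made concrete with a small sketch. This is only an illustration under stated assumptions: the registry, the element names and the department assignments are all hypothetical, not taken from any real programme. The idea is simply that a DQ issue can only be routed once governance has recorded who owns the data element.

```python
from dataclasses import dataclass

# Hypothetical ownership registry maintained by a governance programme.
# Element names and owning departments are illustrative assumptions.
OWNERS = {
    "customer.credit_limit": "Finance",
    "customer.mailing_address": "Marketing&Sales",
}


@dataclass
class Issue:
    element: str       # the data element the issue concerns
    description: str   # free-text description of the problem


def route_issue(issue: Issue, owners: dict) -> str:
    """Return the owning department, or flag the issue as unowned."""
    owner = owners.get(issue.element)
    if owner is None:
        # No owner on record: the issue cannot be resolved until
        # ownership is decided, so it is escalated instead.
        return "UNASSIGNED: escalate to the governance body"
    return owner


issue = Issue("customer.credit_limit", "limit exceeds approved maximum")
print(route_issue(issue, OWNERS))
```

An element missing from the registry comes back as unassigned, which is exactly the situation described above: the DQ work stalls until someone decides who owns the data.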
In brief: first have the organization in place (policies, processes, roles & responsibilities) and then start a DQ initiative.
In my ideal world, regardless if the organization is CMM "Initial" or "Optimized", both activities should be managed by one department: an independent department that is responsible for master data and all related aspects, assisted by business data owners and data stewards.
Comment by Gary Buttsworth:
"I see this as Data Governance being the development and implementation of policies for data quality compliance or standards. These are what should inform and enforce data quality standards in practice."
Posted by Tom Jesionowski:
"You really posted the data management equivalent of the chicken or the egg question. The two practices tightly integrate with one another. Data governance cannot succeed without data quality components and data quality initiatives cannot succeed without data governance process and policy connecting business objectives.
I have a bias towards Data Governance being the lead activity though, and here is why. Data exists to serve the business process, and not the other way around. Data governance provides linkage from the business processes and functions to the underlying data through the implementation of policy and process. Data quality may be one of the KPIs for DG, but DG would not likely be a KPI for a DQ program."
One of the biggest distinctions between data quality projects and data governance is the TIME dimension, in my opinion. Companies may complete a project with a data quality component, but data governance only happens when you bake data quality processes into your corporate strategy. To do that, your corporation will need to take additional steps. These might include setting up a team who is responsible for managing data, documenting standard processes for handling data, policing those policies as new IT projects are developed, defining data models that work across business units, etc. Data governance is about the long term strategy of ensuring corporation-wide data quality. It’s coordinating a series of data quality projects to meet a business objective. It’s about empowering all your business users, no matter where they sit in your corporation, with information they can rely upon.
Fernando Martins wrote:
Data quality is part of data governance. To me it's obvious that data quality is a must, otherwise the "garbage in, garbage out" rule will apply.
They can be run integrated or separately, that will have to be decided by the architect depending on the business and infrastructure.
But all the work performed by the data quality process should be sent back, if possible, to the data sources so that they can also benefit from it.
Organizations have some trouble recognizing that their data quality is actually lower than they think, and therefore the data quality process is always seen as "unnecessary", or it is assumed that "some adjustments" during the integration process will be enough.
Fortunately, this is starting to change and organizations do realize the importance of data quality and do something about it.
Mike Walsh wrote:
Without Governance, why bother trying to enforce Data Quality (and by extension, useful reporting and analytics of data points)?
Setting up a Data Quality contract, agreeing on some data quality standards and making statements that quality is important are all great. Without a process and people identified to enforce data quality all of those statements are just pointless plans.
Does every company need a full time "data steward"? I think not... All companies that care about their data and make decisions based on that data (are there any that don't?) need to think seriously about even basic forms of data governance.
Things like:
- Data Quality Contracts with consequences (Chargebacks? Call out at a leadership/steering committee meeting?)
- Auditing processes to ensure said DQ contracts are indeed followed (and by all parties, even those with the "keys" to the databases involved)
- Empowering individuals or departments with the role of "Data Steward" (even if not with the actual buzzword title ;-) ).
Good discussion point though. A lot of companies can do a lot better making Data Quality real. The results of bad data early in the process have huge ramifications on reporting and line of business applications.
This means that a company needs to understand this early and be willing to spend the time and money to do it right the first time. Otherwise every project (or attempt to initiate data quality after the fact) will have a high chance of failure (or being forced to spend the time and money later without the budget or resources for it).
Just my $0.02
Mike Walsh
www.straightpathsql.com
If I may comment, as a non data quality professional who is just starting to learn about the subject...
My background is in business process management, and in this arena it is governance that embeds the increases in quality delivered by process improvement projects.
Governance brings accountability; put simply, if there is an issue, whose head is on the block? In the process world, this is called process ownership, which seems to be analogous to the concept of a data steward. The two concepts can actually interact in the improvement and maintenance of data quality - data is changed by processes, so if a data quality issue arises, then the data steward should be looking for likely suspects amongst the process owner community!
I suspect that, as with processes, you can do quality improvement work without governance, just don't expect any improvements to stick.
Chris Vu wrote:
Data Quality and Data Governance are inextricably linked from any data management angle.
An acceptable level of data quality depends on the purpose that a data domain is being used for.
For example, a customer object used for a mailing list by a retail store may apply a medium level of data quality control. That same customer object (model) used by an airline to identify dangerous individuals will require a high level of accuracy, consistency, precision, etc - all attributes of data quality.
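To illustrate the point, here is a minimal sketch in which the same customer records pass for one purpose and fail for another. The threshold values, the purpose names and the choice of completeness as the only measured dimension are assumptions made for the example, not figures from any standard.

```python
# Hypothetical minimum completeness per purpose (values are assumptions).
THRESHOLDS = {
    "retail_mailing": 0.80,
    "airline_screening": 0.99,
}

REQUIRED_FIELDS = ["name", "address", "birth_date"]


def completeness(records, required_fields):
    """Fraction of records in which every required field is populated."""
    if not records:
        return 0.0
    complete = sum(
        all(record.get(field) for field in required_fields)
        for record in records
    )
    return complete / len(records)


def fit_for_purpose(records, required_fields, purpose):
    """Compare measured completeness with the threshold for a purpose."""
    return completeness(records, required_fields) >= THRESHOLDS[purpose]


# Ten sample customers, one of them missing an address.
customers = [
    {"name": f"Customer {i}", "address": f"{i} High St", "birth_date": "1970-01-01"}
    for i in range(9)
] + [{"name": "Customer 9", "address": "", "birth_date": "1970-01-01"}]

for purpose in THRESHOLDS:
    print(purpose, fit_for_purpose(customers, REQUIRED_FIELDS, purpose))
```

Nine complete records out of ten clears the mailing-list threshold but not the screening one, so "acceptable quality" is a property of the use, not of the data set alone.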
Data quality is a primary objective for effective Data Governance, defined as the organization structure, policies, procedures, and enforcement of standards for a given set of data domains.
Data governance can be applied to data sets without addressing data quality: other factors such as security, privacy, availability, etc. are also drivers for effective data governance. However, a high level of Data Quality is nearly impossible to sustain without a sustained data governance program.
Jeff Hart wrote:
Data governance defines the ownership, strategy, standards and prioritisation of data quality ensuring that often scarce resources are used in the most effective way and achieve the best possible outcome. In what order, by what mechanism (ie preventative versus reactive / corrective) and against what business rules are all key areas of data quality that are defined through data governance. Data governance brings order and control to what could otherwise be an infinite and sprawling exercise and is particularly important while the economic environment is tight.
The consensus seems to be that you really can't have data quality without data governance. It doesn't seem logical to have one and not the other. The tools should be integrated for better data migration.
Both should be brought into a larger tent--Quality. Businesses that have a Quality Department/Function typically have an outward, customer/product focus. When we fail to recognize that the business is composed of many internal customer/product boundaries, we also fail to get the bulk of the Quality ROI.
All governance is about consistency and the avoidance of unpleasant surprises. Quality problems are perhaps the most unpleasant of surprises, no matter whether you purchase the product in a retail outlet or are handed the product by the person in the next cubicle.
Over-specialization is destroying us. For more exploration of these concepts, see http://www.bi-keep-it-simple.blogspot.com.
I think people are on the right track linking Data Governance and Data Quality. No need to rehash above. My one input would be that Data Quality should be the measuring stick for the success of Data Governance programs.
I do feel that some people align Data Governance too closely with data modeling and data object definition. There are examples in the comments above. Sure clean data is the end goal, but Data Governance is the journey to get there. Data Governance is more of a cultural shift starting with 1) Aligning business strategy toward a common goal; 2) Building definitions, re-engineering processes & updating systems to represent those aligned business strategies; 3) Defining data objects to store the common definitions; and finally 4) Measuring success based on quantitative analysis of Data Quality. Without that, Data Governance becomes a theoretical data modeling exercise, and all of our work is minimized. I'm sure this is more of a top-down approach than the classical definition, but it's where Data Governance will go next.
More thoughts on the topic... http://www.markgoloboy.com/category/data-governance/
Cheers!
Mark
While Data Governance and Data Quality are linked and have a relationship, I disagree with them being run by the same lead as they are not necessarily the same in purpose and role.
Data Quality should be done by "data" folks who have the tools to massage and manage the data in excruciating detail. This is the embodiment of the policies and procedures that Data Governance would implement.
Data Governance types may be either business folks making the business decisions/case for how the data is to be utilized or past data folks - usually works best if there is a combination of both types.
They discuss ownership, business rules, and enforcement of data quality but they are not the ones cleansing, reproducing, maintaining large files.
Where it seems to go sideways is at the individual user level. Are they a person enforcing, or being dictated to by, Data Governance, or are they a Data Quality Assurance person? And the answer is YES because they are the user and that is where the data gets squirrelly, not with the data folks or the governance folks.
That unholy trinity of user-owner/data mgr/data gov type is responsible for the quality of data and enforcement of data governance edicts. That's how you get the ridiculous statement "Everyone is responsible for data quality" and yet no one is truly held accountable. Clear definitions of roles and responsibilities will make the overall difference in the success of any data program and while everyone has input and a role, not everyone has the same role and responsibilities.
We need to identify a clear model which describes the scope, relationships and services agreement between DQ and DG. I see a clear dependency of DG on DQ as an enabler and a control, and DQ forming a subset of the DG implementation.
I further agree with previous comments which state that DQ can succeed in isolation - to a degree.
In my opinion, real world DM operational models need to be defined using a common vocabulary, tested and then analyzed. This will provide an environment to compare and evolve the preferred models.
Until then, for an effective DM program to succeed, I believe you need DG with DQ as a critical sub-component.
To answer the questions:
- I believe they should be integrated
- I believe that only DQ can run separately
- I believe that if they are both active then DG is the master
- I believe DQ will precede DG in the maturity model
- We have used DQ to successfully build the case for DG and are targeting DM as a key enabler and control for the DG