Data Quality Forum (General Discussion) > Data Quality is not an IT issue 

I am sick and tired of reading statements such as: Data Quality is not an IT issue – it’s a business issue. Come on. Of course it is a business issue. Who ever claimed otherwise? ERP is also a business issue, CRM is a business issue, BI is a business issue, and Data Quality is imperative in making ERP, CRM and BI successful.

But if data is electronically stored there is technology involved – right? And if technology is involved I find it fair to discuss different kinds of technology products, architecture, deployment tactics, algorithms and so on.

Many challenges are solved better and more cost-effectively with technology. That’s why we use word processors with spell check. If I didn’t have a word processor with an English spell check you would regard me as even more stupid than you already do.

In the old days data was entered mainly by staff inside the company. That is rapidly changing these days.

Take an example where a customer registers himself on the internet and makes a spelling mistake, causing your relational database not to link him to all the offline transactions he has made earlier. What will you do?

• When you discover this by luck, you break into his living room, whip him and tell him never ever to do that again? OK, just kidding.
• Assign an army of office workers to check all the self registrations? I guess you just ruined the business case for self registration.
• Deploy a software component capable of catching similar but not exact entries? Oh no, that’s technology – shame on you :-)
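
As a rough illustration of the third option, here is a minimal sketch of a component that flags similar but not exact entries. It is not any particular vendor's method: it simply uses `SequenceMatcher` from Python's standard `difflib` module, and the function names, the sample names and the 0.85 threshold are all assumptions made for the example.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lower-case and collapse whitespace so trivial differences are ignored."""
    return " ".join(name.lower().split())


def similarity(a: str, b: str) -> float:
    """Return a ratio in [0, 1]; 1.0 means identical after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def find_candidates(new_name, known_names, threshold=0.85):
    """Return (known_name, score) pairs at or above the threshold, best first.

    The threshold is an assumed starting point and would need tuning
    against real data.
    """
    scored = [(known, similarity(new_name, known)) for known in known_names]
    hits = [(known, score) for known, score in scored if score >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical customer names already held in the offline system.
    known = ["Jon Andersen", "Maria Jensen", "Peter Nielsen"]
    # A self-registration with a spelling mistake in the surname.
    for name, score in find_candidates("Jon Anderson", known):
        print(f"Possible match: {name} (score {score:.2f})")
```

A plain character-based ratio like this only catches simple typos; real matching tools typically add phonetic rules, address parsing and reference data on top.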

Thanks – I feel much better now.

Henrik,

I completely agree with your initial point. I hear it all the time as well: "Data is owned by the business" or "Business involvement is key to your success". However, very often that's as detailed as the advice gets. Few organizations are able to clearly articulate what data ownership or responsibility means and what the job entails. And that's stopping them from being successful in the data quality management journey. So when we work with various organizations, we often spend a good amount of time in the early phases to make sure a clear strategy or approach to data ownership or governance is defined, implemented and communicated.

On your second point, I also tend to agree. Let's not, in our eagerness to involve the business, push IT out of the equation. You're right, I also see a significant IT aspect of effective data quality management. And the most successful organizations I have seen (in terms of DQM) are the ones that have managed to build a strong partnership between business and IT, one that reflects the fact that neither could tackle the issues without the other.

Feb 27, 2009 | Registered Commenter Thomas Ravn

Henrik,

First of all, I completely agree with your points and I share your frustration.

However, I feel that I must explain myself since I am one of those guilty of occasionally (in the words of the poet Walt Whitman) sounding my barbaric yawp over the rooftops of the world:

“Data Quality is not an IT issue – it is an issue for the entire company”

I admit that I usually start out by stating that data quality is a business issue and not an IT issue. However, by this I do NOT mean “business” in the context of only the business side of the organization and I certainly do NOT mean that the technical side of the organization (i.e. IT) or technology in general has nothing to offer in the way of solutions – I am certainly not a Luddite opposed to the use of technology. I am first, foremost, and proudly a techno-geek of the highest order.

I mean “business” in the context of the entire organization, both internal and external. The “internal” organization is the employees at every level of every department. The “external” organization is everyone that the organization interacts with in the real world (i.e. customers, vendors, suppliers, etc.).

The challenge that I have encountered in my career has NOT been convincing people what the true nature of reality is regarding the problem of data quality. I have found that most people do understand the complexity of the problem.

The challenge has been the GIANT CHASM that exists between the problem and its attempted solution.

This is where I have seen too many data quality initiatives fail: when IT is assigned to solve the well understood problem by “throwing technology at it.” These projects do NOT fail because using technology was the wrong approach. I have seen beautifully architected, wonderfully coded, elegantly implemented technical solutions result in complete and utter failure.

The reason is that data quality solutions require a holistic approach involving people, process, and technology.

Data quality initiatives can only be successful when the entire organization combines to face this “business issue” united by collaboration, guided by an effective methodology, and of course, implemented with amazing technology.

Feb 28, 2009 | Registered Commenter Jim Harris

Thomas and Jim – thanks – I am pleased you have taken the posting in the right spirit.

Actually – if I google for the heading in exact words it’s a blog post from a close colleague that hits the top.

The last 3 years I have been with a data quality tool vendor. If I look at all the implementations we have made in my territory, many have been anchored in IT. When I analyse the reasons I reach the following conclusions:

• Most data quality tool implementations deal with the most frequent data quality challenge, namely business contact duplication and related standardisation issues, which is not regarded as a problem very specific to a given organisation. It is a bit like not involving the entire business in having a spell checker in your word processor.
• The business says to IT: “We have a problem, could you please suggest a solution. If you have any questions, just ask”.
• Some implementations are second wave where you replace earlier solutions with new solutions. Some objectives are better automation where you replace simple match code functionality with more sophisticated approaches. In other cases solutions with external service providers are replaced or supplemented with embedded functionality – often as SOA components today. I think everybody feels the IT flavour in this.
• IT people and departments are not isolated islands with absolutely no clue about what is going on in the business. On the contrary, many internal IT professionals have comprehensive domain knowledge and are among the few in the business who have a holistic view of the entire business, obtained during years of troubleshooting all over.

In his excellent book “Data Driven: Profiting from Your Most Important Business Asset” Thomas Redman (a.k.a. the “Data Doc”) explains this common misconception (excerpt from Chapter 7, pages 170-171):

“Many people automatically assume that data and, to a lesser degree, information are largely “systems issues” and therefore the natural province of IT…people reach this conclusion with good reason…first, most people access (much) data and information through their computers…second, to define new data or effect any changes to existing data, people have to go through IT…third, IT often leads, or at least appears to lead, business intelligence, enterprise resource planning, customer relationship management, and data warehousing projects that promise better reports and higher quality data…finally, IT always seems willing to take on data clean-up projects, in effect assuming responsibility for erroneous data.”

Mar 1, 2009 | Registered Commenter Jim Harris

Not sure what the point of the post is, Henrik. I am guessing that since we use tools to solve the issue with data quality you are implying it is also a technological issue. If I can digress for a second here, assume that your child has an issue (as a result of parenting) and you seek a therapist and try to overcome the issue. If the issue isn't resolved, is this a parenting issue or a bad therapist? In the context of your post, the parent is the business and the therapist is the technology. Even if you get the best technologies, unless there is ownership and awareness of the issue, data quality will always be a business issue. Technology has been, and always will be, an enabler to solve business issues.

On to your registration example, this is a benign form of data issue that can be easily owned by IT (since it is a common incident). Whether the solution is to use drop-downs or do a transformation based on user input, the logic is business driven. Bad logic will drive bad results, again shifting the onus to the business and not on tool selection. In more serious forms of data quality (let's say code standardization), technology can only solve the routine things: typos, historical coding issues etc. New or unknown data anomalies are always explained better by business rather than IT. How do we tell two transactions belong to the same customer, can only be answered by the business. Even if you have the greatest tool in place.

Whatever the DQ tool, they all do some level of cleansing (for lack of a better word). Depending on the maturity of the data in the enterprise, some tools are a bad fit while some are great. And that is why the buy-in for the tool selection comes once IT has understood what the business issue is. Just to make my point clear, most forms of data quality are a business issue. Technology will always be the greatest enabler for the enterprise.

PS: Hi, Tom. Remember me? :-)

Mar 2, 2009 | Registered Commenter Vasuki Kasturi

VK, my point is that I think everybody knows that data quality is a business issue. Also I think everybody knows that data is owned by the business. We all know that solutions must be designed with business logic. Everyone is aware that you are getting nowhere without management buy-in and so on.

Having that in place, could we please get on with what’s next?

Thomas (who has moved to the US and probably is getting used to being called Tom) is with a consulting company and therefore states that “when we work with various organizations, we often spend a good amount of time in the early phases to make sure a clear strategy or approach to data ownership or governance is defined, implemented and communicated.” Great enabling stuff.

Also 84% of the respondents in the survey published today here on dataqualitypro are interested in topics relating to data quality methodology and process.

But also 75% are interested in reviews of data quality technology.

When you say “technology can only solve the routine things” I think this is in itself very important. Having solved all the routine and frequent things with technology makes it possible for knowledge workers to concentrate on the unusual cases.

I will challenge the sentence “How do we tell two transactions belong to the same customer, can only be answered by the business, even if you have the greatest tool in place.” Tools are maturing and may be supported by probabilistic learning and external reference data, making them as hard to beat as a chess computer (speaking from my own experience in both areas).

Also I have elaborated on the “what is a customer” question here:

http://www.dataqualitypro.com/data-quality-forum/post/679755#post680570

All,

We've all fallen into the trap laid by HR to stop us rising up against them. This distinction between Business and IT is entirely artificial. When I describe to people my career path and projects, they often say "Oh, so you're in IT then.", to which I say "No, I work on the 'business' side"... which confuses them long enough for me to make my getaway.

Jim Harris is right when he talks about "The Business" being the entire enterprise. The challenge for information quality professionals is to break down or break through that artificial barrier as it is nonsensical and doesn't actually help all that much (unless you are in HR and are seeking a pigeon hole to put people in on the Org chart).

What we need to achieve is a position where the non-IT people (let's call them pencil wavers) and the IT people (let's call them spanner wielders) are both pointed towards a set of goals, including information quality goals, that are aligned with the priorities of the company (and I use that word on purpose), which, in general, can be boiled down to making money, staying in business and keeping customers happy enough to make return purchases or not defect to a competitor. Peter Drucker said that the ultimate goal of any business is to make more money than it spends or owes, and all other goals are secondary to that.

The Pencil Wavers need to tackle the 'softer' issues of human processes, and the clear mapping and definition of WHAT needs to happen (governance, data governance, process definitions) WHEN to move work through the organization in a way that is aligned with the goals of the company. The Spanner Wielders need to make sure that the plumbing (systems, data transfer processes, software development life cycle processes, IT governance) works in a way that is aligned with those self-same goals and supports the Pencil Wavers.

When it comes to identifying if two records are the same, it is the Pencil Wavers who need to define the rules for what a customer is and how important certain facts are in identifying and distinguishing one from another. Whether those rules and decision algorithms are contained in the organic supercomputer that sits between a keyboard and a chair clicking buttons (a person), or are embedded in a probabilistic self-learning matching tool with access to the same external reference data and base rules, is (I would argue) irrelevant. (And my experience with matching software and the Data Protection Act recommends to me that you don't place all your reliance on an inference engine when determining matches).

It is only by Wavers and Wielders working together that the COMPANY's information quality problems can be solved. As Deming said, we need to break down barriers and put everyone to work to ensure the transformation. If that means that the Spanner Wielders bring some "second wave" improvements in how the objectives are met technically, or the Pencil Wavers improve their processes/training etc. to improve how objectives are met with some "second wave" changes, all the better: that's called Continuous Improvement and helps us hold and increase our gains from better quality information.

By the way... if anyone objects to my use of the labels "Pencil Waver" and "Spanner Wielder", all I'll ask is this: Are they any less sensible and objectionable and relevant than the labels "Business" and "IT"?

I've been on the receiving end of the "Data is the Business's responsibility" argument from an IT organization that wasn't aligned with the quality objectives of "the business" in an organization culture that reinforced the artificial and nonsensical division between "Business" and "IT". The net result was that information quality didn't improve because the key changes in the 'plumbing' that needed to be made weren't made and the understanding in the IT organization of the data quality tools that were brought in by the "Business" wasn't there to enable them to properly assess the impact of system changes.

For the record, I'm officially a "Pencil Waver" who has successfully pushed through information quality change, but I was also customer # 1 for a leading DQ software vendor, have led technical evaluations of DQ software and am a power user of a particular vendor's tools (to the point where I've shown them how to do things they didn't think were possible). I've also administered Unix systems and programmed parsers using nothing but Visual Basic. As a result, I am now allowed by the Irish Guild of Spanner Wielders (the Irish Computer Society) to wield a small ornamental spanner on special occasions.

I'm also the Director of Publicity of the IAIDQ, whose membership is approx 50/50 Pencil Waver to Spanner Wielder and where we've had a number of articles, conference sessions, and webinars on some of the issues raised in this debate over the past 5 years.

Mar 2, 2009 | Registered Commenter Daragh O-Brien

Daragh, I follow your point about the artificial distinction between business and IT. I have also always found it less useful – perhaps also because I started my glorious career right in the middle of these HR terms.

I will however challenge one of your sayings: “Whether those rules and decision algorithms are contained in the organic supercomputer that sits between a keyboard and a chair clicking buttons (a person), or are embedded in a probabilistic self-learning matching tool with access to the same external reference data and base rules, is (I would argue) irrelevant.”.

From a cost point of view the difference could be considerable – and as the business purpose indeed is to make money, this is the business issue above all business issues.

Henrik

It looks like in the interest of brevity I lacked clarity...

My point in the section you challenge is that the "Pencil Wavers" need to take responsibility for defining and understanding the rules for "what is a customer match". They may choose to go the expensive route of having people reviewing each match, or the Spanner Wielders may (having been aligned with the goals of reducing cost and improving customer intimacy) bring forward the 'technical' solution of codifying the rules in a probabilistic matching engine and letting the machines do the thinking, thereby reducing time and cost of matching.

The Pencil Wavers may choose to hand over all the matching, the "high confidence" matches or none of the matches to a machine. However they remain responsible for the framework of rules that would be applied whether by a fleshy supercomputer or a silicon processor. The Spanner Wielders, working towards the same set of overall objectives, support the Pencil Wavers in meeting the objectives as efficiently as possible through collaborative work.

And by breaking down the silly barrier between Business and IT and getting both groups working together towards the same goal, the total cost to the Company from poor quality information/data is reduced.

I hope that clarifies what I was trying to say.

Mar 3, 2009 | Registered Commenter Daragh O-Brien

Daragh, we are aligned.

One approach often used when matching customer master data is that the silicon processor divides the material to be inspected into 3 pots:

A: The positive automated matches. Ideally you take samples for manual (fleshy :-) inspection for continuous quality assurance by counting and evaluating any false positives.
C: The negative automated matches. Quality assurance samples may be used for continuous improvement by counting and evaluating any false negatives – but this is harder.
B: The dubious part selected for manual inspection. The results may feed probabilistic learning, thereby reducing the B pot over time.
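
The three pots can be sketched in a few lines of code. This is only an illustration under stated assumptions: the `CandidatePair` class, the function names and the two thresholds (0.92 and 0.60) are hypothetical, and the similarity score is assumed to come from whatever matching engine is in use.

```python
from dataclasses import dataclass


@dataclass
class CandidatePair:
    """Two records that might be the same customer (hypothetical structure)."""
    record_a: str
    record_b: str
    score: float  # similarity in [0, 1], produced by the matching engine


def assign_pot(score: float, auto_match: float = 0.92, auto_reject: float = 0.60) -> str:
    """Return 'A' (automated match), 'C' (automated non-match) or 'B' (manual review).

    Both thresholds are assumed values; in practice they would be tuned
    from the false positives and false negatives found in QA samples.
    """
    if score >= auto_match:
        return "A"
    if score < auto_reject:
        return "C"
    return "B"


def split_into_pots(pairs, auto_match: float = 0.92, auto_reject: float = 0.60):
    """Group candidate pairs by pot so each pot can be handled separately."""
    pots = {"A": [], "B": [], "C": []}
    for pair in pairs:
        pots[assign_pot(pair.score, auto_match, auto_reject)].append(pair)
    return pots


if __name__ == "__main__":
    pairs = [
        CandidatePair("Jon Andersen", "Jon Anderson", 0.95),
        CandidatePair("Jon Andersen", "J. Andersson", 0.75),
        CandidatePair("Jon Andersen", "Maria Jensen", 0.30),
    ]
    for pot, members in split_into_pots(pairs).items():
        print(pot, len(members))
```

Shrinking the B pot over time then amounts to moving the two thresholds closer together as the manual decisions confirm where the engine can be trusted.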

What does this mean, "Not an IT Issue" or "A business issue"?

The first problem you always encounter is the matter of definitions.
When you start out without clear definitions of what "business" or "IT" really is, you can always shift the onus around.
Likewise, if you don't have a formal definition of what Data Quality actually is, you won't ever be getting anywhere.
And without formal definitions of Data Quality Management as an organizational issue, it doesn't matter whether IT or business, things won't get resolved either way.

So, the first action should be to standardize what people mean when they talk about DQM.

When you go by "who can resolve Data Quality problems", DQ is not an "IT issue", and it's not a business issue, either: It's a corporate issue and everyone's involved.
So, everyone's right and nobody's right. Simple as that.

For example, as long as people in Customer Care are feeding junk into the database like entering surnames as first names and vice versa, you can't expect IT to resolve the issue. They need to get their processes straight!
However, IT may as well provide the technology that issues warnings when people "forget" to provide parts of their address or try to slip in abbreviations.
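
A minimal sketch of such a warning, assuming a simple dictionary per address: the required field names and the abbreviation list below are invented for the example, and in line with the point above they are exactly the kind of rules the business would have to supply.

```python
import re

# Assumed rules for illustration; the real lists would come from the business.
REQUIRED_FIELDS = ("street", "postal_code", "city")
ABBREVIATIONS = {"st": "Street", "rd": "Road", "ave": "Avenue"}


def address_warnings(address: dict) -> list:
    """Return human-readable warnings for one address record."""
    warnings = []
    # Warn about address parts that are absent or blank.
    for field in REQUIRED_FIELDS:
        value = address.get(field) or ""
        if not str(value).strip():
            warnings.append(f"Missing {field}")
    # Warn about known abbreviations in the street line.
    street = str(address.get("street") or "")
    for token in re.findall(r"[A-Za-z]+", street):
        full = ABBREVIATIONS.get(token.lower())
        if full:
            warnings.append(f"Abbreviation '{token}' found; did you mean '{full}'?")
    return warnings


if __name__ == "__main__":
    for warning in address_warnings({"street": "12 Main St.", "city": "Dublin"}):
        print(warning)
```

The code only issues warnings; whether an entry should be blocked, corrected automatically or merely flagged remains a process decision.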

IT usually cannot know what business plans to do with the data and which corporate purpose is being served by the data. Business has to let IT know what the data is supposed to mean. On the other hand, IT must then do their best to provide business with the technology to get the data that is required.
Nobody can do it without the other.

Data Quality, like all quality, is always relative. "Quality" is defined as the degree to which the target matches expectations. Can IT tell the business what they are allowed to expect?
Business defines what high quality data (or the "perfect data") is, and IT delivers it.
In case of any discrepancy, bring everyone to a round table and work out a practical solution.

IT can provide everything to enable data quality: from the storage systems, to access protocols, automation of processes, monitoring, quality reports and even error correction. But it's not their job to define what an error actually is!

Cheers,
Mike

May 14, 2009 | Registered Commenter Michael Kuesters

Michael,

I agree with your point that "DQ is not an IT issue and it's not a business issue, either: It's a corporate issue and everyone's involved."

My blog post on this topic was: You're So Vain, You Probably Think Data Quality Is About You

Best Regards...

Jim

May 18, 2009 | Registered Commenter Jim Harris