How to Deliver Lean Data Quality
Featuring Mark Humphries
Mark Humphries is an information quality practitioner and business analyst at Essent in Belgium.
What could you achieve with no budget, no data quality technology and no specialist data quality team members assigned to your project?
Read this inspiring interview featuring Mark Humphries of Essent and gather vital techniques on delivering tangible benefits with limited resources by focusing on waste elimination and clever assignment of existing staff.
Dylan Jones: What is the background to your project Mark?
Mark Humphries: At the beginning of 2010 I was coming to the end of a data migration. For the data migration to be successful, we needed a step change in the quality of all our data, especially with regard to master data.
This was because we had redesigned nearly all of our business processes and tightened up a lot of rules. As a result, a lot of our old data just wouldn’t fit into our new business model as supported by our new customer care & billing system.
As an organisation we had done an awful lot of data cleansing work, and were on the point of going live and reaping the benefits of the improvements to business processes, applications and of course data.
At this time, I started to look at how we go further, at how we build on the foundations that we had laid to make further improvements and really embed data quality within the organisation.
We had just dragged ourselves up from level 1 (aware) to level 2 (reactive) on the maturity curve. I wanted to work out how we get to level 3 (proactive).
I had read quite a bit of the material available, and realised that one of the things we needed was Data Stewards, but I also knew from painful experience that competing quality initiatives just don’t work.
Rather than trying to set up a new structure and methodology from a blank sheet, with the promise of solving all data quality problems forever, I looked around to see what we already had and how we could extend it.
I also looked at the people who I thought would make good data stewards. What were they doing and how could we extend their roles rather than creating new roles?
All the candidates I identified were Business Process Analysts (BPAs). Suddenly it hit me like a train, because I had been pondering for a long time why most people understand process thinking but don’t understand data thinking.
The answer was to come at data from a process point of view.
Our BPAs have been applying Lean techniques for some time, so I thought I would try to extend what they were doing with a view to improving data quality. I sold it to them by explaining that I was adding a data dimension to their work. It has worked like a charm.
From my side I have built a DQ dashboard and identified BPA owners for each metric within it. The BPAs see the dashboard as providing valuable insight into where and why their processes are failing, and together we work on reactive cleaning actions as well as more proactive changes to fix the processes and stop things going wrong.
Dylan Jones: What Lean techniques did the BPAs have to implement to address the data quality dimensions?
Mark Humphries: We use very few formal tools.
All the BPAs have followed a day’s training in Lean as implemented by Essent. The emphasis is on the overall goals of Lean: constantly looking for ways to eliminate waste, and the realisation that however well things are going, there is always room for improvement.
Each BPA has a business process that they are responsible for. Whenever a problem crops up it is the BPA who does the root cause analysis and defines any improvements that are needed.
Improvements could be to the process, the training, the applications or the data, of course.
Dylan Jones: Were there any Six Sigma activities involved in the improvement of data quality?
Mark Humphries: One thing that we do use is the DMAIC cycle, which I believe is common to both Lean and Six Sigma. What I have seen is that DQ techniques are very relevant in the Measure and Control stages of the cycle.
Let’s say we have found a problem with contracts that prevents us from billing. We are already halfway through the Define stage: we have seen that poor data causes a process to block.
The next thing to do is generate a check query so that we can Measure the size of the problem. This is our baseline, and we now know if it is a small, medium or large problem. In doing this we also find plenty of examples for the Analysis stage. Then the BPAs will propose a solution or a range of solutions, and these recommendations will be prioritised along with other proposals depending on impact and resources. The check query lives on as a Control, and is built into the DQ dashboard.
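To make the Measure step concrete, here is a minimal sketch of a check query, run through Python’s built-in sqlite3 module. The table, column names and business rule (a contract must have a valid price to be billable) are invented for illustration; the actual Essent schema and rules are not described in the interview.

```python
import sqlite3

# Illustrative schema: contracts that feed the billing process.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE contracts (
    contract_id  INTEGER PRIMARY KEY,
    product_code TEXT,
    price_eur    REAL,   -- NULL means no valid price: billing will block
    end_date     TEXT
);
INSERT INTO contracts VALUES
    (1, 'GAS-STD',  42.50, '2012-12-31'),
    (2, 'ELEC-STD', NULL,  '2012-06-30'),  -- missing price
    (3, 'GAS-STD',  NULL,  NULL);          -- missing price and end date
""")

# The Measure step: a check query counting records that break the rule.
# The first result is the baseline; re-running the same query later
# is exactly what turns it into a Control on the dashboard.
broken = conn.execute(
    "SELECT COUNT(*) FROM contracts WHERE price_eur IS NULL"
).fetchone()[0]
print(f"Contracts without a valid price: {broken}")
```

The rows returned by the same query (without the COUNT) also supply the concrete examples needed for the Analysis stage.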
Dylan Jones: Were there any challenges for the BPAs in switching to addressing data quality as opposed to their traditional process quality improvement tasks?
Mark Humphries: Surprisingly not, and I like to think that comes because I have encouraged them to keep the process central. Their focus is on optimising the processes, not on optimising the data. I come to them with evidence that their processes are not working under certain circumstances, or I help them to measure the size of problems that they have discovered, as well as helping in the analysis by trawling through the data looking for clues.
There are two challenges that I have seen where people struggle.
- One is when there is a handover from one team to another, where poor quality of data in the output from one team causes problems for another team as input. Agreeing ownership is sometimes a minefield. But at least by measuring it, and by demonstrating that a particular problem is not going away or is growing, we have a solid basis for escalation.
- The second is the frustration when problems aren’t solved quickly. Each time we discover a new problem there is a burst of energy and enthusiasm. When we can get that problem cleaned up and fixed structurally everyone is happy. Sometimes the problems are difficult and stubborn, and then the enthusiasm runs out. Keeping the problem visible on the dashboard is once again very important to make sure it doesn’t disappear from view.
Dylan Jones: Have you found instances where business workers were undertaking wasteful activities that they perceived to be “part of the working day”?
Mark Humphries: We found plenty of these.
A good example was how we used to handle contract extensions. This was a highly manual process: every month IT would make the list of contracts that were due for expiry and pass it to the extensions team, who would check it, transform the list into the required format for the mail merge, and allocate the new product codes for the extensions based on the existing product codes. The same extensions team would also do all the extensions for customers who didn’t reply to their extension letter.
This was a good example of a process that was not only highly wasteful in terms of efficiency, but also resulted in lots of data problems, either because customers didn’t get extended or they got extended with the wrong product.
In the last two years we have successfully automated a lot of sub-processes so that there is much less of this copy/paste work.
But there was an unexpected side effect. Because our employees now spend more time proportionally on handling exceptions and less time on the copy/paste work, their perception changed, because they only ever saw problems.
The perception was that the improvements weren’t working because each person now spent more time solving problems and less time doing “business as usual.”
It took some time to even notice that this was happening and understand why.
Dylan Jones: In terms of measurement, analysis and control, what type of technology do you use to discover and measure defects?
Mark Humphries: For addresses we use a tool called Road 65. It’s a Belgian tool which handles the specifically Belgian problem that some streets have official names in two languages. This allows us to validate all addresses on entry, as well as enabling us to do batch validation on existing addresses. One of the things that we monitor is the number of addresses which haven’t been validated – there are always going to be some, but we need to keep it down to a level which is consistent with the causes of the exceptions, such as new streets which aren’t yet known to Road 65, or street names within holiday parks, which aren’t official addresses.
Then for smart matching of customers we use Human Inference. This is a tool that I like very much because of the quality of the matching in the first place, but also the flexibility and ease of use in terms of tuning the matching rules. It is also very fast. We use it in batch to detect double customers, and find that Human Inference can detect about twice as many doubles as a straightforward SQL query looking for identical names and addresses.
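For comparison, the “straightforward SQL query” baseline mentioned above might look like the sketch below: a self-join matching identical names and addresses. The customer data is invented; the point is that an exact-match query finds only literal doubles, which is why a fuzzy-matching tool can find roughly twice as many.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, address TEXT);
INSERT INTO customers VALUES
    (1, 'Jan Peeters', 'Kerkstraat 1, Gent'),
    (2, 'Jan Peeters', 'Kerkstraat 1, Gent'),   -- exact double: found
    (3, 'J. Peeters',  'Kerkstraat 1, Gent');   -- spelling variant: missed
""")

# Exact-match duplicate detection: pair every record with any later record
# that has the same name and address (a.id < b.id avoids self/mirror pairs).
doubles = conn.execute("""
    SELECT a.id, b.id
    FROM customers a
    JOIN customers b
      ON a.name = b.name AND a.address = b.address AND a.id < b.id
""").fetchall()
print(doubles)
```

The query reports only the (1, 2) pair; record 3 – clearly the same person to a human reader – is invisible to exact matching, which is the gap a tool like Human Inference closes.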
Then we have developed some in-house tools for validating our data against third party data. As an energy supplier we need to stay in sync with the distribution grid operators. So we have some applications which compare snapshots which should be identical and bring out the inconsistencies.
After that the main tool is SQL. I know that there are lots of tools available for doing profiling and for validating business rules in data, but I haven’t been convinced of the added value over SQL. At the end of the day you start with a business rule, and you want to see where it is broken. Turning this into SQL can be cumbersome, but it’s a one-time cost, and SQL skills are easy to come by, so there is no barrier to entry or extra licence costs.
Dylan Jones: Can you tell us more about your Data Quality Dashboard approach? How have you constructed this and what information do you share with it?
Mark Humphries: I have to confess that the dashboard is limited to validity issues, and these are based on issues that we know block particular processes.
A good example is the billing process. Everything has to be just right in order for the billing process to work – the product has to agree with the customer group and region, all items within a product need to have valid prices for the period being billed, etc., etc.
So for billing we have assembled a number of check queries which highlight invalid input data. The end result is a series of SQL queries with the results assembled into a spreadsheet. For each of our major processes we have a range of these validity queries for both input and output data. Each query has an owner and target thresholds so that we can apply traffic lights to the results.
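The traffic-light step can be sketched as a simple threshold mapping. The thresholds and metric names below are invented; in practice each check query has its own owner and agreed targets.

```python
def traffic_light(defect_count: int, green_max: int, amber_max: int) -> str:
    """Map a check-query defect count onto a dashboard colour."""
    if defect_count <= green_max:
        return "green"
    if defect_count <= amber_max:
        return "amber"
    return "red"

# Example dashboard rows: (metric, defects found, green/amber thresholds).
rows = [
    ("contracts without a valid price", 3,   0, 10),
    ("product/region mismatches",       250, 0, 50),
]
for metric, count, green_max, amber_max in rows:
    print(metric, "->", traffic_light(count, green_max, amber_max))
```

Anything that comes out red is what goes into the monthly briefing to the Line of Business Directors.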
Every month I sit down with the teams and we go through their results in detail. Where possible we look at possible process improvements to eliminate the problems that we find.
Then I give a briefing to the Line of Business Directors on all the red traffic lights. The emphasis in the briefing is on what is going wrong, why, what the impact is and what can be done about it.
It’s a relatively simple approach, but it works very nicely. The operational teams get insight into what’s going wrong with their business and are very enthusiastic about making improvements, and the directors get an insight into the impact that poor data quality has on their business, which means that they are willing to free up resources to improve it.
As we move up the maturity curve, we will eventually reach the limits of what we can do with the concept of validity. I am hoping to go further with concepts such as correctness, timeliness and relevance. But we’re not quite ready for that yet.
Dylan Jones: Finally, by not focusing solely on the data but a much broader group of root-causes, do you find that your team has become more effective at delivering real improvements?
Mark Humphries: First of all, I don’t actually have a team (or a budget for that matter), and that’s what got me looking at things from a process perspective.
After a big migration, during which we had raised the quality of data considerably through major cleaning efforts, I found myself tasked with data quality, but without any resources. That’s when I started looking round and wondering how I could influence the organisation as it was instead of setting up a data quality team in competition with the rest of the organisation.
The BPAs belong to the Lines of Business, so I have been trying, through persuasion, to influence them and extend their work to look at data.
But, yes, I do think that considering problems holistically, rather than dividing things into process or data has been a more effective approach. Ultimately most people working at the coal face of an organisation aren’t that analytical, and don’t tend to break a problem up into the same terms as the IT department or business consultants or data quality consultants.
A problem is a problem is a problem, and they are looking for the best way to fix it. What I have seen is that most problems really do have their root causes partly in process, partly in the software application and partly in the data. The relationship between process and data is very close. Any process that you look at will be triggered by the arrival of new input data, and will then combine that with existing master data and reference data to produce new output data.
Any shortcomings in the new input data, existing master and reference data or the process itself will result in incorrect output data.
Image credit: cc Flickr bogenfreund
Mark Humphries is an experienced Information Management expert who applies both data and process thinking to solve real problems with working solutions.
Mark adds value by using Data Quality techniques to find problems that no one knew existed, and then applies Process Improvement techniques to implement sustainable solutions that fix the root causes.
Mark is convinced that the data and process communities need to work better together if they are to realise their full potential.
In a good year Mark adds €1M to the bottom line.