How to Create a Data Quality Rules Process for Data Migration, with John Morris

Data migration projects require a comprehensive data quality rules approach. John Morris (author of Practical Data Migration) shares insights on how to get it right.

Creating a Data Quality Rules Process for Data Migration

with John Morris

Author – Practical Data Migration
Founder – iergo

Data migration projects have an extremely high failure rate, and one of the principal causes is a lack of effective data quality management across all aspects of the project.

To help members reverse the trend of project failure we have enlisted the support of John Morris, author of Practical Data Migration and creator of the PDM v2 Data Migration Methodology, the only industry standard methodology for data migration currently available. John also provides great advice via his BCS Blog.

In this interview, another installment in the series, John Morris provides detailed insight into the way data quality rules are managed within his unique Practical Data Migration methodology.

Dylan Jones: Welcome to the second online coaching session with John Morris from Iergo.

The objective of this call is to follow on from the previous call, which focused on misconceptions about data quality (DQ) in data migration projects.

To summarise the misconceptions we covered in the last session: 

  • Our DQ tools can fix all our data issues (in many cases yes, but there are situations where tools alone cannot solve DQ issues)
  • We want perfect quality data in our new target system (an aspiration, but unlikely given time pressures)
  • The project team should be responsible for prioritisation (the business should own and lead this, in collaboration with the project team)
  • Some records will always fail (done correctly, following best practice, there should be no migration failures)

This week we will start to look at some best practices for the Data Quality Rules process.

Data Quality Pro: I read your blog recently, John, and noticed that you broke data down into four main gap areas:

  • Reality issues (what is in the data store fits all the criteria for that store but doesn’t actually map onto the real world)
  • Gaps between a legacy data store’s view of the world and its own data (gaps between a legacy data store’s data and its own data model)
  • Gaps between a legacy data store’s model and the collective data model for all the legacy systems
  • Gaps between the collective model for the legacy systems and the model for the target system


How does this help us in Data Quality Control?

John Morris: It helps because it illustrates, yet again, that relying solely on data quality tools in a data migration project is not a complete approach.

In every data migration there are gaps; I’ll explain the different types in turn.

Reality Gaps: These are differences between what’s in the database and the real world. Tools cannot tell you whether the data reflects reality, so you need to go back to the business for support.

So this is the first area where you will need to go back to the business to get a single version of the truth. The alternative, of course, is to assume that what is in the database is accurate.

You transfer it, and the problem only comes to light when the workarounds the business created to screen out this data are removed, exposing the business to problems those workarounds had previously fixed.

Legacy Data Store Gaps (Single Legacy): Most systems of any age that you encounter will have their own structure, and there will be gaps between that structure and the data within it.

There may have been fixes or previous migrations, so you can anticipate gaps between what is inside the database and the data model.

These kinds of gaps can be discovered using DQ tools.

Legacy Data Store Gaps (Multiple Legacy): If we put all our legacy data stores together then we also find gaps between the structures of the legacy models.

Those too can probably be uncovered using DQ tools, but you will certainly need to go back to the business: where there are different views or models of the data, business knowledge is critical to deciding how the gaps should be managed.

Legacy to Target Gaps: This is the one that most people spend their time and money working on. This is where your ETL tool comes into its own, resolving data omissions: data that was never present in the legacy system but needs to be present in the target.

Things like this can also be managed by modern DQ tools, either directly within the ETL tool or within the overall process.

However, you will still need to go back to the business on the “appropriateness” of the data, because only the business community can resolve these questions.

Finally, there are semantic issues: the project discovers uses of the data that aren’t clearly understood, so we need local business knowledge to decipher what the data really means.

Tools can detect these semantic issues but cannot always fix them. There are some data gaps that will only become apparent by including the business in the discovery process.

So you will need to have substantial business involvement in the process. That is why I designed the Data Quality Rules process to integrate significant involvement from the business community.
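The gap types John describes can be summarised in a small lookup that records, for each type, whether DQ tools can surface it and whether business knowledge is needed to resolve it. This is a minimal sketch: the `GapType` class, its field names and the true/false values are a hypothetical reading of the interview, not part of the PDM methodology.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class GapType:
    name: str
    tools_can_detect: bool  # can DQ/ETL tooling surface this gap by itself?
    needs_business: bool    # is business knowledge needed to resolve it?


# Hypothetical summary of the gap types discussed in the interview.
GAP_TYPES = [
    GapType("reality", tools_can_detect=False, needs_business=True),
    GapType("single legacy", tools_can_detect=True, needs_business=False),
    GapType("multiple legacy", tools_can_detect=True, needs_business=True),
    GapType("legacy to target", tools_can_detect=True, needs_business=True),
    GapType("semantic", tools_can_detect=True, needs_business=True),
]


def needs_business_input(gaps: List[GapType]) -> List[str]:
    """Names of the gap types that tools alone cannot close."""
    return [g.name for g in gaps if g.needs_business]


print(needs_business_input(GAP_TYPES))
# ['reality', 'multiple legacy', 'legacy to target', 'semantic']
```

With these flags, only the single-legacy gaps can be closed without going back to the business.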

Dylan Jones: One of the things we see a lot from members is the question of prioritisation. What are your thoughts on this?

John Morris: An employee at one of our clients raised an important point the other day. They are a business domain expert and posed the question:

“In the migration project, how much time are you going to ask of us, because we have our existing workload to manage?”

My response is that it is precisely because everyone is under pressure that these people need to be involved.

We need to know how to plan and budget, and that needs to be done in concert with our business colleagues.

There are two approaches to prioritisation.

One is the scorecard approach, where the high-priority jobs rise to the top. The problem with this is that it assumes an unlimited timeline and resources, which we don’t have.

The other is to bring the business people into the prioritisation process, so that we prioritise the activities that are most significant to the business.

At the end of the day, no company needs, wants or can afford perfect-quality data, so let’s make a joint decision on which data issues we won’t resolve before migration, and why, and support the business and systems until such time as they can be corrected.

Dylan Jones: Let’s explore your particular approach to the Data Quality Rules (DQR) process and what that means to the project.

John Morris: Sure. DQR is both a mechanism for managing the DQ issues and fallouts that need to be addressed before migration delivery, and a way of drawing on knowledge from the business about how to prioritise and resolve data gaps.

The way we do it is to set up a cross-matrix team that handles all incoming DQ issues, typically chaired by a data migration analyst. It should include business domain experts, technical experts and other key data stakeholders, such as regulatory representatives. The key people are the business and technical experts.

If you start analysing data before the target systems are even defined, you first look at the reality gaps, then the legacy data gaps.

So by the time we introduce the target environment, we already have a firm understanding of our DQ issues without having to wait to find out what the target structures are. It is a common misconception that we can’t do the migration design until we know the target structure. This is wrong: the big data items that drive the business rarely change substantially, and logically both environments will require similar data, so we can get a head start using DQR, reality gap analysis, legacy analysis and so on.

This is especially the case if you take a business-led approach because they will tell you what data is important to the current and future business process. 

So for example:

  • When I find a data defect during the DQR process, I create a simple form.
  • We first give it an ID number, e.g. DQR-001.
  • Then we add a simple name for use in emails, so it’s quickly recognisable.
  • Then we add a more detailed description. Here a qualitative statement is useful, e.g. “We think that our hotel data is inaccurate”.
  • Next we need a quantitative description of the impact of the issue, e.g. 10 hotels out of 100 are incorrectly labelled.
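The DQR form described above maps naturally onto a simple record. The sketch below is one way to capture it, assuming hypothetical field names (`dqr_id`, `affected`, `population`) and splitting the quantitative description into a count of affected records and the number examined, so the impact can be expressed as a rate. The issue name "Hotel labelling" is illustrative.

```python
from dataclasses import dataclass


@dataclass
class DQR:
    """Hypothetical record mirroring the DQR form fields described above."""
    dqr_id: str       # e.g. "DQR-001"
    name: str         # short name, quickly recognisable in emails
    description: str  # qualitative statement of the suspected problem
    affected: int     # quantitative impact: records found to be defective
    population: int   # ...out of this many records examined

    @property
    def defect_rate(self) -> float:
        """Fraction of examined records affected by the issue."""
        return self.affected / self.population if self.population else 0.0

    def email_subject(self) -> str:
        # ID plus short name, so the issue is easy to spot in an inbox
        return f"{self.dqr_id}: {self.name}"


hotels = DQR(
    dqr_id="DQR-001",
    name="Hotel labelling",
    description="We think that our hotel data is inaccurate",
    affected=10,
    population=100,
)
print(hotels.email_subject())       # DQR-001: Hotel labelling
print(f"{hotels.defect_rate:.0%}")  # 10%
```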

Dylan Jones: This is an important point, because having the business integrated into the process means they can determine, from the qualitative assessment, whether to pursue the issue. DQ tools throw up lots of issues, but having the business involved in the DQR process means the project team doesn’t go off fixing issues that are not critical.

John Morris: That’s right. Next we have a method statement, which may even say that we are going to ignore this issue.

The method statement sets out what we are going to do about it.

The method statement also defines exactly what metrics and activities will be required to help us satisfy the closure condition.

I prefer my DQRs to point towards being 100% correct rather than towards zero defects; it just seems better to be moving towards a reportable figure.

This means we can prioritise our DQRs so that we can report:

“We have ten open issues that are 90% complete.” That adds a lot more control, and the figures feed into our spreadsheets and reporting tools, so we can give the next level of management accurate assessments of where we are.
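Measuring each DQR as a percentage correct, counting up towards 100%, turns a roll-up report like “ten open issues that are 90% complete” into simple arithmetic. A minimal sketch, assuming each DQR tracks a hypothetical count of records that meet its closure condition against the records in scope; `DQRProgress` and `summary` are illustrative names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DQRProgress:
    dqr_id: str
    correct: int  # records currently meeting the closure condition (hypothetical measure)
    total: int    # records in scope for this DQR

    @property
    def percent_correct(self) -> float:
        # An empty scope is treated as fully correct.
        return 100.0 * self.correct / self.total if self.total else 100.0

    @property
    def is_open(self) -> bool:
        return self.percent_correct < 100.0


def summary(dqrs: List[DQRProgress]) -> str:
    """One-line status for management: open DQRs and their mean completion."""
    open_items = [d for d in dqrs if d.is_open]
    if not open_items:
        return "All DQRs closed"
    mean = sum(d.percent_correct for d in open_items) / len(open_items)
    return f"{len(open_items)} open issues, {mean:.0f}% complete on average"


dqrs = [DQRProgress(f"DQR-{i:03d}", correct=90, total=100) for i in range(1, 11)]
dqrs.append(DQRProgress("DQR-011", correct=50, total=50))  # already closed
print(summary(dqrs))  # 10 open issues, 90% complete on average
```

The mean here is unweighted; weighting each DQR by its record count would be an equally reasonable choice if larger issues should dominate the figure.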

Prioritisation is really important and we need key data stakeholders to be involved on the DQ board.

The board is chaired by the DQ analysts, supported by the business domain experts and your technical experts, with everyone getting the same vote.

I like to hold my DQR meetings once a week, and with phone technology and teleconferencing that’s not an issue.

The list of DQRs is circulated, and collectively we prioritise and suggest solutions to our DQR issues. We’re not trying to solve them there; this is just a control board.

There are typically a number of possible outcomes:

  1. We ignore the issue and migrate “as-is”
  2. We “fix it in the migration tool”, which is risky because you are fixing it at the very end, so there is limited fallback if you hit problems
  3. We fix it in the source, either using some technology or by going back to the business. Quite often this has to involve business input, which comes at a cost, but if you’ve got the right people in the DQR meeting and they can see the benefits, it’s likely to be prioritised correctly
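The three board outcomes can be logged per DQR so that risky choices stay visible. In this sketch the `Resolution` enum and `board_report` function are hypothetical names; the only rule encoded is the one stated above, that fixing in the migration tool happens at the very end and leaves limited fallback.

```python
from collections import Counter
from enum import Enum
from typing import Dict, List, Tuple


class Resolution(Enum):
    MIGRATE_AS_IS = "ignore the issue and migrate as-is"
    FIX_IN_MIGRATION_TOOL = "fix it in the migration tool"
    FIX_IN_SOURCE = "fix it in the source"


# Fixes applied in the migration tool happen at the very end of the project,
# so there is limited fallback if they go wrong; flag them for attention.
LATE_FIXES = {Resolution.FIX_IN_MIGRATION_TOOL}


def board_report(decisions: Dict[str, Resolution]) -> Tuple[Counter, List[str]]:
    """Count board decisions by outcome and list the DQRs being fixed late."""
    counts = Counter(decisions.values())
    late = sorted(dqr for dqr, r in decisions.items() if r in LATE_FIXES)
    return counts, late


decisions = {
    "DQR-001": Resolution.FIX_IN_SOURCE,
    "DQR-002": Resolution.MIGRATE_AS_IS,
    "DQR-003": Resolution.FIX_IN_MIGRATION_TOOL,
}
counts, late = board_report(decisions)
print(counts[Resolution.FIX_IN_SOURCE], late)  # 1 ['DQR-003']
```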

I find that when you have the right mix of people, with accurate cost and benefit analysis defined by the business, you won’t be fixing problems that are not critical to the business.

Dylan Jones: Let’s have a question from our listeners, John from London asks:

“If a business expert or customer technical expert is reluctant to share their knowledge and get involved with the migration project, what techniques can we use to get them involved?”

John Morris: In my experience it is rare, once you get people involved, that they don’t want to see high quality data in the target system.

Before you do any work, you need to find out who the data owners are. I first get them to go through a structured questionnaire, which is basically a “System Retirement Policy”.

This is a formal statement from the data owner identifying all the conditions that must be met for their system to be decommissioned.

If you do that at the beginning of the project, it makes the change real for people. If I can say to a data owner that once I meet all these conditions we can shut down their system, you find that engagement is a lot easier.
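A System Retirement Policy is essentially a checklist owned by the data owner: the system can be decommissioned once every condition is met. A minimal sketch, with a hypothetical `RetirementPolicy` class and illustrative system, owner and conditions that are not taken from any real policy:

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RetirementPolicy:
    system: str
    owner: str
    conditions: Dict[str, bool] = field(default_factory=dict)  # condition -> met?

    def outstanding(self) -> List[str]:
        """Conditions the data owner still requires before shutdown."""
        return [c for c, met in self.conditions.items() if not met]

    def can_decommission(self) -> bool:
        # An empty policy should not authorise shutdown, so require at least one condition.
        return bool(self.conditions) and not self.outstanding()


# Hypothetical system, owner and conditions, for illustration only.
policy = RetirementPolicy(
    system="legacy billing",
    owner="finance data owner",
    conditions={
        "All DQRs raised against this system closed": True,
        "Historic data archived and accessible": False,
    },
)
print(policy.outstanding())       # ['Historic data archived and accessible']
print(policy.can_decommission())  # False
```

Keeping the conditions in the owner’s own words means the outstanding list can be taken straight back to that owner.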

There are other tricks you can use. Because all your DQRs are aiming towards 100%, you can do neat things like holding races, where different teams have cars competing against each other, representing the number of DQRs completed.

In all the hundreds of DQR sessions I’ve been involved in, I can think of very few occasions where anyone was deliberately obstructing the project, and in all that time I’ve never had to escalate to sponsors.

The lesson is that once you make it real to them, they will commit.

The way I explain it to data owners and business stakeholders is that it is their data and their responsibility. I know I can get support from the technical stakeholders, so we’ll be able to physically move the data; it just might not be any good for the business.

Dylan Jones: Another question from John in London:

“Do you feel change management specialists have a role to play in data migration projects?”

John Morris: Absolutely, I always feel more comfortable having change managers on the project. Data Migration is all about change and if you can work closely with the change management team you will definitely get better results.

Another thing I do is identify the official and unofficial power structures in the project and engage with them.

I also focus on delivering small, tightly controlled mini-projects that can be more easily tracked and reported on.

So yes, I endorse that point of view. A healthy working relationship with the change management group improves your likelihood of success.

Dylan Jones: A couple more questions from U.S. member Jenny:

“When should the Data Quality Rules process kick off?”

John Morris: Ideally, as in the best-run projects I’ve been involved in, the data migration part of the overall program should start at the same time as the main implementation project, and the DQ work should begin right from the outset.

Let’s say we’re doing an SAP implementation: we should start doing data migration activities immediately.

So we would kick off with things like system retirement policies and building up the network of data stakeholders. Don’t underestimate how hard this is. In a big organisation with many projects and time pressures, it isn’t going to be easy to find everyone, so this will take some elapsed time.

So start the DQ work as soon as possible. The reality checks can be carried out before you even discover what the target system looks like.

As Dylan has pointed out, you won’t be short of data issues, so the sooner you start, the better the team and relationships you will build. Then, when a crisis hits, time is tight and you start hitting data issues, you already have that virtual team ready to kick in.

Dylan Jones: A final question from Greg:

“How frequently should you perform the DQR meetings?”

John Morris: I like to hold DQR meetings once a week.

It goes into people’s diaries, and it is easy for people to remember and get used to the routine. Initially the sessions are long, but once people get the hang of it, the sessions get shorter, even on a very large project.

Dylan Jones: Thanks, John. For those who couldn’t get their questions in on the call, please use the “Ask the expert” feature to keep posting your questions to us. I hope you found the session of value.

John Morris, Author of Practical Data Migration

John Morris

Johny Morris has 25 years’ experience in Information Technology, having spent the last 12 years consulting exclusively on data migration assignments for the likes of the BBC, British Telecommunications plc and Barclays Bank.

His book “Practical Data Migration” sets out PDM, widely regarded as the leading data migration methodology. Johny co-founded the specialist data migration consultancy Iergo Ltd and is on the expert panel of Data Migration Pro.

Publications by John Morris

Practical Data Migration

By Johny Morris