Identifying Duplicate Customer Records: An MDM Case Study With Dalton Cervo of Sun Microsystems

Dalton Cervo, MDM expert and author

In any MDM initiative, duplicate records present a major challenge, but when an organisation attempts to consolidate 800+ systems into a single customer hub, the challenge is all the greater.

We recently caught up with Dalton Cervo, Customer Data Quality Lead at Sun, to learn how his organisation is tackling an MDM initiative of this scale.

Tip: Dalton has since published an excellent MDM book with fellow author Mark Allen titled Master Data Management in Practice – Achieving True Customer MDM (see below).

By Mark Allen, Dalton Cervo

Data Quality Pro: Dalton, can you tell our readers a little about your background?

Dalton Cervo: I’m the Customer Data Quality Lead at Sun Microsystems. I’m part of the Customer Data Steward team and a member of the Customer Data Governance team, which is responsible for defining policies and procedures governing the oversight of master customer data.

I am in a business area now, but I have worked in IT at Sun and in the aerospace industry for over 15 years.

I believe this combination of technical and business experience gives me an edge and helps me bridge both sides to achieve the best results.

Data Quality Pro: Could you briefly explain the background of the project?

Dalton Cervo: At Sun Microsystems, we have been through a massive Master Data Management (MDM) project, consolidating customer data from over 800 disparate legacy applications into a Customer Data Hub (CDH).

The ultimate goal is to have a single source of truth that enables a 360-degree view of the customer.

Data Quality Pro: What has been the toughest part of that initiative, the logistics of getting that many owners onboard or the sheer scale of the technical challenge?

Dalton Cervo: I can say that we were lucky at Sun because this Master Data Management (MDM) effort was mandated by our top executives including the CEO. Needless to say, that was crucial in mobilizing the entire organization, and getting everybody’s attention and commitment.

However, as you can imagine, we went through a phase of resistance from multiple groups. Fear of change is always a constant, especially in a project of this magnitude, which challenges just about every process in the organization. Managing change was probably the toughest part, and Sun had to get representation from every single group to help disseminate and evangelize the new solution.

Data Quality Pro: You point to duplicate data as one of the major technical challenges. Were there any other technical headaches in this project?

Dalton Cervo: Yes, we had several.

Data migration was a huge challenge: we had to adapt the idiosyncrasies of all the legacy systems to a common structure. We had to perform a large data cleansing effort to prevent data from falling out and not being properly converted.
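The conversion step described above, adapting each legacy system’s idiosyncrasies to one common structure and catching records that would fall out, can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not Sun’s tooling: the `HubCustomer` structure, the `FIELD_MAPS` entries, the system names and the mandatory-field checks are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical common (hub) structure; the interview does not describe the real CDH schema.
@dataclass
class HubCustomer:
    source_system: str
    source_id: str
    name: str
    country: str

# Hypothetical field mappings that absorb each legacy system's naming idiosyncrasies.
FIELD_MAPS = {
    "LEGACY_A": {"id": "CUST_NO", "name": "CUST_NAME", "country": "CTRY"},
    "LEGACY_B": {"id": "acct_id", "name": "org_name", "country": "country_code"},
}

def convert(system, raw):
    """Map one legacy record onto the hub structure; return None if it would fall out."""
    fmap = FIELD_MAPS[system]
    source_id = raw.get(fmap["id"])
    name = str(raw.get(fmap["name"]) or "").strip()
    country = str(raw.get(fmap["country"]) or "").strip().upper()
    # Mandatory values must be present and well formed (assumed rules); cleansing
    # them at the source beforehand keeps records from falling out at this step.
    if source_id is None or not name or len(country) != 2:
        return None
    return HubCustomer(system, str(source_id), name, country)

def migrate(system, records):
    """Convert a batch, separating converted records from fallout that needs cleansing."""
    converted, fallout = [], []
    for raw in records:
        record = convert(system, raw)
        if record is None:
            fallout.append(raw)
        else:
            converted.append(record)
    return converted, fallout

converted, fallout = migrate("LEGACY_A", [
    {"CUST_NO": 101, "CUST_NAME": " Acme Corp ", "CTRY": "us"},
    {"CUST_NO": 102, "CUST_NAME": "", "CTRY": "DE"},
])
# converted holds one HubCustomer (name "Acme Corp", country "US"); fallout holds record 102
```

Keeping fallout as a separate list, rather than silently dropping records, reflects the point made in the interview: records that fail conversion are a cleansing work queue, not losses.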

Since this was a phased approach, we had to create interfaces between the new system and the remaining legacy systems until the transition was complete.

We had to change existing or create new standards, policies, and procedures. Business teams simply couldn’t operate the same way anymore, and had to adapt to new nomenclature and controls.

Data Quality Pro: What are the main phases you have adopted to resolve the duplicate customer problem?

Dalton Cervo: We have adopted four main steps in this effort, which are:

  1. Identify potential duplicates
  2. Collect detail data for scoring
  3. Review results and get approvals
  4. Consider disposition and execute actions

Methodology for Identifying Duplicate Customers (by Dalton Cervo)

Data Quality Pro: What are the main activities in the “Identify potential duplicates” phase?

Dalton Cervo: The primary goal of this activity is to find records in our system that represent a given legal organization and its subsidiaries.

We start with a list of our top customers and partners. Using multiple sources (D&B, Wikipedia, OneSource, etc.), we research each company’s structure to identify all the name variations we need to search for in our system.

Records with similar names are then grouped together as potential duplicates.
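As a rough illustration of this search-and-group step, the sketch below normalizes names, drops legal-form suffixes, and collects hub records whose names are similar to any researched variation. It assumes Python’s standard `difflib.SequenceMatcher` as the similarity measure and an arbitrary 0.85 threshold; the interview does not say which matching technique or tool Sun used, and the suffix list and records are invented.

```python
import re
from difflib import SequenceMatcher

# Hypothetical legal-form tokens ignored when comparing names.
LEGAL_SUFFIXES = {"inc", "corp", "corporation", "ltd", "llc", "gmbh", "plc"}

def normalize(name):
    """Lowercase, replace punctuation with spaces, and drop legal-form tokens."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in LEGAL_SUFFIXES)

def potential_duplicates(records, variations, threshold=0.85):
    """Return ids of records whose normalized name is similar to any researched variation.

    records: iterable of (record_id, name) pairs from the hub.
    variations: name variations researched for one organization.
    """
    targets = [normalize(v) for v in variations]
    group = []
    for record_id, name in records:
        candidate = normalize(name)
        if any(SequenceMatcher(None, candidate, t).ratio() >= threshold for t in targets):
            group.append(record_id)
    return group

# Hypothetical hub records and researched variations for one organization.
records = [
    ("C1", "Acme Corp."),
    ("C2", "ACME Corporation"),
    ("C3", "Acme Widgets Ltd"),
    ("C4", "Globex Inc"),
    ("C5", "Acme Widget Ltd"),
]
group = potential_duplicates(records, ["Acme", "Acme Widgets"])
# group == ["C1", "C2", "C3", "C5"]
```

The fuzzy comparison catches near-misses such as "Acme Widget Ltd", but the result is only a candidate group; as the interview stresses, analysts still decide what is actually a duplicate.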

Data Quality Pro: You make use of Dun & Bradstreet (D&B) data in this first phase. How important was this in validating your data?

Dalton Cervo: D&B data helped us identify the company structure in a way that we could search our system for all records related to a given organization.

Data Quality Pro: What problems did mergers and acquisitions affecting customer accounts cause you?

Dalton Cervo: That is a huge problem. It is a constantly changing environment, and simply looking at a snapshot of an organization today is not sufficient. We need to look at how an organization has evolved so we don’t miss any records. We could very likely have obligations with an organization that no longer exists or, more precisely, that now exists under a different identity.
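One way to keep evolving organizations in view is to record each acquisition, merger or rename as a link to the successor organization, then search for every historical name that resolves to the current entity. This is a minimal sketch assuming a hypothetical `SUCCESSORS` table with made-up company names; it is not a description of how Sun stored this history.

```python
# Hypothetical M&A history: each key is an organization that was acquired,
# merged, or renamed; its value is the organization it now exists under.
SUCCESSORS = {
    "Beta Systems": "Gamma Holdings",
    "Gamma Holdings": "Delta Group",
}

def current_identity(org, successors):
    """Follow the M&A chain from a historical name to the organization's current identity."""
    seen = {org}
    while org in successors:
        org = successors[org]
        if org in seen:
            raise ValueError(f"cycle in M&A history at {org!r}")
        seen.add(org)
    return org

def names_to_search(target, successors):
    """Every past or present name that resolves to the target organization."""
    names = {target}
    for former in successors:
        if current_identity(former, successors) == target:
            names.add(former)
    return names

# current_identity("Beta Systems", SUCCESSORS) == "Delta Group"
```

Searching on the full set of historical names, rather than today’s name only, is what avoids missing records tied to obligations with organizations that have since been absorbed.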

Data Quality Pro: Given the number of mergers currently taking place, among other factors, does this mean that this first phase is continual?

Dalton Cervo: Yes, certainly. This is definitely not a one-time effort, and we have established sustaining plans and procedures for carrying this forward on a regular basis.

Data Quality Pro: Talk us through the second phase – “Collect Detail Data for Scoring”

Dalton Cervo: The objective in this phase is twofold: to collect information about relevant attributes associated with a given organization, and to perform the actual scoring. The scoring is based on the collected data: the higher the score, the more complete the record.

Generally, high-scoring records are more likely to survive, while low-scoring records will be merged into the survivors. It is not a perfect science, though, so the score is used only as a first decision point: if scores are similar, the analysts will look at the detail data, and potentially other factors, to make a final decision.
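The score-then-decide logic can be sketched as a weighted completeness score plus a survivorship proposal that defers to an analyst when the top scores are close. The attribute names, weights and review margin below are hypothetical placeholders; the interview does not disclose Sun’s actual scoring rules.

```python
# Hypothetical attributes and weights; Sun's actual scoring rules are not described.
WEIGHTS = {"duns_number": 3, "address": 2, "open_orders": 2, "install_base": 2, "contact": 1}

def completeness_score(record):
    """Sum the weights of the relevant attributes that are populated on the record."""
    return sum(weight for attr, weight in WEIGHTS.items() if record.get(attr))

def propose_survivor(group, margin=2):
    """Rank a potential-duplicate group by completeness score.

    Returns (survivor_id, ids_to_merge, needs_review). needs_review is True when the
    top two scores are within `margin`, i.e. an analyst should make the final call.
    """
    ranked = sorted(group, key=completeness_score, reverse=True)
    survivor = ranked[0]
    needs_review = (
        len(ranked) > 1
        and completeness_score(survivor) - completeness_score(ranked[1]) < margin
    )
    return survivor["id"], [r["id"] for r in ranked[1:]], needs_review
```

For example, a record with a DUNS number and an address (score 5) clearly outranks one with only a contact (score 1), while two records scoring 2 each would be flagged for analyst review.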

Data Quality Pro: In this second phase, who was responsible for building these rules? What skills and experience did they need?

Dalton Cervo: The rules were created by a Senior Business Analyst who is extremely familiar with the data and the business processes at Sun.

They were then reviewed and approved by Customer Data Governance (CDG), the global Customer Data Maintenance (CDM) team, and me.

You very much need somebody in that capacity to be able to define solid rules. In our case, we were lucky to have a single person with that much knowledge.

But I can see organizations having to get multiple people to achieve the same results.

Data Quality Pro: Once again, mergers and acquisitions can be challenging when aggregating detail records. Was this a problem for identifying the correct detail attributes to include for a de-duplicated customer?

Dalton Cervo: The detail data builds on the groups from the previous phase, so this phase is only as good as the first one in terms of aggregating the data. If we did a good job previously, the aggregation will be solid; if our grouping in the first phase was poor, we will see the impact in this phase as well.

Data Quality Pro: What are the main activities in the “Review Results and Get Approvals” phase?

Dalton Cervo: In this phase, we present the business with our recommendations for which record(s) we should keep and which one(s) we should eliminate. This normally involves getting consensus from the Sales, Finance, Marketing, and Support teams. The ultimate goal of minimizing duplicates is to lower operational costs and maximize analytical capabilities, so multiple groups have a vested interest in this effort.
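A minimal sketch of the consensus rule, assuming each recommendation must be approved by all four groups mentioned and that a single rejection sends it back for rework. The `REQUIRED_APPROVERS` set and the status values are illustrative assumptions, not a description of Sun’s workflow.

```python
# Hypothetical set of groups whose consensus is required before any merge proceeds.
REQUIRED_APPROVERS = frozenset({"Sales", "Finance", "Marketing", "Support"})

def review_status(decisions):
    """Summarize a keep/eliminate recommendation's review.

    decisions maps group name -> True (approved) or False (rejected); groups that
    have not responded yet are absent. Any rejection sends the recommendation back.
    Returns (status, groups_still_pending).
    """
    if any(decision is False for decision in decisions.values()):
        return "rejected", []
    pending = sorted(REQUIRED_APPROVERS - set(decisions))
    return ("approved" if not pending else "pending"), pending
```

Returning the pending groups alongside the status makes it easy to see whose sign-off is still outstanding before anything moves to execution.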

Data Quality Pro: You mentioned a “Customer Data Governance” group, which is very progressive. In what other ways does this group help the MDM initiative?

Dalton Cervo: Customer Data Governance (CDG) has been key to the entire MDM program. Bringing multiple systems together has a direct and positive benefit for operational activities as well as for analytics and business strategies. However, it raises questions about control. Previously, each group defined its own policies and procedures, since there were relatively few dependencies. With everybody looking at a single source, any minor change has a big impact. CDG is critical in defining, monitoring, and enforcing controls for the good of the entire organization.

Data Quality Pro: This process obviously takes time and has a number of manual elements that support the technology framework. Do you ever find that different departments (e.g., Finance, Support, Sales, Marketing) simply cannot resolve legal entity issues because their business rules or business models differ?

Dalton Cervo: Yes, I believe you are absolutely right.

We have found that different groups need different levels of granularity when it comes to legal entity.

Some need visibility into a particular department within an organization, while others only care about the ultimate parent organization. We have reduced this problem by adding customer hierarchy to our data.
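Adding a hierarchy lets each group work at the level of granularity it needs while figures still roll up consistently to the ultimate parent. This sketch assumes a simple child-to-parent map with hypothetical organization names and amounts; the interview does not describe how Sun’s hierarchy is actually stored.

```python
# Hypothetical hierarchy: each node maps to its immediate parent (None at the top).
PARENT = {
    "Acme EMEA Sales Dept": "Acme UK Ltd",
    "Acme UK Ltd": "Acme Corp",
    "Acme Corp": None,
}

def ultimate_parent(node, parent):
    """Walk up the hierarchy to the top-level organization."""
    seen = {node}
    while parent.get(node) is not None:
        node = parent[node]
        if node in seen:
            raise ValueError(f"cycle in hierarchy at {node!r}")
        seen.add(node)
    return node

def roll_up(amounts, parent):
    """Aggregate amounts recorded at any level of the hierarchy up to each ultimate parent."""
    totals = {}
    for node, amount in amounts.items():
        top = ultimate_parent(node, parent)
        totals[top] = totals.get(top, 0) + amount
    return totals

# One group can keep working at the department node while another looks only at the top:
# roll_up({"Acme EMEA Sales Dept": 100, "Acme UK Ltd": 50}, PARENT) == {"Acme Corp": 150}
```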

Additionally, even though we are now mostly operating from a single source, we still have some “bad habits” from before.

It will take time until we have a true single view of the customer, although we have made a tremendous improvement in the last couple of years.

The first and probably hardest step has been taken; it is now a matter of time to bring all these models together and, hopefully, achieve the elusive “legal entity.”

Data Quality Pro: How has creating one source of truth benefitted these departments so far?

Dalton Cervo: MDM or a single source of truth has two major objectives:

  1. Improve operational performance
  2. Achieve strategic objectives through better analytics

Getting there is very painful, and it will probably get worse before it gets better. But I believe multiple departments have already realized benefits.

From a business operational perspective, we had very complex business processes customized to particular regions.

We have achieved a much more uniform practice now, with increased efficiency and reduced costs. IT costs are much lower since they don’t have to support so many disparate systems.

Regulatory compliance is easier to achieve. Finally, we have definitely seen a huge improvement in data analytics.

We have been able to establish a company-wide customer hierarchy that gives us more accurate and complete data to better understand our customers.

Data Quality Pro: Please talk our readers through the “Consider Disposition and Execute Actions” phase

Dalton Cervo: Once we get the proper approvals from the previous phase regarding which records to keep and which ones to merge, we can move forward with execution.

However, since our MDM project is not entirely finished, we still have certain constraints due to existing interfaces to legacy systems that have not yet been EOL’d (reached end of life), as well as to some spoke systems that will continue to exist.

Therefore, in this phase, we take those constraints into consideration and determine which actions can be taken.

What can be properly executed at this time is assigned to the maintenance teams, and the rest is queued for later.
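The triage described here can be sketched as splitting approved merges by whether the record being merged away is still referenced by a constrained system. The `MergeAction` type, the system names and the `constrained_systems` set are assumptions made for illustration only.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MergeAction:
    survivor_id: str
    merged_id: str
    # Hypothetical: systems (legacy or spoke) whose interfaces still reference the record being merged away.
    referenced_by: frozenset

def triage(approved, constrained_systems):
    """Split approved merges into work maintenance teams can execute now and a queue for later."""
    execute_now, queued = [], []
    for action in approved:
        if action.referenced_by & constrained_systems:
            queued.append(action)
        else:
            execute_now.append(action)
    return execute_now, queued

approved = [
    MergeAction("HUB-1", "HUB-2", frozenset({"LEGACY_ORDERS"})),
    MergeAction("HUB-1", "HUB-3", frozenset()),
]
execute_now, queued = triage(approved, {"LEGACY_ORDERS"})
# execute_now holds the HUB-3 merge; the HUB-2 merge is queued
```

Queued actions would simply be re-run through the same check as constraints are lifted, so nothing approved is lost while the legacy interfaces remain in place.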

Data Quality Pro: It’s interesting that, although you have the necessary technology, you cannot deduplicate the data automatically because of potential impacts on legacy systems and data feeds. Is the plan therefore to decommission the 800+ systems completely?

Dalton Cervo: Yes, the plan is to do that.

So far, we have completed 3 out of 5 phases of our MDM initiative. Once phase 5 is completed, the dependency on legacy systems will end. But we simply can’t wait until then, so we have to start our data deduplication now and be as proactive as possible.

Furthermore, in addition to these 800+ systems, we have certain spoke systems that will continue to exist. CRM, for example, is one of them. Certain activities, including data deduplication, will have to be a coordinated effort between our single source of truth and the spoke systems when needed, even after the entire MDM project is completed.

Data Quality Pro: What kind of process and skillset was required to trace all these dependent feeds and spoke systems to ensure your changes did not harm their integrity?

Dalton Cervo: Again, we needed a Senior Business Analyst who is very knowledgeable about our business processes, and we also needed IT engineers familiar with the interfaces.

All records in this master system have a mapping to their equivalent records in legacy and/or spoke systems. Our subject matter experts created a set of rules defining when it is acceptable to merge records from one source into another, based on when the legacy systems will be EOL’d.

Based on that definition, we know if we can merge records now or later.
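Combining the cross-reference with an EOL schedule gives a simple way to decide between merging “now” and “later”. The rule in this sketch (a record can be merged away once every legacy system referencing it has reached its EOL date, while permanent spokes such as CRM require a coordinated merge rather than blocking it) is an assumed interpretation of the rules described above, and the `XREF` and `EOL` tables, including their dates, are hypothetical.

```python
from datetime import date

# Hypothetical cross-reference: hub record id -> (system, local id) pairs it maps to.
XREF = {
    "HUB-2": [("LEGACY_ORDERS", "88"), ("CRM", "c-17")],
    "HUB-3": [("LEGACY_BILLING", "B-5")],
}
# Hypothetical EOL schedule; None marks a spoke system that will continue to exist.
EOL = {
    "LEGACY_ORDERS": date(2009, 6, 30),
    "LEGACY_BILLING": date(2010, 3, 31),
    "CRM": None,
}

def merge_window(record_id, xref, eol):
    """Return (earliest hub merge date, spokes needing a coordinated merge).

    Assumed rule: the record may be merged away once every legacy system that
    references it has reached its EOL date. Spokes do not block, but the merge
    must be coordinated with them. An unknown system raises KeyError instead of
    being silently treated as safe.
    """
    systems = [system for system, _ in xref.get(record_id, [])]
    legacy_dates = [eol[s] for s in systems if eol[s] is not None]
    spokes = sorted(s for s in systems if eol[s] is None)
    earliest = max(legacy_dates) if legacy_dates else date.min
    return earliest, spokes

def can_merge_now(record_id, xref, eol, today):
    """True if, under the assumed rule, the merge can be executed on `today`."""
    earliest, _ = merge_window(record_id, xref, eol)
    return today >= earliest
```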

Data Quality Pro: What lessons have you learned from this MDM initiative?

Dalton Cervo: I learned that communication is key. As much as Sun invested in Change Management and Training, we still had groups not fully prepared for the transition.

We could also have done a better job of cleansing the data at the source before converting it to the new system, but we had very aggressive schedules, which made that very difficult.

As I said before, chances are it will get worse before it gets better.

We are not quite finished yet, and we probably would have done a few things differently, but it was well worth the effort.

We can look forward to:

  • improved risk management
  • increased operational efficiencies and reduced costs
  • better decision making, spend analysis and planning
  • increased information quality and business productivity
  • improved regulatory compliance
  • consistent reporting leading to better customer knowledge and customer service

Summary of Key Points:

  • MDM was mandated by CEO and executive team, crucial for obtaining widespread commitment
  • Change management must start at the outset and encompass all areas touched by the initiative
  • Data migration is a major challenge when linking disparate legacy systems to a common structure
  • Sun’s MDM challenge was to decommission legacy systems and create an entirely new master system, all whilst ensuring service continuity
  • Don’t underestimate the change to existing business processes; new policies and procedures are necessary
  • External reference data is vital for ensuring accuracy and serves as a surrogate source of information
  • Remedial work such as data deduplication is a continual activity, so build in ongoing processes
  • Even with the help of advanced data quality technology, data matching still relies on skilled manual intervention
  • Eliminating duplicates is as much a business activity as a technical one; at Sun this required cross-organisation effort and approval
  • Data governance is absolutely pivotal to MDM success: as the number of dependencies increases, so does the need for enterprise-wide policy enforcement
  • A single customer view is not straightforward: different business units require different representations, so don’t expect overnight acceptance
  • Don’t underestimate the impact of systems that interface with the legacy stack, and consequently with the new system; they place major challenges and constraints on the project
  • Effective communication is key: all parties must be fully prepared and covered by change management
  • Although the initiative is only partially complete, MDM has directly helped Sun improve operational performance, reduce costs, improve decision making and spend analysis, increase information quality, raise productivity, improve regulatory compliance and deliver improved customer knowledge and customer service

Dalton Cervo

Dalton has over 20 years of experience in software development, project management, and data management, including the architecture design and implementation of an analytical MDM and the management of a data quality program for an enterprise MDM implementation. Dalton is a Senior Solutions Consultant at SAS DataFlux, helping organizations with data quality, data integration, and MDM.

Prior to DataFlux, Dalton served as the Data Quality Lead for the customer data domain throughout the planning and implementation of Sun Microsystems’ enterprise customer data hub.

Dalton has extensive hands-on experience designing and implementing data integration, data quality, and hierarchy management solutions to migrate disparate information; perform data cleansing, standardization, enrichment, and consolidation; and organize customer data hierarchically.

Dalton contributed a chapter on MDM to Phil Simon’s book, The Next Wave of Technologies – Opportunity in Chaos.

Dalton is a member of the Data Quality Pro expert panel, has served on customer advisory boards, and is an active contributor to the MDM community through conferences and social media (blog, Twitter). Dalton has BSCS and MBA degrees and is PM certified.