External Reference Data - An Overview
In this post, Henrik Sørensen provides an overview of the different types of external reference data, discusses how they can benefit data quality and how cloud computing may accelerate their take-up.
I have a GPS navigation system in my car. I bought it for 100 EUR. It knows the cities, streets and house numbers of the whole of Europe. That is more knowledge than is attached to the party data in most solutions around.
My guess is that the use of external reference data as an important element in achieving optimal data quality will increase heavily in the coming years, and that cloud computing will be a main driver.
I do know that you probably won’t find any source that holds the truth, the whole truth and nothing but the truth. There are plenty of stories about car GPS units sending drivers the wrong way. But I couldn’t get around without mine, and it keeps getting better.
In the following I will focus on external reference data supporting party data – customers, prospects, suppliers, members and other roles.
Types of reference data
Some reference data relates to an entity as a whole, while other reference data relates to one or more of the attributes describing an entity.
Business Directories
These directories hold information about organisations (private businesses, governmental bodies and so on) and often also about specific employees in these organisations. Typically, mandatory registration with the authorities in each country is the basis, or in fact the very source, of these directories.
Some specialised directories are more person-related than organisation-related, such as those listing healthcare professionals.
Many business directories are country-specific, but some are consolidated across many countries. I have seen the D&B worldbase and the Eurocontactpool in the latter category.
Actuality (how up to date the data is), completeness, depth, coverage, standardisation, history tracking and other metrics vary between directories and between countries.
These directories may support the data quality related to your B2B activities.
Consumer/citizen directories
These directories may support the data quality related to your B2C activities.
Availability, pricing and rules differ a lot between countries, so setting up an integration is a separate project for each country where you operate. A governmental body or a regional authority has some options; a financial organisation may have options not available to other private organisations; and so on, all depending on local rules.
Phone book white pages are also a kind of consumer directory, though with data quality issues of their own.
Location directories
These directories cover physical addresses, typically organised in hierarchies with:
- Countries
- States if applicable
- Districts, typically identified by postal codes, which often have a hierarchy built in
- Thoroughfares – streets
- Entrances – house numbers
- Units – apartments
Here, too, availability, coverage and depth vary between countries and between areas within countries.
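To make the hierarchy concrete, here is a minimal sketch of how a location directory might be held as a tree and used to check an address level by level. The class, the function names and the tiny sample address are hypothetical illustrations, not the schema of any real directory product.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical level names mirroring the hierarchy above.
# Levels an address does not use (e.g. "state") are simply skipped.
LEVELS = ["country", "state", "district", "thoroughfare", "entrance", "unit"]


@dataclass
class LocationNode:
    level: str
    name: str
    children: dict = field(default_factory=dict)  # casefolded name -> LocationNode

    def child(self, name: str, level: str) -> "LocationNode":
        """Return the named child, creating it if it does not exist yet."""
        key = name.casefold()
        if key not in self.children:
            self.children[key] = LocationNode(level, name)
        return self.children[key]


def build_directory(addresses: list) -> LocationNode:
    """Load flat address records (dicts keyed by LEVELS) into a tree."""
    root = LocationNode("root", "")
    for address in addresses:
        node = root
        for level in LEVELS:
            if address.get(level) is not None:
                node = node.child(address[level], level)
    return root


def lookup(root: LocationNode, address: dict) -> tuple[str, Optional[str]]:
    """Walk down the tree; return (deepest matched level, first failing level).

    The failing level is None when the whole address was found. This sketch
    assumes the input uses the same levels as the directory for that country.
    """
    node, matched = root, "root"
    for level in LEVELS:
        value = address.get(level)
        if value is None:
            continue
        next_node = node.children.get(value.casefold())
        if next_node is None:
            return matched, level
        node, matched = next_node, level
    return matched, None


directory = build_directory([
    {"country": "DK", "district": "2100", "thoroughfare": "Main Street",
     "entrance": "10", "unit": "2 tv"},
])
print(lookup(directory, {"country": "DK", "district": "2100",
                         "thoroughfare": "Main Street", "entrance": "12"}))
# → ('thoroughfare', 'entrance')
```

Reporting which level failed is the useful part: a valid street with an unknown house number calls for a different follow-up than an unknown postal code.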
Synonyms and other word lists
Lists of first names by country, giving frequency, probability of gender, related nicknames and common abbreviations, are widely used, as are other word lists relating to names and addresses.
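One simple way to use such a list in matching is to map nicknames and abbreviations to a canonical first name before comparing records. The small table below is a hypothetical extract made up for illustration; a real list would come from a reference source for each country.

```python
# Hypothetical extract of a nickname/abbreviation list for one country,
# mapping each variant to a canonical first name.
NICKNAMES = {
    "bob": "robert",
    "rob": "robert",
    "bill": "william",
    "will": "william",
    "wm": "william",
    "liz": "elizabeth",
    "beth": "elizabeth",
}


def canonical_first_name(name: str) -> str:
    """Normalise case, surrounding spaces and a trailing abbreviation dot,
    then map known variants to their canonical form."""
    key = name.strip().rstrip(".").casefold()
    return NICKNAMES.get(key, key)


def same_first_name(a: str, b: str) -> bool:
    """True if both names resolve to the same canonical first name."""
    return canonical_first_name(a) == canonical_first_name(b)


print(same_first_name("Wm.", "William"))  # → True
print(same_first_name("Bill", "Robert"))  # → False
```

In practice a nickname hit is better treated as raising the match likelihood than as a hard match, since some variants (such as "Will") are also given names in their own right.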
Data quality capabilities
External reference data may support data quality in several ways.
Deduplication
External reference data is a very valuable source for avoiding both false positive and false negative matches. Some examples:
- Two consumer records with exactly the same data, but where the address is far from unique, may be marked as a dubious match
- Two business records with totally different names, but recognised as the official name and the trade style of the same legal entity, may be marked as a confident match
Using unique keys from reference sources, such as organisation IDs and citizen IDs, helps a lot where possible. But be sure you know precisely what each key is unique for.
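The two examples can be sketched as small decision rules. This is a minimal illustration under stated assumptions: the record type, the directory extracts and the "more than one unit means not unique" rule are all hypothetical, and anything the rules cannot decide is left to ordinary fuzzy matching.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PartyRecord:
    name: str
    address: str
    org_id: Optional[str] = None  # unique key from a business directory, if known


# Hypothetical business directory extract: legal entity ID -> known names
# (the official name plus any trade styles), stored casefolded.
BUSINESS_NAMES = {"ORG-001": {"acme holding a/s", "acme shop"}}

# Hypothetical location directory extract: units registered at each address.
UNITS_AT_ADDRESS = {"1 main street": 1, "5 tower road": 120}


def resolve_org_id(rec: PartyRecord) -> Optional[str]:
    """Use the record's own key if present, else look the name up."""
    if rec.org_id:
        return rec.org_id
    name = rec.name.casefold()
    for org_id, names in BUSINESS_NAMES.items():
        if name in names:
            return org_id
    return None


def classify_business_pair(a: PartyRecord, b: PartyRecord) -> str:
    id_a, id_b = resolve_org_id(a), resolve_org_id(b)
    if id_a and id_b:
        # Different names can still be one legal entity (official name vs trade style).
        return "confident match" if id_a == id_b else "no match"
    return "undecided"  # leave to ordinary fuzzy matching


def classify_consumer_pair(a: PartyRecord, b: PartyRecord) -> str:
    # This sketch only covers the identical-data case from the example.
    if (a.name.casefold(), a.address.casefold()) != (b.name.casefold(), b.address.casefold()):
        return "undecided"
    units = UNITS_AT_ADDRESS.get(a.address.casefold())
    # Identical data at an address shared by many units (or unknown to the
    # directory) may still be two different people.
    if units is None or units > 1:
        return "dubious match"
    return "confident match"
```

The same structure extends naturally to citizen IDs, with the caveat above: a key that identifies a location rather than a legal entity would make the business rule wrong.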
Actuality
A running integration with external reference data sources helps you keep up with name changes, relocations and many other events going on behind your back.
Completeness
Enrich your data with missing information and replace incorrect information. This increases the value of a very important asset: your business partner inventory.
Relationship
External reference data offers strong support for building relations within your party data. Grouping consumers into households, building family trees of business records (parent companies and subsidiaries) and relating consumers to small businesses are all very valuable in supporting business intelligence.
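As a small illustration of household grouping, the sketch below groups consumers who share both a unit-level address key and a surname. The unit key is assumed to come from a location directory, and the "same unit plus same surname" rule is a simple heuristic chosen for illustration, not a definitive household definition.

```python
from collections import defaultdict


def group_households(consumers):
    """Group consumer records into candidate households.

    consumers: iterable of (consumer_id, surname, unit_id) tuples, where
    unit_id is a hypothetical unit-level key from a location directory.
    """
    households = defaultdict(list)
    for consumer_id, surname, unit_id in consumers:
        households[(unit_id, surname.casefold())].append(consumer_id)
    # Only groups with more than one member add a new relation.
    return [ids for ids in households.values() if len(ids) > 1]


records = [
    ("c1", "Hansen", "U-1"),
    ("c2", "hansen", "U-1"),
    ("c3", "Jensen", "U-1"),
    ("c4", "Hansen", "U-2"),
]
print(group_households(records))  # → [['c1', 'c2']]
```

Keying on a directory unit ID rather than on raw address text means that differently spelled versions of the same apartment still end up in the same household.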
Back end and front end
Integration with external reference data is often built in as back-end processes, but front-end integration is becoming more and more common.
Preventing poor data quality as close to the root as possible is the best strategy. For example, instead of typing the name, address and other data of a business record, you simply pick it from a business directory. In other cases you only suggest addresses from a valid list. If you don’t have a complete source, you may make room for exceptions, but then be sure to deal with this (hopefully small) fraction of all entries.
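A front-end integration of the "suggest from a valid list, but allow exceptions" kind might look roughly like the sketch below. The class and method names are hypothetical, and the in-memory list stands in for whatever location directory is actually available.

```python
from bisect import bisect_left


class AddressPicker:
    """Suggest addresses from a valid list and keep track of exceptions."""

    def __init__(self, valid_addresses):
        # Sort by casefolded text so prefix lookups can use binary search.
        self._entries = sorted((a.casefold(), a) for a in valid_addresses)
        self._keys = [key for key, _ in self._entries]
        self.exceptions = []  # entries accepted without a directory match

    def suggest(self, prefix, limit=5):
        """Return up to `limit` valid addresses starting with `prefix`."""
        p = prefix.casefold()
        i = bisect_left(self._keys, p)
        found = []
        while i < len(self._keys) and self._keys[i].startswith(p) and len(found) < limit:
            found.append(self._entries[i][1])
            i += 1
        return found

    def accept(self, entered, allow_exception=False):
        """Return the directory form of `entered`, or record it as an exception."""
        key = entered.casefold()
        i = bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._entries[i][1]
        if allow_exception:
            self.exceptions.append(entered)  # to be followed up later
            return entered
        raise ValueError(f"not a known address: {entered!r}")


picker = AddressPicker(["10 Main Street", "12 Main Street", "3 Mill Lane"])
print(picker.suggest("1"))  # → ['10 Main Street', '12 Main Street']
```

Keeping exceptions in their own list makes that fraction of entries visible and reviewable, rather than mixing unverified addresses silently with verified ones.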
The future in the cloud
Access to rich external reference data sources, and the related pricing, is going to change a lot simply because of the numbers involved.
One integration effort will benefit many, and the cost of collecting reference data will be shared by many. That is how a full European location directory can cost so little when it sits in the front of my car, and in millions of other cars.

Reader Comments
Excellent article Henrik!
The effective use of external reference data can make a huge difference in data quality initiatives.
Thanks for providing a great overview on this important subject.
Best Regards…
Jim