I recently caught up with Peter Benson, project leader of ISO 8000 Data Quality and Executive Director of the Electronic Commerce Code Management Association (ECCMA).
Peter talks about the latest advances with ISO 8000, its growth in both the public and private sectors, and how it can be practically deployed.
Dylan Jones: It’s been a while since we last interviewed you about the progress of ISO 8000 so, for anyone who is unaware of the standard, can you provide an introduction to what it is and how it can benefit organisations?
Peter Benson: ISO 8000 is the international standard for quality data. It defines quality data as portable data that meets stated requirements.
Standards are common specifications and the purpose of standards is to make it easier to describe goods or services by replacing the detail of the specification with reference to a standard.
In practice standards are designed to be used in contracts. Standards do this by making it possible for a third party to independently validate that the goods or services are compliant with a standard specification.
When a software application or a data cleansing service provider claims that their product or service is ISO 8000 compliant you can be confident that the data can be delivered to you in an application independent format and that all the data specified in the data requirement has been provided.
If an application or a data service does not claim to be ISO 8000 compliant then it is highly probable that the data cannot be separated from the application or the service without substantial loss of value (for example a csv export), and you will not be provided with a specification of the data or with all the code tables. Basically you will be ‘locked in’ to the application or service.
Dylan Jones: How has ISO 8000 progressed since we last spoke early in 2013? For example, are there any environmental factors, such as new regulations, that are helping develop the uptake of ISO 8000?
Peter Benson: There is a general recognition of the importance of quality data and a slow but steady increase in the awareness that claims about data being ‘quality data’ are often less than genuine.
Governments all over the world are realizing that data is not only the key to compliance verification but that the more accurately and formally they define (state) their requirements for data the easier and the cheaper it is for them to automate compliance verification.
We can expect that governments will continue to push for quality data and we are seeing the first signs of government agencies requiring ISO 8000 quality data.
Dylan Jones: For a business leader new to data quality and the ISO 8000 standard can you describe a practical use case for execution of ISO 8000 using a dataset they would be familiar with?
Peter Benson: ISO 8000 really comes into its own when a business leader has concerns about the quality of the data in their organization.
Asking IT for assurances on the quality of the corporate data is very different from asking for assurance that corporate data complies with ISO 8000.
In order to comply, a company needs to have formally stated its requirements for data and both these statements and the data itself must be encoded using a corporate dictionary. It is actually easier than it sounds but it does require that IT actually write down the data requirements which are defined by the data users.
Surprisingly, this is less common than you would expect. IT may have its data models and metadata registries, which are great, but it rarely presents these in a form that data users understand, which is typically a data entry form or a report.
A data requirement is a statement of the data required to perform a specific function. If the data meets this requirement it is quality data, but only from the perspective of this defined function; it may not meet the requirements of another function, where it would not be considered quality data.
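Editor's sketch: the point that quality is always relative to a stated requirement can be illustrated in a few lines of Python. This is only an illustration under stated assumptions, not anything prescribed by ISO 8000; the function names, field names and the two requirements are hypothetical.

```python
# Hypothetical data requirements: each business function states the
# data elements it needs. These names are illustrative only.
REQUIREMENTS = {
    "mailing": {"name", "address"},
    "phone_campaign": {"name", "telephone"},
}


def has_value(record: dict, field: str) -> bool:
    """True if the field is present and not blank."""
    value = record.get(field)
    return value is not None and str(value).strip() != ""


def meets_requirement(record: dict, function: str) -> bool:
    """Check one record against the stated requirement of one function."""
    return all(has_value(record, field) for field in REQUIREMENTS[function])


record = {"name": "A. Customer", "address": "1 Main Street", "telephone": ""}
mailing_ok = meets_requirement(record, "mailing")
phone_ok = meets_requirement(record, "phone_campaign")
print(mailing_ok, phone_ok)  # prints: True False
```

The same record passes for one function and fails for the other, which is the sense in which data is "quality data" only from the perspective of a defined function.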
Almost all companies today use one application or another to support the function of Customer Relationship Management (CRM). The application does not need to be complicated and spreadsheets work well too. All CRM applications start with defining the data elements, Name, Address, Telephone number and then typically include activity tracking fields.
ISO 8000 simply requires that the data elements and coded values be explicitly defined. Anyone who has ever built a spreadsheet for names and addresses knows how quickly the project can head south as they find workarounds for more and more exceptions, to the point where the original column headings make very little sense.
ISO 8000 is a method that seeks to keep the metadata and the data in sync and this starts with not just naming the column but providing a definition. To some degree this is similar to a programmer documenting their code. With better column labelling and definitions to back the labels up it becomes easier to combine the data with other data.
Here again ISO 8000 provides guidance on how to track the source of data; this is called provenance. Companies and government agencies are also realizing the importance of provenance: portable data that meets stated requirements is the first grade of quality; the next is portable data that meets stated requirements, from a declared source.
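Editor's sketch: a minimal way to picture defined data elements and declared provenance, assuming a tiny hypothetical dictionary and a `Value` record that carries an optional source. The dictionary entries, class and wording of the results are illustrative assumptions, and portability is not modelled here.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical corporate dictionary: every data element has an explicit
# definition, not just a column name.
DICTIONARY = {
    "name": "Full legal name of the customer",
    "telephone": "Primary contact number including country code",
}


@dataclass
class Value:
    element: str                   # key into DICTIONARY
    content: str                   # the data itself
    source: Optional[str] = None   # declared provenance, if any


def describe(value: Value) -> str:
    """Classify a value using the two levels described in the interview."""
    if value.element not in DICTIONARY or not value.content.strip():
        return "does not meet stated requirements"
    if value.source is None:
        return "meets stated requirements"
    return "meets stated requirements, from a declared source"


print(describe(Value("name", "A. Customer")))
print(describe(Value("name", "A. Customer", source="customer web form")))
```

Keeping the definition and the source alongside the value is what keeps the metadata and the data in sync when the data is later combined with other data.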
Dylan Jones: What do you see in the future of data quality?
Peter Benson: I think we have reached the stage where most companies understand the value of quality data and we are starting to see better and lower cost applications for cataloging and data cleansing that are using common industry templates and description rules to create industry standard descriptions.
This will make it easier to exchange data. I can remember when the first desktop spreadsheet, Visicalc, hit the market at $100 and when what is now Quickbooks came out at $26. They both made a big difference to accounting.
ECCMA has just released an ISO 8000 community cataloging and data cleansing application that accesses the ECCMA library of industry standard cataloging templates and the ECCMA Open Technical Dictionary (eOTD), as well as default description rules that can be easily modified to create consistent names and descriptions in any product information management, master data, maintenance planning or procurement application.
It is a very low cost tool so perhaps it will have a similar effect. However as to your question about what is coming next, the answer is the automation of data validation made possible by large scale identifier resolution.
Dylan Jones: What do you mean by identifiers and how are they used in automating data validation?
Peter Benson: Identifiers are strings of characters created by an organization to represent a collection of data.
Your driver’s licence number, your passport number, your car registration number, your email address, your telephone number, credit card numbers, part numbers, serial numbers, batch numbers, these are all identifiers.
What is important to remember is that all identifiers are created to link to a collection of data; they are the aliases for the data. Identifier resolution occurs when the identifier is ‘resolved’ back to all or part of the data it represents. When you automate identifier resolution you enable automated data validation.
Let’s try an example. I am in a car accident, nobody is hurt but there is some damage to my car and I am going to have to fill in some insurance paperwork.
The first identifier is the vehicle registration number. Currently I have to take it on faith that it is a valid licence plate and the one that belongs to the vehicle. With a simple public resolution service I should be able to look up the licence plate with its authoritative source, the vehicle registration authority of the state in which the vehicle is licensed; we know the capability exists as it is used by the police on a regular basis.
I am not looking for private information but only to see if the licence plate data matches what I can readily see: the make, model and year of the car, for example.
The resolution should also provide me with the Vehicle Identification Number (VIN) – another mandatory identifier that is displayed on all vehicles and included on the vehicle registration that is required to be in the vehicle.
By also resolving the VIN with the authoritative source, the manufacturer, I can readily validate the make, model and year of the car. I can do the same with the driver’s licence and the insurance certificate – another document I am required to ask for in the case of an accident.
Before I leave the scene all the data has been validated and verified, all thanks to identifier resolution. Of course the app on my mobile phone could do infinitely more by recording the location of the accident and prompting me to take photos of the scene, as well as any damage, while at the same time verifying the validity of the other party’s insurance policy and completing the report for me.
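Editor's sketch: the accident example can be written as a chain of two resolutions. The code below is a sketch only, using in-memory dictionaries as hypothetical stand-ins for the authoritative sources; the plate, the VIN-like string and the vehicle details are made up, and no real resolution service or API is implied.

```python
# Hypothetical stand-ins for the two authoritative sources.
REGISTRATION_AUTHORITY = {
    "ABC123": {"vin": "VIN0001", "make": "Ford", "model": "Focus", "year": 2012},
}
MANUFACTURER = {
    "VIN0001": {"make": "Ford", "model": "Focus", "year": 2012},
}

CHECKED_FIELDS = ("make", "model", "year")


def resolve(source: dict, identifier: str):
    """Resolve an identifier to the data it represents, or None if unknown."""
    return source.get(identifier)


def validate_vehicle(plate: str, observed: dict) -> bool:
    """Compare what I can see at the scene with both authoritative sources."""
    registration = resolve(REGISTRATION_AUTHORITY, plate)
    if registration is None:
        return False  # the plate does not resolve at all
    if any(registration[f] != observed[f] for f in CHECKED_FIELDS):
        return False  # the plate belongs to a different vehicle
    build = resolve(MANUFACTURER, registration["vin"])
    return build is not None and all(
        build[f] == observed[f] for f in CHECKED_FIELDS
    )


seen = {"make": "Ford", "model": "Focus", "year": 2012}
print(validate_vehicle("ABC123", seen))
```

Only non-private attributes are compared, matching the point that the check needs nothing beyond what is visible on the vehicle.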
ECCMA is working on the ECCMA Quality Resolution Registry (eQIR) as a registry that will perform a function similar to the Domain Name System (DNS) that you use to convert a domain name to the IP address of the web or email server. The eQIR points to the IP address of an ISO 8000 server that uses ISO 22745 as its data exchange protocol.
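Editor's sketch: the DNS analogy amounts to a lookup that returns where to ask, not the data itself. The example below assumes, purely for illustration, that an identifier is written as `ISSUER:LOCAL_ID`; that format, the issuer names and the addresses (taken from the reserved documentation range) are hypothetical and do not describe how the eQIR actually works.

```python
# Hypothetical registry: issuer -> address of the server that can
# resolve that issuer's identifiers.
REGISTRY = {
    "ACME": "192.0.2.10",
    "GLOBEX": "192.0.2.20",
}


def locate_server(identifier: str):
    """Return the address of the server to query, or None if unknown."""
    issuer, separator, local_id = identifier.partition(":")
    if not separator or not local_id:
        return None  # not in the assumed ISSUER:LOCAL_ID form
    return REGISTRY.get(issuer)


print(locate_server("ACME:PN-1001"))
```

In a real deployment the second step would be a request to that server for the data behind the identifier; that exchange is outside the scope of this sketch.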
There is a new white paper on the ECCMA website that explains the process in more detail.
About Peter Benson
Peter Benson is the Executive Director of the Electronic Commerce Code Management Association (ECCMA).
Peter served as the chair of the US Standards Committee responsible for the development and maintenance of EDI standard for product data; he was responsible for the design, development and global promotion of the UNSPSC as an internationally recognized commodity classification and for the design of the eOTD, an internationally recognized open technical dictionary based on the NATO codification system.
Peter is the project leader for ISO 8000, the new international standard for data quality, and for ISO 22745, the new standard for open technical dictionaries and the exchange of characteristic data.
Peter is an expert in distributed information systems, content encoding, master data management and the generation of high quality descriptions that are the heart of today’s ERP applications and the next generation of high speed and high relevance internet searches.