In this article, data governance expert Mirjam Visscher explains how to improve your business definitions using a Definition Quality Indicator process.
I have built business glossaries for companies that want to get a grip on the meaning of the words they use.
I didn't write the definitions myself; I asked experts in the subject at hand to write a definition that an interested 18-year-old could understand.
Some of these definitions were golden: they taught me something new and interesting about a subject I knew nothing about.
Others were strange, unintelligible or left me with more questions than answers.
Some filled all fields of the glossary; others lacked even a data owner.
The good ones I published, the bad ones went back to the writer.
To judge the quality of a definition, I used my own understanding as a yardstick.
Luckily, the experts trusted me and accepted my judgment. But I wasn't satisfied and wanted a more transparent measure.
To be more objective, I developed a Definition Quality Indicator (DQI).
What is a Definition Quality Indicator (DQI)?
The DQI is a compound number made up of several weighted criteria, based on the different fields in the definition.
In the picture below, you can observe the different weights per field.
Most fields have two weights: one is awarded when the field contains any information, and the other when the content meets a specific criterion.
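The two-weight scheme can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the author's spreadsheet: the field names, weights and criteria below are hypothetical placeholders, because the actual weights per field are only given in the picture.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class FieldRule:
    """Scoring rule for one glossary field."""
    content_weight: int  # awarded when the field contains any information
    criterion_weight: int = 0  # awarded when the content also meets the criterion
    criterion: Optional[Callable[[str], bool]] = None


def has_content(value: Optional[str]) -> bool:
    """True when the field holds any non-whitespace text."""
    return bool(value and value.strip())


def field_score(value: Optional[str], rule: FieldRule) -> int:
    """Apply both weights of a rule to one field value."""
    if not has_content(value):
        return 0
    score = rule.content_weight
    if rule.criterion is not None and rule.criterion(value.strip()):
        score += rule.criterion_weight
    return score


# HYPOTHETICAL weights and criteria; substitute the ones your organization agrees on.
RULES: Dict[str, FieldRule] = {
    "definition": FieldRule(5, 10, lambda text: 40 <= len(text) <= 120),
    "data_owner": FieldRule(5),
    "data_source": FieldRule(10),
    "used_in_reports": FieldRule(10),
}


def total_weight(entry: Dict[str, str]) -> int:
    """Sum the awarded weights over all scored fields; missing fields score 0."""
    return sum(field_score(entry.get(name), rule) for name, rule in RULES.items())


# Hypothetical glossary entry ("DonorDB" is a placeholder data source).
entry = {
    "term": "Blood bank",
    "definition": "A facility that collects, tests and stores donated blood for later transfusion.",
    "data_source": "DonorDB",
}
print(total_weight(entry))
```

With these placeholder rules, the 79-character definition earns both of its weights (5 + 10) and the data source adds 10, while the empty owner and report fields add nothing, for a total of 25.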
How did I come up with those numbers?
The numbers above fall somewhere between scientific and arbitrary selection.
I reverse-engineered them from a set of 200 definitions, including the golden ones and some very poor ones, and then analyzed what made them good or bad.
Here are some findings from this exercise.
A good definition is neither short nor long
Long texts take time to read, and they often cloud the key points.
I have seen definitions that contained complete reference lists; I sent them back without hesitation because they were far too long. A definition is never the place to manage your reference data.
A few examples help to illustrate the point, so I recommend giving two or three.
One hundred twenty characters should be enough in most languages, given that there is another field for a more elaborate explanation.
Sometimes a definition is too short: in Dutch, and in the businesses I worked for, a definition shorter than 40 characters is at risk of being too general.
Please adjust the limits to the needs of your language and organization.
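The length rule translates directly into a check that returns feedback for the steward. A minimal Python sketch, assuming the limits above (40 and 120 characters) as defaults; the names `LengthLimits` and `length_feedback` are hypothetical, and the limits are meant to be changed per language and organization.

```python
from typing import NamedTuple, Optional


class LengthLimits(NamedTuple):
    """Character limits for the definition field; adjust per language and organization."""
    minimum: int = 40   # shorter risks being too general (the article's Dutch experience)
    maximum: int = 120  # longer detail belongs in the explanation field


def length_feedback(definition: str, limits: LengthLimits = LengthLimits()) -> Optional[str]:
    """Return feedback for the data steward, or None when the length is acceptable."""
    n = len(definition.strip())
    if n < limits.minimum:
        return f"Definition has {n} characters; it may be too general (minimum {limits.minimum})."
    if n > limits.maximum:
        return f"Definition has {n} characters; move detail to the explanation (maximum {limits.maximum})."
    return None


print(length_feedback("A financial institution."))
```

For another language, pass different limits, for example `length_feedback(text, LengthLimits(30, 150))`.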
A definition belongs to one data domain
Each definition has one data domain, one data owner and one data steward.
The data domain, the owner and the steward should match your data governance organization.
At least, this was the understanding where I worked.
No matter how your data governance is organized, please do not allow data domains other than the existing ones without a very good reason.
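A governance check along these lines can flag definitions whose domain, owner or steward falls outside your organization. The sketch below assumes a hypothetical register in which each existing data domain has one owner and a set of stewards; the domain and person names are placeholders, and your organization may link roles to domains differently.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class DomainRoles:
    owner: str
    stewards: FrozenSet[str]


# HYPOTHETICAL register of the existing data domains and their roles.
REGISTER: Dict[str, DomainRoles] = {
    "Healthcare": DomainRoles("owner.healthcare", frozenset({"steward.donations"})),
    "Finance": DomainRoles("owner.finance", frozenset({"steward.ledger"})),
}


def governance_issues(domain: str, owner: str, steward: str) -> List[str]:
    """List how a definition's single domain, owner and steward deviate from the register."""
    roles = REGISTER.get(domain)
    if roles is None:
        # New domains are rejected rather than created on the fly.
        return [f"Unknown data domain: {domain!r}"]
    issues = []
    if owner != roles.owner:
        issues.append(f"{owner!r} is not the owner of {domain}")
    if steward not in roles.stewards:
        issues.append(f"{steward!r} is not a steward of {domain}")
    return issues


print(governance_issues("Marketing", "owner.marketing", "steward.campaigns"))
```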
A good definition has a lot of contextual information
If you and I read the word 'bank', chances are we picture very different things.
If we read 'bank' together with the report 'Blood donations in the US' and the data domain Healthcare, we implicitly conclude that the writer meant a blood bank.
Because the contextual information helps us to understand a more specific meaning of a word, I assigned relatively high weights to the fields Data Source and Used in Reports.
Whether you should use those fields in your organization depends on how much metadata is already recorded in systems other than your glossary.
How to build your DQI
A DQI is easy to create in Excel, SharePoint or other tools found in a typical company.
The first column is a simple function that checks whether the field has content.
To check whether the content meets the criteria, you first need to create lists of the following:
- Allowed units
- Standardized report names and data domains
- Data owners and stewards
If stewards write their definitions using forms, you can enforce adherence to those lists.
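Where stewards type into free-text fields instead of picking from a form, the same lists can validate the content afterwards. A minimal sketch, assuming hypothetical lists of allowed units and standardized report names; in practice these would come from your glossary tool or a shared sheet.

```python
from typing import Iterable, List

# HYPOTHETICAL reference lists.
ALLOWED_UNITS = {"EUR", "kg", "ml", "count"}
STANDARD_REPORTS = {"Blood donations in the US", "Monthly donor overview"}


def check_against_lists(unit: str, reports: Iterable[str]) -> List[str]:
    """Return one problem per value that is not on its reference list."""
    problems = []
    if unit not in ALLOWED_UNITS:
        problems.append(f"Unit {unit!r} is not on the list of allowed units")
    for report in reports:
        if report not in STANDARD_REPORTS:
            problems.append(f"Report {report!r} is not a standardized report name")
    return problems


print(check_against_lists("liters", ["Blood donations"]))
```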
How to use your DQI
The 5-point scale is my favourite. I divide the total of the awarded weights by 15.
Definitions with a DQI from 3 to 5 are ready to publish.
Definitions with a lower DQI go back to the data steward who wrote them, together with feedback on why the DQI is too low.
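The scaling and the publish decision are simple arithmetic. In the minimal sketch below, the function names are hypothetical. Dividing by 15 yields a 0 to 5 range only if the weights add up to at most 75; that maximum is an assumption, since the weights themselves are not reproduced in this text.

```python
def dqi(total_weight: float) -> float:
    """Map the total of awarded weights onto the 5-point scale."""
    return total_weight / 15


def review_decision(total_weight: float, threshold: float = 3.0) -> str:
    """Publish at a DQI of 3 or more; otherwise return the definition to its steward."""
    score = dqi(total_weight)
    if score >= threshold:
        return f"publish (DQI {score:.1f})"
    return f"return to steward with feedback (DQI {score:.1f}, below {threshold:.1f})"


for total in (25, 45, 75):
    print(total, review_decision(total))
```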
You can use the DQI to report on the state of the definitions in your organization.
The graph below shows an example:
[Figure: DQI by Domain]
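The figures behind such a graph can be computed by grouping scores per domain. This minimal sketch assumes the graph shows the average DQI per data domain (the original graph is not reproduced here) and uses hypothetical scores; the share of definitions scoring 3 or more would work just as well.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def dqi_by_domain(scores: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Average DQI per data domain, rounded to two decimals, sorted by domain name."""
    grouped: Dict[str, List[float]] = defaultdict(list)
    for domain, score in scores:
        grouped[domain].append(score)
    return {domain: round(mean(values), 2) for domain, values in sorted(grouped.items())}


# HYPOTHETICAL (domain, DQI) pairs, one per definition.
scores = [("Healthcare", 4.0), ("Healthcare", 2.0), ("Finance", 3.5)]
for domain, average in dqi_by_domain(scores).items():
    print(f"{domain}: {average}")
```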
About Mirjam Visscher
Mirjam Visscher is a Dutch data governance expert and MBA teacher in Data Governance, Compliance, Risk and Ethics.
Her speciality is a non-invasive, agile approach for helping data owners and data stewards to get a better grip on their data.
She teaches how to write and assess clear definitions, data quality rules, data agreements and other metadata. In the MBA, she prepares her students for the role of Data Governance Officer.
LinkedIn Profile: https://www.linkedin.com/in/care4data/