How to Create a Data Quality Firewall and Data Quality SLA

A Data Quality Firewall is Critical for Ensuring the Integrity of Your Most Vital Information Chains

How do you prevent poor supplier data quality from impacting your organisation?

The answer is to create a robust Data Quality Firewall and an ongoing Data Quality SLA process. This post provides a blueprint for building your own data quality firewall and ensuring your data quality SLA delivers certified, high quality data throughout your organisation.

The Data Quality Essentials series identifies widely accepted data quality best-practices that your organisation should be adopting in an effort to improve and maintain data quality.

Image: cc Flickr alika 89

Creating a Data Quality Firewall and Data Quality SLA

by Dylan Jones, Editor

First, a short story.

Back in 2005 I was consulting at a small utilities company running a data quality assessment project prior to a data migration.

I got everyone in a room and started a simple information chain workshop task, sticking a load of Post-Its on the whiteboard to map the people, process and data chains that mattered to this team.

We talked about the issues in the data. People were generally happy but one person seemed a little flat so I asked what they did. It was their role, every month, to take a vital piece of data from a major supplier of data (also their primary partner) into the business.

Each month they laboriously sat at a computer fixing defects in the data by hand. The same type of defects were fixed every single month.

Now, it’s obviously reassuring they had some kind of firewall but there were obvious flaws:

  1. The company refused to push back on the supplier and demand a data quality Service Level Agreement (SLA). Sometimes this is tough to do, but in this case they really hadn’t exhausted the options for push-back
  2. This person had huge amounts of undocumented knowledge in his head and was demoralised, making him a high employee churn risk and obviously a major challenge to the company if he quit
  3. Doing stuff over and over again in the name of data quality is just bad practice
  4. The lead times for clean-up meant major hassles downstream

So what were they doing right? Well you could say that…

  1. Someone was assigned to this “firewall”. Okay, it was a screwed-up process, but at least there was ownership
  2. They were diligent. Nothing bad really got through, according to downstream users
  3. They were trapping the data at source. Nothing sneaked past this guy until it was ready

The fact is that many organisations get the data supplier relationship wrong by:

  1. Applying blanket trust and assuming the data will be sound
  2. Applying manual hacks and tweaks downstream
  3. Applying a dedicated person or team (costly) at the point of entry

It’s incredible how many companies I meet that accept data from outside the corporate boundary and assume it is correct. 

Don’t make this assumption. Take a proactive stance, using these steps as an outline guide.

Simple Roadmap for a Data Quality Firewall and Data Quality SLA

When creating a Data Quality Firewall and SLA you may find the following tactics valuable:

  • Map all the information sources flowing into your organisation (we’re talking a Post-It session here, not a £20,000 data modelling tool – bring donuts, I find that helps get the creative juices flowing)
  • When information comes from an external source, identify whether the following exists:
    • Formal data specifications outlining things like field formats, frequency of delivery, expected values, permitted ranges
    • Escalation procedures or standard operating procedures for when defects are found
  • No documentation or process exists? Then create new documentation listing the elements above but also add:
    • Simple information chain diagram explaining where this information comes in and feeds to
    • Names and contact details for everyone involved with this data, both on the supplier and consumer side
  • Present this to the relevant stakeholder explaining your desire to improve this inbound information source with a view to:
    • Preventing embarrassment to the stakeholder when bad data impacts other business units (I would probably open with this)
    • Demonstrating the cost savings of building a process that eliminates endless amounts of scrap and rework activity (costs based on past incidents are ideal here)
    • Increasing overall perception of the stakeholder’s team as a highly professional, innovative resource (they will like this)
    • Reducing the lead times and improving various other metrics the stakeholder will be held accountable for
  • Get them to sign off that they are ultimately responsible for this process and you will provide them regular reports on how the process is working and the value that it (and they) are bringing to the business
  • Document all known issues with this inbound information source. Surveys and casual conversations with data workers, DBAs, app designers and anyone else who touches this data downstream are the way forward here
  • Profile the data using one of the many free tools now available
  • Rigorously document the data quality rules you feel the data should be managed against e.g.
    • Timeliness – the data must arrive between 9am and 10am every weekday morning
    • Completeness – the customer name and address fields must be populated, there must be a valid order number etc.
    • Formatting – the order number must be in the format of NN-LLLL-NN, with no exceptions
    • Overloading – each record must only have one entry in the part code, there must not be multiple part codes in the same field
    • etc.
  • Convert these data quality rules into a live monitoring process, e.g.
    • Use one of the many free data quality and data integration tools available
    • Use standard scripts in SQL or Unix, whatever your data processing platform uses
  • Implement the process and start to track the issues discovered
  • Run this process for at least a month, discovering issues and fixing them manually
  • Create a comprehensive file of all the issues found and the impacts they’re having in your organisation
  • Approach the data supplier outlining your findings, the innovations you have made and the issues you are picking up
  • If no SLA or formal definitions and agreements around data quality exist, work with your stakeholder and the supplier to get something in place
  • Agree to share your technology and approach with the supplier so that they can improve their data quality (chances are it’s being supplied to other companies too)
  • Work together to iteratively create the most robust data quality firewall and SLA process possible
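
The four example rules above (timeliness, completeness, formatting and overloading) can be turned into a simple script-based monitor. What follows is only a minimal sketch in Python, not a definitive implementation. The field names (customer_name, address, order_number, part_code) are hypothetical, the NN-LLLL-NN format is assumed to mean two digits, four uppercase letters and two digits, and the list of separators that signal an overloaded field is a guess you would tune after profiling your own data.

```python
import re
from datetime import datetime, time

# Assumption: N is a digit and L is an uppercase letter in NN-LLLL-NN.
ORDER_NUMBER_PATTERN = re.compile(r"\d{2}-[A-Z]{4}-\d{2}")

# Assumption: these characters inside a part code mean several codes were packed in.
OVERLOAD_SEPARATORS = re.compile(r"[,;/|\s]")

# Hypothetical names for the fields that must always be populated.
MANDATORY_FIELDS = ("customer_name", "address", "order_number")


def arrived_on_time(arrived_at: datetime) -> bool:
    """Timeliness: delivery lands on a weekday between 9am and 10am."""
    is_weekday = arrived_at.weekday() < 5  # Monday is 0, Sunday is 6
    return is_weekday and time(9, 0) <= arrived_at.time() <= time(10, 0)


def check_record(record: dict) -> list:
    """Return a list of rule violations found in one inbound record."""
    issues = []

    # Completeness: every mandatory field must hold a non-blank value.
    for field in MANDATORY_FIELDS:
        if not (record.get(field) or "").strip():
            issues.append(f"completeness: {field} is empty")

    # Formatting: a populated order number must match NN-LLLL-NN exactly.
    order_number = (record.get("order_number") or "").strip()
    if order_number and not ORDER_NUMBER_PATTERN.fullmatch(order_number):
        issues.append(f"formatting: order_number {order_number!r} is not NN-LLLL-NN")

    # Overloading: the part code field must hold a single code.
    part_code = (record.get("part_code") or "").strip()
    if OVERLOAD_SEPARATORS.search(part_code):
        issues.append("overloading: part_code holds more than one value")

    return issues
```

Running check_record over every row of each delivery and logging the returned issues gives you the raw material for the issues file you will later present to the supplier.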

Sample Sections of a Data Quality SLA

Please note: this information is provided merely as an introductory guide, without prejudice. It is not intended as legal advice. Always seek professional assistance when creating legal documentation.

There are hundreds of templates for an SLA on the web but here is one example of how you could apply it to data quality:

1. Introduction

This Service Level Agreement (SLA) documents the agreed provision of service for the supply of data by [Supplier] to [Receiver]. This document provides a legally binding contract that stipulates the service and quality levels that will be enforced during the term of the service agreement between both parties.

2. Parties to the Agreement

Lists the people who have reviewed and approved the SLA.

3. Scope

High-level outline of the data and services to be provided.

4. Term

Indicate the start and end date of the SLA.

5. Conventions

List all the standards, terms and definitions that are referenced throughout the SLA. For example:

  • “Office Hours” refers to 9am to 5pm GMT.
  • “Working Day” refers to Monday to Friday except for UK designated holidays.
  • “CSV” refers to “Comma-Separated Values”, a raw text file format as described in this open standard:
  • etc…

6. Service Levels

Data Delivery Schedule: Indicates how frequently data should be supplied, e.g. every working day between 9am and 10am GMT.

Data Delivery Specification: How will the data be structured? Which fields must be mandatory? What standard data types will exist? What CSV standard must be adhered to? What standards for identifiers and codes must be adhered to? How will the data be encrypted, e.g. an encrypted zip file? What is the data standard and specification for attributes A, B, C, …, n?
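
A delivery specification is easier to enforce when it also exists in machine-readable form. The sketch below, in Python, shows one possible way to express mandatory fields and data types and check a CSV delivery against them. The FieldSpec class, the SPEC list and the column names are all hypothetical illustrations, not part of any standard, and the sketch assumes a header row and no line breaks inside quoted values.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    """One column of the agreed delivery (hypothetical structure)."""
    name: str
    mandatory: bool
    data_type: type  # callable used to parse the value, e.g. str or int


# Hypothetical specification; the real one comes from the signed SLA.
SPEC = [
    FieldSpec("order_number", True, str),
    FieldSpec("customer_name", True, str),
    FieldSpec("quantity", False, int),
]


def validate_delivery(csv_text: str, spec: list) -> list:
    """Return a list of problems found in a CSV delivery."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []

    # Every specified column must be present, even if its values are optional.
    problems = [f"missing column: {f.name}" for f in spec if f.name not in header]
    if problems:
        return problems

    # The header is line 1, so the first data row is line 2.
    for line_no, row in enumerate(reader, start=2):
        for f in spec:
            value = (row.get(f.name) or "").strip()
            if not value:
                if f.mandatory:
                    problems.append(f"line {line_no}: {f.name} is mandatory")
                continue
            try:
                f.data_type(value)
            except ValueError:
                problems.append(
                    f"line {line_no}: {f.name} is not a valid {f.data_type.__name__}"
                )
    return problems
```

Keeping the specification as data rather than burying it in code means the same list can be shared with the supplier, who can then run an identical check before sending each file.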

Data Delivery Process: High-level diagram outlining the steps required for data delivery.

Escalation Process: When issues are found, how will the supplier be notified, what is expected of them?

7. Service Level Targets and Penalties

Describe each service level, its measure and its target. For example:

  • Measure: Timeliness
  • Description: Data must be supplied within the approved data delivery schedule
  • Target: 95% annual delivery within the approved data delivery schedule, 97% within approved data delivery schedule + 30 minutes, 100% within approved data delivery schedule + 60 minutes
  • Penalties: £10 for every minute of delay up to 60 minutes. Past 60 minutes, £1,000 per hour of delay.
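
It helps to agree exactly how targets and penalties like these are calculated before the SLA is signed, because the wording above can be read in more than one way. The Python sketch below picks one interpretation and should be treated as an assumption rather than the definitive reading: the first 60 minutes are charged at £10 per minute, and every started hour beyond that adds £1,000 on top. The function names are hypothetical.

```python
import math


def timeliness_penalty(delay_minutes: int) -> int:
    """Penalty in pounds for one late delivery (one possible interpretation)."""
    if delay_minutes <= 0:
        return 0
    # First hour: £10 per minute of delay.
    first_hour = min(delay_minutes, 60) * 10
    # Assumption: each started hour after the first 60 minutes costs £1,000.
    extra_minutes = max(delay_minutes - 60, 0)
    extra_hours = math.ceil(extra_minutes / 60)
    return first_hour + extra_hours * 1000


def meets_timeliness_targets(delays: list) -> bool:
    """Check a year's delays (in minutes) against the 95% / 97% / 100% targets."""
    total = len(delays)
    if total == 0:
        return True

    def count_within(limit: int) -> int:
        return sum(1 for d in delays if d <= limit)

    # Integer arithmetic avoids rounding surprises at the exact thresholds.
    return (
        count_within(0) * 100 >= 95 * total
        and count_within(30) * 100 >= 97 * total
        and count_within(60) == total
    )
```

Under this reading a delivery that is 61 minutes late costs £1,600, which is a steep jump from £600 at 60 minutes. Walking through a few such cases with the supplier is a quick way to surface disagreements about the wording.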

8. Rewards

Outline any reward structures that the supplier receives for meeting their SLA targets.

9. Points of Contact

List of key personnel involved in the service from both parties.

10. Signatures

List of authorised signatures to make the agreement binding.

11. Appendix, Glossary and References

Provide additional resources such as sample file extracts, detailed glossary and links to other standards and external information that is relevant to the SLA.

The Moral of the Story

Okay, so this may deviate depending on your particular situation but I’ve used variations on this in the past and it works.

In several cases the data supplier had no idea their data was defective because no-one had raised any issues. For example, in my story above, no-one had pushed back on the supplier so the manual fire-fighting had continued for many months.

If you have implemented something similar then I would love to hear about it and hopefully feature your story on the site.

Useful Resources

You may find the following documents useful for constructing your own Data Quality SLA (several of these links point to documents for download so be sure to run your own virus scanner before use):

SLA Templates

Standards and Data Quality Specification