Creating a Data Quality Firewall and Data Quality SLA


How do you prevent poor supplier data quality from impacting your organisation?

The answer? Through the adoption of a Data Quality Firewall and ongoing Data Quality SLA process.

This post provides a blueprint for creating your own Data Quality Firewall and ensuring your Data Quality SLA delivers certified, high-quality data throughout your organisation.

Are you protecting your inbound data feeds?


In 2005 I was consulting at a small utilities company running a data quality assessment project prior to a data migration.

I got everyone in a room and started a simple information chain workshop task, sticking a load of Post-Its on the whiteboard to map out the people, process and data chains that mattered to the team.

We talked at length about the issues people were seeing in the data.

People were generally happy but one person was quiet. So I asked what they did.

It transpired that this person's role was to take the same piece of data from a major external supplier into the business - every single month.

Each month, they sat at a computer, laboriously fixing data defects manually.

What’s more, it was the same type of defects that they had to fix, again and again.

It was reassuring they had some kind of ‘Data Quality Firewall’ to prevent inbound data quality issues, but there were obvious flaws in their approach:

  1. The company refused to push back on the supplier and demand a data quality Service Level Agreement (SLA). Yes, this can be a challenge, but in this case they hadn't even exhausted the options for rejecting the data.

  2. This person had huge amounts of undocumented knowledge in his head and was demoralised. He was a high churn risk and obviously a major risk to the company if he quit.

  3. Doing stuff over and over again in the name of data quality is just bad practice (and extremely expensive).

  4. The lead times for clean up meant major hassles downstream as the data consumers waited.

So what were they doing right?

  1. Someone was assigned to this “firewall” (yes, it was a screwed up process but at least there was ownership)

  2. They were diligent (nothing bad got through, according to downstream users)

  3. They were trapping the data at source (nothing sneaked past this guy until it was ready)

The fact is that many organisations get the data supplier relationship wrong by:

  1. Assuming the data will be of ‘good enough’ quality, because they have a contract of supply

  2. Applying manual hacks and tweaks downstream (incurring costs the supplier should bear)

  3. Applying a dedicated person or team at the point of entry (instead of building automation and data quality monitoring to free up the labour and push the defects back to the supplier)

It’s incredible how many companies I meet that accept data from outside the corporate walls and assume it is correct, or simply believe ‘defects are reality, we can live with them’.

Don’t make this assumption.

Take a proactive stance using the following steps as an outline guide for getting started.


Simple Roadmap for a Data Quality Firewall and Data Quality SLA


When creating a Data Quality Firewall and SLA you may find the following tactics valuable.

Map the flow of information from supplier to consumer

Map all the information sources flowing into your organisation. If you don’t have budget for an expensive modelling tool, I find that a pack of Post-Its and a communal bag of donuts is sufficient to run your first modelling workshop.

Identify what data specifications exist (and create where they’re missing)

When information comes from an external source, identify whether the following exists:

  • Formal data specifications (e.g. field formats, frequency of delivery, expected values, permitted ranges)

  • Escalation procedures or standard operating procedures for when defects are found

Does no specification, documentation or process information exist?

Then create new documentation listing the elements above but also add:

  • Simple information chain diagram explaining where this information comes in and feeds to

  • Names and contact details for everyone involved with this data, both on the supplier and consumer side

Present your findings and recommendations for Data Quality

Present this to the relevant stakeholder explaining your desire to improve this inbound information source with a view to:

  • Preventing embarrassment to the stakeholder when bad data impacts other business units (I would probably open with this)

  • Reducing costs by building a process that eliminates endless scrap and rework activity (costs based on past incidents are ideal here)

  • Increasing overall perception of the stakeholder’s team as a highly professional, innovative resource (they will like this)

  • Reducing the lead times and improving various other metrics the stakeholder will be held accountable for

  • Getting them to sign off that they are ultimately responsible for this process, in return for regular reports on how the process is working and the value it (and they) are bringing to the business

Recommend a Pilot Data Quality Management Initiative:

Document all known issues with this inbound information source, drawing on surveys and casual conversations with data workers, DBAs, app designers and anyone else who touches this data downstream.

Armed with this anecdotal data, create a robust data quality assessment process:

  • Profile the data using one of the many free (or commercial) tools now available

  • Rigorously document the data quality rules you feel the data should be managed against e.g.

    • Timeliness – the data must arrive between 9am and 10am every weekday morning

    • Completeness – the customer name and address fields must be populated, and there must be a valid order number

    • Formatting – the order number must be in the format of NN-LLLL-NN, with no exceptions

    • Overloading – each record must have only one entry in the part code field; there must not be multiple part codes in the same field

    • etc….

  • Convert these data quality rules into a live monitoring process, e.g.

    • Use one of the many data quality and data integration tools available

    • Use standard scripts in SQL or Unix, or whatever your data processing platform uses

  • Implement the process and start to track the issues discovered
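To make the rules above concrete, here is a minimal sketch in Python of what a rule-based monitoring check might look like. The record field names, delimiters and function names are illustrative assumptions, not a prescribed implementation - any data quality tool or scripting platform you already use can express the same checks:

```python
import re
from datetime import datetime

# Formatting rule: order number must match NN-LLLL-NN (two digits,
# four letters, two digits). The pattern is an assumption based on
# the example rule in the text.
ORDER_FORMAT = re.compile(r"^\d{2}-[A-Z]{4}-\d{2}$")

def check_record(record):
    """Return a list of rule violations for a single inbound record."""
    issues = []
    # Completeness: name, address and order number must be populated
    for field in ("customer_name", "address", "order_number"):
        if not record.get(field):
            issues.append(f"completeness: {field} is empty")
    # Formatting: order number must match NN-LLLL-NN
    order = record.get("order_number", "")
    if order and not ORDER_FORMAT.match(order):
        issues.append(f"formatting: order_number '{order}' not NN-LLLL-NN")
    # Overloading: only one part code per record (no delimiters in the field)
    part = record.get("part_code", "")
    if any(sep in part for sep in (",", ";", "|")):
        issues.append(f"overloading: multiple part codes in '{part}'")
    return issues

def check_delivery_time(delivered_at):
    """Timeliness: data must arrive between 9am and 10am on a weekday."""
    on_time = delivered_at.weekday() < 5 and 9 <= delivered_at.hour < 10
    return [] if on_time else ["timeliness: delivery outside 9-10am weekday window"]
```

Run every inbound file through checks like these and log the violations; the resulting issue log is exactly the evidence file you will need when approaching the supplier in the next step.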

Review the impact of any data defects and request a more robust ‘Data Quality SLA’ with your data supplier:

  • Run this process for at least a month, discovering issues and fixing manually where applicable

  • Create a comprehensive file of all the issues found and the impacts they’re having in your organisation

  • Approach the data supplier outlining your findings, the innovations you have made and the issues you are picking up

  • If no SLA or formal definitions and agreements around data quality exist, work with your stakeholder and the supplier to get something in place

  • Agree to share your technology and approach with the supplier so that they can improve their data quality (chances are it’s being supplied to other companies too)

  • Work together to iteratively create the most robust data quality firewall and SLA process possible


Sample Sections of a Data Quality SLA

Please note: this information is provided without prejudice, merely as an introductory guide. It is not intended as legal advice; always seek professional assistance when creating legal documentation.

There are hundreds of templates for an SLA on the web but here is one example of how you could apply it to data quality:

1. Introduction

This Service Level Agreement (SLA) documents the agreed provision of service for the supply of data by [Supplier] to [Receiver]. This document provides a legally binding contract that stipulates the service and quality levels that will be enforced during the term of the service agreement between both parties.

2. Parties to the Agreement

Lists the people who have reviewed and approved the SLA.

3. Scope

High-level outline of the data and services to be provided.

4. Term

Indicate the start and end date of the SLA.

5. Conventions

List all the standards, terms and definitions that are referenced throughout the SLA. For example:

  • “Office Hours” refers to 9am to 5pm GMT.

  • “Working Day” refers to Monday to Friday except for UK designated holidays.

  • “CSV” refers to “Comma Separated Values”, a raw text file format as described in this open standard: http://mastpoint.curzonnassau.com/csv-1203/csv-1203.pdf

  • etc…

6. Service Levels

Data Delivery Schedule: Indicates how frequently data should be supplied, e.g. every working day between 9am and 10am GMT.

Data Delivery Specification: How will the data be structured? Which fields are mandatory? What standard data types will exist? Which CSV standard must be adhered to? What standards for identifiers and codes must be followed? How will the data be encrypted and packaged, e.g. as a zip file? What is the data standard and specification for attributes A, B, C, …, n?

Data Delivery Process: High-level diagram outlining the steps required for data delivery.

Escalation Process: When issues are found, how will the supplier be notified, what is expected of them?

7. Service Level Targets and Penalties

Describe each service level, its measure and its target. For example:

  • Measure: Timeliness

  • Description: Data must be supplied within the approved data delivery schedule

  • Target: 95% annual delivery within the approved data delivery schedule, 97% within approved data delivery schedule + 30 minutes, 100% within approved data delivery schedule + 60 minutes

  • Penalties: £10 for every minute delay up to 60 minutes. Past 60 minutes £1000 per hour delay.
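As a worked example, the penalty schedule above could be computed as follows. This is a sketch only; a real SLA would define the exact rounding rules (the treatment of partial hours here is an assumption):

```python
def timeliness_penalty(delay_minutes):
    """Penalty per the example schedule: £10 for every minute of delay up
    to 60 minutes, then £1000 per started hour of further delay.
    Rounding of partial hours is an illustrative assumption."""
    if delay_minutes <= 0:
        return 0
    if delay_minutes <= 60:
        return 10 * delay_minutes
    # Ceiling division: each started hour beyond the first 60 minutes
    extra_hours = -(-(delay_minutes - 60) // 60)
    return 10 * 60 + 1000 * extra_hours
```

So a 30-minute delay costs £300, a full hour £600, and anything into a second hour adds £1000 per hour started. Quantifying the clause like this makes it easy to report accrued penalties from the same monitoring process that detects the late deliveries.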

8. Rewards

Outline any reward structures that the supplier receives for meeting their SLA targets.

9. Points of Contact

List of key personnel involved in the service from both parties.

10. Signatures

List of authorised signatures to make the agreement binding.

11. Appendix, Glossary and References

Provide additional resources such as sample file extracts, detailed glossary and links to other standards and external information that is relevant to the SLA.
