If you watch the replay you'll see that it was much more of a conversation about data governance, its challenges, and ways to scale. And of course having Nicola on the webinar was great: she's passionate about data governance and has plenty of anecdotes and solutions for the common problems you'll encounter.
There were several questions we couldn't get to on the webinar, and over the past couple of weeks several more have come in via email. We've decided to use these blog posts to follow up on those questions. (To ask questions of your own, please feel free to contact us.)
What suggestions do you have for measuring data accuracy? Completeness and time are only two of the dimensions used to measure data.
Conrad Chuang answers:
There are six core dimensions of data quality. In fact, DAMA UK wrote an entire whitepaper on data quality dimensions, which you can find here (DAMA UK DQ Dimensions White Paper R3 7).
The six described in the whitepaper are:
Completeness - are all the data sets and data items recorded?
Consistency - can we match the data set across data stores?
Uniqueness - is there a single view of the data set?
Validity - does the data match the rules?
Accuracy - does the data reflect the "real world" object or event?
Timeliness - the degree to which the data represents reality from the required point in time.
Now, understanding accuracy (and its definition) will help us figure out how best to measure accuracy:
Accuracy: The degree to which data correctly describes the "real world" object or event being described.
The whitepaper then adds a short discussion of that definition:
Ideally the "real world" truth is established through primary research. However, as this is often not practical, it is common to use 3rd party reference data from sources which are deemed trustworthy and of the same chronology.
What this means is that accuracy is often tied to reference data management. Measuring accuracy requires comparing your data values against a trustworthy source managed as reference data. While this seems to argue for a single canonical set, remember that not every use case in your organization has the same context, since they're not all describing the same "real world" object or event.
Imagine that your marketing and compliance teams both need accurate country codes. Without context, a standard such as ISO 3166 seems like an obvious choice. But things get murkier once you realize that marketing wants the current country of residence (current countries only) while compliance needs the country of birth (all current and past countries). For example, in 2016 it's inaccurate to have a current country of PZ (the Panama Canal Zone), but it's 100% accurate to have a birth country of PZ.
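The idea above can be sketched in a few lines of Python. This is a minimal illustration, not a real implementation: the reference sets are tiny hand-picked fragments (a full solution would load the complete ISO 3166 lists from your reference data management tool), and the `accuracy` function is just a conformance ratio.

```python
# Context 1 (marketing): current country of residence - active codes only.
# Illustrative fragment, not the full ISO 3166-1 list.
CURRENT_COUNTRIES = {"US", "GB", "FR", "PA"}

# Context 2 (compliance): country of birth - active plus historical codes.
# "PZ" (Panama Canal Zone) is no longer a current country code.
BIRTH_COUNTRIES = CURRENT_COUNTRIES | {"PZ", "SU", "YU"}

def accuracy(values, reference):
    """Share of values that conform to the trusted reference set."""
    if not values:
        return 0.0
    return sum(v in reference for v in values) / len(values)

records = ["US", "PZ", "FR"]

# The same data scores differently depending on the governance context.
marketing_score = accuracy(records, CURRENT_COUNTRIES)   # PZ fails: 2 of 3
compliance_score = accuracy(records, BIRTH_COUNTRIES)    # PZ passes: 3 of 3
```

The point is that "accuracy" is not a property of the data alone; the score only means something once governance has decided which reference set applies to which context.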
Sorting out the contexts and policy and then assembling the set of reference data (which you’ll use to measure your data’s conformance) is the job of your data governance and data management teams. Evaluating your data’s conformance against that reference can help you assess accuracy and might be a path to measurement.
All that said … how much does accuracy matter?
For financial facts and figures accuracy matters a lot. However, for many master and reference data attributes and dimensions, consistency is much more important.
In fact, Nicola tells a pretty humorous story about a client who decided to use reserved ISO 3166 codes to describe regions around the world. While this could cause headaches (especially if ISO chooses to activate a reserved code), those headaches can be avoided if the organization has deliberate data governance and data management measures in place to document this inaccurate, but consistent, use. It does pose an interesting question: is inaccuracy acceptable if everyone is consistent in their inaccurate use?
If you'd like to learn more about data governance, data governance training, and Nicola, visit The Data Governance Coach. And if you'd like to learn more about Orchestra Networks' offering, please visit our DG landing page.