In our last post, I wrote a little about time and relationships. We noted that different kinds of master data have relationships with prior versions of themselves. Relationships enable you to see how prior versions have evolved into new ones.
This entry in our MDM Facets series focuses on time in relation to versioning. In many of our initial discussions with prospects and customers, they focus on MDM as a source of accurate and up-to-date information. But as our discussions unfold, people often realize that values for hierarchies, identifiers and attributes are valid only for specific periods of time. Today we cover some of the questions to think about as you consider time in the context of master data management.
How are validity periods tracked?
Just because a piece of master data isn’t currently valid does not mean that it was never valid. Consider the different identifier, same party problem. We often see this when someone’s name changes. For example, if we wanted to view the entire athletic history of Muhammad Ali, we would need to know that records from before March 6, 1964 are filed under Cassius Clay and must be included.
Different identifier, same party
If you squint hard enough, you’ll notice that the key-evolution problem (as described in relationships) is really a more general case of the different identifier, same party problem. However, instead of one key expanding into multiple new keys or multiple old keys combining into one, the different identifier, same party problem has a one-for-one relationship between the parties across time periods.
Validity periods are easier to see, but a bit more difficult to track, when you have the same identifier, different hierarchy problem. For example, consider a situation where your company changes the definition of the NE USA sales territory between 2011 and 2012. This change is harder to spot because there are no outward indications that the constituents of NE USA have changed.
Same identifier, different hierarchy (SIDH)
If you plan on using your MDM in conjunction with business intelligence tools, you’ll want to be aware that the territory shrank (lest you blame the sales team for underperforming). The complication is that not all MDM platforms track these kinds of change easily, so you’ll want to make sure you test for this problem.
One final comment: you’ll notice that these kinds of problems really can’t be dealt with using traditional data quality tools. How would you manage to match Cassius Clay and Muhammad Ali? How do you test for a compositional change if the order management system lists territory as “NE USA” before and after Jan 1, 2012? These kinds of issues require an MDM system that can build the relationship and temporal structures that accommodate this information. And that tees up the next couple of questions.
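To make the territory problem concrete, here is a minimal sketch of how a validity-aware hierarchy lets you test for a compositional change. The state lists are invented for illustration (the post does not say which constituents moved), and the function names are hypothetical rather than taken from any MDM product.

```python
from datetime import date

# Hypothetical membership of the "NE USA" territory; the states are invented
# for illustration. Each entry is (valid_from, valid_to_exclusive, members).
NE_USA_VERSIONS = [
    (date(2011, 1, 1), date(2012, 1, 1), frozenset({"CT", "MA", "NJ", "NY", "PA"})),
    (date(2012, 1, 1), date.max, frozenset({"CT", "MA", "NY"})),
]


def members_as_of(day):
    """Return the territory's constituents on a given day."""
    for valid_from, valid_to, members in NE_USA_VERSIONS:
        if valid_from <= day < valid_to:
            return members
    raise LookupError(f"no definition of NE USA for {day}")


def composition_change(day_a, day_b):
    """Constituents dropped and added between two dates, each sorted."""
    before, after = members_as_of(day_a), members_as_of(day_b)
    return sorted(before - after), sorted(after - before)
```

A BI report comparing 2011 and 2012 sales could call `composition_change` first and flag the comparison whenever either list comes back non-empty.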
What’s the strategy for dealing with ‘valid periods’?
There are a couple of strategies for dealing with temporal data. One of the most basic is to attach begin and end dates to each record in your dataset. These date qualifiers represent when the information was valid. With qualifiers, it becomes possible to combine everything into one big table. This particular approach can be useful for code lists, especially if you want to keep the past and present codes all together in one place.
If you don’t have much data to manage and your structures are fairly simple, “flattening” the time dimension is a decent option. The problem arises when you start to deal with more complex models with intra/inter-domain relationships and relationships across time. It becomes challenging to follow all the references and make sure you have all the bits and pieces (from the right vintage) in one logical view. It becomes a little bit like trying to visualize three-dimensional objects with only two dimensions.
Instead of “flattening” the time dimension, some systems allow you to maintain complete editions (aka versions or snapshots) that contain all the valid information for a specific validity period. With an edition approach, you have all the data, from the right vintage, in one complete, isolated set.
Now, it is important to understand that multiple technical approaches can be used to handle editions. Some vendors require a separate historical platform, others create complete copies of the information, some use logs to re-create history and still others use changesets and references. As you can tell, these approaches range from very to barely resource intensive. Lightweight solutions are much more likely to be maintained, which means you’re much more likely to have the set of versions you need, when you need them. Of course, the best solution would be to have this behavior built directly into the system.
One last comment on editions. I think there are at least two main ways of making an edition. There’s the on-change style, where every major change drives the creation of a new snapshot. And there’s the periodic, or monotonic, style, where editions are created at set time periods. I think the choice between on-change and periodic depends on the kind of data you’re mastering. Cases where updates are required immediately are a good fit for on-change. Periodic editioning is well suited to changes that are scheduled in advance and then implemented. For example, HR and finance data lends itself very well to the periodic approach because most companies tend to announce corporate changes around reporting period boundaries (months, quarters, years).
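As a rough illustration of the date-qualifier approach, the sketch below keeps past and present codes in one flat table and filters by date. The codes, labels, and dates are invented, and treating the end date as exclusive is an assumption of this example, not something every system does.

```python
from datetime import date

# One flat table of codes; begin/end dates qualify when each row was valid.
# All values are invented. end=None means "still valid"; end is exclusive.
CODE_LIST = [
    {"code": "A1", "label": "Retail", "begin": date(2005, 1, 1), "end": date(2011, 1, 1)},
    {"code": "A1", "label": "Retail & Web", "begin": date(2011, 1, 1), "end": None},
    {"code": "B7", "label": "Wholesale", "begin": date(2005, 1, 1), "end": date(2009, 1, 1)},
]


def valid_on(day):
    """Map each code to its label, using only the rows valid on the given day."""
    return {
        row["code"]: row["label"]
        for row in CODE_LIST
        if row["begin"] <= day and (row["end"] is None or day < row["end"])
    }
```

Every query against the table has to carry the date filter, which is manageable for a code list like this one.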
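To show why the changeset style sits at the lightweight end of that range, here is a small sketch that stores one baseline plus only the differences for each later edition, and rebuilds a complete edition on demand. The edition names, keys, and values are hypothetical, and real platforms will differ in how they store and replay changes.

```python
# A baseline plus one changeset per edition. Edition names, keys, and
# values are all hypothetical.
BASELINE = {"NE USA": ["CT", "MA", "NJ", "NY", "PA"], "SE USA": ["FL", "GA"]}

CHANGESETS = [
    ("2011", {}),                              # first edition: baseline as-is
    ("2012", {"NE USA": ["CT", "MA", "NY"]}),  # only the changed key is stored
    ("2013", {"SE USA": None}),                # None marks a deletion
]


def materialize(edition):
    """Rebuild the complete data set for one edition by replaying changesets."""
    data = dict(BASELINE)
    for name, changes in CHANGESETS:
        for key, value in changes.items():
            if value is None:
                data.pop(key, None)
            else:
                data[key] = value
        if name == edition:
            return data
    raise KeyError(edition)
```

The full-copy alternative would store the whole dictionary three times; here the 2012 and 2013 editions each cost one entry.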
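The two styles boil down to two different triggers for cutting an edition. The sketch below is one possible reading, assuming calendar quarters as the reporting period and leaving the definition of a “major” change to a caller-supplied function; both function names are made up for the example.

```python
from datetime import date


def quarter_start(day):
    """First day of the calendar quarter containing `day`."""
    first_month = 3 * ((day.month - 1) // 3) + 1
    return date(day.year, first_month, 1)


def needs_periodic_edition(last_edition_day, today):
    """Periodic style: cut one edition per quarter, whatever has changed."""
    return quarter_start(today) > quarter_start(last_edition_day)


def needs_on_change_edition(pending_changes, is_major):
    """On-change style: cut an edition as soon as any pending change is major."""
    return any(is_major(change) for change in pending_changes)
```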
Master data for corporate functions tends to be confidential, which leads us to our final question for today…
How important is isolation?
Mergers, acquisitions and divestitures are corporate actions that affect things like the organization chart, chart of accounts, and the legal entity hierarchy of a company. The impact of change on the organization should be analyzed, modeled and approved well in advance of the announcement. After all, you want the change to move along smoothly. Staff tend to move into a holding pattern if they hear that the management team will be “working out the details” over the next quarter.
This means that we need to be concerned about future versions of master data in addition to the past and present. Given our strategies for dealing with valid periods, the confidential nature of the information rules out flattened approaches, since the past, present and future information would all exist simultaneously in the same space, and you don’t want individuals stumbling across a proposed hierarchy before it has been properly communicated to the team. Additionally, internal adjustments to corporate change tend to be negotiated, which means that the finance and HR teams might need to mock up multiple versions of the future. With a flattened approach and date as the only dimension, you lose the ability to create multiple possible futures. All these factors mean that the edition approach is much more appropriate for handling future data, or ANY data that needs to be isolated from production.
These multiple versions of reality also exist in the “present” (though not exactly as versions). For example, marketing, sales and legal/risk might want to view a large B2B customer using hierarchies based on their business context. And that will be the topic of our next entry.
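Here is a small sketch of what isolation and multiple futures can look like with editions. Everything in it is hypothetical: the `Edition` class, the reader names, and the reporting lines are invented to illustrate the idea, and a real platform would enforce access through its own security model.

```python
from dataclasses import dataclass, field


@dataclass
class Edition:
    name: str
    data: dict                                  # e.g. department -> reports-to
    readers: set = field(default_factory=set)   # who may see this edition

    def branch(self, name, readers, changes):
        """Create an isolated copy with changes applied; the parent is untouched."""
        return Edition(name, {**self.data, **changes}, set(readers))


def visible_editions(editions, user):
    """Names of the editions a given user is allowed to see."""
    return [e.name for e in editions if user in e.readers]


# Production is visible to everyone; the two proposed futures only to planning.
production = Edition("production", {"Sales": "COO", "Support": "COO"}, {"analyst", "hr_planner"})
proposal_a = production.branch("proposal-A", {"hr_planner"}, {"Support": "CTO"})
proposal_b = production.branch("proposal-B", {"hr_planner"}, {"Sales": "CRO"})
```

Two competing futures coexist without touching production, which a single table with date as its only dimension cannot express.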
By Conrad Chuang