Conrad Chuang

Key Master Data Facets: Relationships

In a previous post on modeling I briefly touched on relationships when describing different data model patterns. In this post I'll be expanding on the topic.

Relationships define how the different elements in your master data connect to each other. Some of those connections are relationships with attributes (a customer to gender); some represent hierarchies (a product to product line, a subordinate to a manager); still others are connections between domains (suppliers to products).

MDM expert Andy Hayler, whom we'll be speaking to on the 19th of September, points out that companies often neglect the inter-domain, or multi-domain, relationships when starting their MDM programs. He thinks this could be because of the way MDM programs are executed. Because most firms start with a pilot project in a smaller area of the business, the multi-domain aspects don't appear until they try to scale beyond the pilot. Beyond multi-domain, what are some of the aspects of relationships that you should be aware of?

How complex are the relationships between the entities in your domain?

The level of interconnectedness between the entities in your domain drives complexity. This is easy to see in projects where you're dealing with HR, finance, chart of accounts, or even corporate hierarchies. But what about something like customers? Well, consider these questions:

  • Do you target businesses or consumers?

  • How heavily "networked" or interconnected are your customers to each other?

  • Do those customer hierarchies matter?

There is a very large difference between managing the legal entity hierarchy of a large company (and its five thousand subsidiaries) vs. the flat, atomic records of retail customers. If you're dealing with B2B customers, the MDM platform you choose needs to be able to deal with the different kinds of legal vehicles available to the modern corporation along with all their attributes and interlocking relationships. For example, overseas subsidiaries typically have multiple parents with varying ownership percentages and rights. Unfortunately, many MDM platforms got their start in the B2C data integration world. What that means is that some retrofitting (and perhaps hand coding of models and UIs) needs to be employed to deal with complex relationships.

This is why it's important to ask if these hierarchies matter in your use case. In some cases, hierarchies may exist between entities, but for your business they're not important. To wit, a health care provider might choose not to link a family's patient records together for privacy reasons. In contrast, some firms need to maintain customer hierarchies for risk reasons. Many of our customers, especially those in financial services, want to know total exposure to a name, like United Technologies. And this requires adding up all the exposures to that ultimate parent's subsidiaries (such as Pratt and Whitney, Carrier, Otis, and their children). In fact, in a recent Aite Group Study on Legal Entities Data Services, counterparty risk was one of the top reasons for focusing on legal entities.
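The roll-up described above can be sketched in a few lines. This is only an illustration under simplifying assumptions: every entity has a single parent (so multi-parent ownership percentages are ignored), and the `children` links, the extra `Carrier Subsidiary A` entity, and all exposure figures are made up for the example.

```python
# Hypothetical parent -> children links for one legal entity hierarchy.
children = {
    "United Technologies": ["Pratt and Whitney", "Carrier", "Otis"],
    "Carrier": ["Carrier Subsidiary A"],  # invented child, for illustration only
}

# Made-up direct exposures per legal entity (any currency unit).
direct_exposure = {
    "United Technologies": 10.0,
    "Pratt and Whitney": 25.0,
    "Carrier": 5.0,
    "Carrier Subsidiary A": 2.5,
    "Otis": 7.5,
}


def total_exposure(entity):
    """Direct exposure to an entity plus exposure to all of its descendants."""
    total = direct_exposure.get(entity, 0.0)
    for child in children.get(entity, []):
        total += total_exposure(child)
    return total


ultimate_parent_exposure = total_exposure("United Technologies")
carrier_exposure = total_exposure("Carrier")
print(ultimate_parent_exposure, carrier_exposure)
```

With multiple parents, a roll-up like this would also need ownership weights and a rule for avoiding double counting, which is exactly the kind of complexity the platform has to model.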

What about relationships with reference data?

Reference data is often overlooked in MDM projects. But it is master data in its own right, and should have its own lifecycle, governance and management. Often MDM implementation teams will "support" reference data by creating a local copy of the information in lookup tables. Over time, and in the absence of a strategy, the local versions begin to diverge from each other. This can lead to some very real problems. For example, if an insurance company's quoting and underwriting systems access different, divergent sources of location data, it is possible that the insurance quote might differ from the policy premium, creating a customer service issue. Many companies address this problem ex post facto: they use synchronization tools to keep the ever-expanding array of local copies in sync. But this seems like making a complex problem even more complicated.

This is why it's important that your MDM platform has the ability to define cross-domain relationships. Referring to (rather than making a copy of) data from different domains means that each domain can be separately managed in its own lifecycle by individuals who are responsible and accountable for the accuracy of that data. Basically, passing by reference rather than passing by value makes sure that you are always in possession of the most up-to-date data. Additionally, it helps make sure that all your systems are using the same assumptions for decision making.
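To make the by-reference vs. by-value distinction concrete, here is a minimal sketch using the insurance example from the previous section. The territory code `T1`, the rates, and the `premium` function are all hypothetical; the point is only that a local copy goes stale while a stored key always resolves to the governed record.

```python
# Hypothetical governed reference data: one record per territory code.
territories = {"T1": {"name": "Coastal zone", "rate_per_thousand": 12}}

# By value: the quoting system took a local copy of the record.
quoting_copy = dict(territories["T1"])

# By reference: the underwriting system stores only the key.
underwriting_ref = "T1"

# The data steward later updates the governed record.
territories["T1"]["rate_per_thousand"] = 15


def premium(insured_thousands, rate_per_thousand):
    """Toy premium calculation: insured value (in thousands) times the rate."""
    return insured_thousands * rate_per_thousand


# The copy still carries the old rate; the reference sees the update.
quote = premium(100, quoting_copy["rate_per_thousand"])
policy = premium(100, territories[underwriting_ref]["rate_per_thousand"])
print(quote, policy)
```

The quote and the policy premium now disagree, which is the customer service issue described earlier. Had both systems held only the key, they would have resolved to the same rate.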

How will you visualize relationships and hierarchies?

In the two previous sections, we covered the importance of defining both intradomain and interdomain relationships. A common use of these relationships is to create hierarchies. For example, you might use the parent-child relationships of a supplier to see if there are any potential hold-up issues, or examine your customers through an industry classification scheme like SIC, NAICS, or GICS to better focus your marketing message.

Being able to create and visualize hierarchies is a great tool for your business users. It helps them understand the data, and better understanding leads to better adoption and (ideally) project success. This means you'll want to make sure that there's minimal effort required for hierarchy management. How easy is it to create and manage hierarchies? Are hand coding or development efforts required to create hierarchies? Who creates them? What kind of security can be placed on the hierarchies themselves? Can the hierarchies be exported into other downstream systems? Lastly, you may also want to find out if your platform permits use of the hierarchies to manage the master data itself. Can you add, move, and detach nodes (and all their children)? Is it possible to flatten sections of the hierarchy?
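As a rough illustration of what add, move, detach, and flatten mean in practice, here is a toy single-parent hierarchy. The `Hierarchy` class and the product names are hypothetical and are not any particular MDM platform's API; a real platform would add security, versioning, and export on top of operations like these.

```python
class Hierarchy:
    """Minimal single-parent hierarchy stored as child -> parent links."""

    def __init__(self):
        self.parent = {}  # node -> parent (None for a root)

    def add(self, node, parent=None):
        self.parent[node] = parent

    def children(self, node):
        return [n for n, p in self.parent.items() if p == node]

    def descendants(self, node):
        found = []
        for child in self.children(node):
            found.append(child)
            found.extend(self.descendants(child))
        return found

    def move(self, node, new_parent):
        # Moving a node carries its subtree along; refuse moves that create a cycle.
        if new_parent == node or new_parent in self.descendants(node):
            raise ValueError("cannot move a node under itself or its descendants")
        self.parent[node] = new_parent

    def detach(self, node):
        # Remove the node and all of its descendants; return what was removed.
        removed = [node] + self.descendants(node)
        for n in removed:
            del self.parent[n]
        return removed

    def flatten(self, node):
        # Re-attach every descendant directly under the given node.
        for n in self.descendants(node):
            self.parent[n] = node


h = Hierarchy()
h.add("Products")
h.add("Line A", "Products")
h.add("Line B", "Products")
h.add("Widget", "Line A")

h.move("Widget", "Line B")   # Widget now sits under Line B
h.flatten("Products")        # every node now hangs directly off Products
removed = h.detach("Line A")  # Line A has no children left after flattening
print(sorted(h.children("Products")), removed)
```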

Do you need custom adaptations to your hierarchies?

Consider the different business users of your master data. Risk, legal, sales, and marketing all may wish to access data for a customer like General Electric. But what becomes immediately apparent is that these different users may not use the information in the same way.

  • Risk and legal may need to see the official legal hierarchy to determine assumption of liabilities.

  • Sales may want to re-order GE based on sales territories.

  • Marketing may want to create a hierarchy to match their geographically based campaigns.

The context of your business users may drive the need to create adaptations of your hierarchies. There are a whole host of interesting issues wrapped up in this problem, and that's why I plan on blogging on this topic at a future date.

What about time?

Some kinds of master data have relationships between versions. It's important to determine whether you need to maintain inter-temporal relationships and whether your MDM platform can handle them.

It's easy to see these inter-temporal relationships in reference data. For example, between the 2007 and 2012 versions of the North American Industry Classification System (NAICS) you can see examples where a single industry in 2007 became multiple industries in 2012…

For analytical purposes, having these inter-temporal, or crosswalk, relationships is important when you're making period-on-period comparisons. Much like business context, there's a lot more to cover with regard to time, and that is the topic of our next post.
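A crosswalk can be pictured as a mapping from each old code to one or more new codes. The sketch below restates figures reported under an old scheme into a new scheme so that two periods can be compared. The codes (`OLD-100`, `NEW-101`, and so on) are placeholders rather than real NAICS entries, and the allocation shares and revenue figures are invented; how to split a value across successor codes is itself a business assumption.

```python
# Hypothetical crosswalk: old code -> list of (new code, allocation share).
# "OLD-100" is split into two new codes; "OLD-200" maps one-to-one.
crosswalk = {
    "OLD-100": [("NEW-101", 0.75), ("NEW-102", 0.25)],
    "OLD-200": [("NEW-200", 1.0)],
}

# Made-up prior-period figures reported under the old scheme.
revenue_old_scheme = {"OLD-100": 500.0, "OLD-200": 300.0}


def restate(values, crosswalk):
    """Re-express values keyed by old codes in terms of the new codes."""
    restated = {}
    for old_code, amount in values.items():
        for new_code, share in crosswalk[old_code]:
            restated[new_code] = restated.get(new_code, 0.0) + amount * share
    return restated


restated = restate(revenue_old_scheme, crosswalk)
print(restated)
```

Because the shares for each old code sum to one, the restated total matches the original total, which is a useful sanity check before comparing periods.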