Master Data Management Overview

Overview

Master data management is the process of finding or creating a single record that contains everything your organization needs to know about a particular person, location, product, supplier, business, or other entity. This record is referred to as the surviving record for the entity (it is also called the master record or golden record). The goal of master data management is to define exactly one master record for each entity that is important to your business. An entity is the core element that business processes use in master data management: a single customer, product, site, account, or any other data element that is used within your business systems.

A record, however, is not the same as an entity. Across your enterprise, you might have many records that relate to a single entity. For example, you might have records for the same customer in your purchasing, ordering, fulfillment, marketing, and analysis systems, as well as duplicate records for that customer within a single system. Master data management identifies the records that relate to a single entity and creates one record per entity that contains all of the required data for that entity. The records that relate to an entity are referred to as contributors to that entity (in some cases, they are also called members of that entity).

A data source (or simply, source) is a table from which contributing records are drawn. To be used in master data management, each record in a data source must have a key value that is unique for that source. Key values do not need to be unique across sources, nor do identical key values from multiple sources need to refer to the same entity.
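The key rule above can be illustrated with a short Python sketch. It is not part of DataFlux Data Management Studio; the `RecordRef` type and the `check_source_keys` function are hypothetical names chosen for illustration. The sketch identifies each contributing record by the pair (source, key), so a key value only has to be unique within its own source, and the same key value in two sources produces two distinct record references.

```python
from typing import NamedTuple


class RecordRef(NamedTuple):
    """Hypothetical identity of a contributing record: (source, key).

    Keys only need to be unique within one source, so the source name is
    part of the identity.
    """
    source: str
    key: str


def check_source_keys(source: str, keys: list[str]) -> list[RecordRef]:
    """Return record references for one source, rejecting duplicate keys."""
    seen: set[str] = set()
    refs: list[RecordRef] = []
    for key in keys:
        if key in seen:
            raise ValueError(f"duplicate key {key!r} in source {source!r}")
        seen.add(key)
        refs.append(RecordRef(source, key))
    return refs


crm = check_source_keys("crm", ["1001", "1002"])
billing = check_source_keys("billing", ["1001", "A-7"])

# Same key value, different sources: two distinct records, which may or may
# not turn out to belong to the same entity.
print(crm[0] == billing[0])  # False
```

Whether crm 1001 and billing 1001 describe the same entity is decided later by clustering, not by the key values.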

Clustering is the process of matching contributing records to the entities that they represent. It produces clusters, which are groups of records that have been identified as representing the same entity. To determine whether two records belong in the same cluster, DataFlux Data Management Studio uses the clustering criteria defined for that entity. Each clustering criterion is a set of rules that defines the data elements that must match for two records to cluster. For example, a PERSON entity might cluster when first name, last name, and address all match. You can define multiple clustering criteria for each entity: in addition to clustering by name and address, your PERSON entity might also cluster when the ten-digit phone number matches or when the e-mail address matches.
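The following Python sketch shows one way that multiple clustering criteria can combine: two records cluster when all of the fields in any one criterion match. It is a simplified illustration, not the algorithm that DataFlux Data Management Studio uses. It compares raw values exactly, skips a criterion for a record that lacks any of that criterion's fields, and assumes that matches are transitive (implemented with a union-find structure), so records that match through a shared third record end up in the same cluster. The `CRITERIA` list, the `cluster` function, and the sample records are hypothetical.

```python
from collections import defaultdict

# Hypothetical clustering criteria for a PERSON entity. Each tuple is one
# criterion; two records cluster when ALL fields in ANY one tuple match.
CRITERIA = [
    ("first_name", "last_name", "address"),
    ("phone",),
    ("email",),
]


def cluster(records, criteria):
    """Group record IDs into clusters using exact matches on each criterion."""
    parent = {rid: rid for rid in records}

    def find(rid):
        # Follow parent links to the cluster root (with path halving).
        while parent[rid] != rid:
            parent[rid] = parent[parent[rid]]
            rid = parent[rid]
        return rid

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for fields in criteria:
        # Bucket records by their values for this criterion's fields.
        buckets = defaultdict(list)
        for rid, record in records.items():
            values = tuple(record.get(f, "") for f in fields)
            if all(values):  # a record missing any field cannot match on this criterion
                buckets[values].append(rid)
        # Every record in a bucket joins the first record's cluster.
        for ids in buckets.values():
            for other in ids[1:]:
                union(ids[0], other)

    clusters = defaultdict(set)
    for rid in records:
        clusters[find(rid)].add(rid)
    return list(clusters.values())


RECORDS = {
    "r1": {"first_name": "Ann", "last_name": "Lee", "address": "1 Main St",
           "phone": "5551234567", "email": "ann@example.com"},
    "r2": {"first_name": "Ann", "last_name": "Lee", "address": "1 Main St",
           "phone": "", "email": ""},
    "r3": {"first_name": "A.", "last_name": "Lee", "address": "9 Oak Ave",
           "phone": "5551234567", "email": ""},
    "r4": {"first_name": "Bob", "last_name": "Ray", "address": "2 Elm Rd",
           "phone": "5559876543", "email": "bob@example.com"},
}

for members in cluster(RECORDS, CRITERIA):
    print(sorted(members))
```

Here r1 and r2 cluster on name and address, and r1 and r3 cluster on phone number. Records r2 and r3 share no criterion, but because each matches r1, all three land in one cluster, while r4 stays on its own.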

Finally, entity resolution is the process by which a user views clusters and creates or accepts the master records related to those clusters. A data steward is a person with the authority to approve changes to clustering criteria or to selected clusters.
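When a master record is created for a cluster, each of its fields has to be chosen from the contributing records. The Python sketch below uses one hypothetical survivorship rule, keeping the most frequent non-empty value for each field, to show what that step involves. The rule and the `build_master_record` function are assumptions for illustration only; the rules that DataFlux Data Management Studio applies, or that a data steward chooses, can differ.

```python
from collections import Counter


def build_master_record(cluster_records, fields):
    """Build one master record from a cluster's contributing records.

    Hypothetical survivorship rule: for each field, keep the most frequent
    non-empty value. Ties go to the value that appears first in the cluster.
    Fields that are empty in every contributor are left out.
    """
    master = {}
    for name in fields:
        values = [r[name] for r in cluster_records if r.get(name)]
        if values:
            # Counter.most_common orders equal counts by first occurrence.
            master[name] = Counter(values).most_common(1)[0][0]
    return master


contributors = [
    {"first_name": "Ann", "last_name": "Lee", "phone": "", "email": "ann@example.com"},
    {"first_name": "Ann", "last_name": "Lee", "phone": "5551234567", "email": ""},
    {"first_name": "A.", "last_name": "Lee", "phone": "5551234567", "email": "ann@example.com"},
]

print(build_master_record(contributors, ["first_name", "last_name", "phone", "email"]))
```

No single contributor above is complete, but the resulting master record has a value for every field, which is the point of consolidating a cluster into one surviving record.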

You can understand master data management better if you consider the following questions:

Why Perform Master Data Management?
What Can Be an Entity?
What Master Data Management Technologies are Available from SAS?

Why Perform Master Data Management?

Master data management enables you to relate data from across your enterprise to get a complete view of each entity. Too often, data exists in independent silos that do not share information. To get a complete view of an entity, you need to either consolidate everything you know about it into a single record or relate a record in one data source to the records for the same entity in other sources. For example, suppose a bank has customer data in its mortgage system, business accounts system, credit card system, and personal checking system. Master data management enables the bank to relate the customer data across all of these systems to determine how much business each individual customer generates.
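Once the records from each system have been related to entities, the bank's question becomes a simple aggregation. In the Python sketch below, the cross-reference from (system, account key) to entity ID and the balance figures are hypothetical sample data standing in for the output of a clustering process; the `total_by_entity` function is likewise an illustrative name. The sketch only shows why relating the records is the step that makes a per-customer total possible.

```python
from collections import defaultdict

# Hypothetical clustering output: each (system, account key) maps to an entity ID.
XREF = {
    ("mortgage", "M-17"): "E1",
    ("credit_card", "4410"): "E1",
    ("checking", "C-88"): "E1",
    ("business", "B-02"): "E2",
    ("checking", "C-91"): "E2",
}

# Hypothetical balances, in whole dollars, keyed the same way as XREF.
BALANCES = {
    ("mortgage", "M-17"): 250_000,
    ("credit_card", "4410"): 3_200,
    ("checking", "C-88"): 5_400,
    ("business", "B-02"): 120_000,
    ("checking", "C-91"): 800,
    ("checking", "C-99"): 1_500,  # not yet related to any entity
}


def total_by_entity(xref, amounts):
    """Sum amounts per entity; records with no entity are reported separately."""
    totals = defaultdict(int)
    unmatched = []
    for ref, amount in amounts.items():
        entity = xref.get(ref)
        if entity is None:
            unmatched.append(ref)
        else:
            totals[entity] += amount
    return dict(totals), unmatched


totals, unmatched = total_by_entity(XREF, BALANCES)
print(totals)
print(unmatched)
```

Without the cross-reference, each system could only report totals for its own accounts; with it, customer E1's mortgage, credit card, and checking balances roll up into a single figure.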

What Can Be an Entity?

Any type of data that is important to your business and is not transactional in nature can be a master data entity type. In master data management, you can create a new entity type or modify an existing entity type in the Entity Definition Editor.

An entity type is defined by three things:

Attributes - The data elements that the entity uses. For example, a Person entity might have First Name, Last Name, Address, City, State, Postal Code, Phone Number, and E-mail Address as its attributes.
Standardizations - To apply SAS data cleansing and matching technology in your master data management process, you can define how attributes are cleansed and which match codes are generated from them. Clustering can then be performed on standardized fields or match codes rather than on raw data. This greatly improves clustering accuracy, because values that differ only in case, formatting, or minor spelling variations can still match.
Clustering Criteria - For each entity type, you select one or more sets of fields that must match in order to identify records that belong in the same cluster.

All of these can be edited in the Entity Definition Editor.
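The three parts of an entity type can be pictured as a simple data structure. The Python sketch below is a hypothetical model, not the format that the Entity Definition Editor uses: `EntityType` and the `std_*` functions are illustrative names, and the standardizers are simple string rules standing in for SAS standardization definitions and match codes, which are considerably more sophisticated. It shows how clustering on standardized values lets records match even when their raw values differ.

```python
import re
from dataclasses import dataclass, field
from typing import Callable

Standardizer = Callable[[str], str]

# Hypothetical abbreviation table; real standardization rules are far richer.
_ABBREVIATIONS = {"street": "st", "avenue": "ave", "road": "rd"}


def std_text(value: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace."""
    value = re.sub(r"[^\w\s]", "", value.lower())
    return " ".join(value.split())


def std_address(value: str) -> str:
    """Normalize text, then map common street-type words to abbreviations."""
    return " ".join(_ABBREVIATIONS.get(w, w) for w in std_text(value).split())


def std_phone(value: str) -> str:
    """Keep digits only, so '(555) 123-4567' and '555.123.4567' compare equal."""
    return re.sub(r"\D", "", value)


@dataclass
class EntityType:
    name: str
    attributes: list[str]
    standardizations: dict[str, Standardizer] = field(default_factory=dict)
    clustering_criteria: list[tuple[str, ...]] = field(default_factory=list)

    def standardize(self, record: dict[str, str]) -> dict[str, str]:
        """Apply each attribute's standardizer; attributes without one pass through."""
        out = {}
        for attr in self.attributes:
            raw = record.get(attr, "")
            fn = self.standardizations.get(attr)
            out[attr] = fn(raw) if fn else raw
        return out

    def matches(self, a: dict[str, str], b: dict[str, str]) -> bool:
        """True if all (non-empty) fields of any one criterion match after standardizing."""
        sa, sb = self.standardize(a), self.standardize(b)
        return any(
            all(sa[f] and sa[f] == sb[f] for f in criterion)
            for criterion in self.clustering_criteria
        )


PERSON = EntityType(
    name="PERSON",
    attributes=["first_name", "last_name", "address", "phone"],
    standardizations={
        "first_name": std_text,
        "last_name": std_text,
        "address": std_address,
        "phone": std_phone,
    },
    clustering_criteria=[("first_name", "last_name", "address"), ("phone",)],
)

a = {"first_name": "Ann", "last_name": "Lee", "address": "123 Main Street", "phone": ""}
b = {"first_name": "ANN", "last_name": "Lee", "address": "123 MAIN ST.", "phone": ""}
print(a["address"] == b["address"])  # False: the raw values differ
print(PERSON.matches(a, b))          # True: the standardized values match
```

Keeping the standardizations and criteria in the entity definition, rather than in each comparison, means that every record is cleansed and compared the same way.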

What Master Data Management Technologies are Available from SAS?

DataFlux Data Management Studio has two main technologies for working with master data:

Entity Resolution - Enables you to use an entity resolution node to write the results of a clustering job to an entity resolution file. DataFlux Data Management Studio then provides an Entity Resolution Editor that enables a user to edit the entity resolution file and identify the surviving record. For more information about this approach, see Overview of Entity Resolution.
SAS Master Data Foundations - Provides an intermediate step between Entity Resolution and the full capability of SAS MDM. It offers a metadata-defined, repeatable process for master data management on a small scale. SAS Master Data Foundations can serve as a test environment for companies that are considering master data management for their enterprise. For companies with few data sources and only one or two data stewards, SAS Master Data Foundations might provide all of the master data management capabilities they need.

In addition to the technologies in DataFlux Data Management Studio, SAS® MDM provides an enterprise-focused, web-based tool for performing master data management on multiple entity types across your enterprise. It is the most powerful, flexible, and feature-rich implementation of master data management that SAS offers, but it might be more than some customers require. For more information about SAS MDM, contact your SAS representative.

The following table compares the capabilities of these master data management methods:

Capability	Entity Resolution	SAS Master Data Foundations	SAS MDM
Multiple data sources	Y	Y	Y
Editable entity definitions	Y	Y	Y
Entity resolution interface	Y	Y	Y
Defined load/clustering processes	N	Y	Y
Loading new data into existing project	N	Y	Y
Editable clustering criteria	N	Y	Y
Web-based interface	N	N	Y
Relationship support	N	N	Y
Entity modification after creation	N	N	Y
Split and merge clusters	Y	N	Y
Workflow integration	N	N	Y
Persistence of hub clustering rules	N	Y	Y
Multiple entities in a master data project	N	N	Y