DataFlux Data Management Studio 2.6: User Guide
Master data management is the process of finding or creating a single record that contains everything your organization needs to know about a particular person, location, product, supplier, business, or other entity. This record is referred to as the surviving record for an entity. (It can also be referred to as the master record or golden record.) The goal of master data management is the definition of only one master record for each entity that is important to your business. An entity is the core element that is used for business processes in master data management. This entity can be a single customer, product, site, account, or any other data element that is used within your business systems.
However, a record is not an entity. Across your enterprise, you might have many records that relate to a single entity. For example, you may have records for the same customer in many purchasing, ordering, fulfillment, marketing, and analysis systems. You may also have duplicate records for a customer within the same system. Master data management identifies the records that are related to a single entity and creates a single record per entity containing all the required data for that entity. All of the records that relate to an entity are referred to as contributors to that entity. (In some cases, they may also be called members of that entity.)
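The idea of combining contributors into a single record can be illustrated with a minimal Python sketch. It is not how DataFlux Data Management Studio builds surviving records; the function name `build_surviving_record`, the field names, and the "first non-empty value wins" rule are all assumptions chosen for illustration.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[str]]


def build_surviving_record(contributors: List[Record], fields: List[str]) -> Record:
    """Merge contributors into one record.

    Assumed rule (hypothetical): for each field, keep the first
    non-empty value found, in contributor order.
    """
    surviving: Record = {}
    for field in fields:
        surviving[field] = next(
            (rec[field] for rec in contributors if rec.get(field)),
            None,  # no contributor supplied a value for this field
        )
    return surviving


# Two contributing records for the same customer, each missing some data.
contributors = [
    {"name": "Pat Lee", "phone": None, "email": "pat@example.com"},
    {"name": "Patricia Lee", "phone": "9195550100", "email": None},
]

surviving = build_surviving_record(contributors, ["name", "phone", "email"])
```

Here neither contributor is complete on its own, but the merged record carries a name, a phone number, and an e-mail address.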
A data source (or simply, source) is a table from which contributing records are drawn. To be used in master data management, each record in a data source must have a key value that is unique for that source. Key values do not need to be unique across sources, nor do identical key values from multiple sources need to refer to the same entity.
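Because key values are unique only within a source, a contributing record is identified by the combination of its source and its key. The following Python sketch shows one way to check that rule. The names `duplicate_keys` and `contributor_ids`, and the sample source names, are hypothetical and are not part of the product.

```python
from collections import Counter
from typing import Dict, List, Tuple


def duplicate_keys(rows: List[dict], key_field: str) -> List[str]:
    """Return the key values that occur more than once within one source."""
    counts = Counter(row[key_field] for row in rows)
    return sorted(key for key, n in counts.items() if n > 1)


def contributor_ids(sources: Dict[str, List[dict]], key_field: str) -> List[Tuple[str, str]]:
    """Identify every record by (source name, key value).

    Raises ValueError if a key repeats within a single source.
    """
    ids: List[Tuple[str, str]] = []
    for source_name, rows in sources.items():
        dups = duplicate_keys(rows, key_field)
        if dups:
            raise ValueError(f"source {source_name!r} has duplicate keys: {dups}")
        ids.extend((source_name, row[key_field]) for row in rows)
    return ids


# Key "1001" appears in both sources but refers to different people.
sources = {
    "orders": [{"id": "1001", "name": "Pat Lee"}, {"id": "1002", "name": "Sam Roy"}],
    "marketing": [{"id": "1001", "name": "Ana Diaz"}],
}

ids = contributor_ids(sources, "id")
```

The pair `("orders", "1001")` and the pair `("marketing", "1001")` remain distinct identifiers even though the key values are identical.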
Clustering is the process of matching contributing records to the entities that they represent. This process creates clusters, which are groups of records that have been identified as representing the same entity. To determine whether two records belong in the same cluster, DataFlux Data Management Studio uses the clustering criteria defined for that entity. Clustering criteria are a set of rules that define the data elements that must match for two records to cluster. For example, records for a PERSON entity might cluster when first name, last name, and address match. You can have multiple sets of clustering criteria for each entity. In addition to clustering by name and address, records for your PERSON entity might also cluster when the ten-digit phone number matches or when the e-mail address matches.
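The PERSON example can be sketched in Python as follows. This is an illustration under stated assumptions, not the matching logic of DataFlux Data Management Studio: the field names are hypothetical, exact comparison after trimming and lowercasing stands in for whatever matching the product applies, and records are assumed to cluster transitively (if A matches B and B matches C, all three share a cluster).

```python
from typing import Dict, List, Optional, Tuple

# Each tuple is one set of clustering criteria: all of its fields must match.
# Two records cluster if ANY one set matches. Field names are hypothetical.
CRITERIA_SETS: List[Tuple[str, ...]] = [
    ("first_name", "last_name", "address"),
    ("phone",),
    ("email",),
]


def cluster(records: List[Dict[str, Optional[str]]],
            criteria_sets: List[Tuple[str, ...]] = CRITERIA_SETS) -> List[List[int]]:
    """Group record indexes into clusters using a union-find structure."""
    parent = list(range(len(records)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(i: int, j: int) -> None:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[max(ri, rj)] = min(ri, rj)

    for fields in criteria_sets:
        first_seen: Dict[Tuple[str, ...], int] = {}
        for idx, rec in enumerate(records):
            values = tuple((rec.get(f) or "").strip().lower() for f in fields)
            if not all(values):
                continue  # a blank field never counts as a match
            if values in first_seen:
                union(first_seen[values], idx)
            else:
                first_seen[values] = idx

    groups: Dict[int, List[int]] = {}
    for idx in range(len(records)):
        groups.setdefault(find(idx), []).append(idx)
    return sorted(groups.values())


records = [
    {"first_name": "Pat", "last_name": "Lee", "address": "1 Main St",
     "phone": "9195550100", "email": ""},
    {"first_name": "pat", "last_name": "lee", "address": "1 main st",
     "phone": "", "email": "pat@example.com"},
    {"first_name": "Patricia", "last_name": "Lee", "address": "9 Oak Ave",
     "phone": "", "email": "pat@example.com"},
    {"first_name": "Sam", "last_name": "Roy", "address": "5 Elm Rd",
     "phone": "9195550199", "email": "sam@example.com"},
]

clusters = cluster(records)
```

In this sample, records 0 and 1 match on name and address, and records 1 and 2 match on e-mail address, so the first three records form one cluster and the fourth record forms its own.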
Finally, entity resolution is the process by which a user can view clusters and create or accept master records related to those clusters. A data steward is a person with the authority to approve modifications in clustering conditions or selected clusters.
You can understand master data management better if you consider the following questions:
Master data management enables you to relate data from all across your business enterprise in order to get a complete view of each entity. Too often, data exists in independent silos that do not share information. To get a complete view of an entity, you need to be able either to consolidate everything you know about that entity into a single record or to relate a record from one data source to other records for the same entity in other sources. For example, suppose a bank has customer data in its mortgage system, its business accounts system, its credit card system, and its personal checking system. Master data management enables the bank to relate the customer data from across all of these systems to determine how much business it generates from each individual customer.
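The bank example can be sketched in Python. The sketch assumes that an earlier clustering step has already mapped each (source, key) pair to an entity. The mapping `entity_of`, the system names, the key values, and the `amount` field are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical result of clustering: (source system, source key) -> entity ID.
entity_of: Dict[Tuple[str, str], str] = {
    ("mortgage", "M-17"): "E1",
    ("credit_card", "C-88"): "E1",
    ("checking", "K-03"): "E2",
}

# Hypothetical account rows drawn from the separate systems.
accounts: List[dict] = [
    {"source": "mortgage", "key": "M-17", "amount": 250000},
    {"source": "credit_card", "key": "C-88", "amount": 1200},
    {"source": "checking", "key": "K-03", "amount": 4300},
]


def business_per_entity(accounts: List[dict],
                        entity_of: Dict[Tuple[str, str], str]) -> Dict[str, int]:
    """Total the amounts from every system for each entity."""
    totals: Dict[str, int] = defaultdict(int)
    for acct in accounts:
        entity_id = entity_of[(acct["source"], acct["key"])]
        totals[entity_id] += acct["amount"]
    return dict(totals)


totals = business_per_entity(accounts, entity_of)
```

Without the entity mapping, the mortgage and credit card accounts would look like two unrelated customers. With it, both amounts roll up to entity `E1`.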
Any type of data that is important to your business and is not transactional in nature has the potential to be a master data entity type. In master data management, the user can create a new entity type or modify an existing entity type through the Entity Definition Editor.
An entity type is defined by three things:
All of these can be edited in the Entity Definition Editor.
DataFlux Data Management Studio has two main technologies for working with master data: Entity Resolution and DataFlux Master Data Foundations.
In addition to the technologies in DataFlux Data Management Studio, SAS® MDM provides an enterprise-focused, web-based tool for performing master data management on multiple entity types across your enterprise. This is the most powerful, flexible, and feature-rich implementation of master data management that SAS offers, but it might be more than some customers require. For more information about SAS MDM, contact your SAS representative.
The following table compares the capabilities of these master data management methods:
Capability | Entity Resolution | DataFlux Master Data Foundations | SAS MDM |
Multiple data sources | Y | Y | Y |
Editable entity definitions | Y | Y | Y |
Entity resolution interface | Y | Y | Y |
Defined load/clustering processes | N | Y | Y |
Loading new data into existing project | N | Y | Y |
Editable clustering criteria | N | Y | Y |
Web-based interface | N | N | Y |
Relationship support | N | N | Y |
Entity modification after creation | N | N | Y |
Split and merge clusters | Y | N | Y |
Workflow integration | N | N | Y |
Persistence of hub clustering rules | N | Y | Y |
Multiple entities in a master data project | N | N | Y |