Master data management


Master data management is a technology-enabled discipline in which business and information technology (IT) work together to ensure the uniformity, accuracy, stewardship, semantic consistency and accountability of the enterprise's official shared master data assets.

Drivers for master data management

Organisations, or groups of organisations, may establish the need for master data management when they hold more than one copy of data about a business entity. Holding more than one copy of this master data inherently means that there is an inefficiency in maintaining a "single version of the truth" across all copies. Unless people, processes and technology are in place to ensure that the data values are kept aligned across all copies, it is almost inevitable that different versions of information about a business entity will be held. This causes inefficiencies in operational data use, and hinders the ability of organisations to report and analyse. At a basic level, master data management seeks to ensure that an organization does not use multiple versions of the same master data in different parts of its operations, which can occur in large organizations.
Other problems include issues with the quality of data, consistent classification and identification of data, and data-reconciliation issues. Master data management across disparate data systems requires data transformation: data extracted from each source system must be transformed and loaded into the master data management hub, and, to keep the sources synchronized, the managed master data must in turn be transformed and loaded back into each source system whenever it is updated. As with other Extract, Transform, Load-based data movement, these processes are expensive and inefficient to develop and maintain, which greatly reduces the return on investment for the master data management product.
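As a rough illustration of these two transformations, the sketch below maps a record from a hypothetical source layout into a hub layout and back again; all field names and the hub schema are assumptions for illustration, not part of any particular product.

    # Minimal sketch of the two transformations described above, using
    # hypothetical source and hub record layouts (all field names are assumptions).

    def source_to_hub(src: dict) -> dict:
        """Extracted source record -> the hub's master data schema."""
        return {
            "customer_id": src["cust_no"],              # hub-wide identifier
            "full_name": f'{src["first"]} {src["last"]}'.strip(),
            "country": src.get("country", "").upper(),  # standardise on upper-case codes
        }

    def hub_to_source(master: dict) -> dict:
        """Managed hub record -> the source system's original layout."""
        first, _, last = master["full_name"].partition(" ")
        return {
            "cust_no": master["customer_id"],
            "first": first,
            "last": last,
            "country": master["country"],
        }

    # Round trip: load into the hub, then synchronise the source on update.
    hub_record = source_to_hub({"cust_no": "S-101", "first": "Ada", "last": "Lovelace", "country": "gb"})
    print(hub_to_source(hub_record))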
There are a number of root causes for master data issues in organisations. These include:
  1. Business unit and product line segmentation
  2. Mergers and acquisitions

Business unit and product line segmentation

As a result of business unit and product line segmentation, the same business entity will be serviced by different product lines, and redundant data will be entered about the business entity in order to process the transaction. The redundancy of business entity data is compounded in the front- to back-office life cycle, where an authoritative single source for party, account and product data is needed but the data is often once again redundantly entered or augmented.
A typical example is the scenario of a bank at which a customer has taken out a mortgage and the bank begins to send mortgage solicitations to that customer, ignoring the fact that the person already has a mortgage account relationship with the bank. This happens because the customer information used by the marketing section within the bank lacks integration with the customer information used by the customer services section of the bank. Thus the two groups remain unaware that an existing customer is also considered a sales lead. The process of record linkage is used to associate different records that correspond to the same entity, in this case the same person.
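The sketch below shows record linkage at its simplest: two records are treated as the same person when their normalised name and date of birth agree. The fields and the matching rule are illustrative assumptions; real matchers use far richer comparison and scoring logic.

    # Illustrative record-linkage rule, not a production matcher: two records are
    # linked when normalised name and date of birth agree. Field names are assumptions.

    def normalise(record: dict) -> tuple:
        name = " ".join(record["name"].lower().split())
        return (name, record["dob"])

    def same_entity(a: dict, b: dict) -> bool:
        return normalise(a) == normalise(b)

    mortgage_record = {"name": "J. Q.  Public", "dob": "1970-01-31"}
    marketing_lead = {"name": "j. q. public", "dob": "1970-01-31"}
    print(same_entity(mortgage_record, marketing_lead))  # True: one customer, two records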

Mergers and acquisitions

One of the most common reasons some large corporations experience massive issues with master data management is growth through mergers or acquisitions. When organizations merge, the resulting entity typically holds duplicate master data. Ideally, database administrators resolve this problem through deduplication of the master data as part of the merger. In practice, however, reconciling several master data systems can present difficulties because of the dependencies that existing applications have on the master databases. As a result, more often than not the two systems do not fully merge, but remain separate, with a special reconciliation process defined that ensures consistency between the data stored in the two systems. Over time, however, as further mergers and acquisitions occur, the problem multiplies: more and more master databases appear, and data-reconciliation processes become extremely complex, and consequently unmanageable and unreliable. Because of this trend, one can find organizations with 10, 15, or even as many as 100 separate, poorly integrated master databases, which can cause serious operational problems in the areas of customer satisfaction, operational efficiency, decision support, and regulatory compliance.
Another problem concerns determining the proper degree of detail and normalization to include in the master data schema. For example, in a federated HR environment, the enterprise may focus on storing people data as a current status, adding a few fields to identify date of hire, date of last promotion, etc. However, this simplification can introduce business-impacting errors into dependent systems for planning and forecasting. The stakeholders of such systems may be forced to build a parallel network of new interfaces to track onboarding of new hires, planned retirements, and divestment, which works against one of the aims of master data management.

People, Process and Technology

Master data management is enabled by technology, but it is more than the technologies that enable it. An organisation's master data management capability also includes people and processes in its definition.

People

Several roles need to be staffed within MDM, most prominently the Data Owner and the Data Steward. Several people are likely to be allocated to each role, each person responsible for a subset of the master data.
The Data Owner is responsible for the requirements for data quality, data security and so on, as well as for compliance with data governance and data management procedures. The Data Owner should also fund improvement projects in case of deviations from the requirements.
The Data Steward runs master data management on behalf of the Data Owner, and will probably also act as an advisor to the Data Owner.

Process

Master data management can be viewed as a "discipline for specialized quality improvement" defined by the policies and procedures put in place by a data governance organization. It has the objective of providing processes for collecting, aggregating, matching, consolidating, quality-assuring, persisting and distributing master data throughout an organization to ensure a common understanding, consistency, accuracy and control, in the ongoing maintenance and application use of that data.
Processes commonly seen in master data management include source identification, data collection, data transformation, normalization, rule administration, error detection and correction, data consolidation, data storage, data distribution, data classification, taxonomy services, item master creation, schema mapping, product codification, data enrichment, hierarchy management, business semantics management and data governance.
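To make one of these steps concrete, the sketch below shows normalization together with error detection and correction for a single attribute (a country field). The synonym list and canonical codes are assumptions chosen purely for illustration.

    # Sketch of the "normalization" and "error detection and correction" steps named
    # above, for one attribute only; the rules and code lists are illustrative.

    COUNTRY_SYNONYMS = {"uk": "GB", "u.k.": "GB", "united kingdom": "GB", "gb": "GB"}

    def normalise_country(raw: str) -> str:
        key = raw.strip().lower()
        if key not in COUNTRY_SYNONYMS:
            raise ValueError(f"unrecognised country value: {raw!r}")  # error detection
        return COUNTRY_SYNONYMS[key]                                  # correction to the canonical code

    print([normalise_country(v) for v in ("UK", "United Kingdom", "gb")])  # ['GB', 'GB', 'GB']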

Technology

A master data management tool can be used to support master data management by removing duplicates, standardizing data, and incorporating rules to prevent incorrect data from entering the system, in order to create an authoritative source of master data. Master data are the products, accounts and parties for which business transactions are completed.
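As an illustration of the rule-based gatekeeping described above, the sketch below rejects records that fail simple entry rules before they reach the authoritative source. The rules themselves are hypothetical examples, not those of any specific tool.

    # Hedged sketch of the kind of entry rules an MDM tool applies before a record
    # is accepted into the authoritative source; the rules are illustrative only.

    import re

    RULES = [
        ("customer_id present", lambda r: bool(r.get("customer_id"))),
        ("email looks valid",   lambda r: re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", r.get("email", "")) is not None),
    ]

    def violations(record: dict) -> list[str]:
        """Return the names of all rules the record fails."""
        return [name for name, check in RULES if not check(record)]

    print(violations({"customer_id": "C-1", "email": "not-an-address"}))  # ['email looks valid']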
Where the technology approach produces a "golden record" or relies on a "source of record" or "system of record", it is common to talk of where the data is "mastered". This is accepted terminology in the information technology industry, but care should be taken, both with specialists and with the wider stakeholder community, to avoid confusing the concept of "master data" with that of "mastering data".

Implementation models

There are a number of models for implementing a technology solution for master data management. These depend on an organisation's core business, its corporate structure and its goals. These include:
  1. Source of record
  2. Registry
  3. Consolidation
  4. Coexistence
  5. Transaction/centralized
Registry
This model maintains a central registry that links records across the various source systems. It identifies duplicates by running cleansing and matching algorithms, then assigns unique global identifiers to matched records to help identify a "single version of the truth". This model does not send data back to the source systems, so changes to master data continue to be made through the existing source systems. When a single, comprehensive view of a customer is needed, the registry uses each reference system to build that view in real time.
This model may be useful where an organisation has a large number of source systems spread across the world, and it is difficult to establish an authoritative source. It also enables analysing data while avoiding the risk of overwriting information in the source systems.
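A minimal sketch of the registry style follows, assuming two hypothetical source systems: the hub stores only cross-references from (source system, local key) pairs to a global identifier and assembles a customer view on demand, without ever writing back to the sources.

    # Registry-style hub sketch: cross-references only, views assembled on request.
    # The source systems, keys and fields are assumptions for illustration.

    registry = {
        ("crm", "C-42"): "GLOBAL-7",
        ("billing", "9001"): "GLOBAL-7",
    }

    sources = {
        "crm": {"C-42": {"name": "Jane Doe"}},
        "billing": {"9001": {"balance": 120.50}},
    }

    def real_time_view(global_id: str) -> dict:
        view = {"global_id": global_id}
        for (system, local_key), gid in registry.items():
            if gid == global_id:
                view.update(sources[system][local_key])  # read-only: sources are never written
        return view

    print(real_time_view("GLOBAL-7"))  # {'global_id': 'GLOBAL-7', 'name': 'Jane Doe', 'balance': 120.5}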
Consolidation
In this model, master data is generally consolidated from multiple sources in the hub to create a single version of truth, often referred to in this context as the "golden record". Any updates made to the master data are then applied to the original sources.
Consolidated hubs are inexpensive and quick to set up. This model is mainly used for analysis and reporting.
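The following sketch illustrates consolidation with a deliberately simple survivorship rule: for each attribute, the most recently updated non-empty value wins. Both the rule and the field names are assumptions chosen for brevity.

    # Minimal consolidation sketch: build a "golden record" in the hub by keeping,
    # for each attribute, the most recently updated non-empty value.

    def golden_record(source_records: list[dict]) -> dict:
        golden = {}
        for rec in sorted(source_records, key=lambda r: r["updated"]):  # oldest first
            for field, value in rec.items():
                if field != "updated" and value:
                    golden[field] = value  # later non-empty values win
        return golden

    print(golden_record([
        {"updated": "2023-01-01", "name": "J. Doe", "phone": "+44 20 7946 0000"},
        {"updated": "2024-06-01", "name": "Jane Doe", "phone": ""},
    ]))  # {'name': 'Jane Doe', 'phone': '+44 20 7946 0000'}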
Coexistence
This model provides a "golden record" in the same way as the Consolidation model, but master data changes can happen in the MDM system as well as in the application systems. This tends to make deployment more expensive.
The main benefit of this style is that data is mastered in source systems and then synchronized with the hub, so data can coexist harmoniously and still offer a single version of the truth. Another benefit of this approach is that the quality of master data is improved, and access is faster. Reporting is also easier as all master data attributes are in a single place.
Transaction/centralized
This model stores and maintains master data attributes using linking, cleansing, matching and enriching algorithms to enhance the data. The enhanced data can then be published back to its respective source system. This requires intrusion into the source systems for the two-way interactions. Source systems can subscribe to updates published by the central system to give complete consistency.
The main benefit of this style is that master data is accurate and complete at all times while security and visibility policies at a data attribute level can be supported by the Transaction style hub. An organisation gains a centralized set of master data for one or more domains.
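A rough sketch of the transaction-style flow follows, assuming a simple callback-based subscription mechanism (not any particular product's API): the hub holds the master attributes and pushes every change to the systems that subscribe to it.

    # Transaction/centralized hub sketch: the hub owns the master attributes and
    # publishes every change to subscribed systems to keep them consistent.

    class TransactionHub:
        def __init__(self):
            self.master = {}        # global_id -> master attributes
            self.subscribers = []   # callables notified on every change

        def subscribe(self, callback):
            self.subscribers.append(callback)

        def update(self, global_id: str, **attrs):
            self.master.setdefault(global_id, {}).update(attrs)
            for notify in self.subscribers:
                notify(global_id, self.master[global_id])  # push the change to each source

    hub = TransactionHub()
    hub.subscribe(lambda gid, rec: print("crm received", gid, rec))
    hub.subscribe(lambda gid, rec: print("billing received", gid, rec))
    hub.update("GLOBAL-7", name="Jane Doe", segment="retail")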

Transmission of master data

There are several ways in which master data may be collated and distributed to other systems. These include:
  1. Data consolidation – The process of capturing master data from multiple sources and integrating into a single hub for replication to other destination systems.
  2. Data federation – The process of providing a single virtual view of master data from one or more sources to one or more destination systems (see the sketch after this list).
  3. Data propagation – The process of copying master data from one system to another, typically through point-to-point interfaces in legacy systems.
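As a sketch of the data federation option above, the example below assembles a single virtual customer view on request from two hypothetical sources, without copying or storing anything in between; the source names and fields are assumptions.

    # Data federation sketch: a virtual view built on request from live sources,
    # with nothing persisted in a hub. Sources and fields are illustrative.

    crm = {"C-42": {"name": "Jane Doe", "email": "jane@example.com"}}
    erp = {"C-42": {"credit_limit": 5000}}

    def federated_view(customer_id: str) -> dict:
        view = {}
        for source in (crm, erp):                 # each lookup goes to the live source
            view.update(source.get(customer_id, {}))
        return view

    print(federated_view("C-42"))  # {'name': 'Jane Doe', 'email': 'jane@example.com', 'credit_limit': 5000}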

Change management in implementation

Master data management can suffer in its adoption within a large organization if the "single version of the truth" concept is not bought into by stakeholders who believe that their local definition of the master data is necessary. For example, the product hierarchy used to manage inventory may be entirely different from the product hierarchies used to support marketing efforts or pay sales reps. It is above all necessary to identify whether different master data is genuinely required. If it is, the solution implemented must allow multiple versions of the truth to exist while providing simple, transparent ways to reconcile the necessary differences. If it is not required, processes must be adjusted. Without this active management, users that need the alternate versions will simply "go around" the official processes, thus reducing the effectiveness of the company's overall master data management program.