Tokenization (data security)


Tokenization, when applied to data security, is the process of substituting a sensitive data element with a non-sensitive equivalent, referred to as a token, that has no extrinsic or exploitable meaning or value. The token is a reference that maps back to the sensitive data through a tokenization system. The mapping from original data to a token uses methods which render tokens infeasible to reverse in the absence of the tokenization system, for example using tokens created from random numbers. The tokenization system must be secured and validated using security best practices applicable to sensitive data protection, secure storage, audit, authentication and authorization. The tokenization system provides data processing applications with the authority and interfaces to request tokens, or detokenize back to sensitive data.
The security and risk reduction benefits of tokenization require that the tokenization system is logically isolated and segmented from data processing systems and applications that previously processed or stored sensitive data replaced by tokens. Only the tokenization system can tokenize data to create tokens, or detokenize back to redeem sensitive data under strict security controls. The token generation method must be proven to have the property that there is no feasible means through direct attack, cryptanalysis, side channel analysis, token mapping table exposure or brute force techniques to reverse tokens back to live data.
Replacing live data with tokens in systems is intended to minimize exposure of sensitive data to those applications, stores, people and processes, reducing risk of compromise or accidental exposure and unauthorized access to sensitive data. Applications can operate using tokens instead of live data, with the exception of a small number of trusted applications explicitly permitted to detokenize when strictly necessary for an approved business purpose. Tokenization systems may be operated in-house within a secure isolated segment of the data center, or as a service from a secure service provider.
Tokenization may be used to safeguard sensitive data involving, for example, bank accounts, financial statements, medical records, criminal records, driver's licenses, loan applications, stock trades, voter registrations, and other types of personally identifiable information. Tokenization is often used in credit card processing. The PCI Council defines tokenization as "a process by which the primary account number is replaced with a surrogate value called a token. De-tokenization is the reverse process of redeeming a token for its associated PAN value. The security of an individual token relies predominantly on the infeasibility of determining the original PAN knowing only the surrogate value". The choice of tokenization as an alternative to other techniques such as encryption will depend on varying regulatory requirements, interpretation, and acceptance by respective auditing or assessment entities. This is in addition to any technical, architectural or operational constraint that tokenization imposes in practical use.

Concepts and origins

The concept of tokenization, as adopted by the industry today, has existed since the first currency systems emerged centuries ago as a means to reduce risk in handling high value financial instruments by replacing them with surrogate equivalents. In the physical world, coin tokens have a long history of use replacing the financial instrument of minted coins and banknotes. In more recent history, subway tokens and casino chips found adoption for their respective systems to replace physical currency and cash handling risks such as theft. Exonumia, and scrip are terms synonymous with such tokens.
In the digital world, similar substitution techniques have been used since the 1970s as a means to isolate real data elements from exposure to other data systems. In databases for example, surrogate key values have been used since 1976 to isolate data associated with the internal mechanisms of databases and their external equivalents for a variety of uses in data processing. More recently, these concepts have been extended to consider this isolation tactic to provide a security mechanism for the purposes of data protection.
In the payment card industry, tokenization is one means of protecting sensitive cardholder data in order to comply with industry standards and government regulations.
In 2001, TrustCommerce created the concept of Tokenization to protect sensitive payment data for a client, Classmates.com. They engaged Rob Caulfield founder of TrustCommerce because the risk of storing card holder data was too great if their systems were ever hacked. TrustCommerce developed TC Citadel®, where customers could reference a token in place of card holder data and TrustCommerce would process a payment on the Merchants behalf. This secure billing application allows clients to safely and securely process recurring payments without the need to store cardholder payment information. Tokenization replaces the Primary Account Number with secure, randomly generated tokens. If intercepted, the data contains no cardholder information, rendering it useless to hackers. The Primary Account Number cannot be retrieved even if the token and the systems it resides on are compromised nor can the token be reverse engineered to arrive at the PAN.
Tokenization was applied to payment card data by Shift4 Corporation and released to the public during an industry Security Summit in Las Vegas, Nevada in 2005. The technology is meant to prevent the theft of the credit card information in storage. Shift4 defines tokenization as: “The concept of using a non-decryptable piece of data to represent, by reference, sensitive or secret data. In payment card industry context, tokens are used to reference cardholder data that is managed in a tokenization system, application or off-site secure facility.”
To protect data over its full lifecycle, tokenization is often combined with end-to-end encryption to secure data in transit to the tokenization system or service, with a token replacing the original data on return. For example, to avoid the risks of malware stealing data from low-trust systems such as point of sale systems, as in the , cardholder data encryption must take place prior to card data entering the POS and not after. Encryption takes place within the confines of a security hardened and validated card reading device and data remains encrypted until received by the processing host, an approach pioneered by Heartland Payment Systems as a means to secure payment data from advanced threats, now widely adopted by industry payment processing companies and technology companies. The PCI Council has also specified end-to-end encryption for various service implementations in various documents.

Difference from encryption

Tokenization and “classic” encryption effectively protect data if implemented properly, and an ideal security solution will use both. While similar in certain regards, tokenization and classic encryption differ in a few key aspects. Both are cryptographic data security methods and they essentially have the same function, however they do so with differing processes and have different effects on the data they are protecting.
Tokenization is a non-mathematical approach that replaces sensitive data with non-sensitive substitutes without altering the type or length of data. This is an important distinction from encryption because changes in data length and type can render information unreadable in intermediate systems such as databases. Tokenized data is secure yet it can still be processed by legacy systems which makes tokenization more flexible than classic encryption.
Another difference is that tokens require significantly less computational resources to process. With tokenization, specific data is kept fully or partially visible for processing and analytics while sensitive information is kept hidden. This allows tokenized data to be processed more quickly and reduces the strain on system resources. This can be a key advantage in systems that rely on high performance.

Types of tokens

There are many ways that tokens can be classified however there is currently no unified classification. Tokens can be: single or multi-use, cryptographic or non-cryptographic, reversible or irreversible, authenticable or non-authenticable, and various combinations thereof.
The three main types of tokens are: Security token, or / Asset token Utility token / Utility token and Cryptocurrencies / Payment tokens.
In the context of payments, the difference between high and low value tokens plays a significant role.

High-value tokens (HVTs)

HVTs serve as surrogates for actual PANs in payment transactions and are used as an instrument for completing a payment transaction. In order to function, they must look like actual PANs. Multiple HVTs can map back to a single PAN and a single physical credit card without the owner being aware of it.
Additionally, HVTs can be limited to certain networks and/or merchants whereas PANs cannot.
HVTs can also be bound to specific devices so that anomalies between token use, physical devices, and geographic locations can be flagged as potentially fraudulent.

Low-value tokens (LVTs) or security tokens

LVTs also act as surrogates for actual PANs in payment transactions, however they serve a different purpose. LVTs cannot be used by themselves to complete a payment transaction. In order for an LVT to function, it must be possible to match it back to the actual PAN it represents, albeit only in a tightly controlled fashion. Using tokens to protect PANs becomes ineffectual if a tokenization system is breached, therefore securing the tokenization system itself is extremely important.

System operations, limitations and evolution

First generation tokenization systems use a database to map from live data to surrogate substitute tokens and back. This requires the storage, management, and continuous backup for every new transaction added to the token database to avoid data loss. Another problem is ensuring consistency across data centers, requiring continuous synchronization of token databases. Significant consistency, availability and performance trade-offs, per the CAP theorem, are unavoidable with this approach. This overhead adds complexity to real-time transaction processing to avoid data loss and to assure data integrity across data centers, and also limits scale. Storing all sensitive data in one service creates an attractive target for attack and compromise, and introduces privacy and legal risk in the aggregation of data Internet privacy, particularly in the EU.
Another limitation of tokenization technologies is measuring the level of security for a given solution through independent validation. With the lack of standards, the latter is critical to establish the strength of tokenization offered when tokens are used for regulatory compliance. The PCI Council recommends independent vetting and validation of any claims of security and compliance: "Merchants considering the use of tokenization should perform a thorough evaluation and risk analysis to identify and document the unique characteristics of their particular implementation, including all interactions with payment card data and the particular tokenization systems and processes"
The method of generating tokens may also have limitations from a security perspective. With concerns about security and attacks to random number generators, which are a common choice for the generation of tokens and token mapping tables, scrutiny must be applied to ensure proven and validated methods are used versus arbitrary design. Random number generators have limitations in terms of speed, entropy, seeding and bias, and security properties must be carefully analysed and measured to avoid predictability and compromise.
With tokenization's increasing adoption, new tokenization technology approaches have emerged to remove such operational risks and complexities and to enable increased scale suited to emerging big data use cases and high performance transaction processing, especially in financial services and banking. Recent examples includes Protegrity's Vaultless Tokenization, Privitar Publisher tokenization, and Voltage Security's Secure Stateless Tokenization technology and 's patented stateless tokenization solution SecurDPS. Vaultless tokenization and stateless tokenization have been independently validated to provide significant limitation of applicable PCI Data Security Standard controls to reduce scope of assessments. Stateless tokenization enables random mapping of live data elements to surrogate values without needing a database while retaining the isolation properties of tokenization.
November 2014, American Express released its token service which meets the EMV tokenization standard.

Application to alternative payment systems

Building an alternate payments system requires a number of entities working together in order to deliver near field communication or other technology based payment services to the end users. One of the issues is the interoperability between the players and to resolve this issue the role of trusted service manager is proposed to establish a technical link between mobile network operators and providers of services, so that these entities can work together. Tokenization can play a role in mediating such services.
Tokenization as a security strategy lies in the ability to replace a real card number with a surrogate and the subsequent limitations placed on the surrogate card number. If the surrogate value can be used in an unlimited fashion or even in a broadly applicable manner as with Apple Pay, the token value gains as much value as the real credit card number. In these cases, the token may be secured by a second dynamic token that is unique for each transaction and also associated to a specific payment card. Example of dynamic, transaction-specific tokens include cryptograms used in the EMV specification.

Application to PCI DSS standards

The Payment Card Industry Data Security Standard, an industry-wide set of guidelines that must be met by any organization that stores, processes, or transmits cardholder data, mandates that credit card data must be protected when stored. Tokenization, as applied to payment card data, is often implemented to meet this mandate, replacing credit card and ACH numbers in some systems with a random value or string of characters. Tokens can be formatted in a variety of ways. Some token service providers or tokenization products generate the surrogate values in such a way as to match the format of the original sensitive data. In the case of payment card data, a token might be the same length as a Primary Account Number and contain elements of the original data such as the last four digits of the card number. When a payment card authorization request is made to verify the legitimacy of a transaction, a token might be returned to the merchant instead of the card number, along with the authorization code for the transaction. The token is stored in the receiving system while the actual cardholder data is mapped to the token in a secure tokenization system. Storage of tokens and payment card data must comply with current PCI standards, including the use of .

Standards (ANSI, the PCI Council, Visa, and EMV)

Tokenization is currently in standards definition in ANSI X9 as . X9 is responsible for the industry standards for financial cryptography and data protection including payment card PIN management, credit and debit card encryption and related technologies and processes.
The PCI Council has also stated support for tokenization in reducing risk in data breaches, when combined with other technologies such as Point-to-Point Encryption and assessments of compliance to PCI DSS guidelines.
Visa Inc. released Visa Tokenization Best Practices for tokenization uses in credit and debit card handling applications and services.
In March 2014, released its first payment tokenization specification for EMV.
NIST standardized the FF1 and FF3 Format-Preserving Encryption algorithms in its Special Publication 800-38G.

Risk reduction

When properly validated and with appropriate independent assessment, Tokenization can render it more difficult for attackers to gain access to sensitive data outside of the tokenization system or service. Implementation of tokenization may simplify the requirements of the PCI DSS, as systems that no longer store or process sensitive data may have a reduction of applicable controls required by the PCI DSS guidelines.
As a security best practice, independent assessment and validation of any technologies used for data protection, including tokenization, must be in place to establish the security and strength of the method and implementation before any claims of privacy compliance, regulatory compliance, and data security can be made. This validation is particularly important in tokenization, as the tokens are shared externally in general use and thus exposed in high risk, low trust environments. The infeasibility of reversing a token or set of tokens to a live sensitive data must be established using industry accepted measurements and proofs by appropriate experts independent of the service or solution provider.