Universally unique identifier
A universally unique identifier is a 128-bit number used to identify information in computer systems. The term globally unique identifier is also used, typically in software created by Microsoft.
When generated according to the standard methods, UUIDs are for practical purposes unique. Their uniqueness does not depend on a central registration authority or coordination between the parties generating them, unlike most other numbering schemes. While the probability that a UUID will be duplicated is not zero, it is close enough to zero to be negligible.
Thus, anyone can create a UUID and use it to identify something with near certainty that the identifier does not duplicate one that has already been, or will be, created to identify something else. Information labeled with UUIDs by independent parties can therefore be later combined into a single database or transmitted on the same channel, with a negligible probability of duplication.
Adoption of UUIDs is widespread, with many computing platforms providing support for generating them and for parsing their textual representation.
History
In the 1980s Apollo Computer originally used UUIDs in the Network Computing System (NCS) and later in the Open Software Foundation's (OSF) Distributed Computing Environment (DCE). The initial design of DCE UUIDs was based on the NCS UUIDs, whose design was in turn inspired by the unique identifiers defined and used pervasively in Domain/OS, an operating system designed by Apollo Computer. Later, the Microsoft Windows platforms adopted the DCE design as "globally unique identifiers" (GUIDs). RFC 4122, published in July 2005, registered a URN namespace for UUIDs and recapitulated the earlier specifications, with the same technical content.
Standards
UUIDs are standardized by the Open Software Foundation as part of the Distributed Computing Environment. UUIDs are documented as part of ISO/IEC 11578:1996 "Information technology – Open Systems Interconnection – Remote Procedure Call" and more recently in ITU-T Rec. X.667 | ISO/IEC 9834-8:2005.
The Internet Engineering Task Force published the Standards-Track RFC 4122.
Format
In its canonical textual representation, the 16 octets of a UUID are represented as 32 hexadecimal digits, displayed in five groups separated by hyphens, in the form 8-4-4-4-12, for a total of 36 characters (32 hexadecimal digits and 4 hyphens). For example:
123e4567-e89b-12d3-a456-426614174000
xxxxxxxx-xxxx-Mxxx-Nxxx-xxxxxxxxxxxx
The four-bit M field and the 1- to 3-bit N field code the format of the UUID itself.
The four bits of digit M are the UUID version, and the 1 to 3 most significant bits of digit N code the UUID variant. In the example, M is 1 and N is a (binary 10xx), meaning that this is a version-1, variant-1 UUID; that is, a time-based DCE/RFC 4122 UUID.
The canonical 8-4-4-4-12 format string is based on the record layout for the 16 bytes of the UUID:
Name | Length (bytes) | Length (hex digits) | Contents |
time_low | 4 | 8 | integer giving the low 32 bits of the time |
time_mid | 2 | 4 | integer giving the middle 16 bits of the time |
time_hi_and_version | 2 | 4 | 4-bit "version" in the most significant bits, followed by the high 12 bits of the time |
clock_seq_hi_and_res, clock_seq_low | 2 | 4 | 1 to 3-bit "variant" in the most significant bits, followed by the 13 to 15-bit clock sequence |
node | 6 | 12 | the 48-bit node id |
These fields correspond to those in version 1 and 2 UUIDs, but the same 8-4-4-4-12 representation is used for all UUIDs, even for UUIDs constructed differently.
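As a rough illustration of this record layout, the sketch below uses Python's standard `uuid` module to split a UUID into the fields listed in the table. The sample value is an assumption chosen only for illustration, and the attribute names are those of Python's `uuid.UUID` class, which differ slightly from the field names in the table.

```python
import uuid

# Sample value chosen for illustration only (an assumption, not taken from a real system).
u = uuid.UUID("123e4567-e89b-12d3-a456-426614174000")

# Map the table's field names to the corresponding attributes of uuid.UUID.
fields = {
    "time_low": u.time_low,                          # 4 bytes
    "time_mid": u.time_mid,                          # 2 bytes
    "time_hi_and_version": u.time_hi_version,        # 2 bytes, version in the top 4 bits
    "clock_seq_hi_and_res": u.clock_seq_hi_variant,  # 1 byte, variant in the top bits
    "clock_seq_low": u.clock_seq_low,                # 1 byte
    "node": u.node,                                  # 6 bytes
}

for name, value in fields.items():
    print(f"{name:22} {value:#x}")

print("version:", u.version)
print("variant:", u.variant)
```

The same attributes are available for any UUID, but they are only meaningful as time and node fields for versions 1 and 2.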
RFC 4122 requires that the characters be generated in lower case, while being case-insensitive on input.
Microsoft GUIDs are sometimes represented with surrounding braces:
{00112233-4455-6677-8899-aabbccddeeff}
This format should not be confused with "Windows Registry format", which refers to the format within the curly braces.
Encoding
The binary encoding of UUIDs varies between systems. Variant 1 UUIDs, nowadays the most common variant, are encoded in a big-endian format. For example, 00112233-4455-6677-8899-aabbccddeeff is encoded as the bytes 00 11 22 33 44 55 66 77 88 99 aa bb cc dd ee ff.
Variant 2 UUIDs, historically used in Microsoft's COM/OLE libraries, use a mixed-endian format, whereby the first three components of the UUID are little-endian, and the last two are big-endian. For example, 00112233-4455-6677-8899-aabbccddeeff is encoded as the bytes 33 22 11 00 55 44 77 66 88 99 aa bb cc dd ee ff.
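The two byte orders can be compared with a short sketch using Python's standard `uuid` module, whose `bytes` attribute gives the big-endian encoding and whose `bytes_le` attribute gives the mixed-endian one. This is only an illustration of the encodings described here, not a statement about how any particular system stores UUIDs.

```python
import uuid

u = uuid.UUID("00112233-4455-6677-8899-aabbccddeeff")

big_endian = u.bytes        # variant-1 style: every field in network byte order
mixed_endian = u.bytes_le   # variant-2 style: first three fields little-endian

print(big_endian.hex(" "))
print(mixed_endian.hex(" "))

# Either encoding decodes back to the same UUID when read with the matching argument.
assert uuid.UUID(bytes=big_endian) == uuid.UUID(bytes_le=mixed_endian) == u
```

Reading a mixed-endian buffer as if it were big-endian yields a different, but still syntactically valid, UUID, which is why the encoding has to be agreed on out of band.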
Variants
The "variant" field of a UUID, held in the most significant bits of the N position, indicates its format and encoding. RFC 4122 defines four variants, with bit patterns of 1 to 3 bits:
- Variant 0 (bit pattern 0xxx) is for backwards compatibility with the now-obsolete Apollo Network Computing System 1.5 UUID format developed around 1988. The first 6 octets of the UUID are a 48-bit timestamp; the next 2 octets are reserved; the next octet is the "address family"; and the final 7 octets are a 56-bit host ID in the form specified by the address family. Though different in detail, the similarity with modern version-1 UUIDs is evident. The variant bits in the current UUID specification coincide with the high bits of the address family octet in NCS UUIDs. Though the address family could hold values in the range 0..255, only the values 0..13 were ever defined. Accordingly, the variant-0 bit pattern 0xxx avoids conflicts with historical NCS UUIDs, should any still exist in databases.
- Variant 1 (bit pattern 10xx) UUIDs are referred to as RFC 4122/DCE 1.1 UUIDs, or "Leach–Salz" UUIDs, after the authors of the original Internet Draft.
- Variant 2 (bit pattern 110x) is characterized in the RFC as "reserved, Microsoft Corporation backward compatibility" and was used for early GUIDs on the Microsoft Windows platform. It differs from variant 1 only by the endianness in binary storage or transmission: variant-1 UUIDs use "network" (big-endian) byte order, while variant-2 GUIDs use "native" (little-endian) byte order for some subfields of the UUID.
- Reserved is defined as the 3-bit variant bit pattern 111x.
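A minimal sketch of how the variant can be read from the N digit follows. It assumes the bit patterns 0xxx, 10xx, 110x and 111x for variant 0, variant 1, variant 2 and reserved respectively, as given in RFC 4122; the function name `variant_from_n` is hypothetical.

```python
def variant_from_n(n: int) -> str:
    """Classify the 4-bit N digit of a UUID by its leading variant bits."""
    if not 0 <= n <= 0xF:
        raise ValueError("N must be a single hexadecimal digit (0..15)")
    if n & 0b1000 == 0:      # 0xxx
        return "variant 0 (NCS backward compatibility)"
    if n & 0b0100 == 0:      # 10xx
        return "variant 1 (RFC 4122/DCE 1.1)"
    if n & 0b0010 == 0:      # 110x
        return "variant 2 (Microsoft backward compatibility)"
    return "reserved"        # 111x

# Classify every possible value of the N digit.
for digit in "0123456789abcdef":
    print(digit, variant_from_n(int(digit, 16)))
```

Under these patterns, N values 0 to 7 are variant 0, 8 to b are variant 1, c and d are variant 2, and e and f are reserved.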
While some important GUIDs, such as the identifier for the Component Object Model IUnknown interface, are nominally variant-2 UUIDs, many identifiers generated and used in Microsoft Windows software and referred to as "GUIDs" are standard variant-1 UUIDs. The Microsoft guidgen tool produces standard variant-1 UUIDs. Some Microsoft documentation states that "GUID" is a synonym for "UUID", as standardized in RFC 4122.
Versions
For both variants 1 and 2, five "versions" are defined in the standards, and each version may be more appropriate than the others in specific use cases. The version is indicated by the M digit in the string representation.
Version-1 UUIDs are generated from a time and a node ID; version-2 UUIDs are generated from an identifier, time, and a node ID; versions 3 and 5 produce deterministic UUIDs generated by hashing a namespace identifier and name; and version-4 UUIDs are generated using a random or pseudo-random number.
Nil UUID
The "nil" UUID, a special case, is the UUID 00000000-0000-0000-0000-000000000000; that is, all bits set to zero.
Version 1 (date-time and MAC address)
Version 1 concatenates the 48-bit MAC address of the "node" (that is, the computer generating the UUID) with a 60-bit timestamp, being the number of 100-nanosecond intervals since midnight 15 October 1582 Coordinated Universal Time (UTC), the date on which the Gregorian calendar was first adopted.
A 13- or 14-bit "uniquifying" clock sequence extends the timestamp in order to handle cases where the processor clock does not advance fast enough, or where there are multiple processors and UUID generators per node. When UUIDs are generated faster than the system clock can advance, the lower bits of the timestamp field can be filled by incrementing them each time a UUID is generated, to simulate a high-resolution timestamp. With each version-1 UUID corresponding to a single point in space (the node) and time (intervals and clock sequence), the chance of two properly generated version-1 UUIDs being unintentionally the same is practically nil. Since the time and clock sequence total 74 bits, 2^74 (about 1.9×10^22) version-1 UUIDs can be generated per node ID, at a maximal average rate of 163 billion per second per node ID.
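The timestamp arithmetic can be sketched in Python as follows. The sample UUID is hand-constructed for illustration (an assumption, not an identifier from a real system), the helper name `v1_timestamp` is hypothetical, and the capacity figures assume a 14-bit clock sequence.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Start of the version-1 timestamp scale.
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)

def v1_timestamp(u: uuid.UUID) -> datetime:
    """Convert the 60-bit timestamp of a version-1 UUID to a UTC datetime."""
    if u.version != 1:
        raise ValueError("not a version-1 UUID")
    # u.time counts 100-nanosecond intervals; datetime resolves microseconds,
    # so the final decimal digit of the count is dropped.
    return GREGORIAN_EPOCH + timedelta(microseconds=u.time // 10)

# Hand-constructed sample whose timestamp field is 0x01b21dd213814000.
sample = uuid.UUID("13814000-1dd2-11b2-8000-000000000000")
print(v1_timestamp(sample))

# Capacity per node ID: 60 timestamp bits plus 14 clock-sequence bits.
total_per_node = 2 ** (60 + 14)
# Clock-sequence values per 100 ns tick, times ticks per second.
max_rate_per_second = 2 ** 14 * 10_000_000
print(total_per_node, max_rate_per_second)
```

The rate follows from 16,384 clock-sequence values being available for each of the 10 million timestamp ticks in a second.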
In contrast to other UUID versions, version-1 and -2 UUIDs based on MAC addresses from network cards rely for their uniqueness in part on an identifier issued by a central registration authority, namely the Organizationally Unique Identifier part of the MAC address, which is issued by the IEEE to manufacturers of networking equipment. The uniqueness of version-1 and version-2 UUIDs based on network-card MAC addresses also depends on network-card manufacturers properly assigning unique MAC addresses to their cards, which like other manufacturing processes is subject to error.
Usage of the node's network card MAC address for the node ID means that a version-1 UUID can be tracked back to the computer that created it. Documents can sometimes be traced to the computers where they were created or edited through UUIDs embedded into them by word processing software. This privacy hole was used when locating the creator of the Melissa virus.
Version 2 (date-time and MAC address, DCE security version)
Version-2 UUIDs are similar to version 1, except that the least significant 8 bits of the clock sequence are replaced by a "local domain" number, and the least significant 32 bits of the timestamp are replaced by an integer identifier meaningful within the specified local domain. On POSIX systems, local-domain numbers 0 and 1 are for user ids and group ids respectively, and other local-domain numbers are site-defined. On non-POSIX systems, all local domain numbers are site-defined.
The ability to include a 40-bit domain/identifier in the UUID comes with a tradeoff. On the one hand, 40 bits allow about 1 trillion domain/identifier values per node ID. On the other hand, with the clock value truncated to the 28 most significant bits, compared to 60 bits in version 1, the clock in a version-2 UUID will "tick" only once every 429.49 seconds, a little more than 7 minutes, as opposed to every 100 nanoseconds for version 1. And with a clock sequence of only 6 bits, compared to 14 bits in version 1, only 64 unique UUIDs per node/domain/identifier can be generated per 7-minute clock tick, compared to 16,384 clock sequence values for version 1. Thus, version 2 may not be suitable for cases where UUIDs are required, per node/domain/identifier, at a rate exceeding about one every seven seconds.
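The figures in this tradeoff can be reproduced with a few lines of arithmetic. The sketch below only restates the numbers given above; the constant names are illustrative.

```python
TICK_SECONDS = 100e-9        # one timestamp unit (100 nanoseconds), as in version 1
LOST_TIMESTAMP_BITS = 32     # low timestamp bits replaced by the local identifier
CLOCK_SEQ_BITS = 6           # 14 bits minus the 8 bits used for the local domain

# Seconds between ticks of the truncated version-2 clock.
tick_interval = 2 ** LOST_TIMESTAMP_BITS * TICK_SECONDS
# Distinct UUIDs available per tick for one node/domain/identifier.
uuids_per_tick = 2 ** CLOCK_SEQ_BITS
# Sustainable average spacing between UUIDs.
seconds_per_uuid = tick_interval / uuids_per_tick
# Number of distinct 40-bit domain/identifier values.
domain_identifier_values = 2 ** (8 + 32)

print(f"tick interval: {tick_interval:.2f} s")
print(f"UUIDs per tick: {uuids_per_tick}")
print(f"one UUID every {seconds_per_uuid:.2f} s")
print(f"domain/identifier values: {domain_identifier_values:,}")
```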
Versions 3 and 5 (namespace name-based)
Version-3 and version-5 UUIDs are generated by hashing a namespace identifier and name. Version 3 uses MD5 as the hashing algorithm, and version 5 uses SHA-1.
The namespace identifier is itself a UUID. The specification provides UUIDs to represent the namespaces for URLs, fully qualified domain names, object identifiers, and X.500 distinguished names; but any desired UUID may be used as a namespace designator.
To determine the version-3 UUID corresponding to a given namespace and name, the UUID of the namespace is transformed to a string of bytes, concatenated with the input name, then hashed with MD5, yielding 128 bits. Then 6 or 7 bits are replaced by fixed values, the 4-bit version, and the 2- or 3-bit UUID "variant". Since 6 or 7 bits are thus predetermined, only 121 or 122 bits contribute to the uniqueness of the UUID.
Version-5 UUIDs are similar, but SHA-1 is used instead of MD5. Since SHA-1 generates 160-bit digests, the digest is truncated to 128 bits before the version and variant bits are replaced.
Version-3 and version-5 UUIDs have the property that the same namespace and name will map to the same UUID. However, neither the namespace nor name can be determined from the UUID, even if one of them is specified, except by brute-force search.
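The hashing procedure described above can be sketched by hand and checked against the `uuid3` and `uuid5` functions of Python's standard `uuid` module. The helper name `name_based_uuid` is hypothetical, the name is assumed to be encoded as UTF-8, and only variant 1 is handled.

```python
import hashlib
import uuid

def name_based_uuid(namespace: uuid.UUID, name: str, version: int) -> uuid.UUID:
    """Build a version-3 (MD5) or version-5 (SHA-1) variant-1 UUID by hand."""
    if version not in (3, 5):
        raise ValueError("version must be 3 or 5")
    hasher = hashlib.md5 if version == 3 else hashlib.sha1
    # Hash the namespace UUID's 16 bytes followed by the name.
    digest = hasher(namespace.bytes + name.encode("utf-8")).digest()
    raw = bytearray(digest[:16])              # SHA-1 output is truncated to 128 bits
    raw[6] = (raw[6] & 0x0F) | (version << 4) # overwrite the 4 version bits
    raw[8] = (raw[8] & 0x3F) | 0x80           # overwrite 2 variant bits with 10
    return uuid.UUID(bytes=bytes(raw))

v3 = name_based_uuid(uuid.NAMESPACE_DNS, "example.com", 3)
v5 = name_based_uuid(uuid.NAMESPACE_DNS, "example.com", 5)
print(v3, v5)

# The hand-built values agree with the standard library's implementation.
assert v3 == uuid.uuid3(uuid.NAMESPACE_DNS, "example.com")
assert v5 == uuid.uuid5(uuid.NAMESPACE_DNS, "example.com")
```

Because the inputs fully determine the output, calling the function again with the same namespace and name always returns the same UUID.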
Version 4 (random)
A version-4 UUID is randomly generated. As in other UUIDs, 4 bits are used to indicate version 4, and 2 or 3 bits to indicate the variant. Thus, for variant 1 a random version-4 UUID will have 6 predetermined variant and version bits, leaving 122 bits for the randomly generated part, for a total of 2^122, or about 5.3×10^36, possible version-4 variant-1 UUIDs. There are half as many possible version-4 variant-2 UUIDs because there is one less random bit available, 3 bits being consumed for the variant.
Collisions
A collision occurs when the same UUID is generated more than once and assigned to different referents. In the case of standard version-1 and version-2 UUIDs using unique MAC addresses from network cards, collisions can occur only when an implementation varies from the standards, either inadvertently or intentionally.
In contrast, with version-1 and version-2 UUIDs that use randomly generated node IDs, hash-based version-3 and version-5 UUIDs, and random version-4 UUIDs, collisions can occur even without implementation problems, albeit with a probability so small that it can normally be ignored. This probability can be computed precisely based on analysis of the birthday problem.
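For example, the number of random version-4 UUIDs which need to be generated in order to have a 50% probability of at least one collision is 2.71 quintillion, computed as follows:
$$n \approx \frac{1}{2} + \sqrt{\frac{1}{4} + 2 \ln(2) \cdot 2^{122}} \approx 2.71 \times 10^{18}$$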
This number is equivalent to generating 1 billion UUIDs per second for about 86 years. A file containing this many UUIDs, at 16 bytes per UUID, would be about 43 exabytes.
The smallest number of version-4 UUIDs which must be generated for the probability of finding a collision to be p is approximated by the formula
$$n \approx \sqrt{2 \cdot 2^{122} \cdot \ln\frac{1}{1-p}}$$
Thus, the probability to find a duplicate within 103 trillion version-4 UUIDs is one in a billion.
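These collision figures can be checked numerically. The sketch below assumes the usual birthday-problem approximation n ≈ sqrt(2 · N · ln(1/(1 − p))) with N = 2^122 possible version-4 variant-1 UUIDs; the function name `uuids_for_collision_probability` is hypothetical.

```python
import math

N = 2 ** 122  # number of possible version-4 variant-1 UUIDs

def uuids_for_collision_probability(p: float) -> float:
    """Approximate count of random UUIDs giving collision probability p."""
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    # -log1p(-p) equals ln(1 / (1 - p)) and stays accurate for very small p.
    return math.sqrt(2 * N * -math.log1p(-p))

# 50% collision probability, using the slightly more exact closed form.
n_half = 0.5 + math.sqrt(0.25 + 2 * math.log(2) * N)
# One-in-a-billion collision probability.
n_billionth = uuids_for_collision_probability(1e-9)

print(f"50% collision chance after about {n_half:.3e} UUIDs")
print(f"1e-9 collision chance after about {n_billionth:.3e} UUIDs")
print(f"years at 1e9 UUIDs/s: {n_half / 1e9 / (365.25 * 86400):.1f}")
print(f"storage at 16 bytes each: {n_half * 16 / 1e18:.1f} EB")
```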
Uses
Significant uses include ext2/ext3/ext4 filesystem userspace tools, LVM, LUKS encrypted partitions, GNOME, KDE, and macOS, most of which are derived from the original implementation by Theodore Ts'o.
One of the uses of UUIDs in Solaris is identification of a running operating system instance for the purpose of pairing crash dump data with Fault Management Event in the case of kernel panic.
In COM
There are several flavors of GUIDs used in Microsoft's Component Object Model (COM):
- IID – interface identifier;
- CLSID – class identifier;
- LIBID – type library identifier;
- CATID – category identifier.
As database keys
The random nature of standard UUIDs of versions 3, 4, and 5, and the ordering of the fields within standard versions 1 and 2, may create problems with database locality or performance when UUIDs are used as primary keys. For example, in 2002 Jimmy Nilsson reported a significant improvement in performance with Microsoft SQL Server when the version-4 UUIDs being used as keys were modified to include a non-random suffix based on system time. This so-called "COMB" approach made the UUIDs non-standard and significantly more likely to be duplicated, as Nilsson acknowledged, but Nilsson only required uniqueness within the application.
Some web frameworks, such as Laravel, have support for "timestamp first" UUIDs that may be efficiently stored in an indexed database column.
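The general idea of a "timestamp first" identifier can be illustrated with a minimal sketch. This is a hypothetical layout chosen for illustration (a 48-bit millisecond timestamp followed by 80 random bits), not Nilsson's COMB scheme or the format used by any particular framework, and the result is a non-standard 128-bit value because no version or variant bits are set.

```python
import os
import time
import uuid

def timestamp_first_id(unix_ms=None, random_bytes=None) -> uuid.UUID:
    """Hypothetical layout: 48-bit Unix time in milliseconds, then 80 random bits."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    if random_bytes is None:
        random_bytes = os.urandom(10)
    if len(random_bytes) != 10:
        raise ValueError("random_bytes must be exactly 10 bytes")
    # Big-endian prefix, so byte-wise and textual ordering follow time.
    prefix = (unix_ms & 0xFFFFFFFFFFFF).to_bytes(6, "big")
    return uuid.UUID(bytes=prefix + random_bytes)

# Fixed inputs to show that ordering is decided by the timestamp prefix,
# regardless of the random part.
earlier = timestamp_first_id(1000, b"\xff" * 10)
later = timestamp_first_id(2000, bytes(10))
print(earlier, later)
```

Because identifiers created later compare greater, new rows tend to land near the end of an index rather than at random positions, which is the locality benefit described above.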