LEB128 format is very similar to variable-length quantity format; the primary difference is that LEB128 is little-endian whereas variable-length quantities are big-endian. Both allow small numbers to be stored in a single byte, while also allowing encoding of arbitrarily long numbers. There are 2 versions of LEB128: unsigned LEB128 and signed LEB128. The decoder must know whether the encoded value is unsigned LEB128 or signed LEB128.
Unsigned LEB128
To encode an unsigned number using unsigned LEB128 first represent the number in binary. Then zero extend the number up to a multiple of 7 bits. Break the number up into groups of 7 bits. Output one encoded byte for each 7 bit group, from least significant to most significant group. Each byte will have the group in its 7 least significant bits. Set the most significant bit on each byte except the last byte. The number zero is encoded as a single byte 0x00. As an example, here is how the unsigned number 624485 gets encoded: MSB ------------------ LSB 10011000011101100101 In raw binary 010011000011101100101 Padded to a multiple of 7 bits 0100110 0001110 1100101 Split into 7-bit groups 00100110 10001110 11100101 Add high 1 bits on all but last group to form bytes 0x26 0x8E 0xE5 In hexadecimal → 0xE5 0x8E 0x26 Output stream Unsigned LEB128 and VLQ both compress any given integer into not only the same number of bits, but exactly the same bits—the two formats differ only in exactly how those bits are arranged.
Signed LEB128
A signed number is represented similarly: Starting with an -bit two's complement representation, where is a multiple of 7, the number is broken into groups as for the unsigned encoding. For example the signed number -123456 is encoded as 0xC0 0xBB 0x78: MSB ------------------ LSB 11110001001000000 Binary encoding of 123456 000011110001001000000 As a 21-bit number 111100001110110111111 Negating all bits 111100001110111000000 Adding one 1111000 0111011 1000000 Split into 7-bit groups 01111000 10111011 11000000 Add high 1 bits on all but last group to form bytes 0x78 0xBB 0xC0 In hexadecimal → 0xC0 0xBB 0x78 Output stream
more = 1; negative = ; /* the size in bits of the variable value, e.g., 64 if value's type is int64_t */ size = no. of bits in signed integer; while
Decode unsigned integer
result = 0; shift = 0; while
Decode signed integer
result = 0; shift = 0; /* the size in bits of the result variable, e.g., 64 if result's type is int64_t */ size = number of bits in signed integer; do while ; /* sign bit of byte is second high order bit */ if && ) /* sign extend */ result |= ;
JavaScript Code
Encode signed 32bit integer
const encodeSignedLeb128FromInt32 = => ;
Decode signed 32bit integer
const decodeSignedLeb128 = => ;
Uses
Related Encodings
The LLVM bitcode file format uses a similar technique except that the value is broken into groups of bits of context-dependent size, with the highest bit indicating a continuation, instead of a fixed 7 bits.
uses multiples of 7 bits for the first three size breaks, but after that the increments vary. It also puts all the prefix bits at the beginning of the word, instead of at the beginning of each byte.