Golomb coding
Golomb coding is a lossless data compression method using a family of data compression codes invented by Solomon W. Golomb in the 1960s. Alphabets following a geometric distribution will have a Golomb code as an optimal prefix code, making Golomb coding highly suitable for situations in which the occurrence of small values in the input stream is significantly more likely than large values.
Rice coding
Rice coding denotes using a subset of the family of Golomb codes to produce a simpler prefix code. Rice used this set of codes in an adaptive coding scheme; "Rice coding" can refer either to that adaptive scheme or to using that subset of Golomb codes. Whereas a Golomb code has a tunable parameter that can be any positive integer value, Rice codes are those in which the tunable parameter is a power of two. This makes Rice codes convenient for use on a computer since multiplication and division by 2 can be implemented more efficiently in binary arithmetic.Rice was motivated to propose this simpler subset due to the fact that geometric distributions are often varying with time, not precisely known, or both, so selecting the seemingly optimal code might not be very advantageous.
Rice coding is used as the entropy encoding stage in a number of lossless image compression and audio data compression methods.
Overview
Construction of codes
Golomb coding uses a tunable parameter to divide an input value into two parts:, the result of a division by, and, the remainder. The quotient is sent in unary coding, followed by the remainder in truncated binary encoding. When Golomb coding is equivalent to unary coding.Golomb–Rice codes can be thought of as codes that indicate a number by the position of the bin, and the offset within the bin. The above figure shows the position q and offset r for the encoding of integer N using Golomb–Rice parameter M.
Formally, the two parts are given by the following expression, where is the number being encoded:
and
The final result looks like:.
Note that can be of a varying number of bits. Specifically, is only b bits for Rice code and switches between b-1 and b bits for Golomb code. Let. If, then use b-1 bits to encode r. If, then use b bits to encode r. Clearly, if M is a power of 2 and we can encode all values of r with b bits.
The parameter M is a function of the corresponding Bernoulli process, which is parameterized by the probability of success in a given Bernoulli trial. is either the median of the distribution or the median +/− 1. It can be determined by these inequalities:
which are solved by
The Golomb code for this distribution is equivalent to the Huffman code for the same probabilities, if it were possible to compute the Huffman code.
Use with signed integers
Golomb's scheme was designed to encode sequences of non-negative numbers. However it is easily extended to accept sequences containing negative numbers using an overlap and interleave scheme, in which all values are reassigned to some positive number in a unique and reversible way. The sequence begins: 0, -1, 1, -2, 2, -3, 3, -4, 4... The nth negative value is mapped to the nth odd number, and the mth positive value is mapped to the mth even number. This may be expressed mathematically as follows: a positive value is mapped to, and a negative value is mapped to. Such a code may be used for simplicity, even if suboptimal. Truly optimal codes for two-sided geometric distributions include multiple variants of the Golomb code, depending on the distribution parameters, including this one.Simple algorithm
Note below that this is the Rice–Golomb encoding, where the remainder code uses simple truncated binary encoding, also named "Rice coding". In this algorithm, if the M parameter is a power of 2, it becomes equivalent to the simpler Rice encoding.- Fix the parameter M to an integer value.
- For N, the number to be encoded, find
- # quotient = q = int
- # remainder = r = N modulo M
- Generate Codeword
- # The Code format :
, where - # Quotient Code
- ## Write a q-length string of 1 bits
- ## Write a 0 bit
- # Remainder Code
- ## LET
- ### If code r in binary representation using b bits.
- ### If code the number in binary representation using b + 1 bits.
Example
For example, with a Rice–Golomb encoding of parameter M = 10, the decimal number 42 would first be split into q = 4,r = 2, and would be encoded as qcode,rcode = qcode,rcode = 11110,010.
Use for run-length encoding
Given an alphabet of two symbols, or a set of two events, P and Q, with probabilities p and respectively, where p ≥ 1/2, Golomb coding can be used to encode runs of zero or more PConsider using a Rice code with a binary portion having bits to run-length encode sequences where P has a probability. If is the probability that a bit will be part of an -bit run and is the compression ratio of that run, then the expected compression ratio is
Compression is often expressed in terms of, the proportion compressed. For, the run-length coding approach results in compression ratios close to entropy. For example, using Rice code for yields compression, while the entropy limit is.
Adaptive Run-Length Golomb-Rice encoding
When a probability distribution for integers is not known, then the optimal parameter for a Golomb-Rice encoder cannot be determined. Thus, in many applications, a two-pass approach is used: first, the block of data is scanned to estimate a probability density function for the data. The Golomb-Rice parameter is then determined from that estimated PDF. A simpler variation of that approach is to assume that the PDF belongs to a parametrized family, estimate the PDF parameters from the data, and then determine compute the optimal Golomb-Rice parameter. That is the approach used in most of the applications discussed below.An alternative approach to efficiently encode integer data whose PDF is not known or is varying, is to use a backwards-adaptive encoder. The achieves that using a very simple algorithm that adjusts the Golomb-Rice parameter up or down, depending on the last encoded symbol. A decoder can follow the same rule to track the variation of the encoding parameters, so no side information needs to be transmitted, just the encoded data. Assuming a Generalized Gaussian PDF, which covers a wide range of statistics seen in data such as prediction errors or transform coefficients in multimedia codecs, the RLGR encoding algorithm can perform very well in such applications.
Applications
Numerous signal codecs use a Rice code for prediction residues.In predictive algorithms, such residues tend to fall into a two-sided geometric distribution, with small residues being more frequent than large residues, and the Rice code closely approximates the Huffman code for such a distribution without the overhead of having to transmit the Huffman table.
One signal that does not match a geometric distribution is a sine wave, because the differential residues create a sinusoidal signal whose values are not creating a geometric distribution.
Several lossless audio codecs, such as Shorten, FLAC, Apple Lossless, and MPEG-4 ALS, use a Rice code after the linear prediction step.
Rice coding is also used in the FELICS lossless image codec.
The Golomb–Rice coder is used in the entropy coding stage of Rice Algorithm based lossless image codecs. One such experiment yields a compression ratio graph given below. See other entries in this category at the bottom of this page. In those compression, the progressive space differential data yields an alternating suite of positive and negative values around 0, which are remapped to positive-only integers, and then Rice–Golomb coding is applied by varying the divisor which remains small.
In those results, the Rice coding may create very long sequences of one-bits for the quotient; for practical reasons, it is often necessary to limit the total run-length of one-bits, so a modified version of the Rice–Golomb encoding consists of replacing the long string of one-bits by encoding its length with a recursive Rice–Golomb encoding; this requires reserving some values in addition to the initial divisor k to allow the necessary distinction.
The JPEG-LS scheme uses Rice–Golomb to encode the prediction residuals.
The adaptive version of Golomb-Rice coding, mentioned above, is used for encoding screen content in virtual machines in the component of the Microsoft Remote Desktop Protocol.