Inequalities are very important in the study of information theory. There are a number of different contexts in which these inequalities appear.
Entropic inequalities
Consider a tuple $X_1, X_2, \dots, X_n$ of $n$ finitely supported random variables on the same probability space. There are $2^n$ subsets, for which (joint) entropies can be computed. For example, when $n = 2$, we may consider the entropies $H(X_1)$, $H(X_2)$, and $H(X_1, X_2)$. They satisfy the following inequalities:

$$H(X_1) \ge 0, \qquad H(X_2) \ge 0,$$
$$H(X_1) \le H(X_1, X_2), \qquad H(X_2) \le H(X_1, X_2),$$
$$H(X_1, X_2) \le H(X_1) + H(X_2).$$
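As a quick numerical illustration, the following Python sketch (using a made-up 2 x 3 joint distribution) computes $H(X_1)$, $H(X_2)$, and $H(X_1, X_2)$ and verifies the five inequalities above:

import numpy as np

def entropy(p):
    # Shannon entropy in bits of a probability vector; zero entries contribute 0.
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

# A hypothetical joint distribution of (X1, X2) over a 2 x 3 alphabet.
joint = np.array([[0.10, 0.25, 0.05],
                  [0.30, 0.20, 0.10]])

H12 = entropy(joint.ravel())      # H(X1, X2), the joint entropy
H1 = entropy(joint.sum(axis=1))   # H(X1), marginalizing out X2
H2 = entropy(joint.sum(axis=0))   # H(X2), marginalizing out X1

eps = 1e-12  # tolerance for floating-point error
assert H1 >= -eps and H2 >= -eps                # non-negativity
assert H1 <= H12 + eps and H2 <= H12 + eps      # monotonicity
assert H12 <= H1 + H2 + eps                     # subadditivity
print(H1, H2, H12)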
In fact, these can all be expressed as special cases of a single inequality involving the conditional mutual information, namely

$$I(A; B \mid C) \ge 0,$$

where $A$, $B$, and $C$ each denote the joint distribution of some arbitrary (possibly empty) subset of our collection of random variables. Inequalities that can be derived as linear combinations of this are known as Shannon-type inequalities.

For larger $n$ there are further restrictions on the possible values of entropy. To make this precise, a vector $h$ in $\mathbb{R}^{2^n}$ indexed by subsets of $\{1, \dots, n\}$ is said to be entropic if there is a joint, discrete distribution of $n$ random variables $X_1, \dots, X_n$ such that $h_S = H(X_S)$ is their joint entropy, for each subset $S$. The set of entropic vectors is denoted $\Gamma_n^*$, following the notation of Yeung. It is neither closed nor convex for $n \ge 3$, but its topological closure $\overline{\Gamma_n^*}$ is known to be convex, and hence it can be characterized by the (infinitely many) linear inequalities satisfied by all entropic vectors, called entropic inequalities.

The set of all vectors that satisfy Shannon-type inequalities contains $\overline{\Gamma_n^*}$. This containment is strict for $n \ge 4$, and the further inequalities are known as non-Shannon-type inequalities. Zhang and Yeung reported the first non-Shannon-type inequality. Matus proved that no finite set of linear inequalities can characterize all entropic inequalities. In other words, the region $\overline{\Gamma_n^*}$ (for $n \ge 4$) is not a polytope.
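Written out in terms of joint entropies, this single basic inequality is the submodularity of the entropy function:

$$I(A; B \mid C) = H(A, C) + H(B, C) - H(A, B, C) - H(C) \ge 0.$$

For example, taking $C$ empty recovers subadditivity, $H(A, B) \le H(A) + H(B)$, while taking $B = A$ (so that $I(A; A \mid C) = H(A \mid C)$) recovers monotonicity, $H(A, C) \ge H(C)$.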
Lower bounds for the Kullback–Leibler divergence

A great many important inequalities in information theory are actually lower bounds for the Kullback–Leibler divergence. Even the Shannon-type inequalities can be considered part of this category, since the bivariate mutual information can be expressed as the Kullback–Leibler divergence of the joint distribution with respect to the product of the marginals, and thus these inequalities can be seen as a special case of Gibbs' inequality.

On the other hand, it seems to be much more difficult to derive useful upper bounds for the Kullback–Leibler divergence. This is because the Kullback–Leibler divergence $D_{\mathrm{KL}}(P \,\|\, Q)$ depends very sensitively on events that are very rare in the reference distribution $Q$. It increases without bound as an event of non-zero probability under $P$ becomes exceedingly rare under $Q$, and it is not even defined if an event of non-zero probability under $P$ has zero probability under $Q$.
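Concretely, for discrete random variables $X$ and $Y$ with joint distribution $p(x, y)$ and marginals $p(x)$ and $p(y)$, the mutual information is the divergence of the joint distribution from the product of the marginals:

$$I(X; Y) = D_{\mathrm{KL}}\big(p(x, y) \,\big\|\, p(x)\,p(y)\big) = \sum_{x, y} p(x, y) \log \frac{p(x, y)}{p(x)\,p(y)} \;\ge\; 0,$$

so its non-negativity is exactly Gibbs' inequality applied to these two distributions.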
Gibbs' inequality
This fundamental inequality states that the Kullback–Leibler divergence is non-negative.
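In symbols, for discrete probability distributions $P = (p_1, \dots, p_n)$ and $Q = (q_1, \dots, q_n)$,

$$D_{\mathrm{KL}}(P \,\|\, Q) = \sum_{i=1}^{n} p_i \log \frac{p_i}{q_i} \;\ge\; 0,$$

with equality if and only if $P = Q$.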
Tao's inequality

Given discrete random variables $X$, $Y$, and $Z$, such that $X$ takes values only in the interval $[-1, 1]$ and $Z$ is determined by $Y$ (so that $H(Z \mid Y) = 0$), we have

$$\operatorname{E}\big(\big|\operatorname{E}(X \mid Z) - \operatorname{E}(X \mid Y)\big|\big) \;\le\; \sqrt{2 \ln 2 \; I(X; Y \mid Z)},$$

relating the conditional expectation to the conditional mutual information (measured in bits). This is a simple consequence of Pinsker's inequality.
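In rough outline: for a fixed value of $Y$ (which also fixes $Z$), Pinsker's inequality bounds the total variation distance $\delta$ between the conditional laws $P_{X \mid Y}$ and $P_{X \mid Z}$, and since $|X| \le 1$,

$$\big|\operatorname{E}(X \mid Y) - \operatorname{E}(X \mid Z)\big| \;\le\; 2\,\delta\big(P_{X \mid Y}, P_{X \mid Z}\big) \;\le\; \sqrt{2 \ln 2 \; D_{\mathrm{KL}}\big(P_{X \mid Y} \,\big\|\, P_{X \mid Z}\big)}.$$

Taking expectations over $Y$, applying Jensen's inequality to the concave square root, and using the identity $\operatorname{E}_Y\big[D_{\mathrm{KL}}(P_{X \mid Y} \,\|\, P_{X \mid Z})\big] = I(X; Y \mid Z)$ (which relies on $Z$ being a function of $Y$) yields the stated bound; the factor $\ln 2$ appears because the mutual information is measured in bits rather than nats.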