Sanov's theorem


In information theory, Sanov's theorem gives a bound on the probability of observing an atypical sequence of samples from a given probability distribution. In the language of large deviations theory, Sanov's theorem identifies the rate function for large deviations of the empirical measure of a sequence of i.i.d. random variables.
Let $A$ be a set of probability distributions over an alphabet $X$, and let $q$ be an arbitrary distribution over $X$ (where $q$ may or may not be in $A$). Suppose we draw $n$ i.i.d. samples from $q$, represented by the vector $x^n = (x_1, x_2, \ldots, x_n)$. Further, let us ask that the empirical measure, $\hat{p}_{x^n}$, of the samples falls within the set $A$; formally, we write $\{\hat{p}_{x^n} \in A\}$. Then,

$$q^n\left(\hat{p}_{x^n} \in A\right) \le (n+1)^{|X|}\, 2^{-n\, D_{\mathrm{KL}}(p^* \,\|\, q)},$$

where $q^n$ is the joint probability distribution of the $n$ samples, and $p^*$ is the information projection of $q$ onto $A$, i.e. the element of $A$ closest to $q$ in KL divergence:

$$p^* = \underset{p \in A}{\operatorname{arg\,min}}\; D_{\mathrm{KL}}(p \,\|\, q).$$
In words, the probability of drawing an atypical empirical distribution decays exponentially in $n$, at a rate given by the KL divergence from the true distribution to the atypical one; when $A$ contains many candidate atypical distributions, the bound is dominated by a single one, the information projection $p^*$.
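As a worked illustration (a concrete instance of the statement above, not part of the general theorem): take the alphabet $X = \{0, 1\}$, the fair coin $q = \operatorname{Bernoulli}(1/2)$, and the atypical set $A = \{\, p : p(1) \ge 0.7 \,\}$. Since $D_{\mathrm{KL}}(\operatorname{Bernoulli}(\theta) \,\|\, q)$ is increasing in $\theta$ on $[1/2, 1]$, the information projection is $p^* = \operatorname{Bernoulli}(0.7)$, and the rate is

$$D_{\mathrm{KL}}(p^* \,\|\, q) = 0.7 \log_2 \frac{0.7}{0.5} + 0.3 \log_2 \frac{0.3}{0.5} \approx 0.1187 \ \text{bits},$$

so the probability of seeing at least 70% heads in $n$ fair-coin flips decays roughly as $2^{-0.1187\, n}$.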
Furthermore, if $A$ is the closure of its interior, then

$$\lim_{n \to \infty} \frac{1}{n} \log q^n\left(\hat{p}_{x^n} \in A\right) = -D_{\mathrm{KL}}(p^* \,\|\, q).$$
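This convergence can be checked numerically. The following minimal sketch (assuming the coin example above; the helper names kl_bernoulli and tail_prob are illustrative, not from the original) computes the exact probability that $n$ fair-coin flips produce an empirical heads frequency of at least $0.7$, and compares the normalized exponent $-\tfrac{1}{n} \log_2 q^n(\hat{p}_{x^n} \in A)$ with the Sanov rate.

```python
import math

def kl_bernoulli(p, q):
    """D_KL(Bernoulli(p) || Bernoulli(q)) in bits."""
    return p * math.log2(p / q) + (1 - p) * math.log2((1 - p) / (1 - q))

def tail_prob(n, q, thresh):
    """Exact probability that the empirical frequency of 1s among
    n i.i.d. Bernoulli(q) samples is at least thresh."""
    k_min = math.ceil(thresh * n)
    return sum(math.comb(n, k) * q**k * (1 - q)**(n - k)
               for k in range(k_min, n + 1))

q, thresh = 0.5, 0.7               # true coin; atypical set A = {p : p(1) >= 0.7}
rate = kl_bernoulli(thresh, q)     # Sanov exponent D(p* || q) ~= 0.1187 bits

for n in (10, 100, 1000):
    p_atyp = tail_prob(n, q, thresh)
    print(f"n = {n:4d}:  -(1/n) log2 P = {-math.log2(p_atyp) / n:.4f}"
          f"   (Sanov rate = {rate:.4f})")
```

By $n = 1000$ the normalized exponent is within a few percent of the rate; the remaining gap comes from the polynomial prefactor $(n+1)^{|X|}$, which contributes only a vanishing $O(\log n / n)$ correction after normalizing by $n$.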