Kolmogorov structure function

In 1973 Kolmogorov proposed a non-probabilistic approach to statistics and model selection. Let each datum be a finite binary string and a model be a finite set of binary strings. Consider model classes consisting of models of given maximal Kolmogorov complexity.
The Kolmogorov structure function of an individual data string expresses the relation between the complexity level constraint on a model class and the least log-cardinality of a model in the class containing the data. The structure function determines all stochastic properties of the individual data string: for every constrained model class it determines the individual best-fitting model in the class, irrespective of whether the true model is in the model class considered or not. In the classical case we talk about a set of data with a probability distribution, and the properties are those of the expectations. In contrast, here we deal with individual data strings, and the properties of the individual string are the focus. In this setting, a property holds with certainty rather than with high probability as in the classical case. The Kolmogorov structure function precisely quantifies the goodness-of-fit of an individual model with respect to individual data.
The Kolmogorov structure function is used in algorithmic information theory, also known as the theory of Kolmogorov complexity, to describe the structure of a string by means of models of increasing complexity.

Kolmogorov's definition

The structure function was originally proposed by Kolmogorov in 1973 at a Soviet Information Theory symposium in Tallinn, but these results were not published (p. 182). The results were, however, announced in 1974, in the only written record by Kolmogorov himself. One of his last scientific statements is:

Contemporary definition

It is discussed in Cover and Thomas. It is studied extensively in Vereshchagin and Vitányi, where its main properties are also resolved.
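The Kolmogorov structure function can be written as
$$h_x(\alpha) = \min_S \{\log|S| : x \in S,\ K(S) \le \alpha\},$$
where $x$ is a binary string of length $n$ with $x \in S$, where $S$ is a contemplated model (a set of $n$-length strings) for $x$, $K(S)$ is the Kolmogorov complexity of $S$, and $\alpha$ is a nonnegative integer value bounding the complexity of the contemplated $S$'s. Clearly, this function is nonincreasing and reaches $\log|\{x\}| = 0$ for $\alpha = K(x) + c$, where $c$ is the required number of bits to change $x$ into $\{x\}$ and $K(x)$ is the Kolmogorov complexity of $x$.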
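Since $K(S)$ is incomputable, $h_x(\alpha)$ cannot be evaluated exactly. The Python sketch below only illustrates the shape of the definition on a toy, hypothetical model class: the sets $S_k$ of all length-$n$ strings that agree with $x$ on their first $k$ bits, so that $x \in S_k$ and $\log|S_k| = n - k$. The cost of $S_k$ is assumed to be $k + \lceil \log_2(n+1) \rceil$ bits (the prefix plus the value of $k$), a stand-in for $K(S_k)$ rather than its true value; the function names are illustrative.

```python
import math
from typing import Optional


def prefix_model_cost(n: int, k: int) -> int:
    """ASSUMED stand-in for K(S_k): k prefix bits plus ceil(log2(n + 1)) bits to state k.

    The true Kolmogorov complexity K(S_k) is incomputable; this is only a toy cost.
    """
    return k + math.ceil(math.log2(n + 1))


def toy_structure_function(x: str, alpha: int) -> Optional[int]:
    """Toy h_x(alpha) = min{ log2|S_k| : x in S_k, cost(S_k) <= alpha }.

    S_k is the (hypothetical) model of all binary strings of length n that share
    x's first k bits, so x is always in S_k and log2|S_k| = n - k.
    Returns None when no model in the class is affordable at this level.
    """
    n = len(x)
    affordable = [n - k for k in range(n + 1) if prefix_model_cost(n, k) <= alpha]
    return min(affordable) if affordable else None


x = "01101001"
for alpha in range(14):
    print(alpha, toy_structure_function(x, alpha))
```

With $n = 8$ the assumed overhead is 4 bits, so nothing is affordable below $\alpha = 4$; from there the toy $h_x$ drops by exactly one bit per extra bit of model cost until it reaches 0 at $\alpha = 12$, because this class cannot exploit any regularity in $x$.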

The algorithmic sufficient statistic

We define a set $S$ containing $x$ such that
$$K(S) + K(x \mid S) = K(x) + O(1).$$
The function $h_x(\alpha)$ never decreases more than a fixed independent constant below the diagonal called the sufficiency line $L$, defined by
$$L(\alpha) + \alpha = K(x).$$
It is approached to within a constant distance by the graph of $h_x$ for certain arguments (for instance, for $\alpha = K(x) + c$). For these $\alpha$'s we have $\alpha + h_x(\alpha) = K(x) + O(1)$, and the associated model $S$ (the witness for $h_x(\alpha)$) is called an optimal set for $x$; its description of $K(S) \le \alpha$ bits is therefore an algorithmic sufficient statistic. By convention, we write 'algorithmic' for 'Kolmogorov complexity'. The main property of an algorithmic sufficient statistic is the following: if $S$ is an algorithmic sufficient statistic for $x$, then
$$K(S) + \log|S| = K(x) + O(1).$$
That is, the two-part description of $x$, using the model $S$ and, as data-to-model code, the index of $x$ in the enumeration of $S$ in $\log|S|$ bits, is as concise as the shortest one-part code of $x$ in $K(x)$ bits. This can be seen as follows:
$$K(x) \le K(x, S) + O(1) \le K(S) + K(x \mid S) + O(1) \le K(S) + \log|S| + O(1) \le K(x) + O(1).$$
Using straightforward inequalities and the sufficiency property (the last step), we find that every inequality in this chain is an equality up to $O(1)$; in particular $K(x \mid S) = \log|S| + O(1)$. Therefore, the randomness deficiency $\log|S| - K(x \mid S)$ of $x$ in $S$ is a constant, which means that $x$ is a typical (random) element of $S$. However, there can be models $S$ containing $x$ that are not sufficient statistics. An algorithmic sufficient statistic $S$ for $x$ has the additional property, apart from being a model of best fit, that $K(x, S) = K(x) + O(1)$, and therefore, by the symmetry of information for Kolmogorov complexity, we have $K(S \mid x^*) = O(1)$, where $x^*$ is a shortest program for $x$: the algorithmic sufficient statistic $S$ is a model of best fit that is almost completely determined by $x$. The algorithmic sufficient statistic associated with the least such $\alpha$ is called the algorithmic minimal sufficient statistic.
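The data-to-model code in the two-part description is simply the index of $x$ in an enumeration of $S$. As an illustrative sketch, assume (for concreteness only) that $S$ is the set of length-$n$ binary strings with exactly as many ones as $x$, so that $|S| = \binom{n}{k}$. Given $S$, that is, given $n$ and $k$, the index computed below fits in $\lceil \log_2 \binom{n}{k} \rceil$ bits and recovers $x$ exactly. The helper names are hypothetical.

```python
from math import ceil, comb, log2


def rank_in_type_class(x: str) -> int:
    """Lexicographic index of x among all length-n binary strings with the same number of ones."""
    n, ones_left, index = len(x), x.count("1"), 0
    for i, bit in enumerate(x):
        if bit == "1":
            # Strings with the same prefix but a '0' here come first; they place
            # all remaining ones_left ones in the n - i - 1 later positions.
            index += comb(n - i - 1, ones_left)
            ones_left -= 1
    return index


def unrank_in_type_class(n: int, k: int, index: int) -> str:
    """Inverse of rank_in_type_class: rebuild x from the model (n, k) and its index."""
    bits = []
    for i in range(n):
        zeros_first = comb(n - i - 1, k)  # strings with a '0' at position i (comb is 0 if k > n - i - 1)
        if index < zeros_first:
            bits.append("0")
        else:
            bits.append("1")
            index -= zeros_first
            k -= 1
    return "".join(bits)


x = "0010000100000000"  # n = 16 bits with k = 2 ones
n, k = len(x), x.count("1")
idx = rank_in_type_class(x)
assert unrank_in_type_class(n, k, idx) == x
print(idx, comb(n, k), ceil(log2(comb(n, k))))  # → 86 120 7
```

For this 16-bit string with two ones the index needs 7 bits; the cost of describing the model itself (the numbers $n$ and $k$) is the other part of the two-part code and is not counted here.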
With respect to the picture: the MDL structure function $\lambda_x(\alpha)$ is explained below. The goodness-of-fit structure function $\beta_x(\alpha)$ is the least randomness deficiency (see above) of any model $S \ni x$ such that $K(S) \le \alpha$:
$$\beta_x(\alpha) = \min_S \{\log|S| - K(x \mid S) : S \ni x,\ K(S) \le \alpha\}.$$
This structure function gives the goodness-of-fit of a model $S$ (containing $x$) for the string $x$. When it is low the model fits well, and when it is high the model does not fit well. If $\beta_x(\alpha) = 0$ for some $\alpha$, then there is a typical model $S \ni x$ such that $K(S) \le \alpha$ and $x$ is typical (random) for $S$; that is, $S$ is the best-fitting model for $x$.
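The randomness deficiency $\log|S| - K(x \mid S)$ is incomputable, but it can be roughly estimated with the compressor heuristic mentioned under rate distortion below, replacing Kolmogorov complexity by the length of a compressed version. The Python sketch assumes zlib as that compressor and handles only the trivial model $S = \{0,1\}^n$ of all strings as long as $x$, where $\log|S| = n$ exactly; the unconditional zlib length stands in for $K(x \mid S)$. Since zlib only upper-bounds complexity, the estimate misses any structure zlib cannot detect.

```python
import random
import zlib


def compressed_bits(data: bytes) -> int:
    """ASSUMED proxy for Kolmogorov complexity: zlib output length in bits (only an upper-bound heuristic)."""
    return 8 * len(zlib.compress(data, 9))


def deficiency_in_all_strings(data: bytes) -> int:
    """Rough estimate of log2|S| - K(x|S) for S = all bit strings as long as data.

    log2|S| is exactly the bit length n of data; K(x|S) is replaced by the zlib
    proxy, so the estimate can dip slightly below zero from compressor overhead.
    """
    n = 8 * len(data)
    return n - compressed_bits(data)


regular = bytes(1000)  # 8000 zero bits
rng = random.Random(0)
noisy = bytes(rng.getrandbits(8) for _ in range(1000))  # 8000 pseudo-random bits

print(deficiency_in_all_strings(regular))
print(deficiency_in_all_strings(noisy))
```

The all-zero input gets a deficiency of several thousand bits, so the set of all strings is a poor model for it. The pseudo-random input comes out near zero (possibly slightly negative because of zlib's header and checksum overhead), even though a seeded generator makes its true Kolmogorov complexity small: this is the gap between the proxy and $K$.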

Selection of properties

Within the constraints that the graph goes down at an angle of at least 45 degrees, that it starts at $n$ and ends approximately at $K(x)$, every graph (up to an $O(\log n)$ additive term in argument and value) is realized by the structure function of some data $x$, and vice versa. Where the graph first hits the diagonal, the argument (complexity) is that of the minimal sufficient statistic. Determining this place is incomputable.

Main property

It is proved that at each level $\alpha$ of complexity the structure function allows us to select the best model $S$ for the individual string $x$ within a strip of $O(\log n)$ with certainty, not with great probability.

The MDL variant

The minimum description length function: the length of the minimal two-part code for $x$, consisting of the model cost $K(S)$ and the length of the index of $x$ in $S$, in the model class of sets of given maximal Kolmogorov complexity $\alpha$ (the complexity of $S$ upper bounded by $\alpha$), is given by the MDL function or constrained MDL estimator:
$$\lambda_x(\alpha) = \min_S \{\Lambda(S) : S \ni x,\ K(S) \le \alpha\},$$
where $\Lambda(S) = \log|S| + K(S) \ge K(x) - O(1)$ is the total length of the two-part code of $x$ with help of model $S$.
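The MDL function can be illustrated on a small hypothetical model class. In the Python sketch below, the three models for a 64-bit string $x$ with $k$ ones (all strings of length $n$, the strings of length $n$ with $k$ ones, and the singleton $\{x\}$) and their costs (roughly $\log_2 n$, $2\log_2 n$, and $n + \log_2 n$ bits) are assumptions standing in for $K(S)$; only $\log|S|$ is exact.

```python
import math
from typing import Dict, Optional, Tuple


def toy_models(x: str) -> Dict[str, Tuple[float, float]]:
    """Hypothetical model class for binary x, as {name: (cost_bits, log2_size)}.

    cost_bits is an ASSUMED stand-in for K(S), which is incomputable;
    log2_size = log2|S| is exact for each set.
    """
    n, k = len(x), x.count("1")
    log_n = math.log2(n)
    return {
        "all length-n strings": (log_n, float(n)),  # state n
        "length-n strings with k ones": (2 * log_n, math.log2(math.comb(n, k))),  # state n and k
        "singleton {x}": (n + log_n, 0.0),  # spell x out literally
    }


def toy_mdl_function(x: str, alpha: float) -> Optional[float]:
    """Toy lambda_x(alpha): least two-part length cost + log2|S| among models with cost <= alpha."""
    totals = [cost + log_size for cost, log_size in toy_models(x).values() if cost <= alpha]
    return min(totals) if totals else None


x = "".join("1" if i in (3, 17, 40, 58) else "0" for i in range(64))  # n = 64, k = 4
for alpha in (5, 6, 12, 70, 100):
    print(alpha, toy_mdl_function(x, alpha))
```

With these assumed costs, the toy $\lambda_x$ is undefined below $\alpha = 6$, equals 70 bits from $\alpha = 6$, and drops to about 31.3 bits at $\alpha = 12$, where it stays: the singleton $\{x\}$ removes the data part, but its model cost is so high that its total is no better than 70 bits. The structure function $h_x$ for the same class would instead keep falling, to 0 at $\alpha = 70$.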

Main property

It is proved that at each level $\alpha$ of complexity the MDL function allows us to select the best model $S$ for the individual string $x$ within a strip of $O(\log n)$ with certainty, not with great probability.

Application in statistics

The mathematics developed above was taken as the foundation of MDL by its inventor, Jorma Rissanen.

Probability models

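For every computable probability distribution $P$ with $P(x) > 0$, it can be proved that there is a finite set $S \ni x$ with
$$\log|S| \le -\log P(x) + O(\log n), \qquad K(S) \le K(P) + O(\log n),$$
so that probability models and finite-set models are interchangeable up to an $O(\log n)$ additive term. For example, if $P$ is the uniform distribution on a set $S$ of strings of length $n$, then each $x \in S$ has probability $P(x) = 1/|S|$, that is, $-\log P(x) = \log|S|$. Kolmogorov's structure function becomes
$$h'_x(\alpha) = \min_P \{-\log P(x) : P(x) > 0,\ K(P) \le \alpha\},$$
where $x$ is a binary string of length $n$ with $P(x) > 0$, where $P$ is a contemplated model (a computable probability distribution on $n$-length strings) for $x$, $K(P)$ is the Kolmogorov complexity of $P$, and $\alpha$ is an integer value bounding the complexity of the contemplated $P$'s. Clearly, this function is nonincreasing and reaches $0$ for $\alpha = K(x) + c$, where $c$ is the required number of bits to change $x$ into the distribution that assigns probability 1 to $x$, and $K(x)$ is the Kolmogorov complexity of $x$. Then $h'_x(\alpha) = h_x(\alpha) + O(\log n)$. For every complexity level $\alpha$, the function $h'_x(\alpha)$ is the Kolmogorov complexity version of the maximum likelihood (ML).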
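As a toy instance of this constrained maximum likelihood, the Python sketch below restricts $P$ to i.i.d. Bernoulli models whose parameter is a dyadic fraction $p = j/2^b$, and assumes such a model costs $b$ bits. Both the model class and the cost are hypothetical stand-ins for computable distributions and $K(P)$; the function names are illustrative.

```python
import math
from typing import Optional


def neg_log_likelihood(x: str, p: float) -> float:
    """-log2 P(x) under an i.i.d. Bernoulli(p) model for the binary string x (0 < p < 1)."""
    k = x.count("1")
    return -(k * math.log2(p) + (len(x) - k) * math.log2(1 - p))


def toy_constrained_ml(x: str, alpha: int) -> Optional[float]:
    """Toy h'_x(alpha): least -log2 P(x) over Bernoulli models with ASSUMED cost <= alpha.

    Hypothetical class: p = j / 2**b with 0 < j < 2**b, costing b bits
    (a stand-in for the incomputable K(P)). Exponential in alpha, so keep alpha small.
    """
    values = [
        neg_log_likelihood(x, j / 2 ** b)
        for b in range(1, alpha + 1)
        for j in range(1, 2 ** b)
    ]
    return min(values) if values else None


x = "1000" * 4  # n = 16, k = 4
for alpha in range(5):
    print(alpha, toy_constrained_ml(x, alpha))
```

For this $x$ ($k = 4$ ones in $n = 16$ bits), one bit of precision only allows $p = 1/2$, giving 16 bits; two bits already allow the maximum likelihood estimate $p = k/n = 1/4$, giving about 12.98 bits, and further precision cannot do better.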

Main property

It is proved that at each level $\alpha$ of complexity the structure function $h'_x$ allows us to select the best model $P$ for the individual string $x$ within a strip of $O(\log n)$ with certainty, not with great probability.

The MDL variant and probability models

The MDL function: the length of the minimal two-part code for $x$, consisting of the model cost $K(P)$ and the length $-\log P(x)$, in the model class of computable probability mass functions of given maximal Kolmogorov complexity $\alpha$ (the complexity of $P$ upper bounded by $\alpha$), is given by the MDL function or constrained MDL estimator:
$$\lambda'_x(\alpha) = \min_P \{\Lambda(P) : P(x) > 0,\ K(P) \le \alpha\},$$
where $\Lambda(P) = -\log P(x) + K(P) \ge K(x) - O(1)$ is the total length of the two-part code of $x$ with help of model $P$.
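As a toy illustration in Python, restrict $P$ to i.i.d. Bernoulli models with a dyadic parameter $p = j/2^b$ and assume such a model costs $b$ bits; both choices are hypothetical stand-ins for computable distributions and $K(P)$. The sketch minimizes the two-part length $\Lambda(P)$ rather than $-\log P(x)$ alone.

```python
import math
from typing import Optional, Tuple


def two_part_length(x: str, b: int, j: int) -> float:
    """ASSUMED K(P) = b bits plus the data part -log2 P(x) for Bernoulli(p = j / 2**b)."""
    p = j / 2 ** b
    k = x.count("1")
    data_bits = -(k * math.log2(p) + (len(x) - k) * math.log2(1 - p))
    return b + data_bits


def toy_constrained_mdl(x: str, alpha: int) -> Optional[Tuple[float, float]]:
    """Toy lambda'_x(alpha): shortest two-part code over the hypothetical dyadic Bernoulli grid.

    Returns (total_bits, chosen_p), or None when no model is affordable.
    """
    best = None
    for b in range(1, alpha + 1):
        for j in range(1, 2 ** b):
            total = two_part_length(x, b, j)
            if best is None or total < best[0]:
                best = (total, j / 2 ** b)
    return best


x = "1000" * 4  # n = 16, k = 4
for alpha in range(1, 5):
    print(alpha, toy_constrained_mdl(x, alpha))
```

Because the model cost is charged, extra precision in $p$ is penalized: for this string the total is 17 bits at $\alpha = 1$ and about 14.98 bits with $p = 1/4$ from $\alpha = 2$ on, and a finer grid never helps. That the selected $p$ coincides with the maximum likelihood estimate $k/n$ is a property of this example, where $k/n$ happens to lie on a coarse grid.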

Main property

It is proved that at each level $\alpha$ of complexity the MDL function allows us to select the best model $P$ for the individual string $x$ within a strip of $O(\log n)$ with certainty, not with great probability.

Extension to rate distortion and denoising

It turns out that the approach can be extended to a theory of rate distortion of individual finite sequences
and denoising of individual finite sequences using Kolmogorov complexity. Experiments using real compressor programs have been carried out with success. Here the assumption is that for natural data the Kolmogorov complexity is not far from the length of a compressed version using a good compressor.
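Roughly, the individual rate-distortion idea is to pick, among strings $y$ within distortion $d$ of $x$, one of least Kolmogorov complexity; denoising exploits the fact that such a $y$ can shed noise. The Python sketch below is only a toy: it assumes zlib's compressed length as the stand-in for $K(y)$, Hamming distance over bytes as the distortion, and a caller-supplied candidate list in place of a search over all $y$. The function names are illustrative.

```python
import zlib
from typing import List


def compressed_bits(data: bytes) -> int:
    """ASSUMED proxy for Kolmogorov complexity: zlib output length in bits."""
    return 8 * len(zlib.compress(data, 9))


def hamming(a: bytes, b: bytes) -> int:
    """Distortion measure (an assumption): number of differing byte positions."""
    return sum(u != v for u, v in zip(a, b))


def simplest_within_distortion(x: bytes, candidates: List[bytes], d: int) -> bytes:
    """Most compressible y among x itself and the equal-length candidates within distance d of x."""
    feasible = [x] + [y for y in candidates if len(y) == len(x) and hamming(x, y) <= d]
    return min(feasible, key=compressed_bits)


clean = b"ab" * 50
noisy = bytearray(clean)
for pos, ch in ((10, "x"), (47, "y"), (80, "z")):
    noisy[pos] = ord(ch)  # corrupt three bytes: distortion 3 from the clean string
noisy = bytes(noisy)

print(simplest_within_distortion(noisy, [clean], 2) == noisy)  # budget too small: keep x
print(simplest_within_distortion(noisy, [clean], 3) == clean)  # budget admits the clean string
```

Raising the budget from 2 to 3 lets the noisy string be replaced by its clean, more compressible version, which is the denoising effect in miniature. With real data the candidates would have to come from some search, which this sketch does not attempt.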

Literature

Especially pp. 401–431 about the Kolmogorov structure function, and pp. 613–629 about rate distortion and denoising of individual sequences.
