Growth function

The growth function, also called the shatter coefficient or the shattering number, measures the richness of a set family. It is especially used in the context of statistical learning theory, where it measures the complexity of a hypothesis class.
The term 'growth function' was coined by Vapnik and Chervonenkis in their 1968 paper, where they also proved many of its properties.
It is a basic concept in machine learning.

Definitions

Set-family definition

Let be a set family and a set. Their intersection is defined as the following set-family:
The intersection-size of with respect to is. If a set has elements then the index is at most. If the index is exactly 2^m then the set is said to be shattered by, because contains all the subsets of, i.e.:
The growth function measures the size of as a function of. Formally:

Hypothesis-class definition

Equivalently, let be a hypothesis-class and a set with elements. The restriction of to is the set of binary functions on that can be derived from :
The growth function measures the size of as a function of :

Examples

1. The domain is the real line.
The set-family contains all the half-lines from a given number to positive infinity, i.e., all sets of the form for some.
For any set of real numbers, the intersection contains sets: the empty set, the set containing the largest element of, the set containing the two largest elements of, and so on. Therefore:. The same is true whether contains open half-lines, closed half-lines, or both.
2. The domain is the segment.
The set-family contains all the open sets.
For any finite set of real numbers, the intersection contains all possible subsets of. There are such subsets, so.
3. The domain is the Euclidean space.
The set-family contains all the half-spaces of the form:, where is a fixed vector.
Then,
where Comp is the number of number of components in a partitioning of an n-dimensional space by m hyperplanes.
4. The domain is the real line. The set-family contains all the real intervals, i.e., all sets of the form for some. For any set of real numbers, the intersection contains all runs of between 0 and consecutive elements of. The number of such runs is, so.

Polynomial or exponential

The main property that makes the growth function interesting is that it can be either polynomial or exponential - nothing in-between.
The following is a property of the intersection-size:

If, for some set of size, and for some number, -
then, there exists a subset of size such that.

This implies the following property of the Growth function.
For every family there are two cases:

The exponential case: identically.
The polynomial case: is majorized by, where is the smallest integer for which.
Other properties

Trivial upper bound

For any finite :
since for every, the number of elements in is at most. Therefore, the growth function is mainly interesting when is infinite.

Exponential upper bound

For any nonempty :
I.e, the growth function has an exponential upper-bound.
We say that a set-family shatters a set if their intersection contains all possible subsets of, i.e..
If shatters of size, then, which is the upper bound.

Cartesian intersection

Define the Cartesian intersection of two set-families as:
Then:

Union

For every two set-families:

VC dimension

The VC dimension of is defined according to these two cases:

In the polynomial case, = the largest integer for which.
In the exponential case .

So if-and-only-if.
The growth function can be regarded as a refinement of the concept of VC dimension. The VC dimension only tells us whether is equal to or smaller than, while the growth function tells us exactly how changes as a function of.
Another connection between the growth function and the VC dimension is given by the Sauer–Shelah lemma:
In particular,
This upper bound is tight, i.e, for all there exists with VC dimension such that:

Entropy

While the growth-function is related to the maximum intersection-size,
the entropy is related to the average intersection size:
The intersection-size has the following property. For every set-family :
Hence:
Moreover, the sequence converges to a constant when.
Moreover, the random-variable is concentrated near.

Applications in probability theory

Let be a set on which a probability measure is defined.
Let be family of subsets of .
Suppose we choose a set that contains elements of,
where each element is chosen at random according to the probability measure, independently of the others. For each event, we compare the following two quantities:

Its relative frequency in, i.e, ;
Its probability.

We are interested in the difference,. This difference satisfies the following upper bound:
which is equivalent to:
In words: the probability that for all events in, the relative-frequency is near the probability, is lower-bounded by an expression that depends on the growth-function of.
A corollary of this is that, iff the growth function is polynomial in , then the above probability approaches 1 as. I.e, the family enjoys uniform convergence in probability.

Popular movies

The Hunger Games (film) - 2012 American dystopian action thriller science fiction-adventure film directed by Gary Ross and based on Suzanne Collins’s 2008 novel of the same name. It is the first insta...
untitled Captain Marvel sequel - part of Marvel Cinematic Universe....
Killers of the Flower Moon (film project) - Killers of the Flower Moon - film project in United States of America. It was presented as drama, detective fiction, thriller. The film project starred Leonardo Dicaprio, Robert De Niro. Director of...
Five Nights at Freddy's (film) - Five Nights at Freddy's - film published in 2017 in United States of America. Scenarist of the film - Scott Cawthon....

Popular books

Book of Revelation - The Book of Revelation is the final book of the New Testament, and consequently is also the final book of the Christian Bible. Its title is derived from the first word of the Koine Greek text: apok...
Book of Genesis - account of the creation of the world, the early history of humanity, Israel's ancestors and the origins...
Gospel of Matthew - The Gospel According to Matthew is the first book of the New Testament and one of the three synoptic gospels. It tells how Israel's Messiah, rejected and executed in Israel, pronounces judgement on ...
Michelin Guide - Michelin Guides are a series of guide books published by the French tyre company Michelin for more than a century. The term normally refers to the annually published Michelin Red Guide , the oldest...
Psalms - The Book of Psalms , commonly referred to simply as Psalms , the Psalter or "the Psalms", is the first book of the Ketuvim , the third section of the Hebrew Bible, and thus a book of th...
Ecclesiastes - Ecclesiastes is one of 24 books of the Tanakh , where it is classified as one of the Ketuvim . Originally written c. 450–200 BCE, it is also among the canonical Wisdom literature of the Old Tes...
The 48 Laws of Power - non-fiction book by American author Robert Greene. The book...

Popular television series

The Crown (TV series) - historical drama web television series about the reign of Queen Elizabeth II, created and principally written by Peter Morgan, and produced by Left Bank Pictures and Sony Pictures Tel...
Friends - American sitcom television series, created by David Crane and Marta Kauffman, which aired on NBC from September 22, 1994, to May 6, 2004, lasting ten seasons. With an ensemble cast sta...
Young Sheldon - spin-off prequel to The Big Bang Theory and begins with the character Sheldon...
Modern Family - American television mockumentary family sitcom created by Christopher Lloyd and Steven Levitan for the American Broadcasting Company. It ran for eleven seasons, from September 23...
Loki (TV series) - upcoming American web television miniseries created for Disney+ by Michael Waldron, based on the Marvel Comics character of the same name. It is set in the Marvel Cinematic Universe, shar...
Game of Thrones - American fantasy drama television series created by David Benioff and D. B. Weiss for HBO. It...
Shameless (American TV series) - American comedy-drama television series developed by John Wells which debuted on Showtime on January 9, 2011. It...