Learnable function class

In statistical learning theory, a learnable function class is a set of functions for which an algorithm can be devised to asymptotically minimize the expected risk, uniformly over all probability distributions. The concept of learnable classes is closely related to regularization in machine learning, and provides large-sample justifications for certain learning algorithms.

Definition

Background

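Let $\Omega = \mathcal{X} \times \mathcal{Y} = \{(x, y)\}$ be the sample space, where $y$ are the labels and $x$ are the covariates. $\mathcal{F} = \{f : \mathcal{X} \to \mathcal{Y}\}$ is a collection of mappings under consideration to link $x$ to $y$. $L : \mathcal{Y} \times \mathcal{Y} \to \mathbb{R}$ is a pre-given loss function. Given a probability distribution $P(x, y)$ on $\Omega$, define the expected risk $I_P(f)$ to be:
$$I_P(f) = \int L(f(x), y) \, dP(x, y)$$
The general goal in statistical learning is to find the function in $\mathcal{F}$ that minimizes the expected risk. That is, to find solutions to the following problem:
$$\hat{f} = \arg\min_{f \in \mathcal{F}} I_P(f)$$
But in practice the distribution $P$ is unknown, and any learning task can only be based on finite samples. Thus we seek instead an algorithm that asymptotically minimizes the expected risk, i.e., a sequence of functions $\{\hat{f}_n\}_{n=1}^{\infty}$, where $\hat{f}_n$ is built from a sample of size $n$, that satisfies, for every $\epsilon > 0$,
$$\lim_{n \to \infty} \mathbb{P}\left( I_P(\hat{f}_n) - \inf_{f \in \mathcal{F}} I_P(f) > \epsilon \right) = 0$$
One usual algorithm to find such a sequence is empirical risk minimization.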
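Empirical risk minimization can be sketched in a few lines. The following is a minimal illustration, not a reference implementation: the finite class of threshold classifiers, the 0-1 loss, and the noiseless toy data are all assumptions made for the example, and the helper names (`empirical_risk`, `erm`, `make_threshold`) are hypothetical.

```python
import random

def empirical_risk(f, sample, loss):
    """Average loss of f over a finite sample of (x, y) pairs."""
    return sum(loss(f(x), y) for x, y in sample) / len(sample)

def erm(function_class, sample, loss):
    """Return a function in the class with the smallest empirical risk."""
    return min(function_class, key=lambda f: empirical_risk(f, sample, loss))

def zero_one_loss(prediction, label):
    """0-1 loss: 0 for a correct prediction, 1 otherwise."""
    return 0.0 if prediction == label else 1.0

def make_threshold(t):
    """Hypothetical classifier: predict 1 when x >= t, else 0."""
    return lambda x: 1 if x >= t else 0

# Toy function class (assumption): thresholds on a grid over [0, 1].
function_class = [make_threshold(i / 10) for i in range(11)]

# Toy data (assumption): x uniform on [0, 1], labels from the threshold 0.5.
rng = random.Random(0)
xs = [rng.random() for _ in range(200)]
sample = [(x, 1 if x >= 0.5 else 0) for x in xs]

f_hat = erm(function_class, sample, zero_one_loss)
print(empirical_risk(f_hat, sample, zero_one_loss))  # 0.0, since threshold 0.5 is in the class
```

Because the class here is finite, the minimization is a plain search; for richer classes the search is replaced by an optimization routine, but the objective is the same sample average.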

Learnable function class

We can make the convergence condition above stronger by requiring that the convergence be uniform over all probability distributions. That is, for every $\epsilon > 0$:
$$\lim_{n \to \infty} \sup_{P} \mathbb{P}\left( I_P(\hat{f}_n) - \inf_{f \in \mathcal{F}} I_P(f) > \epsilon \right) = 0 \tag{1}$$
The intuition behind the stricter requirement is this: the rate at which the sequence $\{\hat{f}_n\}$ converges to the minimizer of the expected risk can be very different for different $P(x, y)$. Because in the real world the true distribution $P$ is always unknown, we want to select a sequence that performs well in all cases.
However, by the no free lunch theorem, a sequence that satisfies the uniform condition (1) does not exist if $\mathcal{F}$ is too complex. This means we need to be careful not to allow too "many" functions in $\mathcal{F}$ if we want (1) to be a meaningful requirement. Specifically, function classes that ensure the existence of a sequence $\{\hat{f}_n\}$ satisfying (1) are known as learnable classes.
It is worth noting that, at least for supervised classification and regression problems, if a function class is learnable, then empirical risk minimization automatically satisfies (1). Thus in these settings not only do we know that the problem posed by (1) is solvable, we also immediately have an algorithm that gives the solution.

Interpretations

If the true relationship between $y$ and $x$ is $y \sim f^*(x)$, then by selecting the appropriate loss function, $f^*$ can always be expressed as the minimizer of the expected loss across all possible functions. That is,
$$f^* = \arg\min_{f \in \mathcal{F}^*} I_P(f)$$
Here we let $\mathcal{F}^*$ be the collection of all possible functions mapping $\mathcal{X}$ to $\mathcal{Y}$. $f^*$ can be interpreted as the actual data-generating mechanism. However, the no free lunch theorem tells us that in practice, with finite samples, we cannot hope to search for the expected risk minimizer over $\mathcal{F}^*$. Thus we often consider a subset $\mathcal{F} \subset \mathcal{F}^*$ to carry out searches on. By doing so, we risk that $f^*$ might not be an element of $\mathcal{F}$. This tradeoff can be expressed mathematically as
$$I_P(\hat{f}_n) - \inf_{f \in \mathcal{F}^*} I_P(f) = \underbrace{I_P(\hat{f}_n) - \inf_{f \in \mathcal{F}} I_P(f)}_{(a)} + \underbrace{\inf_{f \in \mathcal{F}} I_P(f) - \inf_{f \in \mathcal{F}^*} I_P(f)}_{(b)} \tag{2}$$
In the above decomposition, part (b) does not depend on the data and is non-stochastic. It describes how far our assumptions ($\mathcal{F}$) are from the truth ($\mathcal{F}^*$). Part (b) will be strictly greater than 0 if we make assumptions that are too strong ($\mathcal{F}$ too small). On the other hand, failing to put enough restrictions on $\mathcal{F}$ will cause it to be not learnable, and part (a) will not stochastically converge to 0. This is the well-known overfitting problem in the statistics and machine learning literature.
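As a worked illustration of the data-independent term of the decomposition, consider a toy setup that is an assumption of this example rather than part of the discussion above: squared loss $L(f(x), y) = (f(x) - y)^2$, $x \sim \mathrm{Uniform}[0, 1]$, and $y = f^*(x) + \varepsilon$ with $f^*(x) = x$, $\mathbb{E}[\varepsilon \mid x] = 0$ and $\operatorname{Var}(\varepsilon) = \sigma^2$. Take the restricted class $\mathcal{F}$ to be the constant functions $f(x) = c$.

```latex
\begin{align*}
I_P(f) &= \mathbb{E}\big[(f(x) - f^*(x))^2\big] + \sigma^2 \\
\inf_{f \in \mathcal{F}^*} I_P(f) &= I_P(f^*) = \sigma^2 \\
\inf_{f \in \mathcal{F}} I_P(f) &= \min_{c \in \mathbb{R}} \mathbb{E}\big[(c - x)^2\big] + \sigma^2
  = \operatorname{Var}(x) + \sigma^2 = \tfrac{1}{12} + \sigma^2 \\
\inf_{f \in \mathcal{F}} I_P(f) - \inf_{f \in \mathcal{F}^*} I_P(f) &= \tfrac{1}{12} > 0
\end{align*}
```

Under these assumptions the gap is strictly positive no matter how much data is available. Enlarging $\mathcal{F}$ to all affine functions $x \mapsto a + bx$ would bring it to 0, since $f^*$ would then belong to $\mathcal{F}$.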

Example: Tikhonov regularization

A good example where learnable classes are used is Tikhonov regularization in a reproducing kernel Hilbert space (RKHS). Specifically, let $\mathcal{H}$ be an RKHS, and $\|\cdot\|_K$ be the norm on $\mathcal{H}$ given by its inner product. It has been shown that $\mathcal{F} = \{f \in \mathcal{H} : \|f\|_K \le \gamma\}$ is a learnable class for any finite, positive $\gamma$. The empirical minimization algorithm for the dual form of this problem is
$$\arg\min_{f \in \mathcal{H}} \left\{ \sum_{i=1}^{n} L(f(x_i), y_i) + \lambda \|f\|_K^2 \right\}$$
This was first introduced by Tikhonov to solve ill-posed problems. Many statistical learning algorithms can be expressed in such a form.
The tradeoff between the data-dependent part (a) and the data-independent part (b) of the error decomposition is geometrically more intuitive with Tikhonov regularization in an RKHS. We can consider a sequence $\{\mathcal{F}_\gamma : \gamma \to \infty\}$, which are essentially balls in $\mathcal{H}$ centered at 0. As $\gamma$ gets larger, $\mathcal{F}_\gamma$ gets closer to the entire space, and (b) is likely to become smaller. However, we will also suffer slower convergence rates in (a). The way to choose an optimal $\gamma$ in finite-sample settings is usually through cross-validation.
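The effect of the penalty can be seen in the simplest RKHS. The sketch below assumes the linear kernel $K(x, x') = x x'$ on the real line, whose RKHS consists of the functions $f(x) = wx$ with $\|f\|_K = |w|$, and assumes squared loss; under those assumptions the penalized problem has the closed-form solution used in the hypothetical helper `ridge_slope`. The data are synthetic.

```python
import random

def ridge_slope(sample, lam):
    """Minimizer w of sum((w * x - y) ** 2) + lam * w ** 2 over real w.

    Setting the derivative to zero gives w = sum(x * y) / (sum(x * x) + lam).
    """
    sxy = sum(x * y for x, y in sample)
    sxx = sum(x * x for x, _ in sample)
    return sxy / (sxx + lam)

# Synthetic data (assumption): y = 2 * x plus small Gaussian noise.
rng = random.Random(0)
sample = []
for _ in range(50):
    x = rng.uniform(-1.0, 1.0)
    sample.append((x, 2.0 * x + rng.gauss(0.0, 0.1)))

# A larger penalty yields a solution with a smaller RKHS norm |w|,
# i.e. a solution lying in a smaller ball around 0.
for lam in (0.0, 1.0, 100.0):
    w = ridge_slope(sample, lam)
    print(f"lambda={lam:6.1f}  |w|={abs(w):.4f}")
```

In this toy setting, increasing the penalty weight plays the same role as shrinking the radius of the ball: the fitted function is pulled toward 0, trading a worse best-in-class fit for a more stable estimate.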

Relationship to empirical process theory

Part (a) of the error decomposition, the excess risk within $\mathcal{F}$, is closely linked to empirical process theory in statistics, where the collection of empirical risks $\left\{ \frac{1}{n} \sum_{i=1}^{n} L(f(x_i), y_i) : f \in \mathcal{F} \right\}$ is known as an empirical process. In this field, function classes $\mathcal{F}$ that satisfy the stochastic convergence
$$\sup_{P} \mathbb{E} \sup_{f \in \mathcal{F}} \left| \frac{1}{n} \sum_{i=1}^{n} L(f(x_i), y_i) - I_P(f) \right| \to 0 \quad \text{as } n \to \infty$$
are known as uniform Glivenko–Cantelli classes. It has been shown that under certain regularity conditions, learnable classes and uniform Glivenko–Cantelli classes are equivalent. The interplay between (a) and (b) is often known in the statistics literature as the bias-variance tradeoff.
However, an example of stochastic convex optimization in the General Setting of Learning has been given in which learnability is not equivalent to uniform convergence.
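The uniform deviation between empirical and expected risk can be explored numerically. The sketch below is only a simulation under stated assumptions: it fixes a single distribution ($x$ uniform on $[0, 1]$, labels given by the threshold 0.5) rather than taking a supremum over all distributions, uses 0-1 loss, and takes the class to be a grid of threshold classifiers. The function name `sup_deviation` is hypothetical.

```python
import random

def zero_one_loss(prediction, label):
    """0-1 loss: 0 for a correct prediction, 1 otherwise."""
    return 0.0 if prediction == label else 1.0

def sup_deviation(n, thresholds, rng):
    """Largest gap between empirical and expected risk over the class."""
    # Assumed toy distribution: x ~ Uniform[0, 1], y = 1 if x >= 0.5 else 0.
    xs = [rng.random() for _ in range(n)]
    sample = [(x, 1 if x >= 0.5 else 0) for x in xs]
    worst = 0.0
    for t in thresholds:
        # Empirical risk of the classifier "predict 1 when x >= t".
        empirical = sum(zero_one_loss(1 if x >= t else 0, y) for x, y in sample) / n
        # Expected risk: probability that x falls between t and 0.5.
        expected = abs(t - 0.5)
        worst = max(worst, abs(empirical - expected))
    return worst

thresholds = [i / 20 for i in range(21)]
rng = random.Random(0)
for n in (50, 500, 5000):
    print(n, round(sup_deviation(n, thresholds, rng), 4))
```

For a class this simple the largest deviation should shrink as the sample size grows, which is the behavior the Glivenko–Cantelli property formalizes.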
