Karhunen–Loève theorem

In the theory of stochastic processes, the Karhunen–Loève theorem, also known as the Kosambi–Karhunen–Loève theorem, states that a stochastic process can be represented as an infinite linear combination of orthogonal functions, analogous to a Fourier series representation of a function on a bounded interval. The transformation is also known as the Hotelling transform and the eigenvector transform, and is closely related to principal component analysis (PCA), a technique widely used in image processing and in data analysis in many fields.
Stochastic processes given by infinite series of this form were first considered by Damodar Dharmananda Kosambi. There exist many such expansions of a stochastic process: if the process is indexed over [a, b], any orthonormal basis of L²([a, b]) yields an expansion thereof in that form. The importance of the Karhunen–Loève theorem is that it yields the best such basis in the sense that it minimizes the total mean squared error.
In contrast to a Fourier series where the coefficients are fixed numbers and the expansion basis consists of sinusoidal functions, the coefficients in the Karhunen–Loève theorem are random variables and the expansion basis depends on the process. In fact, the orthogonal basis functions used in this representation are determined by the covariance function of the process. One can think that the Karhunen–Loève transform adapts to the process in order to produce the best possible basis for its expansion.
In the case of a centered stochastic process {X_t}, t ∈ [a, b] (centered means E[X_t] = 0 for all t ∈ [a, b]), satisfying a technical continuity condition, X_t admits a decomposition
$$X_t = \sum_{k=1}^{\infty} Z_k e_k(t)$$
where the Z_k are pairwise uncorrelated random variables and the functions e_k are continuous real-valued functions on [a, b] that are pairwise orthogonal in L²([a, b]). It is therefore sometimes said that the expansion is bi-orthogonal, since the random coefficients Z_k are orthogonal in the probability space while the deterministic functions e_k are orthogonal in the time domain. The general case of a process X_t that is not centered can be brought back to the case of a centered process by considering X_t − E[X_t], which is a centered process.
Moreover, if the process is Gaussian, then the random variables Z_k are Gaussian and stochastically independent. This result generalizes the Karhunen–Loève transform. An important example of a centered real stochastic process on [0, 1] is the Wiener process; the Karhunen–Loève theorem can be used to provide a canonical orthogonal representation for it. In this case the expansion consists of sinusoidal functions.
The above expansion into uncorrelated random variables is also known as the Karhunen–Loève expansion or Karhunen–Loève decomposition. The empirical version is known as the Karhunen–Loève transform (KLT), principal component analysis, proper orthogonal decomposition (POD), empirical orthogonal functions, or the Hotelling transform.

Formulation

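Throughout this article, we will consider a square-integrable zero-mean random process X_t defined over a probability space (Ω, F, P) and indexed over a closed interval [a, b], with covariance function K_X(s, t). We thus have:
$$\forall t \in [a,b]: \quad X_t \in L^2(\Omega, F, P), \ \text{i.e. } \mathbf{E}[X_t^2] < \infty,$$
$$\forall t \in [a,b]: \quad \mathbf{E}[X_t] = 0,$$
$$\forall t, s \in [a,b]: \quad K_X(s,t) = \mathbf{E}[X_s X_t].$$
We associate to K_X a linear operator T_{K_X} defined in the following way:
$$T_{K_X} : L^2([a,b]) \to L^2([a,b]), \qquad (T_{K_X} f)(t) = \int_a^b K_X(s,t) f(s)\,ds.$$
Since T_{K_X} is a linear operator, it makes sense to talk about its eigenvalues λ_k and eigenfunctions e_k, which are found by solving the homogeneous Fredholm integral equation of the second kind
$$\int_a^b K_X(s,t) e_k(s)\,ds = \lambda_k e_k(t).$$
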
Statement of the theorem

Theorem. Let X_t be a zero-mean square-integrable stochastic process defined over a probability space (Ω, F, P) and indexed over a closed and bounded interval [a, b], with continuous covariance function K_X(s, t).
Then K_X(s, t) is a Mercer kernel and, letting e_k be an orthonormal basis of L²([a, b]) formed by the eigenfunctions of T_{K_X} with respective eigenvalues λ_k, X_t admits the following representation
$$X_t = \sum_{k=1}^{\infty} Z_k e_k(t)$$
where the convergence is in L², uniform in t, and
$$Z_k = \int_a^b X_t e_k(t)\,dt.$$
Furthermore, the random variables Z_k have zero mean, are uncorrelated and have variance λ_k:
$$\mathbf{E}[Z_k] = 0 \ \ \forall k \in \mathbb{N}, \qquad \mathbf{E}[Z_i Z_j] = \delta_{ij} \lambda_j \ \ \forall i, j \in \mathbb{N}.$$
Note that by generalizations of Mercer's theorem we can replace the interval [a, b] with other compact spaces C and the Lebesgue measure on [a, b] with a Borel measure whose support is C.

Proof

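The covariance function K_X satisfies the definition of a Mercer kernel. By Mercer's theorem, there consequently exists a set λ_k, e_k(t) of eigenvalues and eigenfunctions of T_{K_X} forming an orthonormal basis of L²([a, b]), and K_X can be expressed as
$$K_X(s,t) = \sum_{k=1}^{\infty} \lambda_k e_k(s) e_k(t).$$
The process X_t can be expanded in terms of the eigenfunctions e_k as:
$$X_t = \sum_{k=1}^{\infty} Z_k e_k(t),$$
where the coefficients (random variables) Z_k are given by the projection of X_t on the respective eigenfunctions:
$$Z_k = \int_a^b X_t e_k(t)\,dt.$$
We may then derive
$$\mathbf{E}[Z_k] = \mathbf{E}\left[\int_a^b X_t e_k(t)\,dt\right] = \int_a^b \mathbf{E}[X_t] e_k(t)\,dt = 0,$$
$$\mathbf{E}[Z_i Z_j] = \int_a^b \int_a^b \mathbf{E}[X_t X_s] e_j(t) e_i(s)\,dt\,ds = \int_a^b e_i(s) \left(\int_a^b K_X(s,t) e_j(t)\,dt\right) ds = \lambda_j \int_a^b e_i(s) e_j(s)\,ds = \delta_{ij} \lambda_j,$$
where we have used the fact that the e_k are eigenfunctions of T_{K_X} and are orthonormal.
Let us now show that the convergence is in L². Let
$$S_N = \sum_{k=1}^{N} Z_k e_k(t).$$
Then:
$$\mathbf{E}\left[|X_t - S_N|^2\right] = \mathbf{E}[X_t^2] + \mathbf{E}[S_N^2] - 2\mathbf{E}[X_t S_N]$$
$$= K_X(t,t) + \sum_{k=1}^{N} \lambda_k e_k(t)^2 - 2 \sum_{k=1}^{N} e_k(t) \int_a^b K_X(s,t) e_k(s)\,ds$$
$$= K_X(t,t) - \sum_{k=1}^{N} \lambda_k e_k(t)^2,$$
which goes to 0 as N → ∞ by Mercer's theorem.
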
Properties of the Karhunen–Loève transform

Special case: Gaussian distribution

Since the limit in the mean of jointly Gaussian random variables is jointly Gaussian, and jointly Gaussian random variables are independent if and only if they are orthogonal, we can also conclude:
Theorem. The variables Z_i have a joint Gaussian distribution and are stochastically independent if the original process {X_t} is Gaussian.
In the Gaussian case, since the variables Z_i are independent, we can say more:
$$\lim_{N \to \infty} \sum_{i=1}^{N} e_i(t) Z_i(\omega) = X_t(\omega)$$
almost surely.

The Karhunen–Loève transform decorrelates the process

This is a consequence of the Z_k being pairwise uncorrelated (and independent in the Gaussian case).

The Karhunen–Loève expansion minimizes the total mean square error

In the introduction, we mentioned that the truncated Karhunen–Loève expansion is the best approximation of the original process in the sense that it minimizes the total mean-square error resulting from its truncation. Because of this property, it is often said that the KL transform optimally compacts the energy.
More specifically, given any orthonormal basis {f_k} of L²([a, b]), we may decompose the process X_t as:
$$X_t(\omega) = \sum_{k=1}^{\infty} A_k(\omega) f_k(t)$$
where
$$A_k(\omega) = \int_a^b X_t(\omega) f_k(t)\,dt$$
and we may approximate X_t by the finite sum
$$\hat{X}_t(\omega) = \sum_{k=1}^{N} A_k(\omega) f_k(t)$$
for some integer N.
Claim. Of all such approximations, the KL approximation is the one that minimizes the total mean square error.

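Consider the error resulting from the truncation at the N-th term in the following orthonormal expansion:
$$\varepsilon_N(t) = \sum_{k=N+1}^{\infty} A_k(\omega) f_k(t).$$
The mean-square error ε_N²(t) can be written as:
$$\varepsilon_N^2(t) = \mathbf{E}\left[\sum_{i=N+1}^{\infty} \sum_{j=N+1}^{\infty} A_i A_j f_i(t) f_j(t)\right] = \sum_{i=N+1}^{\infty} \sum_{j=N+1}^{\infty} f_i(t) f_j(t) \int_a^b \int_a^b K_X(s,u) f_i(u) f_j(s)\,ds\,du.$$
We then integrate this last equality over [a, b]. The orthonormality of the f_k yields:
$$\int_a^b \varepsilon_N^2(t)\,dt = \sum_{k=N+1}^{\infty} \int_a^b \int_a^b K_X(s,u) f_k(u) f_k(s)\,ds\,du.$$
The problem of minimizing the total mean-square error thus comes down to minimizing the right-hand side of this equality subject to the constraint that the f_k be normalized. We hence introduce β_k, the Lagrange multipliers associated with these constraints, and aim at minimizing the following function:
$$Er[f_k(t), k \in \{N+1, \ldots\}] = \sum_{k=N+1}^{\infty} \left[ \int_a^b \int_a^b K_X(s,t) f_k(t) f_k(s)\,ds\,dt - \beta_k \left( \int_a^b f_k(t)^2\,dt - 1 \right) \right].$$
Differentiating with respect to f_i(t) (this is a functional derivative) and setting the derivative to 0 yields:
$$\frac{\partial Er}{\partial f_i(t)} = \int_a^b \left( \int_a^b K_X(s,t) f_i(s)\,ds - \beta_i f_i(t) \right) dt = 0,$$
which is satisfied in particular when
$$\int_a^b K_X(s,t) f_i(s)\,ds = \beta_i f_i(t).$$
In other words, when the f_k are chosen to be the eigenfunctions of T_{K_X}, hence resulting in the KL expansion.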

Explained variance

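An important observation is that since the random coefficients Z_k of the KL expansion are uncorrelated, the Bienaymé formula asserts that the variance of X_t is simply the sum of the variances of the individual components of the sum:
$$\operatorname{Var}[X_t] = \sum_{k=1}^{\infty} e_k(t)^2 \operatorname{Var}[Z_k] = \sum_{k=1}^{\infty} \lambda_k e_k(t)^2.$$
Integrating over [a, b] and using the orthonormality of the e_k, we obtain that the total variance of the process is:
$$\int_a^b \operatorname{Var}[X_t]\,dt = \sum_{k=1}^{\infty} \lambda_k.$$
In particular, the total variance of the N-truncated approximation is
$$\sum_{k=1}^{N} \lambda_k.$$
As a result, the N-truncated expansion explains
$$\frac{\sum_{k=1}^{N} \lambda_k}{\sum_{k=1}^{\infty} \lambda_k}$$
of the variance; and if we are content with an approximation that explains, say, 95% of the variance, then we just have to determine an N ∈ ℕ such that
$$\frac{\sum_{k=1}^{N} \lambda_k}{\sum_{k=1}^{\infty} \lambda_k} \geq 0.95.$$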
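As a rough illustration of the explained-variance criterion, the sketch below counts how many terms are needed to reach a given threshold. It assumes the standard Wiener process on [0, 1], whose eigenvalues are λ_k = 1/((k − 1/2)² π²) and whose total variance is the integral of Var[W_t] = t over [0, 1], that is 1/2. The function names are hypothetical and chosen only for this example.

```python
import math


def wiener_eigenvalue(k: int) -> float:
    """Eigenvalue 1 / ((k - 1/2)^2 pi^2) of the kernel min(s, t) on [0, 1]."""
    return 1.0 / (((k - 0.5) * math.pi) ** 2)


def terms_needed(threshold: float, total_variance: float) -> int:
    """Smallest N whose leading eigenvalues explain at least `threshold`."""
    cumulative = 0.0
    n = 0
    while cumulative / total_variance < threshold:
        n += 1
        cumulative += wiener_eigenvalue(n)
    return n


# Assumption: total variance of the Wiener process on [0, 1] is 1/2.
TOTAL_VARIANCE = 0.5
n_95 = terms_needed(0.95, TOTAL_VARIANCE)
print(n_95)  # 5: four terms explain about 94.96%, five about 95.96%
```

The first term alone already explains about 81% of the variance, which is why truncated KL expansions of the Wiener process are often usable with very few terms.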

The Karhunen–Loève expansion has the minimum representation entropy property

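Given a representation X_t = \sum_k W_k φ_k(t), for some orthonormal basis φ_k(t) and random W_k, we let p_k = E[|W_k|²] / E[‖X‖²], where ‖X‖² = ∫_a^b |X_t|² dt, so that \sum_k p_k = 1. We may then define the representation entropy to be H({φ_k}) = −\sum_k p_k log(p_k). Then we have H({φ_k}) ≥ H({e_k}) for all choices of φ_k. That is, the KL expansion has minimal representation entropy.
Proof:
Denote the coefficients obtained for the basis e_k(t) as p_k, and for φ_k(t) as q_k.
Choose N ≥ 1. Note that since e_k minimizes the mean squared error, we have that
$$\mathbf{E}\left\| \sum_{k=1}^{N} Z_k e_k - X \right\|^2 \leq \mathbf{E}\left\| \sum_{k=1}^{N} W_k \varphi_k - X \right\|^2.$$
Expanding the right-hand side for a real-valued process, we get:
$$\mathbf{E}\left\| \sum_{k=1}^{N} W_k \varphi_k - X \right\|^2 = \mathbf{E}\|X\|^2 + \sum_{k=1}^{N} \sum_{l=1}^{N} \mathbf{E}[W_k W_l] \langle \varphi_k, \varphi_l \rangle - 2 \sum_{k=1}^{N} \mathbf{E}[W_k \langle \varphi_k, X \rangle].$$
Using the orthonormality of φ_k(t), and expanding X_t in the φ_k(t) basis so that ⟨φ_k, X⟩ = W_k, we get that the right-hand side is equal to:
$$\mathbf{E}\|X\|^2 - \sum_{k=1}^{N} \mathbf{E}[|W_k|^2].$$
We may perform an identical analysis for the e_k(t), and so rewrite the above inequality as:
$$\mathbf{E}\|X\|^2 - \sum_{k=1}^{N} \mathbf{E}[|Z_k|^2] \leq \mathbf{E}\|X\|^2 - \sum_{k=1}^{N} \mathbf{E}[|W_k|^2].$$
Subtracting the common first term, and dividing by E[‖X‖²], we obtain that:
$$\sum_{k=1}^{N} p_k \geq \sum_{k=1}^{N} q_k.$$
Since this holds for every N, the partial sums of the p_k dominate those of the q_k, and because entropy decreases as a distribution becomes more concentrated in this sense, this implies that:
$$-\sum_{k=1}^{\infty} p_k \log(p_k) \leq -\sum_{k=1}^{\infty} q_k \log(q_k).$$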

Linear Karhunen–Loève approximations

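Consider a whole class of signals we want to approximate over the first M vectors of a basis. These signals are modeled as realizations of a random vector Y[n] of size N. To optimize the approximation we design a basis that minimizes the average approximation error. This section proves that optimal bases are Karhunen–Loève bases that diagonalize the covariance matrix of Y. The random vector Y can be decomposed in an orthogonal basis
$$\{g_m\}_{0 \leq m < N}$$
as follows:
$$Y = \sum_{m=0}^{N-1} \langle Y, g_m \rangle g_m,$$
where each
$$\langle Y, g_m \rangle = \sum_{n=0}^{N-1} Y[n] g_m^*[n]$$
is a random variable. The approximation from the first M ≤ N vectors of the basis is
$$Y_M = \sum_{m=0}^{M-1} \langle Y, g_m \rangle g_m.$$
The energy conservation in an orthogonal basis implies
$$\varepsilon[M] = \mathbf{E}\left\{ \|Y - Y_M\|^2 \right\} = \sum_{m=M}^{N-1} \mathbf{E}\left\{ |\langle Y, g_m \rangle|^2 \right\}.$$
This error is related to the covariance of Y defined by
$$R[n,m] = \mathbf{E}\{ Y[n] Y^*[m] \}.$$

For any vector x[n] we denote by K the covariance operator represented by this matrix, (Kx)[n] = \sum_m R[n,m] x[m], so that
$$\mathbf{E}\left\{ |\langle Y, x \rangle|^2 \right\} = \langle Kx, x \rangle = \sum_{n=0}^{N-1} \sum_{m=0}^{N-1} R[n,m]\, x[m]\, x^*[n].$$
The error ε[M] is therefore a sum of the last N − M coefficients of the covariance operator
$$\varepsilon[M] = \sum_{m=M}^{N-1} \langle K g_m, g_m \rangle.$$
The covariance operator K is Hermitian and positive and is thus diagonalized in an orthogonal basis called a Karhunen–Loève basis. The following theorem states that a Karhunen–Loève basis is optimal for linear approximations.
Theorem. Let K be a covariance operator. For all M ≥ 1, the approximation error
$$\varepsilon[M] = \sum_{m=M}^{N-1} \langle K g_m, g_m \rangle$$
is minimum if and only if
$$\{g_m\}_{0 \leq m < N}$$
is a Karhunen–Loève basis ordered by decreasing eigenvalues:
$$\langle K g_m, g_m \rangle \geq \langle K g_{m+1}, g_{m+1} \rangle, \qquad 0 \leq m < N-1.$$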
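The theorem can be checked numerically on a toy case. The sketch below assumes a hypothetical real 2 × 2 covariance matrix K = [[2, 1], [1, 2]] (eigenvalues 3 and 1) and compares the expected error of keeping M = 1 vector in the Karhunen–Loève basis, in the canonical basis, and over a scan of rotated orthonormal bases. All names are illustrative.

```python
import math


def quadratic_form(K, g):
    """Return <K g, g> for a real symmetric matrix K and a real vector g."""
    n = len(g)
    return sum(K[i][j] * g[j] * g[i] for i in range(n) for j in range(n))


def linear_error(K, basis, M):
    """Expected error of keeping the first M vectors: sum of <K g_m, g_m>, m >= M."""
    return sum(quadratic_form(K, g) for g in basis[M:])


def rotated_basis(theta):
    """Orthonormal basis of the plane obtained by rotating the canonical one."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s], [-s, c]]


# Assumed toy covariance; its eigenvectors are (1, 1)/sqrt(2) and (-1, 1)/sqrt(2).
K = [[2.0, 1.0], [1.0, 2.0]]
err_kl = linear_error(K, rotated_basis(math.pi / 4), 1)   # discards the eigenvalue-1 direction
err_canonical = linear_error(K, rotated_basis(0.0), 1)    # discards the vector (0, 1)
err_scan = min(linear_error(K, rotated_basis(math.radians(d)), 1) for d in range(180))
print(err_kl, err_canonical, err_scan)
```

In this example the error of the rotated basis is 2 − sin(2θ), so no rotation does better than the Karhunen–Loève basis at θ = π/4, where the error equals the smallest eigenvalue.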

Non-Linear approximation in bases

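Linear approximations project the signal on M vectors chosen a priori. The approximation can be made more precise by choosing the M orthogonal vectors depending on the signal properties. This section analyzes the general performance of these non-linear approximations. A signal f ∈ H is approximated with M vectors selected adaptively in an orthonormal basis for H:
$$B = \{g_m\}_{m \in \mathbb{N}}.$$
Let f_M be the projection of f over M vectors whose indices are in I_M:
$$f_M = \sum_{m \in I_M} \langle f, g_m \rangle g_m.$$
The approximation error is the sum of the remaining coefficients
$$\varepsilon[M] = \|f - f_M\|^2 = \sum_{m \notin I_M} |\langle f, g_m \rangle|^2.$$
To minimize this error, the indices in I_M must correspond to the M vectors having the largest inner product amplitude
$$|\langle f, g_m \rangle|.$$
These are the vectors that best correlate with f. They can thus be interpreted as the main features of f. The resulting error is necessarily smaller than the error of a linear approximation which selects the M approximation vectors independently of f. Let us sort
$$\{ |\langle f, g_m \rangle| \}_{m \in \mathbb{N}}$$
in decreasing order
$$|\langle f, g_{m_k} \rangle| \geq |\langle f, g_{m_{k+1}} \rangle|.$$
The best non-linear approximation is
$$f_M = \sum_{k=1}^{M} \langle f, g_{m_k} \rangle g_{m_k}.$$
It can also be written as inner product thresholding:
$$f_M = \sum_{m=0}^{\infty} \theta_T(\langle f, g_m \rangle) g_m$$
with
$$T = |\langle f, g_{m_M} \rangle|, \qquad \theta_T(x) = \begin{cases} x & |x| \geq T \\ 0 & |x| < T. \end{cases}$$
The non-linear error is
$$\varepsilon[M] = \|f - f_M\|^2 = \sum_{k=M+1}^{\infty} |\langle f, g_{m_k} \rangle|^2.$$
This error goes quickly to zero as M increases if the sorted values of |⟨f, g_{m_k}⟩| have a fast decay as k increases. This decay is quantified by computing the ℓ^p norm of the signal inner products in B:
$$\|f\|_{B,p} = \left( \sum_{m=0}^{\infty} |\langle f, g_m \rangle|^p \right)^{1/p}.$$
The following theorem relates the decay of ε[M] to ‖f‖_{B,p}.
Theorem. If ‖f‖_{B,p} < ∞ with p < 2 then
$$\varepsilon[M] \leq \frac{\|f\|_{B,p}^2}{\frac{2}{p} - 1} M^{1 - \frac{2}{p}}$$
and
$$\varepsilon[M] = o\left( M^{1 - \frac{2}{p}} \right).$$
Conversely, if ε[M] = o(M^{1 − 2/p}) then
$$\|f\|_{B,q} < \infty$$
for any q > p.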
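The difference between the two selection rules can be sketched on a finite list of coefficients. In the example below, `coeffs` stands for the inner products ⟨f, g_m⟩ of a hypothetical signal in an orthonormal basis; the values and function names are made up for illustration only.

```python
def linear_approx_error(coeffs, M):
    """Error when the first M coefficients are kept, regardless of the signal."""
    return sum(c * c for c in coeffs[M:])


def nonlinear_approx_error(coeffs, M):
    """Error when the M largest-magnitude coefficients are kept."""
    ranked = sorted(coeffs, key=abs, reverse=True)
    return sum(c * c for c in ranked[M:])


def threshold(coeffs, M):
    """Inner-product thresholding with T set to the M-th largest magnitude."""
    T = sorted((abs(c) for c in coeffs), reverse=True)[M - 1]
    return [c if abs(c) >= T else 0.0 for c in coeffs]


# Hypothetical coefficients: the energy sits in positions 1 and 4.
coeffs = [0.1, -3.0, 0.2, 0.0, 2.0, -0.1]
lin_err = linear_approx_error(coeffs, 2)      # keeps 0.1 and -3.0
nonlin_err = nonlinear_approx_error(coeffs, 2)  # keeps -3.0 and 2.0
kept = threshold(coeffs, 2)
print(lin_err, nonlin_err, kept)
```

Note that thresholding keeps more than M coefficients when several share the magnitude T; the example avoids ties so that both formulations select the same two vectors.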

Non-optimality of Karhunen–Loève bases

To further illustrate the differences between linear and non-linear approximations, we study the decomposition of a simple non-Gaussian random vector in a Karhunen–Loève basis. Processes whose realizations have a random translation are stationary. The Karhunen–Loève basis is then a Fourier basis and we study its performance. To simplify the analysis, consider a random vector Y[n] of size N that is a random shift modulo N of a deterministic signal f[n] of zero mean:
$$\sum_{n=0}^{N-1} f[n] = 0, \qquad Y[n] = f[(n - P) \bmod N].$$
The random shift P is uniformly distributed on [0, N − 1]:
$$\Pr(P = p) = \frac{1}{N}, \qquad 0 \leq p < N.$$
Clearly
$$\mathbf{E}\{Y[n]\} = \frac{1}{N} \sum_{p=0}^{N-1} f[(n - p) \bmod N] = 0$$
and
$$R[n,k] = \mathbf{E}\{Y[n] Y[k]\} = \frac{1}{N} \sum_{p=0}^{N-1} f[(n - p) \bmod N]\, f[(k - p) \bmod N] = \frac{1}{N} (f \circledast \bar{f})[n - k], \qquad \bar{f}[n] = f[-n],$$
where ⊛ denotes circular convolution.
Hence
$$R[n,k] = R_Y[n - k], \qquad R_Y[k] = \frac{1}{N} (f \circledast \bar{f})[k].$$
Since R_Y is N-periodic, Y is a circular stationary random vector. The covariance operator is a circular convolution with R_Y and is therefore diagonalized in the discrete Fourier Karhunen–Loève basis
$$\left\{ \frac{1}{\sqrt{N}} e^{i 2\pi m n / N} \right\}_{0 \leq m < N}.$$
The power spectrum is the Fourier transform of R_Y:
$$P_Y[m] = \hat{R}_Y[m] = \frac{1}{N} |\hat{f}[m]|^2.$$
Example: Consider an extreme case where f[n] = δ[n] − δ[n − 1]. A theorem stated above guarantees that the Fourier Karhunen–Loève basis produces a smaller expected approximation error than a canonical basis of Diracs. Indeed, we do not know a priori the abscissa of the non-zero coefficients of Y, so there is no particular Dirac that is better adapted to perform the approximation. But the Fourier vectors cover the whole support of Y and thus absorb a part of the signal energy.
Selecting higher frequency Fourier coefficients yields a better mean-square approximation than choosing a priori a few Dirac vectors to perform the approximation. The situation is totally different for non-linear approximations. If f[n] = δ[n] − δ[n − 1] then the discrete Fourier basis is extremely inefficient because f and hence Y have an energy that is almost uniformly spread among all Fourier vectors. In contrast, since f has only two non-zero coefficients in the Dirac basis, a non-linear approximation of Y with M ≥ 2 gives zero error.
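The example can be reproduced with a few lines of code. The sketch below assumes N = 8 and an arbitrary shift of 3 (both chosen only for illustration), builds one realization of the shifted signal δ[n] − δ[n − 1], and computes the best two-term non-linear approximation error in the Dirac basis and in the orthonormal discrete Fourier basis.

```python
import cmath

N = 8                                   # assumed signal length
f = [1.0, -1.0] + [0.0] * (N - 2)       # f[n] = delta[n] - delta[n - 1]


def dft_coefficients(x):
    """Coefficients of x in the orthonormal discrete Fourier basis."""
    n = len(x)
    return [
        sum(x[k] * cmath.exp(-2j * cmath.pi * m * k / n) for k in range(n)) / n ** 0.5
        for m in range(n)
    ]


def best_m_term_error(coeffs, M):
    """Energy left after keeping the M largest-magnitude coefficients."""
    energies = sorted((abs(c) ** 2 for c in coeffs), reverse=True)
    return sum(energies[M:])


shift = 3                               # one realization of the random shift P
y = [f[(n - shift) % N] for n in range(N)]

dirac_error = best_m_term_error(y, 2)                        # Dirac basis: coefficients are the samples
fourier_error = best_m_term_error(dft_coefficients(y), 2)    # Fourier Karhunen-Loeve basis
print(dirac_error, fourier_error)
```

With two terms the Dirac basis reconstructs the realization exactly, while the Fourier basis still leaves more than half of the total energy of 2 unexplained, because the Fourier energies (4/N) sin²(πm/N) are spread over nearly all frequencies.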

Principal component analysis

We have established the Karhunen–Loève theorem and derived a few properties thereof. We also noted that one hurdle in its application was the numerical cost of determining the eigenvalues and eigenfunctions of its covariance operator through the Fredholm integral equation of the second kind
$$\int_a^b K_X(s,t) e_k(s)\,ds = \lambda_k e_k(t).$$
However, when applied to a discrete and finite process, the problem takes a much simpler form and standard algebra can be used to carry out the calculations.
Note that a continuous process can also be sampled at N points in time in order to reduce the problem to a finite version.
We henceforth consider a random N-dimensional vector X = (X_1, X_2, …, X_N)^T. As mentioned above, X could contain N samples of a signal but it can hold many more representations depending on the field of application. For instance it could be the answers to a survey or economic data in an econometrics analysis.
As in the continuous version, we assume that X is centered; otherwise we can let X := X − μ_X (where μ_X is the mean vector of X), which is centered.
Let us adapt the procedure to the discrete case.

Covariance matrix

Recall that the main implication and difficulty of the KL transformation is computing the eigenvectors of the linear operator associated to the covariance function, which are given by the solutions to the integral equation written above.
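Define Σ, the covariance matrix of X, as an N × N matrix whose elements are given by:
$$\Sigma_{ij} = \mathbf{E}[X_i X_j], \qquad \forall i, j \in \{1, \ldots, N\}.$$
Rewriting the above integral equation to suit the discrete case, we observe that it turns into:
$$\sum_{j=1}^{N} \Sigma_{ij} e_j = \lambda e_i \quad \Leftrightarrow \quad \Sigma e = \lambda e$$
where e = (e_1, e_2, …, e_N)^T is an N-dimensional vector.
The integral equation thus reduces to a simple matrix eigenvalue problem, which explains why PCA has such a broad domain of applications.
Since Σ is a positive semi-definite symmetric matrix, it possesses a set of orthonormal eigenvectors forming a basis of ℝ^N, and we write {λ_i, φ_i}, i ∈ {1, …, N}, for this set of eigenvalues and corresponding eigenvectors, listed in decreasing values of λ_i. Let also Φ be the orthonormal matrix whose columns are these eigenvectors:
$$\Phi := (\varphi_1\ \varphi_2\ \ldots\ \varphi_N), \qquad \Phi^T \Phi = I.$$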

Principal component transform

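It remains to perform the actual KL transformation, called the principal component transform in this case. Recall that the transform was found by expanding the process with respect to the basis spanned by the eigenvectors of the covariance matrix. In this case, we hence have:
$$X = \sum_{i=1}^{N} \langle \varphi_i, X \rangle \varphi_i = \sum_{i=1}^{N} (\varphi_i^T X)\, \varphi_i.$$
In a more compact form, the principal component transform of X is defined by:
$$Y = \Phi^T X, \qquad X = \Phi Y.$$
The i-th component of Y is Y_i = φ_i^T X, the projection of X on φ_i, and the inverse transform X = ΦY yields the expansion of X on the space spanned by the φ_i:
$$X = \sum_{i=1}^{N} Y_i \varphi_i = \sum_{i=1}^{N} \langle \varphi_i, X \rangle \varphi_i.$$
As in the continuous case, we may reduce the dimensionality of the problem by truncating the sum at some K ∈ {1, …, N} such that
$$\frac{\sum_{i=1}^{K} \lambda_i}{\sum_{i=1}^{N} \lambda_i} \geq \alpha$$
where α is the explained variance threshold we wish to set.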
We can also reduce the dimensionality through the use of multilevel dominant eigenvector estimation.
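The principal component transform can be sketched end to end in two dimensions, where the eigendecomposition has a closed form. The example assumes a hypothetical covariance matrix [[2, 1], [1, 2]] and a sample x = (3, 1); in practice one would estimate the covariance from data and use a numerical eigensolver. All function names are illustrative.

```python
import math


def eigen_symmetric_2x2(a, b, c):
    """Eigenpairs of [[a, b], [b, c]] with b != 0, sorted by decreasing eigenvalue."""
    if b == 0.0:
        raise ValueError("this sketch assumes a non-zero off-diagonal entry")
    mean = (a + c) / 2.0
    radius = math.hypot((a - c) / 2.0, b)
    pairs = []
    for lam in (mean + radius, mean - radius):
        v = (b, lam - a)                      # solves (S - lam I) v = 0 when b != 0
        norm = math.hypot(v[0], v[1])
        pairs.append((lam, (v[0] / norm, v[1] / norm)))
    return pairs


def pca_transform(x, eigvecs):
    """Y_i = phi_i^T x for each eigenvector phi_i."""
    return [sum(p * xi for p, xi in zip(phi, x)) for phi in eigvecs]


def pca_inverse(y, eigvecs):
    """x = sum_i Y_i phi_i."""
    dim = len(eigvecs[0])
    return [sum(y[i] * eigvecs[i][j] for i in range(len(y))) for j in range(dim)]


def components_needed(eigvals, alpha):
    """Smallest K such that the first K eigenvalues explain at least alpha."""
    total, cumulative = sum(eigvals), 0.0
    for k, lam in enumerate(eigvals, start=1):
        cumulative += lam
        if cumulative / total >= alpha:
            return k
    return len(eigvals)


pairs = eigen_symmetric_2x2(2.0, 1.0, 2.0)     # assumed covariance [[2, 1], [1, 2]]
eigvals = [lam for lam, _ in pairs]            # 3 and 1
eigvecs = [vec for _, vec in pairs]
y = pca_transform([3.0, 1.0], eigvecs)
x_back = pca_inverse(y, eigvecs)
k_70 = components_needed(eigvals, 0.7)         # the first component explains 75%
```

Truncating `y` to its first `k_70` entries before calling `pca_inverse` gives the reduced-dimension approximation described above.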

Examples

The Wiener process

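There are numerous equivalent characterizations of the Wiener process, which is a mathematical formalization of Brownian motion. Here we regard it as the centered standard Gaussian process W_t with covariance function
$$K_W(t,s) = \operatorname{cov}(W_t, W_s) = \min(s,t).$$
We restrict the time domain to [a, b] = [0, 1] without loss of generality.
The eigenvectors of the covariance kernel are easily determined. These are
$$e_k(t) = \sqrt{2} \sin\left( \left(k - \tfrac{1}{2}\right) \pi t \right)$$
and the corresponding eigenvalues are
$$\lambda_k = \frac{1}{\left(k - \frac{1}{2}\right)^2 \pi^2}.$$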

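In order to find the eigenvalues and eigenvectors, we need to solve the integral equation:
$$\int_0^1 \min(s,t) e(s)\,ds = \lambda e(t), \qquad 0 \leq t \leq 1,$$
that is,
$$\int_0^t s\, e(s)\,ds + t \int_t^1 e(s)\,ds = \lambda e(t).$$
Differentiating once with respect to t yields:
$$\int_t^1 e(s)\,ds = \lambda e'(t);$$
a second differentiation produces the following differential equation:
$$-e(t) = \lambda e''(t),$$
the general solution of which has the form:
$$e(t) = A \sin\left( \frac{t}{\sqrt{\lambda}} \right) + B \cos\left( \frac{t}{\sqrt{\lambda}} \right),$$
where A and B are two constants to be determined with the boundary conditions. Setting t = 0 in the initial integral equation gives e(0) = 0, which implies that B = 0; similarly, setting t = 1 in the first differentiation yields e'(1) = 0, whence:
$$\cos\left( \frac{1}{\sqrt{\lambda}} \right) = 0,$$
which in turn implies that the eigenvalues of T_{K_X} are:
$$\lambda_k = \left( \frac{1}{\left(k - \frac{1}{2}\right) \pi} \right)^2, \qquad k \geq 1.$$
The corresponding eigenfunctions are thus of the form:
$$e_k(t) = A \sin\left( \left(k - \tfrac{1}{2}\right) \pi t \right), \qquad k \geq 1.$$
A is then chosen so as to normalize e_k:
$$\int_0^1 e_k^2(t)\,dt = 1 \quad \Rightarrow \quad A = \sqrt{2}.$$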

This gives the following representation of the Wiener process:
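Theorem. There is a sequence {Z_i} of independent Gaussian random variables with mean zero and variance 1 such that
$$W_t = \sqrt{2} \sum_{k=1}^{\infty} Z_k \frac{\sin\left( \left(k - \frac{1}{2}\right) \pi t \right)}{\left(k - \frac{1}{2}\right) \pi}.$$
Note that this representation is only valid for t ∈ [0, 1]. On larger intervals, the increments are not independent. As stated in the theorem, convergence is in the L² norm and uniform in t.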
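A short numerical sketch can make this representation concrete. It uses the eigenpairs λ_k = 1/((k − 1/2)² π²) and e_k(t) = √2 sin((k − 1/2)πt) of the kernel min(s, t) on [0, 1], checks that the truncated Mercer sum approaches min(s, t), and draws one approximate sample path. The truncation levels, seed, and function names are arbitrary choices made for this example.

```python
import math
import random


def eigenvalue(k):
    """lambda_k = 1 / ((k - 1/2)^2 pi^2)."""
    return 1.0 / (((k - 0.5) * math.pi) ** 2)


def eigenfunction(k, t):
    """e_k(t) = sqrt(2) sin((k - 1/2) pi t)."""
    return math.sqrt(2.0) * math.sin((k - 0.5) * math.pi * t)


def mercer_sum(s, t, n_terms):
    """Truncated Mercer expansion of the covariance min(s, t)."""
    return sum(
        eigenvalue(k) * eigenfunction(k, s) * eigenfunction(k, t)
        for k in range(1, n_terms + 1)
    )


def sample_path(times, n_terms, rng):
    """Truncated KL expansion with independent standard normal coefficients."""
    z = [rng.gauss(0.0, 1.0) for _ in range(n_terms)]
    return [
        sum(
            z[k - 1] * math.sqrt(eigenvalue(k)) * eigenfunction(k, t)
            for k in range(1, n_terms + 1)
        )
        for t in times
    ]


approx_cov = mercer_sum(0.3, 0.7, 2000)          # should be close to min(0.3, 0.7) = 0.3
times = [i / 10 for i in range(11)]
path = sample_path(times, 200, random.Random(0))  # one approximate Wiener path on [0, 1]
```

Every truncated path starts exactly at 0 because each eigenfunction vanishes at t = 0; the fine-scale roughness of Brownian motion only appears as the number of terms grows.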

The Brownian bridge

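Similarly the Brownian bridge B_t = W_t − tW_1, which is a stochastic process with covariance function
$$K_B(t,s) = \min(t,s) - ts,$$
can be represented as the series
$$B_t = \sum_{k=1}^{\infty} Z_k \frac{\sqrt{2} \sin(k \pi t)}{k \pi}.$$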

Applications

Adaptive optics systems sometimes use K–L functions to reconstruct wave-front phase information.
Karhunen–Loève expansion is closely related to the Singular Value Decomposition. The latter has myriad applications in image processing, radar, seismology, and the like. If one has independent vector observations from a vector valued stochastic process then the left singular vectors are maximum likelihood estimates of the ensemble KL expansion.

Applications in signal estimation and detection

Detection of a known continuous signal S(t)

In communication, we usually have to decide whether a signal from a noisy channel contains valuable information. The following hypothesis test is used for detecting a continuous signal S(t) from the channel output X(t), where N(t) is the channel noise, usually assumed to be a zero-mean Gaussian process with correlation function R_N(t, s) = E[N(t)N(s)]:
$$H: X(t) = N(t),$$
$$K: X(t) = N(t) + S(t), \qquad t \in (0, T).$$

Signal detection in white noise

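When the channel noise is white, its correlation function is
$$R_N(\tau) = \tfrac{1}{2} N_0 \delta(\tau),$$
and it has constant power spectral density. In a physically practical channel, the noise power is finite, so:
$$S_N(f) = \begin{cases} \frac{N_0}{2} & |f| < \omega \\ 0 & |f| > \omega. \end{cases}$$
Then the noise correlation function is a sinc function with zeros at n/(2ω), n a non-zero integer. Since noise samples taken at this spacing are uncorrelated and Gaussian, they are independent. Thus we can take samples from X(t) with time spacing
$$\Delta t = \frac{1}{2\omega} \quad \text{within } (0, T).$$
Let X_i = X(iΔt). We have a total of n = T/Δt = 2ωT independent observations {X_1, X_2, …, X_n} to develop the likelihood-ratio test. Define the signal samples S_i = S(iΔt); the problem becomes
$$H: X_i = N_i,$$
$$K: X_i = N_i + S_i, \qquad i = 1, 2, \ldots, n.$$
The log-likelihood ratio, with σ² the variance of each noise sample, is
$$\mathcal{L}(\underline{x}) = \frac{\sum_{i=1}^{n} (2 S_i x_i - S_i^2)}{2\sigma^2},$$
so the test compares
$$\Delta t \sum_{i=1}^{n} S_i x_i = \sum_{i=1}^{n} S(i \Delta t)\, x(i \Delta t)\, \Delta t$$
with a threshold.
As Δt → 0, let:
$$G = \int_0^T S(t) x(t)\,dt.$$
Then G is the test statistic and the Neyman–Pearson optimum detector is
$$G(\underline{x}) > G_0 \Rightarrow K, \qquad G(\underline{x}) < G_0 \Rightarrow H.$$
As G is Gaussian, we can characterize it by finding its mean and variance. Then we get
$$H: G \sim N\left(0, \tfrac{1}{2} N_0 E\right),$$
$$K: G \sim N\left(E, \tfrac{1}{2} N_0 E\right),$$
where
$$E = \int_0^T S^2(t)\,dt$$
is the signal energy.
The false alarm error is
$$\alpha = \int_{G_0}^{\infty} N\left(0, \tfrac{1}{2} N_0 E\right) dG \quad \Rightarrow \quad G_0 = \sqrt{\tfrac{1}{2} N_0 E}\; \Phi^{-1}(1 - \alpha).$$
And the probability of detection is:
$$\beta = \int_{G_0}^{\infty} N\left(E, \tfrac{1}{2} N_0 E\right) dG = 1 - \Phi\left( \frac{G_0 - E}{\sqrt{\frac{1}{2} N_0 E}} \right) = \Phi\left( \sqrt{\frac{2E}{N_0}} - \Phi^{-1}(1 - \alpha) \right),$$
where Φ is the cdf of a standard normal, or Gaussian, variable.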
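The threshold and detection probability of the white-noise detector can be evaluated with the standard normal cdf and its inverse. The sketch below assumes that under H the statistic G is N(0, N_0 E / 2) and under K it is N(E, N_0 E / 2), with N_0 the noise level and E the signal energy; the numerical values N_0 = 2, E = 4, α = 0.05 are hypothetical. It relies on `statistics.NormalDist` from the Python standard library (Python 3.8 or later).

```python
import math
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()


def np_threshold(alpha, noise_level, energy):
    """G_0 = sqrt(N_0 E / 2) * Phi^{-1}(1 - alpha)."""
    sigma = math.sqrt(0.5 * noise_level * energy)
    return sigma * STANDARD_NORMAL.inv_cdf(1.0 - alpha)


def detection_probability(alpha, noise_level, energy):
    """beta = Phi(sqrt(2 E / N_0) - Phi^{-1}(1 - alpha))."""
    shift = math.sqrt(2.0 * energy / noise_level)
    return STANDARD_NORMAL.cdf(shift - STANDARD_NORMAL.inv_cdf(1.0 - alpha))


# Hypothetical operating point.
N0, E, ALPHA = 2.0, 4.0, 0.05
g0 = np_threshold(ALPHA, N0, E)
beta = detection_probability(ALPHA, N0, E)

# Cross-check: beta is the upper tail of N(E, N_0 E / 2) above the threshold.
beta_direct = 1.0 - NormalDist(mu=E, sigma=math.sqrt(0.5 * N0 * E)).cdf(g0)
```

For a fixed false alarm rate, the detection probability depends on the signal only through the ratio 2E/N_0, not through its shape.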

Signal detection in colored noise

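When N(t) is colored (correlated in time) Gaussian noise with zero mean and covariance function R_N(t, s) = E[N(t)N(s)], we cannot sample independent discrete observations by evenly spacing the time. Instead, we can use the K–L expansion to decorrelate the noise process and get independent Gaussian observation 'samples'. The K–L expansion of N(t) is:
$$N(t) = \sum_{i=1}^{\infty} N_i \Phi_i(t), \qquad 0 < t < T,$$
where N_i = ∫_0^T N(t) Φ_i(t) dt and the orthonormal basis functions {Φ_i(t)} are generated by the kernel R_N(t, s), i.e., they are solutions to
$$\int_0^T R_N(t,s) \Phi_i(s)\,ds = \lambda_i \Phi_i(t), \qquad \operatorname{var}[N_i] = \lambda_i.$$
Do the expansion:
$$S(t) = \sum_{i=1}^{\infty} S_i \Phi_i(t),$$
where S_i = ∫_0^T S(t) Φ_i(t) dt, then
$$X_i = \int_0^T X(t) \Phi_i(t)\,dt = N_i$$
under H and X_i = N_i + S_i under K. Let \underline{X} = {X_1, X_2, …}. The N_i are independent Gaussian random variables with variance λ_i, so we have
$$H: \quad f_H(\underline{x}) = \prod_{i=1}^{\infty} \frac{1}{\sqrt{2\pi\lambda_i}} \exp\left( -\frac{x_i^2}{2\lambda_i} \right),$$
$$K: \quad f_K(\underline{x}) = \prod_{i=1}^{\infty} \frac{1}{\sqrt{2\pi\lambda_i}} \exp\left( -\frac{(x_i - S_i)^2}{2\lambda_i} \right).$$
Hence, the log-LR is given by
$$\mathcal{L}(\underline{x}) = \sum_{i=1}^{\infty} \frac{2 S_i x_i - S_i^2}{2\lambda_i}$$
and the optimum detector is
$$G = \sum_{i=1}^{\infty} \frac{S_i x_i}{\lambda_i} > G_0 \Rightarrow K, \qquad G < G_0 \Rightarrow H.$$
Define
$$k(t) = \sum_{i=1}^{\infty} \lambda_i^{-1} S_i \Phi_i(t), \qquad 0 < t < T,$$
then
$$G = \int_0^T k(t) x(t)\,dt.$$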

How to find k(t)

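Since
$$\int_0^T R_N(t,s) k(s)\,ds = \sum_{i=1}^{\infty} \lambda_i^{-1} S_i \int_0^T R_N(t,s) \Phi_i(s)\,ds = \sum_{i=1}^{\infty} S_i \Phi_i(t) = S(t),$$
k(t) is the solution to
$$\int_0^T R_N(t,s) k(s)\,ds = S(t).$$
If N(t) is wide-sense stationary,
$$\int_0^T R_N(t - s) k(s)\,ds = S(t),$$
which is known as the Wiener–Hopf equation. The equation can be solved by taking the Fourier transform, but the solution is not practically realizable since an infinite spectrum needs spectral factorization. A special case in which k(t) is easy to calculate is white Gaussian noise:
$$\int_0^T \frac{N_0}{2} \delta(t - s) k(s)\,ds = S(t) \quad \Rightarrow \quad k(t) = C S(t), \qquad 0 < t < T.$$
The corresponding impulse response is h(t) = k(T − t) = C S(T − t). Since a constant factor can be absorbed into the threshold, let C = 1; this is just the result we arrived at in the previous section for detecting a signal in white noise.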

Test threshold for Neyman–Pearson detector

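Since X(t) is a Gaussian process,
$$G = \int_0^T k(t) x(t)\,dt$$
is a Gaussian random variable that can be characterized by its mean and variance:
$$\mathbf{E}[G \mid H] = \int_0^T k(t)\, \mathbf{E}[x(t) \mid H]\,dt = 0,$$
$$\mathbf{E}[G \mid K] = \int_0^T k(t)\, \mathbf{E}[x(t) \mid K]\,dt = \int_0^T k(t) S(t)\,dt \equiv \rho,$$
$$\mathbf{E}[G^2 \mid H] = \int_0^T \int_0^T k(t) k(s) R_N(t,s)\,dt\,ds = \int_0^T k(t) \left( \int_0^T k(s) R_N(t,s)\,ds \right) dt = \int_0^T k(t) S(t)\,dt = \rho,$$
$$\operatorname{var}[G \mid H] = \mathbf{E}[G^2 \mid H] - (\mathbf{E}[G \mid H])^2 = \rho,$$
$$\mathbf{E}[G^2 \mid K] = \int_0^T \int_0^T k(t) k(s) \left( R_N(t,s) + S(t) S(s) \right) dt\,ds = \rho + \rho^2,$$
$$\operatorname{var}[G \mid K] = \mathbf{E}[G^2 \mid K] - (\mathbf{E}[G \mid K])^2 = \rho.$$
Hence, we obtain the distributions of G under H and K:
$$H: G \sim N(0, \rho), \qquad K: G \sim N(\rho, \rho).$$
The false alarm error is
$$\alpha = \int_{G_0}^{\infty} N(0, \rho)\,dG = 1 - \Phi\left( \frac{G_0}{\sqrt{\rho}} \right).$$
So the test threshold for the Neyman–Pearson optimum detector is
$$G_0 = \sqrt{\rho}\; \Phi^{-1}(1 - \alpha).$$
Its power of detection is
$$\beta = \int_{G_0}^{\infty} N(\rho, \rho)\,dG = \Phi\left( \sqrt{\rho} - \Phi^{-1}(1 - \alpha) \right).$$
When the noise is a white Gaussian process, k(t) = (2/N_0) S(t) and
$$\rho = \int_0^T k(t) S(t)\,dt = \frac{2}{N_0} \int_0^T S(t)^2\,dt = \frac{2E}{N_0},$$
which equals the signal energy E under the normalization C = 1 (that is, N_0 = 2).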

Prewhitening

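For some types of colored noise, a typical practice is to add a prewhitening filter before the matched filter to transform the colored noise into white noise. For example, let N(t) be a wide-sense stationary colored noise with correlation function and power spectral density
$$R_N(\tau) = \frac{B N_0}{4} e^{-B|\tau|}, \qquad S_N(f) = \frac{N_0}{2\left(1 + \left(\frac{w}{B}\right)^2\right)}, \qquad w = 2\pi f.$$
The transfer function of the prewhitening filter is
$$H(f) = 1 + j \frac{w}{B}.$$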
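A prewhitening filter is correct when the filtered noise has a flat spectrum, |H(f)|² S_N(f) = constant. The sketch below checks this numerically for an assumed first-order colored noise with power spectral density S_N = N_0 / (2 (1 + (w/B)²)) and the filter H = 1 + j w/B, where w is the angular frequency; the values of N_0 and B and the function names are illustrative assumptions.

```python
def colored_psd(w, noise_level, bandwidth):
    """Assumed PSD N_0 / (2 (1 + (w / B)^2)) of the colored noise."""
    return noise_level / (2.0 * (1.0 + (w / bandwidth) ** 2))


def prewhitening_response(w, bandwidth):
    """Assumed prewhitening transfer function H = 1 + j w / B."""
    return complex(1.0, w / bandwidth)


def whitened_psd(w, noise_level, bandwidth):
    """PSD at the filter output: |H|^2 times the input PSD."""
    return abs(prewhitening_response(w, bandwidth)) ** 2 * colored_psd(w, noise_level, bandwidth)


N0, B = 2.0, 5.0                         # hypothetical noise level and corner frequency
frequencies = [0.0, 0.5, 5.0, 50.0, 500.0]
output_levels = [whitened_psd(w, N0, B) for w in frequencies]
```

Every entry of `output_levels` equals N_0 / 2 up to rounding, so the matched filter for white noise can be applied to the prewhitened observation.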

Detection of a Gaussian random signal in additive white Gaussian noise (AWGN)

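When the signal we want to detect from the noisy channel is also random, for example, a white Gaussian process X(t), we can still use the K–L expansion to get an independent sequence of observations. In this case, the detection problem is described as follows:
$$H_0: Y(t) = N(t),$$
$$H_1: Y(t) = N(t) + X(t), \qquad 0 < t < T.$$
X(t) is a random process with correlation function
$$R_X(t,s) = \mathbf{E}\{X(t) X(s)\}.$$
The K–L expansion of X(t) is
$$X(t) = \sum_{i=1}^{\infty} X_i \Phi_i(t),$$
where
$$X_i = \int_0^T X(t) \Phi_i(t)\,dt$$
and the Φ_i(t) are solutions to
$$\int_0^T R_X(t,s) \Phi_i(s)\,ds = \lambda_i \Phi_i(t).$$
So the X_i form an independent sequence of random variables with zero mean and variance λ_i. Expanding Y(t) and N(t) in the Φ_i(t), we get
$$Y_i = \int_0^T Y(t) \Phi_i(t)\,dt = \int_0^T [N(t) + X(t)] \Phi_i(t)\,dt = N_i + X_i,$$
where
$$N_i = \int_0^T N(t) \Phi_i(t)\,dt.$$
As N(t) is Gaussian white noise, the N_i form an i.i.d. sequence of random variables with zero mean and variance N_0/2, and the problem is simplified as follows:
$$H_0: Y_i = N_i,$$
$$H_1: Y_i = N_i + X_i.$$
The Neyman–Pearson optimal test is based on the likelihood ratio
$$\Lambda = \frac{f_Y \mid H_1}{f_Y \mid H_0} = C \exp\left( \sum_{i=1}^{\infty} \frac{y_i^2}{2} \frac{\lambda_i}{\frac{N_0}{2} \left( \frac{N_0}{2} + \lambda_i \right)} \right),$$
so the log-likelihood ratio is
$$\mathcal{L} = \ln(\Lambda) = K + \sum_{i=1}^{\infty} \frac{1}{2} y_i^2 \frac{\lambda_i}{\frac{N_0}{2} \left( \frac{N_0}{2} + \lambda_i \right)},$$
where K = ln C is a constant.
Since
$$\hat{X}_i = \frac{\lambda_i}{\frac{N_0}{2} + \lambda_i}\, y_i$$
is just the minimum-mean-square estimate of X_i given the Y_i,
$$\mathcal{L} = K + \frac{1}{N_0} \sum_{i=1}^{\infty} \hat{X}_i y_i.$$
The K–L expansion has the following property: If
$$f(t) = \sum_i f_i \Phi_i(t), \qquad g(t) = \sum_i g_i \Phi_i(t),$$
where
$$f_i = \int_0^T f(t) \Phi_i(t)\,dt, \qquad g_i = \int_0^T g(t) \Phi_i(t)\,dt,$$
then
$$\sum_{i=1}^{\infty} f_i g_i = \int_0^T g(t) f(t)\,dt.$$
So let
$$\hat{X}(t \mid T) = \sum_{i=1}^{\infty} \hat{X}_i \Phi_i(t), \qquad \mathcal{L} = K + \frac{1}{N_0} \int_0^T \hat{X}(t \mid T)\, Y(t)\,dt.$$
A noncausal filter Q(t, s) can be used to get the estimate through
$$\hat{X}(t \mid T) = \int_0^T Q(t,s) Y(s)\,ds.$$
By the orthogonality principle, Q(t, s) satisfies
$$\int_0^T Q(t,s) R_X(s,\lambda)\,ds + \frac{N_0}{2} Q(t,\lambda) = R_X(t,\lambda), \qquad 0 < \lambda < T, \ 0 < t < T.$$
However, for practical reasons, it is necessary to further derive the causal filter h(t, s), where h(t, s) = 0 for s > t, to get the estimate $\hat{X}(t \mid t)$.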
