Law of total variance


In probability theory, the law of total variance, also known as the variance decomposition formula, the conditional variance formula, the law of iterated variances, or Eve's law, states that if X and Y are random variables on the same probability space, and the variance of Y is finite, then
$$\operatorname{Var}(Y) = \operatorname{E}[\operatorname{Var}(Y \mid X)] + \operatorname{Var}(\operatorname{E}[Y \mid X]).$$
In language perhaps better known to statisticians than to probability theorists, the two terms are the "unexplained" and the "explained" components of the variance respectively. In actuarial science, specifically credibility theory, the first component is called the expected value of the process variance and the second is called the variance of the hypothetical means. These two components are also the source of the term "Eve's law", from the initials EV VE for "expectation of variance" and "variance of expectation".
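As a concrete illustration, the two components can be computed exactly for a small discrete distribution. The sketch below uses a made-up joint pmf (the numbers and the names `joint`, `unexplained` and `explained` are assumptions for illustration only) and exact rational arithmetic, so the identity holds without rounding error.

```python
from fractions import Fraction as F

# Hypothetical joint pmf P(X = x, Y = y); values chosen only for illustration.
joint = {(0, 1): F(1, 4), (0, 3): F(1, 4), (1, 2): F(1, 8), (1, 6): F(3, 8)}

def mean(pmf):
    """Expectation of a pmf given as {value: probability}."""
    return sum(v * p for v, p in pmf.items())

def variance(pmf):
    """Variance of a pmf given as {value: probability}."""
    m = mean(pmf)
    return sum((v - m) ** 2 * p for v, p in pmf.items())

# Marginal pmfs of X and Y.
p_x, p_y = {}, {}
for (x, y), p in joint.items():
    p_x[x] = p_x.get(x, 0) + p
    p_y[y] = p_y.get(y, 0) + p

# Conditional pmf of Y given X = x.
cond = {}
for (x, y), p in joint.items():
    cond.setdefault(x, {})[y] = p / p_x[x]

cond_mean = {x: mean(cond[x]) for x in p_x}      # E[Y | X = x]
cond_var = {x: variance(cond[x]) for x in p_x}   # Var(Y | X = x)

# "Unexplained" component: E[Var(Y | X)].
unexplained = sum(p_x[x] * cond_var[x] for x in p_x)

# "Explained" component: Var(E[Y | X]).
grand_mean = sum(p_x[x] * cond_mean[x] for x in p_x)
explained = sum(p_x[x] * (cond_mean[x] - grand_mean) ** 2 for x in p_x)

total = variance(p_y)
assert total == unexplained + explained
print(total, unexplained, explained)
```

For this particular pmf the total variance is 17/4, split into an unexplained part of 2 and an explained part of 9/4.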
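There is a general variance decomposition formula for c ≥ 2 components. For example, with two conditioning random variables X_1 and X_2:
$$\operatorname{Var}(Y) = \operatorname{E}[\operatorname{Var}(Y \mid X_1, X_2)] + \operatorname{E}[\operatorname{Var}(\operatorname{E}[Y \mid X_1, X_2] \mid X_1)] + \operatorname{Var}(\operatorname{E}[Y \mid X_1]),$$
which follows from the law of total conditional variance:
$$\operatorname{Var}(Y \mid X_1) = \operatorname{E}[\operatorname{Var}(Y \mid X_1, X_2) \mid X_1] + \operatorname{Var}(\operatorname{E}[Y \mid X_1, X_2] \mid X_1).$$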
Note that the conditional expected value E[Y | X] is a random variable in its own right, whose value depends on the value of X. By contrast, the conditional expected value of Y given the event X = x is a function of x. If we write E[Y | X = x] = g(x), then the random variable E[Y | X] is just g(X). Similar comments apply to the conditional variance.
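One special case states that if A_1, ..., A_n is a partition of the whole outcome space, i.e. these events are mutually exclusive and exhaustive, then
$$\begin{aligned}
\operatorname{Var}(Y) = {} & \sum_{i=1}^{n} \operatorname{Var}(Y \mid A_i) \Pr(A_i) \\
& {} + \sum_{i=1}^{n} \operatorname{E}[Y \mid A_i]^2 \, (1 - \Pr(A_i)) \Pr(A_i) \\
& {} - 2 \sum_{i=2}^{n} \sum_{j=1}^{i-1} \operatorname{E}[Y \mid A_i] \Pr(A_i) \, \operatorname{E}[Y \mid A_j] \Pr(A_j).
\end{aligned}$$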
In this formula, the first component is the expectation of the conditional variance; the other two rows are the variance of the conditional expectation.
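The partition form can be checked numerically. The following sketch assumes a hypothetical three-event partition described only by Pr(A_i), E[Y | A_i] and Var(Y | A_i) (all numbers invented for illustration). It compares the second and third rows of the formula with the variance of the conditional expectation computed directly.

```python
from fractions import Fraction as F

# Hypothetical partition: (Pr(A_i), E[Y | A_i], Var(Y | A_i)) for each event.
parts = [
    (F(1, 2), F(0), F(1)),
    (F(1, 3), F(3), F(4)),
    (F(1, 6), F(6), F(2)),
]
assert sum(p for p, _, _ in parts) == 1

# Row 1: expectation of the conditional variance.
row1 = sum(p * v for p, m, v in parts)

# Row 2: sum of E[Y | A_i]^2 (1 - Pr(A_i)) Pr(A_i).
row2 = sum(m ** 2 * (1 - p) * p for p, m, v in parts)

# Row 3: minus twice the sum of cross terms over pairs j < i.
row3 = -2 * sum(
    parts[i][1] * parts[i][0] * parts[j][1] * parts[j][0]
    for i in range(1, len(parts))
    for j in range(i)
)

# Direct computation of the variance of the conditional expectation.
overall_mean = sum(p * m for p, m, v in parts)
var_of_cond_mean = sum(p * (m - overall_mean) ** 2 for p, m, v in parts)

assert row2 + row3 == var_of_cond_mean
total_variance = row1 + row2 + row3
print(row1, row2, row3, total_variance)
```

The check works because expanding the square of the overall mean produces the squared terms absorbed into row 2 and the pairwise cross terms of row 3.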

Proof

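The law of total variance can be proved using the law of total expectation. First,
$$\operatorname{Var}(Y) = \operatorname{E}[Y^2] - \operatorname{E}[Y]^2$$
from the definition of variance. Then we apply the law of total expectation to each term by conditioning on the random variable X:
$$\operatorname{Var}(Y) = \operatorname{E}[\operatorname{E}[Y^2 \mid X]] - \operatorname{E}[\operatorname{E}[Y \mid X]]^2.$$
Now we rewrite the conditional second moment of Y in terms of its variance and first moment:
$$\operatorname{Var}(Y) = \operatorname{E}\left[\operatorname{Var}(Y \mid X) + \operatorname{E}[Y \mid X]^2\right] - \operatorname{E}[\operatorname{E}[Y \mid X]]^2.$$
Since the expectation of a sum is the sum of expectations, the terms can now be regrouped:
$$\operatorname{Var}(Y) = \operatorname{E}[\operatorname{Var}(Y \mid X)] + \left(\operatorname{E}\left[\operatorname{E}[Y \mid X]^2\right] - \operatorname{E}[\operatorname{E}[Y \mid X]]^2\right).$$
Finally, we recognize the terms in parentheses as the variance of the conditional expectation E[Y | X]:
$$\operatorname{Var}(Y) = \operatorname{E}[\operatorname{Var}(Y \mid X)] + \operatorname{Var}(\operatorname{E}[Y \mid X]).$$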

General variance decomposition applicable to dynamic systems

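The following formula shows how to apply the general, measure-theoretic variance decomposition formula to stochastic dynamic systems. Let Y(t) be the value of a system variable at time t. Suppose we have the internal histories H_{1t}, H_{2t}, ..., H_{c-1,t}, each one corresponding to the history of a different collection of system variables. The collections need not be disjoint. The variance of Y(t) can be decomposed, for all times t, into c ≥ 2 components as follows:
$$\begin{aligned}
\operatorname{Var}[Y(t)] = {} & \operatorname{E}\left(\operatorname{Var}[Y(t) \mid H_{1t}, H_{2t}, \ldots, H_{c-1,t}]\right) \\
& {} + \sum_{j=2}^{c-1} \operatorname{E}\left(\operatorname{Var}\left[\operatorname{E}[Y(t) \mid H_{1t}, H_{2t}, \ldots, H_{jt}] \mid H_{1t}, H_{2t}, \ldots, H_{j-1,t}\right]\right) \\
& {} + \operatorname{Var}\left(\operatorname{E}[Y(t) \mid H_{1t}]\right).
\end{aligned}$$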
The decomposition is not unique. It depends on the order of the conditioning in the sequential decomposition.
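The dependence on the conditioning order can be seen in the simplest case, c = 3. The sketch below uses two ordinary random variables X_1 and X_2 as stand-ins for the histories (a simplifying assumption) and a made-up joint pmf. The helper names `cond_expect`, `cond_var` and `decompose` are hypothetical. Both orderings sum to the same total variance, but the second and third components differ.

```python
from collections import defaultdict
from fractions import Fraction as F

# Hypothetical joint pmf over outcomes (x1, x2, y); probabilities sum to 1.
outcomes = [
    ((0, 0, 1), F(1, 8)), ((0, 0, 3), F(1, 8)),
    ((0, 1, 2), F(1, 8)), ((0, 1, 8), F(1, 8)),
    ((1, 0, 6), F(1, 4)),
    ((1, 1, 0), F(1, 8)), ((1, 1, 6), F(1, 8)),
]

def expect(f):
    """E[f] under the joint pmf."""
    return sum(f(w) * p for w, p in outcomes)

def var(f):
    """Var(f) under the joint pmf."""
    m = expect(f)
    return expect(lambda w: (f(w) - m) ** 2)

def cond_expect(f, key):
    """Return E[f | key] as a function of the outcome."""
    num, den = defaultdict(F), defaultdict(F)
    for w, p in outcomes:
        num[key(w)] += f(w) * p
        den[key(w)] += p
    return lambda w: num[key(w)] / den[key(w)]

def cond_var(f, key):
    """Return Var(f | key) as a function of the outcome."""
    m = cond_expect(f, key)
    return cond_expect(lambda w: (f(w) - m(w)) ** 2, key)

y = lambda w: w[2]
both = lambda w: (w[0], w[1])

def decompose(first):
    """Three components when conditioning on `first`, then on both variables."""
    inner = cond_expect(y, both)
    return (
        expect(cond_var(y, both)),       # E[Var(Y | X1, X2)]
        expect(cond_var(inner, first)),  # E[Var(E[Y | X1, X2] | first)]
        var(cond_expect(y, first)),      # Var(E[Y | first])
    )

order_12 = decompose(lambda w: w[0])  # condition on X1 first
order_21 = decompose(lambda w: w[1])  # condition on X2 first
total = var(y)

assert sum(order_12) == total == sum(order_21)
print(order_12, order_21)
```

With these numbers the total variance is 29/4 in both cases. Conditioning on X_1 first gives components (19/4, 9/4, 1/4), while conditioning on X_2 first gives (19/4, 5/2, 0).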

The square of the correlation and explained (or informational) variation

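In cases where (Y, X) are such that the conditional expected value is linear; i.e., in cases where
$$\operatorname{E}[Y \mid X] = aX + b,$$
it follows from the bilinearity of covariance that
$$a = \frac{\operatorname{Cov}(Y, X)}{\operatorname{Var}(X)}$$
and
$$b = \operatorname{E}[Y] - \frac{\operatorname{Cov}(Y, X)}{\operatorname{Var}(X)} \operatorname{E}[X],$$
and the explained component of the variance divided by the total variance is just the square of the correlation between Y and X; i.e., in such cases,
$$\frac{\operatorname{Var}(\operatorname{E}[Y \mid X])}{\operatorname{Var}(Y)} = \operatorname{Corr}(X, Y)^2.$$
One example of this situation is when (X, Y) have a bivariate normal distribution.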
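The linear case can be verified exactly on a small discrete model. The sketch below assumes a hypothetical model in which X is uniform on {0, 1, 2} and Y = 2X + 1 + noise, with independent noise equal to -1 or +1 with probability 1/2 each, so that E[Y | X] = 2X + 1 by construction. It is a discrete stand-in for illustration, not the bivariate normal case.

```python
from fractions import Fraction as F

# Hypothetical model: X uniform on {0, 1, 2}, Y = 2X + 1 + noise, noise = -1 or +1.
joint = {}
for x in (0, 1, 2):
    for noise in (-1, 1):
        joint[(x, 2 * x + 1 + noise)] = F(1, 3) * F(1, 2)

def expect(f):
    """E[f(X, Y)] under the joint pmf."""
    return sum(f(x, y) * p for (x, y), p in joint.items())

e_x = expect(lambda x, y: x)
e_y = expect(lambda x, y: y)
var_x = expect(lambda x, y: (x - e_x) ** 2)
var_y = expect(lambda x, y: (y - e_y) ** 2)
cov_xy = expect(lambda x, y: (x - e_x) * (y - e_y))

# Slope and intercept from the covariance formulas.
a = cov_xy / var_x
b = e_y - a * e_x

# Var(E[Y | X]) computed from the conditional means, not from a and b.
p_x, sum_y = {}, {}
for (x, y), p in joint.items():
    p_x[x] = p_x.get(x, 0) + p
    sum_y[x] = sum_y.get(x, 0) + y * p
cond_mean = {x: sum_y[x] / p_x[x] for x in p_x}
explained = sum(p_x[x] * (cond_mean[x] - e_y) ** 2 for x in p_x)

corr_squared = cov_xy ** 2 / (var_x * var_y)
assert (a, b) == (2, 1)
assert explained / var_y == corr_squared
print(a, b, corr_squared)
```

Here the explained fraction and the squared correlation both equal 8/11.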
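More generally, when the conditional expectation E[Y | X] is a non-linear function of X,
$$\iota_{Y \mid X} = \frac{\operatorname{Var}(\operatorname{E}[Y \mid X])}{\operatorname{Var}(Y)} = \operatorname{Corr}(\operatorname{E}[Y \mid X], Y)^2,$$
which can be estimated as the R squared from a non-linear regression of Y on X, using data drawn from the joint distribution of (X, Y). When E[Y | X] has a Gaussian distribution (and is an invertible function of X), or Y itself has a (marginal) Gaussian distribution, this explained component of variation sets a lower bound on the mutual information:
$$\operatorname{I}(Y; X) \geq \ln\left([1 - \iota_{Y \mid X}]^{-1/2}\right).$$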
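To illustrate how the explained fraction differs from the squared linear correlation, the sketch below assumes a hypothetical model with X uniform on {-1, 0, 1} and Y = X^2 + noise, where the independent noise is -1 or +1 with probability 1/2 each. The ordinary correlation between X and Y is zero, yet the explained fraction is positive and equals the squared correlation between E[Y | X] and Y. The mutual information bound is not checked here, since this discrete model does not meet the Gaussian conditions.

```python
from fractions import Fraction as F

# Hypothetical model: X uniform on {-1, 0, 1}, Y = X**2 + noise, noise = -1 or +1.
joint = {}
for x in (-1, 0, 1):
    for noise in (-1, 1):
        joint[(x, x * x + noise)] = F(1, 3) * F(1, 2)

def expect(f):
    """E[f(X, Y)] under the joint pmf."""
    return sum(f(x, y) * p for (x, y), p in joint.items())

def cov(f, g):
    """Cov(f(X, Y), g(X, Y)) under the joint pmf."""
    mf, mg = expect(f), expect(g)
    return expect(lambda x, y: (f(x, y) - mf) * (g(x, y) - mg))

# Conditional mean function g(x) = E[Y | X = x], computed from the pmf.
p_x, sum_y = {}, {}
for (x, y), p in joint.items():
    p_x[x] = p_x.get(x, 0) + p
    sum_y[x] = sum_y.get(x, 0) + y * p
g = {x: sum_y[x] / p_x[x] for x in p_x}

X = lambda x, y: x
Y = lambda x, y: y
G = lambda x, y: g[x]  # the random variable E[Y | X]

linear_corr_squared = cov(X, Y) ** 2 / (cov(X, X) * cov(Y, Y))
explained_fraction = cov(G, G) / cov(Y, Y)
corr_g_y_squared = cov(G, Y) ** 2 / (cov(G, G) * cov(Y, Y))

assert linear_corr_squared == 0
assert explained_fraction == corr_g_y_squared
print(linear_corr_squared, explained_fraction)
```

In this model the explained fraction is 2/11 even though a linear regression of Y on X would explain nothing.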

Higher moments

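A similar law for the third central moment μ3 says
$$\mu_3(Y) = \operatorname{E}[\mu_3(Y \mid X)] + \mu_3(\operatorname{E}[Y \mid X]) + 3 \operatorname{Cov}(\operatorname{E}[Y \mid X], \operatorname{Var}(Y \mid X)).$$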
For higher cumulants, a generalization exists. See law of total cumulance.
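The third-moment law can be checked in the same exact style. The sketch below assumes a small made-up joint pmf (the numbers are for illustration only) and evaluates both sides of the identity with rational arithmetic.

```python
from fractions import Fraction as F

# Hypothetical joint pmf P(X = x, Y = y); values chosen only for illustration.
joint = {(0, 1): F(1, 4), (0, 3): F(1, 4), (1, 2): F(1, 8), (1, 6): F(3, 8)}

def mean(pmf):
    """Expectation of a pmf given as {value: probability}."""
    return sum(v * p for v, p in pmf.items())

def central_moment(pmf, k):
    """k-th central moment of a pmf given as {value: probability}."""
    m = mean(pmf)
    return sum((v - m) ** k * p for v, p in pmf.items())

# Marginals and the conditional pmf of Y given X = x.
p_x, p_y, cond = {}, {}, {}
for (x, y), p in joint.items():
    p_x[x] = p_x.get(x, 0) + p
    p_y[y] = p_y.get(y, 0) + p
for (x, y), p in joint.items():
    cond.setdefault(x, {})[y] = p / p_x[x]

m = {x: mean(cond[x]) for x in p_x}                # E[Y | X = x]
v = {x: central_moment(cond[x], 2) for x in p_x}   # Var(Y | X = x)
s = {x: central_moment(cond[x], 3) for x in p_x}   # mu_3(Y | X = x)

e_m = sum(p_x[x] * m[x] for x in p_x)
e_v = sum(p_x[x] * v[x] for x in p_x)

term_mean_mu3 = sum(p_x[x] * s[x] for x in p_x)                    # E[mu_3(Y | X)]
term_mu3_mean = sum(p_x[x] * (m[x] - e_m) ** 3 for x in p_x)       # mu_3(E[Y | X])
term_cov = sum(p_x[x] * (m[x] - e_m) * (v[x] - e_v) for x in p_x)  # Cov(E[Y | X], Var(Y | X))

lhs = central_moment(p_y, 3)
rhs = term_mean_mu3 + term_mu3_mean + 3 * term_cov
assert lhs == rhs
print(lhs, term_mean_mu3, term_mu3_mean, term_cov)
```

For this pmf the covariance term is essential: the first two terms sum to -3, while the third central moment of Y is 3/2.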