Weighted least squares

Weighted least squares (WLS), also known as weighted linear regression, is a generalization of ordinary least squares and linear regression in which the covariance matrix of the errors is allowed to differ from the identity matrix.
WLS is also a specialization of generalized least squares in which this covariance matrix is diagonal.

Introduction

A special case of generalized least squares called weighted least squares occurs when all the off-diagonal entries of Ω, the covariance matrix of the errors, are zero; the variances of the observations (the diagonal entries of Ω) may still be unequal.
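The fit of a model to a data point is measured by its residual, $r_i$, defined as the difference between a measured value of the dependent variable, $y_i$, and the value predicted by the model, $f(x_i, \boldsymbol\beta)$:
$$r_i(\boldsymbol\beta) = y_i - f(x_i, \boldsymbol\beta).$$
If the errors are uncorrelated and have equal variance, then the minimum of the function
$$S(\boldsymbol\beta) = \sum_{i=1}^{m} r_i(\boldsymbol\beta)^2$$
is found when $\partial S / \partial \beta_j = 0$ for $j = 1, \ldots, n$, where $m$ is the number of observations and $n$ the number of parameters. The minimizer is denoted $\hat{\boldsymbol\beta}$.
The Gauss–Markov theorem shows that, when this is so, $\hat{\boldsymbol\beta}$ is a best linear unbiased estimator (BLUE). If, however, the measurements are uncorrelated but have different uncertainties, a modified approach might be adopted. Aitken showed that when a weighted sum of squared residuals is minimized, $\hat{\boldsymbol\beta}$ is the BLUE if each weight is equal to the reciprocal of the variance of the measurement:
$$S = \sum_{i=1}^{m} W_{ii} r_i^2, \qquad W_{ii} = \frac{1}{\sigma_i^2}.$$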
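The gradient equations for this sum of squares are
$$-2 \sum_{i=1}^{m} W_{ii} \frac{\partial f(x_i, \boldsymbol\beta)}{\partial \beta_j} r_i = 0, \qquad j = 1, \ldots, n,$$
which, in a linear least squares system, give the modified normal equations
$$\sum_{i=1}^{m} \sum_{k=1}^{n} X_{ij} W_{ii} X_{ik} \hat\beta_k = \sum_{i=1}^{m} X_{ij} W_{ii} y_i, \qquad j = 1, \ldots, n.$$
When the observational errors are uncorrelated and the weight matrix, W, is diagonal, these may be written as
$$(X^T W X) \hat{\boldsymbol\beta} = X^T W \mathbf{y}.$$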
If the errors are correlated, the resulting estimator is the BLUE if the weight matrix is equal to the inverse of the variance-covariance matrix of the observations.
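When the errors are uncorrelated, it is convenient to simplify the calculations by factoring the weight matrix as $w_{ii} = \sqrt{W_{ii}}$.
The normal equations can then be written in the same form as ordinary least squares:
$$(X'^T X') \hat{\boldsymbol\beta} = X'^T \mathbf{y}',$$
where we define the following scaled matrix and vector:
$$X' = \operatorname{diag}(\mathbf{w}) X, \qquad \mathbf{y}' = \operatorname{diag}(\mathbf{w}) \mathbf{y} = \mathbf{y} \oslash \boldsymbol\sigma.$$
This is a type of whitening transformation; the last expression involves an entrywise division and holds when $W_{ii} = 1/\sigma_i^2$.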
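The equivalence between the weighted normal equations and ordinary least squares on the scaled system can be checked numerically. The following is a minimal sketch using NumPy; the straight-line model, the data values and the standard deviations are illustrative assumptions, not taken from the text.

```python
import numpy as np

# Hypothetical data: straight-line model y = b0 + b1 * x with assumed
# per-observation standard deviations (all values are illustrative).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])
sigma = np.array([0.1, 0.2, 0.1, 0.4, 0.3])

X = np.column_stack([np.ones_like(x), x])  # design matrix, shape (m, n)
W = np.diag(1.0 / sigma**2)                # W_ii = 1 / sigma_i^2

# Route 1: solve the weighted normal equations (X^T W X) beta = X^T W y.
beta_normal = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

# Route 2: scale each row by w_ii = sqrt(W_ii) = 1 / sigma_i, then run
# ordinary least squares on the whitened system X' beta ~ y'.
X_scaled = X / sigma[:, None]
y_scaled = y / sigma
beta_whitened, *_ = np.linalg.lstsq(X_scaled, y_scaled, rcond=None)

print(beta_normal)
print(beta_whitened)
```

Both routes should agree to within floating-point error. In practice the scaled route is often preferred because it avoids forming $X^T W X$ explicitly.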
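For non-linear least squares systems a similar argument shows that the normal equations should be modified as follows:
$$(J^T W J) \Delta\boldsymbol\beta = J^T W \Delta\mathbf{y},$$
where $J$ is the Jacobian of the model with respect to the parameters, $\Delta\boldsymbol\beta$ is the parameter update and $\Delta\mathbf{y}$ is the vector of current residuals.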
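As an illustration of the non-linear case, the sketch below applies a plain Gauss–Newton iteration in which each step solves the weighted normal equations $(J^T W J)\,\Delta\beta = J^T W\,\Delta y$. The exponential model, the noise-free synthetic data, the weights and the starting point are all assumptions chosen for the example; a production fit would normally add damping or a line search.

```python
import numpy as np

def model(x, beta):
    """Hypothetical model f(x, beta) = a * exp(b * x)."""
    a, b = beta
    return a * np.exp(b * x)

def jacobian(x, beta):
    """Partial derivatives of the model with respect to a and b."""
    a, b = beta
    e = np.exp(b * x)
    return np.column_stack([e, a * x * e])

x = np.linspace(0.0, 2.0, 5)
beta_true = np.array([2.0, -1.0])
y = model(x, beta_true)                      # noise-free synthetic data
sigma = np.array([0.1, 0.1, 0.2, 0.2, 0.4])  # assumed uncertainties
W = np.diag(1.0 / sigma**2)

beta = np.array([1.5, -0.8])                 # assumed starting guess
for _ in range(20):
    J = jacobian(x, beta)
    dy = y - model(x, beta)                  # current residuals
    # Weighted normal equations for the update step.
    step = np.linalg.solve(J.T @ W @ J, J.T @ W @ dy)
    beta = beta + step
    if np.linalg.norm(step) < 1e-12:
        break

print(beta)
```

Because the synthetic data contain no noise, the iteration should recover the parameters used to generate them.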
In empirical applications, the appropriate W is usually not known and must be estimated. Feasible generalized least squares (FGLS) techniques may be used for this; specialized to a diagonal covariance matrix, they yield a feasible weighted least squares solution.
If the uncertainty of the observations is not known from external sources, then the weights could be estimated from the given observations. This can be useful, for example, to identify outliers. After the outliers have been removed from the data set, the weights should be reset to one.

Motivation

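In some cases the observations may be weighted; for example, they may not be equally reliable. In this case, one can minimize the weighted sum of squares:
$$\hat{\boldsymbol\beta} = \underset{\boldsymbol\beta}{\operatorname{arg\,min}} \sum_{i=1}^{m} w_i \left| y_i - \sum_{j=1}^{n} X_{ij} \beta_j \right|^2 = \underset{\boldsymbol\beta}{\operatorname{arg\,min}} \left\| W^{1/2} (\mathbf{y} - X \boldsymbol\beta) \right\|^2,$$
where w_i > 0 is the weight of the ith observation, and W is the diagonal matrix of such weights.
The weights should, ideally, be equal to the reciprocal of the variance of the measurement.
The normal equations are then:
$$(X^T W X) \hat{\boldsymbol\beta} = X^T W \mathbf{y}.$$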
This method is used in iteratively reweighted least squares.

Parameter errors and correlation

The estimated parameter values are linear combinations of the observed values:
$$\hat{\boldsymbol\beta} = (X^T W X)^{-1} X^T W \mathbf{y}.$$
Therefore, an expression for the estimated variance-covariance matrix of the parameter estimates can be obtained by error propagation from the errors in the observations. Let the variance-covariance matrix for the observations be denoted by M and that of the estimated parameters by M^β. Then
$$M^\beta = (X^T W X)^{-1} X^T W M W^T X (X^T W^T X)^{-1}.$$
When W = M⁻¹, this simplifies to
$$M^\beta = (X^T W X)^{-1}.$$
When unit weights are used, it is implied that the experimental errors are uncorrelated and all equal: M = σ²I, where σ² is the a priori variance of an observation.
In any case, σ² is approximated by the reduced chi-squared $\chi^2_\nu$:
$$M^\beta = \chi^2_\nu (X^T W X)^{-1}, \qquad \chi^2_\nu = S / \nu,$$
where S is the minimum value of the objective function:
$$S = \mathbf{r}^T W \mathbf{r}.$$
The denominator, $\nu = m - n$, is the number of degrees of freedom; see effective degrees of freedom for generalizations for the case of correlated observations.
In all cases, the variance of the parameter estimate $\hat\beta_i$ is given by $M^\beta_{ii}$ and the covariance between the parameter estimates $\hat\beta_i$ and $\hat\beta_j$ is given by $M^\beta_{ij}$. The standard deviation is the square root of variance, $\sigma_i = \sqrt{M^\beta_{ii}}$, and the correlation coefficient is given by $\rho_{ij} = M^\beta_{ij} / (\sigma_i \sigma_j)$. These error estimates reflect only random errors in the measurements. The true uncertainty in the parameters is larger due to the presence of systematic errors, which, by definition, cannot be quantified.
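The quantities above can be computed directly from a weighted fit. The following is a minimal sketch, assuming a straight-line model and made-up data and uncertainties; the variable names are illustrative only.

```python
import numpy as np

# Hypothetical straight-line data with assumed standard deviations.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])
sigma = np.array([0.1, 0.2, 0.1, 0.4, 0.3])

X = np.column_stack([np.ones_like(x), x])
W = np.diag(1.0 / sigma**2)
m, n = X.shape                       # m observations, n parameters

XtWX = X.T @ W @ X
beta_hat = np.linalg.solve(XtWX, X.T @ W @ y)

r = y - X @ beta_hat                 # residuals at the minimum
S = r @ W @ r                        # weighted objective function
nu = m - n                           # degrees of freedom
chi2_nu = S / nu                     # reduced chi-squared

M_beta = chi2_nu * np.linalg.inv(XtWX)      # parameter covariance matrix
std_err = np.sqrt(np.diag(M_beta))          # standard deviations
corr = M_beta / np.outer(std_err, std_err)  # correlation coefficients

print(std_err)
print(corr)
```

For this example the off-diagonal correlation is negative: with non-negative x values, an overestimated intercept tends to be compensated by an underestimated slope, even though the observations themselves are uncorrelated.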
Note that even though the observations may be uncorrelated, the parameters are typically correlated.

Parameter confidence limits

It is often assumed, for want of any concrete evidence but often appealing to the central limit theorem (see Normal distribution, Occurrence), that the error on each observation belongs to a normal distribution with a mean of zero and standard deviation $\sigma$. Under that assumption the following probabilities can be derived for a single scalar parameter estimate in terms of its estimated standard error $se_\beta$: about 68% that the interval $\hat\beta \pm se_\beta$ encompasses the true coefficient value, about 95% for the interval $\hat\beta \pm 2\,se_\beta$, and about 99% for the interval $\hat\beta \pm 2.5\,se_\beta$.
The assumption is not unreasonable when m ≫ n, that is, when there are many more observations than parameters. If the experimental errors are normally distributed, the parameters will belong to a Student's t-distribution with m − n degrees of freedom. When m ≫ n, Student's t-distribution approximates a normal distribution. Note, however, that these confidence limits cannot take systematic error into account. Also, parameter errors should be quoted to one significant figure only, as they are subject to sampling error.
When the number of observations is relatively small, Chebyshev's inequality can be used for an upper bound on probabilities, regardless of any assumptions about the distribution of experimental errors: the maximum probabilities that a parameter will be more than 1, 2 or 3 standard deviations away from its expectation value are 100%, 25% and 11% respectively.
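The Chebyshev bounds quoted above follow from the inequality P(|X − μ| ≥ kσ) ≤ 1/k². The short sketch below compares them with the corresponding tail probabilities under a normal distribution; the function names are hypothetical and only the Python standard library is used.

```python
import math

def normal_prob_within(k):
    """Probability that a normal variable lies within k standard deviations."""
    return math.erf(k / math.sqrt(2.0))

def chebyshev_max_prob_outside(k):
    """Chebyshev upper bound on the probability of deviating by k or more
    standard deviations, valid for any distribution with finite variance."""
    return min(1.0, 1.0 / k**2)

for k in (1, 2, 3):
    normal_tail = 1.0 - normal_prob_within(k)
    bound = chebyshev_max_prob_outside(k)
    print(f"k={k}: normal tail {normal_tail:.4f}, Chebyshev bound {bound:.4f}")
```

The Chebyshev bound is much looser than the normal tail probability, which is the price of making no distributional assumption.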

Residual values and correlation

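The residuals are related to the observations by
$$\hat{\mathbf{r}} = \mathbf{y} - X \hat{\boldsymbol\beta} = \mathbf{y} - H \mathbf{y} = (I - H) \mathbf{y},$$
where H is the idempotent matrix known as the hat matrix:
$$H = X (X^T W X)^{-1} X^T W,$$
and I is the identity matrix. The variance-covariance matrix of the residuals, M^r, is given by
$$M^r = (I - H) M (I - H)^T.$$
Thus the residuals are correlated, even if the observations are not.
When W = M⁻¹,
$$M^r = (I - H) M.$$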
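These properties of the hat matrix can be verified numerically. The following is a minimal sketch, assuming a straight-line design matrix and made-up observation variances, with the weights set to W = M⁻¹.

```python
import numpy as np

# Hypothetical design: straight-line model at five x values, with an
# assumed diagonal covariance matrix M for the observations.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
sigma = np.array([0.1, 0.2, 0.1, 0.4, 0.3])
X = np.column_stack([np.ones_like(x), x])
M = np.diag(sigma**2)
W = np.diag(1.0 / sigma**2)          # W = M^{-1}

# Hat matrix H = X (X^T W X)^{-1} X^T W.
H = X @ np.linalg.solve(X.T @ W @ X, X.T @ W)
I = np.eye(len(x))

# Residual covariance: general form and the simplified form for W = M^{-1}.
M_r_general = (I - H) @ M @ (I - H).T
M_r_simplified = (I - H) @ M

print(np.allclose(H @ H, H))                     # idempotent
print(np.allclose(M_r_general, M_r_simplified))  # forms agree
```

The off-diagonal entries of the residual covariance matrix are non-zero here even though M is diagonal, which illustrates that the residuals are correlated when the observations are not.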
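The sum of weighted residual values is equal to zero whenever the model function contains a constant term. Left-multiply the expression for the residuals by X^T W:
$$X^T W \hat{\mathbf{r}} = X^T W \mathbf{y} - X^T W X \hat{\boldsymbol\beta} = X^T W \mathbf{y} - (X^T W X)(X^T W X)^{-1} X^T W \mathbf{y} = \mathbf{0}.$$
Say, for example, that the first term of the model is a constant, so that $X_{i1} = 1$ for all i. In that case it follows that
$$\sum_{i=1}^{m} X_{i1} W_{ii} \hat r_i = \sum_{i=1}^{m} W_{ii} \hat r_i = 0.$$
With unit weights this reduces to the plain sum of residuals being zero. Thus, in the motivational example above, the fact that the sum of residual values is equal to zero is not accidental, but is a consequence of the presence of the constant term, α, in the model.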
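The zero-sum property can be checked on a small weighted fit. The sketch below assumes a straight-line model with a constant term and made-up data; with unequal weights it is the weighted sum of residuals that vanishes, while the unweighted sum generally does not.

```python
import numpy as np

# Hypothetical straight-line data with assumed standard deviations.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])
sigma = np.array([0.1, 0.2, 0.1, 0.4, 0.3])

X = np.column_stack([np.ones_like(x), x])  # first column is the constant term
W = np.diag(1.0 / sigma**2)

beta_hat = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
r = y - X @ beta_hat

orthogonality = X.T @ W @ r            # zero vector, up to rounding
weighted_sum = np.sum(np.diag(W) * r)  # first component of the above
plain_sum = np.sum(r)                  # generally not zero for unequal weights

print(orthogonality, weighted_sum, plain_sum)
```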
If experimental error follows a normal distribution, then, because of the linear relationship between residuals and observations, so should the residuals. However, since the observations are only a sample of the population of all possible observations, the residuals should belong to a Student's t-distribution. Studentized residuals are useful in making a statistical test for an outlier when a particular residual appears to be excessively large.
