Total least squares

In applied statistics, total least squares is a type of errors-in-variables regression, a least squares data modeling technique in which observational errors on both dependent and independent variables are taken into account. It is a generalization of Deming regression and also of orthogonal regression, and can be applied to both linear and non-linear models.
The total least squares approximation of the data is generically equivalent to the best, in the Frobenius norm, low-rank approximation of the data matrix.

Linear model

Background

In the least squares method of data modeling, the objective function, S,
is minimized, where r is the vector of residuals and W is a weighting matrix. In linear least squares the model contains equations which are linear in the parameters appearing in the parameter vector, so the residuals are given by
There are m observations in y and n parameters in β with m>n. X is a m×n matrix whose elements are either constants or functions of the independent variables, x. The weight matrix W is, ideally, the inverse of the variance-covariance matrix of the observations y. The independent variables are assumed to be error-free. The parameter estimates are found by setting the gradient equations to zero, which results in the normal equations

Allowing observation errors in all variables

Now, suppose that both x and y are observed subject to error, with variance-covariance matrices and respectively. In this case the objective function can be written as
where and are the residuals in x and y respectively. Clearly these residuals cannot be independent of each other, but they must be constrained by some kind of relationship. Writing the model function as, the constraints are expressed by m condition equations.
Thus, the problem is to minimize the objective function subject to the m constraints. It is solved by the use of Lagrange multipliers. After some algebraic manipulations, the result is obtained.
or alternatively
where M is the variance-covariance matrix relative to both independent and dependent variables.

Example

When the data errors are uncorrelated, all matrices M and W are diagonal. Then, take the example of straight line fitting.
in this case
showing how the variance at the ith point is determined by the variances of both independent and dependent variables and by the model being used to fit the data. The expression may be generalized by noting that the parameter is the slope of the line.
An expression of this type is used in fitting pH titration data where a small error on x translates to a large error on y when the slope is large.

Algebraic point of view

As was shown in 1980 by Golub and Van Loan, the TLS problem does not have a solution in general. The following considers the simple case where a unique solution exists without making any particular assumptions.
The computation of the TLS using singular value decomposition is described in standard texts. We can solve the equation
for B where X is m-by-n and Y is m-by-k.
That is, we seek to find B that minimizes error matrices E and F for X and Y respectively. That is,
where is the augmented matrix with E and F side by side and is the Frobenius norm, the square root of the sum of the squares of all entries in a matrix and so equivalently the square root of the sum of squares of the lengths of the rows or columns of the matrix.
This can be rewritten as
where is the identity matrix.
The goal is then to find that reduces the rank of by k. Define to be the singular value decomposition of the augmented matrix.
where V is partitioned into blocks corresponding to the shape of X and Y.
Using the Eckart–Young theorem, the approximation minimising the norm of the error is such that matrices and are unchanged, while the smallest singular values are replaced with zeroes. That is, we want
so by linearity,
We can then remove blocks from the U and Σ matrices, simplifying to
This provides E and F so that
Now if is nonsingular, which is not always the case, we can then right multiply both sides by to bring the bottom block of the right matrix to the negative identity, giving
and so
A naive GNU Octave implementation of this is:

function B = tls
= size; % n is the width of X
Z = ; % Z is X augmented with Y.
= svd; % find the SVD of Z.
VXY = V; % Take the block of V consisting of the first n rows and the n+1 to last column
VYY = V; % Take the bottom-right block of V.
B = -VXY/VYY;
end

The way described above of solving the problem, which requires that the matrix is nonsingular, can be slightly extended by the so-called classical TLS algorithm.

Computation

The standard implementation of classical TLS algorithm is available through , see also. All modern implementations based, for example, on solving a sequence of ordinary least squares problems, approximate the matrix , as introduced by Van Huffel and Vandewalle. It is worth noting, that this is, however, not the TLS solution in many cases.

Non-linear model

For non-linear systems similar reasoning shows that the normal equations for an iteration cycle can be written as

Geometrical interpretation

When the independent variable is error-free a residual represents the "vertical" distance between the observed data point and the fitted curve. In total least squares a residual represents the distance between a data point and the fitted curve measured along some direction. In fact, if both variables are measured in the same units and the errors on both variables are the same, then the residual represents the shortest distance between the data point and the fitted curve, that is, the residual vector is perpendicular to the tangent of the curve. For this reason, this type of regression is sometimes called two dimensional Euclidean regression or orthogonal regression.

Scale invariant methods

A serious difficulty arises if the variables are not measured in the same units. First consider measuring distance between a data point and the line: what are the measurement units for this distance? If we consider measuring distance based on Pythagoras' Theorem then it is clear that we shall be adding quantities measured in different units, which is meaningless. Secondly, if we rescale one of the variables e.g., measure in grams rather than kilograms, then we shall end up with different results. To avoid these problems it is sometimes suggested that we convert to dimensionless variables—this may be called normalization or standardization. However there are various ways of doing this, and these lead to fitted models which are not equivalent to each other. One approach is to normalize by known measurement precision thereby minimizing the Mahalanobis distance from the points to the line, providing a maximum-likelihood solution; the unknown precisions could be found via analysis of variance.
In short, total least squares does not have the property of units-invariance—i.e. it is not scale invariant. For a meaningful model we require this property to hold. A way forward is to realise that residuals measured in different units can be combined if multiplication is used instead of addition. Consider fitting a line: for each data point the product of the vertical and horizontal residuals equals twice the area of the triangle formed by the residual lines and the fitted line. We choose the line which minimizes the sum of these areas. Nobel laureate Paul Samuelson proved in 1942 that, in two dimensions, it is the only line expressible solely in terms of the ratios of standard deviations and the correlation coefficient which fits the correct equation when the observations fall on a straight line, exhibits scale invariance, and exhibits invariance under interchange of variables. This solution has been rediscovered in different disciplines and is variously known as standardised major axis, the reduced major axis, the geometric mean functional relationship, least products regression, diagonal regression, line of organic correlation, and the least areas line. Tofallis has extended this approach to deal with multiple variables.

Others

I. Hnětynková, M. Plešinger, D. M. Sima, Z. Strakoš, and S. Van Huffel, The total least squares problem in AX ≈ B. A new classification with the relationship to the classical works. SIMAX vol. 32 issue 3, pp. 748–770. Available as a .
M. Plešinger, The Total Least Squares Problem and Reduction of Data in AX ≈ B. Doctoral Thesis, TU of Liberec and Institute of Computer Science, AS CR Prague, 2008.
C. C. Paige, Z. Strakoš, Core problems in linear algebraic systems. SIAM J. Matrix Anal. Appl. 27, 2006, pp. 861–875.
S. Van Huffel and P. Lemmerling, Total Least Squares and Errors-in-Variables Modeling: Analysis, Algorithms and Applications. Dordrecht, The Netherlands: Kluwer Academic Publishers, 2002.
S. Jo and S. W. Kim, Consistent normalized least mean square filtering with noisy data matrix. IEEE Trans. Signal Process., vol. 53, no. 6, pp. 2112–2123, Jun. 2005.
R. D. DeGroat and E. M. Dowling, The data least squares problem and channel equalization. IEEE Trans. Signal Process., vol. 41, no. 1, pp. 407–411, Jan. 1993.
S. Van Huffel and J. Vandewalle, The Total Least Squares Problems: Computational Aspects and Analysis. SIAM Publications, Philadelphia PA, 1991.
T. Abatzoglou and J. Mendel, Constrained total least squares, in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., Apr. 1987, vol. 12, pp. 1485–1488.
P. de Groen An introduction to total least squares, in Nieuw Archief voor Wiskunde, Vierde serie, deel 14, 1996, pp. 237–253 .
G. H. Golub and C. F. Van Loan, An analysis of the total least squares problem. SIAM J. on Numer. Anal., 17, 1980, pp. 883–893.
at MathPages
A. R. Amiri-Simkooei and S. Jazaeri Weighted total least squares formulated by standard least squares theory,in Journal of Geodetic Science, 2 : 113–124, 2012 .

Popular movies

The Hunger Games (film) - 2012 American dystopian action thriller science fiction-adventure film directed by Gary Ross and based on Suzanne Collins’s 2008 novel of the same name. It is the first insta...
untitled Captain Marvel sequel - part of Marvel Cinematic Universe....
Killers of the Flower Moon (film project) - Killers of the Flower Moon - film project in United States of America. It was presented as drama, detective fiction, thriller. The film project starred Leonardo Dicaprio, Robert De Niro. Director of...
Five Nights at Freddy's (film) - Five Nights at Freddy's - film published in 2017 in United States of America. Scenarist of the film - Scott Cawthon....

Popular books

Book of Revelation - The Book of Revelation is the final book of the New Testament, and consequently is also the final book of the Christian Bible. Its title is derived from the first word of the Koine Greek text: apok...
Book of Genesis - account of the creation of the world, the early history of humanity, Israel's ancestors and the origins...
Gospel of Matthew - The Gospel According to Matthew is the first book of the New Testament and one of the three synoptic gospels. It tells how Israel's Messiah, rejected and executed in Israel, pronounces judgement on ...
Michelin Guide - Michelin Guides are a series of guide books published by the French tyre company Michelin for more than a century. The term normally refers to the annually published Michelin Red Guide , the oldest...
Psalms - The Book of Psalms , commonly referred to simply as Psalms , the Psalter or "the Psalms", is the first book of the Ketuvim , the third section of the Hebrew Bible, and thus a book of th...
Ecclesiastes - Ecclesiastes is one of 24 books of the Tanakh , where it is classified as one of the Ketuvim . Originally written c. 450–200 BCE, it is also among the canonical Wisdom literature of the Old Tes...
The 48 Laws of Power - non-fiction book by American author Robert Greene. The book...

Popular television series

The Crown (TV series) - historical drama web television series about the reign of Queen Elizabeth II, created and principally written by Peter Morgan, and produced by Left Bank Pictures and Sony Pictures Tel...
Friends - American sitcom television series, created by David Crane and Marta Kauffman, which aired on NBC from September 22, 1994, to May 6, 2004, lasting ten seasons. With an ensemble cast sta...
Young Sheldon - spin-off prequel to The Big Bang Theory and begins with the character Sheldon...
Modern Family - American television mockumentary family sitcom created by Christopher Lloyd and Steven Levitan for the American Broadcasting Company. It ran for eleven seasons, from September 23...
Loki (TV series) - upcoming American web television miniseries created for Disney+ by Michael Waldron, based on the Marvel Comics character of the same name. It is set in the Marvel Cinematic Universe, shar...
Game of Thrones - American fantasy drama television series created by David Benioff and D. B. Weiss for HBO. It...
Shameless (American TV series) - American comedy-drama television series developed by John Wells which debuted on Showtime on January 9, 2011. It...