Kaczmarz method

The Kaczmarz method or Kaczmarz's algorithm is an iterative algorithm for solving systems of linear equations. It was first discovered by the Polish mathematician Stefan Kaczmarz, and was rediscovered in the field of image reconstruction from projections by Richard Gordon, Robert Bender, and Gabor Herman in 1970, where it is called the Algebraic Reconstruction Technique (ART). ART includes the positivity constraint, making it nonlinear.
The Kaczmarz method is applicable to any linear system of equations, but its computational advantage relative to other methods depends on the system being sparse. It has been demonstrated to be superior, in some biomedical imaging applications, to other methods such as the filtered backprojection method.
The method has many applications, ranging from computed tomography to signal processing. It can also be obtained by applying the method of successive projections onto convex sets to the hyperplanes described by the linear system.

Algorithm 1: Kaczmarz algorithm

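Let $Ax = b$ be a system of linear equations, let $m$ be the number of rows of $A$, let $a_i$ be the $i$-th row of the complex-valued matrix $A$, and let $x^0$ be an arbitrary complex-valued initial approximation to the solution of $Ax = b$. For $k = 0, 1, \ldots$ compute:
$$x^{k+1} = x^k + \frac{b_i - \langle a_i, x^k \rangle}{\|a_i\|^2} \overline{a_i}$$
where $i = (k \bmod m) + 1$ and $\overline{a_i}$ denotes the complex conjugate of $a_i$.
If the system is consistent, $x^k$ converges to the minimum-norm solution, provided that the iterations start with the zero vector.
A more general algorithm can be defined using a relaxation parameter $\lambda^k$:
$$x^{k+1} = x^k + \lambda^k \frac{b_i - \langle a_i, x^k \rangle}{\|a_i\|^2} \overline{a_i}$$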
There are versions of the method that converge to a regularized weighted least squares solution when applied to a system of inconsistent equations and, at least as far as initial behavior is concerned, at a lesser cost than other iterative methods, such as the conjugate gradient method.
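
The cyclic iteration can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: it assumes a real-valued system stored as nested lists, a constant relaxation parameter, and a fixed number of sweeps instead of a stopping test. The name `kaczmarz` and its parameters are hypothetical.

```python
def kaczmarz(A, b, x0=None, sweeps=100, relaxation=1.0):
    """Cyclic Kaczmarz iteration for a real system A x = b.

    A is a list of rows, b a list of right-hand sides. The relaxation
    parameter is held constant here (an assumption of this sketch).
    """
    m, n = len(A), len(A[0])
    x = [0.0] * n if x0 is None else list(x0)
    for k in range(sweeps * m):
        i = k % m                          # visit the rows in cyclic order
        row = A[i]
        norm_sq = sum(a * a for a in row)
        if norm_sq == 0.0:
            continue                       # a zero row defines no hyperplane
        residual = b[i] - sum(a * xj for a, xj in zip(row, x))
        step = relaxation * residual / norm_sq
        # Move along the row direction; with relaxation = 1 this is the
        # orthogonal projection onto the hyperplane of equation i.
        x = [xj + step * a for xj, a in zip(x, row)]
    return x


# The single equation x + y = 2 has minimum-norm solution (1, 1);
# starting from the zero vector, one projection already reaches it.
min_norm = kaczmarz([[1.0, 1.0]], [2.0])
```

Starting from the zero vector matters for the minimum-norm property: every update adds a multiple of a row, so the iterates never leave the row space of the matrix.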

Algorithm 2: Randomized Kaczmarz algorithm

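In 2009, a randomized version of the Kaczmarz method for overdetermined linear systems was introduced by Thomas Strohmer and Roman Vershynin, in which the $i$-th equation is selected randomly with probability proportional to $\|a_i\|^2$.
This method can be seen as a particular case of stochastic gradient descent.
Under such circumstances $x_k$ converges exponentially fast to the solution of $Ax = b$, and the rate of convergence depends only on the scaled condition number $\kappa(A) = \|A\| \, \|A^{-1}\|$, where $\|A\|$ is the Frobenius norm of $A$ and $\|A^{-1}\|$ is the smallest constant $M$ such that $\|z\| \le M \|Az\|$ for all $z$. More precisely, if $x$ is the solution of $Ax = b$, then
$$\mathbb{E}\|x_k - x\|^2 \le \left(1 - \kappa(A)^{-2}\right)^k \|x_0 - x\|^2.$$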
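
A minimal sketch of the randomized row selection, using only the Python standard library. It assumes a real-valued, consistent system stored as nested lists with at least one nonzero row; the function name, the fixed iteration count, and the seed are choices made for this illustration.

```python
import random


def randomized_kaczmarz(A, b, iterations=2000, seed=0):
    """Randomized Kaczmarz: pick row i with probability ~ ||a_i||^2."""
    rng = random.Random(seed)
    n = len(A[0])
    # Squared row norms serve both as sampling weights and as the
    # normalisation of each projection step.
    weights = [sum(a * a for a in row) for row in A]
    x = [0.0] * n
    for _ in range(iterations):
        i = rng.choices(range(len(A)), weights=weights, k=1)[0]
        row = A[i]
        residual = b[i] - sum(a * xj for a, xj in zip(row, x))
        step = residual / weights[i]
        x = [xj + step * a for xj, a in zip(x, row)]
    return x


# Overdetermined but consistent: all three equations are solved by (1, 1).
A = [[2.0, 1.0], [1.0, 3.0], [1.0, -1.0]]
b = [3.0, 4.0, 0.0]
x = randomized_kaczmarz(A, b)
```

Rows with zero norm receive zero weight and are therefore never drawn, which avoids a division by zero in the step.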

Proof

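Let $x$ denote the solution of $Ax = b$ and let $\kappa(A) = \|A\| \, \|A^{-1}\|$. We have
$$\sum_{j=1}^m |\langle z, a_j \rangle|^2 \ge \frac{\|z\|^2}{\|A^{-1}\|^2} \quad \text{for all } z \in \mathbb{C}^n. \qquad (1)$$
Using
$$\|A\|^2 = \sum_{j=1}^m \|a_j\|^2$$
we can write (1) as
$$\sum_{j=1}^m \frac{\|a_j\|^2}{\|A\|^2} \left| \left\langle z, \frac{a_j}{\|a_j\|} \right\rangle \right|^2 \ge \kappa(A)^{-2} \|z\|^2 \quad \text{for all } z \in \mathbb{C}^n. \qquad (2)$$
The main point of the proof is to view the left-hand side of (2) as an expectation of some random variable. Namely, recall that the solution space of the $j$-th equation of $Ax = b$ is the hyperplane
$$\{ y : \langle y, a_j \rangle = b_j \},$$
whose normal is $\frac{a_j}{\|a_j\|}$. Define a random vector $Z$ whose values are the normals to all the equations of $Ax = b$, with probabilities as in our algorithm:
$$Z = \frac{a_j}{\|a_j\|} \quad \text{with probability } \frac{\|a_j\|^2}{\|A\|^2}, \qquad j = 1, \ldots, m.$$
Then (2) says that
$$\mathbb{E} |\langle z, Z \rangle|^2 \ge \kappa(A)^{-2} \|z\|^2 \quad \text{for all } z \in \mathbb{C}^n. \qquad (3)$$
The orthogonal projection $P$ onto the solution space of a random equation of $Ax = b$ is given by
$$Pz = z - \langle z - x, Z \rangle Z.$$
Now we are ready to analyze our algorithm. We want to show that the error $\|x_k - x\|^2$ reduces at each step on average (conditioned on the previous steps) by at least the factor $1 - \kappa(A)^{-2}$. The next approximation $x_k$ is computed from $x_{k-1}$ as $x_k = P_k x_{k-1}$, where $P_1, P_2, \ldots$ are independent realizations of the random projection $P$. The vector $x_{k-1} - x_k$ is in the kernel of $P_k$. It is orthogonal to the solution space of the equation onto which $P_k$ projects, which contains the vector $x_k - x$ (recall that $x$ is the solution of all the equations). The orthogonality of these two vectors then yields
$$\|x_k - x\|^2 = \|x_{k-1} - x\|^2 - \|x_{k-1} - x_k\|^2.$$
To complete the proof, we have to bound $\|x_{k-1} - x_k\|^2$ from below. By the definition of $x_k$, we have
$$\|x_{k-1} - x_k\| = |\langle x_{k-1} - x, Z_k \rangle|,$$
where $Z_1, Z_2, \ldots$ are independent realizations of the random vector $Z$.
Thus
$$\|x_k - x\|^2 \le \left( 1 - \left| \left\langle \frac{x_{k-1} - x}{\|x_{k-1} - x\|}, Z_k \right\rangle \right|^2 \right) \|x_{k-1} - x\|^2.$$
Now we take the expectation of both sides conditional upon the choice of the random vectors $Z_1, \ldots, Z_{k-1}$ (hence we fix the random projections $P_1, \ldots, P_{k-1}$ and thus the vectors $x_1, \ldots, x_{k-1}$, and we average over the random vector $Z_k$). Then
$$\mathbb{E}_{Z_1, \ldots, Z_{k-1}} \|x_k - x\|^2 \le \left( 1 - \mathbb{E}_{Z_1, \ldots, Z_{k-1}} \left| \left\langle \frac{x_{k-1} - x}{\|x_{k-1} - x\|}, Z_k \right\rangle \right|^2 \right) \|x_{k-1} - x\|^2.$$
By (3) and the independence,
$$\mathbb{E}_{Z_1, \ldots, Z_{k-1}} \|x_k - x\|^2 \le \left( 1 - \kappa(A)^{-2} \right) \|x_{k-1} - x\|^2.$$
Taking the full expectation of both sides, we conclude that
$$\mathbb{E} \|x_k - x\|^2 \le \left( 1 - \kappa(A)^{-2} \right) \mathbb{E} \|x_{k-1} - x\|^2.$$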
The superiority of this selection was illustrated with the reconstruction of a bandlimited function from its nonuniformly spaced sampling values. However, it has been pointed out that the success reported by Strohmer and Vershynin depends on the specific choices that were made there in translating the underlying problem, whose geometrical nature is to find a common point of a set of hyperplanes, into a system of algebraic equations. There will always be legitimate algebraic representations of the underlying problem for which selecting rows with probability proportional to $\|a_i\|^2$ performs worse.
The Kaczmarz iteration has a purely geometric interpretation: the algorithm successively projects the current iterate onto the hyperplane defined by the next equation. Hence, any scaling of the equations is irrelevant; it can also be seen from the update formula of Algorithm 1 that any scaling of the equations cancels out. Thus, in the randomized Kaczmarz method (RK), one can use $\|a_i\|^2$ or any other weights that may be relevant. Specifically, in the above-mentioned reconstruction example, the equations were chosen with probability proportional to the average distance of each sample point from its two nearest neighbors, a concept introduced by Feichtinger and Karlheinz Gröchenig.
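
The scale invariance of a single projection can be checked numerically. The sketch below assumes real-valued data; `project_onto_hyperplane` is a hypothetical helper written for this illustration, and the numbers are arbitrary.

```python
def project_onto_hyperplane(x, a, beta):
    """Orthogonal projection of x onto {y : <a, y> = beta} (real case)."""
    norm_sq = sum(ai * ai for ai in a)
    step = (beta - sum(ai * xi for ai, xi in zip(a, x))) / norm_sq
    return [xi + step * ai for xi, ai in zip(x, a)]


x = [3.0, -1.0]
a, beta = [1.0, 2.0], 4.0
c = 10.0  # arbitrary nonzero factor applied to the whole equation

p_original = project_onto_hyperplane(x, a, beta)
p_rescaled = project_onto_hyperplane(x, [c * ai for ai in a], c * beta)
# Both calls describe the same hyperplane, so the projections agree
# up to floating-point rounding.
```

Rescaling an equation therefore changes only the sampling probabilities in the randomized method, not the point that a selected equation produces.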

Algorithm 3: Gower-Richtarik algorithm

In 2015, Robert M. Gower and Peter Richtarik developed a versatile randomized iterative method for solving a consistent system of linear equations which includes the randomized Kaczmarz algorithm as a special case. Other special cases include randomized coordinate descent, randomized Gaussian descent and the randomized Newton method. Block versions and versions with importance sampling of all these methods also arise as special cases. The method is shown to enjoy an exponential rate of decay, also known as linear convergence, under very mild conditions on the way randomness enters the algorithm. The Gower-Richtarik method is the first algorithm uncovering a "sibling" relationship between these methods, some of which had been proposed independently before, while many were new.

Insights about Randomized Kaczmarz

The analysis of the Gower-Richtarik method yields several new insights about the randomized Kaczmarz method:

The general rate of the Gower-Richtarik algorithm precisely recovers the rate of the randomized Kaczmarz method in the special case when it reduces to it.
The choice of probabilities for which the randomized Kaczmarz algorithm was originally formulated and analyzed (probabilities proportional to the squares of the row norms) is not optimal. Optimal probabilities are the solution of a certain semidefinite program. The theoretical complexity of randomized Kaczmarz with the optimal probabilities can be arbitrarily better than the complexity for the standard probabilities. However, the amount by which it is better depends on the matrix $A$. There are problems for which the standard probabilities are optimal.
When applied to a system with a positive definite matrix $A$, the randomized Kaczmarz method is equivalent to the stochastic gradient descent method (with a very special stepsize) for minimizing the strongly convex quadratic function $f(x) = \tfrac{1}{2} x^T A x - b^T x$. Note that since $f$ is convex, the minimizers of $f$ must satisfy $\nabla f(x) = 0$, which is equivalent to $Ax = b$. The "special stepsize" is the stepsize which leads to the point that, on the one-dimensional line spanned by the stochastic gradient, minimizes the Euclidean distance from the unknown minimizer of $f$, namely from $x^* = A^{-1} b$. This insight is gained from a dual view of the iterative process.

Six Equivalent Formulations

The Gower-Richtarik method enjoys six seemingly different but equivalent formulations, shedding additional light on how to interpret it:

1. Sketching viewpoint: Sketch & Project
2. Optimization viewpoint: Constrain and Approximate
3. Geometric viewpoint: Random Intersect
4. Algebraic viewpoint 1: Random Linear Solve
5. Algebraic viewpoint 2: Random Update
6. Analytic viewpoint: Random Fixed Point

We now describe some of these viewpoints. The method depends on two parameters:

a positive definite matrix $B$ giving rise to a weighted Euclidean inner product $\langle x, y \rangle_B := x^T B y$ and the induced norm $\|x\|_B = \langle x, x \rangle_B^{1/2}$,
and a random matrix $S$ with as many rows as $A$.

1. Sketch and Project

Given the previous iterate $x^k$, the new point $x^{k+1}$ is computed by drawing a random matrix $S$, and setting
$$x^{k+1} = \arg\min_x \|x - x^k\|_B \quad \text{subject to} \quad S^T A x = S^T b.$$
That is, $x^{k+1}$ is obtained as the projection of $x^k$ onto the randomly sketched system $S^T A x = S^T b$. The idea behind this method is to pick $S$ in such a way that a projection onto the sketched system is substantially simpler than the solution of the original system $Ax = b$. The randomized Kaczmarz method is obtained by picking $B$ to be the identity matrix, and $S$ to be the $i$-th unit coordinate vector with probability $p_i = \|a_i\|_2^2 / \|A\|_F^2$. Different choices of $B$ and $S$ lead to different variants of the method.

2. Constrain and Approximate

A seemingly different but entirely equivalent formulation of the method is
$$x^{k+1} = \arg\min_x \|x - x^*\|_B \quad \text{subject to} \quad x = x^k + B^{-1} A^T S y,$$
where $y$ is also allowed to vary, and where $x^*$ is any solution of the system $Ax = b$. Hence, $x^{k+1}$ is obtained by first constraining the update to the linear subspace spanned by the columns of the random matrix $B^{-1} A^T S$, i.e., to
$$\{ h : h = B^{-1} A^T S y, \ y \text{ can vary} \},$$
and then choosing the point $x$ from this subspace which best approximates $x^*$. This formulation may look surprising, as it seems impossible to perform the approximation step when $x^*$ is not known. However, it is still possible to do this, simply because $x^{k+1}$ computed this way is the same as $x^{k+1}$ computed via the sketch and project formulation, and $x^*$ does not appear there.

5. Random Update

The update can also be written explicitly as
$$x^{k+1} = x^k - B^{-1} A^T S \left( S^T A B^{-1} A^T S \right)^{\dagger} S^T (A x^k - b),$$
where by $M^{\dagger}$ we denote the Moore-Penrose pseudo-inverse of the matrix $M$. Hence, the method can be written in the form $x^{k+1} = x^k + h^k$, where $h^k$ is a random update vector.
Letting $M = S^T A B^{-1} A^T S$, it can be shown that the system $M y = S^T (A x^k - b)$ always has a solution $y^k$, and that for all such solutions the vector $B^{-1} A^T S y^k$ is the same. Hence, it does not matter which of these solutions is chosen, and the method can also be written as $x^{k+1} = x^k - B^{-1} A^T S y^k$. The pseudo-inverse leads just to one particular solution. The role of the pseudo-inverse is twofold:

It allows the method to be written in the explicit "random update" form as above,
It makes the analysis simple through the final, sixth, formulation.
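
For a sketch matrix with a single column, the pseudo-inverse in the random update reduces to a scalar reciprocal, which makes the formula easy to try out. The code below is an illustrative sketch under several assumptions not taken from the source: $B$ is the identity, the system is real-valued and stored as nested lists, and the sketch is a single vector `s`. The function names are hypothetical.

```python
import random


def sketch_project_step(A, b, x, s):
    """One random-update step with B = I and a one-column sketch s."""
    m, n = len(A), len(A[0])
    # g = A^T s is the sketched row; r = s^T (A x - b) the sketched residual.
    g = [sum(s[i] * A[i][j] for i in range(m)) for j in range(n)]
    r = sum(s[i] * (sum(A[i][j] * x[j] for j in range(n)) - b[i])
            for i in range(m))
    denom = sum(gj * gj for gj in g)      # the 1x1 matrix s^T A A^T s
    if denom == 0.0:
        return list(x)                    # pseudo-inverse of 0 is 0: no move
    return [xj - (r / denom) * gj for xj, gj in zip(x, g)]


def gaussian_sketch_solve(A, b, iterations=2000, seed=0):
    """Repeat the step with independent standard Gaussian sketch vectors."""
    rng = random.Random(seed)
    x = [0.0] * len(A[0])
    for _ in range(iterations):
        s = [rng.gauss(0.0, 1.0) for _ in A]
        x = sketch_project_step(A, b, x, s)
    return x


A = [[2.0, 1.0], [1.0, 3.0]]
b = [3.0, 4.0]
# With s equal to a unit coordinate vector the step is a Kaczmarz projection
# onto the corresponding equation.
kaczmarz_like = sketch_project_step(A, b, [0.0, 0.0], [1.0, 0.0])
solution = gaussian_sketch_solve(A, b)
```

Choosing a unit coordinate vector for `s` selects one equation, while a Gaussian `s` projects onto a random linear combination of all equations.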

6. Random Fixed Point

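If we subtract $x^*$ from both sides of the random update formula, denote
$$Z := A^T S \left( S^T A B^{-1} A^T S \right)^{\dagger} S^T A,$$
and use the fact that $A x^* = b$, we arrive at the last formulation:
$$x^{k+1} - x^* = (I - B^{-1} Z)(x^k - x^*),$$
where $I$ is the identity matrix. The iteration matrix, $I - B^{-1} Z$, is random, whence the name of this formulation.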

Convergence

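By taking conditional expectations in the 6th formulation (conditional on $x^k$), we obtain
$$\mathbf{E}[x^{k+1} - x^* \mid x^k] = (I - B^{-1} \mathbf{E}[Z])(x^k - x^*).$$
By taking expectation again, and using the tower property of expectations, we obtain
$$\mathbf{E}[x^{k+1} - x^*] = (I - B^{-1} \mathbf{E}[Z]) \, \mathbf{E}[x^k - x^*].$$
Gower and Richtarik show that
$$\rho := \|I - B^{-1} \mathbf{E}[Z]\|_B = 1 - \lambda_{\min}\left( B^{-1/2} \mathbf{E}[Z] B^{-1/2} \right),$$
where the matrix norm is defined by
$$\|M\|_B := \max_{x \ne 0} \frac{\|Mx\|_B}{\|x\|_B}.$$
Moreover, without any assumptions on $S$ one has $0 \le \rho \le 1$. By taking norms and unrolling the recurrence, we obtain

Theorem [Gower & Richtarik 2015]

For all $k \ge 0$,
$$\|\mathbf{E}[x^k - x^*]\|_B \le \rho^k \|x^0 - x^*\|_B.$$

Remark. A sufficient condition for the expected residuals to converge to 0 is $\rho < 1$. This can be achieved if $A$ has full column rank and under very mild conditions on $S$. Convergence of the method can also be established without the full column rank assumption in a different way.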
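It is also possible to show a stronger result:

Theorem [Gower & Richtarik 2015]

The expected squared norms (rather than the norms of the expectations) converge at the same rate:
$$\mathbf{E}\|x^k - x^*\|_B^2 \le \rho^k \|x^0 - x^*\|_B^2.$$
Remark. This second type of convergence is stronger due to the following identity, which holds for any random vector $x$ and any fixed vector $x^*$:
$$\|\mathbf{E}[x - x^*]\|^2 = \mathbf{E}\left[\|x - x^*\|^2\right] - \mathbf{E}\left[\|x - \mathbf{E}[x]\|^2\right].$$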
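
The identity relating the two types of convergence is a bias-variance decomposition, and it can be verified on a small example. The sketch below assumes the Euclidean norm and a random vector taking four equally likely values; the data and helper names are made up for this illustration.

```python
def mean(vectors):
    """Componentwise average of equally likely outcomes."""
    k = len(vectors)
    return [sum(v[j] for v in vectors) / k for j in range(len(vectors[0]))]


def sq_norm(v):
    return sum(c * c for c in v)


def sub(u, v):
    return [a - b for a, b in zip(u, v)]


samples = [[1.0, 2.0], [3.0, 0.0], [-1.0, 4.0], [5.0, 2.0]]
x_star = [0.5, -1.0]          # any fixed vector
ex = mean(samples)            # E[x]

# Squared norm of the expected error.
lhs = sq_norm(sub(ex, x_star))
# Expected squared error minus the variance of x.
expected_sq_error = sum(sq_norm(sub(x, x_star)) for x in samples) / len(samples)
variance = sum(sq_norm(sub(x, ex)) for x in samples) / len(samples)
rhs = expected_sq_error - variance
```

Because the variance term is nonnegative, a bound on the expected squared error always implies the same bound on the squared norm of the expected error, but not conversely.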

Convergence of Randomized Kaczmarz

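We have seen that the randomized Kaczmarz method appears as a special case of the Gower-Richtarik method for $B = I$ and $S$ being the $i$-th unit coordinate vector with probability $p_i = \|a_i\|_2^2 / \|A\|_F^2$, where $a_i$ is the $i$-th row of $A$. It can be checked by direct calculation that
$$\rho = 1 - \frac{\lambda_{\min}(A^T A)}{\|A\|_F^2}.$$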
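
The direct calculation can be sketched as follows. The derivation assumes a real matrix $A$ with nonzero rows, writes $e_i$ for the $i$-th unit coordinate vector, takes $Z := A^T S (S^T A B^{-1} A^T S)^{\dagger} S^T A$ as the matrix appearing in the random fixed point formulation, and uses the rate $\rho = 1 - \lambda_{\min}(B^{-1/2} \mathbf{E}[Z] B^{-1/2})$.

```latex
% Assumptions: B = I, S = e_i with probability p_i = \|a_i\|_2^2 / \|A\|_F^2,
% and a_i is the i-th row of A, so that A^T e_i = a_i^T.
\begin{align*}
Z &= A^T e_i \left( e_i^T A A^T e_i \right)^{\dagger} e_i^T A
   = \frac{a_i^T a_i}{\|a_i\|_2^2}, \\
\mathbf{E}[Z] &= \sum_{i=1}^m p_i \, \frac{a_i^T a_i}{\|a_i\|_2^2}
   = \frac{1}{\|A\|_F^2} \sum_{i=1}^m a_i^T a_i
   = \frac{A^T A}{\|A\|_F^2}, \\
\rho &= 1 - \lambda_{\min}\left( \mathbf{E}[Z] \right)
   = 1 - \frac{\lambda_{\min}(A^T A)}{\|A\|_F^2}.
\end{align*}
```

The sampling probabilities cancel the row norms in the second line, which is why the expected matrix takes such a simple form for this particular choice of $p_i$.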

Further Special Cases
