Simultaneous perturbation stochastic approximation

Simultaneous perturbation stochastic approximation (SPSA) is an algorithmic method for optimizing systems with multiple unknown parameters. It is a type of stochastic approximation algorithm. As an optimization method, it is well suited to large-scale population models, adaptive modeling, simulation optimization, and atmospheric modeling. Many examples are presented at the SPSA website http://www.jhuapl.edu/SPSA. A comprehensive book on the subject is Bhatnagar et al. An early paper on the subject is by Spall, and the foundational paper providing the key theory and justification is also by Spall.
SPSA is a descent method capable of finding global minima, sharing this property with other methods such as simulated annealing. Its main feature is the gradient approximation that requires only two measurements of the objective function, regardless of the dimension of the optimization problem. Recall that we want to find the optimal control $u^*$ with loss function $J(u)$:

$$u^* = \arg\min_{u \in U} J(u).$$
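Both Finite Differences Stochastic Approximation (FDSA) and SPSA use the same iterative process:

$$u_{n+1} = u_n - a_n \hat{g}_n(u_n),$$

where $u_n = ((u_n)_1, (u_n)_2, \ldots, (u_n)_p)^T$ represents the $n$-th iterate, $\hat{g}_n(u_n)$ is the estimate of the gradient of the objective function $g(u) = \frac{\partial}{\partial u} J(u)$ evaluated at $u_n$, and $\{a_n\}$ is a positive number sequence converging to 0. If $u_n$ is a p-dimensional vector, the $i$-th component of the symmetric finite difference gradient estimator is:

FD: $$(\hat{g}_n(u_n))_i = \frac{J(u_n + c_n e_i) - J(u_n - c_n e_i)}{2 c_n},$$

1 ≤ i ≤ p, where $e_i$ is the unit vector with a 1 in the $i$-th place, and $c_n$ is a small positive number that decreases with n. With this method, 2p evaluations of J are needed for each $\hat{g}_n$. When p is large, this estimator loses efficiency.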
Let now $\Delta_n$ be a random perturbation vector. The $i$-th component of the stochastic perturbation gradient estimator is:

SP: $$(\hat{g}_n(u_n))_i = \frac{J(u_n + c_n \Delta_n) - J(u_n - c_n \Delta_n)}{2 c_n (\Delta_n)_i}.$$

Note that FD perturbs only one direction at a time, while the SP estimator perturbs all directions at the same time (the numerator is identical in all p components). The number of loss function measurements needed in the SPSA method for each $\hat{g}_n$ is always 2, independent of the dimension p. Thus, SPSA uses p times fewer function evaluations per iteration than FDSA, which makes it considerably more efficient when p is large.
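The difference in evaluation counts can be illustrated with a minimal sketch. The function names, the quadratic test loss, the dimension p = 10, and the perturbation size 0.01 are all assumptions made for illustration; the perturbation is drawn as ±1 with equal probability.

```python
import random

def fd_gradient(J, u, c):
    """Symmetric finite-difference estimate: perturbs one coordinate at a time."""
    g = []
    for i in range(len(u)):
        plus = list(u)
        plus[i] += c
        minus = list(u)
        minus[i] -= c
        g.append((J(plus) - J(minus)) / (2.0 * c))
    return g

def sp_gradient(J, u, c, rng):
    """Simultaneous-perturbation estimate: perturbs all coordinates at once."""
    delta = [rng.choice((-1.0, 1.0)) for _ in u]  # +/-1 perturbation (assumed)
    plus = [ui + c * di for ui, di in zip(u, delta)]
    minus = [ui - c * di for ui, di in zip(u, delta)]
    diff = J(plus) - J(minus)  # the same numerator is reused for every component
    return [diff / (2.0 * c * di) for di in delta]

class CountingLoss:
    """Wraps a loss function and counts how often it is evaluated."""
    def __init__(self, f):
        self.f = f
        self.calls = 0

    def __call__(self, u):
        self.calls += 1
        return self.f(u)

def quadratic(u):
    # Hypothetical test loss, chosen only for illustration
    return sum(x * x for x in u)

p = 10
u0 = [1.0] * p
J_fd = CountingLoss(quadratic)
J_sp = CountingLoss(quadratic)
g_fd = fd_gradient(J_fd, u0, 0.01)
g_sp = sp_gradient(J_sp, u0, 0.01, random.Random(0))
print(J_fd.calls, J_sp.calls)  # 20 2
```

The FD estimate costs 2p loss evaluations, while the SP estimate costs two for any p. A single SP estimate is much noisier than the FD estimate, so the saving is in cost per iteration, not in accuracy per estimate.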
Simple experiments with p=2 showed that SPSA converges in the same number of iterations as FDSA. The latter approximately follows the steepest descent direction, behaving like the gradient method. SPSA, on the other hand, uses a random search direction and does not follow the gradient path exactly. On average, though, it tracks the gradient path closely because the gradient approximation is an almost unbiased estimator of the gradient, as shown in the following lemma.
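The "almost unbiased" behaviour can be checked numerically in a small case. The sketch below assumes ±1 perturbations and a hypothetical quadratic loss in three variables. It enumerates all 2^3 perturbation vectors, so the average is an exact expectation, not a Monte Carlo estimate. For a quadratic loss the higher-order Taylor terms vanish, so the averaged estimate should match the true gradient up to rounding.

```python
import itertools

def J(u):
    # Hypothetical quadratic test loss with a cross term
    return u[0] ** 2 + 2.0 * u[1] ** 2 + 3.0 * u[2] ** 2 + u[0] * u[1]

def true_gradient(u):
    return [2.0 * u[0] + u[1], 4.0 * u[1] + u[0], 6.0 * u[2]]

def sp_estimate(u, c, delta):
    """Simultaneous-perturbation gradient estimate for a given perturbation."""
    plus = [ui + c * di for ui, di in zip(u, delta)]
    minus = [ui - c * di for ui, di in zip(u, delta)]
    diff = J(plus) - J(minus)
    return [diff / (2.0 * c * di) for di in delta]

u = [1.0, -2.0, 0.5]
c = 0.1

# All 8 equally likely +/-1 perturbation vectors
deltas = list(itertools.product((-1.0, 1.0), repeat=len(u)))
estimates = [sp_estimate(u, c, d) for d in deltas]
mean_estimate = [
    sum(e[i] for e in estimates) / len(estimates) for i in range(len(u))
]

print("single estimate:", estimates[0])
print("mean estimate:  ", mean_estimate)
print("true gradient:  ", true_gradient(u))
```

An individual estimate can be far from the gradient, which is why a single SPSA step does not follow the gradient path, while the average over perturbations recovers it.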

Convergence lemma

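Denote by

$$b_n = E[\hat{g}_n \mid u_n] - \nabla J(u_n)$$

the bias in the estimator $\hat{g}_n$. Assume that $\{(\Delta_n)_i\}$ are all mutually independent with zero mean, bounded second moments, and $E(|(\Delta_n)_i|^{-1})$ uniformly bounded. Then $b_n \to 0$ w.p. 1.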

Sketch of the proof

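The main idea is to use conditioning on $\Delta_n$ to express $E[(\hat{g}_n)_i]$ and then to use a second order Taylor expansion of $J(u_n + c_n \Delta_n)$ and $J(u_n - c_n \Delta_n)$. After algebraic manipulations using the zero mean and the independence of $\{(\Delta_n)_i\}$, we get

$$E[(\hat{g}_n)_i] = (g(u_n))_i + O(c_n^2).$$

The result follows from the hypothesis that $c_n \to 0$.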
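The algebra behind the sketch can be written out as follows. The notation is assumed for this illustration: $u_n$ is the iterate, $c_n$ the perturbation size, $\Delta_n$ the perturbation vector, $g$ the gradient and $H$ the Hessian of $J$. The remainder terms rely on bounded third derivatives and on bounded perturbations with bounded inverse moments.

```latex
\begin{align*}
J(u_n \pm c_n \Delta_n)
  &= J(u_n) \pm c_n \Delta_n^T g(u_n)
     + \tfrac{1}{2} c_n^2 \Delta_n^T H(u_n) \Delta_n + O(c_n^3), \\
J(u_n + c_n \Delta_n) - J(u_n - c_n \Delta_n)
  &= 2 c_n \Delta_n^T g(u_n) + O(c_n^3), \\
(\hat{g}_n(u_n))_i
  &= (g(u_n))_i
     + \sum_{j \neq i} (g(u_n))_j \frac{(\Delta_n)_j}{(\Delta_n)_i}
     + O(c_n^2), \\
E\left[\frac{(\Delta_n)_j}{(\Delta_n)_i}\right]
  &= E[(\Delta_n)_j] \, E\left[\frac{1}{(\Delta_n)_i}\right] = 0
     \quad (j \neq i).
\end{align*}
```

The even-order terms cancel in the difference, and the cross terms vanish in expectation by independence and zero mean. Only the $O(c_n^2)$ remainder is left as bias.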
Next we summarize some of the hypotheses under which $u_n$ converges in probability to the set of global minima of $J(u)$. The efficiency of the method depends on the shape of $J(u)$, the values of the parameters $a_n$ and $c_n$, and the distribution of the perturbation terms $(\Delta_n)_i$. First, the algorithm parameters must satisfy the following conditions:

$a_n > 0$, $a_n \to 0$ when $n \to \infty$ and $\sum_{n=1}^{\infty} a_n = \infty$. A good choice would be $a_n = a/n$, a>0;
$c_n = c/n^{\gamma}$, where c>0, $\gamma \in [1/6, 1/2]$;
$(\Delta_n)_i$ must be mutually independent zero-mean random variables, symmetrically distributed about zero, with $|(\Delta_n)_i| < a_1 < \infty$. The inverse first and second moments of the $(\Delta_n)_i$ must be finite.

A good choice for $(\Delta_n)_i$ is the Rademacher distribution, i.e. Bernoulli ±1, each value with probability 0.5. Other choices are possible too, but note that the uniform and normal distributions cannot be used because they do not satisfy the finite inverse moment conditions.
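Putting the pieces together, a minimal sketch of the full iteration might look like the following. It assumes gain sequences of the form a_n = a/n and c_n = c/n^γ, Rademacher perturbations, and noise-free loss measurements. The function name `spsa_minimize`, the test loss, and the values a = 1.0, c = 0.1, γ = 1/6 are illustrative assumptions, not recommended settings.

```python
import random

def spsa_minimize(J, u0, a=1.0, c=0.1, gamma=1.0 / 6.0, n_iter=2000, seed=0):
    """Basic SPSA iteration u_{n+1} = u_n - a_n * g_hat_n(u_n).

    Assumed gain sequences: a_n = a / n and c_n = c / n**gamma.
    """
    rng = random.Random(seed)
    u = list(u0)
    for n in range(1, n_iter + 1):
        a_n = a / n
        c_n = c / n ** gamma
        # Rademacher perturbation: each component is +1 or -1 with probability 0.5
        delta = [rng.choice((-1.0, 1.0)) for _ in u]
        plus = [ui + c_n * di for ui, di in zip(u, delta)]
        minus = [ui - c_n * di for ui, di in zip(u, delta)]
        diff = J(plus) - J(minus)  # the only two loss evaluations this iteration
        u = [ui - a_n * diff / (2.0 * c_n * di) for ui, di in zip(u, delta)]
    return u

def loss(u):
    # Hypothetical test loss with its minimum at (1, 1, 1)
    return sum((x - 1.0) ** 2 for x in u)

u_star = spsa_minimize(loss, [0.0, 0.0, 0.0])
print(u_star)
```

In practice the loss measurements are usually noisy, and the constants a, c and γ have a strong effect on performance, so they normally need tuning for the problem at hand.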
The loss function $J(u)$ must be thrice continuously differentiable and the individual elements of the third derivative must be bounded: $|J^{(3)}(u)| < a_3 < \infty$. Also, $|J(u)| \to \infty$ as $\|u\| \to \infty$.
In addition, $\nabla J$ must be Lipschitz continuous and bounded, and the ODE $\dot{u} = g(u)$ must have a unique solution for each initial condition.
Under these conditions and a few others, $u_n$ converges in probability to the set of global minima of $J(u)$.

Extension to Second-Order (Newton) Methods

It is known that a stochastic version of the standard Newton-Raphson algorithm provides an asymptotically optimal or near-optimal form of stochastic approximation. SPSA can also be used to efficiently estimate the Hessian matrix of the loss function based on either noisy loss measurements or noisy gradient measurements. As with the basic SPSA method, only a small fixed number of loss measurements or gradient measurements are needed at each iteration, regardless of the problem dimension p. See the brief discussion in Stochastic gradient descent.
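As a rough illustration of the gradient-based case, the sketch below forms a simultaneous-perturbation Hessian estimate from two gradient evaluations and symmetrizes it. The specific formula is one common form and is an assumption here, since the text does not give it. The quadratic loss, its matrix `A`, and the use of exact gradients in place of noisy ones are likewise illustrative choices. Averaging over all four ±1 perturbation vectors shows that the estimate is unbiased in this simple case.

```python
import itertools

# Hypothetical true Hessian of the test loss J(u) = 0.5 * u^T A u
A = [[2.0, 1.0], [1.0, 3.0]]

def gradient(u):
    """Exact gradient A u; a real application would supply noisy measurements."""
    return [sum(A[i][k] * u[k] for k in range(len(u))) for i in range(len(u))]

def sp_hessian_estimate(grad, u, c, delta):
    """Hessian estimate from two gradient evaluations (assumed form)."""
    p = len(u)
    plus = [ui + c * di for ui, di in zip(u, delta)]
    minus = [ui - c * di for ui, di in zip(u, delta)]
    d_g = [gp - gm for gp, gm in zip(grad(plus), grad(minus))]
    # m[i][j] = dG_j / (2 c Delta_i), then symmetrize
    m = [[d_g[j] / (2.0 * c * delta[i]) for j in range(p)] for i in range(p)]
    return [[0.5 * (m[i][j] + m[j][i]) for j in range(p)] for i in range(p)]

u = [0.3, -0.7]
c = 0.1
deltas = list(itertools.product((-1.0, 1.0), repeat=len(u)))
estimates = [sp_hessian_estimate(gradient, u, c, d) for d in deltas]
H_mean = [
    [sum(e[i][j] for e in estimates) / len(estimates) for j in range(2)]
    for i in range(2)
]
print("single estimate:", estimates[0])
print("mean estimate:  ", H_mean)
```

Each estimate uses two gradient evaluations whatever the dimension. Individual estimates are noisy, so in an adaptive scheme they would typically be averaged across iterations before being used in a Newton-type step.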
