Stochastic dynamic programming

Originally introduced by Richard E. Bellman, stochastic dynamic programming is a technique for modelling and solving problems of decision making under uncertainty. Closely related to stochastic programming and dynamic programming, stochastic dynamic programming represents the problem under scrutiny in the form of a Bellman equation. The aim is to compute a policy prescribing how to act optimally in the face of uncertainty.

A motivating example: Gambling game

A gambler has $2. She is allowed to play a game of chance 4 times, and her goal is to maximize her probability of ending up with at least $6. If the gambler bets \(b\) dollars on a play of the game, then with probability 0.4 she wins the game, recoups the initial bet, and increases her capital position by \(b\) dollars; with probability 0.6 she loses the bet amount \(b\). All plays are pairwise independent. On any play of the game, the gambler may not bet more money than she has available at the beginning of that play.
Stochastic dynamic programming can be employed to model this problem and determine a betting strategy that, for instance, maximizes the gambler's probability of attaining a wealth of at least $6 by the end of the betting horizon.
Note that if there is no limit to the number of games that can be played, the problem becomes a variant of the well-known St. Petersburg paradox.

Formal background

Consider a discrete system defined on \(n\) stages in which each stage \(t = 1, \ldots, n\) is characterized by

an initial state \(s_t \in S_t\), where \(S_t\) is the set of feasible states at the beginning of stage \(t\);
a decision variable \(x_t \in X_t\), where \(X_t\) is the set of feasible actions at stage \(t\); note that \(X_t\) may be a function of the initial state \(s_t\);
an immediate cost/reward function \(p_t(s_t, x_t)\), representing the cost/reward at stage \(t\) if \(s_t\) is the initial state and \(x_t\) the action selected;
a state transition function \(g_t(s_t, x_t)\) that leads the system towards state \(s_{t+1} = g_t(s_t, x_t)\).

Let \(f_t(s_t)\) represent the optimal cost/reward obtained by following an optimal policy over stages \(t, t+1, \ldots, n\). Without loss of generality, in what follows we will consider a reward maximisation setting. In deterministic dynamic programming one usually deals with functional equations taking the following structure
\[
f_t(s_t) = \max_{x_t \in X_t} \left\{ p_t(s_t, x_t) + f_{t+1}(s_{t+1}) \right\}
\]
where \(s_{t+1} = g_t(s_t, x_t)\) and the boundary condition of the system is
\[
f_n(s_n) = \max_{x_n \in X_n} \left\{ p_n(s_n, x_n) \right\}.
\]
The aim is to determine the set of optimal actions that maximise \(f_1(s_1)\). Given the current state \(s_t\) and the current action \(x_t\), we know with certainty the reward secured during the current stage and, thanks to the state transition function \(g_t\), the future state towards which the system transitions.
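The deterministic recursion can be evaluated backward, from stage \(n\) down to stage 1, by tabulating \(f_t\) stage by stage. The Python sketch below is not from the original article: the function name `solve_deterministic`, its parameters, and the toy instance (a budget of 3 units spread over 3 stages, with a made-up concave reward table `g`) are hypothetical and chosen only for illustration.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple

def solve_deterministic(
    n: int,
    states: Callable[[int], Iterable[Any]],
    actions: Callable[[int, Any], Iterable[Any]],
    reward: Callable[[int, Any, Any], float],
    transition: Callable[[int, Any, Any], Any],
) -> Tuple[List[Dict[Any, float]], List[Dict[Any, Any]]]:
    """Hypothetical helper: tabulate f_t(s_t) and one maximising x_t for t = n, ..., 1."""
    f: List[Dict[Any, float]] = [{} for _ in range(n + 1)]     # f[t][s], t = 1..n
    policy: List[Dict[Any, Any]] = [{} for _ in range(n + 1)]  # policy[t][s], t = 1..n
    for t in range(n, 0, -1):
        for s in states(t):
            best_x, best_val = None, float("-inf")
            for x in actions(t, s):
                # At t = n the boundary condition applies: there is no future term.
                future = 0.0 if t == n else f[t + 1][transition(t, s, x)]
                val = reward(t, s, x) + future
                if val > best_val:  # keeps the first maximiser in case of ties
                    best_x, best_val = x, val
            f[t][s] = best_val
            policy[t][s] = best_x
    return f, policy

# Hypothetical toy instance: at each stage spend x units of the remaining budget s.
g = [0, 5, 8, 9]  # made-up reward for spending x = 0, 1, 2, 3 units in one stage
f, policy = solve_deterministic(
    n=3,
    states=lambda t: range(4),
    actions=lambda t, s: range(s + 1),
    reward=lambda t, s, x: g[x],
    transition=lambda t, s, x: s - x,
)
print(f[1][3], policy[1][3])  # 15.0 1
```

With diminishing returns in `g`, spreading the budget (one unit per stage) beats spending it all at once, which is what the tabulated policy reports.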
In practice, however, even if we know the state of the system at the beginning of the current stage as well as the decision taken, the state of the system at the beginning of the next stage and the current period reward are often random variables that can be observed only at the end of the current stage.
Stochastic dynamic programming deals with problems in which the current period reward and/or the next period state are random, i.e. with multi-stage stochastic systems. The decision maker's goal is to maximise expected reward over a given planning horizon.
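In their most general form, stochastic dynamic programs deal with functional equations taking the following structure
\[
f_t(s_t) = \max_{x_t \in X_t(s_t)} \left\{ (\text{expected reward during stage } t \mid s_t, x_t) + \alpha \sum_{s_{t+1}} \Pr(s_{t+1} \mid s_t, x_t)\, f_{t+1}(s_{t+1}) \right\}
\]
where

\(f_t(s_t)\) is the maximum expected reward that can be attained during stages \(t, t+1, \ldots, n\), given state \(s_t\) at the beginning of stage \(t\);
\(x_t\) belongs to the set \(X_t(s_t)\) of feasible actions at stage \(t\) given initial state \(s_t\);
\(\alpha\) is the discount factor;
\(\Pr(s_{t+1} \mid s_t, x_t)\) is the conditional probability that the state at the beginning of stage \(t+1\) is \(s_{t+1}\) given current state \(s_t\) and selected action \(x_t\).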
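A single application of this functional equation (one Bellman backup) can be sketched as follows, assuming \(f_{t+1}\) has already been tabulated as a dictionary. The function `bellman_backup` and the two-action instance (actions `safe` and `risky`, with made-up rewards, transition probabilities and \(\alpha = 0.9\)) are hypothetical and serve only to illustrate the formula.

```python
from typing import Any, Callable, Dict, Iterable, Tuple

def bellman_backup(
    s: Any,
    actions: Iterable[Any],
    expected_reward: Callable[[Any, Any], float],
    transition: Callable[[Any, Any], Dict[Any, float]],
    f_next: Dict[Any, float],
    alpha: float,
) -> Tuple[float, Any]:
    """Hypothetical helper: return (f_t(s), a maximising action) given tabulated f_{t+1}."""
    best_val, best_x = float("-inf"), None
    for x in actions:
        # Expected optimal future reward: sum of Pr(s' | s, x) * f_{t+1}(s').
        future = sum(p * f_next[s_next] for s_next, p in transition(s, x).items())
        val = expected_reward(s, x) + alpha * future
        if val > best_val:
            best_val, best_x = val, x
    return best_val, best_x

# Hypothetical two-action instance with made-up numbers.
rewards = {"safe": 1.0, "risky": 0.5}
transitions = {"safe": {"s": 1.0}, "risky": {"good": 0.5, "bad": 0.5}}
f_next = {"s": 2.0, "good": 6.0, "bad": 0.0}
value, action = bellman_backup(
    "s",
    ["safe", "risky"],
    expected_reward=lambda s, x: rewards[x],
    transition=lambda s, x: transitions[x],
    f_next=f_next,
    alpha=0.9,
)
print(action, round(value, 4))  # risky 3.2
```

Here `safe` is worth \(1 + 0.9 \cdot 2 = 2.8\), while `risky` is worth \(0.5 + 0.9 \cdot (0.5 \cdot 6 + 0.5 \cdot 0) = 3.2\), so the backup selects `risky`.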

Markov decision processes represent a special class of stochastic dynamic programs in which the underlying stochastic process is a stationary process that features the Markov property.

Gambling game as a stochastic dynamic program

The gambling game can be formulated as a stochastic dynamic program as follows: there are \(n = 4\) games (i.e. stages) in the planning horizon;

the state \(s\) in period \(t\) represents the initial wealth at the beginning of period \(t\);
the action given state \(s\) in period \(t\) is the bet amount \(b\);
the transition probability \(p^{b}_{i,j}\) from state \(i\) to state \(j\) when action \(b\) is taken in state \(i\) is easily derived from the probability of winning (0.4) or losing (0.6) a game.
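Under the problem data, the transition law can be written down directly. In the sketch below, `transition` is a hypothetical helper that returns the distribution of next-period wealth for current wealth \(s\) and bet \(b\); a zero bet leaves the wealth unchanged with certainty.

```python
from typing import Dict

P_WIN = 0.4  # probability of winning a single play (problem data)

def transition(s: int, b: int) -> Dict[int, float]:
    """Hypothetical helper: return {next_wealth: probability} for wealth s and bet b."""
    if not 0 <= b <= s:
        raise ValueError("the bet b must satisfy 0 <= b <= s")
    if b == 0:
        return {s: 1.0}  # nothing is wagered, so the wealth does not change
    return {s + b: P_WIN, s - b: 1.0 - P_WIN}

print(transition(2, 1))  # {3: 0.4, 1: 0.6}
```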

Let \(f_t(s)\) be the probability that, by the end of game 4, the gambler has at least $6, given that she has \(s\) dollars at the beginning of game \(t\).

The immediate profit incurred if action \(b\) is taken in state \(s\) is given by the expected value \(r_t(s, b) = 0.4\, f_{t+1}(s+b) + 0.6\, f_{t+1}(s-b)\).

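To derive the functional equation, define \(b_t(s)\) as a bet that attains \(f_t(s)\); then, at the beginning of game \(t = 4\):

if \(s < 3\) it is impossible to attain the goal, since even winning with the largest admissible bet \(b = s\) yields at most \(2s < 6\); i.e. \(f_4(s) = 0\) for \(s < 3\);
if \(s \geq 6\) the goal is already attained, i.e. \(f_4(s) = 1\) for \(s \geq 6\);
if \(3 \leq s \leq 5\) the gambler should bet enough to attain the goal, i.e. \(f_4(s) = 0.4\) for \(3 \leq s \leq 5\).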
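These three cases can be checked mechanically. The sketch below (hypothetical function names `f4_closed_form` and `f4_brute_force`) compares the closed-form boundary condition with a brute-force search over every admissible bet \(b \in \{0, \ldots, s\}\) in the last game.

```python
P_WIN, TARGET = 0.4, 6  # problem data: win probability and wealth goal

def f4_closed_form(s: int) -> float:
    """Boundary condition f_4(s) as stated in the text."""
    if s < 3:
        return 0.0
    if s >= TARGET:
        return 1.0
    return P_WIN

def f4_brute_force(s: int) -> float:
    """Best probability of ending game 4 with at least TARGET, trying every bet b."""
    best = 0.0
    for b in range(s + 1):
        win_ok = s + b >= TARGET   # goal reached if the play is won
        lose_ok = s - b >= TARGET  # goal reached even if the play is lost
        if win_ok and lose_ok:
            p = 1.0
        elif win_ok:
            p = P_WIN
        else:
            p = 0.0  # neither outcome reaches the goal (lose_ok implies win_ok)
        best = max(best, p)
    return best

for s in range(13):
    assert f4_closed_form(s) == f4_brute_force(s), s
print("boundary conditions confirmed for s = 0..12")
```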

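For \(t < 4\) the functional equation is
\[
f_t(s) = \max_{b \in \{0, \ldots, s\}} \left\{ 0.4\, f_{t+1}(s+b) + 0.6\, f_{t+1}(s-b) \right\},
\]
where the bet \(b\) ranges over \(\{0, \ldots, s\}\); the aim is to find \(f_1(2)\).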
Given the functional equation, an optimal betting policy can be obtained via forward recursion or backward recursion algorithms, as outlined below.

Solution methods

Stochastic dynamic programs can be solved to optimality by using backward recursion or forward recursion algorithms. Memoization is typically employed to enhance performance. However, like its deterministic counterpart, stochastic dynamic programming suffers from the curse of dimensionality. For this reason, approximate solution methods are typically employed in practical applications.

Backward recursion

Given a bounded state space, backward recursion begins by tabulating \(f_n(k)\) for every possible state \(k\) belonging to the final stage \(n\). Once these values are tabulated, together with the associated optimal state-dependent actions \(x_n(k)\), it is possible to move to stage \(n-1\) and tabulate \(f_{n-1}(k)\) for all possible states belonging to stage \(n-1\). The process continues by considering in a backward fashion all remaining stages up to the first one. Once this tabulation process is complete, \(f_1(s)\), the value of an optimal policy given initial state \(s\), as well as the associated optimal action \(x_1(s)\) can be easily retrieved from the table. Since the computation proceeds in a backward fashion, backward recursion may lead to the computation of a large number of states that are not necessary for the computation of \(f_1(s)\).

Example: Gambling game
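As an illustration (not from the original article), the following Python sketch applies backward recursion to the gambling game. It tabulates \(f_4\) from the boundary conditions for wealth levels \(0, \ldots, 16\), the largest wealth reachable from $2 in three plays (this bound on the state space is an assumption of the sketch), then fills in stages 3, 2 and 1, recording every bet that attains the maximum. The function name `backward_recursion` is hypothetical.

```python
from typing import Dict, List, Tuple

P_WIN, TARGET, N = 0.4, 6, 4  # problem data: win probability, goal, number of games

def backward_recursion(s1: int) -> Tuple[List[Dict[int, float]], List[Dict[int, List[int]]]]:
    """Hypothetical helper: tabulate f_t(s) and all optimal bets for t = N, ..., 1."""
    f: List[Dict[int, float]] = [{} for _ in range(N + 1)]
    bets: List[Dict[int, List[int]]] = [{} for _ in range(N + 1)]  # stage N left empty

    # Wealth can at most double per play, so at stage t it never exceeds s1 * 2**(t - 1).
    def max_wealth(t: int) -> int:
        return s1 * 2 ** (t - 1)

    # Stage N: boundary conditions f_4(s) from the text.
    for s in range(max_wealth(N) + 1):
        f[N][s] = 0.0 if 2 * s < TARGET else (1.0 if s >= TARGET else P_WIN)

    # Stages N-1, ..., 1: apply the functional equation to every tabulated state.
    for t in range(N - 1, 0, -1):
        for s in range(max_wealth(t) + 1):
            values = [P_WIN * f[t + 1][s + b] + (1 - P_WIN) * f[t + 1][s - b]
                      for b in range(s + 1)]
            best = max(values)
            f[t][s] = best
            # Keep every maximising bet, allowing for floating-point round-off.
            bets[t][s] = [b for b, v in enumerate(values) if abs(v - best) < 1e-12]
    return f, bets

f, bets = backward_recursion(2)
print(round(f[1][2], 4), bets[1][2])  # 0.1984 [1, 2]
```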

Forward recursion

Given the initial state \(s\) of the system at the beginning of period 1, forward recursion computes \(f_1(s)\) by progressively expanding the functional equation (forward pass). This involves recursive calls for all \(f_{t+1}(\cdot)\) that are necessary for computing a given \(f_t(\cdot)\). The value of an optimal policy and its structure are then retrieved via a backward pass in which these suspended recursive calls are resolved. A key difference from backward recursion is the fact that \(f_t\) is computed only for states that are relevant for the computation of \(f_1(s)\). Memoization is employed to avoid recomputation of states that have already been considered.

Example: Gambling game

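We shall illustrate forward recursion in the context of the gambling game instance previously discussed. We begin the forward pass by considering
\[
f_1(2) = \max\left\{\begin{array}{ll}
0.4\, f_2(2+0) + 0.6\, f_2(2-0) & b = 0\\
0.4\, f_2(2+1) + 0.6\, f_2(2-1) & b = 1\\
0.4\, f_2(2+2) + 0.6\, f_2(2-2) & b = 2
\end{array}\right.
\]
At this point we have not yet computed \(f_2(4), f_2(3), f_2(2), f_2(1), f_2(0)\), which are needed to compute \(f_1(2)\); we proceed and compute these items. Note that \(f_2(2+0) = f_2(2-0) = f_2(2)\), therefore one can leverage memoization and perform the necessary computations only once.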
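Computation of \(f_2(4), f_2(3), f_2(2), f_2(1), f_2(0)\):
\[
f_2(s) = \max_{b \in \{0, \ldots, s\}} \left\{ 0.4\, f_3(s+b) + 0.6\, f_3(s-b) \right\}, \qquad s = 0, 1, 2, 3, 4.
\]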
We have now computed \(f_2(k)\) for all \(k\) that are needed to compute \(f_1(2)\). However, this has led to additional suspended recursions involving \(f_3(8), f_3(7), \ldots, f_3(0)\). We proceed and compute these values.
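Computation of \(f_3(8), f_3(7), \ldots, f_3(0)\):
\[
f_3(s) = \max_{b \in \{0, \ldots, s\}} \left\{ 0.4\, f_4(s+b) + 0.6\, f_4(s-b) \right\}, \qquad s = 0, 1, \ldots, 8,
\]
which in turn requires \(f_4(0), f_4(1), \ldots, f_4(16)\).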
Since stage 4 is the last stage in our system, the values \(f_4(\cdot)\) represent boundary conditions that are easily computed as follows.
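Boundary conditions:
\[
\begin{aligned}
f_4(0) &= 0, && b_4(0) = 0\\
f_4(1) &= 0, && b_4(1) \in \{0, 1\}\\
f_4(2) &= 0, && b_4(2) \in \{0, 1, 2\}\\
f_4(3) &= 0.4, && b_4(3) = 3\\
f_4(4) &= 0.4, && b_4(4) \in \{2, 3, 4\}\\
f_4(5) &= 0.4, && b_4(5) \in \{1, 2, 3, 4, 5\}\\
f_4(d) &= 1, && b_4(d) \in \{0, \ldots, d-6\} \text{ for } d \geq 6
\end{aligned}
\]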
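At this point it is possible to proceed and recover the optimal policy and its value via a backward pass involving, at first, stage 3. In each maximum below, the entries correspond to the bets \(b = 0, 1, \ldots, s\) in increasing order.
Backward pass involving \(f_3(\cdot)\):
\[
\begin{aligned}
f_3(0) &= \max\{0\} = 0, && b_3(0) = 0\\
f_3(1) &= \max\{0, 0\} = 0, && b_3(1) \in \{0, 1\}\\
f_3(2) &= \max\{0, 0.16, 0.16\} = 0.16, && b_3(2) \in \{1, 2\}\\
f_3(3) &= \max\{0.4, 0.16, 0.16, 0.4\} = 0.4, && b_3(3) \in \{0, 3\}\\
f_3(4) &= \max\{0.4, 0.4, 0.4, 0.4, 0.4\} = 0.4, && b_3(4) \in \{0, 1, 2, 3, 4\}\\
f_3(5) &= \max\{0.4, 0.64, 0.64, 0.4, 0.4, 0.4\} = 0.64, && b_3(5) \in \{1, 2\}\\
f_3(6) &= \max\{1, 0.64, 0.64, 0.64, 0.4, 0.4, 0.4\} = 1, && b_3(6) = 0\\
f_3(7) &= \max\{1, 1, 0.64, 0.64, 0.64, 0.4, 0.4, 0.4\} = 1, && b_3(7) \in \{0, 1\}\\
f_3(8) &= \max\{1, 1, 1, 0.64, 0.64, 0.64, 0.4, 0.4, 0.4\} = 1, && b_3(8) \in \{0, 1, 2\}
\end{aligned}
\]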
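and, then, stage 2 (entries again listed for \(b = 0, 1, \ldots, s\)).
Backward pass involving \(f_2(\cdot)\):
\[
\begin{aligned}
f_2(0) &= \max\{0\} = 0, && b_2(0) = 0\\
f_2(1) &= \max\{0, 0.064\} = 0.064, && b_2(1) = 1\\
f_2(2) &= \max\{0.16, 0.16, 0.16\} = 0.16, && b_2(2) \in \{0, 1, 2\}\\
f_2(3) &= \max\{0.4, 0.256, 0.256, 0.4\} = 0.4, && b_2(3) \in \{0, 3\}\\
f_2(4) &= \max\{0.4, 0.496, 0.496, 0.4, 0.4\} = 0.496, && b_2(4) \in \{1, 2\}
\end{aligned}
\]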
We finally recover the value \(f_1(2)\) of an optimal policy:
\[
f_1(2) = \max\{0.16, 0.1984, 0.1984\} = 0.1984, \qquad b_1(2) \in \{1, 2\}.
\]
This is the optimal policy that has been previously illustrated. Note that there are multiple optimal policies leading to the same optimal value \(f_1(2) = 0.1984\); for instance, in the first game one may bet either $1 or $2.
A standalone Java 8 implementation of the above example is also available.
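For comparison, a minimal Python sketch of forward recursion with memoization (here via the standard library's `functools.lru_cache`) is given below; it is an illustration under the same problem data and is independent of the Java implementation. The function names `f` and `best_bets` are hypothetical. Only the pairs \((t, s)\) reached while expanding \(f_1(2)\) are ever evaluated, and each is evaluated once.

```python
from functools import lru_cache
from typing import List

P_WIN, TARGET, N = 0.4, 6, 4  # problem data: win probability, goal, number of games

@lru_cache(maxsize=None)  # memoization: each (t, s) pair is expanded only once
def f(t: int, s: int) -> float:
    """Probability of ending game N with at least TARGET, given wealth s before game t."""
    if t == N:
        # Boundary conditions f_4(s) from the text.
        if 2 * s < TARGET:
            return 0.0
        return 1.0 if s >= TARGET else P_WIN
    # Forward pass: the maximisation for (t, s) waits on the recursive calls f(t + 1, .).
    return max(P_WIN * f(t + 1, s + b) + (1 - P_WIN) * f(t + 1, s - b)
               for b in range(s + 1))

def best_bets(t: int, s: int, tol: float = 1e-12) -> List[int]:
    """Hypothetical helper: all bets attaining f(t, s) for t < N, up to round-off."""
    if not 1 <= t < N:
        raise ValueError("best_bets is defined here for stages 1..N-1 only")
    target = f(t, s)
    return [b for b in range(s + 1)
            if abs(P_WIN * f(t + 1, s + b) + (1 - P_WIN) * f(t + 1, s - b) - target) < tol]

print(round(f(1, 2), 4), best_bets(1, 2))  # 0.1984 [1, 2]
print(f.cache_info().currsize)             # 32 distinct (t, s) pairs evaluated
```

The cache holds one entry for \(f_1(2)\), five for \(f_2(0), \ldots, f_2(4)\), nine for \(f_3(0), \ldots, f_3(8)\) and seventeen for \(f_4(0), \ldots, f_4(16)\), matching the suspended recursions listed in the example.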

Approximate dynamic programming

An introduction to approximate dynamic programming is provided by.
