Consider the following problem in deterministic optimal control over the time period $[0, T]$:

$$V(x(0), 0) = \min_u \left\{ \int_0^T C[x(t), u(t)] \, dt + D[x(T)] \right\},$$

where $C[\cdot]$ is the scalar cost rate function and $D[\cdot]$ is a function that gives the bequest value at the final state, $x(t)$ is the system state vector, $x(0)$ is assumed given, and $u(t)$ for $0 \leq t \leq T$ is the control vector that we are trying to find. The system must also be subject to

$$\dot{x}(t) = F[x(t), u(t)],$$

where $F[\cdot]$ gives the vector determining physical evolution of the state vector over time.
For this simple system, the Hamilton–Jacobi–Bellman partial differential equation is

$$\frac{\partial V(x,t)}{\partial t} + \min_u \left\{ \frac{\partial V(x,t)}{\partial x} \cdot F(x, u) + C(x, u) \right\} = 0,$$

subject to the terminal condition

$$V(x, T) = D(x),$$

where $\frac{\partial V(x,t)}{\partial t}$ denotes the partial derivative of $V$ with respect to the time variable. Here $a \cdot b$ denotes the dot product of the vectors $a$ and $b$, and $\frac{\partial V(x,t)}{\partial x}$ the gradient of $V$ with respect to the state variables $x$. The unknown scalar $V(x, t)$ in the above partial differential equation is the Bellman value function, which represents the cost incurred from starting in state $x$ at time $t$ and controlling the system optimally from then until time $T$.
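As a small illustration (the specific dynamics and cost chosen here are an assumption, not part of the problem above): for $F(x, u) = u$ and $C(x, u) = \tfrac{1}{2} \lVert u \rVert^2$, the minimization over $u$ can be carried out in closed form, since $\min_u \{ \frac{\partial V}{\partial x} \cdot u + \tfrac{1}{2} \lVert u \rVert^2 \} = -\tfrac{1}{2} \lVert \frac{\partial V}{\partial x} \rVert^2$, attained at $u = -\frac{\partial V}{\partial x}$. The HJB equation then reduces to

$$\frac{\partial V(x,t)}{\partial t} - \frac{1}{2} \left\lVert \frac{\partial V(x,t)}{\partial x} \right\rVert^2 = 0, \qquad V(x, T) = D(x).$$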
Deriving the equation
Intuitively, the HJB equation can be derived as follows. If $V(x(t), t)$ is the optimal cost-to-go function, then by Richard Bellman's principle of optimality, going from time $t$ to $t + dt$, we have

$$V(x(t), t) = \min_u \left\{ V(x(t+dt), t+dt) + \int_t^{t+dt} C(x(s), u(s)) \, ds \right\}.$$

Note that the Taylor expansion of the first term on the right-hand side is

$$V(x(t+dt), t+dt) = V(x(t), t) + \frac{\partial V(x(t), t)}{\partial t} \, dt + \frac{\partial V(x(t), t)}{\partial x} \cdot \dot{x}(t) \, dt + o(dt),$$

where $o(dt)$ denotes the terms in the Taylor expansion of higher order than one in little-o notation. Then if we subtract $V(x(t), t)$ from both sides, divide by $dt$, and take the limit as $dt$ approaches zero, we obtain the HJB equation defined above.
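To make the last step explicit: substituting the Taylor expansion into the principle of optimality, using $\dot{x}(t) = F(x(t), u(t))$, subtracting $V(x(t), t)$, dividing by $dt$, and letting $dt \to 0$ leaves

$$0 = \min_u \left\{ C(x, u) + \frac{\partial V(x,t)}{\partial t} + \frac{\partial V(x,t)}{\partial x} \cdot F(x, u) \right\} = \frac{\partial V(x,t)}{\partial t} + \min_u \left\{ C(x, u) + \frac{\partial V(x,t)}{\partial x} \cdot F(x, u) \right\},$$

where the time derivative can be pulled out of the minimization because it does not depend on $u$.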
Solving the equation
The HJB equation is usually solved backwards in time, starting from $t = T$ and ending at $t = 0$. When solved over the whole of state space and $V(x,t)$ is continuously differentiable, the HJB equation is a necessary and sufficient condition for an optimum when the terminal state is unconstrained. If we can solve for $V$, then we can find from it a control $u$ that achieves the minimum cost. In the general case, the HJB equation does not have a classical (smooth) solution. Several notions of generalized solutions have been developed to cover such situations, including the viscosity solution, the minimax solution, and others.
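As an illustration of this backward-in-time solution, the sketch below solves a one-dimensional HJB equation on a grid by explicit finite differences. The dynamics $\dot{x} = u$, the cost rate $x^2 + u^2$, the bequest $x^2$, and all grid parameters are assumptions made purely for illustration; the scheme is deliberately naive (central differences, no upwinding, boundary effects ignored) and is meant only to show the backward sweep from $t = T$ to $t = 0$.

```python
import numpy as np

# A minimal sketch (all problem data assumed for illustration): solve a
# one-dimensional HJB equation backwards in time on a grid by explicit
# finite differences, for dynamics dx/dt = u, cost rate C(x, u) = x**2 + u**2
# and bequest D(x) = x**2 over t in [0, T].

T, nt = 1.0, 2000                 # horizon and number of time steps
dt = T / nt
xs = np.linspace(-2.0, 2.0, 201)  # state grid
dx = xs[1] - xs[0]
us = np.linspace(-3.0, 3.0, 61)   # candidate controls for the pointwise minimization

V = xs**2                         # terminal condition V(x, T) = D(x)
for _ in range(nt):               # march backwards from t = T to t = 0
    Vx = np.gradient(V, dx)       # central-difference approximation of dV/dx
    # Hamiltonian: min over u of  dV/dx * F(x, u) + C(x, u),  with F(x, u) = u
    H = np.min(Vx[:, None] * us[None, :] + xs[:, None]**2 + us[None, :]**2, axis=1)
    V = V + dt * H                # backward Euler step:  -dV/dt = min_u { ... }

# V now approximates the value function at t = 0 on the grid xs.
```

Each step replaces the time derivative $\frac{\partial V}{\partial t} = -\min_u \{ \frac{\partial V}{\partial x} \cdot u + C(x, u) \}$ by a backward Euler update, and the minimization over $u$ by a search over a finite set of candidate controls.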
The idea of solving a control problem by applying Bellman's principle of optimality and then working out backwards in time an optimizing strategy can be generalized to stochastic control problems. Consider, similarly as above,

$$\min_u \; \mathbb{E} \left\{ \int_0^T C(t, X_t, u_t) \, dt + D(X_T) \right\},$$

now with $(X_t)_{t \in [0,T]}$ the stochastic process to optimize and $(u_t)_{t \in [0,T]}$ the steering. By first using Bellman's principle and then expanding $V(X_t, t)$ with Itô's rule, one finds the stochastic HJB equation

$$\min_u \left\{ \mathcal{A} V(x, t) + C(t, x, u) \right\} = 0,$$

where $\mathcal{A}$ represents the stochastic differentiation operator, subject to the terminal condition

$$V(x, T) = D(x).$$

Note that the randomness has disappeared. In this case a solution $V$ of the latter does not necessarily solve the primal problem; it is a candidate only, and a further verification argument is required. This technique is widely used in financial mathematics to determine optimal investment strategies in the market.
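To make the operator $\mathcal{A}$ concrete, suppose for illustration that the controlled state follows an Itô diffusion $dX_t = \mu(t, X_t, u_t) \, dt + \sigma(t, X_t, u_t) \, dW_t$ (an assumed form, stated only as an example). Then $\mathcal{A}$ applied to $V$ works out to

$$\mathcal{A} V(x,t) = \frac{\partial V(x,t)}{\partial t} + \mu(t, x, u) \cdot \frac{\partial V(x,t)}{\partial x} + \frac{1}{2} \operatorname{tr}\!\left( \sigma(t, x, u) \, \sigma(t, x, u)^{\top} \, \frac{\partial^2 V(x,t)}{\partial x^2} \right),$$

so the second-order term contributed by Itô's rule is what distinguishes the stochastic HJB equation from its deterministic counterpart.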
Application to LQG Control
As an example, we can look at a system with linear stochastic dynamics and quadratic cost. If the system dynamics is given by

$$dx_t = (a x_t + b u_t) \, dt + \sigma \, dw_t,$$

and the cost accumulates at rate $C(x_t, u_t) = r(t) u_t^2 / 2 + q(t) x_t^2 / 2$, the HJB equation is given by

$$-\frac{\partial V(x,t)}{\partial t} = \frac{1}{2} q(t) x^2 + \frac{\partial V(x,t)}{\partial x} a x - \frac{b^2}{2 r(t)} \left( \frac{\partial V(x,t)}{\partial x} \right)^2 + \frac{\sigma^2}{2} \frac{\partial^2 V(x,t)}{\partial x^2},$$

with optimal action given by

$$u_t = -\frac{b}{r(t)} \frac{\partial V(x,t)}{\partial x}.$$

Assuming a quadratic form for the value function, we obtain the usual Riccati equation for the Hessian of the value function, as is usual for linear-quadratic-Gaussian control.
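As a concrete sketch of that last step: substituting the ansatz $V(x,t) = \tfrac{1}{2} P(t) x^2 + s(t)$ into the HJB equation above and matching the $x^2$ terms gives the scalar Riccati equation $\dot{P}(t) = -q(t) - 2 a P(t) + \tfrac{b^2}{r(t)} P(t)^2$, with the optimal feedback $u_t = -\tfrac{b}{r(t)} P(t) x_t$. The snippet below integrates this ODE backwards in time with constant coefficients; all numerical values are illustrative assumptions, not taken from the text.

```python
# A minimal sketch with assumed, purely illustrative coefficients: integrate the
# scalar Riccati equation dP/dt = -q - 2*a*P + (b**2 / r) * P**2 backwards in
# time from the terminal condition, then read off the optimal feedback gain.

a, b, sigma = -0.5, 1.0, 0.2   # dynamics dx = (a*x + b*u) dt + sigma dw
q, r = 1.0, 0.1                # cost rate (q*x**2 + r*u**2) / 2
T, nt = 5.0, 5000
dt = T / nt

P = 0.0                        # terminal condition: no bequest cost, so V(x, T) = 0
for _ in range(nt):            # backward Euler from t = T down to t = 0
    dP = -q - 2.0 * a * P + (b**2 / r) * P**2
    P -= dt * dP               # stepping from t to t - dt

# Note: the noise intensity sigma does not affect P(t); it only enters the
# state-independent term s(t) of the value function.
K = (b / r) * P                # optimal feedback at t = 0:  u = -K * x
print(f"P(0) = {P:.4f}, feedback gain K = {K:.4f}")
```

Stepping backwards from the terminal condition mirrors the backward-in-time solution of the full HJB equation; here the PDE collapses to an ODE because of the quadratic ansatz.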