The method of generalized Hamilton-Jacobi-Bellman equations (GHJB) is a
powerful way of creating near-optimal controllers by learning. It is based on
the fact that if we have a feedback controller, and we learn to compute the
gradient grad-J of its cost-to-go function, then we can use that gradient to
define a better controller. We can then use the new controller's grad-J to
define a still-better controller, and so on.