Simpler near-optimal controllers through direct supervision.

link: http://arxiv.org/abs/0908.2859
Abstract

The method of generalized Hamilton-Jacobi-Bellman equations (GHJB) is a
powerful way of creating near-optimal controllers by learning. It is based on
the fact that if we have a feedback controller, and we learn to compute the
gradient grad-J of its cost-to-go function, then we can use that gradient to
define a better controller. We can then use the new controller's grad-J to
define a still-better controller, and so on. Here I point out that GHJB works
indirectly in the sense that it doesn't learn the best approximation to grad-J
but instead learns the time derivative dJ/dt, and infers grad-J from that. I
show that we can get simpler and lower-cost controllers by learning grad-J
directly. To do this, we need teaching signals that report grad-J(x) for a
varied set of states x. I show how to obtain these signals, using the GHJB
equation to calculate one component of grad-J(x) -- the one parallel with dx/dt
-- and computing all the other components by backward-in-time integration,
using a formula similar to the Euler-Lagrange equation. I then compare this
direct algorithm with GHJB on two test problems.
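
The two ingredients named in the abstract, the GHJB identity along dx/dt and backward-in-time integration of grad-J, can be illustrated on a toy case. The sketch below is not taken from the paper. It assumes a fixed controller that yields a linear closed-loop system dx/dt = A x, a quadratic running cost x^T Q x, plain Euler integration, and hypothetical function names (`lyapunov_P`, `grad_J_backward`). Under those assumptions J(x) = x^T P x with A^T P + P A = -Q, and differentiating the GHJB identity grad-J . (A x) + x^T Q x = 0 gives the backward equation d(grad-J)/dt = -A^T grad-J - 2 Q x along a trajectory.

```python
import numpy as np

def lyapunov_P(A, Q):
    """Solve A^T P + P A = -Q, so J(x) = x^T P x and grad-J(x) = 2 P x."""
    n = A.shape[0]
    I = np.eye(n)
    # Column-major vec identity: vec(A^T P + P A) = (I kron A^T + A^T kron I) vec(P)
    K = np.kron(I, A.T) + np.kron(A.T, I)
    p = np.linalg.solve(K, -Q.flatten(order="F"))
    return p.reshape((n, n), order="F")

def grad_J_backward(A, Q, x0, T=20.0, dt=1e-3):
    """Estimate grad-J(x0) by integrating backward in time along one trajectory."""
    n_steps = int(round(T / dt))
    xs = np.empty((n_steps + 1, len(x0)))
    xs[0] = x0
    # Forward pass: Euler steps of the closed-loop dynamics dx/dt = A x.
    for k in range(n_steps):
        xs[k + 1] = xs[k] + dt * (A @ xs[k])
    # Assumption: T is long enough that x(T), and hence grad-J at x(T), is ~0.
    lam = np.zeros(len(x0))
    # Backward pass: d(lam)/dt = -A^T lam - 2 Q x, stepped from t = T down to 0.
    for k in range(n_steps, 0, -1):
        lam = lam + dt * (A.T @ lam + 2.0 * Q @ xs[k])
    return lam

# Hypothetical example: a damped oscillator with unit state cost.
A = np.array([[0.0, 1.0], [-1.0, -1.0]])
Q = np.eye(2)
x0 = np.array([1.0, 0.0])

lam0 = grad_J_backward(A, Q, x0)
exact = 2.0 * lyapunov_P(A, Q) @ x0
# GHJB identity at x0: the component of grad-J along dx/dt must cancel the cost rate.
ghjb_residual = lam0 @ (A @ x0) + x0 @ Q @ x0
print(lam0, exact, ghjb_residual)
```

In this linear-quadratic case the closed form gives grad-J(x0) = (3, 1), and the backward integration should agree up to discretization error. A nonlinear problem would need the Jacobian of the closed-loop dynamics and the gradient of the running cost in place of A and 2 Q x.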