We propose a versatile and fast stochastic approximation algorithm
(KL-learning) that solves the Kullback-Leibler control problem. The stochastic
orbits of the algorithm are asymptotically related to the orbits of a certain
nonlinear ODE, whose equilibrium corresponds to the solution of the KL control
problem. We can therefore perform a detailed theoretical analysis of the
stochastic algorithm (involving the theory of M-matrices and P-matrices). The
algorithm's numerical behaviour is similar to that of Z-learning, which may be
seen as a special case of KL-learning.
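
To make the comparison with Z-learning concrete, the following is a minimal sketch of a Z-learning-style update on a small first-exit problem. It is not the KL-learning algorithm proposed here. The five-state chain `P`, the state costs `q`, the step-size schedule and the helper names are all hypothetical choices made for illustration. The update moves the desirability estimate z(x) towards exp(-q(x)) z(y), where y is sampled from the uncontrolled dynamics, and the result can be checked against a deterministic fixed-point iteration of z(x) = exp(-q(x)) sum_y P(x,y) z(y).

```python
import math
import random

# Hypothetical 5-state chain: a lazy random walk with absorbing terminal state 4.
P = [
    [0.5, 0.5, 0.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0, 0.0],
    [0.0, 0.5, 0.0, 0.5, 0.0],
    [0.0, 0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 0.0, 0.0, 1.0],
]
q = [0.1, 0.1, 0.1, 0.1, 0.0]   # assumed state costs; the terminal cost is zero
terminal = 4


def exact_desirability(P, q, terminal, sweeps=2000):
    """Reference solution: iterate z(x) = exp(-q(x)) * sum_y P[x][y] * z(y)."""
    n = len(P)
    z = [1.0] * n
    z[terminal] = math.exp(-q[terminal])
    for _ in range(sweeps):
        for x in range(n):
            if x != terminal:
                z[x] = math.exp(-q[x]) * sum(P[x][y] * z[y] for y in range(n))
    return z


def z_learning(P, q, terminal, episodes=20_000, seed=0):
    """Sampled version of the same fixed-point equation (Z-learning-style update)."""
    rng = random.Random(seed)
    n = len(P)
    states = list(range(n))
    z = [1.0] * n
    z[terminal] = math.exp(-q[terminal])
    visits = [0] * n
    for _ in range(episodes):
        x = rng.randrange(n)                 # random start state for each episode
        while x != terminal:
            # Sample the next state from the uncontrolled (passive) dynamics.
            y = rng.choices(states, weights=P[x])[0]
            visits[x] += 1
            alpha = visits[x] ** -0.7        # assumed decaying step size
            z[x] += alpha * (math.exp(-q[x]) * z[y] - z[x])
            x = y
    return z


if __name__ == "__main__":
    z_hat = z_learning(P, q, terminal)
    z_ref = exact_desirability(P, q, terminal)
    for x, (a, b) in enumerate(zip(z_hat, z_ref)):
        print(f"state {x}: learned {a:.4f}, reference {b:.4f}")
```

The sampled update needs only observed transitions and costs, not the transition matrix itself. The reference iteration uses `P` directly and serves only as a check.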