We propose a novel reformulation of the stochastic optimal control problem as
an approximate inference problem, demonstrating, that such a interpretation
leads to new practical methods for the original problem. In particular we
characterise a novel class of iterative solutions to the stochastic optimal
control problem based on a natural relaxation of the exact dual formulation.
These theoretical insights are applied to the Reinforcement Learning problem
where they lead to new model free, off policy methods for discrete and
continuous problems.