This paper introduces a principled approach for the design of a scalable
general reinforcement learning agent. This approach is based on a direct
approximation of AIXI, a Bayesian optimality notion for general reinforcement
learning agents. Previously, it has been unclear whether the theory of AIXI
could motivate the design of practical algorithms. We answer this hitherto open
question in the affirmative, by providing the first computationally feasible
approximation to the AIXI agent.
This paper describes a computationally feasible approximation to the AIXI
agent, a universal reinforcement learning agent for arbitrary environments.
AIXI is scaled down in two key ways: First, the class of environment models is
restricted to all prediction suffix trees of a fixed maximum depth. This allows
a Bayesian mixture of environment models to be computed in time proportional to
the logarithm of the size of the model class. Secondly, the finite-horizon
expectimax search is approximated by an asymptotically convergent Monte Carlo
Tree Search technique.