Steffen Grünewälder

  1. The Optimal Unbiased Value Estimator and its Relation to LSTD, TD and MC.

    Authors: Steffen Grünewälder, Klaus Obermayer
    Subjects: Machine Learning
    Abstract

    In this analytical study we derive the optimal unbiased value estimator, the
    minimum variance unbiased (MVU) estimator, and compare its statistical risk to
    three well-known value estimators: Temporal Difference learning (TD), Monte
    Carlo estimation (MC) and Least-Squares Temporal Difference learning (LSTD).
    We demonstrate that LSTD is equivalent to the MVU if the Markov Reward Process
    (MRP) is acyclic and show that the two differ for most cyclic MRPs, since LSTD
    is then typically biased. More generally, we show
    that estimators that fulfill the Bellman equation can only be unbiased for
    special cyclic MRPs.
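
To make the comparison concrete, the following is a minimal sketch, not taken from the paper, of two of the estimators named in the abstract applied to a tiny acyclic MRP. Everything in it is an illustrative assumption: the three-state MRP, the `EPISODES` data, the tabular (one-hot) features, the undiscounted setting (`gamma = 1`), and the helper names `mc_values`, `lstd_values` and `solve_linear`. MC averages the observed returns per state, whereas tabular LSTD solves the empirical Bellman equation and therefore also uses the episode that starts in state 1 when estimating the value of state 0.

```python
# Hypothetical data: each episode is a list of (state, reward, next_state)
# transitions; next_state is None when the episode terminates.
EPISODES = [
    [(0, 0.0, 1), (1, 1.0, None)],
    [(0, 0.0, 2), (2, 0.0, None)],
    [(1, 0.0, None)],
]


def mc_values(episodes, n_states, gamma=1.0):
    """Every-visit Monte Carlo: average the observed returns per state."""
    totals = [0.0] * n_states
    counts = [0] * n_states
    for episode in episodes:
        ret = 0.0
        # Walk backwards so the return can be accumulated incrementally.
        for state, reward, _next_state in reversed(episode):
            ret = reward + gamma * ret
            totals[state] += ret
            counts[state] += 1
    return [t / c if c else 0.0 for t, c in zip(totals, counts)]


def solve_linear(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system (e.g. a state was never visited)")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def lstd_values(episodes, n_states, gamma=1.0):
    """Tabular LSTD: accumulate A = sum phi(s) (phi(s) - gamma phi(s'))^T and
    b = sum phi(s) r with one-hot features phi, then solve A v = b."""
    A = [[0.0] * n_states for _ in range(n_states)]
    b = [0.0] * n_states
    for episode in episodes:
        for state, reward, next_state in episode:
            A[state][state] += 1.0
            if next_state is not None:
                A[state][next_state] -= gamma
            b[state] += reward
    return solve_linear(A, b)


if __name__ == "__main__":
    print("MC  :", mc_values(EPISODES, 3))
    print("LSTD:", lstd_values(EPISODES, 3))
```

On this data the two estimators agree on states 1 and 2 but differ on state 0. MC only sees the two returns that actually started in state 0, while LSTD propagates the pooled estimate for state 1 back through the observed transition frequencies.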
