Most conventional Reinforcement Learning (RL) algorithms aim to optimize
decision-making rules in terms of the expected returns. However, especially
for risk management purposes, other risk-sensitive criteria such as the
value-at-risk or the expected shortfall are sometimes preferred in real
applications. Here, we describe a parametric method for estimating the density
of the returns, which allows us to handle various criteria in a unified manner. We
first extend the Bellman equation for the conditional expected return to cover
a conditional probability density of the returns.
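
To illustrate why access to the return distribution is useful, the following minimal sketch computes the expected return, the value-at-risk, and the expected shortfall from one and the same set of sampled returns. The function names, the Gaussian toy returns, and the lower-tail convention (returns are rewards, so risk sits in the low quantiles) are assumptions made for this example; it is not the parametric estimator described above.

```python
import numpy as np


def value_at_risk(returns, alpha=0.05):
    """Lower-tail alpha-quantile of the returns (assumed convention)."""
    return float(np.quantile(np.asarray(returns, dtype=float), alpha))


def expected_shortfall(returns, alpha=0.05):
    """Mean of the returns that fall at or below the alpha-quantile."""
    returns = np.asarray(returns, dtype=float)
    threshold = np.quantile(returns, alpha)
    tail = returns[returns <= threshold]  # never empty: the minimum qualifies
    return float(tail.mean())


# Hypothetical sampled returns; in practice these would be drawn from
# an estimated return density.
rng = np.random.default_rng(0)
samples = rng.normal(loc=1.0, scale=2.0, size=100_000)

print("expected return   :", samples.mean())
print("value-at-risk     :", value_at_risk(samples))
print("expected shortfall:", expected_shortfall(samples))
```

All three criteria are functionals of the same distribution, which is the sense in which a density estimate lets them be handled in a unified manner.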

Information-maximization clustering learns a probabilistic classifier in an
unsupervised manner so that mutual information between feature vectors and
cluster assignments is maximized. A notable advantage of this approach is that
it only involves continuous optimization of model parameters, which is
substantially easier to solve than discrete optimization of cluster
assignments. However, existing methods still involve non-convex optimization
problems, and therefore finding a good locally optimal solution is not
straightforward in practice.
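
The clustering objective can be made concrete with a small sketch that evaluates the empirical mutual information between feature vectors and cluster assignments from the outputs of a probabilistic classifier. We assume here the ordinary Shannon mutual information, computed as the entropy of the averaged posterior minus the average posterior entropy; the function name and the toy posteriors are hypothetical, and a particular information-maximization method may use a different information measure.

```python
import numpy as np


def empirical_mutual_information(posteriors, eps=1e-12):
    """Estimate I(x; y) from an (n_samples, n_clusters) array of p(y | x_i).

    Uses I = H(mean_i p(y | x_i)) - mean_i H(p(y | x_i)).
    The small eps only guards against log(0).
    """
    p = np.asarray(posteriors, dtype=float)
    marginal = p.mean(axis=0)  # estimate of the cluster prior p(y)
    h_marginal = -np.sum(marginal * np.log(marginal + eps))
    h_conditional = -np.mean(np.sum(p * np.log(p + eps), axis=1))
    return float(h_marginal - h_conditional)


# Confident and balanced assignments: close to log(2).
mi_confident = empirical_mutual_information([[1.0, 0.0], [0.0, 1.0]])
# Completely uninformative assignments: close to 0.
mi_uniform = empirical_mutual_information([[0.5, 0.5], [0.5, 0.5]])
print(mi_confident, mi_uniform)
```

Because the objective depends on the classifier parameters only through the continuous posteriors, it can be optimized without searching over discrete cluster assignments.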

Divergence estimators based on direct approximation of density-ratios without
going through separate approximation of numerator and denominator densities
have been successfully applied to machine learning tasks that involve
distribution comparison such as outlier detection, transfer learning, and
two-sample homogeneity testing. However, since density-ratio functions often
fluctuate strongly, divergence estimation is still a challenging task in
practice.
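
As a sketch of what direct density-ratio approximation can look like, the following fits a Gaussian-kernel model of the ratio by regularized least squares, using only samples from the numerator and denominator distributions and never estimating either density on its own. This is one well-known style of direct estimator and is an assumption of this example, as are the function names, the one-dimensional toy data, and the hand-picked kernel width and regularization strength.

```python
import numpy as np


def gaussian_kernel(x, centers, sigma):
    """Kernel design matrix for 1-D inputs, shape (len(x), len(centers))."""
    diff = np.asarray(x, dtype=float)[:, None] - centers[None, :]
    return np.exp(-(diff ** 2) / (2.0 * sigma ** 2))


def fit_density_ratio(x_num, x_den, centers, sigma=0.5, lam=1e-3):
    """Least-squares fit of r(x) = p_num(x) / p_den(x) with a kernel model."""
    phi_num = gaussian_kernel(x_num, centers, sigma)
    phi_den = gaussian_kernel(x_den, centers, sigma)
    H = phi_den.T @ phi_den / len(x_den)  # second moment under the denominator
    h = phi_num.mean(axis=0)              # first moment under the numerator
    theta = np.linalg.solve(H + lam * np.eye(len(centers)), h)
    return lambda x: gaussian_kernel(x, centers, sigma) @ theta


# Toy data (hypothetical): numerator N(0, 1), denominator N(0, 2^2).
rng = np.random.default_rng(0)
x_num = rng.normal(0.0, 1.0, size=2000)
x_den = rng.normal(0.0, 2.0, size=2000)

ratio = fit_density_ratio(x_num, x_den, centers=x_num[:100])
estimates = ratio(np.array([0.0, 3.0]))
print(estimates)  # the true ratio is 2 at x = 0 and much smaller at x = 3
```

Even in this benign example the ratio varies by more than an order of magnitude over a few standard deviations, which hints at why strongly fluctuating ratios make divergence estimation difficult.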