This paper presents a kernel-based discriminative learning framework on
probability measures. Rather than relying on large collections of vectorial
training examples, our framework learns using a collection of probability
distributions that have been constructed to meaningfully represent training
data. By representing these probability distributions as mean embeddings in a
reproducing kernel Hilbert space (RKHS), we are able to apply many standard
kernel-based learning techniques in a straightforward fashion.
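
As an illustration of the mean-embedding idea, the following is a minimal sketch, not the paper's implementation. It assumes each distribution is available only through a finite sample and uses a Gaussian kernel with a fixed bandwidth; both choices are assumptions. The function names (`gaussian_gram`, `embedding_inner_product`, `distribution_gram`) are hypothetical. The inner product between two empirical mean embeddings is the average of the kernel over all pairs of points, which yields a Gram matrix over distributions that a standard kernel method could consume.

```python
import numpy as np


def gaussian_gram(x, y, bandwidth=1.0):
    """Gaussian RBF Gram matrix between the rows of x and the rows of y."""
    sq_dists = (
        np.sum(x**2, axis=1)[:, None]
        + np.sum(y**2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    # Clip tiny negative values caused by floating-point rounding.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))


def embedding_inner_product(x, y, bandwidth=1.0):
    """Inner product of two empirical mean embeddings: the mean of k(x_i, y_j)."""
    return float(gaussian_gram(x, y, bandwidth).mean())


def distribution_gram(samples, bandwidth=1.0):
    """Gram matrix whose (i, j) entry compares sample i with sample j."""
    n = len(samples)
    gram = np.empty((n, n))
    for i in range(n):
        for j in range(i, n):
            value = embedding_inner_product(samples[i], samples[j], bandwidth)
            gram[i, j] = gram[j, i] = value
    return gram


# Toy data (an assumption for illustration): three 2-D Gaussian samples
# whose means are 0.0, 0.1, and 3.0 in every coordinate.
rng = np.random.default_rng(0)
samples = [rng.normal(loc=m, size=(50, 2)) for m in (0.0, 0.1, 3.0)]
K = distribution_gram(samples)
```

In this toy setting the entry comparing the two nearby distributions should be larger than the entry comparing the first and third. Because `K` is a Gram matrix of inner products, it can be passed to any learner that accepts a precomputed kernel.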

We consider the problem of function estimation in the case where the data
distribution may shift between training and test time, and where additional
information about the shifted distribution may be available at test time. This
relates to popular scenarios such as covariate shift, concept drift, transfer
learning and semi-supervised learning. This working paper discusses how these
tasks can be tackled depending on how the distributions change. It argues that
knowledge of an underlying causal direction can facilitate several of these
tasks.

Inferring the causal structure of a set of random variables from a finite
sample of the joint distribution is an important problem in science. Recently,
methods using additive noise models have been suggested to address the case of
continuous variables. In many situations, however, the variables of interest
are discrete or even have only finitely many states. In this work we extend the
notion of additive noise models to these cases.

We study a class of distance measures on probability distributions, the
integral probability metrics (IPMs), which include the Wasserstein distance,
the Dudley metric, and the maximum mean discrepancy. IPMs have thus far mostly been used in
more abstract settings, for instance as theoretical tools in mass
transportation problems, and in metrizing the weak topology on the set of all
Borel probability measures defined on a metric space.
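
To make one of these metrics concrete, the sketch below estimates the squared maximum mean discrepancy (MMD) from two finite samples. MMD is the IPM obtained when the function class is the unit ball of an RKHS. This is an illustrative sketch under stated assumptions, not the paper's estimator: it uses the simple biased (V-statistic) estimate, a Gaussian kernel, and a fixed bandwidth, and the names `rbf_gram` and `mmd_squared_biased` are hypothetical.

```python
import numpy as np


def rbf_gram(x, y, bandwidth=1.0):
    """Gaussian RBF Gram matrix between the rows of x and the rows of y."""
    sq_dists = (
        np.sum(x**2, axis=1)[:, None]
        + np.sum(y**2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    # Clip tiny negative values caused by floating-point rounding.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))


def mmd_squared_biased(x, y, bandwidth=1.0):
    """Biased estimate of squared MMD: mean(Kxx) + mean(Kyy) - 2 * mean(Kxy)."""
    k_xx = rbf_gram(x, x, bandwidth)
    k_yy = rbf_gram(y, y, bandwidth)
    k_xy = rbf_gram(x, y, bandwidth)
    return float(k_xx.mean() + k_yy.mean() - 2.0 * k_xy.mean())


# Toy data (an assumption for illustration): two samples from the same
# 1-D Gaussian and one sample from a Gaussian shifted by 2.
rng = np.random.default_rng(0)
p_sample = rng.normal(0.0, 1.0, size=(200, 1))
q_same = rng.normal(0.0, 1.0, size=(200, 1))
q_shifted = rng.normal(2.0, 1.0, size=(200, 1))

mmd_same = mmd_squared_biased(p_sample, q_same)
mmd_shifted = mmd_squared_biased(p_sample, q_shifted)
```

The estimate for the shifted pair should be clearly larger than for the pair drawn from the same distribution. The biased form is used here because it equals the squared RKHS distance between the two empirical mean embeddings and is therefore nonnegative up to rounding.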