Transportation distances have been used for more than a decade now in machine
learning to compare histograms of features. They have one parameter: the ground
metric, which can be any metric between the features themselves. As is the case
for all parameterized distances, transportation distances can only prove useful
in practice when this parameter is carefully chosen. Until now, the only option
available to practitioners for setting the ground metric was to rely on a
priori knowledge of the features, which considerably limited the scope of
application of transportation distances.
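
To make the role of the ground metric concrete, here is a minimal sketch restricted to a special case: histograms whose bins lie on the real line, with ground metric $|x_i - x_j|$ between bin positions. In that case the transportation distance reduces to a sum over the gaps between consecutive bins of the mass that must cross each gap. The function name `transport_distance_1d` and the bin positions are illustrative assumptions, not taken from the text; a general ground metric would require solving a linear program instead.

```python
from itertools import accumulate


def transport_distance_1d(r, c, positions):
    """Transportation distance between histograms r and c whose bins sit at
    sorted `positions` on the real line, with ground metric |x_i - x_j|.

    Illustrative special case only; an arbitrary ground metric needs an LP solver.
    """
    if abs(sum(r) - sum(c)) > 1e-12:
        raise ValueError("histograms must carry the same total mass")
    cdf_r = list(accumulate(r))
    cdf_c = list(accumulate(c))
    # Length of the gap between bin k and bin k+1.
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    # |cdf_r[k] - cdf_c[k]| is the mass that must cross gap k.
    return sum(abs(fr - fc) * g for fr, fc, g in zip(cdf_r, cdf_c, gaps))


r = [0.5, 0.5, 0.0]
c = [0.0, 0.5, 0.5]

# Same two histograms, two different ground metrics (bin placements).
even = transport_distance_1d(r, c, [0.0, 1.0, 2.0])
skewed = transport_distance_1d(r, c, [0.0, 1.0, 10.0])
print(even, skewed)
```

The two histograms are identical in both calls; only the placement of the bins, and hence the ground metric, changes, and the resulting distance changes with it.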
This survey is an introduction to positive definite kernels and the set of
methods they have inspired in the machine learning literature, namely kernel
methods. We first discuss some properties of positive definite kernels as well
as reproducing kernel Hilbert spaces, the natural extension of the set of
functions $\{k(x,\cdot),x\in\mathcal{X}\}$ associated with a kernel $k$ defined
on a space $\mathcal{X}$. We discuss at length the construction of kernel
functions that take advantage of well-known statistical models.
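
As a small illustration of what positive definiteness means in practice, the sketch below builds the Gram matrix of a Gaussian kernel on a handful of real points and evaluates the quadratic form $\sum_{i,j} c_i c_j k(x_i, x_j)$ for random coefficients, which should never be negative. The choice of the Gaussian kernel, the sample points, and all function names are assumptions made for the example; this is a numerical sanity check, not a proof.

```python
import math
import random


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel on the real line, a standard positive definite kernel."""
    return math.exp(-((x - y) ** 2) / (2.0 * sigma ** 2))


def gram_matrix(kernel, points):
    """Matrix of pairwise kernel evaluations K[i][j] = k(x_i, x_j)."""
    return [[kernel(x, y) for y in points] for x in points]


def quadratic_form(K, coeffs):
    """Compute sum_{i,j} c_i c_j K[i][j]."""
    n = len(coeffs)
    return sum(coeffs[i] * K[i][j] * coeffs[j] for i in range(n) for j in range(n))


points = [0.0, 0.5, 1.0, 2.0, 3.5]  # hypothetical sample from the input space
K = gram_matrix(gaussian_kernel, points)

rng = random.Random(0)
values = [
    quadratic_form(K, [rng.uniform(-1.0, 1.0) for _ in points])
    for _ in range(1000)
]
# For a positive definite kernel every value is nonnegative (up to rounding).
print(min(values) >= -1e-9)
```

Each row `K[i]` lists the values of the function $k(x_i,\cdot)$ at the sample points, which is the finite-sample view of the set $\{k(x,\cdot),x\in\mathcal{X}\}$ mentioned above.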
We present in this work a new family of kernels to compare positive measures
on arbitrary spaces $\Xcal$ endowed with a positive kernel $\kappa$, which
translates naturally into kernels between histograms or clouds of points. We
first cover the case where $\Xcal$ is Euclidean, and focus on kernels which
take into account the variance matrix of the mixture of two measures to compute
their similarity.