Metrics specifying distances between data points can be learned in a
discriminative manner or from generative models. In this paper, we show how to
unify generative and discriminative learning of metrics via a kernel learning
framework. Specifically, we learn local metrics optimized from parametric
generative models. These are then used as base kernels to construct a global
kernel that minimizes a discriminative training criterion. We consider both
linear and nonlinear combinations of local metric kernels.

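As a rough illustration of the linear case, the sketch below builds one base kernel per local metric and combines them with nonnegative weights. Everything specific here is an assumption for illustration and is not taken from the paper: the base kernel form exp(-(x - y)^T M (x - y)), the hypothetical helper names, and the use of kernel-target alignment as a simple stand-in for the discriminative training criterion.

```python
import numpy as np

def mahalanobis_rbf_kernel(X, M):
    """Gram matrix K[i, j] = exp(-(x_i - x_j)^T M (x_i - x_j)).

    X is (n, d); M is a symmetric positive semidefinite (d, d) local metric.
    The exponential form is an assumed choice of base kernel.
    """
    XM = X @ M
    sq = np.sum(XM * X, axis=1)                      # x_i^T M x_i
    d2 = sq[:, None] + sq[None, :] - 2.0 * XM @ X.T  # pairwise squared distances
    return np.exp(-np.maximum(d2, 0.0))              # clip tiny negative round-off

def alignment_weights(kernels, y):
    """Nonnegative weights proportional to each kernel's alignment with y y^T.

    Stand-in for a discriminative criterion (assumption); y holds +1/-1 labels.
    """
    Y = np.outer(y, y)
    a = np.array([np.sum(K * Y) / (np.linalg.norm(K) * np.linalg.norm(Y))
                  for K in kernels])
    a = np.maximum(a, 0.0)
    if a.sum() == 0.0:                               # fall back to uniform weights
        return np.full(len(kernels), 1.0 / len(kernels))
    return a / a.sum()

def combine_kernels(kernels, weights):
    """Linear combination: K = sum_k w_k K_k."""
    return sum(w * K for w, K in zip(weights, kernels))
```

Because the weights are nonnegative and each base kernel is positive semidefinite, the combined matrix remains a valid kernel and could be passed to any kernel classifier that accepts a precomputed Gram matrix.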
We investigate unsupervised pre-training of deep architectures as feature
generators for "shallow" classifiers. Stacked Denoising Autoencoders (SdA),
when used as feature pre-processing tools for SVM classification, can lead to
significant improvements in accuracy, but at the price of a substantial
increase in computational cost. In this paper, we propose a simple algorithm
that mimics the layer-by-layer training of SdAs. In contrast to SdAs, however,
our algorithm requires no training through gradient descent, as the parameters
can be computed in closed form.
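To make the idea of a closed-form layer concrete, here is a minimal sketch of one way such a layer could be computed: a linear denoiser whose expected reconstruction loss under random feature dropout is minimized analytically. Whether this matches the algorithm in the paper is an assumption; the dropout probability p, the regularizer reg, the tanh nonlinearity, and the function names are all hypothetical choices made for illustration.

```python
import numpy as np

def closed_form_denoising_layer(X, p=0.5, reg=1e-5):
    """Fit a linear denoiser W in closed form and return (W, hidden features).

    X is (d, n), one column per example. Each feature is assumed to be
    zeroed independently with probability p; the expectation over this
    corruption is taken analytically, so no gradient descent is needed.
    """
    d, n = X.shape
    Xb = np.vstack([X, np.ones((1, n))])   # append a constant bias row
    q = np.full(d + 1, 1.0 - p)            # survival probability per feature
    q[-1] = 1.0                            # the bias is never corrupted
    S = Xb @ Xb.T                          # scatter matrix, (d+1, d+1)
    Q = S * np.outer(q, q)                 # E[x_tilde x_tilde^T], off-diagonal
    np.fill_diagonal(Q, q * np.diag(S))    # diagonal uses q_i, not q_i^2
    P = S[:d, :] * q                       # E[x x_tilde^T], (d, d+1)
    # Solve W (Q + reg I) = P; Q is symmetric, so transpose and use solve.
    W = np.linalg.solve(Q + reg * np.eye(d + 1), P.T).T
    H = np.tanh(W @ Xb)                    # nonlinearity applied after the fit
    return W, H

def stack_layers(X, num_layers=3, p=0.5):
    """Train layers one at a time, feeding each layer's output to the next.

    Returns the input and all hidden representations stacked row-wise,
    which could then serve as features for a shallow classifier.
    """
    feats, H = [X], X
    for _ in range(num_layers):
        _, H = closed_form_denoising_layer(H, p)
        feats.append(H)
    return np.vstack(feats)
```

Each layer costs one (d+1) x (d+1) linear solve, so the expense is governed by the feature dimension rather than by a number of training epochs.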