The restricted Boltzmann machine (RBM) is a flexible tool for modeling
complex data; however, there have been significant computational difficulties in
using RBMs to model high-dimensional multinomial observations. In natural
language processing applications, words are naturally modeled by K-ary discrete
distributions, where K is determined by the vocabulary size and can easily be
in the hundreds of thousands.
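To make the cost concrete, here is a minimal sketch of one K-ary (softmax) visible unit with binary hidden units. The names `softmax_visible_probs` and `sample_softmax_visible` and the list-of-rows weight layout are hypothetical, and this is not the authors' implementation. Computing p(v = k | h) needs a logit for every one of the K words plus a normalizer over all of them, so each Gibbs update of the visible unit costs O(KH) for H hidden units.

```python
import math
import random

def softmax_visible_probs(h, W, b):
    """Conditional distribution p(v = k | h) of one K-ary visible unit.

    h: list of H binary hidden states
    W: K x H weights as a list of rows, one row per vocabulary word (hypothetical layout)
    b: list of K visible biases
    Every word needs its own logit, so one call costs O(K * H).
    """
    logits = [b_k + sum(w * h_j for w, h_j in zip(row, h)) for row, b_k in zip(W, b)]
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)  # the O(K) normalizer that grows with the vocabulary
    return [e / total for e in exps]

def sample_softmax_visible(h, W, b, rng):
    """Draw a word index k ~ p(v | h) using a random.Random instance."""
    probs = softmax_visible_probs(h, W, b)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

With all-zero weights and biases every word gets probability 1/K; with K in the hundreds of thousands, the full normalizer has to be recomputed at every sampling step.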

Conditional Restricted Boltzmann Machines (CRBMs) are rich probabilistic
models that have recently been applied to a wide range of problems, including
collaborative filtering, classification, and modeling motion capture data.
While much progress has been made in training non-conditional RBMs, the
algorithms developed for them are not applicable to conditional models, and
there has been almost no work on training and generating predictions from
conditional RBMs for structured output problems. We first argue that standard
Contrastive Divergence-based learning may not be suitable for training CRBMs.
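For reference, the learning rule in question can be sketched as one CD-1 step for a binary CRBM. This assumes a common parameterization, not one taken from this work, in which the conditioning input x only shifts the visible and hidden biases through hypothetical matrices A and B; the function `cd1_update` and the `params` dictionary layout are illustrative names.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def cd1_update(params, x, y, lr, rng):
    """One CD-1 step for a binary conditional RBM (hypothetical parameterization).

    params: dict with W (H x V), A (V x D), B (H x D), b (V), c (H), updated in place.
    x is the conditioning input, y the observed binary output vector.
    Returns the mean-field reconstruction of y.
    """
    W, A, B, b, c = (params[k] for k in ("W", "A", "B", "b", "c"))
    # Input-dependent biases: x enters only through these terms.
    vis_bias = [bi + a for bi, a in zip(b, matvec(A, x))]
    hid_bias = [cj + s for cj, s in zip(c, matvec(B, x))]
    Wt = [list(col) for col in zip(*W)]  # V x H copy of W

    # Positive phase: p(h | y, x), then a binary sample of h.
    h0 = [sigmoid(hb + s) for hb, s in zip(hid_bias, matvec(W, y))]
    h0_sample = [1.0 if rng.random() < p else 0.0 for p in h0]
    # Negative phase: one Gibbs step back to y and then to h, with x clamped.
    y1 = [sigmoid(vb + s) for vb, s in zip(vis_bias, matvec(Wt, h0_sample))]
    h1 = [sigmoid(hb + s) for hb, s in zip(hid_bias, matvec(W, y1))]

    H, V, D = len(c), len(b), len(x)
    for j in range(H):
        for i in range(V):
            W[j][i] += lr * (h0[j] * y[i] - h1[j] * y1[i])
        for d in range(D):
            B[j][d] += lr * (h0[j] - h1[j]) * x[d]
        c[j] += lr * (h0[j] - h1[j])
    for i in range(V):
        for d in range(D):
            A[i][d] += lr * (y[i] - y1[i]) * x[d]
        b[i] += lr * (y[i] - y1[i])
    return y1
```

Because the Gibbs chain starts at the observed y, each CD-1 update only probes the model in the neighborhood of the training targets.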

We discuss an attentional model for simultaneous object tracking and
recognition that is driven by gaze data. Motivated by theories of perception,
the model consists of two interacting pathways, identity and control, intended
to mirror the "what" and "where" pathways in neuroscience models. The identity
pathway models object appearance and performs classification using deep
(factored) Restricted Boltzmann Machines. At each point in time, the
observations consist of foveated images, with decaying resolution toward the
periphery of the gaze.
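A retina-like observation of this kind can be sketched as a multi-resolution glimpse. This is a hypothetical illustration, not the model's actual sensor: it assumes a grayscale image stored as a list of rows, zero padding outside the image, and a hypothetical `foveated_glimpse` that average-pools concentric squares of doubling size down to the same small grid, so resolution halves at each step away from the gaze point.

```python
def pixel(image, r, c):
    """Return image[r][c], or 0.0 outside the image (zero padding)."""
    if 0 <= r < len(image) and 0 <= c < len(image[0]):
        return image[r][c]
    return 0.0

def foveated_glimpse(image, row, col, out=2, n_scales=3):
    """Multi-resolution glimpse centred on the gaze point (row, col).

    Scale s covers a square of side out * 2**s pixels, average-pooled down to
    out x out values, so each value at scale s summarizes 4**s pixels:
    full resolution at the fovea, coarser toward the periphery.
    """
    glimpse = []
    for s in range(n_scales):
        block = 2 ** s               # pooling block size at this scale
        side = out * block           # side length of the covered square
        top, left = row - side // 2, col - side // 2
        level = []
        for i in range(out):
            row_vals = []
            for j in range(out):
                total = 0.0
                for di in range(block):
                    for dj in range(block):
                        total += pixel(image, top + i * block + di, left + j * block + dj)
                row_vals.append(total / (block * block))
            level.append(row_vals)
        glimpse.append(level)
    return glimpse
```

Every scale has the same number of values, so a downstream model sees a fixed-size input regardless of how much of the scene the periphery covers.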

We consider the problem of training probabilistic conditional random fields
(CRFs) in the context of a task where performance is measured using a specific
loss function. While maximum likelihood is the most common approach to training
CRFs, it ignores the inherent structure of the task's loss function.
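The contrast between the two objectives can be seen on a toy linear-chain CRF small enough to enumerate every labeling. The sketch below assumes hypothetical unary and pairwise log-potentials and Hamming loss; expected loss under the model is shown as one loss-sensitive alternative and is not necessarily the training criterion this work proposes. Maximum likelihood only looks at the probability of the exact target sequence, while expected loss gives partial credit to near misses.

```python
import itertools
import math

def chain_score(labels, unary, pairwise):
    """Unnormalized log-score of a label sequence under a linear-chain CRF."""
    s = sum(unary[t][y] for t, y in enumerate(labels))
    s += sum(pairwise[a][b] for a, b in zip(labels, labels[1:]))
    return s

def crf_distribution(unary, pairwise, n_labels):
    """Brute-force p(y | x) over all label sequences (only viable for tiny problems)."""
    T = len(unary)
    seqs = list(itertools.product(range(n_labels), repeat=T))
    scores = [chain_score(y, unary, pairwise) for y in seqs]
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return {y: math.exp(s - log_z) for y, s in zip(seqs, scores)}

def hamming(y, y_true):
    return sum(a != b for a, b in zip(y, y_true))

def negative_log_likelihood(dist, y_true):
    # Maximum likelihood only uses the mass on the exact target sequence.
    return -math.log(dist[tuple(y_true)])

def expected_loss(dist, y_true, loss=hamming):
    # A loss-sensitive objective weights every sequence by how wrong it is.
    return sum(p * loss(y, y_true) for y, p in dist.items())
```

With all-zero potentials over three positions and two labels, every sequence has probability 1/8, so the negative log-likelihood is log 8 while the expected Hamming loss is 1.5.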

We consider the problem of classification when inputs correspond to sets of
vectors. This setting occurs in many problems, such as the classification of
pieces of mail containing several pages, of web sites with several sections, or
of images that have been pre-segmented into smaller regions. We propose
generalizations of the restricted Boltzmann machine (RBM) that are appropriate
in this context and explore how to incorporate different assumptions about the
relationship between the input sets and the target class within the RBM.
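As a point of reference, a discriminative RBM with binary hidden units gives a closed-form class posterior for a single vector, p(y | x) ∝ exp(d_y) ∏_j (1 + exp(c_j + U_jy + Σ_i W_ji x_i)). The sketch below computes that posterior and then combines it over a set under two simple, hypothetical assumptions: every vector is independent evidence for one shared class, or the set posterior is a mixture of per-vector posteriors. These are illustrative baselines with hypothetical function names, not the RBM generalizations this work proposes.

```python
import math

def softplus(z):
    # Numerically stable log(1 + exp(z)).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def log_softmax(scores):
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]

def class_rbm_log_posterior(x, W, U, c, d):
    """log p(y | x) for a discriminative RBM with binary hidden units.

    W: H x D input weights, U: H x C class weights, c: H hidden biases,
    d: C class biases. The hidden units are summed out analytically.
    """
    pre = [cj + sum(w * xi for w, xi in zip(row, x)) for cj, row in zip(c, W)]
    scores = [d[y] + sum(softplus(pre[j] + U[j][y]) for j in range(len(c)))
              for y in range(len(d))]
    return log_softmax(scores)

def classify_set_product(xs, params):
    """Assume every vector in the set is independent evidence for one shared class."""
    per_vec = [class_rbm_log_posterior(x, *params) for x in xs]
    summed = [sum(lp[y] for lp in per_vec) for y in range(len(per_vec[0]))]
    return [math.exp(v) for v in log_softmax(summed)]

def classify_set_mixture(xs, params):
    """Assume the set's class posterior is the average of per-vector posteriors."""
    per_vec = [[math.exp(v) for v in class_rbm_log_posterior(x, *params)] for x in xs]
    n = len(per_vec)
    return [sum(p[y] for p in per_vec) / n for y in range(len(per_vec[0]))]
```

The product rule sharpens the decision as more vectors agree, while the mixture rule stays as confident as an average member; which assumption fits depends on how the class relates to the individual vectors.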

Unsupervised discovery of latent representations, in addition to being useful
for density modeling, visualisation and exploratory data analysis, is also
increasingly important for learning features relevant to discriminative tasks.
Autoencoders, in particular, have proven to be an effective way to learn latent
codes that reflect meaningful variations in data. A continuing challenge,
however, is guiding an autoencoder toward representations that are useful for
particular discriminative tasks.
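One common way to guide an autoencoder toward task-relevant codes, assumed here for illustration and not necessarily the approach this work goes on to propose, is to add a supervised term computed from the latent code to the reconstruction objective. The sketch below assumes a sigmoid encoder, a linear decoder, a softmax classifier on the code, and a hypothetical weight `alpha`; `semi_supervised_ae_loss` is an illustrative name, and unlabeled examples contribute only reconstruction error.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def affine(M, v, bias):
    return [b + sum(m * x for m, x in zip(row, v)) for row, b in zip(M, bias)]

def semi_supervised_ae_loss(x, label, enc, dec, clf, alpha=1.0):
    """Reconstruction error plus alpha times a classification loss on the code.

    enc, dec, clf are (weights, bias) pairs; label is an int class index, or
    None for unlabeled examples, which then contribute reconstruction error only.
    """
    code = [sigmoid(z) for z in affine(enc[0], x, enc[1])]   # latent representation
    recon = affine(dec[0], code, dec[1])                      # linear decoder
    rec_loss = sum((r - xi) ** 2 for r, xi in zip(recon, x))  # squared error
    if label is None:
        return rec_loss
    logits = affine(clf[0], code, clf[1])
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    ce_loss = log_z - logits[label]                           # -log softmax(logits)[label]
    return rec_loss + alpha * ce_loss
```

Setting `alpha` to 0 recovers a plain autoencoder; larger values push the code toward features that separate the classes, at some cost in reconstruction quality.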