The hierarchical Dirichlet process (HDP) has become an important Bayesian
nonparametric model for grouped data, such as document collections. The HDP is
used to construct a flexible mixed-membership model where the number of
components is determined by the data. As with most Bayesian nonparametric
models, exact posterior inference is intractable, so practitioners use Markov
chain Monte Carlo (MCMC) or variational inference.
Latent feature models are widely used to decompose data into a small number
of components. Bayesian nonparametric variants of these models, which use the
Indian buffet process (IBP) as a prior over latent features, allow the number
of features to be determined from the data. We present a generalization of the
IBP, the distance dependent Indian buffet process (dd-IBP), for modeling
non-exchangeable data. It relies on a distance function defined between data
points, biasing nearby data to share more features.
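
For orientation, the standard (exchangeable) IBP that the dd-IBP generalizes can be sketched through its usual "buffet" construction: customer i takes each existing dish k with probability m_k / i, then tries Poisson(alpha / i) new dishes. This is a minimal sketch of that baseline prior only, not of the dd-IBP itself; the function names, the `alpha` parameter name, and the seed handling are illustrative assumptions.

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw from Poisson(rate) by Knuth's multiplication method (fine for small rates)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_ibp(num_customers, alpha, seed=0):
    """Return a binary feature matrix (list of rows) drawn from the standard IBP.

    Hypothetical helper for illustration; rows are data points, columns are features.
    """
    rng = random.Random(seed)
    dish_counts = []  # dish_counts[k] = number of customers who have taken dish k
    rows = []         # rows[i] = set of dish indices taken by customer i
    for i in range(1, num_customers + 1):
        chosen = set()
        # Existing dishes: take dish k with probability m_k / i.
        for k, m_k in enumerate(dish_counts):
            if rng.random() < m_k / i:
                chosen.add(k)
        # New dishes: customer i tries Poisson(alpha / i) of them.
        for _ in range(sample_poisson(alpha / i, rng)):
            dish_counts.append(0)
            chosen.add(len(dish_counts) - 1)
        for k in chosen:
            dish_counts[k] += 1
        rows.append(chosen)
    num_dishes = len(dish_counts)
    return [[1 if k in row else 0 for k in range(num_dishes)] for row in rows]


feature_matrix = sample_ibp(num_customers=10, alpha=2.0, seed=1)
for row in feature_matrix:
    print(row)
```

The number of columns is not fixed in advance; it grows with the number of data points, which is the sense in which the number of features is determined from the data.
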
A key problem in statistical modeling is model selection: how to choose a
model at an appropriate level of complexity. This problem appears in many
settings, most prominently in choosing the number of clusters in mixture models
or the number of factors in factor analysis. In this tutorial we describe
Bayesian nonparametric methods, a class of methods that side-steps this issue
by allowing the data to determine the complexity of the model. This tutorial is
a high-level introduction to Bayesian nonparametric methods and contains
several examples of their application.
In this paper we study convex stochastic search problems where a noisy
objective function value is observed after a decision is made. There are many
stochastic search problems whose behavior depends on an exogenous state
variable which affects the shape of the objective function. Currently, there is
no general purpose algorithm to solve this class of problems. We use
nonparametric density estimation to take observations from the joint
state-outcome distribution and use them to infer the optimal decision for a
given query state.
We introduce supervised latent Dirichlet allocation (sLDA), a statistical
model of labelled documents. The model accommodates a variety of response
types. We derive an approximate maximum-likelihood procedure for parameter
estimation, which relies on variational methods to handle intractable posterior
expectations. Prediction problems motivate this research: we use the fitted
model to predict response values for new documents. We test sLDA on two
real-world problems: movie ratings predicted from reviews, and the political
tone of amendments in the U.S. Senate based on the amendment text.
The syntactic topic model (STM) is a Bayesian nonparametric model of language
that discovers latent distributions of words (topics) that are both
semantically and syntactically coherent. The STM models dependency parsed
corpora where sentences are grouped into documents. It assumes that each word
is drawn from a latent topic chosen by combining document-level features and
the local syntactic context. Each document has a distribution over latent
topics, as in topic models, which provides the semantic consistency.
We develop the distance dependent Chinese restaurant process (CRP), a
flexible class of distributions over partitions that allows for
non-exchangeability. This class can be used to model many kinds of dependencies
between data in infinite clustering models, including dependencies across time
or space. We examine the properties of the distance dependent CRP, discuss its
connections to Bayesian nonparametric mixture models, and derive a Gibbs
sampler for both observed and mixture settings. We study its performance with
three text corpora.
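
A draw from a distance dependent CRP prior can be sketched as follows: each customer links to another customer with probability proportional to a decay function of their distance, or to itself with probability proportional to a scaling parameter, and the partition is read off as the connected components of the link graph. This is a minimal sketch of the prior only (not the Gibbs sampler); the function names, the exponential decay, and the toy positions are assumptions made for illustration.

```python
import math
import random


def sample_ddcrp_links(distances, alpha, decay, seed=0):
    """Sample one link per customer; alpha must be positive so weights never sum to zero."""
    rng = random.Random(seed)
    n = len(distances)
    links = []
    for i in range(n):
        # Self-link has weight alpha; a link to j has weight decay(d_ij).
        weights = [alpha if j == i else decay(distances[i][j]) for j in range(n)]
        links.append(rng.choices(range(n), weights=weights, k=1)[0])
    return links


def links_to_clusters(links):
    """Label the connected components of the (undirected) link graph with union-find."""
    parent = list(range(len(links)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in enumerate(links):
        parent[find(i)] = find(j)
    relabel = {}
    return [relabel.setdefault(find(i), len(relabel)) for i in range(len(links))]


# Toy data: five points on a line, in two well-separated groups.
positions = [0.0, 0.1, 0.2, 5.0, 5.1]
distances = [[abs(a - b) for b in positions] for a in positions]
links = sample_ddcrp_links(distances, alpha=1.0, decay=lambda d: math.exp(-d))
print(links, links_to_clusters(links))
```

Because cluster membership is defined through customer-to-customer links rather than through table assignments, changing a single link can merge or split whole clusters.
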
We propose Dirichlet Process-Generalized Linear Models (DP-GLM), a new method
of nonparametric regression that accommodates continuous and categorical
inputs, and any response that can be modeled by a generalized linear model. We
prove conditions for the asymptotic unbiasedness of the DP-GLM regression mean
function estimate and give a practical example of when those conditions hold.
Additionally, we provide Bayesian bounds on the distance of the estimate from
the true mean function based on the number of observations and posterior
samples.
We develop the relational topic model (RTM), a hierarchical model of both
network structure and node attributes. We focus on document networks, where the
attributes of each document are its words, i.e., discrete observations taken
from a fixed vocabulary. For each pair of documents, the RTM models their link
as a binary random variable that is conditioned on their contents. The model
can be used to summarize a network of documents, predict links between them,
and predict words within them.
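
To make the link variable concrete, one possible link function is a logistic regression on the element-wise product of the two documents' mean topic assignments. The sketch below assumes that form; the abstract itself does not specify the link function, and the names `eta` (per-topic weights), `nu` (intercept), and both helper functions are illustrative.

```python
import math


def mean_topic_proportions(assignments, num_topics):
    """Empirical topic frequencies of one document from its per-word topic assignments."""
    counts = [0] * num_topics
    for z in assignments:
        counts[z] += 1
    return [c / len(assignments) for c in counts]


def link_probability(z_bar_a, z_bar_b, eta, nu):
    """Assumed logistic link: sigmoid(eta . (z_bar_a * z_bar_b) + nu)."""
    score = nu + sum(e * a * b for e, a, b in zip(eta, z_bar_a, z_bar_b))
    return 1.0 / (1.0 + math.exp(-score))


doc_a = mean_topic_proportions([0, 0, 0, 1], num_topics=3)
doc_b = mean_topic_proportions([0, 0, 1, 1], num_topics=3)
doc_c = mean_topic_proportions([2, 2, 2, 2], num_topics=3)
eta, nu = [4.0, 4.0, 4.0], -2.0
print(link_probability(doc_a, doc_b, eta, nu))  # documents with shared topics
print(link_probability(doc_a, doc_c, eta, nu))  # documents with no shared topics
```

With positive weights, documents that use the same topics receive a higher link probability than documents whose topics do not overlap.
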
We present the nested Chinese restaurant process (nCRP), a stochastic process
which assigns probability distributions to infinitely-deep,
infinitely-branching trees. We show how this stochastic process can be used as
a prior distribution in a Bayesian nonparametric model of document collections.
Specifically, we present an application to information retrieval in which
documents are modeled as paths down a random tree, and the preferential
attachment dynamics of the nCRP lead to clustering of documents according to
sharing of topics at multiple levels of abstraction.
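
The path-sampling step can be sketched as a sequence of CRP draws: at each node, a document follows an existing child with probability proportional to the number of earlier documents that took it, or opens a new child with probability proportional to a concentration parameter. The sketch below truncates the tree at a fixed depth for practicality, which is an assumption (the process itself is defined on infinitely deep trees); the `Node` class, `sample_path`, and `gamma` are illustrative names.

```python
import random


class Node:
    """One node of the tree; count is the number of documents whose path passes through it."""

    def __init__(self):
        self.count = 0
        self.children = []


def sample_path(root, depth, gamma, rng):
    """Walk `depth` levels below the root, choosing each child by a CRP draw."""
    root.count += 1
    path = [root]
    node = root
    for _ in range(depth):
        # Existing children are weighted by popularity, a new child by gamma.
        weights = [child.count for child in node.children] + [gamma]
        choice = rng.choices(range(len(weights)), weights=weights, k=1)[0]
        if choice == len(node.children):
            node.children.append(Node())
        node = node.children[choice]
        node.count += 1
        path.append(node)
    return path


rng = random.Random(0)
root = Node()
paths = [sample_path(root, depth=3, gamma=1.0, rng=rng) for _ in range(20)]
print([child.count for child in root.children])
```

Documents that share a prefix of their paths share the nodes, and hence the topics, near the root, while popular branches attract further documents.
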