Support vector machines (SVMs) naturally embody sparseness due to their use
of hinge loss functions. However, SVMs cannot directly estimate conditional
class probabilities. In this paper we propose and study a family of coherence
functions, which are convex and differentiable, as surrogates of the hinge
function. The coherence function is derived by using the maximum-entropy
principle and is characterized by a temperature parameter. It bridges the hinge
function and the logit function in logistic regression.
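
To make the bridging role of the temperature concrete, the following is a minimal sketch that assumes a smoothed-hinge form, V_T(z) = T log(1 + exp((1 - z) / T)), for the margin z = y f(x). This formula and the names `coherence`, `hinge`, and `softplus` are illustrative assumptions, not taken from the text above. Under this assumed form the loss approaches the hinge as T goes to 0 and becomes a shifted logistic-type loss at T = 1.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def hinge(z: float) -> float:
    """Hinge loss on the margin z = y * f(x)."""
    return max(0.0, 1.0 - z)


def coherence(z: float, temperature: float) -> float:
    """Assumed smoothed hinge: T * log(1 + exp((1 - z) / T)).

    This functional form is an assumption made for illustration.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    return temperature * softplus((1.0 - z) / temperature)


if __name__ == "__main__":
    # Compare the assumed surrogate with the hinge at several temperatures.
    for temperature in (1.0, 0.1, 0.001):
        row = [round(coherence(z, temperature), 4) for z in (-1.0, 0.0, 1.0, 2.0)]
        print(temperature, row)
```

Because the assumed surrogate is smooth for any positive temperature, it can be minimized with gradient-based methods, unlike the hinge at its kink.
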
We consider the predictive problem of supervised ranking, where the task is
to rank sets of candidate items returned in response to queries. Although there
exist statistical procedures that come with guarantees of consistency in this
setting, these procedures require that individuals provide a complete ranking
of all items, which is rarely feasible in practice. Instead, individuals
routinely provide partial preference information, such as pairwise comparisons
of items, and more practical approaches to ranking have aimed at modeling this
partial preference data directly.
We present a probabilistic model of events in continuous time in which each
event triggers a Poisson process of successor events. The ensemble of observed
events is thereby modeled as a superposition of Poisson processes. Efficient
inference is feasible under this model with an EM algorithm. Moreover, the EM
algorithm can be implemented as a distributed algorithm, permitting the model
to be applied to very large datasets. We apply these techniques to the modeling
of Twitter messages and the revision history of Wikipedia.
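
The generative side of such a model can be sketched as a simple simulation. The version below assumes that spontaneous events arrive at a constant rate and that every event triggers successors at a constant rate over a fixed window after it. The function name `simulate_cascade`, its parameters, and the flat triggering kernel are hypothetical choices for illustration only, and the EM inference step is not shown.

```python
import random


def simulate_cascade(base_rate, child_rate, window, horizon, seed=0):
    """Simulate a superposition of Poisson processes on [0, horizon).

    Spontaneous events form a homogeneous Poisson process with rate
    `base_rate`. Each event (spontaneous or triggered) starts its own
    Poisson process of successors with constant rate `child_rate` on the
    `window` time units that follow it (an assumed, simplistic kernel).
    Returns (time, parent_time) pairs sorted by time; parent_time is None
    for spontaneous events.
    """
    if child_rate * window >= 1.0:
        raise ValueError("expected successors per event must be below 1")
    rng = random.Random(seed)
    pending = []
    t = rng.expovariate(base_rate)
    while t < horizon:
        pending.append((t, None))
        t += rng.expovariate(base_rate)

    events = []
    while pending:
        time, parent = pending.pop()
        events.append((time, parent))
        end = min(time + window, horizon)
        # Exponential inter-arrival times give a Poisson process on [time, end).
        child = time + rng.expovariate(child_rate)
        while child < end:
            pending.append((child, time))
            child += rng.expovariate(child_rate)

    events.sort(key=lambda event: event[0])
    return events


if __name__ == "__main__":
    for time, parent in simulate_cascade(1.0, 0.5, 1.0, 10.0, seed=1):
        print(round(time, 3), None if parent is None else round(parent, 3))
```

In this sketch the parent of each event is recorded, whereas in observed data it would be latent. That latent assignment is the kind of quantity an EM algorithm would average over.
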
We propose a Bayesian nonparametric approach to the problem of jointly
modeling multiple related time series. Our approach is based on the discovery
of a set of latent, shared dynamical behaviors. Using a beta process prior, the
size of the set and the sharing pattern are both inferred from data. We develop
efficient Markov chain Monte Carlo methods based on the Indian buffet process
representation of the predictive distribution of the beta process, without
relying on a truncated model.
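
For readers unfamiliar with the Indian buffet process, the following sketch draws a binary feature matrix from its standard sequential ("culinary") scheme. It is not the MCMC sampler described above. The names `sample_ibp` and `sample_poisson` and the parameter `alpha` are illustrative assumptions.

```python
import math
import random


def sample_poisson(rate, rng):
    """Poisson draw by Knuth's multiplication method (fine for small rates)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_ibp(num_objects, alpha, seed=0):
    """Draw a binary object-by-feature matrix from the IBP.

    Object n takes existing feature k with probability m_k / n, where m_k
    is the number of earlier objects with that feature, and then adds
    Poisson(alpha / n) new features. No truncation level is fixed ahead
    of time; the number of columns is determined by the draw itself.
    """
    rng = random.Random(seed)
    counts = []  # counts[k] = number of objects so far that have feature k
    rows = []    # rows[n] = set of feature indices held by object n
    for n in range(1, num_objects + 1):
        row = {k for k, m in enumerate(counts) if rng.random() < m / n}
        new = sample_poisson(alpha / n, rng)
        row.update(range(len(counts), len(counts) + new))
        counts.extend([0] * new)
        for k in row:
            counts[k] += 1
        rows.append(row)
    num_features = len(counts)
    return [[1 if k in row else 0 for k in range(num_features)] for row in rows]


if __name__ == "__main__":
    for row in sample_ibp(6, 2.0, seed=3):
        print(row)
```

The number of active features grows with the number of objects rather than being set in advance, which is the property that lets the size of the behavior set be inferred from data.
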
In this work, we establish novel connections between the Bayesian
nonparametric clustering and featural paradigms by considering the problem of
admixture modeling. We examine the Dirichlet process (and its unnormalized
Poisson point process generation via the gamma process) on the traditional
clustering side of Bayesian nonparametrics. On the featural side, we examine
the beta process and introduce a new model, the beta negative binomial process
(BNBP), for admixture modeling.
One of the many benefits of Bayesian nonparametric processes such as the
Dirichlet process is that they can be used for modeling infinite mixture
models, thus providing a flexible answer to the question of how many clusters
exist in a data set. For the most part, such flexibility is currently lacking
in techniques based on hard clustering, such as k-means, graph cuts, and
Bregman hard clustering. For finite mixture models, there is a precise
connection between k-means and mixtures of Gaussians, obtained by an
appropriate limiting argument.
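
The limiting argument for finite mixtures can be illustrated numerically. The sketch below assumes equal mixing weights and a shared spherical covariance sigma^2 I; as sigma shrinks, the posterior cluster probabilities of a point collapse onto its nearest center, which is the k-means assignment rule. The function name `responsibilities` is an illustrative choice.

```python
import math


def responsibilities(x, centers, sigma):
    """Posterior cluster probabilities under equal-weight spherical Gaussians.

    r_k is proportional to exp(-||x - mu_k||^2 / (2 * sigma^2)); the shared
    normalizing constants cancel because all components share sigma.
    """
    squared = [sum((a - b) ** 2 for a, b in zip(x, c)) for c in centers]
    logits = [-d / (2.0 * sigma ** 2) for d in squared]
    top = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(value - top) for value in logits]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    centers = [(0.0, 0.0), (4.0, 0.0)]
    point = (1.0, 0.0)
    for sigma in (10.0, 1.0, 0.1):
        print(sigma, [round(r, 4) for r in responsibilities(point, centers, sigma)])
```

With a large sigma the point is shared almost evenly between the two components, while with a small sigma essentially all of the mass moves to the closer center.
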
This work introduces SubMF, a parallel divide-and-conquer framework for noisy
matrix factorization. SubMF divides a large-scale matrix factorization task
into smaller subproblems, solves each subproblem in parallel using an arbitrary
base matrix factorization algorithm, and combines the subproblem solutions
using techniques from randomized matrix approximation. Our experiments with
collaborative filtering, video background modeling, and simulated data
demonstrate the near-linear to super-linear speed-ups attainable with this
approach.
The beta-Bernoulli process provides a Bayesian nonparametric prior for models
involving collections of binary-valued features. A draw from the beta process
provides an infinite collection of probabilities in the unit interval, and a
draw from the Bernoulli process turns these into binary-valued features. Recent
work has shown how to derive stick-breaking representations for the beta
process, by analogy to Sethuraman's derivation of a stick-breaking
representation for the Dirichlet process.
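
As background for the analogy, the stick-breaking construction for the Dirichlet process weights can be sketched as follows. This shows the Dirichlet process case only, not the beta process representation, and it stops after a fixed number of sticks for display purposes. The function name and the truncation are assumptions of this sketch.

```python
import random


def stick_breaking_weights(alpha, num_sticks, seed=0):
    """First `num_sticks` Dirichlet process weights by stick-breaking.

    Each step breaks off a Beta(1, alpha) fraction of the stick that is
    left: pi_k = beta_k * prod_{j<k} (1 - beta_j). Returns the weights and
    the length of stick still unassigned after the last break.
    """
    rng = random.Random(seed)
    weights, remaining = [], 1.0
    for _ in range(num_sticks):
        fraction = rng.betavariate(1.0, alpha)
        weights.append(remaining * fraction)
        remaining *= 1.0 - fraction
    return weights, remaining


if __name__ == "__main__":
    weights, remaining = stick_breaking_weights(alpha=2.0, num_sticks=10, seed=0)
    print([round(w, 4) for w in weights], round(remaining, 4))
```

Smaller values of `alpha` concentrate the mass on the first few sticks, while larger values spread it over many.
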
Inspired by Random Forests (RF) in the context of classification, we propose
a new clustering ensemble method---Cluster Forests (CF). Geometrically, CF
randomly probes a high-dimensional data cloud to obtain "good local
clusterings" and then aggregates via spectral clustering to obtain cluster
assignments for the whole dataset. The search for good local clusterings is
guided by a cluster quality measure $\kappa$. CF progressively improves each
local clustering in a fashion that resembles the tree growth in RF.
Spectral clustering is a broad class of clustering procedures in which an
intractable combinatorial optimization formulation of clustering is "relaxed"
into a tractable eigenvector problem, and in which the relaxed solution is
subsequently "rounded" into an approximate discrete solution to the original
problem. In this paper we present a novel margin-based perspective on multiway
spectral clustering.
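
The relax-then-round pattern can be sketched in its simplest two-way form, which is a simplification of the multiway setting and not the margin-based method itself. The sketch assumes a connected graph with a symmetric nonnegative weight matrix and zero diagonal. It finds the second-smallest eigenvector of the unnormalized graph Laplacian by shifted power iteration and rounds it by sign. The name `fiedler_partition` is illustrative.

```python
def fiedler_partition(weights, iterations=500):
    """Two-way spectral partition of a small connected weighted graph.

    Relaxation: approximate the eigenvector of L = D - W with the second
    smallest eigenvalue, using power iteration on (shift * I - L) while
    projecting out the constant vector.
    Rounding: threshold the eigenvector at zero.
    """
    n = len(weights)
    degrees = [sum(row) for row in weights]
    shift = 2.0 * max(degrees)  # upper bound on the eigenvalues of L

    def step(v):
        out = [
            (shift - degrees[i]) * v[i]
            + sum(weights[i][j] * v[j] for j in range(n))
            for i in range(n)
        ]
        mean = sum(out) / n
        out = [x - mean for x in out]  # remove the constant eigenvector
        norm = sum(x * x for x in out) ** 0.5
        return [x / norm for x in out]

    v = step([float(i) for i in range(n)])
    for _ in range(iterations):
        v = step(v)
    return [1 if x > 0 else 0 for x in v]


if __name__ == "__main__":
    # Two triangles joined by a single weak edge between nodes 2 and 3.
    W = [[0.0] * 6 for _ in range(6)]
    for a, b, w in [(0, 1, 1), (0, 2, 1), (1, 2, 1),
                    (3, 4, 1), (3, 5, 1), (4, 5, 1), (2, 3, 0.1)]:
        W[a][b] = W[b][a] = float(w)
    print(fiedler_partition(W))
```

Thresholding at zero is only one possible rounding rule. Multiway procedures typically round several eigenvectors jointly, which is where the choice of rounding becomes consequential.
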
Statistics is a uniquely difficult field to convey to the uninitiated. It
sits astride the abstract and the concrete, the theoretical and the applied. It
has a mathematical flavor and yet it is not simply a branch of mathematics. Its
core problems blend into those of the disciplines that probe into the nature of
intelligence and thought, in particular philosophy, psychology and artificial
intelligence. Debates over foundational issues have waxed and waned, but the
field has not yet arrived at a single foundational perspective.
We consider speaker diarization, the problem of segmenting an
audio recording of a meeting into temporal segments corresponding to individual
speakers. The problem is rendered particularly difficult by the fact that we
are not allowed to assume knowledge of the number of people participating in
the meeting. To address this problem, we take a Bayesian nonparametric approach
to speaker diarization that builds on the hierarchical Dirichlet process hidden
Markov model (HDP-HMM) of Teh et al. (2006).
Heavy-tailed distributions are frequently used to enhance the robustness of
regression and classification methods to outliers in output space. Often,
however, we are confronted with "outliers" in input space, which are isolated
observations in sparsely populated regions. We show that heavy-tailed
stochastic processes (which we construct from Gaussian processes via a copula)
can be used to improve robustness of regression and classification estimators
to such outliers by selectively shrinking them more strongly in sparse regions
than in dense regions.
Many data are naturally modeled by an unobserved hierarchical structure. In
this paper we propose a flexible nonparametric prior over unknown data
hierarchies. The approach uses nested stick-breaking processes to allow for
trees of unbounded width and depth, where data can live at any node and are
infinitely exchangeable. One can view our model as providing infinite mixtures
where the components have a dependency structure corresponding to an
evolutionary diffusion down a tree.
Many complex dynamical phenomena can be effectively modeled by a system that
switches among a set of conditionally linear dynamical modes. We consider two
such models: the switching linear dynamical system (SLDS) and the switching
vector autoregressive (VAR) process. Our Bayesian nonparametric approach
utilizes a hierarchical Dirichlet process prior to learn an unknown number of
persistent, smooth dynamical modes.
Modern Web services, such as those at Google, Yahoo!, and Amazon, handle
billions of requests per day on clusters of thousands of computers. Because
these services operate under strict performance requirements, a statistical
understanding of their performance is of great practical interest. Such
services are modeled by networks of queues, where one queue models each of the
individual computers in the system. A key challenge is that the data is
incomplete, because recording detailed information about every request to a
heavily used system can require unacceptable overhead.
We present the nested Chinese restaurant process (nCRP), a stochastic process
that assigns probability distributions to infinitely deep,
infinitely branching trees. We show how this stochastic process can be used as
a prior distribution in a Bayesian nonparametric model of document collections.
Specifically, we present an application to information retrieval in which
documents are modeled as paths down a random tree, and the preferential
attachment dynamics of the nCRP lead to clustering of documents according to
sharing of topics at multiple levels of abstraction.
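
The way documents come to share paths can be sketched by sampling from the nested process directly. The version below cuts paths off at a fixed depth for illustration, whereas the process described above is infinitely deep. The names `sample_ncrp_paths` and `gamma` (the concentration parameter) are assumptions of this sketch.

```python
import random


def sample_ncrp_paths(num_documents, depth, gamma, seed=0):
    """Sample one root-to-level-`depth` path per document from a nested CRP.

    At every node, a document follows existing child k with probability
    m_k / (n + gamma) and opens a new child with probability
    gamma / (n + gamma), where m_k counts earlier documents that took
    child k and n is their total. Nodes are identified by the tuple of
    child indices leading to them from the root.
    """
    rng = random.Random(seed)
    child_counts = {}  # node -> list of visit counts, one per child
    paths = []
    for _ in range(num_documents):
        node = ()
        for _level in range(depth):
            counts = child_counts.setdefault(node, [])
            threshold = rng.random() * (sum(counts) + gamma)
            choice = len(counts)  # default: open a new child
            for k, m in enumerate(counts):
                threshold -= m
                if threshold < 0:
                    choice = k
                    break
            if choice == len(counts):
                counts.append(0)
            counts[choice] += 1
            node = node + (choice,)
        paths.append(node)
    return paths


if __name__ == "__main__":
    for path in sample_ncrp_paths(8, 3, 1.0, seed=0):
        print(path)
```

Documents whose paths share a long prefix share nodes near the root and diverge only deeper in the tree, which is how the preferential attachment behavior produces grouping at several levels at once.
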
We present a new methodology for sufficient dimension reduction (SDR). Our
methodology derives directly from the formulation of SDR in terms of the
conditional independence of the covariate $X$ from the response $Y$, given the
projection of $X$ on the central subspace [cf. J. Amer. Statist. Assoc. 86
(1991) 316--342 and Regression Graphics (1998) Wiley].