Michael I. Jordan

  1. Coherence Functions with Applications in Large-Margin Classification Methods.

    Authors: Michael I. Jordan, Zhihua Zhang, Guang Dai
    Subjects: Machine Learning
    Abstract

    Support vector machines (SVMs) naturally embody sparseness due to their use
    of hinge loss functions. However, SVMs cannot directly estimate conditional
    class probabilities. In this paper we propose and study a family of coherence
    functions, which are convex and differentiable, as surrogates of the hinge
    function. The coherence function is derived by using the maximum-entropy
    principle and is characterized by a temperature parameter. It bridges the hinge
    function and the logit function in logistic regression.
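
    As a rough illustration of the bridging behaviour described above, the
    sketch below assumes the coherence function takes the form
    rho * log(1 + exp((1 - z) / rho)), where z is the margin and rho > 0 is the
    temperature. That form is an assumption, not quoted from the abstract, and
    the helper names are hypothetical. Under it, the gap to the hinge function
    shrinks as rho approaches 0, and at rho = 1 the function equals the
    logistic loss evaluated at z - 1.

```python
import math


def hinge(z):
    """Hinge loss as a function of the margin z."""
    return max(0.0, 1.0 - z)


def coherence(z, rho):
    """Assumed coherence function: rho * log(1 + exp((1 - z) / rho)).

    Uses log(1 + e^t) = max(t, 0) + log1p(e^(-|t|)) to avoid overflow
    when rho is small.
    """
    t = (1.0 - z) / rho
    return rho * (max(t, 0.0) + math.log1p(math.exp(-abs(t))))


if __name__ == "__main__":
    margins = (-2.0, 0.0, 1.0, 3.0)
    for rho in (1.0, 0.1, 0.01):
        # Largest gap to the hinge loss over a few sample margins.
        gap = max(abs(coherence(z, rho) - hinge(z)) for z in margins)
        print(rho, gap)
```

    Under this assumed form the function stays smooth for every rho > 0, which
    is what makes it usable where the hinge function's kink is inconvenient.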

  2. The Asymptotics of Ranking Algorithms.

    Authors: Michael I. Jordan, John C. Duchi, Lester Mackey
    Subjects: Statistics
    Abstract

    We consider the predictive problem of supervised ranking, where the task is
    to rank sets of candidate items returned in response to queries. Although there
    exist statistical procedures that come with guarantees of consistency in this
    setting, these procedures require that individuals provide a complete ranking
    of all items, which is rarely feasible in practice. Instead, individuals
    routinely provide partial preference information, such as pairwise comparisons
    of items, and more practical approaches to ranking have aimed at modeling this
    partial preference data directly.

  3. Modeling Events with Cascades of Poisson Processes.

    Authors: Michael I. Jordan, Aleksandr Simma
    Subjects: Artificial Intelligence
    Abstract

    We present a probabilistic model of events in continuous time in which each
    event triggers a Poisson process of successor events. The ensemble of observed
    events is thereby modeled as a superposition of Poisson processes. Efficient
    inference is feasible under this model with an EM algorithm. Moreover, the EM
    algorithm can be implemented as a distributed algorithm, permitting the model
    to be applied to very large datasets. We apply these techniques to the modeling
    of Twitter messages and the revision history of Wikipedia.
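
    The generative side of such a model can be sketched as a simulation. The
    code below covers only forward sampling, not the EM inference the abstract
    describes. It assumes a constant background rate, a Poisson number of
    successors per event, and exponentially distributed delays. All function
    and parameter names are hypothetical.

```python
import math
import random


def poisson(rng, lam):
    """Draw a Poisson(lam) variate with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_cascade(rate, branching, decay, horizon, seed=0):
    """Return a list of (time, parent_index) pairs on [0, horizon).

    Background events arrive at the given constant rate and have parent None.
    Each event triggers Poisson(branching) successors, each delayed by an
    Exponential(decay) waiting time (an assumed delay kernel).
    """
    rng = random.Random(seed)
    n_roots = poisson(rng, rate * horizon)
    events = [(rng.uniform(0.0, horizon), None) for _ in range(n_roots)]
    i = 0
    while i < len(events):  # events appended below are processed in turn
        t, _parent = events[i]
        for _ in range(poisson(rng, branching)):
            child_time = t + rng.expovariate(decay)
            if child_time < horizon:
                events.append((child_time, i))
        i += 1
    return events


if __name__ == "__main__":
    sample = simulate_cascade(rate=2.0, branching=0.5, decay=1.0, horizon=50.0)
    print(len(sample), "events,", sum(p is None for _, p in sample), "background")
```

    Keeping branching below 1 means each event has fewer than one successor on
    average, so the simulated cascades die out.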

  4. Joint Modeling of Multiple Related Time Series via the Beta Process.

    Authors: Michael I. Jordan, Alan S. Willsky, Emily B. Fox, Erik B. Sudderth
    Subjects: Methodology
    Abstract

    We propose a Bayesian nonparametric approach to the problem of jointly
    modeling multiple related time series. Our approach is based on the discovery
    of a set of latent, shared dynamical behaviors. Using a beta process prior, the
    size of the set and the sharing pattern are both inferred from data. We develop
    efficient Markov chain Monte Carlo methods based on the Indian buffet process
    representation of the predictive distribution of the beta process, without
    relying on a truncated model.

  5. Combinatorial clustering and the beta negative binomial process.

    Authors: Michael I. Jordan, Tamara Broderick, Lester Mackey, John Paisley
    Subjects: Methodology
    Abstract

    In this work, we establish novel connections between the Bayesian
    nonparametric clustering and featural paradigms by considering the problem of
    admixture modeling. We examine the Dirichlet process, and its unnormalized
    Poisson point process generation via the gamma process, on the traditional
    clustering side of Bayesian nonparametrics. On the featural side, we examine
    the beta process and introduce a new model, the beta negative binomial process
    (BNBP), for admixture modeling.

  6. Revisiting k-means: New Algorithms via Bayesian Nonparametrics.

    Authors: Michael I. Jordan, Brian Kulis
    Subjects: Learning
    Abstract

    One of the many benefits of Bayesian nonparametric processes such as the
    Dirichlet process is that they can be used for modeling infinite mixture
    models, thus providing a flexible answer to the question of how many clusters
    exist in a data set. For the most part, such flexibility is currently lacking
    in techniques based on hard clustering, such as k-means, graph cuts, and
    Bregman hard clustering. For finite mixture models, there is a precise
    connection between k-means and mixtures of Gaussians, obtained by an
    appropriate limiting argument.
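
    The limiting argument for finite mixtures can be illustrated numerically.
    The sketch below assumes a mixture of spherical Gaussians with equal
    weights and a shared standard deviation sigma. As sigma shrinks, the soft
    assignment of a point to each component approaches the hard
    nearest-center assignment used by k-means. The function name and the toy
    data are hypothetical.

```python
import math


def responsibilities(x, centers, sigma):
    """Posterior component probabilities for point x.

    Assumes equal mixing weights and a shared spherical covariance
    sigma^2 * I. The smallest squared distance is subtracted before
    exponentiating so the largest term is exactly 1.
    """
    sq_dists = [sum((a - b) ** 2 for a, b in zip(x, c)) for c in centers]
    nearest = min(sq_dists)
    weights = [math.exp(-(d - nearest) / (2.0 * sigma ** 2)) for d in sq_dists]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    point = (0.4, 0.0)
    centers = [(0.0, 0.0), (1.0, 0.0)]
    for sigma in (10.0, 1.0, 0.1, 0.01):
        print(sigma, responsibilities(point, centers, sigma))
```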

  7. Divide-and-Conquer Matrix Factorization.

    Authors: Michael I. Jordan, Ameet Talwalkar, Lester Mackey
    Subjects: Learning
    Abstract

    This work introduces SubMF, a parallel divide-and-conquer framework for noisy
    matrix factorization. SubMF divides a large-scale matrix factorization task
    into smaller subproblems, solves each subproblem in parallel using an arbitrary
    base matrix factorization algorithm, and combines the subproblem solutions
    using techniques from randomized matrix approximation. Our experiments with
    collaborative filtering, video background modeling, and simulated data
    demonstrate the near-linear to super-linear speed-ups attainable with this
    approach.

  8. Beta processes, stick-breaking, and power laws.

    Authors: Michael I. Jordan, Jim Pitman, Tamara Broderick
    Subjects: Methodology
    Abstract

    The beta-Bernoulli process provides a Bayesian nonparametric prior for models
    involving collections of binary-valued features. A draw from the beta process
    provides an infinite collection of probabilities in the unit interval, and a
    draw from the Bernoulli process turns these into binary-valued features. Recent
    work has shown how to derive stick-breaking representations for the beta
    process, by analogy to Sethuraman's derivation of a stick-breaking
    representation for the Dirichlet process.
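
    For reference, the Dirichlet process construction that the analogy starts
    from can be sketched as follows. This is the standard stick-breaking
    scheme for Dirichlet process weights, truncated to a finite number of
    atoms for illustration. It is not the beta process representation the
    abstract refers to. The function name and the truncation are assumptions.

```python
import random


def stick_breaking(alpha, n_atoms, seed=0):
    """Return (weights, remaining) for a truncated stick-breaking draw.

    Each step breaks off a Beta(1, alpha) fraction of the stick that is
    still left. 'remaining' is the mass not yet assigned to any atom.
    """
    rng = random.Random(seed)
    remaining = 1.0
    weights = []
    for _ in range(n_atoms):
        fraction = rng.betavariate(1.0, alpha)
        weights.append(remaining * fraction)
        remaining *= 1.0 - fraction
    return weights, remaining


if __name__ == "__main__":
    weights, leftover = stick_breaking(alpha=2.0, n_atoms=10)
    print(weights)
    print("unassigned mass:", leftover)
```

    Larger values of alpha break off smaller pieces on average, so the mass
    spreads over more atoms.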

  9. Cluster Forests.

    Authors: Michael I. Jordan, Donghui Yan, Aiyou Chen
    Subjects: Methodology
    Abstract

    Inspired by Random Forests (RF) in the context of classification, we propose
    a new clustering ensemble method, Cluster Forests (CF). Geometrically, CF
    randomly probes a high-dimensional data cloud to obtain "good local
    clusterings" and then aggregates via spectral clustering to obtain cluster
    assignments for the whole dataset. The search for good local clusterings is
    guided by a cluster quality measure $\kappa$. CF progressively improves each
    local clustering in a fashion that resembles the tree growth in RF.

  10. Multiway Spectral Clustering: A Margin-Based Perspective.

    Authors: Michael I. Jordan, Zhihua Zhang
    Subjects: Methodology
    Abstract

    Spectral clustering is a broad class of clustering procedures in which an
    intractable combinatorial optimization formulation of clustering is "relaxed"
    into a tractable eigenvector problem, and in which the relaxed solution is
    subsequently "rounded" into an approximate discrete solution to the original
    problem. In this paper we present a novel margin-based perspective on multiway
    spectral clustering.

  11. Leo Breiman.

    Authors: Michael I. Jordan
    Subjects: Applications
    Abstract

    Statistics is a uniquely difficult field to convey to the uninitiated. It
    sits astride the abstract and the concrete, the theoretical and the applied. It
    has a mathematical flavor and yet it is not simply a branch of mathematics. Its
    core problems blend into those of the disciplines that probe into the nature of
    intelligence and thought, in particular philosophy, psychology and artificial
    intelligence. Debates over foundational issues have waxed and waned, but the
    field has not yet arrived at a single foundational perspective.

  12. A Sticky HDP-HMM with Application to Speaker Diarization.

    Authors: Michael I. Jordan, Alan S. Willsky, Emily B. Fox, Erik B. Sudderth
    Subjects: Methodology
    Abstract

    We consider the problem of speaker diarization, the problem of segmenting an
    audio recording of a meeting into temporal segments corresponding to individual
    speakers. The problem is rendered particularly difficult by the fact that we
    are not allowed to assume knowledge of the number of people participating in
    the meeting. To address this problem, we take a Bayesian nonparametric approach
    to speaker diarization that builds on the hierarchical Dirichlet process hidden
    Markov model (HDP-HMM) of Teh et al. (2006).

  13. Heavy-Tailed Processes for Selective Shrinkage.

    Authors: Michael I. Jordan, Fabian L. Wauthier
    Subjects: Machine Learning
    Abstract

    Heavy-tailed distributions are frequently used to enhance the robustness of
    regression and classification methods to outliers in output space. Often,
    however, we are confronted with "outliers" in input space, which are isolated
    observations in sparsely populated regions. We show that heavy-tailed
    stochastic processes (which we construct from Gaussian processes via a copula)
    can be used to improve robustness of regression and classification estimators
    to such outliers by selectively shrinking them more strongly in sparse regions
    than in dense regions.

  14. Tree-Structured Stick Breaking Processes for Hierarchical Data.

    Authors: Michael I. Jordan, Ryan Prescott Adams, Zoubin Ghahramani
    Subjects: Methodology
    Abstract

    Many data are naturally modeled by an unobserved hierarchical structure. In
    this paper we propose a flexible nonparametric prior over unknown data
    hierarchies. The approach uses nested stick-breaking processes to allow for
    trees of unbounded width and depth, where data can live at any node and are
    infinitely exchangeable. One can view our model as providing infinite mixtures
    where the components have a dependency structure corresponding to an
    evolutionary diffusion down a tree.

  15. Bayesian Nonparametric Inference of Switching Linear Dynamical Systems.

    Authors: Michael I. Jordan, Alan S. Willsky, Emily B. Fox, Erik B. Sudderth
    Subjects: Methodology
    Abstract

    Many complex dynamical phenomena can be effectively modeled by a system that
    switches among a set of conditionally linear dynamical modes. We consider two
    such models: the switching linear dynamical system (SLDS) and the switching
    vector autoregressive (VAR) process. Our Bayesian nonparametric approach
    utilizes a hierarchical Dirichlet process prior to learn an unknown number of
    persistent, smooth dynamical modes.

  16. Bayesian Inference in Queueing Networks.

    Authors: Michael I. Jordan, Charles Sutton
    Subjects: Machine Learning
    Abstract

    Modern Web services, such as those at Google, Yahoo!, and Amazon, handle
    billions of requests per day on clusters of thousands of computers. Because
    these services operate under strict performance requirements, a statistical
    understanding of their performance is of great practical interest. Such
    services are modeled by networks of queues, where one queue models each of the
    individual computers in the system. A key challenge is that the data is
    incomplete, because recording detailed information about every request to a
    heavily used system can require unacceptable overhead.

  17. The nested Chinese restaurant process and Bayesian nonparametric inference of topic hierarchies.

    Authors: Michael I. Jordan, David M. Blei, Thomas L. Griffiths
    Subjects: Machine Learning
    Abstract

    We present the nested Chinese restaurant process (nCRP), a stochastic process
    which assigns probability distributions to infinitely-deep,
    infinitely-branching trees. We show how this stochastic process can be used as
    a prior distribution in a Bayesian nonparametric model of document collections.
    Specifically, we present an application to information retrieval in which
    documents are modeled as paths down a random tree, and the preferential
    attachment dynamics of the nCRP leads to clustering of documents according to
    sharing of topics at multiple levels of abstraction.
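
    The preferential attachment dynamics can be sketched by sampling document
    paths directly. The code below assumes the tree is truncated at a fixed
    depth and that a single concentration parameter gamma applies at every
    node. Both are simplifications for illustration, and the names are
    hypothetical. At each node a document follows an existing branch with
    probability proportional to the number of earlier documents that took it,
    or opens a new branch with probability proportional to gamma.

```python
import random


def sample_ncrp_paths(n_docs, depth, gamma, seed=0):
    """Sample one root-to-leaf path per document from a depth-truncated nCRP.

    A path is a tuple of child indices, one per level below the root.
    """
    rng = random.Random(seed)
    child_counts = {}  # node (tuple of indices) -> visit count per child
    paths = []
    for _doc in range(n_docs):
        node = ()
        for _level in range(depth):
            counts = child_counts.setdefault(node, [])
            u = rng.random() * (sum(counts) + gamma)
            choice = len(counts)  # falls through to a new branch
            running = 0.0
            for index, count in enumerate(counts):
                running += count
                if u < running:
                    choice = index
                    break
            if choice == len(counts):
                counts.append(0)
            counts[choice] += 1
            node = node + (choice,)
        paths.append(node)
    return paths


if __name__ == "__main__":
    for path in sample_ncrp_paths(n_docs=8, depth=3, gamma=1.0):
        print(path)
```

    Documents whose paths share a prefix share the nodes along that prefix,
    which is how grouping at several levels arises in this sketch.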

  18. Kernel dimension reduction in regression.

    Authors: Kenji Fukumizu, Francis R. Bach, Michael I. Jordan
    Subjects: Statistics
    Abstract

    We present a new methodology for sufficient dimension reduction (SDR). Our
    methodology derives directly from the formulation of SDR in terms of the
    conditional independence of the covariate $X$ from the response $Y$, given the
    projection of $X$ on the central subspace [cf. J. Amer. Statist. Assoc. 86
    (1991) 316-342 and Regression Graphics (1998) Wiley].
