Guy Lebanon

  1. A Comparative Study of Collaborative Filtering Algorithms.

    Authors: Guy Lebanon, Joonseok Lee, Mingxuan Sun
    Subjects: Information Retrieval
    Abstract

    Collaborative filtering is a rapidly advancing research area. Every year
    several new techniques are proposed and yet it is not clear which of the
    techniques work best and under what conditions. In this paper we conduct a
    study comparing several collaborative filtering techniques -- both classic and
    recent state-of-the-art -- in a variety of experimental contexts. Specifically,
    we report conclusions controlling for number of items, number of users,
    sparsity level, performance criteria, and computational complexity.

  2. Domain Knowledge Uncertainty and Probabilistic Parameter Constraints.

    Authors: Guy Lebanon, Yi Mao
    Subjects: Learning
    Abstract

    Incorporating domain knowledge into the modeling process is an effective way
    to improve learning accuracy. However, as it is provided by humans, domain
    knowledge can only be specified with some degree of uncertainty. We propose to
    explicitly model such uncertainty through probabilistic constraints over the
    parameter space. In contrast to hard parameter constraints, our approach is
    effective even when the domain knowledge is inaccurate and generally results in
    superior modeling accuracy.

  3. Beyond Sentiment: The Manifold of Human Emotions.

    Authors: Guy Lebanon, Seungyeon Kim, Fuxin Li, Irfan Essa
    Subjects: Computation and Language (Computational Linguistics and Natural Language and Speech Processing)
    Abstract

    Sentiment analysis predicts the presence of positive or negative emotions in
    a text document. In this paper we consider higher dimensional extensions of the
    sentiment concept, which represent a richer set of human emotions. Our approach
    goes beyond previous work in that our model contains a continuous manifold
    rather than a finite set of human emotions. We investigate the resulting model,
    compare it to psychological observations, and explore its predictive
    capabilities.

  4. Statistical and Computational Tradeoffs in Stochastic Composite Likelihood.

    Authors: Joshua V Dillon, Guy Lebanon
    Subjects: Learning
    Abstract

    Maximum likelihood estimators are often of limited practical use due to the
    intensive computation they require. We propose a family of alternative
    estimators that maximize a stochastic variation of the composite likelihood
    function. Each of the estimators resolves the computation-accuracy tradeoff
    differently, and taken together they span a continuous spectrum of
    computation-accuracy tradeoff resolutions. We prove the consistency of the
    estimators and provide formulas for their asymptotic variance, statistical
    robustness, and computational complexity.

  5. Unsupervised Supervised Learning II: Training Margin Based Classifiers without Labels.

    Authors: Krishnakumar Balasubramanian, Guy Lebanon, Pinar Donmez
    Subjects: Learning
    Abstract

    Many popular linear classifiers, such as logistic regression, boosting, or
    SVM, are trained by optimizing a margin-based risk function. Traditionally,
    these risk functions are computed based on a labeled dataset. We develop a
    novel technique for estimating such risks using only unlabeled data and p(y).
    We prove that the technique is consistent for high-dimensional linear
    classifiers and demonstrate it on synthetic and real-world data.

  6. Linguistic Geometries for Unsupervised Dimensionality Reduction.

    Authors: Krishnakumar Balasubramanian, Guy Lebanon, Yi Mao
    Subjects: Computation and Language (Computational Linguistics and Natural Language and Speech Processing)
    Abstract

    Text documents are complex high dimensional objects. To effectively visualize
    such data it is important to reduce its dimensionality and visualize the low
    dimensional embedding as a 2-D or 3-D scatter plot. In this paper we explore
    dimensionality reduction methods that draw upon domain knowledge in order to
    achieve a better low dimensional embedding and visualization of documents. We
    consider the use of geometries specified manually by an expert, geometries
    derived automatically from corpus statistics, and geometries computed from
    linguistic resources.

  7. Asymptotic Analysis of Generative Semi-Supervised Learning.

    Authors: Joshua V Dillon, Krishnakumar Balasubramanian, Guy Lebanon
    Subjects: Learning
    Abstract

    Semi-supervised learning has emerged as a popular framework for improving
    modeling accuracy while controlling labeling cost. Based on an extension of
    stochastic composite likelihood we quantify the asymptotic accuracy of
    generative semi-supervised learning. In doing so, we complement
    distribution-free analysis by providing an alternative framework to measure the
    value associated with different labeling policies and resolve the fundamental
    question of how much data to label and in what manner.
