Learning

  1. On the Performance of Maximum Likelihood Inverse Reinforcement Learning.

    Authors: Héctor Ratia, Luis Montesano, Ruben Martinez-Cantin
    Subjects: Learning
    Abstract

    Inverse reinforcement learning (IRL) addresses the problem of recovering a
    task description given a demonstration of the optimal policy used to solve such
    a task. The optimal policy is usually provided by an expert or teacher, making
    IRL specially suitable for the problem of apprenticeship learning. The task
    description is encoded in the form of a reward function of a Markov decision
    process (MDP). Several algorithms have been proposed to find the reward
    function corresponding to a set of demonstrations.

  2. Information Forests.

    Authors: Zhao Yi, Stefano Soatto, Maneesh Dewan, Yiqiang Zhan
    Subjects: Learning
    Abstract

    We describe Information Forests, an approach to classification that
    generalizes Random Forests by replacing the splitting criterion of non-leaf
    nodes from a discriminative one -- based on the entropy of the label
    distribution -- to a generative one -- based on maximizing the information
    divergence between the class-conditional distributions in the resulting
    partitions. The basic idea consists of deferring classification until a measure
    of "classification confidence" is sufficiently high, and instead breaking down
    the data so as to maximize this measure.

  3. Random ferns method implementation for the general-purpose machine learning.

    Authors: Miron B. Kursa
    Subjects: Learning
    Abstract

    In this paper I present an extended implementation of the Random ferns
    algorithm contained in the R package rFerns.

    It differs from the original by the ability of consuming categorical and
    numerical attributes instead of only binary ones. Also, instead of using simple
    attribute subspace ensemble it employs bagging and thus produce error
    approximation and variable importance measure modelled after Random forest
    algorithm.

  4. Cramer Rao-Type Bounds for Sparse Bayesian Learning.

    Authors: Chandra R. Murthy, Ranjitha Prasad
    Subjects: Learning
    Abstract

    In this paper, we derive Hybrid, Bayesian and Marginalized Cramer Rao Lower
    Bounds (HCRB, BCRB and MCRB) for the single measurement vector and multiple
    measurement vector Sparse Bayesian Learning (SBL) problem of estimating
    compressible vectors and their prior distribution parameters. We assume the
    unknown vector to be drawn from a compressible Student-t prior distribution. We
    derive CRBs that encompass the deterministic or random nature of the unknown
    parameters of the prior distribution and the regression noise variance.

  5. A Reconstruction Error Formulation for Semi-Supervised Multi-task and Multi-view Learning.

    Authors: Buyue Qian, Xiang Wang, Ian Davidson
    Subjects: Learning
    Abstract

    A significant challenge to make learning techniques more suitable for general
    purpose use is to move beyond i) complete supervision, ii) low dimensional
    data, iii) a single task and single view per instance. Solving these challenges
    allows working with "Big Data" problems that are typically high dimensional
    with multiple (but possibly incomplete) labelings and views.

  6. Support Distribution Machines.

    Authors: Barnabás Póczos, Liang Xiong, Dougal J. Sutherland, Jeff Schneider
    Subjects: Learning
    Abstract

    Most machine learning algorithms, such as classification or regression, treat
    the individual data point as the object of interest. Here we consider extending
    machine learning algorithms to operate on groups of data points. We suggest
    treating a group of data points as a set of i.i.d. samples from an underlying
    feature distribution for the group. Our approach is to generalize kernel
    machines from vectorial inputs to i.i.d. sample sets of vectors.

  7. Active Learning of Custering with Side Information Using $\eps$-Smooth Relative Regret Approximations.

    Authors: Nir Ailon, Ron Begleiter
    Subjects: Learning
    Abstract

    Clustering is considered a non-supervised learning setting, in which the goal
    is to partition a collection of data points into disjoint clusters. Often a
    bound $k$ on the number of clusters is given or assumed by the practitioner.
    Many versions of this problem have been defined, most notably $k$-means and
    $k$-median.

  8. Joint Approximation of Information and Distributed Link-Scheduling Decisions in Wireless Networks.

    Authors: Sung-eok Jeon, Chuanyi Ji
    Subjects: Learning
    Abstract

    For a large multi-hop wireless network, nodes are preferable to make
    distributed and localized link-scheduling decisions with only interactions
    among a small number of neighbors. However, for a slowly decaying channel and
    densely populated interferers, a small size neighborhood often results in
    nontrivial link outages and is thus insufficient for making optimal scheduling
    decisions. A question arises how to deal with the information outside a
    neighborhood in distributed link-scheduling. In this work, we develop joint
    approximation of information and distributed link scheduling.

  9. Stochastic Low-Rank Kernel Learning for Regression.

    Authors: Liva Ralaivola, Pierre Machart, Thomas Peel, Sandrine Anthoine, Hervé Glotin
    Subjects: Learning
    Abstract

    We present a novel approach to learn a kernel-based regression function. It
    is based on the useof conical combinations of data-based parameterized kernels
    and on a new stochastic convex optimization procedure of which we establish
    convergence guarantees. The overall learning procedure has the nice properties
    that a) the learned conical combination is automatically designed to perform
    the regression task at hand and b) the updates implicated by the optimization
    procedure are quite inexpensive.

  10. Sparse Reward Processes.

    Authors: Christos Dimitrakakis
    Subjects: Learning
    Abstract

    We introduce a class of learning problems where the agent is presented with a
    series of tasks. Intuitively, if there is relation among those tasks, then the
    information gained during execution of one task has value for the execution of
    another task. Consequently, the agent is intrinsically motivated to explore its
    environment beyond the degree necessary to solve the current task it has at
    hand. We develop a decision theoretic setting that generalises standard
    reinforcement learning tasks and captures this intuition.

  11. Automatic Detection of Diabetes Diagnosis using Feature Weighted Support Vector Machines based on Mutual Information and Modified Cuckoo Search.

    Authors: Davar Giveki, Hamid Salimi, GholamReza Bahmanyar, Younes Khademian
    Subjects: Learning
    Abstract

    Diabetes is a major health problem in both developing and developed countries
    and its incidence is rising dramatically. In this study, we investigate a novel
    automatic approach to diagnose Diabetes disease based on Feature Weighted
    Support Vector Machines (FW-SVMs) and Modified Cuckoo Search (MCS). The
    proposed model consists of three stages: Firstly, PCA is applied to select an
    optimal subset of features out of set of all the features. Secondly, Mutual
    Information is employed to construct the FWSVM by weighting different features
    based on their degree of importance.

  12. Feature Selection via Regularized Trees.

    Authors: Houtao Deng, George Runger
    Subjects: Learning
    Abstract

    Feature selection methods such as wrappers may produce high-quality feature
    subsets, however they often require building multiple models and are
    time-consuming. Motivated by a fresh perspective on the relationship between an
    information-based forward feature selection criterion and decision trees, in
    this paper we propose a tree regularization framework, in which only a single
    model needs to be built. The framework enables many tree models to perform
    feature selection. The key idea of the regularization framework is to penalize
    using a new feature for splitting when its gain (e.g.

  13. A Thermodynamical Approach for Probability Estimation.

    Authors: Takashi Isozaki
    Subjects: Learning
    Abstract

    The issue of discrete probability estimation for samples of small size is
    addressed in this study. The maximum likelihood method often suffers
    over-fitting when insufficient data is available. Although the Bayesian
    approach can avoid over-fitting by using prior distributions, it still has
    problems with objective analysis. In response to these drawbacks, a new
    theoretical framework based on thermodynamics, where energy and temperature are
    introduced, was developed. Entropy and likelihood are placed at the center of
    this method.

  14. TMBP: A Topic Modeling Toolbox Using Belief Propagation.

    Authors: Jia Zeng
    Subjects: Learning
    Abstract

    Latent Dirichlet allocation (LDA) is an important class of hierarchical
    Bayesian models for probabilistic topic modeling, which attracts worldwide
    interests and touches on many important applications in text mining, computer
    vision and computational biology. This paper introduces a topic modeling
    toolbox (TMBP) based on the belief propagation (BP) algorithms. This toolbox is
    implemented by MEX C++/MATLAB platform for either Windows or Linux.

  15. Combining One-Class Classifiers via Meta-Learning.

    Authors: Eitan Menahem Lior Rokach, Yuval Elovici
    Subjects: Learning
    Abstract

    We examine various methods for combining the output of one-class models. In
    particular, we show that simple meta-learning based ensemble achieves better
    result than weighting methods. Furthermore we propose a new one-class ensemble
    scheme, called TUPSO that uses meta-learning for combining multiple one-class
    classifiers. We also present a new one-class classification performance
    measures to weigh the base-classifiers, a process that proved helpful for
    increasing the classification performance of the induced ensemble.

  16. An Optimal Algorithm for Linear Bandits.

    Authors: Sham Kakade, Nicolò Cesa-Bianchi
    Subjects: Learning
    Abstract

    We provide the first algorithm for online bandit linear optimization whose
    regret after T rounds is of order sqrt{Td ln N} on any finite class X of N
    actions in d dimensions, and of order d*sqrt{T} (up to log factors) when X is
    infinite. These bounds are not improvable in general. The basic idea utilizes
    tools from convex geometry to construct what is essentially an optimal
    exploration basis. We also present an application to a model of linear bandits
    with expert advice.

  17. Alignment Based Kernel Learning with a Continuous Set of Base Kernels.

    Authors: Csaba Szepesvari, Arash Afkanpour, Michael Bowling
    Subjects: Learning
    Abstract

    The success of kernel-based learning methods depend on the choice of kernel.
    Recently, kernel learning methods have been proposed that use data to select
    the most appropriate kernel, usually by combining a set of base kernels. We
    introduce a new algorithm for kernel learning that combines a {\em continuous
    set of base kernels}, without the common step of discretizing the space of base
    kernels. We demonstrate that our new method achieves state-of-the-art
    performance across a variety of real-world datasets.

  18. Bipartite ranking algorithm for classification and survival analysis.

    Authors: Marina Sapir
    Subjects: Learning
    Abstract

    Unsupervised aggregation of independently built univariate predictors is
    explored as an alternative regularization approach for noisy, sparse datasets.
    Bipartite ranking algorithm Smooth Rank implementing this approach is
    introduced. The advantages of this algorithm are demonstrated on two types of
    problems. First, Smooth Rank is applied to two-class problems from bio-medical
    field, where ranking is often preferable to classification. In comparison
    against SVMs with radial and linear kernels, Smooth Rank had the best
    performance on 8 out of 12 benchmark benchmarks.

  19. Extended UCB Policy for Multi-Armed Bandit with Light-Tailed Reward Distributions.

    Authors: Qing Zhao, Keqin Liu
    Subjects: Learning
    Abstract

    We consider the multi-armed bandit problems in which a player aims to accrue
    reward by sequentially playing a given set of arms with unknown reward
    statistics. In the classic work, policies were proposed to achieve the optimal
    logarithmic regret order for some special classes of light-tailed reward
    distributions, e.g., Auer et al.'s UCB1 index policy for reward distributions
    with finite support.

  20. Learning a Factor Model via Regularized PCA.

    Authors: Yi-hao Kao, Benjamin Van Roy
    Subjects: Learning
    Abstract

    We consider the problem of learning a linear factor model with an unknown
    number of factors. We propose a regularized form of principal component
    analysis (PCA) and demonstrate through experiments with synthetic and real data
    the superiority of resulting estimates to those produced by pre-existing factor
    analysis approaches. We also establish theoretical results that elucidate the
    manner in which our algorithm corrects biases induced by conventional PCA.

  21. Optimization with Sparsity-Inducing Penalties.

    Authors: Francis Bach, Rodolphe Jenatton, Guillaume Obozinski, Julien Mairal
    Subjects: Learning
    Abstract

    Sparse estimation methods are aimed at using or obtaining parsimonious
    representations of data or models. They were first dedicated to linear variable
    selection but numerous extensions have now emerged such as structured sparsity
    or kernel selection. It turns out that many of the related estimation problems
    can be cast as convex optimization problems by regularizing the empirical risk
    with appropriate non-smooth norms. The goal of this paper is to present from a
    general perspective optimization tools and techniques dedicated to such
    sparsity-inducing penalties.

  22. Analysis of Thompson Sampling for the multi-armed bandit problem.

    Authors: Shipra Agrawal, Navin Goyal
    Subjects: Learning
    Abstract

    The multi-armed bandit problem is a popular model for studying
    exploration/exploitation trade-off in sequential decision problems. Many
    algorithms are now available for this well-studied problem. One of the earliest
    algorithms, given by W. R. Thompson, dates back to 1933. This algorithm,
    referred to as Thompson Sampling, is a natural Bayesian algorithm. The basic
    idea is to choose an arm to play according to its probability of being the best
    arm. Thompson Sampling algorithm has experimentally been shown to be close to
    optimal.

  23. Nonparametric Bayesian Estimation of Periodic Functions.

    Authors: Yuyang Wang, Roni Khardon, Pavlos Protopapas
    Subjects: Learning
    Abstract

    Many real world problems exhibit patterns that have periodic behavior. For
    example, in astrophysics, periodic variable stars play a pivotal role in
    understanding our universe. An important step when analyzing data from such
    processes is the problem of identifying the period: estimating the period of a
    periodic function based on noisy observations made at irregularly spaced time
    points. This problem is still a difficult challenge despite extensive study in
    different disciplines. The paper makes several contributions toward solving
    this problem.

  24. Online Learning with Preference Feedback.

    Authors: Thorsten Joachims, Pannagadatta K. Shivaswamy
    Subjects: Learning
    Abstract

    We propose a new online learning model for learning with preference feedback.
    The model is especially suited for applications like web search and recommender
    systems, where preference data is readily available from implicit user feedback
    (e.g. clicks). In particular, at each time step a potentially structured object
    (e.g. a ranking) is presented to the user in response to a context (e.g.
    query), providing him or her with some unobserved amount of utility. As
    feedback the algorithm receives an improved object that would have provided
    higher utility.

  25. Revisiting k-means: New Algorithms via Bayesian Nonparametrics.

    Authors: Michael I. Jordan, Brian Kulis
    Subjects: Learning
    Abstract

    One of the many benefits of Bayesian nonparametric processes such as the
    Dirichlet process is that they can be used for modeling infinite mixture
    models, thus providing a flexible answer to the question of how many clusters
    exist in a data set. For the most part, such flexibility is currently lacking
    in techniques based on hard clustering, such as k-means, graph cuts, and
    Bregman hard clustering. For finite mixture models, there is a precise
    connection between k-means and mixtures of Gaussians, obtained by an
    appropriate limiting argument.

  26. Kernel Topic Models.

    Authors: David Stern, Philipp Hennig, Ralf Herbrich, Thore Graepel
    Subjects: Learning
    Abstract

    Latent Dirichlet Allocation models discrete data as a mixture of discrete
    distributions, using Dirichlet beliefs over the mixture weights. We study a
    variation of this concept, in which the documents' mixture weight beliefs are
    replaced with squashed Gaussian distributions. This allows documents to be
    associated with elements of a Hilbert space, admitting kernel topic models
    (KTM), modelling temporal, spatial, hierarchical, social and other structure
    between documents. The main challenge is efficient approximate inference on the
    latent Gaussian.

  27. Data-dependent kernels in nearly-linear time.

    Authors: John Shawe-Taylor, Tom Diethe, Guy Lever
    Subjects: Learning
    Abstract

    We propose a method to efficiently construct data-dependent kernels which can
    make use of large quantities of (unlabeled) data. Our construction makes an
    approximation in the standard construction of semi-supervised kernels in
    Sindhwani et al. 2005. In typical cases these kernels can be computed in
    nearly-linear time (in the amount of data), improving on the cubic time of the
    standard construction, enabling large scale semi-supervised learning in a
    variety of contexts.

  28. Multi-criteria Anomaly Detection using Pareto Depth Analysis.

    Authors: Alfred O. Hero III, Kevin S. Xu, Ko-Jen Hsiao
    Subjects: Learning
    Abstract

    We consider the problem of identifying patterns in a data set that exhibit
    anomalous behavior, often referred to as anomaly detection. In most anomaly
    detection algorithms, the dissimilarity between data samples is calculated by a
    single criterion. When dissimilarities are calculated by multiple criteria, one
    might perform anomaly detection using a linear combination of the multiple
    dissimilarities.

  29. Budget-Optimal Task Allocation for Reliable Crowdsourcing Systems.

    Authors: Devavrat Shah, Sewoong Oh, David R. Karger
    Subjects: Learning
    Abstract

    Crowdsourcing systems, in which numerous tasks are electronically distributed
    to numerous "information piece-workers", have emerged as an effective paradigm
    for human-powered solving of large scale problems in domains such as image
    classification, data entry, optical character recognition, recommendation, and
    proofreading. Because these low-paid workers can be unreliable, nearly all
    crowdsourcers must devise schemes to increase confidence in their answers,
    typically by assigning each task multiple times and combining the answers in
    some way such as majority voting.

  30. A Study of Unsupervised Adaptive Crowdsourcing.

    Authors: G. Kesidis, A. Kurve
    Subjects: Learning
    Abstract

    We consider unsupervised crowdsourcing performance based on the model wherein
    the responses of end-users are essentially rated according to how their
    responses correlate with the majority of other responses to the same
    subtasks/questions. In one setting, we consider an independent sequence of
    identically distributed crowdsourcing assignments (meta-tasks), while in the
    other we consider a single assignment with a large number of component
    subtasks. Both problems yield intuitive results in which the overall
    reliability of the crowd is a factor.

  31. Learning image transformations without training examples.

    Authors: Sergey Pankov
    Subjects: Learning
    Abstract

    The use of image transformations is essential for efficient modeling and
    learning of visual data. But the class of relevant transformations is large:
    affine transformations, projective transformations, elastic deformations, ...
    the list goes on. Therefore, learning these transformations, rather than hand
    coding them, is of great conceptual interest. To the best of our knowledge, all
    the related work so far has been concerned with either supervised or weakly
    supervised learning (from correlated sequences, video streams, or
    image-transform pairs).

  32. Domain Adaptation for Statistical Classifiers.

    Authors: H. Daume III, D. Marcu
    Subjects: Learning
    Abstract

    The most basic assumption used in statistical learning theory is that
    training data and test data are drawn from the same underlying distribution.
    Unfortunately, in many applications, the "in-domain" test data is drawn from a
    distribution that is related, but not identical, to the "out-of-domain"
    distribution of the training data. We consider the common case in which labeled
    out-of-domain data is plentiful, but labeled in-domain data is scarce.

  33. Learning Item Trees for Probabilistic Modelling of Implicit Feedback.

    Authors: Yee Whye Teh, Andriy Mnih
    Subjects: Learning
    Abstract

    User preferences for items can be inferred from either explicit feedback,
    such as item ratings, or implicit feedback, such as rental histories. Research
    in collaborative filtering has concentrated on explicit feedback, resulting in
    the development of accurate and scalable models. However, since explicit
    feedback is often difficult to collect it is important to develop effective
    models that take advantage of the more widely available implicit feedback.

  34. Noise Tolerance under Risk Minimization.

    Authors: P. S. Sastry, Naresh Manwani
    Subjects: Learning
    Abstract

    In this paper we explore the problem of noise tolerant learning of
    classifiers. We formulate the problem as follows. We assume that there is an
    ${\bf unobservable}$ training set which is noise-free. The actual training set
    given to the learning algorithm is obtained from this ideal data set by
    corrupting the class label of each example where the probability that the class
    label on an example is corrupted is a function of the feature vector of the
    example. This would account for almost all kinds of noisy data one may
    encounter in practice.

  35. Bias Plus Variance Decomposition for Survival Analysis Problems.

    Authors: Marina Sapir
    Subjects: Learning
    Abstract

    Bias - variance decomposition of the expected error defined for regression
    and classification problems is an important tool to study and compare different
    algorithms, to find the best areas for their application. Here the
    decomposition is introduced for the survival analysis problem. In our
    experiments, we study bias -variance parts of the expected error for two
    algorithms: original Cox proportional hazard regression and CoxPath, path
    algorithm for L1-regularized Cox regression, on the series of increased
    training sets.

  36. Active Ranking using Pairwise Comparisons.

    Authors: Robert D. Nowak, Kevin G. Jamieson
    Subjects: Learning
    Abstract

    This paper examines the problem of ranking a collection of objects using
    pairwise comparisons (rankings of two objects). In general, the ranking of $n$
    objects can be identified by standard sorting methods using $n log_2 n$
    pairwise comparisons. We are interested in natural situations in which
    relationships among the objects may allow for ranking using far fewer pairwise
    comparisons. Specifically, we assume that the objects can be embedded into a
    $d$-dimensional Euclidean space and that the rankings reflect their relative
    distances from a common reference point in $R^d$.

  37. Learning Discriminative Metrics via Generative Models and Kernel Learning.

    Authors: Fei Sha, Yuan Shi, Yung-Kyun Noh, Daniel D. Lee
    Subjects: Learning
    Abstract

    Metrics specifying distances between data points can be learned in a
    discriminative manner or from generative models. In this paper, we show how to
    unify generative and discriminative learning of metrics via a kernel learning
    framework. Specifically, we learn local metrics optimized from parametric
    generative models. These are then used as base kernels to construct a global
    kernel that minimizes a discriminative training criterion. We consider both
    linear and nonlinear combinations of local metric kernels.

  38. Differentially Private Online Learning.

    Authors: Prateek Jain, Pravesh Kothari, Abhradeep Thakurta
    Subjects: Learning
    Abstract

    In this paper, we consider the problem of preserving privacy in the online
    learning setting. We study the problem in the online convex programming (OCP)
    framework---a popular online learning setting with several interesting
    theoretical and practical implications---while using differential privacy as
    the formal privacy measure.

  39. Reconstruction of sequential data with density models.

    Authors: Miguel Á. Carreira-Perpiñán
    Subjects: Learning
    Abstract

    We introduce the problem of reconstructing a sequence of multidimensional
    real vectors where some of the data are missing. This problem contains
    regression and mapping inversion as particular cases where the pattern of
    missing data is independent of the sequence index. The problem is hard because
    it involves possibly multivalued mappings at each vector in the sequence, where
    the missing variables can take more than one value given the present variables;
    and the set of missing variables can vary from one vector to the next.

  40. Convergence Rates of Inexact Proximal-Gradient Methods for Convex Optimization.

    Authors: Francis Bach, Mark Schmidt, Nicolas Le Roux
    Subjects: Learning
    Abstract

    We consider the problem of optimizing the sum of a smooth convex function and
    a non-smooth convex function using proximal-gradient methods, where an error is
    present in the calculation of the gradient of the smooth term or in the
    proximity operator with respect to the non-smooth term.

  41. Structured sparsity through convex optimization.

    Authors: Francis Bach, Rodolphe Jenatton, Guillaume Obozinski, Julien Mairal
    Subjects: Learning
    Abstract

    Sparse estimation methods are aimed at using or obtaining parsimonious
    representations of data or models. While naturally cast as a combinatorial
    optimization problem, variable or feature selection admits a convex relaxation
    through the regularization by the $\ell_1$-norm. In this paper, we consider
    situations where we are not only interested in sparsity, but where some
    structural prior knowledge is available as well.

  42. MIS-Boost: Multiple Instance Selection Boosting.

    Authors: Emre Akbas, Bernard Ghanem, Narendra Ahuja
    Subjects: Learning
    Abstract

    In this paper, we present a new multiple instance learning (MIL) method,
    called MIS-Boost, which learns discriminative instance prototypes by explicit
    instance selection in a boosting framework. Unlike previous instance selection
    based MIL methods, we do not restrict the prototypes to a discrete set of
    training instances but allow them to take arbitrary values in the instance
    feature space. We also do not restrict the total number of prototypes and the
    number of selected-instances per bag; these quantities are completely
    data-driven.

  43. Bandits with an Edge.

    Authors: Shie Mannor, Dotan Di Castro, Claudio Gentile
    Subjects: Learning
    Abstract

    We consider a bandit problem over a graph where the rewards are not directly
    observed. Instead, the decision maker can compare two nodes and receive
    (stochastic) information pertaining to the difference in their value. The graph
    structure describes the set of possible comparisons. Consequently, comparing
    between two nodes that are relatively far requires estimating the difference
    between every pair of nodes on the path between them.

  44. Risk-Sensitive Reinforcement Learning Applied to Control under Constraints.

    Authors: P. Geibel, F. Wysotzki
    Subjects: Learning
    Abstract

    In this paper, we consider Markov Decision Processes (MDPs) with error
    states. Error states are those states entering which is undesirable or
    dangerous. We define the risk with respect to a policy as the probability of
    entering such a state when the policy is pursued. We consider the problem of
    finding good policies whose risk is smaller than some user-specified threshold,
    and formalize it as a constrained MDP with two criteria. The first criterion
    corresponds to the value function originally given.

  45. Efficiency versus Convergence of Boolean Kernels for On-Line Learning Algorithms.

    Authors: R. Khardon, D. Roth, R. A. Servedio
    Subjects: Learning
    Abstract

    The paper studies machine learning problems where each example is described
    using a set of Boolean features and where hypotheses are represented by linear
    threshold elements. One method of increasing the expressiveness of learned
    hypotheses in this context is to expand the feature set to include conjunctions
    of basic features. This can be done explicitly or where possible by using a
    kernel function.

  46. Weighted Clustering.

    Authors: Margareta Ackerman, Shai Ben-David, Simina Branzei, David Loker
    Subjects: Learning
    Abstract

    In this paper we investigate clustering in the weighted setting, in which
    every data point is assigned a real valued weight. We conduct a theoretical
    analysis on the influence of weighted data on standard clustering algorithms in
    each of the partitional and hierarchical settings, characterising the precise
    conditions under which such algorithms react to weights, and classifying
    clustering methods into three broad categories: weight-responsive,
    weight-considering, and weight-robust.

  47. Efficient P2P Ensemble Learning with Linear Models on Fully Distributed Data.

    Authors: Róbert Ormándi, István Hegedűs, Márk Jelasity
    Subjects: Learning
    Abstract

    Machine learning over fully distributed data poses an important problem in
    peer-to-peer (P2P) applications. In this model we have one data record at each
    network node, but without the possibility to move raw data due to privacy
    considerations. For example, user profiles, ratings, history, or sensor
    readings can represent this case. This problem is difficult, because there is
    no possibility to learn local models, yet the communication cost needs to be
    kept low.

  48. Sparse Volterra and Polynomial Regression Models: Recoverability and Estimation.

    Authors: Georgios B. Giannakis, Vassilis Kekatos
    Subjects: Learning
    Abstract

    Volterra and polynomial regression models play a major role in nonlinear
    system identification and inference tasks. Exciting applications ranging from
    neuroscience to genome-wide association analysis build on these models with the
    additional requirement of parsimony. This requirement has high interpretative
    value, but unfortunately cannot be met by least-squares based or kernel
    regression methods. To this end, compressed sampling (CS) approaches, already
    successful in linear regression settings, can offer a viable alternative.

  49. Learning nonsingular phylogenies and hidden Markov models.

    Authors: Elchanan Mossel, Sébastien Roch
    Subjects: Learning
    Abstract

    In this paper we study the problem of learning phylogenies and hidden Markov
    models. We call a Markov model nonsingular if all transition matrices have
    determinants bounded away from 0 (and 1). We highlight the role of the
    nonsingularity condition for the learning problem. Learning hidden Markov
    models without the nonsingularity condition is at least as hard as learning
    parity with noise, a well-known learning problem conjectured to be
    computationally hard. On the other hand, we give a polynomial-time algorithm
    for learning nonsingular phylogenies and hidden Markov models.

  50. ShareBoost: Efficient Multiclass Learning with Feature Sharing.

    Authors: Shai Shalev-Shwartz, Amnon Shashua, Yonatan Wexler
    Subjects: Learning
    Abstract

    Multiclass prediction is the problem of classifying an object into a relevant
    target class. We consider the problem of learning a multiclass predictor that
    uses only few features, and in particular, the number of used features should
    increase sub-linearly with the number of possible classes. This implies that
    features should be shared by several classes. We describe and analyze the
    ShareBoost algorithm for learning a multiclass predictor that uses few shared
    features.

  51. No Internal Regret via Neighborhood Watch.

    Authors: Alexander Rakhlin, Dean Foster
    Subjects: Learning
    Abstract

    We present an algorithm which attains O(\sqrt{T}) internal (and thus
    external) regret for finite games with partial monitoring under the local
    observability condition. Recently, this condition has been shown by (Bartok,
    Pal, and Szepesvari, 2011) to imply the O(\sqrt{T}) rate for partial monitoring
    games against an i.i.d. opponent, and the authors conjectured that the same
    holds for non-stochastic adversaries.

  52. Structured Sparsity and Generalization.

    Authors: Massimiliano Pontil, Andreas Maurer
    Subjects: Learning
    Abstract

    We present a data dependent generalization bound for a large class of
    regularized algorithms which implement structured sparsity constraints. The
    bound can be applied to standard squared-norm regularization, the lasso, the
    group lasso, some versions of the group lasso with overlapping groups, multiple
    kernel learning and other regularization schemes. In all these cases
    competitive results are obtained.

  53. A Machine Learning Perspective on Predictive Coding with PAQ.

    Authors: Nando de Freitas, Byron Knoll
    Subjects: Learning
    Abstract

    PAQ8 is an open source lossless data compression algorithm that currently
    achieves the best compression rates on many benchmarks. This report presents a
    detailed description of PAQ8 from a statistical machine learning perspective.
    It shows that it is possible to understand some of the modules of PAQ8 and use
    this understanding to improve the method. However, intuitive statistical
    explanations of the behavior of other modules remain elusive.

  54. Stability Conditions for Online Learnability.

    Authors: Stephane Ross, J. Andrew Bagnell
    Subjects: Learning
    Abstract

    Stability is a general notion that quantifies the sensitivity of a learning
    algorithm's output to small change in the training dataset (e.g. deletion or
    replacement of a single training sample). Such conditions have recently been
    shown to be more powerful to characterize learnability in the general learning
    setting under i.i.d. samples where uniform convergence is not necessary for
    learnability, but where stability is both sufficient and necessary for
    learnability. We here show that similar stability conditions are also
    sufficient for online learnability, i.e.

  55. New Risk Modeling Method for Robust Learning on Smaller Samples.

    Authors: Marina Sapir
    Subjects: Learning
    Abstract

    Prognosis of disease progression is necessary for development of
    individualized treatment, understanding of the disease. Risk modeling is a
    challenging problem, and too often amount of available relevant observations is
    not sufficient to build a quality model with traditional approaches. New method
    Smooth Rank for survival analysis, risk modeling is introduced here. Smooth
    Rank is robust against overfitting on relatively small samples. The method is
    compared with established risk modeling methods on 10 real life datasets.

  56. Uncertain Nearest Neighbor Classification.

    Authors: Fabrizio Angiulli, Fabio Fassetti
    Subjects: Learning
    Abstract

    This work deals with the problem of classifying uncertain data. With this aim
    the Uncertain Nearest Neighbor (UNN) rule is here introduced, which represents
    the generalization of the deterministic nearest neighbor rule to the case in
    which uncertain objects are available. The UNN rule relies on the concept of
    nearest neighbor class, rather than on that of nearest neighbor object. The
    nearest neighbor class of a test object is the class that maximizes the
    probability of providing its nearest neighbor.

  57. Generating a Diverse Set of High-Quality Clusterings.

    Authors: Jeff M. Phillips, Suresh Venkatasubramanian, Parasaran Raman
    Subjects: Learning
    Abstract

    We provide a new framework for generating multiple good quality partitions
    (clusterings) of a single data set. Our approach decomposes this problem into
    two components, generating many high-quality partitions, and then grouping
    these partitions to obtain k representatives. The decomposition makes the
    approach extremely modular and allows us to optimize various criteria that
    control the choice of representative partitions.

  58. The Divergence of Reinforcement Learning Algorithms with Value-Iteration and Function Approximation.

    Authors: Michael Fairbank, Eduardo Alonso
    Subjects: Learning
    Abstract

    This paper gives specific divergence examples of value-iteration for several
    major Reinforcement Learning and Adaptive Dynamic Programming algorithms, when
    using a function approximator for the value function. These divergence examples
    differ from previous divergence examples in the literature, in that they are
    applicable for a greedy policy, i.e. in a "value iteration" scenario. Perhaps
    surprisingly, with a greedy policy, it is also possible to get divergence for
    the algorithms TD(1) and Sarsa(1).

  59. Adaptive Content Search Through Comparisons.

    Authors: Amin Karbasi, Laurent Massoulie, Stratis Ioannidis
    Subjects: Learning
    Abstract

    We study the problem of navigating through a database of similar objects
    using comparisons. This problem is known to be strongly related to the
    small-world network design problem. However, contrary to prior work, which
    focuses on cases where objects in the database are equally popular, we consider
    here the case where the demand for objects may be heterogeneous.

  60. Polyceptron: A Polyhedral Learning Algorithm.

    Authors: P. S. Sastry, Naresh Manwani
    Subjects: Learning
    Abstract

    In this paper we propose a new algorithm for learning polyhedral classifiers
    which we call as Polyceptron. It is a Perception like algorithm which updates
    the parameters only when the current classifier misclassifies any training
    data. We give both batch and online version of Polyceptron algorithm. Finally
    we give experimental results to show the effectiveness of our approach.

  61. Spectral Methods for Learning Multivariate Latent Tree Structure.

    Authors: Tong Zhang, Animashree Anandkumar, Sham M. Kakade, Kamalika Chaudhuri, Daniel Hsu, Le Song
    Subjects: Learning
    Abstract

    This work considers the problem of learning the structure of a broad class of
    multivariate latent variable tree models, which include a variety of continuous
    and discrete models (including the widely used linear-Gaussian models, hidden
    Markov models, and Markov evolutionary trees). The setting is one where we only
    have samples from certain observed variables in the tree and our goal is to
    estimate the tree structure (i.e., the graph of how the underlying hidden
    variables are connected to the observed variables).

  62. High-Dimensional Gaussian Graphical Model Selection: Tractable Graph Families.

    Authors: Animashree Anandkumar, Vincent Y.F. Tan, Alan. S. Willsky
    Subjects: Learning
    Abstract

    We consider the problem of high-dimensional Gaussian graphical model
    selection. We identify a set of graphs for which an efficient estimation
    algorithm exists, and this algorithm is based on thresholding of empirical
    conditional covariances. Under a set of transparent conditions, we establish
    structural consistency (or sparsistency) for the proposed algorithm, when the
    number of samples n=omega(J_{min}^{-2} log p), where p is the number of
    variables and J_{min} is the minimum (absolute) edge potential of the graphical
    model.

  63. Divide-and-Conquer Matrix Factorization.

    Authors: Michael I. Jordan, Ameet Talwalkar, Lester Mackey
    Subjects: Learning
    Abstract

    This work introduces SubMF, a parallel divide-and-conquer framework for noisy
    matrix factorization. SubMF divides a large-scale matrix factorization task
    into smaller subproblems, solves each subproblem in parallel using an arbitrary
    base matrix factorization algorithm, and combines the subproblem solutions
    using techniques from randomized matrix approximation. Our experiments with
    collaborative filtering, video background modeling, and simulated data
    demonstrate the near-linear to super-linear speed-ups attainable with this
    approach.

  64. A Dirty Model for Multiple Sparse Regression.

    Authors: Pradeep Ravikumar, Sujay Sanghavi, Ali Jalali
    Subjects: Learning
    Abstract

    Sparse linear regression -- finding an unknown vector from linear
    measurements -- is now known to be possible with fewer samples than variables,
    via methods like the LASSO. We consider the multiple sparse linear regression
    problem, where several related vectors -- with partially shared support sets --
    have to be recovered. A natural question in this setting is whether one can use
    the sharing to further decrease the overall number of samples required.

  65. Better Mini-Batch Algorithms via Accelerated Gradient Methods.

    Authors: Ohad Shamir, Karthik Sridharan, Nathan Srebro, Andrew Cotter
    Subjects: Learning
    Abstract

    Mini-batch algorithms have been proposed as a way to speed-up stochastic
    convex optimization problems. We study how such algorithms can be improved
    using accelerated gradient methods. We provide a novel analysis, which shows
    how standard gradient methods may sometimes be insufficient to obtain a
    significant speed-up and propose a novel accelerated gradient algorithm, which
    deals with this deficiency, enjoys a uniformly superior guarantee and works
    well in practice.

  66. Algorithmic Programming Language Identification.

    Authors: David Klein, Kyle Murray, Simon Weber
    Subjects: Learning
    Abstract

    Motivated by the amount of code that goes unidentified on the web, we
    introduce a practical method for algorithmically identifying the programming
    language of source code. Our work is based on supervised learning and
    intelligent statistical features. We also explored, but abandoned, a
    grammatical approach. In testing, our implementation greatly outperforms that
    of an existing tool that relies on a Bayesian classifier.

  67. Handling uncertainties in SVM classification.

    Authors: Rémi Flamary, Emilie Niaf, Carole Lartizien, Stéphane Canu
    Subjects: Learning
    Abstract

    This paper addresses the pattern classification problem arising when
    available target data include some uncertainty information. Target data
    considered here is either qualitative (a class label) or quantitative (an
    estimation of the posterior probability). Our main contribution is a SVM
    inspired formulation of this problem allowing to take into account class label
    through a hinge loss as well as probability estimates using epsilon-insensitive
    cost function together with a minimum norm (maximum margin) objective.

  68. Large margin filtering for signal sequence labeling.

    Authors: Alain Rakotomamonjy, Rémi Flamary, Benjamin Labbé
    Subjects: Learning
    Abstract

    Signal Sequence Labeling consists in predicting a sequence of labels given an
    observed sequence of samples. A naive way is to filter the signal in order to
    reduce the noise and to apply a classification algorithm on the filtered
    samples. We propose in this paper to jointly learn the filter with the
    classifier leading to a large margin filtering for classification. This method
    allows to learn the optimal cutoff frequency and phase of the filter that may
    be different from zero. Two methods are proposed and tested on a toy dataset
    and on a real life BCI dataset from BCI Competition III.

  69. Decoding finger movements from ECoG signals using switching linear models.

    Authors: Alain Rakotomamonjy, Rémi Flamary
    Subjects: Learning
    Abstract

    One of the major challenges of ECoG-based Brain-Machine Interfaces is the
    movement prediction of a human subject. Several methods exist to predict an arm
    2-D trajectory. The fourth BCI Competition gives a dataset in which the aim is
    to predict individual finger movements (5-D trajectory). The difficulty lies in
    the fact that there is no simple relation between ECoG signals and finger
    movement. We propose in this paper to decode finger flexions using switching
    models.

  70. On epsilon-optimality of the pursuit learning algorithm.

    Authors: Ryan Martin, Omkar Tilak
    Subjects: Learning
    Abstract

    Estimator algorithms in learning automata are useful tools for adaptive,
    real-time optimization in computer science and engineering applications. This
    paper investigates theoretical convergence properties for a special case of
    estimator algorithms---the pursuit learning algorithm. We identify a gap in
    existing proofs of probabilistic convergence for pursuit learning and present a
    more refined analysis to fill this gap.

  71. From Bandits to Experts: On the Value of Side-Observations.

    Authors: Ohad Shamir, Shie Mannor
    Subjects: Learning
    Abstract

    We consider an adversarial online learning setting where a decision maker can
    choose an action in every stage of the game. In addition to observing the
    reward of the chosen action, the decision maker gets side observations on the
    reward he would have obtained had he chosen some of the other actions. The
    observation structure is encoded as a graph, where node $i$ is linked to node
    $j$ if sampling $i$ provides information on the reward of $j$.

  72. Clustering with Multi-Layer Graphs: A Spectral Perspective.

    Authors: Pascal Frossard, Pierre Vandergheynst, Xiaowen Dong, Nikolai Nefedov
    Subjects: Learning
    Abstract

    Observational data usually comes with a multimodal nature, which means that
    it can be naturally represented by a multi-layer graph whose layers share the
    same set of vertices (users) with different edges (pairwise relationships). In
    this paper, we address the problem of combining different layers of the
    multi-layer graph for improved clustering of the vertices compared to using
    layers independently.

  73. Efficient Online Learning via Randomized Rounding.

    Authors: Ohad Shamir, Nicolò Cesa-Bianchi
    Subjects: Learning
    Abstract

    Most online algorithms used in machine learning today are based on variants
    of mirror descent or follow-the-leader. In this paper, we present an online
    algorithm based on a completely different approach, which combines "random
    playout" and randomized rounding of loss subgradients. As an application of our
    approach, we provide the first computationally efficient online algorithm for
    collaborative filtering with norm-constrained matrices. As a second
    application, we solve an open question linking batch learning and transductive
    online learning.

  74. Efficient Optimal Learning for Contextual Bandits.

    Authors: Tong Zhang, John Langford, Daniel Hsu, Lev Reyzin, Nikos Karampatziakis, Miroslav Dudik, Satyen Kale
    Subjects: Learning
    Abstract

    We address the problem of learning in an online setting where the learner
    repeatedly observes features, selects among a set of actions, and receives
    reward for the action taken. We provide the first efficient algorithm with an
    optimal regret. Our algorithm uses a cost sensitive classification learner as
    an oracle and has a running time $\mathrm{polylog}(N)$, where $N$ is the number
    of classification rules among which the oracle might choose. This is
    exponentially faster than all previous algorithms that achieve optimal regret
    in this setting.

  75. Max-Margin Stacking and Sparse Regularization for Linear Classifier Combination and Selection.

    Authors: Hakan Erdogan, Mehmet Umut Sen
    Subjects: Learning
    Abstract

    The main principle of stacked generalization (or Stacking) is using a
    second-level generalizer to combine the outputs of base classifiers in an
    ensemble. In this paper, we investigate different combination types under the
    stacking framework; namely weighted sum (WS), class-dependent weighted sum
    (CWS) and linear stacked generalization (LSG). For learning the weights, we
    propose using regularized empirical risk minimization with the hinge loss. In
    addition, we propose using group sparsity for regularization to facilitate
    classifier selection.

  76. Large-Scale Convex Minimization with a Low-Rank Constraint.

    Authors: Shai Shalev-Shwartz, Ohad Shamir, Alon Gonen
    Subjects: Learning
    Abstract

    We address the problem of minimizing a convex function over the space of
    large matrices with low rank. While this optimization problem is hard in
    general, we propose an efficient greedy algorithm and derive its formal
    approximation guarantees. Each iteration of the algorithm involves
    (approximately) finding the left and right singular vectors corresponding to
    the largest singular value of a certain matrix, which can be calculated in
    linear time.

  77. Bayesian and L$_\mathbf{1}$ Approaches to Sparse Unsupervised Learning.

    Authors: Katherine Heller, Zoubin Ghahramani, Shakir Mohamed
    Subjects: Learning
    Abstract

    The use of $L_1$ regularisation for sparse learning has generated immense
    research interest, with successful application in such diverse areas as signal
    acquisition, image coding, genomics and collaborative filtering.

  78. Submodular Functions Are Noise Stable.

    Authors: Mahdi Cheraghchi, Adam Klivans, Homin K. Lee, Pravesh Kothari
    Subjects: Learning
    Abstract

    We show that all non-negative submodular functions have high {\em
    noise-stability}. As a consequence, we obtain a polynomial-time learning
    algorithm for this class with respect to any product distribution on
    $\{-1,1\}^n$ (for any constant accuracy parameter $\epsilon$). Our algorithm
    also succeeds in the agnostic setting. Previous work on learning submodular
    functions required either query access or strong assumptions about the types of
    submodular functions to be learned (and did not hold in the agnostic setting).

  79. Online Learning, Stability, and Stochastic Gradient Descent.

    Authors: Tomaso Poggio, Stephen Voinea, Lorenzo Rosasco
    Subjects: Learning
    Abstract

    In batch learning, stability together with existence and uniqueness of the
    solution corresponds to well-posedness of Empirical Risk Minimization (ERM)
    methods; recently, it was proved that CV_loo stability is necessary and
    sufficient for generalization and consistency of ERM. In this note, we
    introduce CV_on stability, which plays a similar note in online learning. We
    show that stochastic gradient descent (SDG) with the usual hypotheses is CVon
    stable and we then discuss the implications of CV_on stability for convergence
    of SGD.

  80. Bounding the Fat Shattering Dimension of a Composition Function Class Built Using a Continuous Logic Connective.

    Authors: Hubert Haoyang Duan
    Subjects: Learning
    Abstract

    We begin this report by describing the Probably Approximately Correct (PAC)
    model for learning a concept class, consisting of subsets of a domain, and a
    function class, consisting of functions from the domain to the unit interval.
    Two combinatorial parameters, the Vapnik-Chervonenkis (VC) dimension and its
    generalization, the Fat Shattering dimension of scale e, are explained and a
    few examples of their calculations are given with proofs.

  81. PAC-Bayesian Analysis of the Exploration-Exploitation Trade-off.

    Authors: John Shawe-Taylor, Nicolò Cesa-Bianchi, Yevgeny Seldin, François Laviolette, Jan Peters, Peter Auer
    Subjects: Learning
    Abstract

    We develop a coherent framework for integrative simultaneous analysis of the
    exploration-exploitation and model order selection trade-offs. We improve over
    our preceding results on the same subject (Seldin et al., 2011) by combining
    PAC-Bayesian analysis with Bernstein-type inequality for martingales. Such a
    combination is also of independent interest for studies of multiple
    simultaneously evolving martingales.

  82. b-Bit Minwise Hashing for Large-Scale Linear SVM.

    Authors: Ping Li, Joshua Moore, Christian Konig
    Subjects: Learning
    Abstract

    In this paper, we propose to (seamlessly) integrate b-bit minwise hashing
    with linear SVM to substantially improve the training (and testing) efficiency
    using much smaller memory, with essentially no loss of accuracy. Theoretically,
    we prove that the resemblance matrix, the minwise hashing matrix, and the b-bit
    minwise hashing matrix are all positive definite matrices (kernels).
    Interestingly, our proof for the positive definiteness of the b-bit minwise
    hashing kernel naturally suggests a simple strategy to integrate b-bit hashing
    with linear SVM.

  83. Behavior of Graph Laplacians on Manifolds with Boundary.

    Authors: Mikhail Belkin, Xueyuan Zhou
    Subjects: Learning
    Abstract

    In manifold learning, algorithms based on graph Laplacians constructed from
    data have received considerable attention both in practical applications and
    theoretical analysis. In particular, the convergence of graph Laplacians
    obtained from sampled data to certain continuous operators has become an active
    research topic recently.

  84. Feature Selection for MAUC-Oriented Classification Systems.

    Authors: Rui Wang, Ke Tang
    Subjects: Learning
    Abstract

    Feature selection is an important pre-processing step for many pattern
    classification tasks. Traditionally, feature selection methods are designed to
    obtain a feature subset that can lead to high classification accuracy. However,
    classification accuracy has recently been shown to be an inappropriate
    performance metric of classification systems in many cases. Instead, the Area
    Under the receiver operating characteristic Curve (AUC) and its multi-class
    extension, MAUC, have been proved to be better alternatives.

  85. Semantic Vector Machines.

    Authors: Etter Vincent
    Subjects: Learning
    Abstract

    We first present our work in machine translation, during which we used
    aligned sentences to train a neural network to embed n-grams of different
    languages into an $d$-dimensional space, such that n-grams that are the
    translation of each other are close with respect to some metric. Good n-grams
    to n-grams translation results were achieved, but full sentences translation is
    still problematic.

  86. PAC-Bayesian Analysis of Martingales and Multiarmed Bandits.

    Authors: John Shawe-Taylor, Yevgeny Seldin, François Laviolette, Jan Peters, Peter Auer
    Subjects: Learning
    Abstract

    We present two alternative ways to apply PAC-Bayesian analysis to sequences
    of dependent random variables. The first is based on a new lemma that enables
    to bound expectations of convex functions of certain dependent random variables
    by expectations of the same functions of independent Bernoulli random
    variables. This lemma provides an alternative tool to Hoeffding-Azuma
    inequality to bound concentration of martingale values.

  87. Generalized Boosting Algorithms for Convex Optimization.

    Authors: J. Andrew Bagnell, Alexander Grubb
    Subjects: Learning
    Abstract

    Boosting is a popular way to derive powerful learners from simpler hypothesis
    classes. Following previous work (Mason et al., 1999; Friedman, 2000) on
    general boosting frameworks, we analyze gradient-based descent algorithms for
    boosting with respect to any convex objective and introduce a new measure of
    weak learner performance into this setting which generalizes existing work. We
    present the first weak to strong learning guarantees for the existing gradient
    boosting work for smooth convex objectives, and also demonstrate that this work
    fails for non-smooth objectives.

  88. Principal Graphs and Manifolds.

    Authors: A. N. Gorban, A. Y. Zinovyev
    Subjects: Learning
    Abstract

    In many physical, statistical, biological and other investigations it is
    desirable to approximate a system of points by objects of lower dimension
    and/or complexity. For this purpose, Karl Pearson invented principal component
    analysis in 1901 and found 'lines and planes of closest fit to system of
    points'. The famous k-means algorithm solves the approximation problem too, but
    by finite sets instead of lines and planes. This chapter gives a brief
    practical introduction into the methods of construction of general principal
    objects, i.e.

  89. Adaptively Learning the Crowd Kernel.

    Authors: Omer Tamuz, Ohad Shamir, Ce Liu, Serge Belongie, Adam Tauman Kalai
    Subjects: Learning
    Abstract

    We introduce an algorithm that, given n objects, learns a similarity matrix
    over all n^2 pairs, from crowdsourced data alone. The algorithm samples
    responses to adaptively chosen triplet-based relative-similarity queries. Each
    query has the form "is object 'a' more similar to 'b' or to 'c'?" and is chosen
    to be maximally informative given the preceding responses. The output is an
    embedding of the objects into Euclidean space (like MDS); we refer to this as
    the "crowd kernel."

  90. Rapid Feature Learning with Stacked Linear Denoisers.

    Authors: Zhixiang Eddie Xu, Kilian Q. Weinberger, Fei Sha
    Subjects: Learning
    Abstract

    We investigate unsupervised pre-training of deep architectures as feature
    generators for "shallow" classifiers. Stacked Denoising Autoencoders (SdA),
    when used as feature pre-processing tools for SVM classification, can lead to
    significant improvements in accuracy - however, at the price of a substantial
    increase in computational cost. In this paper we create a simple algorithm
    which mimics the layer by layer training of SdAs. However, in contrast to SdAs,
    our algorithm requires no training through gradient descent as the parameters
    can be computed in closed-form.

  91. Rapid Learning with Stochastic Focus of Attention.

    Authors: Zhiliang Ying, Raphael Pelossof
    Subjects: Learning
    Abstract

    We present a method to stop the evaluation of a decision making process when
    the result of the full evaluation is obvious. This trait is highly desirable
    for online margin-based machine learning algorithms where a classifier
    traditionally evaluates all the features for every example. We observe that
    some examples are easier to classify than others, a phenomenon which is
    characterized by the event when most of the features agree on the class of an
    example.

  92. Clustering Partially Observed Graphs via Convex Optimization.

    Authors: Huan Xu, Sujay Sanghavi, Yudong Chen, Ali Jalali
    Subjects: Learning
    Abstract

    This paper considers the problem of clustering a partially observed
    unweighted graph -- i.e. one where for some node pairs we know there is an edge
    between them, for some others we know there is no edge, and for the remaining
    we do not know whether or not there is an edge. We want to organize the nodes
    into disjoint clusters so that there is relatively dense (observed)
    connectivity within clusters, and sparse across clusters.

  93. PAC learnability versus VC dimension: a footnote to a basic result of statistical learning.

    Authors: Vladimir Pestov
    Subjects: Learning
    Abstract

    A fundamental result of statistical learnig theory states that a concept
    class is PAC learnable if and only if it is a uniform Glivenko-Cantelli class
    if and only if the VC dimension of the class is finite. However, the theorem is
    only valid under special assumptions of measurability of the class, in which
    case the PAC learnability even becomes consistent.

  94. Adaptive Evolutionary Clustering.

    Authors: Alfred O. Hero III, Kevin S. Xu, Mark Kliger
    Subjects: Learning
    Abstract

    In many practical applications of clustering, the objects to be clustered
    evolve over time, and a clustering result is desired at each time step. In such
    applications, evolutionary clustering typically outperforms ordinary static
    clustering by producing clustering results that reflect long-term trends while
    being robust to short-term variations. Several evolutionary clustering
    algorithms have recently been proposed, often by adding a temporal smoothness
    penalty to the cost function of a static clustering method.

  95. Efficient First Order Methods for Linear Composite Regularizers.

    Authors: Massimiliano Pontil, Charles A. Micchelli, Andreas Argyriou, Lixin Shen, Yuesheng Xu
    Subjects: Learning
    Abstract

    A wide class of regularization problems in machine learning and statistics
    employ a regularization term which is obtained by composing a simple convex
    function \omega with a linear transformation. This setting includes Group Lasso
    methods, the Fused Lasso and other total variation methods, multi-task learning
    methods and many more. In this paper, we present a general approach for
    computing the proximity operator of this class of regularizers, under the
    assumption that the proximity operator of the function \omega is known in
    advance.

  96. Online and Batch Learning Algorithms for Data with Missing Features.

    Authors: Alekh Agarwal, Peter Bartlett, Afshin Rostamizadeh
    Subjects: Learning
    Abstract

    We introduce new online and batch algorithms that are robust to data with
    missing features, a situation that arises in many practical applications. In
    the online setup, we allow for the comparison hypothesis to change as a
    function of the subset of features that is observed on any given round,
    extending the standard setting where the comparison hypothesis is fixed
    throughout. In the batch setup, we present a convex relation of a non-convex
    problem to jointly estimate an imputation function, used to fill in the values
    of missing features, along with the classification hypothesis.

  97. Classification of Sets using Restricted Boltzmann Machines.

    Authors: Hugo Larochelle, Jérôme Louradour
    Subjects: Learning
    Abstract

    We consider the problem of classification when inputs correspond to sets of
    vectors. This setting occurs in many problems such as the classification of
    pieces of mail containing several pages, of web sites with several sections or
    of images that have been pre-segmented into smaller regions. We propose
    generalizations of the restricted Boltzmann machine (RBM) that are appropriate
    in this context and explore how to incorporate different assumptions about the
    relationship between the input sets and the target class within the RBM.

  98. Clustered regression with unknown clusters.

    Authors: Onkar Dabeer, Kishor Barman
    Subjects: Learning
    Abstract

    We consider a collection of prediction experiments, which are clustered in
    the sense that groups of experiments ex- hibit similar relationship between the
    predictor and response variables. The experiment clusters as well as the
    regres- sion relationships are unknown. The regression relation- ships define
    the experiment clusters, and in general, the predictor and response variables
    may not exhibit any clus- tering. We call this prediction problem clustered
    regres- sion with unknown clusters (CRUC) and in this paper we focus on linear
    regression.

  99. Doubly Robust Policy Evaluation and Learning.

    Authors: John Langford, Lihong Li, Miroslav Dudik
    Subjects: Learning
    Abstract

    We study decision making in environments where the reward is only partially
    observed, but can be modeled as a function of an action and an observed
    context. This setting, known as contextual bandits, encompasses a wide variety
    of applications including health-care policy and Internet advertising. A
    central task is evaluation of a new policy given historic data consisting of
    contexts, actions and received rewards. The key challenge is that the past data
    typically does not faithfully represent proportions of actions taken by a new
    policy.

  100. Parallel Online Learning.

    Authors: John Langford, Daniel Hsu, Alex Smola, Nikos Karampatziakis
    Subjects: Learning
    Abstract

    In this work we study parallelization of online learning, a core primitive in
    machine learning. In a parallel environment all known approaches for parallel
    online learning lead to delayed updates, where the model is updated using
    out-of-date information.

  101. A note on active learning for smooth problems.

    Authors: Satyaki Mahalanabis
    Subjects: Learning
    Abstract

    We show that the disagreement coefficient of certain smooth hypothesis
    classes is $O(m)$, where $m$ is the dimension of the hypothesis space, thereby
    answering a question posed in \cite{friedman09}.

  102. COMET: A Recipe for Learning and Using Large Ensembles on Massive Data.

    Authors: Tamara G. Kolda, Justin D. Basilico, M. Arthur Munson, Kevin R. Dixon, W. Philip Kegelmeyer
    Subjects: Learning
    Abstract

    The collection of massive volumes of data requires machine learning
    algorithms that can be applied to distributed data. We describe COMET (Cloud of
    Massive Ensemble Trees), a recipe for distributed supervised learning
    consisting of three components: (1) Map-Reduce is used to parallelize the
    learning and evaluation tasks and collect the results, (2) an IVoting Random
    Forest is used for the learning task on each local data partition, and (3) the
    results of all local ensembles are combined into one massive ensemble and lazy
    evaluation is used to dynamically subsample it.

  103. Learning transformed product distributions.

    Authors: Ilias Diakonikolas, Rocco A. Servedio, Constantinos Daskalakis
    Subjects: Learning
    Abstract

    We consider the problem of learning an unknown product distribution $X$ over
    $\{0,1\}^n$ using samples $f(X)$ where $f$ is a \emph{known} transformation
    function. Each choice of a transformation function $f$ specifies a learning
    problem in this framework.

  104. Multi-label Learning via Structured Decomposition and Group Sparsity.

    Authors: Tianyi Zhou, Dacheng Tao
    Subjects: Learning
    Abstract

    In multi-label learning, each sample is associated with several labels.
    Existing works indicate that exploring correlations between labels improve the
    prediction performance. However, embedding the label correlations into the
    training process significantly increases the problem size. Moreover, the
    mapping of the label structure in the feature space is not clear. In this
    paper, we propose a novel multi-label learning method "Structured Decomposition
    + Group Sparsity (SDGS)".

  105. Concentration-Based Guarantees for Low-Rank Matrix Reconstruction.

    Authors: Nathan Srebro, Rina Foygel
    Subjects: Learning
    Abstract

    We consider the problem of approximately reconstructing a partially-observed,
    approximately low-rank matrix. This problem has received much attention lately,
    mostly using the trace-norm as a surrogate to the rank. Here we study low-rank
    matrix reconstruction using both the trace-norm, as well as the less-studied
    max-norm, and present reconstruction guarantees based on existing analysis on
    the Rademacher complexity of the unit balls of these norms. We show how these
    are superior in several ways to recently published guarantees based on
    specialized analysis.

  106. Universal Learning Theory.

    Authors: Marcus Hutter
    Subjects: Learning
    Abstract

    This encyclopedic article gives a mini-introduction into the theory of
    universal learning, founded by Ray Solomonoff in the 1960s and significantly
    developed and extended in the last decade. It explains the spirit of universal
    learning, but necessarily glosses over technical subtleties.

  107. Spatially-Aware Comparison and Consensus for Clusterings.

    Authors: Jeff M. Phillips, Suresh Venkatasubramanian, Parasaran Raman
    Subjects: Learning
    Abstract

    This paper proposes a new distance metric between clusterings that
    incorporates information about the spatial distribution of points and clusters.
    Our approach builds on the idea of a Hilbert space-based representation of
    clusters as a combination of the representations of their constituent points.
    We use this representation and the underlying metric to design a
    spatially-aware consensus clustering procedure. This consensus procedure is
    implemented via a novel reduction to Euclidean clustering, and is both simple
    and efficient. All of our results apply to both soft and hard clusterings.

  108. Active Markov Information-Theoretic Path Planning for Robotic Environmental Sensing.

    Authors: Kian Hsiang Low, John M. Dolan, Pradeep Khosla
    Subjects: Learning
    Abstract

    Recent research in multi-robot exploration and mapping has focused on
    sampling environmental fields, which are typically modeled using the Gaussian
    process (GP). Existing information-theoretic exploration strategies for
    learning GP-based environmental field maps adopt the non-Markovian problem
    structure and consequently scale poorly with the length of history of
    observations. Hence, it becomes computationally impractical to use these
    strategies for in situ, real-time active sampling.

  109. Close the Gaps: A Learning-while-Doing Algorithm for a Class of Single-Product Revenue Management Problems.

    Authors: Zizhuo Wang, Yinyu Ye, Shiming Deng
    Subjects: Learning
    Abstract

    We present a dynamic learning algorithm for a class of single-product revenue
    management problems. In these problems, a retailer sells a single product with
    limited on-hand inventory over a finite selling season. Customer demand arrives
    according to a Poisson process, the rate of which is influenced by a single
    action taken by the retailer (such as price adjustment, sales commission,
    advertisement intensity, etc.). The relation between the action and the demand
    rate is not known in advance.

  110. Queue-Aware Dynamic Clustering and Power Allocation for Network MIMO Systems via Distributive Stochastic Learning.

    Authors: Ying Cui, Vincent K.N.Lau, Qingqing Huang
    Subjects: Learning
    Abstract

    In this paper, we propose a two-timescale delay-optimal dynamic clustering
    and power allocation design for downlink network MIMO systems. The dynamic
    clustering control is adaptive to the global queue state information (GQSI)
    only and computed at the base station controller (BSC) over a longer time
    scale. On the other hand, the power allocations of all the BSs in one cluster
    are adaptive to both intra-cluster channel state information (CCSI) and
    intra-cluster queue state information (CQSI), and computed at the cluster
    manager (CM) over a shorter time scale.

  111. An Inverse Power Method for Nonlinear Eigenproblems with Applications in 1-Spectral Clustering and Sparse PCA.

    Authors: Matthias Hein, Thomas Bühler
    Subjects: Learning
    Abstract

    Many problems in machine learning and statistics can be formulated as
    (generalized) eigenproblems. In terms of the associated optimization problem,
    computing linear eigenvectors amounts to finding critical points of a quadratic
    function subject to quadratic constraints. In this paper we show that a certain
    class of constrained optimization problems with nonquadratic objective and
    constraints can be understood as nonlinear eigenproblems. We derive a
    generalization of the inverse power method which is guaranteed to converge to a
    nonlinear eigenvector.

  112. Tight Sample Complexity of Large-Margin Learning.

    Authors: Nathan Srebro, Sivan Sabato, Naftali Tishby
    Subjects: Learning
    Abstract

    We obtain a tight distribution-specific characterization of the sample
    complexity of large-margin classification with L_2 regularization: We introduce
    the \gamma-adapted-dimension, which is a simple function of the spectrum of a
    distribution's covariance matrix, and show distribution-specific upper and
    lower bounds on the sample complexity, both governed by the
    \gamma-adapted-dimension of the source distribution. We conclude that this new
    quantity tightly characterizes the true sample complexity of large-margin
    classification.

  113. PADDLE: Proximal Algorithm for Dual Dictionaries LEarning.

    Authors: Curzio Basso, Matteo Santoro, Alessandro Verri, Silvia Villa
    Subjects: Learning
    Abstract

    Recently, considerable research efforts have been devoted to the design of
    methods to learn from data overcomplete dictionaries for sparse coding.
    However, learned dictionaries require the solution of an optimization problem
    for coding new data. In order to overcome this drawback, we propose an
    algorithm aimed at learning both a dictionary and its dual: a linear mapping
    directly performing the coding.

  114. Blackwell Approachability and Low-Regret Learning are Equivalent.

    Authors: Peter L. Bartlett, Jacob Abernethy, Elad Hazan
    Subjects: Learning
    Abstract

    We consider the celebrated Blackwell Approachability Theorem for two-player
    games with vector payoffs. We show that Blackwell's result is equivalent, via
    efficient reductions, to the existence of "no-regret" algorithms for Online
    Linear Optimization. Indeed, we show that any algorithm for one such problem
    can be efficiently converted into an algorithm for the other. We provide a
    useful application of this reduction: the first efficient algorithm for
    calibrated forecasting.

  115. No-Regret Reductions for Imitation Learning and Structured Prediction.

    Authors: Geoffrey J. Gordon, Stephane Ross, J. Andrew Bagnell
    Subjects: Learning
    Abstract

    Sequential prediction problems such as imitation learning, where future
    observations depend on previous predictions (actions), violate the common
    i.i.d. assumptions made in statistical learning. This leads to poor performance
    in both theory and often in practice. Some recent approaches provide stronger
    performance guarantees in this setting, but remain somewhat unsatisfactory as
    they train either non-stationary or a stochastic policies and require a large
    number of iterations.

  116. Regularized Risk Minimization by Nesterov's Accelerated Gradient Methods: Algorithmic Extensions and Empirical Studies.

    Authors: Ankan Saha, S.V.N. Vishwanathan, Xinhua Zhang
    Subjects: Learning
    Abstract

    Nesterov's accelerated gradient methods (AGM) have been successfully applied
    in many machine learning areas. However, their empirical performance on
    training max-margin models has been inferior to existing specialized solvers.
    In this paper, we first extend AGM to strongly convex and composite objective
    functions with Bregman style prox-functions. Our unifying framework covers both
    the $\infty$-memory and 1-memory styles of AGM, tunes the Lipschiz constant
    adaptively, and bounds the duality gap.

  117. Predictive State Temporal Difference Learning.

    Authors: Byron Boots, Geoffrey J. Gordon
    Subjects: Learning
    Abstract

    We propose a new approach to value function approximation which combines
    linear temporal difference reinforcement learning with subspace identification.
    In practical applications, reinforcement learning (RL) is complicated by the
    fact that state is either high-dimensional or partially observable. Therefore,
    RL methods are designed to work with features of state rather than state
    itself, and the success or failure of learning is often determined by the
    suitability of the selected features.

  118. Sparse Inverse Covariance Selection via Alternating Linearization Methods.

    Authors: Donald Goldfarb, Shiqian Ma, Katya Scheinberg
    Subjects: Learning
    Abstract

    Gaussian graphical models are of great interest in statistical learning.
    Because the conditional independencies between different nodes correspond to
    zero entries in the inverse covariance matrix of the Gaussian distribution, one
    can learn the structure of the graph by estimating a sparse inverse covariance
    matrix from sample data, by solving a convex maximum likelihood problem with an
    $\ell_1$-regularization term.

  119. Converged Algorithms for Orthogonal Nonnegative Matrix Factorizations.

    Authors: Andri Mirzal
    Subjects: Learning
    Abstract

    This paper proposes uni-orthogonal and bi-orthogonal nonnegative matrix
    factorization algorithms with robust convergence proofs. We design the
    algorithms based on the work of Lee and Seung for the standard nonnegative
    matrix factorization [1], and derive the converged versions by utilizing ideas
    from the work of Lin [2]. The experimental results confirm the theoretical
    guarantees of the convergences.

  120. Efficient Matrix Completion with Gaussian Models.

    Authors: Guillermo Sapiro, Guoshen Yu, Flavien Léger
    Subjects: Learning
    Abstract

    A general framework based on Gaussian models and a MAP-EM algorithm is
    introduced in this paper for solving matrix/table completion problems. The
    numerical experiments with the standard and challenging movie ratings data show
    that the proposed approach, based on probably one of the simplest probabilistic
    models, leads to the results in the same ballpark as the state-of-the-art, at a
    lower computational cost.

  121. Robust PCA via Outlier Pursuit.

    Authors: Huan Xu, Constantine Caramanis, Sujay Sanghavi
    Subjects: Learning
    Abstract

    Singular Value Decomposition (and Principal Component Analysis) is one of the
    most widely used techniques for dimensionality reduction: successful and
    efficiently computable, it is nevertheless plagued by a well-known,
    well-documented sensitivity to outliers. Recent work has considered the setting
    where each point has a few arbitrarily corrupted components.

  122. Hardness Results for Agnostically Learning Low-Degree Polynomial Threshold Functions.

    Authors: Ilias Diakonikolas, Rocco A. Servedio, Ryan O'Donnell, Yi Wu
    Subjects: Learning
    Abstract

    Hardness results for maximum agreement problems have close connections to
    hardness results for proper learning in computational learning theory. In this
    paper we prove two hardness results for the problem of finding a low degree
    polynomial threshold function (PTF) which has the maximum possible agreement
    with a given set of labeled examples in $\R^n \times \{-1,1\}.$ We prove that
    for any constants $d\geq 1, \eps > 0$,

    {itemize}

  123. Near-Optimal Bayesian Active Learning with Noisy Observations.

    Authors: Andreas Krause, Daniel Golovin, Debajyoti Ray
    Subjects: Learning
    Abstract

    We tackle the fundamental problem of Bayesian active learning with noise,
    where we need to adaptively select from a number of expensive tests in order to
    identify an unknown hypothesis sampled from a known prior distribution. In the
    case of noise-free observations, a greedy algorithm called generalized binary
    search (GBS) is known to perform near-optimally. We show that if the
    observations are noisy, perhaps surprisingly, GBS can perform very poorly.

  124. Queue-Aware Distributive Resource Control for Delay-Sensitive Two-Hop MIMO Cooperative Systems.

    Authors: Vincent K. N. Lau, Ying Cui, Rui Wang
    Subjects: Learning
    Abstract

    In this paper, we consider a queue-aware distributive resource control
    algorithm for two-hop MIMO cooperative systems. We shall illustrate that relay
    buffering is an effective way to reduce the intrinsic half-duplex penalty in
    cooperative systems. The complex interactions of the queues at the source node
    and the relays are modeled as an average-cost infinite horizon Markov Decision
    Process (MDP). The traditional approach solving this MDP problem involves
    centralized control with huge complexity.

  125. Extension of Wirtinger's Calculus to Reproducing Kernel Hilbert Spaces and the Complex Kernel LMS.

    Authors: Sergios Theodoridis, Pantelis Bouboulis
    Subjects: Learning
    Abstract

    Over the last decade, kernel methods for nonlinear processing have
    successfully been used in the machine learning community. The primary
    mathematical tool employed in these methods is the notion of the Reproducing
    Kernel Hilbert Space. However, so far, the emphasis has been on batch
    techniques. It is only recently, that online techniques have been considered in
    the context of adaptive signal processing tasks. Moreover, these efforts have
    only been focussed on real valued data sequences.

  126. Fast Reinforcement Learning for Energy-Efficient Wireless Communications.

    Authors: Mihaela van der Schaar, Nicholas Mastronarde
    Subjects: Learning
    Abstract

    We consider the problem of energy-efficient point-to-point transmission of
    delay-sensitive data (e.g. multimedia data) over a fading channel.

  127. Hedging Strategies for Bayesian Optimization.

    Authors: Eric Brochu, Matthew Hoffman, Nando de Freitas
    Subjects: Learning
    Abstract

    Bayesian optimization with Gaussian processes has become an increasingly
    popular tool in the machine learning community. It is efficient and can be used
    when very little is known about the objective function, making it popular in
    expensive black-box optimization scenarios. It is able to do this by sampling
    the objective using an acquisition function which incorporates the model's
    estimate of the objective and the uncertainty at any given point. However,
    there are several different parameterized acquisition functions in the
    literature, and it is often unclear which one to use.

  128. Speaker Identification using MFCC-Domain Support Vector Machine.

    Authors: Md. Saiful Islam, S. M. Kamruzzaman, A. N. M. Rezaul Karim, Md. Emdadul Haque
    Subjects: Learning
    Abstract

    Speech recognition and speaker identification are important for
    authentication and verification in security purpose, but they are difficult to
    achieve. Speaker identification methods can be divided into text-independent
    and text-dependent. This paper presents a technique of text-dependent speaker
    identification using MFCC-domain support vector machine (SVM).

  129. Multi-parametric Solution-path Algorithm for Instance-weighted Support Vector Machines.

    Authors: Masashi Sugiyama, Masayuki Karasuyama, Naoyuki Harada, Ichiro Takeuchi
    Subjects: Learning
    Abstract

    An instance-weighted variant of the support vector machine (SVM) has
    attracted considerable attention recently since they are useful in various
    machine learning tasks such as non-stationary data analysis, heteroscedastic
    data modeling, transfer learning, learning to rank, and transduction. An
    important challenge in these scenarios is to overcome the computational
    bottleneck---instance weights often change dynamically or adaptively, and thus
    the weighted SVM solutions must be repeatedly computed.

  130. Efficient L1/Lq Norm Regularization.

    Authors: Jun Liu, Jieping Ye
    Subjects: Learning
    Abstract

    Sparse learning has recently received increasing attention in many areas
    including machine learning, statistics, and applied mathematics. The mixed-norm
    regularization based on the L1/Lq norm with q > 1 is attractive in many
    applications of regression and classification in that it facilitates group
    sparsity in the model. The resulting optimization problem is, however,
    challenging to solve due to the structure of the L1/Lq -regularization.
    Existing work deals with special cases including q = 2,infinity, and they
    cannot be easily extended to the general case.

  131. Text Classification using the Concept of Association Rule of Data Mining.

    Authors: S. M. Kamruzzaman, Chowdhury Mofizur Rahman, Ferdous Ahmed Sohel, Parvez Naushad
    Subjects: Learning
    Abstract

    As the amount of online text increases, the demand for text classification to
    aid the analysis and management of text is increasing. Text is cheap, but
    information, in the form of knowing what classes a text belongs to, is
    expensive. Automatic classification of text can provide this information at low
    cost, but the classifiers themselves must be built with expensive human effort,
    or trained from texts which have themselves been manually classified. In this
    paper we will discuss a procedure of classifying text using the concept of
    association rule of data mining.

  132. Structured sparsity-inducing norms through submodular functions.

    Authors: Francis Bach
    Subjects: Learning
    Abstract

    Sparse methods for supervised learning aim at finding good linear predictors
    from as few variables as possible, i.e., with small cardinality of their
    supports. This combinatorial selection problem is often turned into a convex
    optimization problem by replacing the cardinality function by its convex
    envelope (tightest convex lower bound), in this case the L1-norm.

  133. Safe Feature Elimination in Sparse Supervised Learning.

    Authors: Laurent El Ghaoui, Vivian Viallon, Tarek Rabbani
    Subjects: Learning
    Abstract

    We investigate fast methods that allow to quickly eliminate variables
    (features) in supervised learning problems involving a convex loss function and
    a $l_1$-norm penalty, leading to a potentially substantial reduction in the
    number of variables prior to running the supervised learning algorithm. The
    methods are not heuristic: they only eliminate features that are {\em
    guaranteed} to be absent after solving the learning problem. Our framework
    applies to a large class of problems, including support vector machine
    classification, logistic regression and least-squares.

  134. Approximate Inference and Stochastic Optimal Control.

    Authors: Konrad Rawlik, Marc Toussaint, Sethu Vijayakumar
    Subjects: Learning
    Abstract

    We propose a novel reformulation of the stochastic optimal control problem as
    an approximate inference problem, demonstrating, that such a interpretation
    leads to new practical methods for the original problem. In particular we
    characterise a novel class of iterative solutions to the stochastic optimal
    control problem based on a natural relaxation of the exact dual formulation.
    These theoretical insights are applied to the Reinforcement Learning problem
    where they lead to new model free, off policy methods for discrete and
    continuous problems.

  135. Conditional Random Fields and Support Vector Machines: A Hybrid Approach.

    Authors: Tiberio Caetano, Mark D. Reid, Qinfeng Shi
    Subjects: Learning
    Abstract

    We propose a novel hybrid loss for multiclass and structured prediction
    problems that is a convex combination of log loss for Conditional Random Fields
    (CRFs) and a multiclass hinge loss for Support Vector Machines (SVMs). We
    provide a sufficient condition for when the hybrid loss is Fisher consistent
    for classification. This condition depends on a measure of dominance between
    labels - specifically, the gap in per observation probabilities between the
    most likely labels. We also prove Fisher consistency is necessary for
    parametric consistency when learning models such as CRFs.

  136. Follow-the-Regularized-Leader and Mirror Descent: Equivalence Theorems and Implicit Updates.

    Authors: H. Brendan McMahan
    Subjects: Learning
    Abstract

    We study two families of online convex optimization algorithms: mirror
    descent and follow-the-regularized-leader. We prove that many mirror descent
    algorithms (such as online gradient descent) are actually instances of a more
    general follow-the-leader algorithm which uses proximal regularization
    (additional strong convexity centered at the current feasible point). Further,
    under certain step-size assumptions, other mirror-descent algorithms are
    equivalent to follow-the-leader algorithms with origin-centered regularization
    (such as dual averaging).

  137. Reinforcement Learning by Comparing Immediate Reward.

    Authors: Shishir Kumar, Punit Pandey, Deepshikha Pandey
    Subjects: Learning
    Abstract

    This paper introduces an approach to Reinforcement Learning Algorithm by
    comparing their immediate rewards using a variation of Q-Learning algorithm.
    Unlike the conventional Q-Learning, the proposed algorithm compares current
    reward with immediate reward of past move and work accordingly. Relative reward
    based Q-learning is an approach towards interactive learning. Q-Learning is a
    model free reinforcement learning method that used to learn the agents.

  138. Fast Overlapping Group Lasso.

    Authors: Jun Liu, Jieping Ye
    Subjects: Learning
    Abstract

    The group Lasso is an extension of the Lasso for feature selection on
    (predefined) non-overlapping groups of features. The non-overlapping group
    structure limits its applicability in practice. There have been several recent
    attempts to study a more general formulation, where groups of features are
    given, potentially with overlaps between the groups. The resulting optimization
    is, however, much more challenging to solve due to the group overlaps. In this
    paper, we consider the efficient optimization of the overlapping group Lasso
    penalized problem.

  139. A PAC-Bayesian Analysis of Graph Clustering and Pairwise Clustering.

    Authors: Yevgeny Seldin
    Subjects: Learning
    Abstract

    We formulate weighted graph clustering as a prediction problem: given a
    subset of edge weights we analyze the ability of graph clustering to predict
    the remaining edge weights. This formulation enables practical and theoretical
    comparison of different approaches to graph clustering as well as comparison of
    graph clustering with other possible ways to model the graph. We adapt the
    PAC-Bayesian analysis of co-clustering (Seldin and Tishby, 2008; Seldin, 2009)
    to derive a PAC-Bayesian generalization bound for graph clustering.

  140. Exploring Language-Independent Emotional Acoustic Features via Feature Selection.

    Authors: Arslan Shaukat, Ke Chen
    Subjects: Learning
    Abstract

    We propose a novel feature selection strategy to discover
    language-independent acoustic features that tend to be responsible for emotions
    regardless of languages, linguistics and other factors. Experimental results
    suggest that the language-independent feature subset discovered yields the
    performance comparable to the full feature set on various emotional speech
    corpora.

  141. Network Flow Algorithms for Structured Sparsity.

    Authors: Francis Bach, Rodolphe Jenatton, Guillaume Obozinski, Julien Mairal
    Subjects: Learning
    Abstract

    We consider a class of learning problems that involve a structured
    sparsity-inducing norm defined as the sum of $\ell_\infty$-norms over groups of
    variables. Whereas a lot of effort has been put in developing fast optimization
    methods when the groups are disjoint or embedded in a specific hierarchical
    structure, we address here the case of general overlapping groups. To this end,
    we show that the corresponding optimization problem is related to network flow
    optimization. More precisely, the proximal problem associated with the norm we
    consider is dual to a quadratic min-cost flow problem.

  142. Fixed-point and coordinate descent algorithms for regularized kernel methods.

    Authors: Francesco Dinuzzo
    Subjects: Learning
    Abstract

    In this paper, we study two general classes of optimization algorithms for
    kernel methods with convex loss function and quadratic norm regularization, and
    analyze their convergence. The first approach, based on fixed-point iterations,
    is simple to implement and analyze, and can be easily parallelized. The second,
    based on coordinate descent, exploits the structure of additively separable
    loss functions to compute solutions of line searches in closed form.

  143. Freezing and Sleeping: Tracking Experts that Learn by Evolving Past Posteriors.

    Authors: Tim van Erven, Wouter M. Koolen
    Subjects: Learning
    Abstract

    A problem posed by Freund is how to efficiently track a small pool of experts
    out of a much larger set. This problem was solved when Bousquet and Warmuth
    introduced their mixing past posteriors (MPP) algorithm in 2001.

  144. Switching between Hidden Markov Models using Fixed Share.

    Authors: Tim van Erven, Wouter M. Koolen
    Subjects: Learning
    Abstract

    In prediction with expert advice the goal is to design online prediction
    algorithms that achieve small regret (additional loss on the whole data)
    compared to a reference scheme. In the simplest such scheme one compares to the
    loss of the best expert in hindsight. A more ambitious goal is to split the
    data into segments and compare to the best expert on each segment. This is
    appropriate if the nature of the data changes between segments. The standard
    fixed-share algorithm is fast and achieves small regret compared to this
    scheme.

  145. NESVM: a Fast Gradient Method for Support Vector Machines.

    Authors: Tianyi Zhou, Dacheng Tao, Xindong Wu
    Subjects: Learning
    Abstract

    Support vector machines (SVMs) are invaluable tools for many practical
    applications in artificial intelligence, e.g., classification and event
    recognition. However, popular SVM solvers are not sufficiently efficient for
    applications with a great deal of samples as well as a large number of
    features. In this paper, thus, we present NESVM, a fast gradient SVM solver
    that can optimize various SVM models, e.g., classical SVM, linear programming
    SVM and least square SVM.

  146. Comparison of Support Vector Machine and Back Propagation Neural Network in Evaluating the Enterprise Financial Distress.

    Authors: Ming-Chang Lee, Chang To
    Subjects: Learning
    Abstract

    Recently, applying the novel data mining techniques for evaluating enterprise
    financial distress has received much research alternation. Support Vector
    Machine (SVM) and back propagation neural (BPN) network has been applied
    successfully in many areas with excellent generalization results, such as rule
    extraction, classification and evaluation. In this paper, a model based on SVM
    with Gaussian RBF kernel is proposed here for enterprise financial distress
    evaluation.

  147. Adapting to the Shifting Intent of Search Queries.

    Authors: Aleksandrs Slivkins, Umar Syed, Nina Mishra
    Subjects: Learning
    Abstract

    Search engines today present results that are often oblivious to abrupt
    shifts in intent. For example, the query `independence day' usually refers to a
    US holiday, but the intent of this query abruptly changed during the release of
    a major film by that name. While no studies exactly quantify the magnitude of
    intent-shifting traffic, studies suggest that news events, seasonal topics, pop
    culture, etc account for 50% of all search queries.

  148. Manifold Elastic Net: A Unified Framework for Sparse Dimension Reduction.

    Authors: Tianyi Zhou, Dacheng Tao, Xindong Wu
    Subjects: Learning
    Abstract

    It is difficult to find the optimal sparse solution of a manifold learning
    based dimensionality reduction algorithm. The lasso or the elastic net
    penalized manifold learning based dimensionality reduction is not directly a
    lasso penalized least square problem and thus the least angle regression (LARS)
    (Efron et al. \cite{LARS}), one of the most popular algorithms in sparse
    learning, cannot be applied. Therefore, most current approaches take indirect
    ways or have strict settings, which can be inconvenient for applications.

  149. A Brief Introduction to Temporality and Causality.

    Authors: Kamran Karimi
    Subjects: Learning
    Abstract

    Causality is a non-obvious concept that is often considered to be related to
    temporality. In this paper we present a number of past and present approaches
    to the definition of temporality and causality from philosophical, physical,
    and computational points of view. We note that time is an important ingredient
    in many relationships and phenomena. The topic is then divided into the two
    main areas of temporal discovery, which is concerned with finding relations
    that are stretched over time, and causal discovery, where a claim is made as to
    the causal influence of certain events on others.

  150. Reinforcement Learning via AIXI Approximation.

    Authors: Joel Veness, Kee Siong Ng, Marcus Hutter, David Silver
    Subjects: Learning
    Abstract

    This paper introduces a principled approach for the design of a scalable
    general reinforcement learning agent. This approach is based on a direct
    approximation of AIXI, a Bayesian optimality notion for general reinforcement
    learning agents. Previously, it has been unclear whether the theory of AIXI
    could motivate the design of practical algorithms. We answer this hitherto open
    question in the affirmative, by providing the first computationally feasible
    approximation to the AIXI agent.

  151. Consistency of Feature Markov Processes.

    Authors: Marcus Hutter, Peter Sunehag
    Subjects: Learning
    Abstract

    We are studying long term sequence prediction (forecasting). We approach this
    by investigating criteria for choosing a compact useful state representation.
    The state is supposed to summarize useful information from the history. We want
    a method that is asymptotically consistent in the sense it will provably
    eventually only choose between alternatives that satisfy an optimality property
    related to the used criterion.

  152. A note on sample complexity of learning binary output neural networks under fixed input distributions.

    Authors: Vladimir Pestov
    Subjects: Learning
    Abstract

    We show that the learning sample complexity of a sigmoidal neural network
    constructed by Sontag (1992) required to achieve a given misclassification
    error under a fixed purely atomic distribution can grow arbitrarily fast: for
    any prescribed rate of growth there is an input distribution having this rate
    as the sample complexity, and the bound is asymptotically tight. The rate can
    be superexponential, a non-recursive function, etc. We further observe that
    Sontag's ANN is not Glivenko-Cantelli under any input distribution having a
    non-atomic part.

  153. Adaptive Bound Optimization for Online Convex Optimization.

    Authors: Matthew Streeter, H. Brendan McMahan
    Subjects: Learning
    Abstract

    We introduce a new online convex optimization algorithm that adaptively
    chooses its regularization function based on the loss functions observed so
    far. This is in contrast to previous algorithms that use a fixed regularization
    function such as L2-squared, and modify it only via a single time-dependent
    parameter. Our algorithm's regret bounds are worst-case optimal, and for
    certain realistic classes of loss functions they are much better than existing
    bounds. These bounds are problem-dependent, which means they can exploit the
    structure of the actual problem instance.

  154. Filtrage vaste marge pour l'\'etiquetage s\'equentiel \`a noyaux de signaux.

    Authors: Alain Rakotomamonjy, Rémi Flamary, Benjamin Labbé
    Subjects: Learning
    Abstract

    We address in this paper the problem of multi-channel signal sequence
    labeling. In particular, we consider the problem where the signals are
    contaminated by noise or may present some dephasing with respect to their
    labels. For that, we propose to jointly learn a SVM sample classifier with a
    temporal filtering of the channels. This will lead to a large margin filtering
    that is adapted to the specificity of each channel (noise and time-lag). We
    derive algorithms to solve the optimization problem and we discuss different
    filter regularizations for automated scaling or selection of channels.

  155. The Latent Bernoulli-Gauss Model for Data Analysis.

    Authors: Amnon Shashua, Gabi Pragier
    Subjects: Learning
    Abstract

    We present a new latent-variable model employing a Gaussian mixture
    integrated with a feature selection procedure (the Bernoulli part of the model)
    which together form a "Latent Bernoulli-Gauss" distribution. The model is
    applied to MAP estimation, clustering, feature selection and collaborative
    filtering and fares favorably with the state-of-the-art latent-variable models.

  156. A Reinforcement Learning Model Using Neural Networks for Music Sight Reading Learning Problem.

    Authors: Keyvan Yahya, Pouyan Rafiei Fard
    Subjects: Learning
    Abstract

    Music Sight Reading is a complex process that when it is occurred in the
    brain, some learning attributes would be emerged. Besides giving a model based
    on actor-critic method in the Reinforcement Learning, the agent is considered
    to have a neural network structure. We studied on where the sight reading
    process is happened and also a serious problem which is how the synaptic
    weights would be adjusted through the learning process. The model we offer here
    is a computational model on which an updated weights equation to fixing the
    weights is accompanied too.

  157. Query Strategies for Evading Convex-Inducing Classifiers.

    Authors: Benjamin I. P. Rubinstein, Ling Huang, Blaine Nelson, Anthony D. Joseph, Steven J. Lee, Satish Rao, J. D. Tygar
    Subjects: Learning
    Abstract

    Classifiers are often used to detect miscreant activities. We study how an
    adversary can systematically query a classifier to elicit information that
    allows the adversary to evade detection while incurring a near-minimal cost of
    modifying their intended malfeasance. We generalize the theory of Lowd and Meek
    (2005) to the family of convex-inducing classifiers that partition input space
    into two sets one of which is convex.

  158. PAC learnability of a concept class under non-atomic measures: a problem by Vidyasagar.

    Authors: Vladimir Pestov
    Subjects: Learning
    Abstract

    In response to a 1997 problem of M. Vidyasagar, we state a necessary and
    sufficient condition for distribution-free PAC learnability of a concept class
    $\mathscr C$ under the family of all non-atomic (diffuse) measures on the
    domain $\Omega$. Clearly, finiteness of the classical Vapnik-Chervonenkis
    dimension of $\mathscr C$ is a sufficient, but no longer necessary, condition.
    Besides, learnability of $\mathscr C$ under non-atomic measures does not imply
    the uniform Glivenko-Cantelli property with regard to non-atomic measures.

  159. GraphLab: A New Framework for Parallel Machine Learning.

    Authors: Danny Bickson, Carlos Guestrin, Yucheng Low, Joseph Gonzalez, Aapo Kyrola, Joseph M. Hellerstein
    Subjects: Learning
    Abstract

    Designing and implementing efficient, provably correct parallel machine
    learning (ML) algorithms is challenging. Existing high-level parallel
    abstractions like MapReduce are insufficiently expressive while low-level tools
    like MPI and Pthreads leave ML experts repeatedly solving the same design
    challenges. By targeting common patterns in ML, we developed GraphLab, which
    improves upon abstractions like MapReduce by compactly expressing asynchronous
    iterative algorithms with sparse computational dependencies while ensuring data
    consistency and achieving a high degree of parallel performance.

  160. Fast ABC-Boost for Multi-Class Classification.

    Authors: Ping Li
    Subjects: Learning
    Abstract

    Abc-boost is a new line of boosting algorithms for multi-class
    classification, by utilizing the commonly used sum-to-zero constraint. To
    implement abc-boost, a base class must be identified at each boosting step.
    Prior studies used a very expensive procedure based on exhaustive search for
    determining the base class at each boosting step. Good testing performances of
    abc-boost (implemented as abc-mart and abc-logitboost) on a variety of datasets
    were reported.

  161. MINLIP for the Identification of Monotone Wiener Systems.

    Authors: Kristiaan Pelckmans
    Subjects: Learning
    Abstract

    This paper studies the MINLIP estimator for the identification of Wiener
    systems consisting of a sequence of a linear FIR dynamical model, and a
    monotonically increasing (or decreasing) static function. Given $T$
    observations, this algorithm boils down to solving a convex quadratic program
    with $O(T)$ variables and inequality constraints, implementing an inference
    technique which is based entirely on model complexity control.

  162. Predictive PAC learnability: a paradigm for learning from exchangeable input data.

    Authors: Vladimir Pestov
    Subjects: Learning
    Abstract

    Exchangeable random variables form an important and well-studied
    generalization of i.i.d. variables, however simple examples show that no
    nontrivial concept or function classes are PAC learnable under general
    exchangeable data inputs $X_1,X_2,\ldots$. Inspired by the work of Berti and
    Rigo on a Glivenko--Cantelli theorem for exchangeable inputs, we propose a new
    paradigm, adequate for learning from exchangeable data: predictive PAC
    learnability.

  163. Uncovering the Riffled Independence Structure of Rankings.

    Authors: Jonathan Huang, Carlos Guestrin
    Subjects: Learning
    Abstract

    Representing distributions over permutations can be a daunting task due to
    the fact that the number of permutations of $n$ objects scales factorially in
    $n$. One recent way that has been used to reduce storage complexity has been to
    exploit probabilistic independence, but as we argue, full independence
    assumptions impose strong sparsity constraints on distributions and are
    unsuitable for modeling rankings.

  164. Online Learning: Random Averages, Combinatorial Parameters, and Learnability.

    Authors: Ambuj Tewari, Karthik Sridharan, Alexander Rakhlin
    Subjects: Learning
    Abstract

    We study learnability in the online learning model. We define several
    complexity measures which capture the difficulty of learning in a sequential
    manner. Among these measures are analogues of Rademacher complexity, covering
    numbers and fat shattering dimension from statistical learning theory.
    Relationship among these complexity measures, their connection to online
    learning, and tools for bounding them are provided. In the setting of
    supervised learning, finiteness of the introduced scale-sensitive parameters is
    shown to be equivalent to learnability.

  165. Prediction with Advice of Unknown Number of Experts.

    Authors: Vladimir Vovk, Alexey Chernov
    Subjects: Learning
    Abstract

    In the framework of prediction with expert advice, we consider a recently
    introduced kind of regret bounds: the bounds that depend on the effective
    instead of nominal number of experts. In contrast to the NormalHedge bound,
    which mainly depends on the effective number of experts and also weakly depends
    on the nominal one, we obtain a bound that does not contain the nominal number
    of experts at all. We use the defensive forecasting method and introduce an
    application of defensive forecasting to multivalued supermartingales.

  166. Sequence prediction in realizable and non-realizable cases.

    Authors: Daniil Ryabko
    Subjects: Learning
    Abstract

    A sequence $x_1,...,x_n,...$ of discrete-valued observations is generated
    according to some unknown probabilistic law (measure) $\mu$. After observing
    each outcome, it is required to give the conditional probabilities of the next
    observation. The realizable case is when the measure $\mu$ belongs to an
    arbitrary but known class $C$ of process measures. The non-realizable case is
    when $\mu$ is completely arbitrary, but the prediction performance is measured
    with respect to a given set $C$ of process measures.

  167. Multi-View Active Learning in the Non-Realizable Case.

    Authors: Wei Wang, Zhi-Hua Zhou
    Subjects: Learning
    Abstract

    Many classical bounds on the sample complexity of active learning based on
    the realizability assumption have been derived, which show that active learning
    can exponentially improve the sample complexity over passive learning. However,
    this realizability assumption could not be met in practice and few results on
    the exponential improvement in the sample complexity in the non-realizable case
    has been obtained. In this paper, we theoretically characterize the sample
    complexity of active learning in the non-realizable case with Tsybakov noise
    condition under the multi-view setting.

  168. On the clustering aspect of nonnegative matrix factorization.

    Authors: Andri Mirzal, Masashi Furukawa
    Subjects: Learning
    Abstract

    This paper provides a theoretical explanation on the clustering aspect of
    nonnegative matrix factorization (NMF). We prove that even without imposing
    orthogonality or sparsity constraint on the basis and/or coefficient matrix,
    NMF still can give clustering results, thus providing a theoretical support for
    the works of Xu et al. [1] and Kim et al. [2], where the authors showed the
    superiority of the standard NMF as a clustering method.

  169. Empirical learning aided by weak domain knowledge in the form of feature importance.

    Authors: Ridwan Al Iqbal
    Subjects: Learning
    Abstract

    Standard empirical learners that use domain knowledge require stronger
    knowledge that is hard and expensive to acquire. However, weaker domain
    knowledge can benefit from prior knowledge while being cost effective. Weak
    knowledge in the form of feature relative importance (FRI) is presented and
    explained. Feature relative importance is a real valued approximation of a
    feature's importance provided by experts. Advantage of using this knowledge is
    demonstrated by IANN, a modified multilayer neural network algorithm.

  170. Using a Kernel Adatron for Object Classification with RCS Data.

    Authors: Marten F. Byl, James T. Demers, Edward A. Rietman
    Subjects: Learning
    Abstract

    Rapid identification of object from radar cross section (RCS) signals is
    important for many space and military applications. This identification is a
    problem in pattern recognition which either neural networks or support vector
    machines should prove to be high-speed. Bayesian networks would also provide
    value but require significant preprocessing of the signals. In this paper, we
    describe the use of a support vector machine for object identification from
    synthesized RCS data.

  171. Online Learning of Noisy Data with Kernels.

    Authors: Shai Shalev-Shwartz, Ohad Shamir, Nicolò Cesa-Bianchi
    Subjects: Learning
    Abstract

    We study online learning when individual instances are corrupted by random
    noise. We assume the noise distribution is unknown, and may change over time
    with no restriction other than having zero mean and bounded variance. Our
    technique relies on a family of unbiased estimators for non-linear functions,
    which may be of independent interest. We show that a variant of online gradient
    descent can learn functions in any dot-product (e.g., polynomial) or Gaussian
    kernel space with any analytic convex loss function.

  172. Prediction with Expert Advice under Discounted Loss.

    Authors: Fedor Zhdanov, Alexey Chernov
    Subjects: Learning
    Abstract

    We study prediction with expert advice in the setting where the losses are
    accumulated with some discounting---the impact of old losses may gradually
    vanish. We generalize the Aggregating Algorithm and the Aggregating Algorithm
    for Regression to this case, propose a suitable new variant of exponential
    weights algorithm, and prove respective loss bounds.

  173. S4VM: Safe Semi-Supervised Support Vector Machine.

    Authors: Zhi-Hua Zhou, Yu-Feng Li
    Subjects: Learning
    Abstract

    Semi-supervised learning tries to improve performance by using unlabeled
    data. In some situations, however, its performance may become inferior to that
    of without using unlabeled data. It is desired to have safe semi-supervised
    methods which often improve the performance while rarely degenerate the
    performance.

  174. The Complex Gaussian Kernel LMS algorithm.

    Authors: Sergios Theodoridis, Pantelis Bouboulis
    Subjects: Learning
    Abstract

    Although the real reproducing kernels are used in an increasing number of
    machine learning problems, complex kernels have not, yet, been used, in spite
    of their potential interest in applications such as communications. In this
    work, we focus our attention on the complex gaussian kernel and its possible
    application in the complex Kernel LMS algorithm. In order to derive the
    gradients needed to develop the complex kernel LMS (CKLMS), we employ the
    powerful tool of Wirtinger's Calculus, which has recently attracted much
    attention in the signal processing community.

  175. Extension of Wirtinger Calculus in RKH Spaces and the Complex Kernal LMS.

    Authors: Sergios Theodoridis, Pantelis Bouboulis
    Subjects: Learning
    Abstract

    Over the last decade, kernel methods for nonlinear processing have
    successfully been used in the machine learning community. However, so far, the
    emphasis has been on batch techniques. It is only recently, that online
    adaptive techniques have been considered in the context of signal processing
    tasks. To the best of our knowledge, no kernel-based strategy has been
    developed, so far, that is able to deal with complex valued signals. In this
    paper, we take advantage of a technique called complexification of real RKHSs
    to attack this problem.

  176. Clustering processes.

    Authors: Daniil Ryabko
    Subjects: Learning
    Abstract

    The problem of clustering is considered, for the case when each data point is
    a sample generated by a stationary ergodic process. We propose a very natural
    asymptotic notion of consistency, and show that simple consistent algorithms
    exist, under most general non-parametric assumptions. The notion of consistency
    is as follows: two samples should be put into the same cluster if and only if
    they were generated by the same distribution. With this notion of consistency,
    clustering generalizes such classical statistical problems as homogeneity
    testing and process classification.

  177. Feature Selection with Conjunctions of Decision Stumps and Learning from Microarray Data.

    Authors: Mohak Shah, Mario Marchand, Jacques Corbeil
    Subjects: Learning
    Abstract

    One of the objectives of designing feature selection learning algorithms is
    to obtain classifiers that depend on a small number of attributes and have
    verifiable future performance guarantees. There are few, if any, approaches
    that successfully address the two goals simultaneously. Performance guarantees
    become crucial for tasks such as microarray data analysis due to very small
    sample sizes resulting in limited empirical evaluation.

  178. Statistical Learning in Automated Troubleshooting: Application to LTE Interference Mitigation.

    Authors: Moazzam Islam Tiwana, Berna Sayrac, Zwi Altman
    Subjects: Learning
    Abstract

    This paper presents a method for automated healing as part of off-line
    automated troubleshooting. The method combines statistical learning with
    constraint optimization. The automated healing aims at locally optimizing radio
    resource management (RRM) or system parameters of cells with poor performance
    in an iterative manner. The statistical learning processes the data using
    Logistic Regression (LR) to extract closed form (functional) relations between
    Key Performance Indicators (KPIs) and Radio Resource Management (RRM)
    parameters.

  179. Adaptive Bases for Reinforcement Learning.

    Authors: Shie Mannor, Dotan Di Castro
    Subjects: Learning
    Abstract

    We consider the problem of reinforcement learning using function
    approximation, where the approximating basis can change dynamically while
    interacting with the environment. A motivation for such an approach is
    maximizing the value function fitness to the problem faced. Three errors are
    considered: approximation square error, Bellman residual, and projected Bellman
    residual. Algorithms under the actor-critic framework are presented, and shown
    to converge. The advantage of such an adaptive basis is demonstrated in
    simulations.

  180. Generative and Latent Mean Map Kernels.

    Authors: Nishant A. Mehta, Alexander G. Gray
    Subjects: Learning
    Abstract

    We introduce two kernels that extend the mean map, which embeds probability
    measures in Hilbert spaces. The generative mean map kernel (GMMK) is a smooth
    similarity measure between probabilistic models. The latent mean map kernel
    (LMMK) generalizes the non-iid formulation of Hilbert space embeddings of
    empirical distributions in order to incorporate latent variable models. When
    comparing certain classes of distributions, the GMMK exhibits beneficial
    regularization and generalization properties not shown for previous generative
    kernels.

  181. Clustering processes.

    Authors: Daniil Ryabko
    Subjects: Learning
    Abstract

    The problem of clustering is considered, for the case when each data point is
    a sample generated by a stationary ergodic process. We propose a very natural
    asymptotic notion of consistency, and show that simple consistent algorithms
    exist, under most general non-parametric assumptions. The notion of consistency
    is as follows: two samples should be put into the same cluster if and only if
    they were generated by the same distribution. With this notion of consistency,
    clustering generalizes such classical statistical problems as homogeneity
    testing and process classification.

  182. Optimism in Reinforcement Learning Based on Kullback-Leibler Divergence.

    Authors: Olivier Cappé, Sarah Filippi, Aurélien Garivier
    Subjects: Learning
    Abstract

    We consider model-based reinforcement learning in finite Markov Decision
    Processes (MDPs), focussing on so-called optimistic strategies. Optimism is
    usually implemented by carrying out extended value iterations, under a
    constraint of consistency with the estimated model transition probabilities. In
    this paper, we strongly argue in favor of using the Kullback-Leibler (KL)
    divergence for this purpose. By study- ing the linear maximization problem
    under KL constraints, we provide an efficient algorithm for solving
    KL-optimistic extended value iteration.

  183. Polynomial Learning of Distribution Families.

    Authors: Mikhail Belkin, Kaushik Sinha
    Subjects: Learning
    Abstract

    The question of polynomial learnability of probability distributions,
    particularly Gaussian mixture distributions, has recently received significant
    attention in theoretical computer science and machine learning. However,
    despite major progress, the general question of polynomial learnability of
    Gaussian mixture distributions still remained open. The current work resolves
    the question of polynomial learnability for Gaussian mixtures in high dimension
    with an arbitrary fixed number of components.

  184. Efficient Learning with Partially Observed Attributes.

    Authors: Shai Shalev-Shwartz, Ohad Shamir, Nicolò Cesa-Bianchi
    Subjects: Learning
    Abstract

    We describe and analyze efficient algorithms for learning a linear predictor
    from examples when the learner can only view a few attributes of each training
    example. This is the case, for instance, in medical research, where each
    patient participating in the experiment is only willing to go through a small
    number of tests. Our analysis bounds the number of additional examples
    sufficient to compensate for the lack of full information on each training
    example.

  185. Bregman Distance to L1 Regularized Logistic Regression.

    Authors: Mithun Das Gupta, Thomas S. Huang
    Subjects: Learning
    Abstract

    In this work we investigate the relationship between Bregman distances and
    regularized Logistic Regression model. We present a detailed study of Bregman
    Distance minimization, a family of generalized entropy measures associated with
    convex functions. We convert the L1-regularized logistic regression into this
    more general framework and propose a primal-dual method based algorithm for
    learning the parameters.

  186. Generation and Interpretation of Temporal Decision Rules.

    Authors: Kamran Karimi, Howard J. Hamilton
    Subjects: Learning
    Abstract

    We present a solution to the problem of understanding a system that produces
    a sequence of temporally ordered observations. Our solution is based on
    generating and interpreting a set of temporal decision rules. A temporal
    decision rule is a decision rule that can be used to predict or retrodict the
    value of a decision attribute, using condition attributes that are observed at
    times other than the decision attribute's time of observation.

  187. Asymptotic Equivalence of Bayes Cross Validation and Widely Applicable Information Criterion in Singular Learning Theory.

    Authors: Sumio Watanabe
    Subjects: Learning
    Abstract

    In regular statistical models, it is well known that the cross validation
    leaving one out is asymptotically equivalent to Akaike information criterion.
    However, a lot of learning machines are singular statistical models, resulting
    that the asymptotic behavior of the cross validation has been left unknown. In
    previous papers, we established singular learning theory and proposed a widely
    applicable information criterion whose expectation value is asymptotically
    equal to the average Bayes generalization loss.

  188. Dynamic Policy Programming.

    Authors: Mohammad Gheshlaghi Azar, Hilbert J. Kappen
    Subjects: Learning
    Abstract

    In this paper, we consider the problem of planning and learning in the
    infinite-horizon discounted-reward Markov decision problems. We propose a novel
    iterative direct policy-search approach, called dynamic policy programming
    (DPP). DPP is, to the best of our knowledge, the first convergent direct
    policy-search method that uses a Bellman-like iteration technique and at the
    same time is compatible with function approximation. For the tabular case, we
    prove that DPP converges asymptotically to the optimal policy.

  189. State-Space Dynamics Distance for Clustering Sequential Data.

    Authors: Darío García-García, Emilio Parrado-Hernández, Fernando Díaz-de-María
    Subjects: Learning
    Abstract

    This paper proposes a novel similarity measure for clustering sequential
    data. We first construct a common state-space by training a single
    probabilistic model with all the sequences in order to get a unified
    representation for the dataset. Then, distances are obtained attending to the
    transition matrices induced by each sequence in that state-space.

  190. An Analytical Study on Behavior of Clusters Using K Means, EM and K* Means Algorithm.

    Authors: M. Punithavalli, G. Nathiya, S. C. Punitha
    Subjects: Learning
    Abstract

    Clustering is an unsupervised learning method that constitutes a cornerstone
    of an intelligent data analysis process. It is used for the exploration of
    inter-relationships among a collection of patterns, by organizing them into
    homogeneous clusters. Clustering has been dynamically applied to a variety of
    tasks in the field of Information Retrieval (IR). Clustering has become one of
    the most active area of research and the development.

  191. Design and Implementation of an Intelligent Educational Model Based on Personality and Learner's Emotion.

    Authors: Somayeh Fatahi, Nasser Ghasem-Aghaee
    Subjects: Learning
    Abstract

    The Personality and emotions are effective parameters in learning process.
    Thus, virtual learning environments should pay attention to these parameters.
    In this paper, a new e-learning model is designed and implemented according to
    these parameters. The Virtual learning environment that is presented here uses
    two agents: Virtual Tutor Agent (VTA), and Virtual Classmate Agent (VCA).
    During the learning process and depending on events happening in the
    environment, learner's emotions are changed.

  192. Ontology-supported processing of clinical text using medical knowledge integration for multi-label classification of diagnosis coding.

    Authors: Phanu Waraporn, Phayung Meesad, Gareth Clayton
    Subjects: Learning
    Abstract

    This paper discusses the knowledge integration of clinical information
    extracted from distributed medical ontology in order to ameliorate a machine
    learning-based multi-label coding assignment system. The proposed approach is
    implemented using a decision tree based cascade hierarchical technique on the
    university hospital data for patients with Coronary Heart Disease (CHD). The
    preliminary results obtained show a satisfactory finding.

  193. On Tsallis Entropy Bias and Generalized Maximum Entropy Models.

    Authors: Peng Zhang, Yuexian Hou, Tingxu Yan, Dawei Song, Wenjie Li
    Subjects: Learning
    Abstract

    In density estimation task, maximum entropy model (Maxent) can effectively
    use reliable prior information via certain constraints, i.e., linear
    constraints without empirical parameters. However, reliable prior information
    is often insufficient, and the selection of uncertain constraints becomes
    necessary but poses considerable implementation complexity. Improper setting of
    uncertain constraints can result in overfitting or underfitting. To solve this
    problem, a generalization of Maxent, under Tsallis entropy framework, is
    proposed.

  194. Using Rough Set and Support Vector Machine for Network Intrusion Detection.

    Authors: Rung-Ching Chen, Kai-Fan Cheng, Chia-Fen Hsieh
    Subjects: Learning
    Abstract

    The main function of IDS (Intrusion Detection System) is to protect the
    system, analyze and predict the behaviors of users. Then these behaviors will
    be considered an attack or a normal behavior. Though IDS has been developed for
    many years, the large number of return alert messages makes managers maintain
    system inefficiently. In this paper, we use RST (Rough Set Theory) and SVM
    (Support Vector Machine) to detect intrusions. First, RST is used to preprocess
    the data and reduce the dimensions. Next, the features were selected by RST
    will be sent to SVM model to learn and test respectively.

  195. An Unbiased, Data-Driven, Offline Evaluation Method of Contextual Bandit Algorithms.

    Authors: John Langford, Wei Chu, Lihong Li
    Subjects: Learning
    Abstract

    Offline evaluation of reinforcement learning algorithms based on collected
    data (state transitions and rewards) has remained a challenging problem. Common
    practice is to create a simulator based on collected data and then run the
    algorithm against this simulator. Such an approach involves creating a
    simulator of the problem at hand, which is often difficult and may introduce
    bias to the evaluation results. In this paper, we introduce an offline
    evaluation method for a subclass of reinforcement learning problems known as
    contextual bandits.

  196. Etiqueter un corpus oral par apprentissage automatique \`a l'aide de connaissances linguistiques.

    Authors: Iris Eshkol, Isabelle Tellier, Taalab Samer, Sylvie Billot
    Subjects: Learning
    Abstract

    Thanks to the Eslo1 ("Enqu\^ete sociolinguistique d'Orl\'eans", i.e.
    "Sociolinguistic Inquiery of Orl\'eans") campain, a large oral corpus has been
    gathered and transcribed in a textual format. The purpose of the work presented
    here is to associate a morpho-syntactic label to each unit of this corpus. To
    this aim, we have first studied the specificities of the necessary labels, and
    their various possible levels of description. This study has led to a new
    original hierarchical structuration of labels.

  197. Large Margin Boltzmann Machines and Large Margin Sigmoid Belief Networks.

    Authors: Xu Miao, Rajesh P.N. Rao
    Subjects: Learning
    Abstract

    Current statistical models for structured prediction make simplifying
    assumptions about the underlying output graph structure, such as assuming a
    low-order Markov chain, because exact inference becomes intractable as the
    tree-width of the underlying graph increases. Approximate inference algorithms,
    on the other hand, force one to trade off representational power with
    computational efficiency.

  198. Plagiarism Detection using ROUGE and WordNet.

    Authors: Chien-Ying Chen, Jen-Yuan Yeh, Hao-Ren Ke
    Subjects: Learning
    Abstract

    With the arrival of digital era and Internet, the lack of information control
    provides an incentive for people to freely use any content available to them.
    Plagiarism occurs when users fail to credit the original owner for the content
    referred to, and such behavior leads to violation of intellectual property. Two
    main approaches to plagiarism detection are fingerprinting and term occurrence;
    however, one common weakness shared by both approaches, especially
    fingerprinting, is the incapability to detect modified text plagiarism.

  199. A Novel Clustering Algorithm Based Upon Games on Evolving Network.

    Authors: Zhuo Chen, Qiang Li, Yan He, Jing-ping Jiang
    Subjects: Learning
    Abstract

    This paper introduces a model based upon games on an evolving network, and
    develops three clustering algorithms according to it. In the clustering
    algorithms, data points for clustering are regarded as players who can make
    decisions in games. On the network describing relationships among data points,
    an edge-removing-and-rewiring (ERR) function is employed to explore in a
    neighborhood of a data point, which removes edges connecting to neighbors with
    small payoffs, and creates new edges to neighbors with larger payoffs. As such,
    the connections among data points vary over time.

  200. A New Heuristic for Feature Selection by Consistent Biclustering.

    Authors: Antonio Mucherino, Sonia Cafieri
    Subjects: Learning
    Abstract

    Given a set of data, biclustering aims at finding simultaneous partitions in
    biclusters of its samples and of the features which are used for representing
    the samples. Consistent biclusterings allow to obtain correct classifications
    of the samples from the known classification of the features, and vice versa,
    and they are very useful for performing supervised classifications.

  201. Asymptotic Learning Curve and Renormalizable Condition in Statistical Learning Theory.

    Authors: Sumio Watanabe
    Subjects: Learning
    Abstract

    Bayes statistics and statistical physics have the common mathematical
    structure, where the log likelihood function corresponds to the random
    Hamiltonian. Recently, it was discovered that the asymptotic learning curves in
    Bayes estimation are subject to a universal law, even if the log likelihood
    function can not be approximated by any quadratic form. However, it is left
    unknown what mathematical property ensures such a universal law.

  202. Near-Optimal Evasion of Convex-Inducing Classifiers.

    Authors: Benjamin I. P. Rubinstein, Ling Huang, Blaine Nelson, Anthony D. Joseph, Shing-hon Lau, Steven J. Lee, Satish Rao, Anthony Tran, J. D. Tygar
    Subjects: Learning
    Abstract

    Classifiers are often used to detect miscreant activities. We study how an
    adversary can efficiently query a classifier to elicit information that allows
    the adversary to evade detection at near-minimal cost. We generalize results of
    Lowd and Meek (2005) to convex-inducing classifiers. We present algorithms that
    construct undetected instances of near-minimal cost using only polynomially
    many queries in the dimension of the space and without reverse engineering the
    decision boundary.

  203. Structure-Aware Stochastic Control for Transmission Scheduling.

    Authors: Mihaela van der Schaar, Fangwen Fu
    Subjects: Learning
    Abstract

    In this paper, we consider the problem of real-time transmission scheduling
    over time-varying channels. We first formulate the transmission scheduling
    problem as a Markov decision process (MDP) and systematically unravel the
    structural properties (e.g. concavity in the state-value function and
    monotonicity in the optimal scheduling policy) exhibited by the optimal
    solutions. We then propose an online learning algorithm which preserves these
    structural properties and achieves -optimal solutions for an arbitrarily small
    .

  204. Supermartingales in Prediction with Expert Advice.

    Authors: Vladimir Vovk, Fedor Zhdanov, Yuri Kalnishkan, Alexey Chernov
    Subjects: Learning
    Abstract

    We apply the method of defensive forecasting, based on the use of
    game-theoretic supermartingales, to prediction with expert advice. In the
    traditional setting of a countable number of experts and a finite number of
    outcomes, the Defensive Forecasting Algorithm is very close to the well-known
    Aggregating Algorithm. Not only the performance guarantees but also the
    predictions are the same for these two methods of fundamentally different
    nature. We discuss also a new setting where the experts can give advice
    conditional on the learner's future decision.

  205. Exponential Family Hybrid Semi-Supervised Learning.

    Authors: Arvind Agarwal, Hal Daume III
    Subjects: Learning
    Abstract

    We present an approach to semi-supervised learning based on an exponential
    family characterization. Our approach generalizes previous work on coupled
    priors for hybrid generative/discriminative models. Our model is more flexible
    and natural than previous approaches. Experimental results on several data sets
    show that our approach also performs better in practice.

  206. Statistical and Computational Tradeoffs in Stochastic Composite Likelihood.

    Authors: Joshua V Dillon, Guy Lebanon
    Subjects: Learning
    Abstract

    Maximum likelihood estimators are often of limited practical use due to the
    intensive computation they require. We propose a family of alternative
    estimators that maximize a stochastic variation of the composite likelihood
    function. Each of the estimators resolve the computation-accuracy tradeoff
    differently, and taken together they span a continuous spectrum of
    computation-accuracy tradeoff resolutions. We prove the consistency of the
    estimators, provide formulas for their asymptotic variance, statistical
    robustness, and computational complexity.

  207. Unsupervised Supervised Learning II: Training Margin Based Classifiers without Labels.

    Authors: Krishnakumar Balasubramanian, Guy Lebanon, Pinar Donmez
    Subjects: Learning
    Abstract

    Many popular linear classifiers, such as logistic regression, boosting, or
    SVM, are trained by optimizing a margin-based risk function. Traditionally,
    these risk functions are computed based on a labeled dataset. We develop a
    novel technique for estimating such risks using only unlabeled data and p(y).
    We prove that the technique is consistent for high-dimensional linear
    classifiers and demonstrate it on synthetic and real-world data.

  208. A Unified Algorithmic Framework for Multi-Dimensional Scaling.

    Authors: Jeff M. Phillips, Suresh Venkatasubramanian, Arvind Agarwal
    Subjects: Learning
    Abstract

    In this paper, we propose a unified algorithmic framework for solving many
    known variants of \mds. Our algorithm is a simple iterative scheme with
    guaranteed convergence, and is \emph{modular}; by changing the internals of a
    single subroutine in the algorithm, we can switch cost functions and target
    spaces easily. In addition to the formal guarantees of convergence, our
    algorithms are accurate; in most cases, they converge to better quality
    solutions than existing methods, in comparable time.

  209. Model Selection with the Loss Rank Principle.

    Authors: Marcus Hutter, Minh-Ngoc Tran
    Subjects: Learning
    Abstract

    A key issue in statistics and machine learning is to automatically select the
    "right" model complexity, e.g., the number of neighbors to be averaged over in
    k nearest neighbor (kNN) regression or the polynomial degree in regression with
    polynomials. We suggest a novel principle - the Loss Rank Principle (LoRP) -
    for model selection in regression and classification. It is based on the loss
    rank, which counts how many other (fictitious) data would be fitted better.
    LoRP selects the model that has minimal loss rank.

  210. Asymptotic Analysis of Generative Semi-Supervised Learning.

    Authors: Joshua V Dillon, Krishnakumar Balasubramanian, Guy Lebanon
    Subjects: Learning
    Abstract

    Semisupervised learning has emerged as a popular framework for improving
    modeling accuracy while controlling labeling cost. Based on an extension of
    stochastic composite likelihood we quantify the asymptotic accuracy of
    generative semi-supervised learning. In doing so, we complement
    distribution-free analysis by providing an alternative framework to measure the
    value associated with different labeling policies and resolve the fundamental
    question of how much data to label and in what manner.

  211. Non-Sparse Regularization and Efficient Training with Multiple Kernels.

    Authors: Marius Kloft, Ulf Brefeld, Soeren Sonnenburg, Alexander Zien
    Subjects: Learning
    Abstract

    Learning linear combinations of multiple kernels is an appealing strategy
    when the right choice of features is unknown. Previous approaches to multiple
    kernel learning (MKL) promote sparse kernel combinations to support
    interpretability and scalability. Unfortunately, this $\ell_1$-norm MKL is
    rarely observed to outperform trivial baselines in practical applications. To
    allow for robust kernel mixtures, we generalize MKL to arbitrary norms.

  212. Learning from Logged Implicit Exploration Data.

    Authors: John Langford, Alex Strehl, Sham Kakade
    Subjects: Learning
    Abstract

    We provide a sound and consistent foundation for the use of \emph{nonrandom}
    exploration data in "contextual bandit" or "partially labeled" settings where
    only the value of a chosen action is learned.

  213. A Contextual-Bandit Approach to Personalized News Article Recommendation.

    Authors: John Langford, Wei Chu, Lihong Li, Robert E. Schapire
    Subjects: Learning
    Abstract

    Personalized web services strive to adapt their services (advertisements,
    news articles, etc) to individual users by making use of both content and user
    information. Despite a few recent advances, this problem remains challenging
    for at least two reasons. First, web service is featured with dynamically
    changing pools of content, rendering traditional collaborative filtering
    methods inapplicable. Second, the scale of most web services of practical
    interest calls for solutions that are both fast in learning and computation.

  214. Gaussian Process Structural Equation Models with Latent Variables.

    Authors: Ricardo Silva
    Subjects: Learning
    Abstract

    In a variety of disciplines such as social sciences, psychology, medicine and
    economics, the recorded data are considered to be noisy measurements of latent
    variables connected by some causal structure. This corresponds to a family of
    graphical models known as the structural equation model with latent variables.
    While linear non-Gaussian variants have been well-studied, inference in
    nonparametric structural equation models is still underdeveloped.

  215. Less Regret via Online Conditioning.

    Authors: Matthew Streeter, H. Brendan McMahan
    Subjects: Learning
    Abstract

    We analyze and evaluate an online gradient descent algorithm with adaptive
    per-coordinate adjustment of learning rates. Our algorithm can be thought of as
    an online version of batch gradient descent with a diagonal preconditioner.
    This approach leads to regret bounds that are stronger than those of standard
    online gradient descent for general online convex optimization problems.
    Experimentally, we show that our algorithm is competitive with state-of-the-art
    algorithms for large scale machine learning problems.

  216. Linearly Parameterized Bandits.

    Authors: Paat Rusmevichientong, John N. Tsitsiklis
    Subjects: Learning
    Abstract

    We consider bandit problems involving a large (possibly infinite) collection
    of arms, in which the expected reward of each arm is a linear function of an
    $r$-dimensional random vector $\mathbf{Z} \in \mathbb{R}^r$, where $r \geq 2$.
    The objective is to minimize the cumulative regret and Bayes risk.

  217. An Optimal High Probability Algorithm for the Contextual Bandit Problem.

    Authors: John Langford, Alina Beygelzimer, Lihong Li, Lev Reyzin, Robert E. Schapire
    Subjects: Learning
    Abstract

    We consider the problem of learning to predict with expert advice in an
    adversarial, on-line bandit setting. We study how to behave in a way that
    achieves nearly as much reward as the best expert with high probability, rather
    than in expectation. We provide the algorithm Exp4.P for solving this
    contextual bandit problem. We prove that Exp4.P competes with any set of
    policies or experts of size $N$ while incurring regret at most
    $O(\sqrt{KT\ln(N/\delta)})$ with probability $1-\delta$, where $K$ is the
    number of actions and $T$ is the number of rounds of interaction.

  218. Word level Script Identification from Bangla and Devanagri Handwritten Texts mixed with Roman Script.

    Authors: Nibaran Das, Ram Sarkar, Subhadip Basu, Mahantapas Kundu, Mita Nasipuri, Dipak Kumar Basu
    Subjects: Learning
    Abstract

    India is a multi-lingual country where Roman script is often used alongside
    different Indic scripts in a text document. To develop a script specific
    handwritten Optical Character Recognition (OCR) system, it is therefore
    necessary to identify the scripts of handwritten text correctly.

  219. Supervised Classification Performance of Multispectral Images.

    Authors: R. Bhaskaran, K. Perumal
    Subjects: Learning
    Abstract

    Nowadays government and private agencies use remote sensing imagery for a
    wide range of applications from military applications to farm development. The
    images may be a panchromatic, multispectral, hyperspectral or even
    ultraspectral of terra bytes. Remote sensing image classification is one
    amongst the most significant application worlds for remote sensing. A few
    number of image classification algorithms have proved good precision in
    classifying remote sensing data.

  220. What Can We Learn Privately?.

    Authors: Adam Smith, Shiva Prasad Kasiviswanathan, Homin K. Lee, Kobbi Nissim, Sofya Raskhodnikova
    Subjects: Learning
    Abstract

    Learning problems form an important category of computational tasks that
    generalizes many of the computations researchers apply to large real-life data
    sets. We ask: what concept classes can be learned privately, namely, by an
    algorithm whose output does not depend too heavily on any one input or specific
    training example? More precisely, we investigate learning algorithms that
    satisfy differential privacy, a notion that provides strong confidentiality
    guarantees in contexts where aggregate information is released about a database
    containing sensitive information about individuals.

  221. Collaborative Filtering in a Non-Uniform World: Learning with the Weighted Trace Norm.

    Authors: Ruslan Salakhutdinov, Nathan Srebro
    Subjects: Learning
    Abstract

    We show that matrix completion with trace-norm regularization can be
    significantly hurt when entries of the matrix are sampled non-uniformly. We
    introduce a weighted version of the trace-norm regularizer that works well also
    with non-uniform sampling. Our experimental results demonstrate that the
    weighted trace-norm regularization indeed yields significant gains on the
    (highly non-uniformly sampled) Netflix dataset.

  222. Application of k Means Clustering algorithm for prediction of Students Academic Performance.

    Authors: O. J. Oyelade, O. O. Oladipupo, I. C. Obagbuwa
    Subjects: Learning
    Abstract

    The ability to monitor the progress of students academic performance is a
    critical issue to the academic community of higher learning. A system for
    analyzing students results based on cluster analysis and uses standard
    statistical algorithms to arrange their scores data according to the level of
    their performance is described. In this paper, we also implemented k mean
    clustering algorithm for analyzing students result data.

  223. On the Stability of Empirical Risk Minimization in the Presence of Multiple Risk Minimizers.

    Authors: Benjamin I. P. Rubinstein, Aleksandr Simma
    Subjects: Learning
    Abstract

    Recently Kutin and Niyogi investigated several notions of algorithmic
    stability--a property of a learning map conceptually similar to
    continuity--showing that training-stability is sufficient for consistency of
    Empirical Risk Minimization while distribution-free CV-stability is necessary
    and sufficient for having finite VC-dimension.

  224. Online Distributed Sensor Selection.

    Authors: Andreas Krause, Daniel Golovin, Matthew Faulkner
    Subjects: Learning
    Abstract

    A key problem in sensor networks is to decide which sensors to query when, in
    order to obtain the most useful information (e.g., for performing accurate
    prediction), subject to constraints (e.g., on power and bandwidth). In many
    applications the utility function is not known a priori, must be learned from
    data, and can even change over time. Furthermore for large sensor networks
    solving a centralized optimization problem to select sensors is not feasible,
    and thus we seek a fully distributed solution.

  225. A CHAID Based Performance Prediction Model in Educational Data Mining.

    Authors: M. Ramaswami, R. Bhaskaran
    Subjects: Learning
    Abstract

    The performance in higher secondary school education in India is a turning
    point in the academic lives of all students. As this academic performance is
    influenced by many factors, it is essential to develop predictive data mining
    model for students' performance so as to identify the slow learners and study
    the influence of the dominant factors on their academic performance. In the
    present investigation, a survey cum experimental methodology was adopted to
    generate a database and it was constructed from a primary and a secondary
    source.

  226. Aggregating Algorithm competing with Banach lattices.

    Authors: Fedor Zhdanov, Yuri Kalnishkan, Alexey Chernov
    Subjects: Learning
    Abstract

    The paper deals with on-line regression settings with signals belonging to a
    Banach lattice. Our algorithms work in a semi-online setting where all the
    inputs are known in advance and outcomes are unknown and given step by step. We
    apply the Aggregating Algorithm to construct a prediction method whose
    cumulative loss over all the input vectors is comparable with the cumulative
    loss of any linear functional on the Banach lattice. As a by-product we get an
    algorithm that takes signals from an arbitrary domain.

  227. X-Armed Bandits.

    Authors: Gilles Stoltz, Sébastien Bubeck, Rémi Munos, Csaba Szepesvari
    Subjects: Learning
    Abstract

    We consider a generalization of stochastic bandits where the set of arms,
    $\cX$, is allowed to be a generic measurable space and the mean-payoff function
    is "locally Lipschitz" with respect to a dissimilarity function that is known
    to the decision maker. Under this condition we construct an arm selection
    policy, called HOO hierarchical optimistic optimization), with improved regret
    bounds compared to previous results for a large class of problems.

  228. Role of Interestingness Measures in CAR Rule Ordering for Associative Classifier: An Empirical Approach.

    Authors: S.Kannan, R.Bhaskaran
    Subjects: Learning
    Abstract

    Associative Classifier is a novel technique which is the integration of
    Association Rule Mining and Classification. The difficult task in building
    Associative Classifier model is the selection of relevant rules from a large
    number of class association rules (CARs). A very popular method of ordering
    rules for selection is based on confidence, support and antecedent size (CSA).
    Other methods are based on hybrid orderings in which CSA method is combined
    with other measures.

  229. A parameter-free hedging algorithm.

    Authors: Yoav Freund, Kamalika Chaudhuri, Daniel Hsu
    Subjects: Learning
    Abstract

    We study the problem of decision-theoretic online learning (DTOL). Motivated
    by practical applications, we focus on DTOL when the number of actions is very
    large. Previous algorithms for learning in this framework have a tunable
    learning rate parameter, and a barrier to using online-learning in practical
    applications is that it is not understood how to set this parameter optimally,
    particularly when the number of actions is large.

  230. Tracking using explanation-based modeling.

    Authors: Yoav Freund, Kamalika Chaudhuri, Daniel Hsu
    Subjects: Learning
    Abstract

    We study the tracking problem, namely, estimating the hidden state of an
    object over time, from unreliable and noisy measurements. The standard
    framework for the tracking problem is the generative framework, which is the
    basis of solutions such as the Bayesian algorithm and its approximation, the
    particle filters. However, the problem with these solutions is that they are
    very sensitive to model mismatches. In this paper, motivated by online
    learning, we introduce a new framework -- an {\em explanatory} framework -- for
    tracking.

  231. Kernel machines with two layers and multiple kernel learning.

    Authors: Francesco Dinuzzo
    Subjects: Learning
    Abstract

    In this paper, the framework of kernel machines with two layers is
    introduced, generalizing classical kernel methods. The new learning methodology
    provide a formal connection between computational architectures with multiple
    layers and the theme of kernel learning in standard regularization methods.
    First, a representer theorem for two-layer networks is presented, showing that
    finite linear combinations of kernels on each layer are optimal architectures
    whenever the corresponding functions solve suitable variational problems in
    reproducing kernel Hilbert spaces (RKHS).

  232. An Empirical Evaluation of Four Algorithms for Multi-Class Classification: Mart, ABC-Mart, Robust LogitBoost, and ABC-LogitBoost.

    Authors: Ping Li
    Subjects: Learning
    Abstract

    This empirical study is mainly devoted to comparing four tree-based boosting
    algorithms: mart, abc-mart, robust logitboost, and abc-logitboost, for
    multi-class classification on a variety of publicly available datasets. Some of
    those datasets have been thoroughly tested in prior studies using a broad range
    of classification algorithms including SVM, neural nets, and deep learning.

    In terms of the empirical classification errors, our experiment results
    demonstrate:

  233. Measuring Latent Causal Structure.

    Authors: Ricardo Silva
    Subjects: Learning
    Abstract

    Discovering latent representations of the observed world has become
    increasingly more relevant in data analysis. Much of the effort concentrates on
    building latent variables which can be used in prediction problems, such as
    classification and regression. A related goal of learning latent structure from
    data is that of identifying which hidden common causes generate the
    observations, such as in applications that require predicting the effect of
    policies.

  234. Linear Probability Forecasting.

    Authors: Fedor Zhdanov, Yuri Kalnishkan
    Subjects: Learning
    Abstract

    Multi-class classification is one of the most important tasks in machine
    learning. In this paper we consider two online multi-class classification
    problems: classification by a linear model and by a kernelized model. The
    quality of predictions is measured by the Brier loss function. We suggest two
    computationally efficient algorithms to work with these problems and prove
    theoretical guarantees on their losses. We kernelize one of the algorithms and
    prove theoretical guarantees on its loss. We perform experiments and compare
    our algorithms with logistic regression.

  235. Optimal Query Complexity for Reconstructing Hypergraphs.

    Authors: Nader H. Bshouty, Hanna Mazzawi
    Subjects: Learning
    Abstract

    In this paper we consider the problem of reconstructing a hidden weighted
    hypergraph of constant rank using additive queries. We prove the following: Let
    $G$ be a weighted hidden hypergraph of constant rank with n vertices and $m$
    hyperedges. For any $m$ there exists a non-adaptive algorithm that finds the
    edges of the graph and their weights using $$ O(\frac{m\log n}{\log m}) $$
    additive queries. This solves the open problem in [S. Choi, J. H. Kim. Optimal
    Query Complexity Bounds for Finding Graphs. {\em STOC}, 749--758,~2008].

  236. Predictive Hypothesis Identification.

    Authors: Marcus Hutter
    Subjects: Learning
    Abstract

    While statistics focusses on hypothesis testing and on estimating (properties
    of) the true sampling distribution, in machine learning the performance of
    learning algorithms on future data is the primary issue. In this paper we
    bridge the gap with a general principle (PHI) that identifies hypotheses with
    best predictive performance. This includes predictive point and interval
    estimation, simple and composite hypothesis testing, (mixture) model selection,
    and others as special cases. For concrete instantiations we will recover
    well-known methods, variations thereof, and new ones.

  237. On Finding Predictors for Arbitrary Families of Processes.

    Authors: Daniil Ryabko
    Subjects: Learning
    Abstract

    The problem is sequence prediction in the following setting. A sequence
    $x_1,...,x_n,...$ of discrete-valued observations is generated according to
    some unknown probabilistic law (measure) $\mu$. After observing each outcome,
    it is required to give the conditional probabilities of the next observation.
    The measure $\mu$ belongs to an arbitrary but known class $C$ of stochastic
    process measures. We are interested in predictors $\rho$ whose conditional
    probabilities converge (in some sense) to the "true" $\mu$-conditional
    probabilities if any $\mu\in C$ is chosen to generate the sequence.

  238. Gaussian Process Bandits without Regret: An Experimental Design Approach.

    Authors: Sham M. Kakade, Niranjan Srinivas, Andreas Krause, Matthias Seeger
    Subjects: Learning
    Abstract

    We consider the problem of optimizing an unknown, noisy function that is
    expensive to evaluate. We cast this problem as a multiarmed bandit problem
    where the payoff function is sampled from a Gaussian Process. We resolve an
    important open problem on deriving regret bounds for this setting. In
    particular, we analyze an upper confidence algorithm and bound its cumulative
    regret in terms of the maximal information gain due to sampling, thus
    connecting Gaussian Process bandits and optimal experimental design.

  239. Performance Analysis of AIM-K-means & K-means in Quality Cluster Generation.

    Authors: Samarjeet Borah, Mrinal Kanti Ghose
    Subjects: Learning
    Abstract

    Among all the partition based clustering algorithms K-means is the most
    popular and well known method. It generally shows impressive results even in
    considerably large data sets. The computational complexity of K-means does not
    suffer from the size of the data set. The main disadvantage faced in performing
    this clustering is that the selection of initial means.

  240. On the Dual Formulation of Boosting Algorithms.

    Authors: Chunhua Shen, Hanxi Li
    Subjects: Learning
    Abstract

    We study boosting algorithms from a new perspective. We show that the
    Lagrange dual problems of AdaBoost, LogitBoost and soft-margin LPBoost with
    generalized hinge loss are all entropy maximization problems. By looking at the
    dual problems of these boosting algorithms, we show that the success of
    boosting algorithms can be understood in terms of maintaining a better margin
    distribution by maximizing margins and at the same time controlling the margin
    variance.We also theoretically prove that, approximately, AdaBoost maximizes
    the average margin, instead of the minimum margin.

  241. Closing the Learning-Planning Loop with Predictive State Representations.

    Authors: Sajid M. Siddiqi, Byron Boots, Geoffrey J. Gordon
    Subjects: Learning
    Abstract

    A central problem in artificial intelligence is that of planning to maximize
    future reward under uncertainty in a partially observable environment. In this
    paper we propose and demonstrate a novel algorithm which accurately learns a
    model of such an environment directly from sequences of action-observation
    pairs. We then close the loop from observations to actions by planning in the
    learned model and recovering a policy which is near-optimal in the original
    environment.

  242. Early Detection of Breast Cancer using SVM Classifier Technique.

    Authors: S.Thamarai Selvi, Y.Ireaneus Anna Rejani
    Subjects: Learning
    Abstract

    This paper presents a tumor detection algorithm from mammogram. The proposed
    system focuses on the solution of two problems. One is how to detect tumors as
    suspicious regions with a very weak contrast to their background and another is
    how to extract features which categorize tumors. The tumor detection method
    follows the scheme of (a) mammogram enhancement. (b) The segmentation of the
    tumor area. (c) The extraction of features from the segmented tumor area. (d)
    The use of SVM classifier.

  243. Effect of Tuned Parameters on a LSA MCQ Answering Model.

    Authors: Alain Lifchitz, Sandra Jhean-Larose, Guy Denhière
    Subjects: Learning
    Abstract

    This paper presents the current state of a work in progress, whose objective
    is to better understand the effects of factors that significantly influence the
    performance of Latent Semantic Analysis (LSA). A difficult task, which consists
    in answering (French) biology Multiple Choice Questions, is used to test the
    semantic properties of the truncated singular space and to study the relative
    influence of main parameters. A dedicated software has been designed to fine
    tune the LSA semantic space for the Multiple Choice Questions task.

  244. Learning Mixtures of Gaussians using the k-means Algorithm.

    Authors: Kamalika Chaudhuri, Sanjoy Dasgupta, Andrea Vattani
    Subjects: Learning
    Abstract

    One of the most popular algorithms for clustering in Euclidean space is the
    $k$-means algorithm; $k$-means is difficult to analyze mathematically, and few
    theoretical guarantees are known about it, particularly when the data is {\em
    well-clustered}. In this paper, we attempt to fill this gap in the literature
    by analyzing the behavior of $k$-means on well-clustered data. In particular,
    we study the case when each cluster is distributed as a different Gaussian --
    or, in other words, when the input comes from a mixture of Gaussians.

  245. Differentially Private Support Vector Machines.

    Authors: Kamalika Chaudhuri, Anand Sarwate, Claire Monteleoni
    Subjects: Learning
    Abstract

    This paper addresses the problem of practical privacy-preserving machine
    learning: how to detect patterns in massive, real-world databases of sensitive
    personal information, while maintaining the privacy of individuals. Chaudhuri
    and Monteleoni (2008) recently provided privacy-preserving techniques for
    learning linear separators via regularized logistic regression. With the goal
    of handling large databases that may not be linearly separable, we provide
    privacy-preserving support vector machine algorithms.

  246. Cross-situational and supervised learning in the emergence of communication.

    Authors: José F. Fontanari, Angelo Cangelosi
    Subjects: Learning
    Abstract

    Scenarios for the emergence or bootstrap of a lexicon involve the repeated
    interaction between at least two agents who must reach a consensus on how to
    name N objects using H words. Here we consider minimal models of two types of
    learning algorithms: cross-situational learning, in which the individuals
    determine the meaning of a word by looking for something in common across all
    observed uses of that word, and supervised operant conditioning learning, in
    which there is strong feedback between individuals about the intended meaning
    of the words.

  247. Learning in a Large Function Space: Privacy-Preserving Mechanisms for SVM Learning.

    Authors: Benjamin I. P. Rubinstein, Peter L. Bartlett, Ling Huang, Nina Taft
    Subjects: Learning
    Abstract

    Several recent studies in privacy-preserving learning have considered the
    trade-off between utility or risk and the level of differential privacy
    guaranteed by mechanisms for statistical query processing. In this paper we
    study this trade-off in private Support Vector Machine (SVM) learning. We
    present two efficient mechanisms, one for the case of finite-dimensional
    feature mappings and one for potentially infinite-dimensional feature mappings
    with translation-invariant kernels.

  248. Statistical exponential families: A digest with flash cards.

    Authors: Frank Nielsen, Vincent Garcia
    Subjects: Learning
    Abstract

    This document describes concisely the ubiquitous class of exponential family
    distributions met in statistics. The first part recalls definitions and
    summarizes main properties and duality with Bregman divergences (all proofs are
    skipped). The second part lists decompositions and related formula of common
    exponential family distributions. We recall the Fisher-Rao-Riemannian
    geometries and the dual affine connection information geometries of statistical
    manifolds. It is intended to maintain and update this document and catalog by
    adding new distribution items.

  249. Towards Industrialized Conception and Production of Serious Games.

    Authors: Iza Marfisi-Schottman, Aymen Sghaier, Sébastien George, Franck Tarpin-Bernard, Patrick Prévôt
    Subjects: Learning
    Abstract

    Serious Games (SGs) have experienced a tremendous outburst these last years.
    Video game companies have been producing fun, user-friendly SGs, but their
    educational value has yet to be proven. Meanwhile, cognition research scientist
    have been developing SGs in such a way as to guarantee an educational gain, but
    the fun and attractive characteristics featured often would not meet the
    public's expectations. The ideal SG must combine these two aspects while still
    being economically viable.

  250. A Geometric Approach to Sample Compression.

    Authors: J. Hyam Rubinstein, Benjamin I. P. Rubinstein
    Subjects: Learning
    Abstract

    The Sample Compression Conjecture of Littlestone & Warmuth has remained
    unsolved for over two decades. This paper presents a systematic geometric
    investigation of the compression of finite maximum concept classes. Simple
    arrangements of hyperplanes in Hyperbolic space, and Piecewise-Linear
    hyperplane arrangements, are shown to represent maximum classes, generalizing
    the corresponding Euclidean result.

  251. Boosting through Optimization of Margin Distributions.

    Authors: Chunhua Shen, Hanxi Li
    Subjects: Learning
    Abstract

    Boosting has attracted much research attention in the past decade. The
    success of boosting algorithms may be interpreted in terms of the margin
    theory. Recently it has been shown that generalization error of classifiers can
    be obtained by explicitly taking the margin distribution of the training data
    into account. Most of the current boosting algorithms in practice usually
    optimizes a convex loss function and do not make use of the margin
    distribution.

  252. Keystroke Dynamics Authentication For Collaborative Systems.

    Authors: Romain Giot, Mohamad El-Abed, Christophe Rosenberger
    Subjects: Learning
    Abstract

    We present in this paper a study on the ability and the benefits of using a
    keystroke dynamics authentication method for collaborative systems.
    Authentication is a challenging issue in order to guarantee the security of use
    of collaborative systems during the access control step. Many solutions exist
    in the state of the art such as the use of one time passwords or smart-cards.
    We focus in this paper on biometric based solutions that do not necessitate any
    additional sensor. Keystroke dynamics is an interesting solution as it uses
    only the keyboard and is invisible for users.

  253. Sequential anomaly detection in the presence of noise and limited feedback.

    Authors: Maxim Raginsky, Rebecca Willett, Roummel Marcia, Jorge Silva
    Subjects: Learning
    Abstract

    This paper describes a method for detecting anomalies from sequentially
    observed and potentially noisy data. The proposed approach consists of two main
    elements: (1) filtering, or assigning a belief or likelihood to each successive
    measurement based upon our ability to predict it from previous noisy
    observations, and (2) hedging, or flagging potential anomalies by comparing the
    current belief against a time-varying and data-adaptive threshold. The
    threshold is adjusted based on the available feedback from an end user.

  254. Learning Exponential Families in High-Dimensions: Strong Convexity and Sparsity.

    Authors: Sham M. Kakade, Ambuj Tewari, Ohad Shamir, Karthik Sridharan
    Subjects: Learning
    Abstract

    The versatility of exponential families, along with their attendant convexity
    properties, make them a popular and effective statistical model. A central
    issue is learning these models in high-dimensions, such as when there is some
    sparsity pattern of the optimal parameter. This work characterizes a certain
    strong convexity property of general exponential families, which allow their
    generalization ability to be quantified. In particular, we show how this
    property can be used to analyze generic exponential families under L_1
    regularization.

  255. Metric and Kernel Learning using a Linear Transformation.

    Authors: Prateek Jain, Inderjit S. Dhillon, Brian Kulis, Jason V. Davis
    Subjects: Learning
    Abstract

    Metric and kernel learning are important in several machine learning
    applications. However, most existing metric learning algorithms are limited to
    learning metrics over low-dimensional data, while existing kernel learning
    algorithms are often limited to the transductive setting and do not generalize
    to new data points. In this paper, we study metric learning as a problem of
    learning a linear transformation of the input data.

  256. Anomaly Detection with Score functions based on Nearest Neighbor Graphs.

    Authors: Manqi Zhao, Venkatesh Saligrama
    Subjects: Learning
    Abstract

    We propose a novel non-parametric adaptive anomaly detection algorithm for
    high dimensional data based on score functions derived from nearest neighbor
    graphs on $n$-point nominal data. Anomalies are declared whenever the score of
    a test sample falls below $\alpha$, which is supposed to be the desired false
    alarm level. The resulting anomaly detector is shown to be asymptotically
    optimal in that it is uniformly most powerful for the specified false alarm
    level, $\alpha$, for the case when the anomaly density is a mixture of the
    nominal and a known density.

  257. Self-concordant analysis for logistic regression.

    Authors: Francis Bach
    Subjects: Learning
    Abstract

    Most of the non-asymptotic theoretical work in regression is carried out for
    the square loss, where estimators can be obtained through closed-form
    expressions. In this paper, we use and extend tools from the convex
    optimization literature, namely self-concordant functions, to provide simple
    extensions of theoretical results for the square loss to the logistic loss.

  258. Guaranteed Rank Minimization via Singular Value Projection.

    Authors: Prateek Jain, Raghu Meka, Inderjit S. Dhillon
    Subjects: Learning
    Abstract

    Minimizing the rank of a matrix subject to affine constraints is a
    fundamental problem with many important applications in machine learning and
    statistics.

  259. Improved Collaborative Filtering Algorithm via Information Transformation.

    Authors: Jian-Guo Liu, Bing-Hong Wang, Qiang Guo
    Subjects: Learning
    Abstract

    In this paper, we propose a spreading activation approach for collaborative
    filtering (SA-CF). By using the opinion spreading process, the similarity
    between any users can be obtained. The algorithm has remarkably higher accuracy
    than the standard collaborative filtering (CF) using Pearson correlation.
    Furthermore, we introduce a free parameter $\beta$ to regulate the
    contributions of objects to user-user correlations. The numerical results
    indicate that decreasing the influence of popular objects can further improve
    the algorithmic accuracy and personality.

  260. Effectiveness and Limitations of Statistical Spam Filters.

    Authors: M. Tariq Banday, Tariq R. Jan
    Subjects: Learning
    Abstract

    In this paper we discuss the techniques involved in the design of the famous
    statistical spam filters that include Naive Bayes, Term Frequency-Inverse
    Document Frequency, K-Nearest Neighbor, Support Vector Machine, and Bayes
    Additive Regression Tree. We compare these techniques with each other in terms
    of accuracy, recall, precision, etc. Further, we discuss the effectiveness and
    limitations of statistical filters in filtering out various types of spam from
    legitimate e-mails.

  261. Entropy Message Passing Algorithm.

    Authors: Miomir S. Stankovic, Velimir M. Ilic, Branimir T. Todorovic
    Subjects: Learning
    Abstract

    Message passing over factor graph can be considered as generalization of many
    well known algorithms for efficient marginalization of multivariate function. A
    specific instance of the algorithm is obtained by choosing an appropriate
    commutative semiring for the range of the function to be marginalized. Some
    examples are Viterbi algorithm, obtained on max-product semiring and
    forward-backward algorithm obtained on sum-product semiring. In this paper,
    Entropy Message Passing algorithm (EMP) is developed. It operates over entropy
    semiring, previously introduced in automata theory.

  262. Local and global approaches of affinity propagation clustering for large scale data.

    Authors: Dingyin Xia, Fei Wu, Xuqing Zhang, Yueting Zhuang
    Subjects: Learning
    Abstract

    Recently a new clustering algorithm called 'affinity propagation' (AP) has
    been proposed, which efficiently clustered sparsely related data by passing
    messages between data points. However, we want to cluster large scale data
    where the similarities are not sparse in many cases. This paper presents two
    variants of AP for grouping large scale data with a dense similarity matrix.
    The local approach is partition affinity propagation (PAP) and the global
    method is landmark affinity propagation (LAP).

  263. Reduced-Rank Hidden Markov Models.

    Authors: Sajid M. Siddiqi, Byron Boots, Geoffrey J. Gordon
    Subjects: Learning
    Abstract

    We introduce the Reduced-Rank Hidden Markov Model (RR-HMM), a generalization
    of HMMs that can model smooth state evolution as in Linear Dynamical Systems
    (LDSs) as well as non-log-concave predictive distributions as in
    continuous-observation HMMs. RR-HMMs assume an m-dimensional latent state and n
    discrete observations, with a transition matrix of rank k <= m. This implies
    the dynamics evolve in a k-dimensional subspace, while the shape of the set of
    predictive distributions is determined by m.

  264. Low-rank Matrix Completion with Noisy Observations: a Quantitative Comparison.

    Authors: Raghunandan H. Keshavan, Andrea Montanari, Sewoong Oh
    Subjects: Learning
    Abstract

    We consider a problem of significant practical importance, namely, the
    reconstruction of a low-rank data matrix from a small subset of its entries.
    This problem appears in many areas such as collaborative filtering, computer
    vision and wireless sensor networks. In this paper, we focus on the matrix
    completion problem in the case when the observed samples are corrupted by
    noise. We compare the performance of three state-of-the-art matrix completion
    algorithms (OptSpace, ADMiRA and FPCA) on a single simulation platform and
    present numerical results.

  265. Applications of strong convexity--strong smoothness duality to learning with matrices.

    Authors: Sham M. Kakade, Shai Shalev-Shwartz, Ambuj Tewari
    Subjects: Learning
    Abstract

    It is known that a function is strongly convex with respect to some norm if
    and only if its conjugate function is strongly smooth with respect to the dual
    norm. This result has already been found to be a key component in deriving and
    analyzing several learning algorithms. Utilizing this duality, we isolate a
    single inequality which seamlessly implies both generalization bounds and
    online regret bounds; and we show how to construct strongly convex functions
    over matrices based on strongly convex functions over vectors.

  266. Post-Processing of Discovered Association Rules Using Ontologies.

    Authors: Claudia Marinica, Fabrice Guillet, Henri Briand
    Subjects: Learning
    Abstract

    In Data Mining, the usefulness of association rules is strongly limited by
    the huge amount of delivered rules. In this paper we propose a new approach to
    prune and filter discovered rules. Using Domain Ontologies, we strengthen the
    integration of user knowledge in the post-processing task. Furthermore, an
    interactive and iterative framework is designed to assist the user along the
    analyzing task. On the one hand, we represent user domain knowledge using a
    Domain Ontology over database. On the other hand, a novel technique is
    suggested to prune and to filter discovered rules.

  267. How random are a learner's mistakes?.

    Authors: Joel Ratsaby
    Subjects: Learning
    Abstract

    Given a random binary sequence $X^{(n)}$ of random variables, $X_{t},$
    $t=1,2,...,n$, for instance, one that is generated by a Markov source (teacher)
    of order $k^{*}$ (each state represented by $k^{*}$ bits). Assume that the
    probability of the event $X_{t}=1$ is constant and denote it by $\beta$.
    Consider a learner which is based on a parametric model, for instance a Markov
    model of order $k$, who trains on a sequence $x^{(m)}$ which is randomly drawn
    by the teacher.

  268. How random are a learner's mistakes?.

    Authors: Joel Ratsaby
    Subjects: Learning
    Abstract

    Given a random binary sequence $X^{(n)}$ of random variables, $X_{t},$
    $t=1,2,...,n$, for instance, one that is generated by a Markov source (teacher)
    of order $k^{*}$ (each state represented by $k^{*}$ bits). Assume that the
    probability of the event $X_{t}=1$ is constant and denote it by $\beta$.
    Consider a learner which is based on a parametric model, for instance a Markov
    model of order $k$, who trains on a sequence $x^{(m)}$ which is randomly drawn
    by the teacher.

  269. Eignets for function approximation on manifolds.

    Authors: H. N. Mhaskar
    Subjects: Learning
    Abstract

    Let $\XX$ be a compact, smooth, connected, Riemannian manifold without
    boundary, $G:\XX\times\XX\to \RR$ be a kernel. Analogous to a radial basis
    function network, an eignet is an expression of the form $\sum_{j=1}^M
    a_jG(\circ,y_j)$, where $a_j\in\RR$, $y_j\in\XX$, $1\le j\le M$. We describe a
    deterministic, universal algorithm for constructing an eignet for approximating
    functions in $L^p(\mu;\XX)$ for a general class of measures $\mu$ and kernels
    $G$. Our algorithm yields linear operators.

  270. Scalable Inference for Latent Dirichlet Allocation.

    Authors: James Petterson, Tiberio Caetano
    Subjects: Learning
    Abstract

    We investigate the problem of learning a topic model - the well-known Latent
    Dirichlet Allocation - in a distributed manner, using a cluster of C processors
    and dividing the corpus to be learned equally among them. We propose a simple
    approximated method that can be tuned, trading speed for accuracy according to
    the task at hand. Our approach is asynchronous, and therefore suitable for
    clusters of heterogenous machines.

  271. Classifier Ensemble with Unlabeled Data.

    Authors: Min-Ling Zhang, Zhi-Hua Zhou
    Subjects: Learning
    Abstract

    Ensemble learning aims to improve generalization ability by using multiple
    base learners. It is well-known that to construct a good ensemble, the base
    learners should be accurate as well as diverse. In this paper, unlabeled data
    is exploited to facilitate ensemble learning by helping augment the diversity
    among the base learners. Specifically, a semi-supervised ensemble method named
    SEALED is proposed.

  272. Randomized Algorithms for Large scale SVMs.

    Authors: Vinay Jethava, Krishnan Suresh, Chiranjib Bhattacharyya, Ramesh Hariharan
    Subjects: Learning
    Abstract

    We propose a randomized algorithm for training Support vector machines(SVMs)
    on large datasets. By using ideas from Random projections we show that the
    combinatorial dimension of SVMs is $O({log} n)$ with high probability. This
    estimate of combinatorial dimension is used to derive an iterative algorithm,
    called RandSVM, which at each step calls an existing solver to train SVMs on a
    randomly chosen subset of size $O({log} n)$. The algorithm has probabilistic
    guarantees and is capable of training SVMs with Kernels for both classification
    and regression problems.

  273. Extension of Path Probability Method to Approximate Inference over Time.

    Authors: Vinay Jethava
    Subjects: Learning
    Abstract

    There has been a tremendous growth in publicly available digital video
    footage over the past decade. This has necessitated the development of new
    techniques in computer vision geared towards efficient analysis, storage and
    retrieval of such data. Many mid-level computer vision tasks such as
    segmentation, object detection, tracking, etc. involve an inference problem
    based on the video data available. Video data has a high degree of spatial and
    temporal coherence. The property must be intelligently leveraged in order to
    obtain better results.

  274. Matrix Completion from a Few Entries.

    Authors: Raghunandan H. Keshavan, Andrea Montanari, Sewoong Oh
    Subjects: Learning
    Abstract

    Let M be a random (alpha n) x n matrix of rank r<<n, and assume that a
    uniformly random subset E of its entries is observed. We describe an efficient
    algorithm that reconstructs M from |E| = O(rn) observed entries with relative
    root mean square error RMSE <= C(rn/|E|)^0.5 . Further, if r=O(1), M can be
    reconstructed exactly from |E| = O(n log(n)) entries. These results apply
    beyond random matrices to general low-rank incoherent matrices.

  275. A Convergent Online Single Time Scale Actor Critic Algorithm.

    Authors: D. Di Castro, R. Meir
    Subjects: Learning
    Abstract

    Actor-Critic based approaches were among the first to address reinforcement
    learning in a general setting. Recently, these algorithms have gained renewed
    interest due to their generality, good convergence properties, and possible
    biological relevance. In this paper, we introduce an online temporal difference
    based actor-critic algorithm which is proved to converge to a neighborhood of a
    local maximum of the average reward.

  276. Distribution-Specific Agnostic Boosting.

    Authors: Vitaly Feldman
    Subjects: Learning
    Abstract

    We consider the problem of boosting the accuracy of weak learning algorithms
    in the agnostic learning framework of Haussler (1992) and Kearns et al. (1992).
    Known algorithms for this problem (Ben-David et al., 2001; Gavinsky, 2002;
    Kalai et al., 2008) follow the same strategy as boosting algorithms in the PAC
    model: the weak learner is executed on the same target function but over
    different distributions on the domain.

  277. Minimum Probability Flow Learning.

    Authors: Jascha Sohl-Dickstein, Peter Battaglino, Michael R. DeWeese
    Subjects: Learning
    Abstract

    Learning in probabilistic models is often hampered by the general
    intractability of the normalization factor and its derivatives. Here we propose
    a new learning technique that obviates the need to compute an intractable
    normalization factor or sample from the equilibrium distribution of the model.
    This is achieved by establishing dynamics that would transform the observed
    data distribution into the model distribution, and then setting as the
    objective the minimization of the initial flow of probability away from the
    data distribution.

  278. Chromatic PAC-Bayes Bounds for Non-IID Data: Applications to Ranking and Stationary \beta-Mixing Processes.

    Authors: Liva Ralaivola, Marie Szafranski, Guillaume Stempfel
    Subjects: Learning
    Abstract

    Pac-Bayes bounds are among the most accurate generalization bounds for
    classifiers learned from independently and identically distributed (IID) data,
    and it is particularly so for margin classifiers: there have been recent
    contributions showing how practical these bounds can be either to perform model
    selection (Ambroladze et al., 2007) or even to directly guide the learning of
    linear classifiers (Germain et al., 2009). However, there are many practical
    situations where the training data show some dependencies and where the
    traditional IID assumption does not hold.

  279. Lower Bounds for BMRM and Faster Rates for Training SVMs.

    Authors: Ankan Saha, S.V.N. Vishwanathan, Xinhua Zhang
    Subjects: Learning
    Abstract

    Regularized risk minimization with the binary hinge loss and its variants
    lies at the heart of many machine learning problems. Bundle methods for
    regularized risk minimization (BMRM) and the closely related SVMStruct are
    considered the best general purpose solvers to tackle this problem. It was
    recently shown that BMRM requires $O(1/\epsilon)$ iterations to converge to an
    $\epsilon$ accurate solution. In the first part of the paper we use the
    Hadamard matrix to construct a regularized risk minimization problem and show
    that these rates cannot be improved.

  280. Efficient Learning of Sparse Conditional Random Fields for Supervised Sequence Labelling.

    Authors: Nataliya Sokolovska, Thomas Lavergne, Olivier Capp&#xe9;, Fran&#xe7;ois Yvon
    Subjects: Learning
    Abstract

    Conditional Random Fields (CRFs) constitute a popular and efficient approach
    for supervised sequence labelling. CRFs can cope with large description spaces
    and can integrate some form of structural dependency between labels. In this
    contribution, we address the issue of efficient feature selection for CRFs
    based on imposing sparsity through an L1 penalty. We first show how sparsity of
    the parameter set can be exploited to significantly speed up training and
    labelling. We then introduce coordinate descent parameter update schemes for
    CRFs with L1 regularization.

  281. High-Dimensional Non-Linear Variable Selection through Hierarchical Kernel Learning.

    Authors: Francis Bach
    Subjects: Learning
    Abstract

    We consider the problem of high-dimensional non-linear variable selection for
    supervised learning. Our approach is based on performing linear selection among
    exponentially many appropriately defined positive definite kernels that
    characterize non-linear interactions between the original variables.

  282. Median topographic maps for biomedical data sets.

    Authors: Fabrice Rossi, Barbara Hammer, Alexander Hasenfu&#xdf;
    Subjects: Learning
    Abstract

    Median clustering extends popular neural data analysis methods such as the
    self-organizing map or neural gas to general data structures given by a
    dissimilarity matrix only. This offers flexible and robust global data
    inspection methods which are particularly suited for a variety of data as
    occurs in biomedical domains. In this chapter, we give an overview about median
    clustering and its properties and extensions, with a particular focus on
    efficient implementations adapted to large scale data analysis.

  283. Advances in Feature Selection with Mutual Information.

    Authors: Michel Verleysen, Fabrice Rossi, Damien Fran&#xe7;ois
    Subjects: Learning
    Abstract

    The selection of features that are relevant for a prediction or
    classification problem is an important problem in many domains involving
    high-dimensional data. Selecting features helps fighting the curse of
    dimensionality, improving the performances of prediction or classification
    methods, and interpreting the application. In a nonlinear context, the mutual
    information is widely used as relevance criterion for features and sets of
    features.

  284. ABC-LogitBoost for Multi-class Classification.

    Authors: Ping Li
    Subjects: Learning
    Abstract

    We develop abc-logitboost, based on the prior work on abc-boost and robust
    logitboost. Our extensive experiments on a variety of datasets demonstrate the
    considerable improvement of abc-logitboost over logitboost and abc-mart.

Syndicate content