Francis Bach

  1. Optimization with Sparsity-Inducing Penalties.

    Authors: Francis Bach, Rodolphe Jenatton, Guillaume Obozinski, Julien Mairal
    Subjects: Learning
    Abstract

    Sparse estimation methods are aimed at using or obtaining parsimonious
    representations of data or models. They were first dedicated to linear variable
    selection but numerous extensions have now emerged such as structured sparsity
    or kernel selection. It turns out that many of the related estimation problems
    can be cast as convex optimization problems by regularizing the empirical risk
    with appropriate non-smooth norms. The goal of this paper is to present from a
    general perspective optimization tools and techniques dedicated to such
    sparsity-inducing penalties.

  2. Convergence Rates of Inexact Proximal-Gradient Methods for Convex Optimization.

    Authors: Francis Bach, Mark Schmidt, Nicolas Le Roux
    Subjects: Learning
    Abstract

    We consider the problem of optimizing the sum of a smooth convex function and
    a non-smooth convex function using proximal-gradient methods, where an error is
    present in the calculation of the gradient of the smooth term or in the
    proximity operator with respect to the non-smooth term.

  3. Structured sparsity through convex optimization.

    Authors: Francis Bach, Rodolphe Jenatton, Guillaume Obozinski, Julien Mairal
    Subjects: Learning
    Abstract

    Sparse estimation methods are aimed at using or obtaining parsimonious
    representations of data or models. While naturally cast as a combinatorial
    optimization problem, variable or feature selection admits a convex relaxation
    through the regularization by the $\ell_1$-norm. In this paper, we consider
    situations where we are not only interested in sparsity, but where some
    structural prior knowledge is available as well.

  4. Multi-task Regression using Minimal Penalties.

    Authors: Francis Bach, Sylvain Arlot, Matthieu Solnon
    Subjects: Statistics
    Abstract

    In this paper we study the kernel multiple ridge regression framework, which
    we refer to as multi-task regression, using penalization techniques. The
    theoretical analysis of this problem shows that the key element appearing for
    an optimal calibration is the covariance matrix of the noise between the
    different tasks. We present a new algorithm to estimate this covariance matrix,
    based on the concept of minimal penalty, which was previously used in the
    single-task regression framework to estimate the variance of the noise.

  5. Online algorithms for Nonnegative Matrix Factorization with the Itakura-Saito divergence.

    Authors: Francis Bach, Augustin Lefèvre, Cédric Févotte
    Subjects: Machine Learning
    Abstract

    Nonnegative matrix factorization (NMF) is now a common tool for audio source
    separation. When learning NMF on large audio databases, one major drawback is
    that the complexity in time is O(FKN) when updating the dictionary (where (F;N)
    is the dimension of the input power spectrograms, and K the number of basis
    spectra), thus forbidding its application on signals longer than an hour. We
    provide an online algorithm with a complexity of O(FK) in time and memory for
    updates in the dictionary.

  6. Multi-scale Mining of fMRI data with Hierarchical Structured Sparsity.

    Authors: Francis Bach, Rodolphe Jenatton, Guillaume Obozinski, Bertrand Thirion, Alexandre Gramfort, Vincent Michel, Evelyn Eger
    Subjects: Machine Learning
    Abstract

    Inverse inference, or "brain reading", is a recent paradigm for analyzing
    functional magnetic resonance imaging (fMRI) data, based on pattern recognition
    and statistical learning. By predicting some cognitive variables related to
    brain activation maps, this approach aims at decoding brain activity. Inverse
    inference takes into account the multivariate information between voxels and is
    currently the only way to assess how precisely some cognitive information is
    encoded by the activity of neural populations within the whole brain.

  7. Convex and Network Flow Optimization for Structured Sparsity.

    Authors: Francis Bach, Rodolphe Jenatton, Guillaume Obozinski, Julien Mairal
    Subjects: Optimization and Control
    Abstract

    We consider a class of learning problems regularized by a structured
    sparsity-inducing norm defined as the sum of l_2 or l_infinity norms over
    groups of variables. Whereas much effort has been put in developing fast
    optimization techniques when the groups are disjoint or embedded in a
    hierarchy, we address here the case of general overlapping groups.

  8. Task-Driven Dictionary Learning.

    Authors: Francis Bach, Julien Mairal, Jean Ponce
    Subjects: Machine Learning
    Abstract

    Modeling data with linear combinations of a few elements from a learned
    dictionary has been the focus of much recent research in machine learning,
    neuroscience and signal processing. For signals such as natural images that
    admit such sparse representations, it is now well established that these models
    are well suited to restoration tasks. In this context, learning the dictionary
    amounts to solving a large-scale matrix factorization problem, which can be
    done efficiently with classical optimization tools.

  9. Structured sparsity-inducing norms through submodular functions.

    Authors: Francis Bach
    Subjects: Learning
    Abstract

    Sparse methods for supervised learning aim at finding good linear predictors
    from as few variables as possible, i.e., with small cardinality of their
    supports. This combinatorial selection problem is often turned into a convex
    optimization problem by replacing the cardinality function by its convex
    envelope (tightest convex lower bound), in this case the L1-norm.

  10. Proximal Methods for Hierarchical Sparse Coding.

    Authors: Francis Bach, Rodolphe Jenatton, Guillaume Obozinski, Julien Mairal
    Subjects: Machine Learning
    Abstract

    Sparse coding consists in representing signals as sparse linear combinations
    of atoms selected from a dictionary. We consider an extension of this framework
    where the atoms are further assumed to be embedded in a tree. This is achieved
    using a recently introduced tree-structured sparse regularization norm, which
    has proven useful in several applications. This norm leads to regularized
    problems that are difficult to optimize, and we propose in this paper efficient
    algorithms for solving them.

  11. Network Flow Algorithms for Structured Sparsity.

    Authors: Francis Bach, Rodolphe Jenatton, Guillaume Obozinski, Julien Mairal
    Subjects: Learning
    Abstract

    We consider a class of learning problems that involve a structured
    sparsity-inducing norm defined as the sum of $\ell_\infty$-norms over groups of
    variables. Whereas a lot of effort has been put in developing fast optimization
    methods when the groups are disjoint or embedded in a specific hierarchical
    structure, we address here the case of general overlapping groups. To this end,
    we show that the corresponding optimization problem is related to network flow
    optimization. More precisely, the proximal problem associated with the norm we
    consider is dual to a quadratic min-cost flow problem.

  12. Convex Relaxations for Subset Selection.

    Authors: Francis Bach, Selin Damla Ahipasaoglu, Alexandre d'Aspremont
    Subjects: Optimization and Control
    Abstract

    We use convex relaxation techniques to produce lower bounds on the optimal
    value of subset selection problems and generate good approximate solutions. We
    then explicitly bound the quality of these relaxations by studying the
    approximation ratio of sparse eigenvalue relaxations. Our results are used to
    improve the performance of branch-and-bound algorithms to produce exact
    solutions to subset selection problems.

  13. Many-to-Many Graph Matching: a Continuous Relaxation Approach.

    Authors: Francis Bach, Jean-Philippe Vert, Mikhail Zaslavskiy
    Subjects: Machine Learning
    Abstract

    Graphs provide an efficient tool for object representation in various
    computer vision applications. Once graph-based representations are constructed,
    an important question is how to compare graphs. This problem is often
    formulated as a graph matching problem where one seeks a mapping between
    vertices of two graphs which optimally aligns their structure. In the classical
    formulation of graph matching, only one-to-one correspondences between vertices
    are considered.

  14. Online Learning for Matrix Factorization and Sparse Coding.

    Authors: Francis Bach, Julien Mairal, Jean Ponce, Guillermo Sapiro
    Subjects: Machine Learning
    Abstract

    Sparse coding--that is, modelling data vectors as sparse linear combinations
    of basis elements--is widely used in machine learning, neuroscience, signal
    processing, and statistics. This paper focuses on the large-scale matrix
    factorization problem that consists of learning the basis set, adapting it to
    specific data. Variations of this problem include dictionary learning in signal
    processing, non-negative matrix factorization and sparse principal component
    analysis.

  15. Self-concordant analysis for logistic regression.

    Authors: Francis Bach
    Subjects: Learning
    Abstract

    Most of the non-asymptotic theoretical work in regression is carried out for
    the square loss, where estimators can be obtained through closed-form
    expressions. In this paper, we use and extend tools from the convex
    optimization literature, namely self-concordant functions, to provide simple
    extensions of theoretical results for the square loss to the logistic loss.

  16. Structured Variable Selection with Sparsity-Inducing Norms.

    Authors: Francis Bach, Jean-Yves Audibert, Rodolphe Jenatton
    Subjects: Machine Learning
    Abstract

    We consider the empirical risk minimization problem for linear supervised
    learning, with regularization by structured sparsity-inducing norms. These are
    defined as sums of Euclidean norms on certain subsets of variables, extending
    the usual $\ell_1$-norm and the group $\ell_1$-norm by allowing the subsets to
    overlap. This leads to a specific set of allowed nonzero patterns for the
    solutions of such problems.

  17. Data-driven calibration of linear estimators with minimal penalties.

    Authors: Francis Bach, Sylvain Arlot
    Subjects: Statistics
    Abstract

    This paper tackles the problem of selecting among several linear estimators
    in non-parametric regression; this includes model selection for linear
    regression, the choice of a regularization parameter in kernel ridge regression
    or spline smoothing, and the choice of a kernel in multiple kernel learning. We
    propose a new algorithm which first estimates consistently the variance of the
    noise, based upon the concept of minimal penalty which was previously
    introduced in the context of model selection.

  18. Structured Sparse Principal Component Analysis.

    Authors: Francis Bach, Rodolphe Jenatton, Guillaume Obozinski
    Subjects: Machine Learning
    Abstract

    We present an extension of sparse PCA, or sparse dictionary learning, where
    the sparsity patterns of all dictionary elements are structured and constrained
    to belong to a prespecified set of shapes. This \emph{structured sparse PCA} is
    based on a structured regularization recently introduced by [1]. While
    classical sparse priors only deal with \textit{cardinality}, the regularization
    we use encodes higher-order information about the data. We propose an efficient
    and simple optimization procedure to solve this problem.

  19. High-Dimensional Non-Linear Variable Selection through Hierarchical Kernel Learning.

    Authors: Francis Bach
    Subjects: Learning
    Abstract

    We consider the problem of high-dimensional non-linear variable selection for
    supervised learning. Our approach is based on performing linear selection among
    exponentially many appropriately defined positive definite kernels that
    characterize non-linear interactions between the original variables.

RSS-материал