Sparse estimation methods are aimed at using or obtaining parsimonious
representations of data or models. They were first dedicated to linear variable
selection but numerous extensions have now emerged such as structured sparsity
or kernel selection. It turns out that many of the related estimation problems
can be cast as convex optimization problems by regularizing the empirical risk
with appropriate non-smooth norms. The goal of this paper is to present from a
general perspective optimization tools and techniques dedicated to such
sparsity-inducing penalties.
Sparse estimation methods are aimed at using or obtaining parsimonious
representations of data or models. While naturally cast as a combinatorial
optimization problem, variable or feature selection admits a convex relaxation
through the regularization by the $\ell_1$-norm. In this paper, we consider
situations where we are not only interested in sparsity, but where some
structural prior knowledge is available as well.
Inverse inference, or "brain reading", is a recent paradigm for analyzing
functional magnetic resonance imaging (fMRI) data, based on pattern recognition
and statistical learning. By predicting some cognitive variables related to
brain activation maps, this approach aims at decoding brain activity. Inverse
inference takes into account the multivariate information between voxels and is
currently the only way to assess how precisely some cognitive information is
encoded by the activity of neural populations within the whole brain.
We consider a class of learning problems regularized by a structured
sparsity-inducing norm defined as the sum of l_2 or l_infinity norms over
groups of variables. Whereas much effort has been put in developing fast
optimization techniques when the groups are disjoint or embedded in a
hierarchy, we address here the case of general overlapping groups.
Sparse coding consists in representing signals as sparse linear combinations
of atoms selected from a dictionary. We consider an extension of this framework
where the atoms are further assumed to be embedded in a tree. This is achieved
using a recently introduced tree-structured sparse regularization norm, which
has proven useful in several applications. This norm leads to regularized
problems that are difficult to optimize, and we propose in this paper efficient
algorithms for solving them.
We consider a class of learning problems that involve a structured
sparsity-inducing norm defined as the sum of $\ell_\infty$-norms over groups of
variables. Whereas a lot of effort has been put in developing fast optimization
methods when the groups are disjoint or embedded in a specific hierarchical
structure, we address here the case of general overlapping groups. To this end,
we show that the corresponding optimization problem is related to network flow
optimization. More precisely, the proximal problem associated with the norm we
consider is dual to a quadratic min-cost flow problem.
We consider the empirical risk minimization problem for linear supervised
learning, with regularization by structured sparsity-inducing norms. These are
defined as sums of Euclidean norms on certain subsets of variables, extending
the usual $\ell_1$-norm and the group $\ell_1$-norm by allowing the subsets to
overlap. This leads to a specific set of allowed nonzero patterns for the
solutions of such problems.
We present an extension of sparse PCA, or sparse dictionary learning, where
the sparsity patterns of all dictionary elements are structured and constrained
to belong to a prespecified set of shapes. This \emph{structured sparse PCA} is
based on a structured regularization recently introduced by [1]. While
classical sparse priors only deal with \textit{cardinality}, the regularization
we use encodes higher-order information about the data. We propose an efficient
and simple optimization procedure to solve this problem.