This paper develops a general theoretical framework to analyze structured
sparse recovery problems using the notation of dual certificate. Although
certain aspects of the dual certificate idea have already been used in some
previous work, due to the lack of a general and coherent theory, the analysis
has so far only been carried out in limited scopes for specific problems. In
this context the current paper makes two contributions. First, we introduce a
general definition of dual certificate, which we then use to develop a unified
theory of sparse recovery analysis for convex programming.
We consider solving the $\ell_1$-regularized least-squares ($\ell_1$-LS)
problem in the context of sparse recovery, for applications such as compressed
sensing. The standard proximal gradient method, also known as iterative
soft-thresholding when applied to this problem, has low computational cost per
iteration but a rather slow convergence rate. Nevertheless, when the solution
is sparse, it often exhibits fast linear convergence in the final stage.
High-capacity NAND flash memories use multi-level cells (MLCs) to store
multiple bits per cell and achieve high storage densities. Higher densities
cause increased raw bit error rates (BERs), which demand powerful error
correcting codes. Low-density parity-check (LDPC) codes are a well-known class
of capacity-approaching codes in AWGN channels. However, LDPC codes
traditionally use soft information while the flash read channel provides only
hard information. Low resolution soft information may be obtained by performing
multiple reads per cell with distinct word-line voltages.
We prove an effective upper bound on the number of effective sections of a
nef hermitian line bundle over an arithmetic surface. It is an effective
version of the arithmetic Hilbert--Samuel formula. As a consequence, we obtain
effective lower bounds on the Faltings height and on the self-intersection of
the canonical bundle in terms of the number of singular points on fibers of the
arithmetic surface.
This paper considers the sparse eigenvalue problem, which is to extract
dominant (largest) sparse eigenvectors with at most $k$ non-zero components. We
propose a simple yet effective solution called truncated power method that can
approximately solve the underlying nonconvex optimization problem. A strong
sparse recovery result is proved for the truncated power method, and this
theory is our key motivation for developing the new algorithm. The proposed
method is tested on applications such as sparse principal component analysis
and the densest $k$-subgraph problem.
We apply the concept of structured sparsity to improve boosted decision trees
with general loss functions. The existing approach to this problem is
Friedman's gradient boosting procedure using a regression tree base learner.
Although this method has led to many successful industrial applications, it
suffers from several theoretical and practical drawbacks. By employing the idea
of structured greedy search, we are able to design a regularized greedy forest
procedure to address these issues.
We prove an exponential probability tail inequality for positive semidefinite
quadratic forms in a subgaussian random vector. The bound is analogous to one
that holds when the vector has independent Gaussian entries.
Concave regularization methods provide natural procedures for sparse
recovery. However, they are difficult to analyze in the high dimensional
setting. Only recently a few sparse recovery results have been established for
some specific local solutions obtained via specialized numerical procedures.
Still, the fundamental relationship between these solutions such as whether
they are identical or their relationship to the global minimizer of the
underlying nonconvex formulation is unknown.
This work considers the problem of learning the structure of a broad class of
multivariate latent variable tree models, which include a variety of continuous
and discrete models (including the widely used linear-Gaussian models, hidden
Markov models, and Markov evolutionary trees). The setting is one where we only
have samples from certain observed variables in the tree and our goal is to
estimate the tree structure (i.e., the graph of how the underlying hidden
variables are connected to the observed variables).
We address the problem of learning in an online setting where the learner
repeatedly observes features, selects among a set of actions, and receives
reward for the action taken. We provide the first efficient algorithm with an
optimal regret. Our algorithm uses a cost sensitive classification learner as
an oracle and has a running time $\mathrm{polylog}(N)$, where $N$ is the number
of classification rules among which the oracle might choose. This is
exponentially faster than all previous algorithms that achieve optimal regret
in this setting.
The random design setting for linear regression concerns estimators based on
a random sample of covariate/response pairs. This work gives explicit bounds on
the prediction error for the ordinary least squares estimator and the ridge
regression estimator under mild assumptions on the covariate/response
distributions. In particular, this work provides sharp results on the
"out-of-sample" prediction error, as opposed to the "in-sample" (fixed design)
error. Our analysis also explicitly reveals the effect of noise vs. modeling
errors.
Suppose a given observation matrix can be decomposed as the sum of a low-rank
matrix and a sparse matrix (outliers), and the goal is to recover these
individual components from the observed sum. Such additive decompositions have
applications in a variety of numerical problems including system
identification, latent variable graphical modeling, and principal components
analysis. We study conditions under which recovering such a decomposition is
possible via a combination of $\ell_1$ norm and trace norm minimization.
We derive sharp performance bounds for least squares regression with $L_1$
regularization from parameter estimation accuracy and feature selection quality
perspectives. The main result proved for $L_1$ regularization extends a similar
result in [Ann. Statist. 35 (2007) 2313--2351] for the Dantzig selector. It
gives an affirmative answer to an open question in [Ann. Statist. 35 (2007)
2358--2364]. Moreover, the result leads to an extended view of feature
selection that allows less restrictive conditions than some recent work.