The regularization path of the Lasso can be shown to be piecewise linear,
making it possible to "follow" and explicitly compute the entire path. We
analyze in this paper this popular strategy, and prove that its worst case
complexity is exponential in the number of variables. We then oppose this
pessimistic result to an (optimistic) approximate analysis: We show that an
approximate path with at most O(1/sqrt(epsilon)) linear segments can always be
obtained, where every point on the path is guaranteed to be optimal up to a
relative epsilon-duality gap.
We consider supervised learning problems where the features are embedded in a
graph, such as gene expressions in a gene network. In this context, it is of
much interest to take into account the problem structure, and automatically
select a subgraph with a small number of connected components. By exploiting
prior knowledge, one can indeed improve the prediction performance and/or
obtain better interpretable results. Regularization or penalty functions for
selecting features in graphs have recently been proposed but they raise new
algorithmic challenges.
Communities of highly connected actors form an essential feature in the
structure of several empirical directed and undirected networks. However,
compared to the amount of research on clustering for undirected graphs, there
is relatively little understanding of clustering in directed networks.
Functional MRI (fMRI) has become the most common method for investigating the
human brain. However, fMRI data present some complications for statistical
analysis and modeling. One recently developed approach to these data focuses on
estimation of computational encoding models that describe how stimuli are
transformed into brain activity measured in individual voxels. Here we aim at
building encoding models for fMRI signals recorded in primary visual cortex of
the human brain. We use residual analyses to reveal systematic nonlinearity
across voxels not taken into account by previous models.
I do not remember when was the first time that I met Leo, but I have a clear
memory of going to Leo's office on the 4th floor of Evans Hall to talk to him
in my second year in Berkeley's Ph.D. program in 1986. The details of the
conversation are not retained but a visual image of his clean and orderly
office remains, in a stark contrast to a high entropy state of the same office
now being used by myself.
The performance of the Lasso is well understood under the assumptions of the
standard linear model with homoscedastic noise. However, in several
applications, the standard model does not describe the important features of
the data. This paper examines how the Lasso performs on a non-standard model
that is motivated by medical imaging applications. In these applications, the
variance of the noise scales linearly with the expectation of the observation.
Like all heteroscedastic models, the noise terms in this Poisson-like model are
\textit{not} independent of the design matrix.
High-dimensional statistical inference deals with models in which the the
number of parameters $p$ is comparable to or larger than the sample size $n$.
Since it is usually impossible to obtain consistent procedures unless $p/n
\rightarrow 0$, a line of recent work has studied models with various types of
low-dimensional structure (e.g., sparse vectors; block-structured matrices;
low-rank matrices; Markov assumptions).
Sparse additive models are families of $d$-variate functions that have the
additive decomposition \mbox{$f^* = \sum_{j \in S} f^*_j$,} where $S$ is a
unknown subset of cardinality $s \ll d$. We consider the case where each
component function $f^*_j$ lies in a reproducing kernel Hilbert space, and
analyze a simple kernel-based convex program for estimating the unknown
function $f^*$. Working within a high-dimensional framework that allows both
the dimension $d$ and sparsity $s$ to scale, we derive convergence rates in the
$L^2(\mathbb{P})$ and $L^2(\mathbb{P}_n)$ norms.
Networks or graphs can easily represent a diverse set of data sources that
are characterized by interacting units or actors. Social networks, representing
people who communicate with each other, are one example. Communities or
clusters of highly connected actors form an essential feature in the structure
of several empirical networks. Spectral clustering is a popular and
computationally feasible method to discover these communities. The Stochastic
Block Model (Holland et al., 1983) is a social network model with well defined
communities; each node is a member of one community.
This paper focuses on obtaining clustering information about a distribution
from its i.i.d. samples. We develop theoretical results to understand and use
clustering information contained in the eigenvectors of data adjacency matrices
based on a radial kernel function with a sufficiently fast tail decay. In
particular, we provide population analyses to gain insights into which
eigenvectors should be used and when the clustering information for the
distribution can be recovered from the sample.
Consider the standard linear regression model $\y = \Xmat \betastar + w$,
where $\y \in \real^\numobs$ is an observation vector, $\Xmat \in
\real^{\numobs \times \pdim}$ is a design matrix, $\betastar \in \real^\pdim$
is the unknown regression vector, and $w \sim \mathcal{N}(0, \sigma^2 I)$ is
additive Gaussian noise. This paper studies the minimax rates of convergence
for estimation of $\betastar$ for $\ell_\rpar$-losses and in the
$\ell_2$-prediction loss, assuming that $\betastar$ belongs to an
$\ell_{\qpar}$-ball $\Ballq(\myrad)$ for some $\qpar \in [0,1]$.
Consider the standard linear regression model $\y = \Xmat \betastar + w$,
where $\y \in \real^\numobs$ is an observation vector, $\Xmat \in
\real^{\numobs \times \pdim}$ is a design matrix, $\betastar \in \real^\pdim$
is the unknown regression vector, and $w \sim \mathcal{N}(0, \sigma^2 I)$ is
additive Gaussian noise. This paper studies the minimax rates of convergence
for estimation of $\betastar$ for $\ell_\rpar$-losses and in the
$\ell_2$-prediction loss, assuming that $\betastar$ belongs to an
$\ell_{\qpar}$-ball $\Ballq(\myrad)$ for some $\qpar \in [0,1]$.
Extracting useful information from high-dimensional data is an important
focus of today's statistical research and practice. Penalized loss function
minimization has been shown to be effective for this task both theoretically
and empirically. With the virtues of both regularization and sparsity, the
$L_1$-penalized squared error minimization method Lasso has been popular in
regression models and beyond.
Extracting useful information from high-dimensional data is an important
focus of today's statistical research and practice. Penalized loss function
minimization has been shown to be effective for this task both theoretically
and empirically. With the virtues of both regularization and sparsity, the
$L_1$-penalized squared error minimization method Lasso has been popular in
regression models and beyond.