The sum-product or belief propagation (BP) algorithm is a widely-used
message-passing algorithm for computing marginal distributions in graphical
models with discrete variables. At the core of the BP updates, when applied to
a graphical model with pairwise interactions, lies a matrix-vector product with
complexity that is quadratic in the state dimension $d$, and requires
transmission of a $(d-1)$-dimensional vector of real numbers (messages) to its
neighbors.
Although the standard formulations of prediction problems involve
fully-observed and noiseless data drawn in an i.i.d. manner, many applications
involve noisy and/or missing data, possibly involving dependence as well. We
study these issues in the context of high-dimensional sparse linear regression,
and propose novel estimators for the cases of noisy, missing, and/or dependent
data.
We consider the sampling problem for functional PCA (fPCA), where the
simplest example is the case of taking time samples of the underlying
functional components. More generally, model the sampling operation as a
continuous linear map from H to R^m, where the functional components to lie in
some Hilbert subspace H of L^2, for example an RKHS of smooth functions. This
model includes time and frequency sampling as special cases.
We consider the hypothesis testing problem of detecting a shift between the
means of two multivariate normal distributions in the high-dimensional setting,
allowing for the data dimension p to exceed the sample size n. Specifically, we
propose a new test statistic for the two-sample test of means that integrates a
random projection with the classical Hotelling T^2 statistic.
Many statistical M-estimators are based on convex optimization problems
formed by the combination of a data-dependent loss function with a norm-based
regularizer. We analyze the convergence rates of projected gradient methods for
solving such problems, working within a high-dimensional framework that allows
the data dimension d to grow with (and possibly exceed) the sample size n. This
high-dimensional structure precludes the usual global assumptions---namely,
strong convexity and smoothness conditions---that underlie much of classical
optimization analysis.
We analyze convergence rates of stochastic optimization procedures for
non-smooth convex optimization problems. By combining randomized smoothing
techniques with accelerated gradient methods, we obtain optimal variance-based
convergence rates of stochastic optimization procedures, both in expectation
and with high probability, To the best of our knowledge, these are the first
variance-based rates for non-smooth optimization.
We analyze a class of estimators based on convex relaxation for solving
high-dimensional matrix decomposition problems. The observations are the noisy
realizations of the sum of an (appproximately) low rank matrix $\Theta^\star$
with a second matrix $\Gamma^\star$ endowed with a complementary form of
low-dimensional structure. We derive a general theorem that gives upper bounds
on the Frobenius norm error for an estimate of the pair $(\Theta^\star,
\Gamma^\star)$ obtained by solving a convex optimization problem that combines
the nuclear norm with a general decomposable regularizer.
High-dimensional statistical inference deals with models in which the the
number of parameters $p$ is comparable to or larger than the sample size $n$.
Since it is usually impossible to obtain consistent procedures unless $p/n
\rightarrow 0$, a line of recent work has studied models with various types of
low-dimensional structure (e.g., sparse vectors; block-structured matrices;
low-rank matrices; Markov assumptions).
We consider the problem of estimating the graph associated with a binary
Ising Markov random field. We describe a method based on $\ell_1$-regularized
logistic regression, in which the neighborhood of any given node is estimated
by performing logistic regression subject to an $\ell_1$-constraint. The method
is analyzed under high-dimensional scaling in which both the number of nodes
$p$ and maximum neighborhood size $d$ are allowed to grow as a function of the
number of observations $n$.
We consider the matrix completion problem under a form of row/column weighted
entrywise sampling, including the case of uniform entrywise sampling as a
special case. We analyze the associated random observation operator, and prove
that with high probability, it satisfies a form of restricted strong convexity
with respect to weighted Frobenius norm. Using this property, we obtain as
corollaries a number of error bounds on matrix completion in the weighted
Frobenius norm under noisy sampling and for both exact and near low-rank
matrices.
Relative to the large literature on upper bounds on complexity of convex
optimization, lesser attention has been paid to the fundamental hardness of
these problems. Given the extensive use of convex optimization in machine
learning and statistics, gaining an understanding of these complexity-theoretic
issues is important. In this paper, we study the complexity of stochastic
convex optimization in an oracle model of computation. We improve upon known
results and obtain tight minimax complexity estimates for various function
classes.
Sparse additive models are families of $d$-variate functions that have the
additive decomposition \mbox{$f^* = \sum_{j \in S} f^*_j$,} where $S$ is a
unknown subset of cardinality $s \ll d$. We consider the case where each
component function $f^*_j$ lies in a reproducing kernel Hilbert space, and
analyze a simple kernel-based convex program for estimating the unknown
function $f^*$. Working within a high-dimensional framework that allows both
the dimension $d$ and sparsity $s$ to scale, we derive convergence rates in the
$L^2(\mathbb{P})$ and $L^2(\mathbb{P}_n)$ norms.
High-dimensional inference refers to problems of statistical estimation in
which the ambient dimension of the data may be comparable to or possibly even
larger than the sample size. We study an instance of high-dimensional inference
in which the goal is to estimate a matrix $\Theta^* \in \real^{k \times p}$ on
the basis of $N$ noisy observations, and the unknown matrix $\Theta^*$ is
assumed to be either exactly low rank, or ``near'' low-rank, meaning that it
can be well-approximated by a matrix with low rank.
Consider the standard linear regression model $\y = \Xmat \betastar + w$,
where $\y \in \real^\numobs$ is an observation vector, $\Xmat \in
\real^{\numobs \times \pdim}$ is a design matrix, $\betastar \in \real^\pdim$
is the unknown regression vector, and $w \sim \mathcal{N}(0, \sigma^2 I)$ is
additive Gaussian noise. This paper studies the minimax rates of convergence
for estimation of $\betastar$ for $\ell_\rpar$-losses and in the
$\ell_2$-prediction loss, assuming that $\betastar$ belongs to an
$\ell_{\qpar}$-ball $\Ballq(\myrad)$ for some $\qpar \in [0,1]$.
Consider the standard linear regression model $\y = \Xmat \betastar + w$,
where $\y \in \real^\numobs$ is an observation vector, $\Xmat \in
\real^{\numobs \times \pdim}$ is a design matrix, $\betastar \in \real^\pdim$
is the unknown regression vector, and $w \sim \mathcal{N}(0, \sigma^2 I)$ is
additive Gaussian noise. This paper studies the minimax rates of convergence
for estimation of $\betastar$ for $\ell_\rpar$-losses and in the
$\ell_2$-prediction loss, assuming that $\betastar$ belongs to an
$\ell_{\qpar}$-ball $\Ballq(\myrad)$ for some $\qpar \in [0,1]$.
Principal component analysis (PCA) is a classical method for dimensionality
reduction based on extracting the dominant eigenvectors of the sample
covariance matrix. However, PCA is well known to behave poorly in the ``large
$p$, small $n$'' setting, in which the problem dimension $p$ is comparable to
or larger than the sample size $n$. This paper studies PCA in this
high-dimensional regime, but under the additional assumption that the maximal
eigenvector is sparse, say, with at most $k$ nonzero components.
Principal component analysis (PCA) is a classical method for dimensionality
reduction based on extracting the dominant eigenvectors of the sample
covariance matrix. However, PCA is well known to behave poorly in the ``large
$p$, small $n$'' setting, in which the problem dimension $p$ is comparable to
or larger than the sample size $n$. This paper studies PCA in this
high-dimensional regime, but under the additional assumption that the maximal
eigenvector is sparse, say, with at most $k$ nonzero components.