Ery Arias-Castro

  1. Oracle inequalities and minimax rates for non-local means and related adaptive kernel-based methods.

    Authors: Ery Arias-Castro, Rebecca Willett, Joseph Salmon
    Subjects: Statistics
    Abstract

    This paper describes a novel theoretical characterization of the performance
    of non-local means (NLM) for noise removal. NLM has proven effective in a
    variety of empirical studies, but little is understood fundamentally about how
    it performs relative to classical methods based on wavelets or how various
    parameters (e.g., patch size) should be chosen.

  2. Compressive binary search.

    Authors: Mark A. Davenport, Ery Arias-Castro
    Subjects: Information Theory
    Abstract

    In this paper we consider the problem of locating a nonzero entry in a
    high-dimensional vector from possibly adaptive linear measurements. We consider
    a recursive bisection method which we dub the compressive binary search and
    show that it improves on what any nonadaptive method can achieve. We establish
    a non-asymptotic bound that applies to all methods, regardless of their
    computational complexity. Combined, these results show that the compressive
    binary search is within a double logarithmic factor of the optimal performance.

  3. Detection of correlations.

    Authors: Ery Arias-Castro, Sébastien Bubeck, Gábor Lugosi
    Subjects: Statistics
    Abstract

    We consider the hypothesis testing problem of deciding whether an observed
    high-dimensional normal vector has independent components or, alternatively, if
    it has a small subset of correlated components. The correlated components may
    have a certain combinatorial structure known to the statistician. We establish
    upper and lower bounds for the Bayes risk in terms of the size of the
    correlated subset, the level of correlation, and the structure of the class of
    possibly correlated sets.

  4. Cluster Detection in Networks using Percolation.

    Authors: Ery Arias-Castro, Geoffrey R. Grimmett
    Subjects: Statistics
    Abstract

    We consider the task of detecting a salient cluster in a sensor network,
    i.e., an undirected graph with a random variable attached to each node.
    Motivated by recent research in environmental statistics and the drive to
    compete with the reigning scan statistic, we explore alternatives based on the
    percolative properties of the network. The first method is based on the size of
    the largest connected component after removing the nodes in the network whose
    value is lower than a given threshold. The second one is the upper level set
    scan test introduced by Patil and Taillie (2003).

  5. Global Testing under Sparse Alternatives: ANOVA, Multiple Comparisons and the Higher Criticism.

    Authors: Emmanuel J. Candès, Yaniv Plan, Ery Arias-Castro
    Subjects: Statistics
    Abstract

    Testing for the significance of a subset of regression coefficients in a
    linear model, a staple of statistical analysis, goes back at least to the work
    of Fisher who introduced the analysis of variance (ANOVA). We study this
    problem under the assumption that the coefficient vector is sparse, a common
    situation in modern high-dimensional settings. Suppose the regression vector is
    of dimension p with S non-zero coefficients with S = p^{1 -alpha}. Under
    moderate sparsity levels, i.e. alpha <= 1/2, we show that ANOVA is essentially
    optimal under some conditions on the design.

  6. The Cheeger Constant: from Discrete to Continuous.

    Authors: Ery Arias-Castro, Bruno Pelletier, Pierre Pudlo
    Subjects: Statistics
    Abstract

    Let M be a bounded domain of a Euclidian space with smooth boundary. We
    relate the Cheeger constant of M and the conductance of a neighborhood graph
    defined on a random sample from M. By restricting the minimization defining the
    latter over a particular class of subsets, we obtain consistency (after
    normalization) as the sample size increases, and show that any minimizing
    sequence of subsets has a subsequence converging to a Cheeger set of M.

  7. Detection of an anomalous cluster in a network.

    Authors: Ery Arias-Castro, Emmanuel J. Candes, Arnaud Durand
    Subjects: Statistics
    Abstract

    We consider the problem of detecting whether or not in a given sensor
    network, there is a cluster of sensors which exhibit an "unusual behavior".
    Formally, suppose we are given a set of nodes and attach a random variable to
    each node. We observe a realization of this process and want to decide between
    the following two hypotheses: under the null, the variables are i.i.d. standard
    normal; under the alternative, there is a cluster of variables that are i.i.d.
    normal with positive mean and unit variance, while the rest are i.i.d. standard
    normal.

  8. Spectral clustering based on local linear approximations.

    Authors: Ery Arias-Castro, Gilad Lerman, Guangliang Chen
    Subjects: Machine Learning
    Abstract

    In the context of clustering, we assume a generative model where each cluster
    is the result of sampling points in the neighborhood of an embedded smooth
    surface, possibly contaminated with outliers. We consider a prototype for a
    higher-order spectral clustering method based on the residual from a local
    linear approximation.

  9. Clustering Based on Pairwise Distances When the Data is of Mixed Dimensions.

    Authors: Ery Arias-Castro
    Subjects: Machine Learning
    Abstract

    In the context of clustering, we consider a generative model in a Euclidean
    ambient space with clusters of different shapes, dimensions, sizes and
    densities. In an asymptotic setting where the number of points becomes large,
    we obtain theoretical guaranties for a few emblematic methods based on pairwise
    distances: a simple algorithm based on the extraction of connected components
    in a neighborhood graph; the spectral clustering method of Ng, Jordan and
    Weiss; and hierarchical clustering with single linkage.

RSS-материал