Peter Hall

  1. Asymptotic normality and valid inference for Gaussian variational approximation.

    Authors: Peter Hall, Tung Pham, M. P. Wand, S. S. J. Wang
    Subjects: Statistics
    Abstract

    We derive the precise asymptotic distributional behavior of Gaussian
    variational approximate estimators of the parameters in a single-predictor
    Poisson mixed model. These results are the deepest yet obtained concerning the
    statistical properties of a variational approximation method. Moreover, they
    give rise to asymptotically valid statistical inference. A simulation study
    demonstrates that Gaussian variational approximate confidence intervals possess
    good to excellent coverage properties, and have a similar precision to their
    exact likelihood counterparts.

  2. Modeling the variability of rankings.

    Authors: Peter Hall, Hugh Miller
    Subjects: Statistics
    Abstract

    For better or for worse, rankings of institutions, such as universities,
    schools and hospitals, play an important role today in conveying information
    about relative performance. They inform policy decisions and budgets, and are
    often reported in the media.

  3. Strong approximations of level exceedences related to multiple hypothesis testing.

    Authors: Peter Hall, Qiying Wang
    Subjects: Statistics
    Abstract

    Particularly in genomics, but also in other fields, it has become commonplace
    to undertake highly multiple Student's $t$-tests based on relatively small
    sample sizes. The literature on this topic is continually expanding, but the
    main approaches used to control the family-wise error rate and false discovery
    rate are still based on the assumption that the tests are independent.

  4. Innovated higher criticism for detecting sparse signals in correlated noise.

    Authors: Peter Hall, Jiashun Jin
    Subjects: Statistics
    Abstract

    Higher criticism is a method for detecting signals that are both sparse and
    weak. Although first proposed in cases where the noise variables are
    independent, higher criticism also has reasonable performance in settings where
    those variables are correlated.

  5. Local polynomial regression and variable selection.

    Authors: Peter Hall, Hugh Miller
    Subjects: Statistics
    Abstract

    We propose a method for incorporating variable selection into local
    polynomial regression. This can improve the accuracy of the regression by
    extending the bandwidth in directions corresponding to those variables judged
    to be unimportant. It also increases our understanding of the dataset by
    highlighting areas where these variables are redundant. The approach has the
    potential to effect complete variable removal as well as perform partial
    removal when a variable redundancy applies only to particular regions of the
    data.

  6. Kernel methods and minimum contrast estimators for empirical deconvolution.

    Authors: Peter Hall, Aurore Delaigle
    Subjects: Methodology
    Abstract

    We survey classical kernel methods for providing nonparametric solutions to
    problems involving measurement error. In particular we outline kernel-based
    methodology in this setting, and discuss its basic properties. Then we point to
    close connections that exist between kernel methods and much newer approaches
    based on minimum contrast techniques. The connections are through use of the
    sinc kernel for kernel-based inference.

  7. Defining probability density for a distribution of random functions.

    Authors: Peter Hall, Aurore Delaigle
    Subjects: Statistics
    Abstract

    The notion of probability density for a random function is not as
    straightforward as in finite-dimensional cases. While a probability density
    function generally does not exist for functional data, we show that it is
    possible to develop the notion of density when functional data are considered
    in the space determined by the eigenfunctions of principal component analysis.
    This leads to a transparent and meaningful surrogate for density defined in
    terms of the average value of the logarithms of the densities of the
    distributions of principal components for a given dimension.

  8. Optimal properties of centroid-based classifiers for very high-dimensional data.

    Authors: Peter Hall, Tung Pham
    Subjects: Statistics
    Abstract

    We show that scale-adjusted versions of the centroid-based classifier enjoy
    optimal properties when used to discriminate between two very high-dimensional
    populations where the principal differences are in location. The scale
    adjustment removes the tendency of scale differences to confound differences in
    means. Certain other distance-based methods, for example, those founded on
    nearest-neighbor distance, do not have optimal performance in the sense that we
    propose.

  9. Robustness and accuracy of methods for high dimensional data analysis based on Student's t statistic.

    Authors: Peter Hall, Jiashun Jin, Aurore Delaigle
    Subjects: Methodology
    Abstract

    Student's $t$ statistic is finding applications today that were never
    envisaged when it was introduced more than a century ago. Many of these
    applications rely on properties, for example robustness against heavy tailed
    sampling distributions, that were not explicitly considered until relatively
    recently. In this paper we explore these features of the $t$ statistic in the
    context of its application to very high dimensional problems, including feature
    selection and ranking, highly multiple hypothesis testing, and sparse, high
    dimensional signal detection.

  10. Feature Selection when There are Many Influential Features.

    Authors: Peter Hall, Hugh Miller, Jiashun Jin
    Subjects: Statistics
    Abstract

    Recent discussion of the success of feature selection methods has argued that
    focusing on a relatively small number of features has been counterproductive.
    Instead, it is suggested, the number of significant features can be in the
    thousands or tens of thousands, rather than (as is commonly supposed at
    present) approximately in the range from five to fifty. This change, in orders
    of magnitude, in the number of influential features, necessitates alterations
    to the way in which we choose features and to the manner in which the success
    of feature selection is assessed.

  11. Using the bootstrap to quantify the authority of an empirical ranking.

    Authors: Peter Hall, Hugh Miller
    Subjects: Statistics
    Abstract

    The bootstrap is a popular and convenient method for quantifying the
    authority of an empirical ordering of attributes, for example of a ranking of
    the performance of institutions or of the influence of genes on a response
    variable. In the first of these examples, the number, $p$, of quantities being
    ordered is sometimes only moderate in size; in the second it can be very large,
    often much greater than sample size. However, we show that in both types of
    problem the conventional bootstrap can produce inconsistency.

  12. Estimation of functional derivatives.

    Authors: Peter Hall, Hans-Georg Müller, Fang Yao
    Subjects: Statistics
    Abstract

    Situations of a functional predictor paired with a scalar response are
    increasingly encountered in data analysis. Predictors are often appropriately
    modeled as square integrable smooth random functions. Imposing minimal
    assumptions on the nature of the functional relationship, we aim to estimate
    the directional derivatives and gradients of the response with respect to the
    predictor functions.

  13. Nonparametric "regression" when errors are positioned at end-points.

    Authors: Peter Hall, Ingrid Van Keilegom
    Subjects: Statistics
    Abstract

    Increasing practical interest has been shown in regression problems where the
    errors, or disturbances, are centred in a way that reflects particular
    characteristics of the mechanism that generated the data. In economics this
    occurs in problems involving data on markets, productivity and auctions, where
    it can be natural to centre at an end-point of the error distribution rather
    than at the distribution's mean.

  14. Robust nearest-neighbor methods for classifying high-dimensional data.

    Authors: Yao-ban Chan, Peter Hall
    Subjects: Statistics
    Abstract

    We suggest a robust nearest-neighbor approach to classifying high-dimensional
    data. The method enhances sensitivity by employing a threshold and truncates to
    a sequence of zeros and ones in order to reduce the deleterious impact of
    heavy-tailed data. Empirical rules are suggested for choosing the threshold.
    They require the bare minimum of data; only one data vector is needed from each
    population. Theoretical and numerical aspects of performance are explored,
    paying particular attention to the impacts of correlation and heterogeneity
    among data components.
