We derive the precise asymptotic distributional behavior of Gaussian
variational approximate estimators of the parameters in a single-predictor
Poisson mixed model. These results are the deepest yet obtained concerning the
statistical properties of a variational approximation method. Moreover, they
give rise to asymptotically valid statistical inference. A simulation study
demonstrates that Gaussian variational approximate confidence intervals possess
good to excellent coverage properties, and have a similar precision to their
exact likelihood counterparts.
For better or for worse, rankings of institutions, such as universities,
schools and hospitals, play an important role today in conveying information
about relative performance. They inform policy decisions and budgets, and are
often reported in the media.
Particularly in genomics, but also in other fields, it has become commonplace
to undertake highly multiple Student's $t$-tests based on relatively small
sample sizes. The literature on this topic is continually expanding, but the
main approaches used to control the family-wise error rate and false discovery
rate are still based on the assumption that the tests are independent.
Higher criticism is a method for detecting signals that are both sparse and
weak. Although first proposed in cases where the noise variables are
independent, higher criticism also has reasonable performance in settings where
those variables are correlated.
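To make the idea concrete, the following is a minimal sketch of one widely used form of the higher criticism statistic, computed from a list of p-values. The function name `higher_criticism` and the `fraction` parameter (which restricts the maximum to the smallest p-values) are illustrative assumptions, not notation from the text above, and the sketch treats the tests as independent.

```python
import math


def higher_criticism(p_values, fraction=0.5):
    """One common form of the higher criticism statistic (illustrative sketch).

    p_values: p-values from n individual tests.
    fraction: only the smallest `fraction` of the ordered p-values are scanned
              (an assumed, conventional choice).
    """
    n = len(p_values)
    ordered = sorted(p_values)
    limit = max(1, int(fraction * n))
    best = float("-inf")
    for i, p in enumerate(ordered[:limit], start=1):
        if p <= 0.0 or p >= 1.0:
            continue  # skip degenerate p-values to avoid division by zero
        # Standardised gap between the empirical fraction i/n and the p-value
        z = math.sqrt(n) * (i / n - p) / math.sqrt(p * (1.0 - p))
        best = max(best, z)
    return best


# Evenly spread p-values resemble the null; a few tiny ones mimic sparse signals.
null_like = [(i - 0.5) / 100 for i in range(1, 101)]
with_signals = [1e-6] * 5 + null_like[5:]
hc_null = higher_criticism(null_like)
hc_sparse = higher_criticism(with_signals)
```

A large value of the statistic, relative to its behavior under the null, suggests that a small fraction of the tests carry signal.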
We propose a method for incorporating variable selection into local
polynomial regression. This can improve the accuracy of the regression by
extending the bandwidth in directions corresponding to those variables judged
to be unimportant. It also increases our understanding of the dataset by
highlighting areas where these variables are redundant. The approach has the
potential to effect complete variable removal as well as perform partial
removal when a variable's redundancy applies only to particular regions of the
data.
We survey classical kernel methods for providing nonparametric solutions to
problems involving measurement error. In particular we outline kernel-based
methodology in this setting, and discuss its basic properties. Then we point to
close connections that exist between kernel methods and much newer approaches
based on minimum contrast techniques. The connections are through use of the
sinc kernel for kernel-based inference.
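For orientation, the sinc kernel and the standard deconvolution kernel density estimator can be written as follows. The notation is an assumption introduced here for illustration: $W = X + U$ is the observed contaminated variable, $\phi_U$ is the known characteristic function of the error $U$, $\hat\phi_W$ is the empirical characteristic function of the data, and $h$ is a bandwidth.

```latex
K(x) = \frac{\sin x}{\pi x},
\qquad
K^{\mathrm{ft}}(t) = \int e^{itx} K(x)\,dx = \mathbf{1}\{|t| \le 1\},
\\[6pt]
\hat f_X(x) = \frac{1}{2\pi} \int e^{-itx}\, K^{\mathrm{ft}}(ht)\,
\frac{\hat\phi_W(t)}{\phi_U(t)}\,dt
= \frac{1}{2\pi} \int_{-1/h}^{1/h} e^{-itx}\,
\frac{\hat\phi_W(t)}{\phi_U(t)}\,dt .
```

With the sinc kernel, smoothing therefore amounts to truncating the Fourier inversion integral at frequency $1/h$.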
The notion of probability density for a random function is not as
straightforward as in finite-dimensional cases. While a probability density
function generally does not exist for functional data, we show that it is
possible to develop the notion of density when functional data are considered
in the space determined by the eigenfunctions of principal component analysis.
This leads to a transparent and meaningful surrogate for density defined in
terms of the average value of the logarithms of the densities of the
distributions of principal components for a given dimension.
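In symbols, the surrogate described above might be written as below. The notation is an assumption made here for illustration: $\xi_j(x)$ denotes the $j$th principal component score of the function $x$, $f_j$ the density of the $j$th score, and $p$ the chosen dimension.

```latex
\ell_p(x) = \frac{1}{p} \sum_{j=1}^{p} \log f_j\{\xi_j(x)\}
```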
We show that scale-adjusted versions of the centroid-based classifier enjoy
optimal properties when used to discriminate between two very high-dimensional
populations where the principal differences are in location. The scale
adjustment removes the tendency of scale differences to confound differences in
means. Certain other distance-based methods, for example, those founded on
nearest-neighbor distance, do not have optimal performance in the sense that we
propose.
Student's $t$ statistic is finding applications today that were never
envisaged when it was introduced more than a century ago. Many of these
applications rely on properties, for example robustness against heavy-tailed
sampling distributions, that were not explicitly considered until relatively
recently. In this paper we explore these features of the $t$ statistic in the
context of its application to very high-dimensional problems, including feature
selection and ranking, highly multiple hypothesis testing, and sparse,
high-dimensional signal detection.
Recent discussion of the success of feature selection methods has argued that
focusing on a relatively small number of features has been counterproductive.
Instead, it is suggested, the number of significant features can be in the
thousands or tens of thousands, rather than (as is commonly supposed at
present) approximately in the range from five to fifty. This change, in orders
of magnitude, in the number of influential features, necessitates alterations
to the way in which we choose features and to the manner in which the success
of feature selection is assessed.
The bootstrap is a popular and convenient method for quantifying the
authority of an empirical ordering of attributes, for example of a ranking of
the performance of institutions or of the influence of genes on a response
variable. In the first of these examples, the number, $p$, of quantities being
ordered is sometimes only moderate in size; in the second it can be very large,
often much greater than sample size. However, we show that in both types of
problem the conventional bootstrap can produce inconsistency.
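As a point of reference, the conventional bootstrap for a ranking can be sketched as follows. This is only an illustration of the standard procedure whose consistency is questioned above, not a remedy for it. The function names, the use of sample means as the ranked quantity, and the percentile-style intervals are all assumptions made for the example.

```python
import random


def ranks(values):
    """Return ranks with 1 for the largest value; ties broken by index order."""
    order = sorted(range(len(values)), key=lambda j: -values[j])
    result = [0] * len(values)
    for position, j in enumerate(order, start=1):
        result[j] = position
    return result


def bootstrap_rank_intervals(samples, n_boot=500, level=0.90, seed=0):
    """Percentile intervals for ranks from the conventional bootstrap.

    samples: one list of observations per institution (or gene).
    Each bootstrap resample has the same size as the original sample.
    """
    rng = random.Random(seed)
    boot_ranks = [[] for _ in samples]
    for _ in range(n_boot):
        means = []
        for obs in samples:
            resample = [rng.choice(obs) for _ in obs]  # draw with replacement
            means.append(sum(resample) / len(resample))
        for j, r in enumerate(ranks(means)):
            boot_ranks[j].append(r)
    tail = (1 - level) / 2
    lo_idx = int(tail * n_boot)
    hi_idx = int((1 - tail) * n_boot) - 1
    intervals = []
    for r in boot_ranks:
        r.sort()
        intervals.append((r[lo_idx], r[hi_idx]))
    return intervals


# Three well-separated hypothetical institutions: the ranks never change.
data = [[10.0, 10.2, 9.8, 10.1], [5.0, 5.1, 4.9, 5.2], [0.0, 0.1, -0.1, 0.2]]
intervals = bootstrap_rank_intervals(data)
```

When the quantities being ranked are close together or tied, the bootstrap ranks fluctuate heavily, which is the kind of setting where the inconsistency noted above becomes a concern.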
Situations of a functional predictor paired with a scalar response are
increasingly encountered in data analysis. Predictors are often appropriately
modeled as square integrable smooth random functions. Imposing minimal
assumptions on the nature of the functional relationship, we aim to estimate
the directional derivatives and gradients of the response with respect to the
predictor functions.
Increasing practical interest has been shown in regression problems where the
errors, or disturbances, are centred in a way that reflects particular
characteristics of the mechanism that generated the data. In economics this
occurs in problems involving data on markets, productivity and auctions, where
it can be natural to centre at an end-point of the error distribution rather
than at the distribution's mean.
We suggest a robust nearest-neighbor approach to classifying high-dimensional
data. The method enhances sensitivity by employing a threshold and truncates to
a sequence of zeros and ones in order to reduce the deleterious impact of
heavy-tailed data. Empirical rules are suggested for choosing the threshold.
They require the bare minimum of data; only one data vector is needed from each
population. Theoretical and numerical aspects of performance are explored,
paying particular attention to the impacts of correlation and heterogeneity
among data components.
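The thresholding and truncation steps can be illustrated with a small sketch. This is an assumed, simplified reading of the description above, not the authors' procedure: each component is mapped to 1 if its absolute value exceeds a threshold and to 0 otherwise, and a new vector is assigned to the population whose single training vector is closest in Hamming distance. The names `truncate`, `hamming` and `classify`, the tie-breaking rule, and the fixed threshold are hypothetical.

```python
def truncate(vector, threshold):
    """Map each component to 1 if |x| exceeds the threshold, else 0."""
    return [1 if abs(x) > threshold else 0 for x in vector]


def hamming(a, b):
    """Number of positions at which two 0/1 sequences differ."""
    return sum(1 for u, v in zip(a, b) if u != v)


def classify(z, x_train, y_train, threshold):
    """Assign z to population "X" or "Y" using one training vector from each.

    Ties are resolved in favour of "X" (an arbitrary choice for this sketch).
    """
    zb = truncate(z, threshold)
    dx = hamming(zb, truncate(x_train, threshold))
    dy = hamming(zb, truncate(y_train, threshold))
    return "X" if dx <= dy else "Y"


# One training vector per population; the new vector has a heavy-tailed outlier
# in the first component, which truncation reduces to a single 1.
x_train = [5.0, 0.0, 0.0, 0.0]
y_train = [0.0, 0.0, 0.0, 5.0]
label = classify([1000.0, 0.1, -0.2, 0.3], x_train, y_train, threshold=2.0)
```

Because only the 0/1 pattern is compared, an extreme component contributes no more to the distance than any other component that exceeds the threshold.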