gr. Statistics

  1. Maximum empirical likelihood estimation of the spectral measure of an extreme-value distribution.

    Authors: Johan Segers, John H. J. Einmahl
    Subjects: gr. Statistics
    Abstract

    Consider a random sample from a bivariate distribution function $F$ in the
    max-domain of attraction of an extreme-value distribution function $G$. This
    $G$ is characterized by two extreme-value indices and a spectral measure, the
    latter determining the tail dependence structure of $F$. A major issue in
    multivariate extreme-value theory is the estimation of the spectral measure
    $\Phi_p$ with respect to the $L_p$ norm.

  3. Goodness-of-fit problem for errors in nonparametric regression: Distribution free approach.

    Authors: Estate V. Khmaladze, Hira L. Koul
    Subjects: gr. Statistics
    Abstract

    This paper discusses asymptotically distribution free tests for the classical
    goodness-of-fit hypothesis of an error distribution in nonparametric regression
    models. These tests are based on the same martingale transform of the residual
    empirical process as used in the one sample location model. This transformation
    eliminates extra randomization due to covariates but not due to the errors, which
    is intrinsically present in the estimators of the regression function. Thus,
    tests based on the transformed process have, generally, better power.

  5. Robust nearest-neighbor methods for classifying high-dimensional data.

    Authors: Yao-ban Chan, Peter Hall
    Subjects: gr. Statistics
    Abstract

    We suggest a robust nearest-neighbor approach to classifying high-dimensional
    data. The method enhances sensitivity by employing a threshold and truncates to
    a sequence of zeros and ones in order to reduce the deleterious impact of
    heavy-tailed data. Empirical rules are suggested for choosing the threshold.
    They require the bare minimum of data; only one data vector is needed from each
    population. Theoretical and numerical aspects of performance are explored,
    paying particular attention to the impacts of correlation and heterogeneity
    among data components.
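
    As a rough illustration of the truncation idea described in this abstract,
    the sketch below reduces each data vector to zeros and ones with a threshold
    and assigns a new vector to the population whose single training vector is
    nearest in Hamming distance. The function names, the rule `|x_j| > threshold`
    and the use of Hamming distance are assumptions made for illustration; the
    paper's exact classifier and its empirical threshold rules are not reproduced.

```python
from typing import List, Sequence


def binarize(x: Sequence[float], threshold: float) -> List[int]:
    """Truncate a vector to 0/1: 1 where |x_j| exceeds the threshold (assumed rule)."""
    return [1 if abs(v) > threshold else 0 for v in x]


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions at which two 0/1 sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(1 for u, v in zip(a, b) if u != v)


def classify(z: Sequence[float], x_ref: Sequence[float],
             y_ref: Sequence[float], threshold: float) -> str:
    """Assign z to population "X" or "Y" using one reference vector from each.

    Ties are resolved in favour of "X" (an arbitrary choice for this sketch).
    """
    bz = binarize(z, threshold)
    dx = hamming(bz, binarize(x_ref, threshold))
    dy = hamming(bz, binarize(y_ref, threshold))
    return "X" if dx <= dy else "Y"
```

    After truncation, an extreme component such as 1000.0 counts no more than
    any other component above the threshold, which is the sense in which
    heavy-tailed values lose their influence in this sketch.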

  7. Algebraic statistics for a directed random graph model with reciprocation.

    Authors: Alessandro Rinaldo, Sonja Petrović, Stephen E. Fienberg
    Subjects: gr. Statistics
    Abstract

    The p_1 model is a directed random graph model used to describe dyadic
    interactions in a social network in terms of effects due to differential
    attraction (popularity) and expansiveness, as well as an additional effect due
    to reciprocation. In this article we carry out an algebraic statistics analysis
    of this model. We show that the p_1 model is a toric model specified by a
    multi-homogeneous ideal. We conduct an extensive study of the Markov bases for
    p_1 models that incorporate explicitly the constraint arising from
    multi-homogeneity.

  8. Properties and refinements of the fused lasso.

    Authors: Alessandro Rinaldo
    Subjects: gr. Statistics
    Abstract

    We consider estimating an unknown signal, both blocky and sparse, which is
    corrupted by additive noise. We study three interrelated least squares
    procedures and their asymptotic properties. The first procedure is the fused
    lasso, put forward by Friedman et al. [Ann. Appl. Statist. 1 (2007) 302--332],
    which we modify into a different estimator, called the fused adaptive lasso,
    with better properties.
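
    The abstract does not spell out the criterion, so the following is only a
    sketch of the usual fused lasso objective for a signal observed with noise:
    a least squares fit plus an l1 penalty on the signal values (sparsity) and
    an l1 penalty on successive differences (blockiness). The function name and
    the scaling of the penalty weights `lam1` and `lam2` are assumptions.

```python
from typing import Sequence


def fused_lasso_objective(y: Sequence[float], mu: Sequence[float],
                          lam1: float, lam2: float) -> float:
    """Evaluate sum (y_i - mu_i)^2 + lam1 * sum |mu_i| + lam2 * sum |mu_i - mu_{i-1}|."""
    if len(y) != len(mu):
        raise ValueError("y and mu must have equal length")
    fit = sum((yi - mi) ** 2 for yi, mi in zip(y, mu))
    sparsity = sum(abs(mi) for mi in mu)
    blockiness = sum(abs(mu[i] - mu[i - 1]) for i in range(1, len(mu)))
    return fit + lam1 * sparsity + lam2 * blockiness
```

    A candidate signal that is both sparse and piecewise constant keeps the two
    penalty terms small, which is why the minimizer tends to be blocky and sparse.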

  9. Uniform limit theorems for wavelet density estimators.

    Authors: Evarist Giné, Richard Nickl
    Subjects: gr. Statistics
    Abstract

    Let $p_n(y)=\sum_k\hat{\alpha}_k\phi(y-k)+\sum_{l=0}^{j_n-1}\sum_k\hat
    {\beta}_{lk}2^{l/2}\psi(2^ly-k)$ be the linear wavelet density estimator, where
    $\phi$, $\psi$ are a father and a mother wavelet (with compact support),
    $\hat{\alpha}_k$, $\hat{\beta}_{lk}$ are the empirical wavelet coefficients
    based on an i.i.d.

  10. On Convergence Rates Equivalency and Sampling Strategies in Functional Deconvolution Models.

    Authors: Marianna Pensky, Theofanis Sapatinas
    Subjects: gr. Statistics
    Abstract

    Using the asymptotical minimax framework, we examine convergence rates
    equivalency between a continuous functional deconvolution model and its
    real-life discrete counterpart, over a wide range of Besov balls and for the
    $L^2$-risk. For this purpose, all possible models are divided into three
    groups: {\it uniform}, {\it regular} and {\it irregular}. We formulate the
    conditions when each of these situations takes place.

  11. Connecting tables with zero-one entries by a subset of a Markov basis.

    Authors: Hisayuki Hara, Akimichi Takemura
    Subjects: gr. Statistics
    Abstract

    We discuss connecting tables with zero-one entries by a subset of a Markov
    basis. In this paper, as a Markov basis we consider the Graver basis, which
    corresponds to the unique minimal Markov basis for the Lawrence lifting of the
    original configuration. Since the Graver basis tends to be large, it is of
    interest to clarify conditions such that a subset of the Graver basis, in
    particular a minimal Markov basis itself, connects tables with zero-one
    entries. We give some theoretical results on the connectivity of tables with
    zero-one entries.

  13. Technical appendix to "Adaptive estimation of stationary Gaussian fields".

    Authors: Nicolas Verzelen
    Subjects: gr. Statistics
    Abstract

    This is a technical appendix to "Adaptive estimation of stationary Gaussian
    fields". We present several proofs that have been skipped in the main paper.

  15. Improved kernel estimation of copulas: Weak convergence and goodness-of-fit testing.

    Authors: Marek Omelka, Irène Gijbels, Noël Veraverbeke
    Subjects: gr. Statistics
    Abstract

    We reconsider the existing kernel estimators for a copula function, as
    proposed in Gijbels and Mielniczuk [Comm. Statist. Theory Methods 19 (1990)
    445--464], Fermanian, Radulovi\v{c} and Wegkamp [Bernoulli 10 (2004) 847--860]
    and Chen and Huang [Canad. J. Statist. 35 (2007) 265--282]. All of these
    estimators have as a drawback that they can suffer from a corner bias problem.
    A way to deal with this is to impose rather stringent conditions on the copula,
    thereby ruling out many classical families of copulas.

  17. One and two side generalisations of the log-Normal distribution by means of a new product definition.

    Authors: Silvio M. Duarte Queiros
    Subjects: gr. Statistics
    Abstract

    In this manuscript we introduce a generalisation of the log-Normal
    distribution that is inspired by a modification of the Kapteyn multiplicative
    process using the $q$-product of Borges [Physica A \textbf{340}, 95 (2004)].
    Depending on the value of $q$ the distribution increases the tail for small (when
    $q<1$) or large (when $q>1$) values of the variable under analysis. The usual
    log-Normal distribution is retrieved when $q=1$. The main statistical features
    of this distribution are presented as well as a related random number
    generator and tables of quantiles of the Kolmogorov-Smirnov.
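
    For readers unfamiliar with the $q$-product, a minimal sketch follows. It
    uses the commonly quoted definition $x \otimes_q y = [x^{1-q} + y^{1-q} -
    1]_+^{1/(1-q)}$ for positive $x$ and $y$, which reduces to the ordinary
    product at $q=1$. The function name and the handling of the cut-off are
    assumptions of this sketch, not details taken from the paper.

```python
def q_product(x: float, y: float, q: float) -> float:
    """q-product of two positive numbers; equals x * y when q == 1."""
    if x <= 0 or y <= 0:
        raise ValueError("x and y must be positive")
    if q == 1.0:
        return x * y
    base = x ** (1.0 - q) + y ** (1.0 - q) - 1.0
    if base <= 0.0:
        if q < 1.0:
            # The [.]_+ cut-off sends a non-positive base to zero.
            return 0.0
        # For q > 1 the exponent is negative; this sketch does not handle that case.
        raise ValueError("q-product is outside the domain handled by this sketch")
    return base ** (1.0 / (1.0 - q))
```

    Evaluating `q_product(2.0, 3.0, q)` for values of `q` approaching 1 gives
    results approaching the ordinary product, consistent with the statement that
    the usual log-Normal distribution is retrieved at $q=1$.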

  18. Theoretical properties of the log-concave maximum likelihood estimator of a multidimensional density.

    Authors: Madeleine Cule, Richard Samworth
    Subjects: gr. Statistics
    Abstract

    We present theoretical properties of the log-concave maximum likelihood
    estimator of a density based on an independent and identically distributed
    sample in $\mathbb{R}^d$. Our study covers both the case where the true
    underlying density is log-concave, and where this model is misspecified. We
    begin by showing that for a sequence of log-concave densities, convergence in
    distribution implies much stronger types of convergence -- in particular, it
    implies convergence in Hellinger distance and even in certain exponentially
    weighted total variation norms.

  19. Bernstein Von Mises Theorem for linear functionals of the density.

    Authors: Vincent Rivoirard, Judith Rousseau
    Subjects: gr. Statistics
    Abstract

    In this paper, we study the asymptotic posterior distribution of linear
    functionals of the density. In particular, we give general conditions to obtain
    a semiparametric version of the Bernstein-Von Mises theorem. We then apply this
    general result to nonparametric priors based on infinite dimensional
    exponential families. As a byproduct, we also derive adaptive nonparametric
    rates of concentration of the posterior distributions under these families of
    priors on the class of Sobolev and Besov spaces.

  20. Rank-based inference for bivariate extreme-value copulas.

    Authors: Christian Genest, Johan Segers
    Subjects: gr. Statistics
    Abstract

    Consider a continuous random pair $(X,Y)$ whose dependence is characterized
    by an extreme-value copula with Pickands dependence function $A$. When the
    marginal distributions of $X$ and $Y$ are known, several consistent estimators
    of $A$ are available. Most of them are variants of the estimators due to
    Pickands [Bull. Inst. Internat. Statist. 49 (1981) 859--878] and
    Cap\'{e}ra\`{a}, Foug\`{e}res and Genest [Biometrika 84 (1997) 567--577]. In
    this paper, rank-based versions of these estimators are proposed for the more
    common case where the margins of $X$ and $Y$ are unknown.
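
    One way to read the rank-based construction is to replace the unknown
    margins by rescaled ranks and plug them into the Pickands estimator,
    $1/\hat{A}(t) = n^{-1}\sum_i \min\{S_i/(1-t), T_i/t\}$ with $S_i=-\log U_i$
    and $T_i=-\log V_i$. The sketch below does this for $0<t<1$. Rescaling the
    ranks by $n+1$, the absence of ties, and the omission of any endpoint
    correction are assumptions of this sketch; the function names are
    hypothetical.

```python
import math
from typing import List, Sequence


def pseudo_observations(values: Sequence[float]) -> List[float]:
    """Ranks divided by n + 1, assuming there are no ties."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0] * n
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return [r / (n + 1) for r in ranks]


def pickands_rank_estimate(x: Sequence[float], y: Sequence[float], t: float) -> float:
    """Rank-based Pickands-type estimate of the dependence function A at t."""
    if not 0.0 < t < 1.0:
        raise ValueError("t must lie strictly between 0 and 1")
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("x and y must be non-empty and of equal length")
    u = pseudo_observations(x)
    v = pseudo_observations(y)
    total = 0.0
    for ui, vi in zip(u, v):
        s = -math.log(ui)
        w = -math.log(vi)
        total += min(s / (1.0 - t), w / t)
    return len(x) / total
```

    For perfectly dependent data (for example `y` equal to `x`) the estimate at
    `t = 0.5` should be close to 1/2, the value of $A(1/2)$ under complete
    dependence.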

  21. High-dimensional analysis of semidefinite relaxations for sparse principal components.

    Authors: Arash A. Amini, Martin J. Wainwright
    Subjects: gr. Statistics
    Abstract

    Principal component analysis (PCA) is a classical method for dimensionality
    reduction based on extracting the dominant eigenvectors of the sample
    covariance matrix. However, PCA is well known to behave poorly in the ``large
    $p$, small $n$'' setting, in which the problem dimension $p$ is comparable to
    or larger than the sample size $n$. This paper studies PCA in this
    high-dimensional regime, but under the additional assumption that the maximal
    eigenvector is sparse, say, with at most $k$ nonzero components.

  24. Asymptotic expansion of the minimum covariance determinant estimators.

    Authors: E.A. Cator, H.P. Lopuhaä
    Subjects: gr. Statistics
    Abstract

    In arXiv:0907.0079 by Cator and Lopuhaä, an asymptotic expansion for the MCD
    estimators is established in a very general framework. This expansion requires
    the existence and non-singularity of the derivative in a first-order Taylor
    expansion. In this paper, we prove the existence of this derivative for
    multivariate distributions that have a density and provide an explicit
    expression. Moreover, under suitable symmetry conditions on the density, we
    show that this derivative is non-singular.

  26. Multivariate Archimedean copulas, $d$-monotone functions and $\ell_1$-norm symmetric distributions.

    Authors: Alexander J. McNeil, Johanna Nešlehová
    Subjects: gr. Statistics
    Abstract

    It is shown that a necessary and sufficient condition for an Archimedean
    copula generator to generate a $d$-dimensional copula is that the generator is
    a $d$-monotone function. The class of $d$-dimensional Archimedean copulas is
    shown to coincide with the class of survival copulas of $d$-dimensional
    $\ell_1$-norm symmetric distributions that place no point mass at the origin.
    The $d$-monotone Archimedean copula generators may be characterized using a
    little-known integral transform of Williamson [Duke Math. J.

  27. On Optimality of the Shiryaev-Roberts Procedure for Detecting Changes in Distributions.

    Authors: Aleksey S. Polunchenko, Alexander G. Tartakovsky
    Subjects: gr. Statistics
    Abstract

    In 1985, for detecting changes in distributions Pollak introduced a specific
    minimax performance metric and a randomized version of the Shiryaev-Roberts
    procedure where the zero initial condition is replaced by a random variable
    sampled from the quasi-stationary distribution. Pollak proved that this
    procedure is third-order asymptotically optimal as the mean time to false alarm
    becomes large. The question whether Pollak's procedure is strictly minimax for
    any false alarm rate has been open for more than two decades, and there were
    several attempts to prove this strict optimality.
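
    The Shiryaev-Roberts statistic follows the recursion $R_n =
    (1+R_{n-1})\Lambda_n$, where $\Lambda_n$ is the likelihood ratio of the
    $n$-th observation, and an alarm is raised once $R_n$ reaches a threshold.
    The sketch below assumes, purely for illustration, a shift from $N(0,1)$ to
    $N(\theta,1)$, so that $\Lambda_n = \exp(\theta x_n - \theta^2/2)$. The
    function name and the parameter `r0` are hypothetical. Pollak's randomized
    version would draw `r0` from the quasi-stationary distribution, which is
    not computed here.

```python
import math
from typing import Iterable, Optional


def shiryaev_roberts_alarm(xs: Iterable[float], theta: float,
                           threshold: float, r0: float = 0.0) -> Optional[int]:
    """Return the first (1-based) index n with R_n >= threshold, or None if none."""
    r = r0
    for n, x in enumerate(xs, start=1):
        # Likelihood ratio of N(theta, 1) against N(0, 1) at observation x.
        lr = math.exp(theta * x - 0.5 * theta * theta)
        r = (1.0 + r) * lr
        if r >= threshold:
            return n
    return None
```

    With `r0 = 0.0` this is the classical procedure with zero initial
    condition; passing a different `r0` only changes the starting point of the
    recursion.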

  28. Markov equivalence for ancestral graphs.

    Authors: R. Ayesha Ali, Thomas S. Richardson, Peter Spirtes
    Subjects: gr. Statistics
    Abstract

    Ancestral graphs can encode conditional independence relations that arise in
    directed acyclic graph (DAG) models with latent and selection variables.
    However, for any ancestral graph, there may be several other graphs to which it
    is Markov equivalent. We state and prove conditions under which two maximal
    ancestral graphs are Markov equivalent to each other, thereby extending
    analogous results for DAGs given by other authors. These conditions lead to an
    algorithm for determining Markov equivalence that runs in time that is
    polynomial in the number of vertices in the graph.

  29. Conditional predictive inference post model selection.

    Authors: Hannes Leeb
    Subjects: gr. Statistics
    Abstract

    We give a finite-sample analysis of predictive inference procedures after
    model selection in regression with random design. The analysis is focused on a
    statistically challenging scenario where the number of potentially important
    explanatory variables can be infinite, where no regularity conditions are
    imposed on unknown parameters, where the number of explanatory variables in a
    "good" model can be of the same order as sample size and where the number of
    candidate models can be of larger order than sample size.

  30. Statistical topology via Morse theory, persistence and nonparametric estimation.

    Authors: Peter Bubenik, Gunnar Carlsson, Peter T. Kim, Zhiming Luo
    Subjects: gr. Statistics
    Abstract

    In this paper we examine the use of topological methods for multivariate
    statistics. Using persistent homology from computational algebraic topology, a
    random sample is used to construct estimators of persistent homology. This
    estimation procedure can then be evaluated using the bottleneck distance
    between the estimated persistent homology and the true persistent homology. The
    connection to statistics comes from the fact that when viewed as a
    nonparametric regression problem, the bottleneck distance is bounded by the
    sup-norm loss.

  31. Improving SAMC using smoothing methods: Theory and applications to Bayesian model selection problems.

    Authors: Faming Liang
    Subjects: gr. Statistics
    Abstract

    Stochastic approximation Monte Carlo (SAMC) has recently been proposed by
    Liang, Liu and Carroll [J. Amer. Statist. Assoc. 102 (2007) 305--320] as a
    general simulation and optimization algorithm. In this paper, we propose to
    improve its convergence using smoothing methods and discuss the application of
    the new algorithm to Bayesian model selection problems. The new algorithm is
    tested through a change-point identification example. The numerical results
    indicate that the new algorithm can outperform SAMC and reversible jump MCMC
    significantly for the model selection problems.

  32. Adaptive Bayesian estimation using a Gaussian random field with inverse Gamma bandwidth.

    Authors: A. W. van der Vaart, J. H. van Zanten
    Subjects: gr. Statistics
    Abstract

    We consider nonparametric Bayesian inference using a rescaled
    smooth Gaussian field as a prior for a multidimensional function. The rescaling
    is achieved using a Gamma variable and the procedure can be viewed as choosing
    an inverse Gamma bandwidth. The procedure is studied from a frequentist
    perspective in three statistical settings involving replicated observations
    (density estimation, regression and classification).

  33. Adaptive Hausdorff estimation of density level sets.

    Authors: Aarti Singh, Clayton Scott, Robert Nowak
    Subjects: gr. Statistics
    Abstract

    Consider the problem of estimating the $\gamma$-level set
    $G^*_{\gamma}=\{x:f(x)\geq\gamma\}$ of an unknown $d$-dimensional density
    function $f$ based on $n$ independent observations $X_1,...,X_n$ from the
    density. This problem has been addressed under global error criteria related to
    the symmetric set difference. However, in certain applications a spatially
    uniform mode of convergence is desirable to ensure that the estimated set is
    close to the target set everywhere. The Hausdorff error criterion provides this
    degree of uniformity and, hence, is more appropriate in such situations.
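
    To make the Hausdorff criterion concrete, the sketch below computes the
    Hausdorff distance between two finite point sets, which stand in for the
    estimated and target level sets. Using finite point clouds and Euclidean
    distance is an assumption of this illustration, and the function names are
    hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, ...]


def directed_hausdorff(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Largest distance from a point of a to its nearest point in b."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Symmetric Hausdorff distance between two non-empty finite point sets."""
    if not a or not b:
        raise ValueError("both point sets must be non-empty")
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))
```

    A single stray point far from the other set is enough to make this distance
    large, whereas it would contribute little to a criterion based on the
    measure of the symmetric set difference. That is the uniformity the
    abstract refers to.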

  34. Parameter tuning in pointwise adaptation using a propagation approach.

    Authors: Vladimir Spokoiny, Céline Vial
    Subjects: gr. Statistics
    Abstract

    This paper discusses the problem of adaptive estimation of a univariate
    object like the value of a regression function at a given point or a linear
    functional in a linear inverse problem. We consider an adaptive procedure
    originating from Lepski [Theory Probab. Appl. 35 (1990) 454--466] that selects
    in a data-driven way one estimate out of a given class of estimates ordered by
    their variability. A serious problem with using this and similar procedures is
    the choice of some tuning parameters like thresholds.

  35. Local linear quantile estimation for nonstationary time series.

    Authors: Zhou Zhou, Wei Biao Wu
    Subjects: gr. Statistics
    Abstract

    We consider estimation of quantile curves for a general class of
    nonstationary processes. Consistency and central limit results are obtained for
    local linear quantile estimates under a mild short-range dependence condition.
    Our results are applied to environmental data sets. In particular, our results
    can be used to address the problem of whether climate variability has changed,
    an important problem raised by IPCC (Intergovernmental Panel on Climate Change)
    in 2001.

  36. Hierarchical models in statistical inverse problems and the Mumford--Shah functional.

    Authors: Tapio Helin, Matti Lassas
    Subjects: gr. Statistics
    Abstract

    Bayesian methods for linear inverse problems are studied using
    hierarchical Gaussian models. The problems are considered with different
    discretizations, and we analyze the phenomena which appear when the
    discretization becomes finer. A hierarchical solution method for signal
    restoration problems is introduced and studied with arbitrarily fine
    discretization. We show that the maximum a posteriori estimate converges to a
    minimizer of the Mumford--Shah functional. A new result regarding the existence
    of a minimizer of the Mumford--Shah functional is proved.

  37. Bayesian frequentist hybrid inference.

    Authors: Ao Yuan
    Subjects: gr. Statistics
    Abstract

    Bayesian and frequentist methods differ in many aspects, but share some basic
    optimality properties. In practice, there are situations in which one of the
    methods is preferred according to some criteria. We consider the case of inference
    about a set of multiple parameters, which can be divided into two disjoint
    subsets. On one set, a frequentist method may be favored and on the other, the
    Bayesian.

  38. Adaptive estimation in circular functional linear models.

    Authors: Jan Johannes, Fabienne Comte
    Subjects: gr. Statistics
    Abstract

    We consider the problem of estimating the slope parameter in circular
    functional linear regression, where scalar responses $Y_1,\dots,Y_n$ are modeled as
    depending on 1-periodic, second-order stationary random functions $X_1,\dots,X_n$.
    We consider an orthogonal series estimator of the slope function, by replacing
    the first $m$ theoretical coefficients of its development in the trigonometric
    basis by adequate estimators.

  39. Asymptotic equivalence of empirical likelihood and Bayesian MAP.

    Authors: Marian Grendár, George Judge
    Subjects: gr. Statistics
    Abstract

    In this paper we are interested in empirical likelihood (EL) as a method of
    estimation, and we address the following two problems: (1) selecting among
    various empirical discrepancies in an EL framework and (2) demonstrating that
    EL has a well-defined probabilistic interpretation that would justify its use
    in a Bayesian context. Using the large deviations approach, a Bayesian law of
    large numbers is developed that implies that EL and the Bayesian maximum a
    posteriori probability (MAP) estimators are consistent under misspecification
    and that EL can be viewed as an asymptotic form of MAP.

  40. A semiparametric model for cluster data.

    Authors: Wenyang Zhang, Jianqing Fan, Yan Sun
    Subjects: gr. Statistics
    Abstract

    In the analysis of cluster data, the regression coefficients are frequently
    assumed to be the same across all clusters. This hampers the ability to study
    the varying impacts of factors on each cluster. In this paper, a semiparametric
    model is introduced to account for varying impacts of factors over clusters by
    using cluster-level covariates. It achieves parsimony of parametrization
    and allows the exploration of nonlinear interactions. The random effect in the
    semiparametric model also accounts for within-cluster correlation.

  41. On asymptotically optimal tests under loss of identifiability in semiparametric models.

    Authors: Rui Song, Michael R. Kosorok, Jason P. Fine
    Subjects: gr. Statistics
    Abstract

    We consider tests of hypotheses when the parameters are not identifiable
    under the null in semiparametric models, where regularity conditions for
    profile likelihood theory fail. Exponential average tests based on integrated
    profile likelihood are constructed and shown to be asymptotically optimal under
    a weighted average power criterion with respect to a prior on the
    nonidentifiable aspect of the model. These results extend existing results for
    parametric models, which involve more restrictive assumptions on the form of
    the alternative than do our results.

  42. Asymptotic normality of a nonparametric estimator of sample coverage.

    Authors: Cun-Hui Zhang, Zhiyi Zhang
    Subjects: gr. Statistics
    Abstract

    This paper establishes a necessary and sufficient condition for the
    asymptotic normality of the nonparametric estimator of sample coverage proposed
    by Good [Biometrika 40 (1953) 237--264]. This new necessary and sufficient
    condition extends the validity of the asymptotic normality beyond the
    previously proven cases.
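
    For context, sample coverage is the total probability of the categories
    that appear in the sample, and Good's estimator is one minus the proportion
    of observations whose category occurs exactly once. A minimal sketch, with
    a hypothetical function name:

```python
from collections import Counter
from typing import Hashable, Sequence


def good_coverage_estimate(sample: Sequence[Hashable]) -> float:
    """Good's estimator of sample coverage: 1 - (number of singletons) / n."""
    n = len(sample)
    if n == 0:
        raise ValueError("sample must be non-empty")
    counts = Counter(sample)
    singletons = sum(1 for c in counts.values() if c == 1)
    return 1.0 - singletons / n
```

    For the sample `["a", "a", "b", "c", "c", "c", "d"]` there are two
    singletons among seven observations, so the estimate is 1 - 2/7.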

  43. Quarter-fraction factorial designs constructed via quaternary codes.

    Authors: Frederick K. H. Phoa, Hongquan Xu
    Subjects: gr. Statistics
    Abstract

    Research on developing a general methodology for the construction of good
    nonregular designs has been very active in the last decade. Recent research by
    Xu and Wong [Statist. Sinica 17 (2007) 1191--1213] suggested a new class of
    nonregular designs constructed from quaternary codes. This paper explores the
    properties and uses of quaternary codes toward the construction of
    quarter-fraction nonregular designs. Some theoretical results are obtained
    regarding the aliasing structure of such designs.

  44. Consistency of a recursive estimate of mixing distributions.

    Authors: Surya T. Tokdar, Ryan Martin, Jayanta K. Ghosh
    Subjects: gr. Statistics
    Abstract

    Mixture models have received considerable attention recently and Newton
    [Sankhy\={a} Ser. A 64 (2002) 306--322] proposed a fast recursive algorithm for
    estimating a mixing distribution. We prove almost sure consistency of this
    recursive estimate in the weak topology under mild conditions on the family of
    densities being mixed. This recursive estimate depends on the data ordering and
    a permutation-invariant modification is proposed, which is an average of the
    original over permutations of the data sequence.
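
    The recursion and its permutation-averaged variant can be sketched on a
    finite grid of parameter values. Each observation updates the current
    estimate by mixing it with the posterior it induces, $f_i = (1-w_i)f_{i-1}
    + w_i\, p(x_i\mid\theta) f_{i-1}(\theta) / \int p(x_i\mid\theta')
    f_{i-1}(\theta')\,d\theta'$. The grid discretization, the uniform starting
    guess, the weights $w_i = 1/(i+1)$, the unit-variance normal kernel, and
    averaging over a random subset of permutations rather than all of them are
    assumptions of this sketch; all function names are hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence


def normal_kernel(x: float, theta: float) -> float:
    """Density of N(theta, 1) at x (illustrative choice of mixture component)."""
    return math.exp(-0.5 * (x - theta) ** 2) / math.sqrt(2.0 * math.pi)


def newton_recursive(data: Sequence[float], grid: Sequence[float],
                     kernel: Callable[[float, float], float]) -> List[float]:
    """Recursive estimate of mixing probabilities on a fixed grid."""
    m = len(grid)
    f = [1.0 / m] * m  # uniform initial guess
    for i, x in enumerate(data, start=1):
        w = 1.0 / (i + 1)  # assumed weight sequence
        like = [kernel(x, th) * fj for th, fj in zip(grid, f)]
        norm = sum(like)
        f = [(1.0 - w) * fj + w * lj / norm for fj, lj in zip(f, like)]
    return f


def permutation_averaged(data: Sequence[float], grid: Sequence[float],
                         kernel: Callable[[float, float], float],
                         n_perm: int = 20, seed: int = 0) -> List[float]:
    """Average the recursive estimate over n_perm random orderings of the data."""
    rng = random.Random(seed)
    items = list(data)
    total = [0.0] * len(grid)
    for _ in range(n_perm):
        rng.shuffle(items)
        est = newton_recursive(items, grid, kernel)
        total = [t + e for t, e in zip(total, est)]
    return [t / n_perm for t in total]
```

    Because each update is a convex combination of two probability vectors,
    both functions return masses that sum to one. The averaged version reduces
    the dependence on the order in which the data arrive.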

  45. Hypothesis test for normal mixture models: The EM approach.

    Authors: Jiahua Chen, Pengfei Li
    Subjects: gr. Statistics
    Abstract

    Normal mixture distributions are arguably the most important mixture models,
    and also the most technically challenging. The likelihood function of the
    normal mixture model is unbounded based on a set of random samples, unless an
    artificial bound is placed on its component variance parameter. Moreover, the
    model is not strongly identifiable so it is hard to differentiate between
    overdispersion caused by the presence of a mixture and that caused by a large
    variance, and it has infinite Fisher information with respect to mixing
    proportions.

  46. Efficient randomized-adaptive designs.

    Authors: Feifang Hu, Li-Xin Zhang, Xuming He
    Subjects: gr. Statistics
    Abstract

    Response-adaptive randomization has recently attracted a lot of attention in
    the literature. In this paper, we propose a new and simple family of
    response-adaptive randomization procedures that attain the Cramér--Rao lower
    bounds on the allocation variances for any allocation proportions, including
    optimal allocation proportions. The allocation probability functions of
    proposed procedures are discontinuous. The existing large sample theory for
    adaptive designs relies on Taylor expansions of the allocation probability
    functions, which do not apply to nondifferentiable cases.

  47. On combinatorial testing problems.

    Authors: Louigi Addario-Berry, Nicolas Broutin, Luc Devroye, Gabor Lugosi
    Subjects: gr. Statistics
    Abstract

    We study a class of hypothesis testing problems in which, upon observing the
    realization of an n-dimensional Gaussian vector, one has to decide whether the
    vector was drawn from a standard normal distribution or, alternatively, whether
    there is a subset of the components belonging to a certain given class of sets
    whose elements have been "contaminated," that is, have a mean different from
    zero. We establish some general conditions under which testing is possible and
    others under which testing is hopeless with a small risk.

  48. Near-ideal model selection by $\ell_1$ minimization.

    Authors: Emmanuel J. Candès, Yaniv Plan
    Subjects: gr. Statistics
    Abstract

    We consider the fundamental problem of estimating the mean of a vector
    $y=X\beta+z$, where $X$ is an $n\times p$ design matrix in which one can have
    far more variables than observations, and $z$ is a stochastic error term--the
    so-called "$p>n$" setup. When $\beta$ is sparse, or, more generally, when there
    is a sparse subset of covariates providing a close approximation to the unknown
    mean vector, we ask whether or not it is possible to accurately estimate
    $X\beta$ using a computationally tractable algorithm.

  49. Consistent estimates of deformed isotropic Gaussian random fields on the plane.

    Authors: Sourav Chatterjee, Ethan Anderes
    Subjects: gr. Statistics
    Abstract

    This paper proves fixed domain asymptotic results for estimating a smooth
    invertible transformation $f:\Bbb{R}^2\to\Bbb{R}^2$ when observing the deformed
    random field $Z\circ f$ on a dense grid in a bounded, simply connected domain
    $\Omega$, where $Z$ is assumed to be an isotropic Gaussian random field on
    $\Bbb{R}^2$. The estimate $\hat{f}$ is constructed on a simply connected domain
    $U$, such that $\overline{U}\subset\Omega$ and is defined using kernel smoothed
    quadratic variations, Bergman projections and results from quasiconformal
    theory.

  50. Deconvolution with unknown error distribution.

    Authors: Jan Johannes
    Subjects: gr. Statistics
    Abstract

    We consider the problem of estimating a density $f_X$ using a sample
    $Y_1,...,Y_n$ from $f_Y=f_X\star f_{\epsilon}$, where $f_{\epsilon}$ is an
    unknown density. We assume that an additional sample
    $\epsilon_1,...,\epsilon_m$ from $f_{\epsilon}$ is observed. Estimators of
    $f_X$ and its derivatives are constructed by using nonparametric estimators of
    $f_Y$ and $f_{\epsilon}$ and by applying a spectral cut-off in the Fourier
    domain.
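
    To make the idea of a spectral cut-off concrete, the sketch below divides
    the empirical characteristic function of the $Y$ sample by that of the
    $\epsilon$ sample and inverts the ratio over a bounded frequency band. The
    cut-off value, the frequency grid, the guard threshold on the estimated
    noise characteristic function and the function names are illustrative
    assumptions; the estimator in the paper may differ in these details.

```python
import cmath
import math
import random


def ecf(sample, t):
    """Empirical characteristic function of `sample` at frequency t."""
    return sum(cmath.exp(1j * t * s) for s in sample) / len(sample)


def deconv_density(x, y_sample, eps_sample, cutoff=2.0, n_freq=81):
    """Estimate f_X(x) by inverting ecf_Y / ecf_eps over |t| <= cutoff."""
    # Assumed guard: ignore frequencies where the noise estimate is near zero.
    guard = 1.0 / math.sqrt(len(eps_sample))
    step = 2.0 * cutoff / (n_freq - 1)
    total = 0.0
    for k in range(n_freq):
        t = -cutoff + k * step
        phi_eps = ecf(eps_sample, t)
        if abs(phi_eps) < guard:
            continue
        ratio = ecf(y_sample, t) / phi_eps
        weight = 0.5 if k in (0, n_freq - 1) else 1.0  # trapezoidal rule
        total += weight * (cmath.exp(-1j * t * x) * ratio).real * step
    return total / (2.0 * math.pi)
```

    A larger cut-off reduces the bias from discarding high frequencies but
    amplifies the noise in the ratio, so in practice it has to be chosen with
    the sample sizes $n$ and $m$ in mind.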

  51. Nonparametric inference for discretely sampled Lévy processes.

    Authors: Shota Gugushvili
    Subjects: gr. Statistics
    Abstract

    Given a sample from a discretely observed Lévy process $X=(X_t)_{t\geq 0}$
    of finite jump activity, we study the problem of nonparametric estimation
    of the Lévy density $\rho$ corresponding to the process $X$. Our estimator of
    $\rho$ is based on a suitable inversion of the Lévy-Khintchine formula and a
    plug-in device. The main result of the paper deals with an upper bound on the
    mean square error of the estimator of $\rho$ at a fixed point $x$. We also show
    that the estimator attains the minimax convergence rate over a suitable class
    of Lévy densities.

  52. Asymptotic theory for the semiparametric accelerated failure time model with missing data.

    Authors: Bin Nan, John D. Kalbfleisch, Menggang Yu
    Subjects: gr. Statistics
    Abstract

    We consider a class of doubly weighted rank-based estimating methods for the
    transformation (or accelerated failure time) model with missing data as arise,
    for example, in case-cohort studies. The weights considered may not be
    predictable as required in a martingale stochastic process formulation. We
    treat the general problem as a semiparametric estimating equation problem and
    provide proofs of asymptotic properties for the weighted estimators, with
    either true weights or estimated weights, by using empirical process theory
    where martingale theory may fail.

  53. Estimating the degree of activity of jumps in high frequency data.

    Authors: Jean Jacod, Yacine Aït-Sahalia
    Subjects: gr. Statistics
    Abstract

    We define a generalized index of jump activity, propose estimators of that
    index for a discretely sampled process and derive the estimators' properties.
    These estimators are applicable despite the presence of Brownian volatility in
    the process, which makes it more challenging to infer the characteristics of
    the small, infinite activity jumps. When the method is applied to high
    frequency stock returns, we find evidence of infinitely active jumps in the
    data and estimate their index of activity.

  54. Estimating linear functionals in nonlinear regression with responses missing at random.

    Authors: Ursula U. Müller
    Subjects: gr. Statistics
    Abstract

    We consider regression models with parametric (linear or nonlinear)
    regression function and allow responses to be "missing at random." We assume
    that the errors have mean zero and are independent of the covariates. In order
    to estimate expectations of functions of covariate and response we use a fully
    imputed estimator, namely an empirical estimator based on estimators of
    conditional expectations given the covariate.

  55. Nonparametric estimation by convex programming.

    Authors: Anatoli B. Juditsky, Arkadi S. Nemirovski
    Subjects: gr. Statistics
    Abstract

    The problem we concentrate on is as follows: given (1) a convex compact set
    $X$ in ${\mathbb{R}}^n$, an affine mapping $x\mapsto A(x)$, a parametric family
    $\{p_{\mu}(\cdot)\}$ of probability densities and (2) $N$ i.i.d. observations
    of the random variable $\omega$, distributed with the density $p_{A(x)}(\cdot)$
    for some (unknown) $x\in X$, estimate the value $g^Tx$ of a given linear form
    at $x$.

  56. High-dimensional variable selection.

    Authors: Larry Wasserman, Kathryn Roeder
    Subjects: gr. Statistics
    Abstract

    This paper explores the following question: what kind of statistical
    guarantees can be given when doing variable selection in high-dimensional
    models? In particular, we look at the error rates and power of some multi-stage
    regression methods. In the first stage we fit a set of candidate models. In the
    second stage we select one model by cross-validation. In the third stage we use
    hypothesis testing to eliminate some variables.

  57. Functional linear regression that's interpretable.

    Authors: Gareth M. James, Jing Wang, Ji Zhu
    Subjects: gr. Statistics
    Abstract

    Regression models to relate a scalar $Y$ to a functional predictor $X(t)$ are
    becoming increasingly common. Work in this area has concentrated on estimating
    a coefficient function, $\beta(t)$, with $Y$ related to $X(t)$ through
    $\int\beta(t)X(t) dt$. Regions where $\beta(t)\ne0$ correspond to places where
    there is a relationship between $X(t)$ and $Y$. Alternatively, points where
    $\beta(t)=0$ indicate no relationship.

  58. Regression in random design and Bayesian warped wavelets estimators.

    Authors: Thanh Mai Pham Ngoc
    Subjects: gr. Statistics
    Abstract

    In this paper we deal with the regression problem in a random design setting.
    We investigate the asymptotic optimality, from a minimax point of view, of
    various Bayesian rules based on warped wavelets and show that they nearly
    attain optimal minimax rates of convergence over the Besov smoothness class
    considered. Warped wavelets have been introduced recently; they are easy to
    compute and implement while being well adapted to the statistical problem at
    hand.

  59. Some sharp performance bounds for least squares regression with $L_1$ regularization.

    Authors: Tong Zhang
    Subjects: gr. Statistics
    Abstract

    We derive sharp performance bounds for least squares regression with $L_1$
    regularization from parameter estimation accuracy and feature selection quality
    perspectives. The main result proved for $L_1$ regularization extends a similar
    result in [Ann. Statist. 35 (2007) 2313--2351] for the Dantzig selector. It
    gives an affirmative answer to an open question in [Ann. Statist. 35 (2007)
    2358--2364]. Moreover, the result leads to an extended view of feature
    selection that allows less restrictive conditions than some recent work.

  60. A Backward Particle Interpretation of Feynman-Kac Formulae.

    Authors: Pierre Del Moral, Arnaud Doucet, Sumeetpal S. Singh
    Subjects: gr. Statistics
    Abstract

    We design a particle interpretation of Feynman-Kac measures on path spaces
    based on a backward Markovian representation combined with a traditional mean
    field particle interpretation of the flow of their final time marginals. In
    contrast to traditional genealogical tree based models, these new particle
    algorithms can be used to compute normalized additive functionals "on-the-fly"
    as well as their limiting occupation measures with a given precision degree
    that does not depend on the final time horizon.

  61. A Bayesian approach to the estimation of maps between Riemannian manifolds, II: examples.

    Authors: Leo T. Butler, Boris Levit
    Subjects: gr. Statistics
    Abstract

    Let M be a smooth compact oriented manifold without boundary, imbedded in a
    Euclidean space E and let f be a smooth map of M into a Riemannian manifold N.
    An unknown state x in M is observed via X=x+su where s>0 is a small parameter
    and u is a white Gaussian noise. For a given smooth prior on M and smooth
    estimators g of the map f we have derived a second-order asymptotic expansion
    for the related Bayesian risk (see arXiv:0705.2540). In this paper, we apply
    this technique to a variety of examples.

  62. Radon needlet thresholding.

    Authors: Gerard Kerkyacharian, Erwan Le Pennec, Dominique Picard
    Subjects: gr. Statistics
    Abstract

    We provide a new algorithm for the treatment of the noisy inversion of the
    Radon transform using an appropriate thresholding technique adapted to a
    well-chosen new localized basis. We establish minimax results and prove their
    optimality. In particular we prove that the procedures provided here are able
    to attain minimax bounds for any $\mathbb{L}_p$ loss. To our knowledge, most
    of the minimax bounds obtained here are new.
