Methodology

  1. Inference on Counterfactual Distributions.

    Authors: Victor Chernozhukov, Ivan Fernandez-Val, Blaise Melly
    Subjects: Methodology
    Abstract

    We develop inference procedures for policy analysis based on regression
    methods. We consider policy interventions that correspond to either changes in
    the distribution of covariates, or changes in the conditional distribution of
    the outcome given covariates, or both. Under either of these policy scenarios,
    we derive functional central limit theorems for regression-based estimators of
    the status quo and counterfactual marginal distributions.

  2. Exact Multivariate Tests - A New Effective Principle of Controlled Model Choice.

    Authors: Juergen Laeuter, Maciej Rosolowski, Ekkehard Glimm
    Subjects: Methodology
    Abstract

    High-dimensional tests are applied to find relevant sets of variables and
    relevant models. If variables are selected by analyzing the sums of products
    matrices and a corresponding mean-value test is performed, there is the danger
    that the nominal error of first kind is exceeded. In the paper, well-known
    multivariate tests receive a new mathematical interpretation such that the
    error of first kind of the combined testing and selecting procedure can more
    easily be kept.

  3. Detecting regime switches in the dependence structure of high dimensional financial data.

    Authors: Claudia Czado, Jakob Stoeber
    Subjects: Methodology
    Abstract

    Misperceptions about extreme dependencies between different financial assets
    have been an important element of the recent financial crisis. This paper
    studies inhomogeneity in dependence structures using Markov switching regular
    vine copulas. These account for asymmetric dependencies and tail dependencies
    in high dimensional data. We develop methods for fast maximum likelihood as
    well as Bayesian inference. Our algorithms are validated in simulations and
    applied to financial data.

  4. Modeling high dimensional time-varying dependence using D-vine SCAR models.

    Authors: Hans Manner, Carlos Almeida, Claudia Czado
    Subjects: Methodology
    Abstract

    We consider the problem of modeling the dependence among many time series. We
    build high dimensional time-varying copula models by combining pair-copula
    constructions (PCC) with stochastic autoregressive copula (SCAR) models to
    capture dependence that changes over time. We show how the estimation of this
    highly complex model can be broken down into the estimation of a sequence of
    bivariate SCAR models, which can be achieved by using the method of simulated
    maximum likelihood.

  5. Selecting and estimating regular vine copulae and application to financial returns.

    Authors: J. Dissmann, E. C. Brechmann, C. Czado, D. Kurowicka
    Subjects: Methodology
    Abstract

    Since Aas et al. (2009) introduced inference of multivariate copulae
    constructed through pair-copula decompositions to the statistical community,
    interest in these models has been growing steadily and they are finding
    successful applications in various fields. Research so far has however been
    concentrating on so-called canonical and D-vine copulae. In this article, we
    discuss the more general class of regular vines.

  6. Hierarchical Kendall copulas: Properties and inference.

    Authors: Eike Christian Brechmann
    Subjects: Methodology
    Abstract

    While there is substantial need for dependence models in high dimensions,
    most existing models strongly suffer from the curse of dimensionality and
    barely balance parsimony and flexibility. In this paper, the new class of
    hierarchical Kendall copulas is proposed which tackles these problems.
    Constructed with flexible copulas specified for groups of variables in
    different hierarchical levels, hierarchical Kendall copulas are able to model
    complex dependence patterns without severe restrictions.

  7. Covariance Estimation: The GLM and Regularization Perspectives.

    Authors: Mohsen Pourahmadi
    Subjects: Methodology
    Abstract

    Finding an unconstrained and statistically interpretable reparameterization
    of a covariance matrix is still an open problem in statistics. Its solution is
    of central importance in covariance estimation, particularly in the recent
    high-dimensional data environment where enforcing the positive-definiteness
    constraint could be computationally expensive.

  8. Tree Models for Difference and Change Detection in a Complex Environment.

    Authors: Mark Holmes, Yong Wang, Ilze Ziedins, Neal Challands
    Subjects: Methodology
    Abstract

    A new family of tree models is proposed, which we call "differential trees."
    A differential tree model is constructed from multiple data sets and aims to
    detect distributional differences between them. The new methodology differs
    from the existing difference and change detection techniques in its
    nonparametric nature, model construction from multiple data sets, and
    applicability to high-dimensional data.

  9. Posterior Consistency via Precision Operators for Bayesian Nonparametric Drift Estimation in SDEs.

    Authors: J. H. van Zanten, Y. Pokern, A. M. Stuart
    Subjects: Methodology
    Abstract

    We study a Bayesian approach to nonparametric estimation of the periodic
    drift function of a one-dimensional diffusion from continuous-time data. We
    rewrite the likelihood in terms of Riemann integrals, by introducing the local
    time of the process, and specify a centered Gaussian prior on the drift with a
    precision operator that is of differential form. It is proved that this is a
    conjugate prior for the likelihood and hence that the posterior is also
    Gaussian.

  10. Bayesian filtering for multi-object systems with independently generated observations.

    Authors: Daniel Edward Clark
    Subjects: Methodology
    Abstract

    A general approach for Bayesian filtering of multi-object systems is studied,
    with particular emphasis on the model where each object generates observations
    independently of other objects. The approach is based on variational calculus
    applied to generating functionals, using the general version of Faa di Bruno's
    formula for Gateaux differentials. This result enables us to determine some
    general formulae for the updated generating functional after the application of
    a multi-object analogue of Bayes' rule.

  11. Sign-constrained least squares estimation for high-dimensional regression.

    Authors: Nicolai Meinshausen
    Subjects: Methodology
    Abstract

    Many regularization schemes for high-dimensional regression have been put
    forward. Most require the choice of a tuning parameter, using model selection
    criteria or cross-validation schemes. We show that a simple non-negative or
    sign-constrained least squares is a very simple and effective regularization
    technique for a certain class of high-dimensional regression problems. The sign
    constraint has to be derived via prior knowledge or an initial estimator but no
    further tuning or cross-validation is necessary. The success depends on
    conditions that are easy to check in practice.
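
    The non-negative least squares idea described above can be sketched as
    follows. This is a minimal illustration using cyclic coordinate descent,
    which is a generic solver chosen here for brevity and is not necessarily
    the algorithm used in the paper. The function name and the `sweeps`
    parameter are hypothetical.

```python
def nnls_coordinate_descent(A, b, sweeps=200):
    """Minimise ||A x - b||^2 subject to x >= 0 by cyclic coordinate descent.

    A is a list of rows, b a list of responses. Illustrative sketch only;
    the solver choice is an assumption, not taken from the abstract.
    """
    n, p = len(A), len(A[0])
    x = [0.0] * p
    residual = list(b)  # r = b - A x, and x starts at zero
    col_sq = [sum(A[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column cannot change the fit
            # Unconstrained one-dimensional minimiser, then clamp at zero.
            grad = sum(A[i][j] * residual[i] for i in range(n))
            new_xj = max(0.0, x[j] + grad / col_sq[j])
            delta = new_xj - x[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= A[i][j] * delta
                x[j] = new_xj
    return x
```

    A general sign constraint can be reduced to this case by multiplying each
    column that is required to have a negative coefficient by -1 before the
    fit and flipping the sign of the corresponding estimate afterwards. Note
    that no tuning parameter appears anywhere in the fit.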

  12. The Relationship Among the Standard Deviation of a Dataset and the Standard Deviation and Average Two Part of This Set.

    Authors: Jose Fausto de Morais
    Subjects: Methodology
    Abstract

    Meta-analysis involves combining summary information for related but
    independent studies. It uses different relationships to combine position measures
    as well as dispersion measures. The objective of this study is to discuss a
    relationship among the standard deviation of a data set and the standard
    deviation and mean of two parts of this set. The problem was proposed in a
    systematic review with meta-analysis that combined two studies with missing
    data.
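
    For orientation, the standard textbook identity linking the standard
    deviation of a whole data set to the sizes, means and standard deviations
    of two disjoint parts can be sketched as below. The abstract does not
    state the paper's own formula, so this is an assumption about the kind of
    relationship meant. The function name is hypothetical and sample standard
    deviations (n - 1 denominator) are assumed.

```python
import math
import statistics


def combine_two_parts(n1, mean1, sd1, n2, mean2, sd2):
    """Return (mean, sd) of the union of two disjoint parts of a data set.

    Assumes sd1 and sd2 are sample standard deviations (n - 1 denominator).
    """
    n = n1 + n2
    mean = (n1 * mean1 + n2 * mean2) / n
    # Within-part sums of squares plus the between-part contribution.
    ss = ((n1 - 1) * sd1 ** 2
          + (n2 - 1) * sd2 ** 2
          + n1 * n2 / n * (mean1 - mean2) ** 2)
    return mean, math.sqrt(ss / (n - 1))


# Check against a direct computation on the pooled raw data.
part1, part2 = [1, 2, 3], [4, 5, 6, 7]
mean, sd = combine_two_parts(
    len(part1), statistics.mean(part1), statistics.stdev(part1),
    len(part2), statistics.mean(part2), statistics.stdev(part2),
)
```

    Read in the other direction, the same identity lets a missing standard
    deviation for one part be recovered when the whole-set value and the
    other part's summaries are known.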

  13. Frasian Inference.

    Authors: Larry Wasserman
    Subjects: Methodology
    Abstract

    Don Fraser has given an interesting account of the agreements and
    disagreements between Bayesian posterior probabilities and confidence levels.
    In this comment I discuss some cases where the lack of such agreement is
    extreme. I then discuss a few cases where it is possible to have Bayes
    procedures with frequentist validity. Such frequentist-Bayesian (or
    Frasian) methods deserve more attention [arXiv:1112.5582].

  14. Discussion of "Is Bayes Posterior just Quick and Dirty Confidence?" by D. A. S. Fraser.

    Authors: Kesar Singh, Minge Xie
    Subjects: Methodology
    Abstract

    Discussion of "Is Bayes Posterior just Quick and Dirty Confidence?" by D. A.
    S. Fraser [arXiv:1112.5582].

  15. Discussion of "Is Bayes Posterior just Quick and Dirty Confidence?" by D. A. S. Fraser.

    Authors: Christian P. Robert
    Subjects: Methodology
    Abstract

    Discussion of "Is Bayes Posterior just Quick and Dirty Confidence?" by D. A.
    S. Fraser [arXiv:1112.5582].

  16. On the Asymptotic Distribution of Variance Weighted KS Statistics.

    Authors: Timothy B. Armstrong
    Subjects: Methodology
    Abstract

    This paper derives the asymptotic distribution of variance weighted
    Kolmogorov-Smirnov statistics for conditional moment inequality models for the
    case of a one dimensional covariate. The asymptotic distribution depends on the
    data generating process only through the variance of a single random variable,
    leading to critical values that can be calculated analytically. By arguments in
    Armstrong (2011b), the resulting tests achieve the best minimax rate for local
    alternatives out of available approaches in a broad class of settings.

  17. A Class Coupler for Perfect Sampling from Continuous Distributions With and Without Atoms.

    Authors: Wenjin Mao, Jem Corcoran
    Subjects: Methodology
    Abstract

    We consider the simulation of distributions that are a mixture of discrete
    and continuous components. We extend a Metropolis-Hastings-based perfect
    sampling algorithm of Corcoran and Tweedie to allow for a broader class of
    transition candidate densities. The resulting algorithm, known as a "class
    coupler", is fast to implement and is applicable to purely discrete or purely
    continuous densities as well. Our work is motivated by the study of a composite
    hypothesis test in a Bayesian setting via posterior simulation and we give
    simulation results for some problems in this area.

  18. Conditional Transformation Models.

    Authors: Peter Bühlmann, Thomas Kneib, Torsten Hothorn
    Subjects: Methodology
    Abstract

    The ultimate goal of regression analysis is to obtain information about the
    conditional distribution of a response given a set of explanatory variables.
    This goal is, however, seldom achieved because most established regression
    models only estimate the conditional mean as a function of the explanatory
    variables and assume that higher moments are not affected by the regressors.
    The underlying reason for such a restriction is the assumption of additivity of
    signal and noise. We propose to relax this common assumption in the framework
    of transformation models.

  19. Fast Identification of Biological Pathways Associated with a Quantitative Trait Using Group Lasso with Overlaps.

    Authors: Giovanni Montana, Matt Silver
    Subjects: Methodology
    Abstract

    Where causal SNPs (single nucleotide polymorphisms) tend to accumulate within
    biological pathways, the incorporation of prior pathways information into a
    statistical model is expected to increase the power to detect true associations
    in a genetic association study. Most existing pathways-based methods rely on
    marginal SNP statistics and do not fully exploit the dependence patterns among
    SNPs within pathways.

  20. Robust model-based clustering with gene ranking.

    Authors: Ajay Jasra, Giovanni Montana, Alberto Cozzini
    Subjects: Methodology
    Abstract

    Cluster analysis of biological samples using gene expression measurements is
    a common task which aids the discovery of heterogeneous biological
    sub-populations having distinct mRNA profiles. Several model-based clustering
    algorithms have been proposed in which the distribution of gene expression
    values within each sub-group is assumed to be Gaussian. In the presence of
    noise and extreme observations, a mixture of Gaussian densities may over-fit
    and overestimate the true number of clusters.

  21. Dynamic trees for streaming and massive data contexts.

    Authors: Robert B. Gramacy, Christoforos Anagnostopoulos
    Subjects: Methodology
    Abstract

    Data collection at a massive scale is becoming ubiquitous in a wide variety
    of settings, from vast offline databases to streaming real-time information.
    Learning algorithms deployed in such contexts must rely on single-pass
    inference, where the data history is never revisited. In streaming contexts,
    learning must also be temporally adaptive to remain up-to-date against
    unforeseen changes in the data generating mechanism. Although rapidly growing,
    the online Bayesian inference literature remains challenged by massive data and
    transient, evolving data streams.

  22. The logistic conditionals binary family.

    Authors: Christian Schäfer
    Subjects: Methodology
    Abstract

    We discuss a parametric family of binary distributions for modelling and
    sampling high-dimensional binary data with strong dependencies. We extend the
    linear conditionals family proposed by Qaqish (2003) to a non-linear
    conditionals family which we show to encompass every feasible combination of
    mean vector and correlation matrix. We can both sample from this parametric
    family and evaluate its mass function point-wise which allows for immediate use
    in the context of stochastic optimization, importance sampling or Markov chain
    algorithms.

  23. Bootstrapping data arrays of arbitrary order.

    Authors: Art B. Owen, Dean Eckles
    Subjects: Methodology
    Abstract

    In this paper we study a bootstrap strategy for estimating the variance of a
    mean taken over large multifactor crossed random effects data sets. We apply
    bootstrap reweighting independently to the levels of each factor, giving each
    observation the product of independently sampled factor weights. No exact
    bootstrap exists for this problem (McCullagh, 2000). We show that the proposed
    bootstrap is mildly conservative, meaning biased towards overestimating the
    variance, under sufficient conditions that allow very unbalanced and
    heteroscedastic inputs.
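
    The product-of-weights reweighting described above can be sketched for
    two crossed factors as follows. This is a rough illustration under
    stated assumptions: multinomial resampling counts are used as the factor
    weights and the mean is computed in ratio form, neither of which is
    specified in the abstract. All function names are hypothetical.

```python
import random


def resample_counts(levels, rng):
    """Multinomial bootstrap counts: how often each level is drawn."""
    counts = dict.fromkeys(levels, 0)
    for level in rng.choices(levels, k=len(levels)):
        counts[level] += 1
    return counts


def product_weight_bootstrap_variance(obs, n_boot=2000, seed=0):
    """Bootstrap variance of the mean for crossed two-factor data.

    obs is a list of (row_level, col_level, value) triples. Each replicate
    reweights row and column levels independently and gives every
    observation the product of its two factor weights.
    """
    rng = random.Random(seed)
    rows = sorted({r for r, _, _ in obs})
    cols = sorted({c for _, c, _ in obs})
    means = []
    for _ in range(n_boot):
        row_w = resample_counts(rows, rng)
        col_w = resample_counts(cols, rng)
        num = den = 0.0
        for r, c, y in obs:
            w = row_w[r] * col_w[c]
            num += w * y
            den += w
        if den > 0:  # skip replicates in which every observation has weight 0
            means.append(num / den)
    centre = sum(means) / len(means)
    return sum((m - centre) ** 2 for m in means) / (len(means) - 1)
```

    Because the weights are attached to factor levels rather than to
    individual observations, the same sketch applies unchanged to unbalanced
    data in which many (row, column) cells are empty.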

  24. Nonparametric estimation of pair-copula constructions with the empirical pair-copula.

    Authors: Johan Segers, Ingrid Hobaek Haff
    Subjects: Methodology
    Abstract

    A pair-copula construction is a decomposition of a multivariate copula into a
    structured system, called regular vine, of bivariate copulae or pair-copulae.
    The standard practice is to model these pair-copulae parametrically, which
    comes at the cost of a large model risk, with errors propagating throughout the
    vine structure. The empirical pair-copula proposed in the paper provides a
    nonparametric alternative still achieving the parametric convergence rate.

  25. Locally Adaptive Bayes Nonparametric Regression via Nested Gaussian Processes.

    Authors: Bin Zhu, David B. Dunson
    Subjects: Methodology
    Abstract

    We propose a nested Gaussian process (nGP) as a locally adaptive prior for
    Bayesian nonparametric regression. Specified through a set of stochastic
    differential equations (SDEs), the nGP imposes a Gaussian process prior for the
    function's $m$th-order derivative. The nesting comes in through including a
    local instantaneous mean function, which is drawn from another Gaussian process
    inducing adaptivity to locally-varying smoothness. We discuss the support of
    the nGP prior in terms of the closure of a reproducing kernel Hilbert space,
    and consider theoretical properties of the posterior.

  26. Mixture Likelihood Ratio Scan Statistic for Disease Outbreak Detection.

    Authors: Michael D. Porter, Jarad B. Niemi, Brian J. Reich
    Subjects: Methodology
    Abstract

    Early detection of disease outbreaks is of paramount importance to
    implementing intervention strategies to mitigate the severity and duration of
    the outbreak. We build methodology that utilizes the characteristic profile of
    disease outbreaks to reduce the time to detection and false positive rate. We
    model daily counts through a Poisson distribution with additive background plus
    outbreak components. The outbreak component has a parametric form with unknown
    underlying parameters. A mixture likelihood ratio scan statistic is developed
    to maximize parameters over a window in time.

  27. The coverage probability of confidence intervals in one-way analysis of covariance after two F tests.

    Authors: Paul Kabaila, Waruni Abeysekera, Oguzhan Yilmaz
    Subjects: Methodology
    Abstract

    Consider a one-way analysis of covariance model. Suppose that the parameter
    of interest theta is a specified linear contrast of the expected responses, for
    a given value of the covariate. Also suppose that the inference of interest is
    a 1-alpha confidence interval for theta. The following two-stage procedure has
    been proposed to determine the form of the model. In Stage 1, we carry out an F
    test of the null hypothesis that the slopes are all zero against the
    alternative hypothesis that they are not all zero.

  28. A consistent multivariate test of association based on ranks of distances.

    Authors: Ruth Heller, Yair Heller, Malka Gorfine
    Subjects: Methodology
    Abstract

    We are concerned with the problem of detecting whether an association of any
    kind exists between random vectors of any dimension. Few tests of independence
    exist to date that are consistent against all dependent alternatives. We
    propose a powerful test that is applicable in all dimensions, is robust to
    outliers, and is consistent against all alternatives. The test has a simple
    form and is easy to implement. We demonstrate its good power properties in
    simulations and on an example.

  29. Parameter Estimation using Empirical Likelihood combined with Market Information.

    Authors: Zhiliang Ying, Steven Kou, Tony Sit
    Subjects: Methodology
    Abstract

    During the last decade Levy processes with jumps have received increasing
    popularity for modelling market behaviour for both derivative pricing and risk
    management purposes. Chan et al. (2009) introduced the use of empirical
    likelihood methods to estimate the parameters of various diffusion processes
    via their characteristic functions which are readily available in most cases.
    Return series from the market are used for estimation.

  30. On Instrumental Variables Estimation of Causal Odds Ratios.

    Authors: Stijn Vansteelandt, Jack Bowden, Manoochehr Babanezhad, Els Goetghebeur
    Subjects: Methodology
    Abstract

    Inference for causal effects can benefit from the availability of an
    instrumental variable (IV) which, by definition, is associated with the given
    exposure, but not with the outcome of interest other than through a causal
    exposure effect.

  31. Statistical estimation of gap of decomposability of the general poverty index.

    Authors: Gane Samb Lo, Mohamed Cheikh Haidara
    Subjects: Methodology
    Abstract

    Because the decomposability property is a very practical one in welfare analysis,
    most researchers and users favor decomposable poverty indices such as the
    Foster-Greer-Thorbecke poverty index. This may lead to neglect of the important
    weighted indices, such as the Kakwani and Shorrocks ones, which have other
    interesting properties in welfare analysis.

  32. A Conversation with David R. Brillinger.

    Authors: Victor M. Panaretos
    Subjects: Methodology
    Abstract

    David Ross Brillinger was born on the 27th of October 1937, in Toronto,
    Canada. In 1955, he entered the University of Toronto, graduating with a B.A.
    with Honours in Pure Mathematics in 1959, while also serving as a Lieutenant in
    the Royal Canadian Naval Reserve. He was one of the five winners of the Putnam
    mathematical competition in 1958. He then went on to obtain his M.A. and Ph.D.
    in Mathematics at Princeton University, in 1960 and 1961, the latter under the
    guidance of John W. Tukey.

  33. Misspecifying the Shape of a Random Effects Distribution: Why Getting It Wrong May Not Matter.

    Authors: Charles E. McCulloch, John M. Neuhaus
    Subjects: Methodology
    Abstract

    Statistical models that include random effects are commonly used to analyze
    longitudinal and correlated data, often with strong and parametric assumptions
    about the random effects distribution. There is marked disagreement in the
    literature as to whether such parametric assumptions are important or
    innocuous.

  34. A clustering algorithm by self-updating process.

    Authors: Shang-Ying Shiu, Ting-Li Chen
    Subjects: Methodology
    Abstract

    We propose a simple and intuitive algorithm for clustering analysis. This
    algorithm stands from the viewpoint of elements to be clustered, and simulates
    the process of how they perform self-clustering. At the end of the process,
    elements belonging to the same cluster converge to the same position, which
    represents the cluster's location in a p-dimensional space. The algorithm also
    manages to isolate noise, and is therefore able to produce satisfactory clustering
    results even when the level of noise is high enough to obscure or distort the
    underlying patterns in the data.

  35. Respondent-driven Sampling on Directed Networks.

    Authors: Tom Britton, Xin Lu, Fredrik Liljeros, Jens Malmros
    Subjects: Methodology
    Abstract

    Respondent-driven sampling (RDS) is a commonly used substitute for random
    sampling when studying hidden populations, such as injecting drug users or men
    who have sex with men, for which no sampling frame is known. The method works
    like a snowball sample but can, given that some assumptions are met, generate
    unbiased population estimates. One key assumption, not likely to be met, is
    that the acquaintance network in which the recruitment process takes place is
    undirected, meaning that all recruiters should have the potential to be
    recruited by the person they recruit.

  36. Bayesian hierarchical modeling of simply connected 2D shapes.

    Authors: David B. Dunson, Debdeep Pati, Kelvin Gu
    Subjects: Methodology
    Abstract

    Models for distributions of shapes contained within images can be widely used
    in biomedical applications ranging from tumor tracking for targeted radiation
    therapy to classifying cells in a blood sample. Our focus is on hierarchical
    probability models for the shape and size of simply connected 2D closed curves,
    avoiding the need to specify landmarks through modeling the entire curve while
    borrowing information across curves for related objects.

  37. A Bias-reduced Estimator for the Mean of a Heavy-tailed Distribution with an Infinite Second Moment.

    Authors: Djamel Meraghni, Abdelhakim Necir, Brahim Brahimi, Djabrane Yahia
    Subjects: Methodology
    Abstract

    We use bias-reduced estimators of high quantiles, of heavy-tailed
    distributions, to introduce a new estimator of the mean in the case of infinite
    second moment. The asymptotic normality of the proposed estimator is
    established and checked, in a simulation study, by four of the most popular
    goodness-of-fit tests for different sample sizes. Moreover, we compare, in
    terms of bias and mean squared error, our estimator with Peng's estimator
    (Peng, 2001) and we evaluate the accuracy of some resulting confidence
    intervals.

  38. Conditional inference with a complex sampling: exact computations and Monte Carlo estimations.

    Authors: François Coquet, Éric Lesage
    Subjects: Methodology
    Abstract

    In survey statistics, the usual technique for estimating a population total
    consists in summing appropriately weighted variable values for the units in the
    sample. Different weighting systems exist: sampling weights, GREG weights or
    calibration weights for example. In this article, we propose to use the inverse
    of conditional inclusion probabilities as the weighting system. We study examples
    where auxiliary information makes it possible to perform an a posteriori
    stratification of the population. We show that, in these cases, exact
    computations of the conditional weights are possible.

  39. Testing the significance of assuming homogeneity in contingency-tables/cross-tabulations.

    Authors: Mark Tygert
    Subjects: Methodology
    Abstract

    The model for homogeneity of proportions in a two-way
    contingency-table/cross-tabulation is the same as the model of independence,
    except that the probabilistic process generating the data is viewed as fixing
    the column totals (but not the row totals).

  40. An introduction to how chi-square and classical exact tests often wildly misreport significance and how the remedy lies in computers.

    Authors: Mark Tygert, Rachel Ward, William Perkins
    Subjects: Methodology
    Abstract

    Goodness-of-fit tests based on the Euclidean distance often outperform
    chi-square and other classical tests (including the standard exact tests) by at
    least an order of magnitude when the model being tested for goodness-of-fit is
    a discrete probability distribution that is not close to uniform. The present
    article discusses numerous examples of this.

  41. Rejoinder to "Feature Matching in Time Series Modeling".

    Authors: Yingcun Xia, Howell Tong
    Subjects: Methodology
    Abstract

    Rejoinder to "Feature Matching in Time Series Modeling" by Y. Xia and H. Tong
    [arXiv:1104.3073]

  42. Efficient Estimation of Nonlinear Finite Population Parameters Using Nonparametrics.

    Authors: Camelia Goga, Anne Ruiz-Gazen
    Subjects: Methodology
    Abstract

    Nowadays, the high-precision estimation of nonlinear parameters such as
    quantiles, Gini indices or other measures of inequality is particularly
    crucial. In the present paper, we propose a general class of estimators for
    such parameters that take into account complete univariate auxiliary
    information. We construct unique survey weights through a nonparametric
    model-assisted approach that can be used by means of the plug-in principle to
    estimate the nonlinear parameters.

  43. Discussion of "Feature Matching in Time Series Modeling" by Y. Xia and H. Tong.

    Authors: Qiwei Yao
    Subjects: Methodology
    Abstract

    Discussion of "Feature Matching in Time Series Modeling" by Y. Xia and H.
    Tong [arXiv:1104.3073]

  44. Discussion of "Feature Matching in Time Series Modeling" by Y. Xia and H. Tong.

    Authors: Edward L. Ionides
    Subjects: Methodology
    Abstract

    Discussion of "Feature Matching in Time Series Modeling" by Y. Xia and H.
    Tong [arXiv:1104.3073]

  45. Discussion of "Feature Matching in Time Series Modeling" by Y. Xia and H. Tong.

    Authors: Ruey S. Tsay, Kung-Sik Chan
    Subjects: Methodology
    Abstract

    Discussion of "Feature Matching in Time Series Modeling" by Y. Xia and H.
    Tong [arXiv:1104.3073]

  46. Discussion of "Feature Matching in Time Series Modeling" by Y. Xia and H. Tong.

    Authors: Bruce E. Hansen
    Subjects: Methodology
    Abstract

    Discussion of "Feature Matching in Time Series Modeling" by Y. Xia and H.
    Tong [arXiv:1104.3073]

  47. Some discussions of D. Fearnhead and D. Prangle's Read Paper "Constructing summary statistics for approximate Bayesian computation: semi-automatic approximate Bayesian computation".

    Authors: Arnaud Doucet, Sumeetpal S. Singh, Christian P. Robert, Nicolas Chopin, Jean-Michel Marin, Julien Cornebise, Ioannis Kosmidis, Christophe Andrieu, Pierre Pudlo, Ajay Jasra, Anthony Lee, Simon Barthelme, Mark Girolami, Mohammed Sedki
    Subjects: Methodology
    Abstract

    This report is a collection of comments on the Read Paper of Fearnhead and
    Prangle (2011), to appear in the Journal of the Royal Statistical Society
    Series B, along with a reply from the authors.

  48. A Generic Dynamic Emulator.

    Authors: Carlo Albert
    Subjects: Methodology
    Abstract

    In applied sciences, we often deal with deterministic simulation models that
    are too slow for simulation-intensive tasks such as calibration or real-time
    control. In this paper, an emulator for a generic dynamic model, given by a
    system of ordinary non-linear differential equations, is developed. The
    non-linear differential equations are linearized and Gaussian white noise is
    added to account for the non-linearities. The resulting linear stochastic
    system is conditioned on a set of solutions of the non-linear equations that
    have been calculated prior to the emulation.

  49. An adaptive sequential optimum design for model selection and parameter estimation in non-linear nested models.

    Authors: Caterina May, Chiara Tommasi
    Subjects: Methodology
    Abstract

    This paper has been withdrawn by the author because it has been substantially
    modified.

  50. Approximate Bayesian computation and Bayes linear analysis: Towards high-dimensional ABC.

    Authors: Y. Fan, S. A. Sisson, D. J. Nott, L. Marshall
    Subjects: Methodology
    Abstract

    Bayes linear analysis and approximate Bayesian computation (ABC) are
    techniques commonly used in the Bayesian analysis of complex models. In this
    article we connect these ideas by demonstrating that regression-adjustment ABC
    algorithms produce samples for which first and second order moment summaries
    approximate adjusted expectation and variance for a Bayes linear analysis. This
    gives regression-adjustment methods a useful interpretation and role in
    exploratory analysis in high-dimensional problems.

  51. False discovery rate controlling procedures for discrete tests.

    Authors: Ruth Heller, Hadas Gur
    Subjects: Methodology
    Abstract

    Benjamini and Hochberg (1995) proposed the false discovery rate (FDR) as an
    alternative to the FWER in multiple testing problems, and proposed a procedure
    to control the FDR. For discrete data this procedure may be highly
    conservative. We investigate alternative, more powerful, procedures that
    exploit the discreteness of the tests and have FDR levels closer in magnitude
    to the desired nominal level. Moreover, we develop a novel step-down procedure
    that dominates the step-down procedure of Benjamini and Liu (1999) for discrete
    data.
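
    As a point of reference, the classical Benjamini-Hochberg step-up
    procedure mentioned above can be sketched as below. This is the baseline
    that the paper describes as potentially conservative for discrete data,
    not one of the paper's proposed discrete procedures. The function name is
    hypothetical.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return sorted indices of hypotheses rejected by the BH step-up rule.

    With m hypotheses and ordered p-values p_(1) <= ... <= p_(m), find the
    largest k with p_(k) <= k * q / m and reject the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k = rank  # keep the largest rank that passes its threshold
    return sorted(order[:k])


# The smallest p-value (0.02) misses its own threshold of 0.0125, but it is
# still rejected because a larger p-value passes further up the list.
rejected = benjamini_hochberg([0.02, 0.024, 0.03, 0.5], q=0.05)
```

    The step-up character is visible in the example: rejection depends on
    the largest rank that passes, so a p-value can be rejected even when it
    exceeds the threshold at its own rank.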

  52. Quantile Based Variable Mining : Detection, FDR based Extraction and Interpretation.

    Authors: S. N. Lahiri, S. Mukhopadhyay, Emanuel Parzen
    Subjects: Methodology
    Abstract

    This paper outlines a unified framework for high dimensional variable
    selection for classification problems. Traditional approaches to finding
    interesting variables mostly utilize only partial information through moments
    (like mean difference). On the contrary, in this paper we address the question
    of variable selection in full generality from a distributional point of view.
    If a variable is not important for classification, then it will have similar
    distributional aspect under different classes.

  53. Random Differential Privacy.

    Authors: Larry Wasserman, Alessandro Rinaldo, Rob Hall
    Subjects: Methodology
    Abstract

    We propose a relaxed privacy definition called random differential
    privacy (RDP). Differential privacy requires that adding any new observation
    to a database will have small effect on the output of the data-release
    procedure. Random differential privacy requires that adding a randomly
    drawn new observation to a database will have small effect on the output. We
    show an analog of the composition property of differentially private procedures
    which applies to our new definition.

  54. Autoregressive model selection with simultaneous sparse coefficient estimation.

    Authors: Yan Sun, Hailin Sang
    Subjects: Methodology
    Abstract

    In this paper we propose a sparse coefficient estimation procedure for
    autoregressive (AR) models based on penalized conditional maximum likelihood.
    The penalized conditional maximum likelihood estimator (PCMLE) thus developed
    has the advantage of performing simultaneous coefficient estimation and model
    selection. Mild conditions are given on the penalty function and the innovation
    process, under which the PCMLE satisfies a strong consistency, local $N^{-1/2}$
    consistency, and oracle property, respectively, where N is sample size.

  55. Signal Identification for Rare and Weak Features: Higher Criticism or False Discovery Rates?.

    Authors: Korbinian Strimmer, Bernd Klaus
    Subjects: Methodology
    Abstract

    Signal identification in large-dimensional settings is a challenging problem
    in biostatistics. Recently, the method of higher criticism (HC) was shown to be
    an effective means for determining appropriate decision thresholds. Here, we
    study HC from a false discovery rate (FDR) perspective. We show that the HC
    threshold is best viewed as an approximation to a natural Bayesian decision
    threshold which in turn is expressible as a specific FDR threshold.

  56. A Sparse SVD Method for High-dimensional Data.

    Authors: Zongming Ma, Dan Yang, Andreas Buja
    Subjects: Methodology
    Abstract

    We present a new computational approach to approximating a large, noisy data
    table by a low-rank matrix with sparse singular vectors. The approximation is
    obtained from thresholded subspace iterations that produce the singular vectors
    simultaneously, rather than successively as in competing proposals. We
    introduce novel ways to estimate thresholding parameters which obviate the need
    for computationally expensive cross-validation.

  57. The Variational Garrote.

    Authors: H. J. Kappen
    Subjects: Methodology
    Abstract

    In this paper, I present a new solution method for sparse regression using L0
    regularization. The model introduces a sparseness mechanism in the likelihood,
    instead of in the prior, as is done in the spike and slab model. The posterior
    probability is computed in the variational approximation. The variational
    parameters appear in the approximate model in a way that is similar to
    Breiman's Garrote model. I refer to this method as the variational Garrote
    (VG). The VG is compared numerically with the Lasso method and with ridge
    regression.

  58. Functional modelling of microarray time series with covariate curves.

    Authors: Maurice Berk, Giovanni Montana
    Subjects: Methodology
    Abstract

    In this paper we have demonstrated a complete framework for the analysis of
    microarray time series data. The unique characteristics of microarray data lend
    themselves well to a functional data analysis approach and we have shown how
    this naturally extends to the inclusion of covariates such as age and sex.

  59. Robust detection of exotic infectious diseases in animal herds: A comparative study of three decision methodologies under severe uncertainty.

    Authors: Matthias C. M. Troffaes, John Paul Gosling
    Subjects: Methodology
    Abstract

    When animals are transported and pass through customs, some of them may have
    dangerous infectious diseases. Typically, due to the cost of testing, not all
    animals are tested: a reasonable selection must be made. How to test
    effectively, yet avoid cataclysmic events?

  60. On best subset regression.

    Authors: Shifeng Xiong
    Subjects: Methodology
    Abstract

    In this paper we discuss the variable selection method from $\ell_0$-norm
    constrained regression, which is equivalent to the problem of finding the best
    subset of a fixed size. Our study focuses on two aspects, consistency and
    computation. We prove that the sparse estimator from such a method can retain
    all of the important variables asymptotically for even exponentially growing
    dimensionality under regularity conditions.

  61. Estimation and inference for high-dimensional non-sparse models.

    Authors: Lu Lin, Lixing Zhu, Yujie Gai
    Subjects: Methodology
    Abstract

    To successfully work on variable selection, sparse model structure has become
    a basic assumption for all existing methods. However, this assumption is
    questionable, as it rarely holds in practice, and none of the existing
    methods may provide consistent estimation and accurate model prediction in
    non-sparse scenarios.

  62. Methods to distinguish between polynomial and exponential tails.

    Authors: Joan del Castillo, Jalila Daoudi, Richard Lockhart
    Subjects: Methodology
    Abstract

    In this article two methods to distinguish between polynomial and exponential
    tails are introduced. The methods are mainly based on the properties of the
    residual coefficient of variation for the exponential and non-exponential
    distributions. A graphical method, called CV-plot, shows departures from
    exponentiality in the tails. It is, in fact, the empirical coefficient of
    variation of the conditional exceedance over a threshold. The plot is applied to
    the daily log-returns of exchange rates of the US dollar and the Japanese yen.
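
    The quantity behind the CV-plot, as described above, is the empirical
    coefficient of variation of the excesses over a threshold; for an
    exponential tail it should stay close to 1. A minimal sketch follows, in
    which the function names, the use of the sample standard deviation, and
    the choice of observed values as thresholds are assumptions rather than
    the authors' exact construction.

```python
import statistics


def residual_cv(sample, threshold):
    """Empirical coefficient of variation of the excesses x - threshold, x > threshold."""
    excesses = [x - threshold for x in sample if x > threshold]
    if len(excesses) < 2:
        raise ValueError("need at least two observations above the threshold")
    return statistics.stdev(excesses) / statistics.mean(excesses)


def cv_plot_points(sample, min_excesses=10):
    """Return (threshold, CV) pairs, using the observed values as thresholds."""
    xs = sorted(sample)
    points = []
    # Stop early so that roughly min_excesses observations remain above the threshold.
    for i in range(len(xs) - min_excesses):
        threshold = xs[i]
        # Ties can leave fewer excesses than expected; skip such thresholds.
        if sum(1 for x in xs if x > threshold) < 2:
            continue
        points.append((threshold, residual_cv(xs, threshold)))
    return points
```

    Plotting the returned pairs gives a curve whose departure from the
    horizontal line at 1 indicates non-exponential tail behaviour.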

  63. A Skew-t-Normal Multi-Level Reduced-Rank Functional PCA Model with Applications to Replicated `Omics Time Series Data Sets.

    Authors: Maurice Berk, Giovanni Montana
    Subjects: Methodology
    Abstract

    A powerful study design in the fields of genomics and metabolomics is the
    'replicated time course experiment' where individual time series are observed
    for a sample of biological units, such as human patients, termed replicates.
    Standard practice for analysing these data sets is to fit each variable (e.g.
    gene transcript) independently with a functional mixed-effects model to account
    for between-replicate variance. However, such an independence assumption is
    biologically implausible given that the variables are known to be highly
    correlated.

  64. Extended Generalised Pareto Models for Tail Estimation.

    Authors: Ioannis Papastathopoulos, Jonathan A. Tawn
    Subjects: Methodology
    Abstract

    The most popular approach in extreme value statistics is the modelling of
    threshold exceedances using the asymptotically motivated generalised Pareto
    distribution. This approach involves the selection of a high threshold above
    which the model fits the data well. Sometimes, few observations of a
    measurement process might be recorded in applications and so selecting a high
    quantile of the sample as the threshold leads to almost no exceedances.

  65. Semiparametric Time Series Models with Log-concave Innovations.

    Authors: Yining Chen
    Subjects: Methodology
    Abstract

    We study a class of semiparametric time series models with innovations
    following a log-concave distribution. We propose a general maximum likelihood
    framework which allows us to estimate simultaneously the parameters of a model
    and the density of the innovations. This framework can be easily adapted to
    many well-known models, including ARMA and GARCH. Furthermore, we show that the
    estimator under our new framework is consistent in both ARMA and GARCH
    settings.

  66. On Measure Transformed Canonical Correlation Analysis.

    Authors: Alfred O. Hero, Koby Todros
    Subjects: Methodology
    Abstract

    In this paper linear canonical correlation analysis (LCCA) is generalized by
    applying a structured transform to the joint probability distribution of the
    considered pair of random vectors, i.e., a transformation of the joint
    probability measure defined on their joint observation space. This framework,
    called measure transformed canonical correlation analysis (MTCCA), applies LCCA
    to the data after transformation of the joint probability measure.

  67. Resolving conflicts between statistical methods by probability combination: Application to empirical Bayes analyses of genomic data.

    Authors: David R. Bickel
    Subjects: Methodology
    Abstract

    In the typical analysis of a data set, a single method is selected for
    statistical reporting even when equally applicable methods yield very different
    results. Examples of equally applicable methods can correspond to those of
    different ancillary statistics in frequentist inference and of different prior
    distributions in Bayesian inference. More broadly, choices are made between
    parametric and nonparametric methods and between frequentist and Bayesian
    methods.

  68. Bandwidth Selection for Weighted Kernel Density Estimation.

    Authors: Bin Wang, Xiaofeng Wang
    Subjects: Methodology
    Abstract

    In this paper, the authors propose to estimate the density of a targeted
    population with a weighted kernel density estimator (wKDE) based on a weighted
    sample. Bandwidth selection for wKDE is discussed. Three mean integrated
    squared error based bandwidth estimators are introduced and their performance
    is illustrated via Monte Carlo simulation. The least-squares cross-validation
    method and the adaptive weight kernel density estimator are also studied.
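
    A weighted kernel density estimator of the kind described above can be
    sketched as below. The Gaussian kernel, the normalisation of the weights
    to sum to one, and the fixed user-supplied bandwidth are assumptions made
    for illustration; the bandwidth selectors studied in the paper are not
    reproduced here.

```python
import math


def weighted_kde(x, sample, weights, bandwidth):
    """Evaluate a weighted Gaussian kernel density estimate at the point x."""
    if len(sample) != len(weights):
        raise ValueError("sample and weights must have the same length")
    total_weight = sum(weights)
    norm = bandwidth * math.sqrt(2.0 * math.pi)
    density = 0.0
    for xi, wi in zip(sample, weights):
        u = (x - xi) / bandwidth
        # Each observation contributes a Gaussian bump scaled by its normalised weight.
        density += (wi / total_weight) * math.exp(-0.5 * u * u) / norm
    return density
```

    With equal weights this reduces to the ordinary kernel density estimator,
    which is a convenient sanity check for any implementation.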

  69. Error and Inference: an outsider stand on a frequentist philosophy.

    Authors: Christian P. Robert
    Subjects: Methodology
    Abstract

    This note is an extended review of the book Error and Inference, edited by
    Deborah Mayo and Aris Spanos, about their frequentist and philosophical
    perspective on testing of hypothesis and on the criticisms of alternatives like
    the Bayesian approach.

  70. Bootstrap Inference for Network Construction.

    Authors: Pei Wang, Li Hsu, Jie Peng, Shuang Li
    Subjects: Methodology
    Abstract

    Regularization techniques are widely used for tackling
    high-dimension-low-sample-size problems. Yet, finding the right amount of
    regularization can be challenging, especially in the unsupervised setting such
    as structure learning problems where traditional methods such as BIC or
    cross-validation often do not work well. In this paper, we propose a new method
    --- Bootstrap Inference for Network COnstruction (BINCO) --- to infer networks
    by directly controlling the false discovery rates (FDRs) of the selected edges.
    This method utilizes the idea of model aggregation.

  71. On the Generalized Hill Process for Small Parameters and Applications.

    Authors: El Hadji Deme, Gane Samb Lo, Aliou Diop
    Subjects: Methodology
    Abstract

    Let $X_{1},X_{2},...$ be a sequence of independent copies (s.i.c) of a real
    random variable (r.v.) $X\geq 1$, with distribution function (df)
    $F(x)=\mathbb{P}(X\leq x)$, and let $X_{1,n}\leq X_{2,n} \leq ... \leq
    X_{n,n}$ be the order statistics based on the first $n\geq 1$ of these
    observations.

  72. Structured Sparse Aggregation.

    Authors: Daniel Percival
    Subjects: Methodology
    Abstract

    We introduce a method for aggregating many least squares estimators so that
    the resulting estimate has two properties: sparsity and structure. That is,
    only a few candidate covariates are used in the resulting model, and the
    selected covariates follow some structure over the candidate covariates that is
    assumed to be known a priori. While sparsity is well studied in many settings,
    including aggregation, structured sparse methods are still emerging.

  73. Sparse Group Selection Through Co-Adaptive Penalties.

    Authors: Zhou Fang
    Subjects: Methodology
    Abstract

    Recent work has focused on the problem of conducting linear regression when
    the number of covariates is very large, potentially greater than the sample
    size. To facilitate this, one useful tool is to assume that the model can be
    well approximated by a fit involving only a small number of covariates -- a
    so-called sparsity assumption, which leads to the Lasso and other methods.

  74. On the Pickands stochastic process.

    Authors: Gane Samb Lo, Adja Mbarka Fall
    Subjects: Methodology
    Abstract

    We consider the Pickands process
    $$P_{n}(s)=(\log (1/s))^{-1}\log
    \frac{X_{n-k+1,n}-X_{n-[k/s]+1,n}}{X_{n-[k/s]+1,n}-X_{n-[k/s^{2}]+1,n}},
    \qquad \frac{k}{n}\leq s^2 \leq 1,$$
    which is a generalization of the classical Pickands estimate $P_{n}(1/2)$ of
    the extremal index. We undertake here a purely stochastic process view for the
    asymptotic theory of that process by using the
    Csörgő-Csörgő-Horváth-Mason (1986) weighted approximation of the empirical
    and quantile processes to suitable Brownian bridges.
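
    The process in this abstract can be evaluated directly from the order
    statistics. The sketch below assumes that $[\cdot]$ denotes the integer
    part, that the prefactor is $1/\log(1/s)$ (so that $s=1/2$ gives the
    classical Pickands estimate), and that $s<1$ to avoid a zero denominator;
    the function name is hypothetical.

```python
import math


def pickands_process(sample, k, s=0.5):
    """Evaluate P_n(s) from the order statistics of the sample."""
    xs = sorted(sample)
    n = len(xs)
    if not (0.0 < s < 1.0) or k < 1 or k > n * s * s:
        raise ValueError("need 0 < s < 1 and 1 <= k <= n * s**2")

    def order_stat(j):
        # X_{j,n} is the j-th smallest observation (1-based index).
        return xs[j - 1]

    a = math.floor(k / s)        # [k/s]
    b = math.floor(k / (s * s))  # [k/s^2]
    top = order_stat(n - k + 1)
    mid = order_stat(n - a + 1)
    low = order_stat(n - b + 1)
    # Tied order statistics would make this ratio zero or undefined.
    return math.log((top - mid) / (mid - low)) / math.log(1.0 / s)
```

    On the equally spaced sample 1, 2, ..., 16 with k = 2 and s = 1/2 the
    ratio of spacings is 1/2, so the value is -1.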

  75. Conditional Modeling and the Jitter Method of Spike Re-sampling: Supplement.

    Authors: Matthew T. Harrison, Asohan Amarasingham, Nicholas G. Hatsopoulos, Stuart Geman
    Subjects: Methodology
    Abstract

    This technical report accompanies the manuscript "Conditional Modeling and
    the Jitter Method of Spike Re-sampling." It contains further details, comments,
    references, and equations concerning various simulations and data analyses
    presented in that manuscript, as well as a self-contained Mathematical Appendix
    that provides a formal treatment of jitter-based spike re-sampling methods.

  76. Joint Modeling of Multiple Related Time Series via the Beta Process.

    Authors: Michael I. Jordan, Alan S. Willsky, Emily B. Fox, Erik B. Sudderth
    Subjects: Methodology
    Abstract

    We propose a Bayesian nonparametric approach to the problem of jointly
    modeling multiple related time series. Our approach is based on the discovery
    of a set of latent, shared dynamical behaviors. Using a beta process prior, the
    size of the set and the sharing pattern are both inferred from data. We develop
    efficient Markov chain Monte Carlo methods based on the Indian buffet process
    representation of the predictive distribution of the beta process, without
    relying on a truncated model.

  77. A functional Generalized Hill process and applications.

    Authors: Lo Gane Samb, El Hadji Deme
    Subjects: Methodology
    Abstract

    We are concerned in this paper with the functional asymptotic behaviour of
    the sequence of stochastic processes $T_{n}(f)=\sum_{j=1}^{k}f(j)(\log
    X_{n-j+1,n}-\log X_{n-j,n})$, indexed by some classes $\mathcal{F}$ of functions
    $f:\mathbb{N} \setminus \{0\} \longmapsto \mathbb{R}_{+}$ and where $k=k(n)$
    satisfies $1\leq k\leq n$, $k/n\rightarrow 0$ as $n\rightarrow \infty$. This is a
    functional generalized Hill process including as many new estimators of the
    extremal index when $F$ is in the extremal domain.
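
    As a small check on the definition of $T_{n}(f)$ above, the choice
    $f(j)=j$ gives $k$ times the classical Hill estimator, which follows by
    summation by parts. The sketch below verifies this numerically; it
    assumes positive observations and $k<n$ so that $X_{n-k,n}$ exists, and
    the function names are illustrative.

```python
import math


def generalized_hill(sample, k, f):
    """T_n(f) = sum_{j=1}^{k} f(j) * (log X_{n-j+1,n} - log X_{n-j,n})."""
    logs = [math.log(x) for x in sorted(sample)]
    n = len(logs)
    if not 1 <= k < n:
        raise ValueError("need 1 <= k < n")
    # logs[j - 1] holds log X_{j,n}, so X_{n-j+1,n} sits at index n - j.
    return sum(f(j) * (logs[n - j] - logs[n - j - 1]) for j in range(1, k + 1))


def hill_estimator(sample, k):
    """Classical Hill estimator based on the k largest observations."""
    logs = [math.log(x) for x in sorted(sample)]
    n = len(logs)
    if not 1 <= k < n:
        raise ValueError("need 1 <= k < n")
    return sum(logs[n - i] for i in range(1, k + 1)) / k - logs[n - k - 1]


data = [1.0, 2.0, 4.0, 8.0, 16.0]
print(generalized_hill(data, 3, lambda j: j) / 3, hill_estimator(data, 3))
```

    Other choices of $f$ plug into the same function unchanged, which is the
    sense in which the process is indexed by a class of functions.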

  78. Extreme value analysis of actuarial risks: estimation and model validation.

    Authors: Holger Drees
    Subjects: Methodology
    Abstract

    We give an overview of several aspects arising in the statistical analysis of
    extreme risks with actuarial applications in view. In particular it is
    demonstrated that empirical process theory is a very powerful tool, both for
    the asymptotic analysis of extreme value estimators and to devise tools for the
    validation of the underlying model assumptions. While the focus of the paper is
    on univariate tail risk analysis, the basic ideas of the analysis of the
    extremal dependence between different risks are also outlined.

  79. Issues in designing hybrid algorithms.

    Authors: Christian Robert, Kerrie Mengersen, Jeong Lee, Ross McVinish
    Subjects: Methodology
    Abstract

    In the Bayesian community, an ongoing imperative is to develop efficient
    algorithms. An appealing approach is to form a hybrid algorithm by combining
    ideas from competing existing techniques. This paper addresses issues in
    designing hybrid methods by considering selected case studies: the delayed
    rejection algorithm, the pinball sampler, the Metropolis adjusted Langevin
    algorithm, and the population Monte Carlo algorithm. We observe that even if
    each component of a hybrid algorithm has individual strengths, they may not
    contribute equally or even positively when they are combined.

  80. Modelling sources of ecological fallacy within a revised Brown and Payne model of voting transitions.

    Authors: Antonio Forcina
    Subjects: Methodology
    Abstract

    We present a model of voting behaviour based on a version of aggregated
    overdispersed multinomial distributions; relative to a similar model by
    Brown and Payne (1986), our model is based on more realistic assumptions and free from
    certain shortcomings of the previous model.

  81. Further properties of frequentist confidence intervals in regression that utilize uncertain prior information.

    Authors: Paul Kabaila, Khageswor Giri
    Subjects: Methodology
    Abstract

    Consider a linear regression model with n-dimensional response vector,
    regression parameter $\beta = (\beta_1, ..., \beta_p)$ and independent and
    identically $N(0, \sigma^2)$ distributed errors. Suppose that the parameter of
    interest is $\theta = a^T \beta$ where $a$ is a specified vector. Define the
    parameter $\tau = c^T \beta - t$ where $c$ and $t$ are specified and $a$ and
    $c$ are linearly independent. Also suppose that we have uncertain prior
    information that $\tau = 0$.

  82. Estimating the evidence -- a review.

    Authors: Nial Friel, Jason Wyse
    Subjects: Methodology
    Abstract

    The model evidence is a vital quantity in the comparison of statistical
    models under the Bayesian paradigm. This paper presents a review of commonly
    used methods. We outline some guidelines and offer some practical advice. The
    reviewed methods are compared for two examples; non-nested Gaussian linear
    regression and covariate subset selection in logistic regression.

  83. Combinatorial clustering and the beta negative binomial process.

    Authors: Michael I. Jordan, Tamara Broderick, Lester Mackey, John Paisley
    Subjects: Methodology
    Abstract

    In this work, we establish novel connections between the Bayesian
    nonparametric clustering and featural paradigms by considering the problem of
    admixture modeling. We examine the Dirichlet process, and its unnormalized
    Poisson point process generation via the gamma process, on the traditional
    clustering side of Bayesian nonparametrics. On the featural side, we examine
    the beta process and introduce a new model, the beta negative binomial process
    (BNBP), for admixture modeling.

  84. Inherent Difficulties of Non-Bayesian Likelihood-based Inference, as Revealed by an Examination of a Recent Book by Aitkin.

    Authors: Judith Rousseau, Christian P. Robert, Andrew Gelman
    Subjects: Methodology
    Abstract

    For many decades, statisticians have made attempts to prepare the Bayesian
    omelette without breaking the Bayesian eggs; that is, to obtain probabilistic
    likelihood-based inferences without relying on informative prior distributions.
    A recent example is Murray Aitkin's book, Statistical Inference,
    which presents an approach to statistical hypothesis testing based on
    comparisons of posterior distributions of likelihoods under competing models.
    Aitkin develops and illustrates his method using some simple examples of
    inference from iid data and two-way tests of independence.

  85. Bayesian Methods for Genetic Association Analysis with Heterogeneous Subgroups: from Meta-Analyses to Gene-Environment Interactions.

    Authors: Xiaoquan Wen, Matthew Stephens
    Subjects: Methodology
    Abstract

    In genetic association analyses, it is often desired to analyze data from
    multiple potentially-heterogeneous subgroups. The amount of expected
    heterogeneity can vary from modest (as might typically be expected in a
    meta-analysis of multiple studies of the same phenotype, for example), to large
    (e.g. a strong gene-environment interaction, where the environmental exposure
    defines discrete subgroups). Here, we consider a flexible set of Bayesian
    models and priors that can capture these different levels of heterogeneity.

  86. High Dimensional Low Rank and Sparse Covariance Matrix Estimation via Convex Minimization.

    Authors: Xi Luo
    Subjects: Methodology
    Abstract

    This paper introduces a general framework of covariance structures that can
    be verified in many popular statistical models, such as factor and random
    effect models. The new structure is a summation of low rank and sparse
    matrices. We propose a LOw Rank and sparsE Covariance estimator (LOREC) to
    exploit this general structure in the high-dimensional setting. Analysis of
    this estimator shows that it recovers exactly the rank and support of the two
    components respectively. Convergence rates under various norms are also
    presented.

  87. The Average Likelihood Ratio for Large-scale Multiple Testing and Detecting Sparse Mixtures.

    Authors: Guenther Walther
    Subjects: Methodology
    Abstract

    Large-scale multiple testing problems require the simultaneous assessment of
    many p-values. This paper compares several methods to assess the evidence in
    multiple binomial counts of p-values: the maximum of the binomial counts after
    standardization (the `higher-criticism statistic'), the maximum of the binomial
    counts after a log-likelihood ratio transformation (the `Berk-Jones
    statistic'), and a newly introduced average of the binomial counts after a
    likelihood ratio transformation.

  88. The joint graphical lasso for inverse covariance estimation across multiple classes.

    Authors: Pei Wang, Patrick Danaher, Daniela M. Witten
    Subjects: Methodology
    Abstract

    We consider the problem of estimating multiple related but distinct graphical
    models on the basis of a high-dimensional data set with observations that
    belong to distinct classes. A motivating example occurs in the analysis of gene
    expression data for tissue samples with and without cancer. In this case, we
    might wish to estimate a gene expression network for the normal tissue and a
    gene expression network for the tumor tissue.

  89. Bayesian Gaussian Copula Factor Models for Mixed Data.

    Authors: David B. Dunson, Lawrence Carin, Joseph E. Lucas, Jared S. Murray
    Subjects: Methodology
    Abstract

    Gaussian factor models have proven widely useful for parsimoniously
    characterizing dependence in multivariate data. There is a rich literature on
    their extension to mixed categorical and continuous variables, using latent
    Gaussian variables or through generalized latent trait models accommodating
    measurements in the exponential family. However, when generalizing to
    non-Gaussian measured variables the latent variables typically influence both
    the dependence structure and the form of the marginal distributions,
    complicating interpretation and introducing artifacts.

  90. Emulating a gravity model to infer the spatiotemporal dynamics of an infectious disease.

    Authors: Murali Haran, Roman Jandarov, Ottar Bjørnstad, Bryan Grenfell
    Subjects: Methodology
    Abstract

    Extremely contagious, acute, immunizing childhood infections like measles can
    exhibit spatiotemporal dynamics that depend on the nature of spatial contagion
    and spatiotemporal variations in population structure and demography. We study
    a metapopulation model for regional measles dynamics that uses a gravity
    coupling model and a time series susceptible-infected-recovered (TSIR) model
    for local dynamics.

  91. Consistent estimation of a mean pattern in deformable models for high-dimensional shape analysis.

    Authors: Jérémie Bigot, Benjamin Charlier
    Subjects: Methodology
    Abstract

    We consider the problem of estimating a mean shape from a set of J planar
    configurations described by a sequence of k landmarks. We study the consistency
    of a smoothed Procrustean mean when the observations obey a deformable model
    including some nuisance parameters such as random translations, rotations and
    scaling. The main contribution of the paper is to analyze the influence of the
    dimension k of the data and of the number J of observed configurations on the
    convergence of the smoothed Procrustean estimator to the mean pattern of the
    model.

  92. Bivariate Instantaneous Frequency and Bandwidth.

    Authors: Sofia C. Olhede, Jonathan M. Lilly
    Subjects: Methodology
    Abstract

    The generalizations of instantaneous frequency and instantaneous bandwidth to
    a bivariate signal are derived. These are uniquely defined whether the signal
    is represented as a pair of real-valued signals, or as one analytic and one
    anti-analytic signal. A nonstationary but oscillatory bivariate signal has a
    natural representation as an ellipse whose properties evolve in time, and this
    representation provides a simple geometric interpretation for the bivariate
    instantaneous moments.

  93. Testing over a continuum of null hypotheses.

    Authors: Gilles Blanchard, Etienne Roquain, Sylvain Delattre
    Subjects: Methodology
    Abstract

    We introduce a theoretical framework for performing statistical hypothesis
    testing simultaneously over a fairly general, possibly uncountably infinite,
    set of null hypotheses. This extends the standard statistical setting for
    multiple hypotheses testing, which is restricted to a finite set. This work is
    motivated by numerous modern applications where the observed signal is modeled
    by a stochastic process over a continuum. As a measure of type I error, we
    extend the concept of false discovery rate (FDR) to this setting.

  94. Multi-Domain Sampling With Applications to Structural Inference of Bayesian Networks.

    Authors: Qing Zhou
    Subjects: Methodology
    Abstract

    When a posterior distribution has multiple modes, unconditional expectations,
    such as the posterior mean, may not offer informative summaries of the
    distribution. Motivated by this problem, we propose to decompose the sample
    space of a multimodal distribution into domains of attraction of local modes.
    Domain-based representations are defined to summarize the probability masses of
    and conditional expectations on domains of attraction, which are much more
    informative than the mean and other unconditional expectations.

  95. Minimum penalized Hellinger distance for model selection in small samples.

    Authors: Papa Ngom, Bertrand Ntep
    Subjects: Methodology
    Abstract

    In the area of statistical modeling, the Akaike information criterion (AIC) is a
    widely known and extensively used tool for model choice. The $\phi$-divergence
    test statistic is a recently developed tool for statistical model selection.
    The popularity of divergence criteria is, however, tempered by their known
    lack of robustness in small samples. In this paper the penalized minimum
    Hellinger distance type statistics are considered and some properties are
    established.

  96. Confidence Intervals for Low-Dimensional Parameters With High-Dimensional Data.

    Authors: Cun-Hui Zhang, Stephanie S. Zhang
    Subjects: Methodology
    Abstract

    The purpose of this paper is to propose methodologies for statistical
    inference of low-dimensional parameters with high-dimensional data. We focus on
    constructing confidence intervals for individual coefficients and linear
    combinations of several of them in a linear regression model, although our
    ideas are applicable in a much broader context. The theoretical results presented
    here provide sufficient conditions for the asymptotic normality of the proposed
    estimators along with a consistent estimator for their finite-dimensional
    covariance matrices.

  97. Basic statistics for probabilistic symbolic variables: a novel metric-based approach.

    Authors: Antonio Irpino, Rosanna Verde
    Subjects: Methodology
    Abstract

    In data mining, it is usual to describe a set of individuals using some
    summaries (means, standard deviations, histograms, confidence intervals) that
    generalize individual descriptions into a typology description. In this case,
    data can be described by several values. In this paper, we propose an approach
    for computing basic statistics for such data, and, in particular, for data
    described by numerical multi-valued variables (interval, histograms, discrete
    multi-valued descriptions). We propose to treat all numerical multi-valued
    variables as distributional data, i.e.

  98. Smooth blockwise iterative thresholding: a smooth fixed point estimator based on the likelihood's block gradient.

    Authors: Sylvain Sardy
    Subjects: Methodology
    Abstract

    The proposed smooth blockwise iterative thresholding estimator (SBITE) is a
    model selection technique defined as a fixed point reached by iterating a
    likelihood gradient-based thresholding function. The smooth James-Stein
    thresholding function has two regularization parameters $\lambda$ and $\nu$,
    and a smoothness parameter $s$. It enjoys smoothness like ridge regression and
    selects variables like lasso.

  99. Point process modeling for directed interaction networks.

    Authors: Patrick J. Wolfe, Patrick O. Perry
    Subjects: Methodology
    Abstract

    Network data often take the form of repeated interactions between senders and
    receivers tabulated over time. A primary question to ask of such data is which
    traits and behaviors are predictive of interaction. To answer this question, a
    model is introduced for treating directed interactions as a multivariate point
    process: a Cox multiplicative intensity model using covariates that depend on
    the history of the process.

  100. Properties and applications of Fisher distribution on the rotation group.

    Authors: Akimichi Takemura, Tomonari Sei, Nobuki Takayama, Katsuyoshi Ohara, Hiroki Shibata
    Subjects: Methodology
    Abstract

    We study properties of Fisher distribution (von Mises-Fisher distribution,
    matrix Langevin distribution) on the rotation group SO(3). In particular we
    apply the holonomic gradient descent, introduced by Nakayama et al. (2011), and
    a method of series expansion for evaluating the normalizing constant of the
    distribution and for computing the maximum likelihood estimate. The rotation
    group can be identified with the Stiefel manifold of two orthonormal vectors.
    Therefore from the viewpoint of statistical modeling, it is of interest to
    compare Fisher distributions on these manifolds.

  101. Introduction.

    Authors: Eric Slud, P. Lahiri
    Subjects: Methodology
    Abstract

    The Statistics Consortium at the University of Maryland, College Park, hosted
    a two-day workshop on Bayesian Methods that Frequentists Should Know during
    April 30--May 1, 2008. The event was co-sponsored by the Institute of
    Mathematical Statistics (IMS), Office of Research and Methodology, National
    Center for Health Statistics, Survey Research Methods Section (SRMS) of the
    American Statistical Association, and Washington Statistical Society.

  102. Log-mean linear models for binary data.

    Authors: Alberto Roverato, Monia Lupparelli, Luca La Rocca
    Subjects: Methodology
    Abstract

    This paper is devoted to the theory and application of a novel class of
    models for binary data, which we call log-mean linear (LML) models. The
    characterizing feature of these models is that they are specified by linear
    constraints on the LML parameter, defined as a log-linear expansion of the mean
    parameter of the multivariate Bernoulli distribution. We show that marginal
    independence relationships between variables can be specified by setting
    certain LML interactions to zero and, more specifically, that graphical models
    of marginal independence are LML models.

  103. Robust Parametric Classification and Variable Selection by a Minimum Distance Criterion.

    Authors: Eric C. Chi, David W. Scott
    Subjects: Methodology
    Abstract

    We investigate a robust penalized logistic regression algorithm based on a
    minimum distance criterion. Influential outliers are often associated with the
    explosion of parameter vector estimates, but in the context of standard
    logistic regression, the bias due to outliers always causes the parameter
    vector to implode, that is, shrink towards the zero vector. Thus, using
    LASSO-like penalties to perform variable selection in the presence of outliers
    can result in missed detections of relevant covariates.

  104. Sparse Choice Models.

    Authors: Devavrat Shah, Vivek F. Farias, Srikanth Jagabathula
    Subjects: Methodology
    Abstract

    Choice models, which capture popular preferences over objects of interest,
    play a key role in making decisions whose eventual outcome is impacted by human
    choice behavior. In most scenarios, the choice model, which can effectively be
    viewed as a distribution over permutations, must be learned from observed data.
    The observed data, in turn, may frequently be viewed as (partial, noisy)
    information about marginals of this distribution over permutations.

  105. A Compressed PCA Subspace Method for Anomaly Detection in High-Dimensional Data.

    Authors: Eric D. Kolaczyk, Qi Ding
    Subjects: Methodology
    Abstract

    Random projection is widely used as a method of dimension reduction. In
    recent years, its combination with standard techniques of regression and
    classification has been explored. Here we examine its use with principal
    component analysis (PCA) and subspace detection methods. Specifically, we show
    that, under appropriate conditions, with high probability the magnitude of the
    residuals of a PCA analysis of randomly projected data behaves comparably to
    that of the residuals of a similar PCA analysis of the original data.

  106. Default Bayesian analysis for multi-way tables: a data-augmentation approach.

    Authors: Nicholas G. Polson, James G. Scott
    Subjects: Methodology
    Abstract

    This paper proposes a strategy for regularized estimation in multi-way
    contingency tables, which are common in meta-analyses and multi-center clinical
    trials. Our approach is based on data augmentation, and appeals heavily to a
    novel class of Polya-Gamma distributions. Our main contributions are to build
    up the relevant distributional theory and to demonstrate three useful features
    of this data-augmentation scheme.

  107. The effect of a preliminary test of homogeneity of stratum-specific odds ratios on confidence intervals for these odds ratios.

    Authors: Paul Kabaila, Dilshani Tissera
    Subjects: Methodology
    Abstract

    Consider a case-control study in which the aim is to assess the effect of a
    factor on disease occurrence. We suppose that this factor is dichotomous. Also
    suppose that the data consists of two strata, each stratum summarized by a
    two-by-two table. A commonly-proposed two-stage analysis of this type of data
    is the following. We carry out a preliminary test of homogeneity of the
    stratum-specific odds ratios. If the null hypothesis of homogeneity is accepted
    then we find a confidence interval for the assumed common value (across strata)
    of the odds ratio.

  108. Efficient algorithm to select tuning parameters in sparse regression modeling with regularization.

    Authors: Sadanori Konishi, Kei Hirose, Shohei Tateishi
    Subjects: Methodology
    Abstract

    In sparse regression modeling with regularization such as the lasso, elastic
    net and bridge regression, it is important to select appropriate values of
    tuning parameters including regularization parameters. The choice of tuning
    parameters can be viewed as a model selection and evaluation problem. The
    degrees of freedom, which leads to Mallows' $C_p$ criterion, plays a key role
    in the theory of model selection. In the present paper, we propose an efficient
    algorithm which computes the degrees of freedom sequentially by extending the
    generalized path seeking (GPS) algorithm.

  109. On a Class of Shrinkage Priors for Covariance Matrix Estimation.

    Authors: Natesh S. Pillai, Hao Wang
    Subjects: Methodology
    Abstract

    We propose a class of scale mixture of uniform distributions to generate
    shrinkage priors for the covariance matrix. This new class of priors enjoys a
    number of advantages over the traditional scale mixture of normal priors,
    including its simplicity in characterizing the prior density based on its
    first-order derivative and computational efficiency based on a Gibbs sampler.
    We first discuss the theory and computational details of this new approach for
    the covariance matrix estimation.

  110. Tests for multivariate normality based on canonical correlations.

    Authors: Måns Thulin
    Subjects: Methodology
    Abstract

    We propose new affine invariant tests for multivariate normality, based on
    independence characterizations of the sample moments of the normal
    distribution. The test statistics are obtained using canonical correlations
    between sets of sample moments, generalizing the Lin-Mudholkar test for
    normality. The tests are compared to some popular tests based on Mardia's
    skewness and kurtosis measures in an extensive simulation power study and are
    found to offer higher power against many of the alternatives.

  111. Mediation Analysis Without Sequential Ignorability: Using Baseline Covariates Interacted with Random Assignment as Instrumental Variables.

    Authors: Dylan S. Small
    Subjects: Methodology
    Abstract

    In randomized trials, researchers are often interested in mediation analysis
    to understand how a treatment works, in particular how much of a treatment's
    effect is mediated by an intermediate variable and how much the treatment
    directly affects the outcome not through the mediator. The standard regression
    approach to mediation analysis assumes sequential ignorability of the mediator,
    that is that the mediator is effectively randomly assigned given baseline
    covariates and the randomized treatment.

  112. Fibre-Generated Point Processes And Fields Of Orientations.

    Authors: Wilfrid S. Kendall, Bryony J. Hill, Elke Thonnes
    Subjects: Methodology
    Abstract

    This paper introduces a new approach to analysing spatial point data
    clustered along or around a system of curves or `fibres'. Such data arise in
    catalogues of galaxy locations, recorded locations of earthquakes, aerial
    images of minefields, and pore patterns on fingerprints. Finding the underlying
    curvilinear structure of these point-pattern data sets may not only facilitate
    a better understanding of how they arise but also aid reconstruction of missing
    data. We base the space of fibres on the set of integral lines of an
    orientation field.

  113. Estimating Within-School Contact Networks to Understand Influenza Transmission.

    Authors: Mark S. Handcock, Gail E. Potter, Ira M. Longini Jr., M. Elizabeth Halloran
    Subjects: Methodology
    Abstract

    Many epidemic models approximate social contact behavior by assuming random
    mixing within mixing groups (e.g., homes, schools, workplaces). The effect of
    more realistic social network structure on epidemic parameter estimates is an
    open area of exploration. We develop a statistical model to estimate the social
    contact network within a high school using friendship network data and a
    contact survey. Our model includes classroom structure and longer and more
    frequent contacts to friends than non-friends, based on reports in the contact
    survey.

  114. Stable Graphical Model Estimation with Random Forests for Discrete, Continuous, and Mixed Variables.

    Authors: Peter Bühlmann, Bernd Fellinghauer, Martin Ryffel, Michael von Rhein, Jan D. Reinhardt
    Subjects: Methodology
    Abstract

    A conditional independence graph is a concise representation of pairwise
    conditional independence among many variables. We propose Graphical Random
    Forests (GRaFo) for estimating pairwise conditional independence relationships
    among mixed-type, i.e. continuous and discrete, variables. The number of edges
    is a tuning parameter in any graphical model estimator and there is no obvious
    number that constitutes a good choice. Stability Selection helps choose this
    parameter with respect to a bound on the expected number of false positives
    (error control).

  115. Phylogenetic Ornstein-Uhlenbeck regression curves.

    Authors: Vasileios Maroulas, Dwueng-Chwuan Jhwueng
    Subjects: Methodology
    Abstract

    A novel method is developed to jointly estimate regression curves, with
    application to evolutionary biology for studying trait relationships. The adaptive
    evolution model is built on a coupled system of Ornstein-Uhlenbeck processes.
    Our method is then applied to a set of ecological data and it is compared with
    the recent regression method established in [9].

  116. Penalized Q-Learning for Dynamic Treatment Regimes.

    Authors: Rui Song, Michael R. Kosorok, Weiwei Wang, Donglin Zeng
    Subjects: Methodology
    Abstract

    A dynamic treatment regime effectively incorporates both accrued information
    and long-term effects of treatment from specially designed clinical trials. As
    these become more and more popular in conjunction with longitudinal data from
    clinical studies, the development of statistical inference for optimal dynamic
    treatment regimes is a high priority.

  117. On least favorable configurations for step-up-down tests.

    Authors: Gilles Blanchard, Etienne Roquain, Fanny Villers, Thorsten Dickhaus
    Subjects: Methodology
    Abstract

    This paper investigates an open issue related to false discovery rate (FDR)
    control of step-up-down (SUD) multiple testing procedures. It has been
    established in earlier literature that for this type of procedure, under some
    broad conditions, and in an asymptotical sense, the FDR is maximum when the
    signal strength under the alternative is maximum. In other words, so-called
    "Dirac uniform configurations" are asymptotically least favorable in this
    setting.

  118. Function Based Nonlinear Least Squares and Application to Jelinski--Moranda Software Reliability Model.

    Authors: Jingwei Liu, Meizhi Xu
    Subjects: Methodology
    Abstract

    A function based nonlinear least squares estimation (FNLSE) method is
    proposed and investigated in parameter estimation of Jelinski-Moranda software
    reliability model. FNLSE extends the potential fitting functions of traditional
    least squares estimation (LSE), and takes the logarithm transformed nonlinear
    least squares estimation (LogLSE) as a special case.

  119. Classification Based on Permanental Process with Cyclic Approximations.

    Authors: Jie Yang, Klaus Miescke, Peter McCullagh
    Subjects: Methodology
    Abstract

    In this paper we introduce a statistical model based on a permanental process
    for supervised classification problems. Unlike much of the research work in the
    literature, we assume only exchangeability instead of independence on
    observations. Regardless of the number of classes or the dimension of the
    feature variables, the model may require only 2-3 parameters for fitting the
    covariance structure within clusters. It works well even if each class occupies
    non-convex, disjoint regions, or regions overlapped with other classes in the
    feature space.

  120. The Importance of Prior Choice in Model Selection: a Density Dependence Example.

    Authors: James D. Lawrence, Dr. Robert B. Gramacy, Dr. Len Thomas, Prof. Stephen T. Buckland
    Subjects: Methodology
    Abstract

    We perform a Bayesian analysis on abundance data for ten species of North
    American duck, using the results to investigate the evidence in favour of
    biologically motivated hypotheses about the causes and mechanisms of density
    dependence in these species. We explore the capabilities of our methods to
    detect density dependent effects, both by simulation and through analyses of
    real data. The effect of the prior choice on predictive accuracy is also
    examined.

  121. Compound p-Value Statistics for Multiple Testing Procedures.

    Authors: Edsel A. Pena, Joshua D. Habiger
    Subjects: Methodology
    Abstract

    Many multiple testing procedures make use of the p-values from the individual
    pairs of hypothesis tests, and are valid if the p-value statistics are
    independent and uniformly distributed under the null hypotheses.

  122. Chi-square and classical exact tests often wildly misreport significance; the remedy lies in computers.

    Authors: Mark Tygert, Rachel Ward, William Perkins
    Subjects: Methodology
    Abstract

    If a discrete probability distribution in a model being tested for
    goodness-of-fit is not close to uniform, then forming the Pearson chi-square
    statistic can involve division by nearly zero. This often leads to serious
    trouble in practice -- even in the absence of round-off errors -- as the
    present article illustrates via numerous examples.
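
    The division-by-nearly-zero effect can be seen in a minimal sketch. The
    three-bin model, the sample size of 100, and the function name below are
    hypothetical choices made for illustration only; they are not taken from the
    article.

```python
def pearson_chi_square(observed, probs):
    """Pearson statistic: sum over bins of (O_i - n*p_i)^2 / (n*p_i)."""
    n = sum(observed)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, probs))


# Hypothetical model: two common bins and one very rare bin (far from uniform).
probs = [0.4995, 0.4995, 0.001]

# Two samples of size 100 that differ only by one draw landing in the rare bin.
without_rare = [50, 50, 0]
with_rare = [50, 49, 1]

stat_without = pearson_chi_square(without_rare, probs)
stat_with = pearson_chi_square(with_rare, probs)

# Contribution of the rare bin alone when it receives a single draw.
rare_term = (1 - 100 * 0.001) ** 2 / (100 * 0.001)

print(stat_without, stat_with, rare_term)
```

    In this sketch a single draw in the rare bin moves the statistic from roughly
    0.1 to roughly 8.1, and almost all of the larger value comes from the term
    whose expected count is 0.1.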

  123. Rejoinder.

    Authors: Carl Morris
    Subjects: Methodology
    Abstract

    Rejoinder of "Estimating Random Effects via Adjustment for Density
    Maximization" by C. Morris and R. Tang [arXiv:1108.3234]

  124. Discussion of "Estimating Random Effects via Adjustment for Density Maximization" by C. Morris and R. Tang.

    Authors: P. Lahiri, Santanu Pramanik
    Subjects: Methodology
    Abstract

    Discussion of "Estimating Random Effects via Adjustment for Density
    Maximization" by C. Morris and R. Tang [arXiv:1108.3234]

  125. Discussion of "Estimating Random Effects via Adjustment for Density Maximization" by C. Morris and R. Tang.

    Authors: George Casella, Claudio Fuentes
    Subjects: Methodology
    Abstract

    Discussion of "Estimating Random Effects via Adjustment for Density
    Maximization" by C. Morris and R. Tang [arXiv:1108.3234]

  126. Rejoinder.

    Authors: J. N. K. Rao
    Subjects: Methodology
    Abstract

    Rejoinder of "Impact of Frequentist and Bayesian Methods on Survey Sampling
    Practice: A Selective Appraisal" by J. N. K. Rao [arXiv:1108.2356]

  127. Discussion of "Impact of Frequentist and Bayesian Methods on Survey Sampling Practice: A Selective Appraisal" by J. N. K. Rao.

    Authors: Glen Meeden
    Subjects: Methodology
    Abstract

    Discussion of "Impact of Frequentist and Bayesian Methods on Survey Sampling
    Practice: A Selective Appraisal" by J. N. K. Rao [arXiv:1108.2356]

  128. Sampling from a Bayesian Menu.

    Authors: Alan M. Zaslavsky
    Subjects: Methodology
    Abstract

    Discussion of "Bayesian Models and Methods in Public Policy and Government
    Settings" by S. E. Fienberg [arXiv:1108.2177]

  129. Rejoinder.

    Authors: Stephen E. Fienberg
    Subjects: Methodology
    Abstract

    Rejoinder of "Bayesian Models and Methods in Public Policy and Government
    Settings" by S. E. Fienberg [arXiv:1108.2177]

  130. Discussion of "Impact of Frequentist and Bayesian Methods on Survey Sampling Practice: A Selective Appraisal" by J. N. K. Rao.

    Authors: J. Sedransk
    Subjects: Methodology
    Abstract

    This comment emphasizes the importance of model checking and model fitting
    when making inferences about finite population quantities. It also suggests the
    value of using unit level models when making inferences for small
    subpopulations, that is, "small area" analyses [arXiv:1108.2356].

  131. Discussion of "Impact of Frequentist and Bayesian Methods on Survey Sampling Practice: A Selective Appraisal" by J. N. K. Rao.

    Authors: Eric Slud
    Subjects: Methodology
    Abstract

    Discussion of "Impact of Frequentist and Bayesian Methods on Survey Sampling
    Practice: A Selective Appraisal" by J. N. K. Rao [arXiv:1108.2356]

  132. Discussion of "Bayesian Models and Methods in Public Policy and Government Settings" by S. E. Fienberg.

    Authors: Graham Kalton
    Subjects: Methodology
    Abstract

    Discussion of "Bayesian Models and Methods in Public Policy and Government
    Settings" by S. E. Fienberg [arXiv:1108.2177]

  133. Shrinkage Estimation and Selection for Multiple Functional Regression.

    Authors: Heng Lian
    Subjects: Methodology
    Abstract

    Functional linear regression is a useful extension of simple linear
    regression and has been investigated by many researchers. However, the functional
    variable selection problem when multiple functional observations exist, which
    is the counterpart in the functional context of multiple linear regression, is
    seldom studied. Here we propose a method using group smoothly clipped absolute
    deviation penalty (gSCAD) which can perform regression estimation and variable
    selection simultaneously.

  134. Discussion of "Bayesian Models and Methods in Public Policy and Government Settings" by S. E. Fienberg.

    Authors: David J. Hand
    Subjects: Methodology
    Abstract

    Fienberg convincingly demonstrates that Bayesian models and methods represent
    a powerful approach to squeezing illumination from data in public policy
    settings. However, no school of inference is without its weaknesses, and, in
    the face of the ambiguities, uncertainties, and poorly posed questions of the
    real world, perhaps we should not expect to find a formally correct inferential
    strategy which can be universally applied, whatever the nature of the question:
    we should not expect to be able to identify a "norm" approach.

  135. Assortment Optimization Under General Choice.

    Authors: Devavrat Shah, Srikanth Jagabathula, Vivek Farias
    Subjects: Methodology
    Abstract

    We consider the problem of static assortment optimization, where the goal is
    to find the assortment of size at most $C$ that maximizes revenues. This is a
    fundamental decision problem in the area of Operations Management. It has been
    shown that this problem is provably hard for most of the important families of
    parametric choice models, except the multinomial logit (MNL) model. In
    addition, most of the approximation schemes proposed in the literature are
    tailored to a specific parametric structure.

  136. Hyper-g Priors for Generalised Additive Model Selection with Penalised Splines.

    Authors: Daniel Sabanés Bové, Leonhard Held, Göran Kauermann
    Subjects: Methodology
    Abstract

    We propose an automatic Bayesian approach to the selection of covariates and
    their penalised splines transformations in generalised additive models.
    Specification of a default, hyper-g prior for the model parameters and a
    multiplicity-correction prior for the models themselves is crucial for this
    task. We introduce the methodology in the normal model and extend it to
    non-normal exponential families. Two applications from the literature
    illustrate the proposed approach. An efficient implementation is available in
    an R-package.

  137. Rejoinder.

    Authors: Malay Ghosh
    Subjects: Methodology
    Abstract

    Rejoinder of "Objective Priors: An Introduction for Frequentists" by M.
    Ghosh [arXiv:1108.2120]

  138. Discussion of "Objective Priors: An Introduction for Frequentists" by M. Ghosh.

    Authors: Trevor Sweeting
    Subjects: Methodology
    Abstract

    Discussion of "Objective Priors: An Introduction for Frequentists" by M.
    Ghosh [arXiv:1108.2120]

  139. Discussion of "Objective Priors: An Introduction for Frequentists" by M. Ghosh.

    Authors: José M. Bernardo
    Subjects: Methodology
    Abstract

    Discussion of "Objective Priors: An Introduction for Frequentists" by M.
    Ghosh [arXiv:1108.2120]

  140. Rejoinder.

    Authors: Roderick Little
    Subjects: Methodology
    Abstract

    Rejoinder of "Calibrated Bayes, for Statistics in General, and Missing Data
    in Particular" by R. Little [arXiv:1108.1917]

  141. Likelihood-Free Parallel Tempering.

    Authors: Meili Baragatti, Agnès Grimaud, Denys Pommeret
    Subjects: Methodology
    Abstract

    Approximate Bayesian Computational (ABC) methods (or likelihood-free methods)
    have appeared in the past fifteen years as useful methods to perform Bayesian
    analyses when the likelihood is analytically or computationally intractable.
    Several ABC methods have been proposed: Monte Carlo Markov Chains (MCMC)
    methods have been developed by Marjoram et al. (2003) and by Bortot et al.
    (2007) for instance, and sequential methods have been proposed among others by
    Sisson et al. (2007), Beaumont et al. (2009) and Del Moral et al. (2009).

  142. Discussion of "Calibrated Bayes, for Statistics in General, and Missing Data in Particular" by R. J. A. Little.

    Authors: Nathaniel Schenker
    Subjects: Methodology
    Abstract

    Discussion of "Calibrated Bayes, for Statistics in General, and Missing Data
    in Particular" by R. Little [arXiv:1108.1917]

  143. Discussion of "Calibrated Bayes, for Statistics in General, and Missing Data in Particular" by R. J. A. Little.

    Authors: Michael D. Larsen
    Subjects: Methodology
    Abstract

    Discussion of "Calibrated Bayes, for Statistics in General, and Missing Data
    in Particular" by R. Little [arXiv:1108.1917]

  144. Bayesian Inference in Nonparametric Dynamic State-Space Models.

    Authors: Sourabh Bhattacharya, Anurag Ghosh, Soumalya Mukhopadhyay, Sandipan Roy
    Subjects: Methodology
    Abstract

    We introduce state-space models where the functionals of the observational
    and the evolutionary equations are unknown, and treated as random functions
    evolving with time. Thus, our model is nonparametric and generalizes parametric
    state-space models, such as the extended Kalman filter. This random function
    approach also frees us from the restrictive assumption that the functional
    forms, although time-dependent, are of fixed forms.

  145. Estimating Random Effects via Adjustment for Density Maximization.

    Authors: Carl Morris, Ruoxi Tang
    Subjects: Methodology
    Abstract

    We develop and evaluate point and interval estimates for the random effects
    $\theta_i$, having made observations
    $y_i|\theta_i\stackrel{\mathit{ind}}{\sim}N[\theta_i,V_i], i=1,\ldots,k$ that
    follow a two-level Normal hierarchical model. Fitting this model requires
    assessing the Level-2 variance $A\equiv\operatorname{Var}(\theta_i)$ to
    estimate shrinkages $B_i\equiv
    V_i/(V_i+A)$ toward a (possibly estimated) subspace, with $B_i$ as the target
    because the conditional means and variances of $\theta_i$ depend linearly on
    $B_i$, not on $A$.
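
    The linear dependence on $B_i$ can be sketched under simplifying assumptions
    that are not part of the abstract: $A$ is treated as known and every
    $\theta_i$ shares a single known prior mean `mu`. Under those assumptions the
    standard Normal-Normal result gives conditional mean $(1-B_i)y_i + B_i\mu$
    and conditional variance $V_i(1-B_i)$. The function name is hypothetical.

```python
def shrinkage_summary(y, V, A, mu):
    """Two-level Normal model with known A and common prior mean mu (assumed).

    Returns shrinkage factors B_i = V_i / (V_i + A) together with the
    conditional means and variances of theta_i given y_i.
    """
    B = [v / (v + A) for v in V]
    # Conditional mean is linear in B_i: (1 - B_i) * y_i + B_i * mu.
    post_mean = [(1 - b) * yi + b * mu for b, yi in zip(B, y)]
    # Conditional variance is also linear in B_i: V_i * (1 - B_i).
    post_var = [v * (1 - b) for b, v in zip(B, V)]
    return B, post_mean, post_var


# Illustrative inputs only: two units with different sampling variances.
B, post_mean, post_var = shrinkage_summary(y=[10.0, 0.0], V=[1.0, 4.0], A=1.0, mu=2.0)
print(B, post_mean, post_var)
```

    The noisier unit (larger $V_i$) gets the larger $B_i$ and is pulled more
    strongly toward `mu`.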

  146. Statistical Analysis in Genetic Studies of Mental Illnesses.

    Authors: Heping Zhang
    Subjects: Methodology
    Abstract

    Identifying the risk factors for mental illnesses is of significant public
    health importance. Diagnosis, stigma associated with mental illnesses,
    comorbidity, and complex etiologies, among others, make it very challenging to
    study mental disorders. Genetic studies of mental illnesses date back at least
    a century, beginning with descriptive studies based on Mendelian laws of
    inheritance.

  147. Adaptive sequential Monte Carlo by means of mixture of experts.

    Authors: J. Cornebise, E. Moulines, J. Olsson
    Subjects: Methodology
    Abstract

    Selecting appropriately the proposal kernel of particle filters is an issue
    of significant importance, since a bad choice may lead to deterioration of the
    particle sample and, consequently, waste of computational power. In this paper
    we introduce a novel algorithm approximating adaptively the so-called optimal
    proposal kernel by a mixture of integrated curved exponential distributions
    with logistic weights. This family of distributions is broad enough to be used
    in the presence of multi-modality or strongly skewed distributions.

  148. Local degeneracy of Markov chain Monte Carlo methods.

    Authors: Kengo Kamatani
    Subjects: Methodology
    Abstract

    We study the asymptotic behavior of Monte Carlo methods. Local consistency
    is one of the ideal properties of a Monte Carlo method. However, local
    consistency may fail to hold for several reasons. In fact, in practice, it is
    more important to study such non-ideal behavior. We call one such non-ideal
    behavior of Monte Carlo methods local degeneracy. We show some equivalent
    conditions for local degeneracy.

  149. Impact of Frequentist and Bayesian Methods on Survey Sampling Practice: A Selective Appraisal.

    Authors: J. N. K. Rao
    Subjects: Methodology
    Abstract

    According to Hansen, Madow and Tepping [J. Amer. Statist. Assoc. 78 (1983)
    776--793], "Probability sampling designs and randomization inference are widely
    accepted as the standard approach in sample surveys." In this article, reasons
    are advanced for the wide use of this design-based approach, particularly by
    federal agencies and other survey organizations conducting complex large scale
    surveys on topics related to public policy.

  150. Bayesian Models and Methods in Public Policy and Government Settings.

    Authors: Stephen E. Fienberg
    Subjects: Methodology
    Abstract

    Starting with the neo-Bayesian revival of the 1950s, many statisticians
    argued that it was inappropriate to use Bayesian methods, and in particular
    subjective Bayesian methods in governmental and public policy settings because
    of their reliance upon prior distributions. But the Bayesian framework often
    provides the primary way to respond to questions raised in these settings and
    the numbers and diversity of Bayesian applications have grown dramatically in
    recent years.

  151. Intrinsic Means on the Circle: Uniqueness, Locus and Asymptotics.

    Authors: Thomas Hotz, Stephan Huckemann
    Subjects: Methodology
    Abstract

    This paper gives a comprehensive treatment of local uniqueness, asymptotics
    and numerics for intrinsic means on the circle. It turns out that local
    uniqueness as well as rates of convergence are governed by the distribution
    near the antipode. In a nutshell, if the distribution there is locally less
    than uniform, we have local uniqueness and asymptotic normality with a rate of
    $1/\sqrt{n}$. With increased proximity to the uniform distribution the rate can
    be arbitrarily slow, and in the limit, local uniqueness is lost. Further, we
    give general distributional conditions, e.g.

  152. Objective Priors: An Introduction for Frequentists.

    Authors: Malay Ghosh
    Subjects: Methodology
    Abstract

    Bayesian methods are increasingly applied these days in the theory and
    practice of statistics. Any Bayesian inference depends on a likelihood and a
    prior. Ideally one would like to elicit a prior from related sources of
    information or past data. However, in its absence, Bayesian methods need to
    rely on some "objective" or "default" priors, and the resulting posterior
    inference can still be quite valuable. Not surprisingly, over the years, the
    catalog of objective priors also has become prohibitively large, and one has to
    set some specific criteria for the selection of such priors.

  153. Fully Bayes Factors with a Generalized g-prior.

    Authors: Yuzo Maruyama, Edward I. George
    Subjects: Methodology
    Abstract

    For the normal linear model variable selection problem, we propose selection
    criteria based on a fully Bayes formulation with a generalization of Zellner's
    g-prior which allows for p > n. A special case of the prior formulation is seen
    to yield tractable closed forms for marginal densities and Bayes factors which
    reveal new model evaluation characteristics of potential interest.

  154. Bias-corrected GEE estimation and smooth-threshold GEE variable selection for single-index models with clustered data.

    Authors: Heng Lian, Peng Lai, Qihua Wang
    Subjects: Methodology
    Abstract

    In this paper, we present a generalized estimating equations based estimation
    approach and a variable selection procedure for single-index models when the
    observed data are clustered. Unlike the case of independent observations,
    bias-correction is necessary when general working correlation matrices are used
    in the estimating equations.

  155. Incorporating Individual and Collective Ethics into Phase I Cancer Trial Designs.

    Authors: Jay Bartroff, Tze Leung Lai
    Subjects: Methodology
    Abstract

    A general framework is proposed for Bayesian model-based designs of Phase I
    cancer trials, in which a general criterion for coherence (Cheung, 2005) of a
    design is also developed. This framework can incorporate both "individual" and
    "collective" ethics into the design of the trial. We propose a new design which
    minimizes a risk function composed of two terms, with one representing the
    individual risk of the current dose and the other representing the collective
    risk.

  156. Improved testing inference in mixed linear models.

    Authors: Silvia L.P. Ferrari, Francisco Cribari-Neto, Tatiane F.N. Melo
    Subjects: Methodology
    Abstract

    Mixed linear models are commonly used in repeated measures studies. They
    account for the dependence amongst observations obtained from the same
    experimental unit. Oftentimes, the number of observations is small, and it is
    thus important to use inference strategies that incorporate small sample
    corrections. In this paper, we develop modified versions of the likelihood
    ratio test for fixed effects inference in mixed linear models. In particular,
    we derive a Bartlett correction to such a test and also to a test obtained from
    a modified profile likelihood function.

  157. Minimum power divergence estimators, maximum likelihood and exponential families.

    Authors: Michel Broniatowski
    Subjects: Methodology
    Abstract

    This note introduces the dual representation of the divergence between two
    distributions in a parametric model. Resulting estimators for the divergence as
    for the parameter are derived. These estimators do not make use of any grouping
    nor smoothing. It is proved that all power-type divergences with non negative
    index induce the same estimator of the parameter on any exponential family,
    which is the MLE.

  158. An Empirical Likelihood Approach to Nonparametric Covariate Adjustment in Randomized Clinical Trials.

    Authors: Zhiliang Ying, Xiaoru Wu
    Subjects: Methodology
    Abstract

    Covariate adjustment is an important tool in the analysis of randomized
    clinical trials and observational studies. It can be used to increase
    efficiency and thus power, and to reduce possible bias. While most statistical
    tests in randomized clinical trials are nonparametric in nature, approaches for
    covariate adjustment typically rely on specific regression models, such as the
    linear model for a continuous outcome, the logistic regression model for a
    dichotomous outcome and the Cox model for survival time. Several recent efforts
    have focused on model-free covariate adjustment.

  159. Measurement and Application of Entropy Production Rate in Human Subject Social Interaction Systems.

    Authors: Bin Xu, Zhijian Wang
    Subjects: Methodology
    Abstract

    This paper illustrates the measurement and the applications of the
    observable, entropy production rate (EPR), in human subject social interaction
    systems. To this end, we show (1) how to test the minimax randomization model
    with experimental economics' 2$\times$2 games data and with the Wimbledon
    Tennis data; (2) how to identify the Edgeworth price cycle in experimental
    market data; and (3) the relationship between EPR and motion in data.

  160. Multiplicative Propagation of Error During Recursive Wavelet Estimation.

    Authors: Michael A. Cohen, Can Ozan Tan
    Subjects: Methodology
    Abstract

    Wavelet coefficients are estimated recursively at progressively coarser
    scales. As a result, the estimation is prone to multiplicative
    propagation of truncation errors due to quantization and round-off at each
    stage. Yet, the influence of this propagation on wavelet filter output has not
    been explored systematically.
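
    As a rough illustration of the recursion described above, the sketch below
    computes an unnormalized Haar pyramid, where each level is built from the
    averages of the previous one. The Haar filter, the function name
    haar_pyramid and the optional digits argument (rounding after every stage
    to mimic quantization) are assumptions made for this example; the abstract
    does not say which wavelet filter or error model the authors use.

```python
def haar_pyramid(signal, digits=None):
    """Recursive (unnormalized) Haar decomposition of a power-of-two signal.

    Hypothetical helper for illustration only. If `digits` is given, every
    intermediate value is rounded to that many decimals, so rounding error
    introduced at one scale is carried into all coarser scales.
    """
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of two")

    def quantize(value):
        return value if digits is None else round(value, digits)

    approximation = [float(v) for v in signal]
    details = []  # details[0] is the finest scale
    while len(approximation) > 1:
        pairs = list(zip(approximation[0::2], approximation[1::2]))
        details.append([quantize((a - b) / 2.0) for a, b in pairs])
        # The next level is computed from these (possibly rounded) averages.
        approximation = [quantize((a + b) / 2.0) for a, b in pairs]
    return approximation[0], details


coarse, details = haar_pyramid([4, 6, 10, 12])
```

    Comparing the output with and without the digits argument on the same
    input gives a simple view of how stage-wise rounding accumulates.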

  161. Estimating Extremal Dependence in Univariate and Multivariate Time Series via the Extremogram.

    Authors: Thomas Mikosch, Richard A. Davis, Ivor Cribben
    Subjects: Methodology
    Abstract

    Davis and Mikosch [7] introduced the extremogram as a flexible quantitative
    tool for measuring various types of extremal dependence in a stationary time
    series. There we showed some standard statistical properties of the sample
    extremogram. A major difficulty was the construction of credible confidence
    bands for the extremogram. In this paper, we employ the stationary bootstrap to
    overcome this problem. Moreover, we introduce the cross extremogram as a
    measure of extremal serial dependence between two or more time series.

  162. Measuring Association between Random Vectors.

    Authors: Johan Segers, Oliver Grothe, Friedrich Schmid, Julius Schnieders
    Subjects: Methodology
    Abstract

    This paper suggests five measures of association between two random vectors X
    = (X_1, ..., X_p) and Y = (Y_1, ..., Y_q). They are copula based and therefore
    invariant with respect to the marginal distributions of the components X_i and
    Y_j. The measures capture positive as well as negative association of X and Y.
    In case p = q = 1 they reduce to Spearman's rho. Various properties of these
    new measures are investigated. Nonparametric estimators, based on ranks, for
    the measures are derived and their small sample behaviour is investigated by
    simulation.
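
    For the special case p = q = 1 mentioned above, a rank-based estimate of
    Spearman's rho can be sketched as the Pearson correlation of the two rank
    vectors. The function names ranks and spearman_rho are chosen for this
    example only, and the code does not implement the five vector-valued
    measures proposed in the paper.

```python
def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman_rho(x, y):
    """Sample Spearman's rho: Pearson correlation of the ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one sample is constant")
    return cov / (var_x * var_y) ** 0.5
```

    Because only ranks enter the computation, applying a strictly increasing
    transformation to either sample leaves the estimate unchanged, which is the
    invariance to marginal distributions that the abstract refers to.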

  163. Max-stable processes for modelling extremes observed in space and time.

    Authors: Richard A. Davis, Claudia Klüppelberg, Christina Steinkohl
    Subjects: Methodology
    Abstract

    Max-stable processes have proved to be useful for the statistical modelling
    of spatial extremes. Several representations of max-stable random fields have
    been proposed in the literature. For statistical inference it is often assumed
    that there is no temporal dependence, i.e., the observations at spatial
    locations are independent in time. We use two representations of stationary
    max-stable spatial random fields and extend the concepts to the space-time
    domain.

  164. Detection with the scan and the average likelihood ratio.

    Authors: Hock Peng Chan, Guenther Walther
    Subjects: Methodology
    Abstract

    We investigate the performance of the scan (maximum likelihood ratio
    statistic) and of the average likelihood ratio statistic in the problem of
    detecting a deterministic signal with unknown spatial extent in the
    prototypical univariate sampled data model with white Gaussian noise. Our
    results show that the scan statistic, a popular tool for detection problems, is
    optimal only for the detection of signals with the smallest spatial extent. For
    signals with larger spatial extent the scan is suboptimal, and the power loss
    can be considerable.

  165. Sequential Monte Carlo EM for multivariate probit models.

    Authors: Giusi Moffa, Jack Kuipers
    Subjects: Methodology
    Abstract

    A Monte Carlo EM algorithm is considered for the maximum likelihood
    estimation of multivariate probit models. To sample from truncated multivariate
    normals we introduce a sequential Monte Carlo approach, while to improve the
    efficiency in driving the sample particles to the truncation region Student $t$
    distributions are invoked before taking their limit to a normal. After the
    initial sampling, a sequential Monte Carlo step can be performed to shift to
    new parameter values, recycling the samples and so reducing the computational
    cost.

  166. A Bayesian Surrogate Model for Rapid Time Series Analysis and Application to Exoplanet Observations.

    Authors: Eric B. Ford, Althea V. Moorhead, Dimitri Veras
    Subjects: Methodology
    Abstract

    We present a Bayesian surrogate model for the analysis of periodic or
    quasi-periodic time series data. We describe a computationally efficient
    implementation that enables Bayesian model comparison. We apply this model to
    simulated and real exoplanet observations. We discuss the results and
    demonstrate some of the challenges for applying our surrogate model to
    realistic exoplanet data sets. In particular, we find that analyses of real
    world data should pay careful attention to the effects of uneven spacing of
    observations and the choice of prior for the "jitter" parameter.

  167. A new semi-parametric family of probability distributions for survival analysis.

    Authors: Jean-Michel Marin, Damien Bousquet, Jean-Pierre Daurès
    Subjects: Methodology
    Abstract

    In the context of survival analysis, Marshall and Olkin (1997) introduced
    families of distributions by adding a scalar parameter to a given survival
    function, parameterized or not. In this paper, we generalize their approach. We
    show how it is possible to add more than a single parameter to a given
    distribution. We then introduce very flexible families of distributions for
    which we calculate some moments. Notably, we give some tractable expressions of
    these moments when the given baseline distribution is Log-logistic.

  168. Maximum likelihood estimation and confidence bands for a discrete log-concave distribution.

    Authors: Fadoua Balabdaoui, Kaspar Rufibach, Hanna Jankowski
    Subjects: Methodology
    Abstract

    The assumption of log-concavity is an attractive and flexible nonparametric
    shape constraint in distribution modelling. In this work, we study the maximum
    likelihood estimator (MLE) of a log-concave probability mass function. We show
    that the MLE is strongly consistent and derive pointwise asymptotic theory,
    which is used to calculate confidence bands for the true probability mass
    function. The proposed estimator and associated confidence bands may be easily
    computed using the R package logcondiscr.

  169. Varying-coefficient modeling via regularized basis functions.

    Authors: Shuichi Kawano, Hidetoshi Matsui, Toshihiro Misumi
    Subjects: Methodology
    Abstract

    We address the problem of constructing varying-coefficient models based on
    basis expansions along with the technique of regularization. A crucial point in
    our modeling procedure is the selection of smoothing parameters in the
    regularization method. In order to choose the parameters objectively, we derive
    model selection criteria from information-theoretic and Bayesian
    viewpoints. We demonstrate the effectiveness of the proposed modeling
    strategy through Monte Carlo simulations and the analysis of a real data set.

  170. A note on global Markov properties for mixed graphs.

    Authors: Michael Eichler
    Subjects: Methodology
    Abstract

    Global Markov properties in mixed graphs are usually formulated in terms of
    the path-oriented m-separation or by use of augmented graphs (similar to moral
    graphs in the case of directed acyclic graphs). We provide an alternative
    characterization that can be easily implemented.

  171. Sequential Lasso for feature selection with ultra-high dimensional feature space.

    Authors: Shan Luo, Zehua Chen
    Subjects: Methodology
    Abstract

    We propose a novel approach, Sequential Lasso, for feature selection in
    linear regression models with ultra-high dimensional feature spaces. We
    investigate in this article the asymptotic properties of Sequential Lasso and
    establish its selection consistency. Like other sequential methods, the
    implementation of Sequential Lasso is not limited by the dimensionality of the
    feature space. It has advantages over other sequential methods. Simulation
    studies comparing Sequential Lasso with other sequential methods are reported.

  172. Heavy tailed priors: an alternative to non-informative priors in the estimation of proportions on small areas.

    Authors: Jairo Fuquene, Brenda Betancourt
    Subjects: Methodology
    Abstract

    We explore the Cauchy and a new heavy tailed (Fuquene, Perez and Pericchi
    (2011)) priors to estimate proportions on small areas. Hierarchical models and
    the Binomial likelihood in the exponential family form are used. We believe
    that the heavy tailed priors in survey sampling settings could be more
    effective than the choice of noninformative priors to eliminate antipathy
    towards methods that involve subjective elements or assumptions. To illustrate
    the robust Bayesian approach, we apply this methodology in a popular example:
    "the clement problem".

  173. An EM Algorithm for Continuous-time Bivariate Markov Chains.

    Authors: Brian L. Mark, Yariv Ephraim
    Subjects: Methodology
    Abstract

    We study properties and parameter estimation of finite-state homogeneous
    continuous-time bivariate Markov chains. Only one of the two processes of the
    bivariate Markov chain is observable. The general form of the bivariate Markov
    chain studied here makes no assumptions on the structure of the generator of
    the chain, and hence, neither the underlying process nor the observable process
    is necessarily Markov. The bivariate Markov chain allows for simultaneous jumps
    of the underlying and observable processes. Furthermore, the inter-arrival time
    of observed events is phase-type.

  174. Nonparametric estimation of multivariate extreme-value copulas.

    Authors: Johan Segers, Gordon Gudendorf
    Subjects: Methodology
    Abstract

    Extreme-value copulas arise in the asymptotic theory for componentwise maxima
    of independent random samples. An extreme-value copula is determined by its
    Pickands dependence function, which is a function on the unit simplex subject
    to certain shape constraints that arise from an integral transform of an
    underlying measure called spectral measure. Multivariate extensions are
    provided of certain rank-based nonparametric estimators of the Pickands
    dependence function.

  175. Strengthened Chernoff-type variance bounds.

    Authors: G. Afendras, N. Papadatos
    Subjects: Methodology
    Abstract

    Let X be any absolutely continuous random variable from the integrated
    Pearson family and assume that X has finite moments of any order. Equivalently,
    X is a linear (non-constant) transformation of Y where Y follows a Normal, a
    Beta or a Gamma density. Using some properties of the orthonormal polynomial
    system corresponding to X we provide a class of strengthened Chernoff-type
    variance bounds. (A detailed review on orthogonal polynomials within the
    Pearson system is included in the Appendix.)

  176. Modelling outliers and structural breaks in dynamic linear models with a novel use of a heavy tailed prior for the variances: An alternative to the Inverted Gamma.

    Authors: Maria Perez, Jairo Fuquene, Luis Pericchi
    Subjects: Methodology
    Abstract

    In this paper we propose a new wider class of hypergeometric heavy tailed
    priors that are given as the convolution of a Student-t density for the
    location parameter and a Scaled Beta2 prior for the variance. These priors have
    heavier tails than the Student-t prior, and the variances have sensible behavior
    both at the origin and at the tail, making them suitable for objective analysis.
    Since the representation of our proposal is a scale mixture, it is suitable to
    detect sudden changes in the model.

  177. Approximate Propagation of both Epistemic and Aleatory Uncertainty through Dynamic Systems.

    Authors: Gabriel Terejanu, Tarunraj Singh, Peter D. Scott, Puneet Singla
    Subjects: Methodology
    Abstract

    When ignorance due to the lack of knowledge, modeled as epistemic uncertainty
    using Dempster-Shafer structures on closed intervals, is present in the model
    parameters, a new uncertainty propagation method is necessary to propagate both
    aleatory and epistemic uncertainty. The new framework proposed here combines
    both epistemic and aleatory uncertainty into a second-order uncertainty
    representation which is propagated through a dynamic system driven by white
    noise.

  178. Efficient Emulators of Computer Experiments Using Compactly Supported Correlation Functions, With an Application to Cosmology.

    Authors: Derek Bingham, Cari Kaufman, Salman Habib, Katrin Heitmann, Joshua Frieman
    Subjects: Methodology
    Abstract

    Statistical emulators of computer simulators have proven to be useful in a
    variety of applications.

  179. The Lasso, correlated design, and improved oracle inequalities.

    Authors: Sara van de Geer, Johannes Lederer
    Subjects: Methodology
    Abstract

    We study high-dimensional linear models and the $\ell_1$-penalized least
    squares estimator, also known as the Lasso estimator. In the literature, oracle
    inequalities have been derived under restricted eigenvalue or compatibility
    conditions. In this paper, we complement this with entropy conditions which
    allow one to improve the dual norm bound, and demonstrate how this leads to new
    oracle inequalities. The new oracle inequalities show that a smaller choice for
    the tuning parameter and a trade-off between $\ell_1$-norms and small
    compatibility constants are possible.

  180. On false discovery rate thresholding for classification under sparsity.

    Authors: Etienne Roquain, Pierre Neuvial
    Subjects: Methodology
    Abstract

    We study the properties of false discovery rate (FDR) thresholding, viewed as
    a classification procedure. The "0"-class (null) is assumed to have a known,
    symmetric log-concave density while the "1"-class (alternative) is obtained
    from the "0"-class either by translation (location model) or by scaling (scale
    model). Furthermore, the "1"-class is assumed to have a small number of
    elements w.r.t. the "0"-class (sparsity). Non-asymptotic oracle inequalities
    are derived for the excess risk of FDR thresholding.

  181. Monte Carlo algorithms for model assessment via conflicting summaries.

    Authors: Christian Robert, Oliver Ratmann, Sylvia Richardson, Pierre Pudlo
    Subjects: Methodology
    Abstract

    The development of statistical methods and numerical algorithms for model
    choice is vital to many real-world applications. In practice, the ABC approach
    can be instrumental for sequential model design; however, the theoretical basis
    of its use has been questioned. We present a measure-theoretic framework for
    using the ABC error towards model choice and describe how easily existing
    rejection, Metropolis-Hastings and sequential importance sampling ABC
    algorithms are extended for the purpose of model checking.

  182. Grouped Variable Selection via Nested Spike and Slab Priors.

    Authors: Tso-Jung Yen, Yu-Min Yen
    Subjects: Methodology
    Abstract

    In this paper we study grouped variable selection problems by proposing a
    specified prior, called the nested spike and slab prior, to model collective
    behavior of regression coefficients. At the group level, the nested spike and
    slab prior puts positive mass on the event that the l2-norm of the grouped
    coefficients is equal to zero. At the individual level, each coefficient is
    assumed to follow a spike and slab prior.

  183. Nonasymptotic bounds on the estimation error of MCMC algorithms.

    Authors: Krzysztof Latuszynski, Blazej Miasojedow, Wojciech Niemiro
    Subjects: Methodology
    Abstract

    We address the problem of upper bounding the mean square error of MCMC
    estimators. Our analysis is non-asymptotic. We first establish a general result
    valid for essentially all ergodic Markov chains encountered in Bayesian
    computation and a possibly unbounded target function $f$. The bound is sharp in
    the sense that the leading term is exactly $\sigma^2_{\mathrm{as}}/n$, where $\sigma^2_{\mathrm{as}}$ is the
    CLT asymptotic variance. Next, we proceed to specific assumptions and give
    explicit computable bounds for geometrically and polynomially ergodic Markov
    chains.

  184. Re-calibration of sample means.

    Authors: Y. Ritov, E. Greenshtein
    Subjects: Methodology
    Abstract

    We consider the problem of calibration and the GREG method as suggested and
    studied in Deville and Sarndal (1992). We show that a GREG type estimator is
    typically not a minimum variance unbiased estimator, even asymptotically. We
    suggest a similar estimator which is unbiased and asymptotically attains
    minimal variance.

  185. Simple estimators of false discovery rates given as few as one or two p-values without strong parametric assumptions.

    Authors: David R. Bickel
    Subjects: Methodology
    Abstract

    Multiple comparison procedures that control a family-wise error rate or false
    discovery rate provide an achieved error rate as the adjusted p-value for each
    hypothesis tested. However, since such p-values are not probabilities that the
    null hypotheses are true, empirical Bayes methods have been devised to estimate
    such posterior probabilities, called local false discovery rates (LFDRs) to
    emphasize the frequency interpretation of their priors.

  186. Nonparametric Regression Estimation with Incomplete Data: Minimax Global Convergence Rates and Adaptivity.

    Authors: Marianna Pensky, Theofanis Sapatinas, Anestis Antoniadis
    Subjects: Methodology
    Abstract

    We consider the nonparametric regression estimation problem of recovering an
    unknown response function $f$ on the basis of incomplete data when the design
    points follow a known density $g$ with a finite number of well separated zeros.
    In particular, we consider two different cases: when $g$ has zeros of a
    polynomial order and when $g$ has zeros of an exponential order. These two
    cases correspond to moderate and severe data losses, respectively.

  187. Finite mixture models with predictive recursion marginal likelihood.

    Authors: Ryan Martin
    Subjects: Methodology
    Abstract

    Estimation of finite mixture models when the mixing distribution support is
    unknown is an important and challenging problem. In this paper, a new approach
    is given based on the recently proposed predictive recursion marginal
    likelihood (PRML) method. By taking a sufficiently fine grid as a set of
    candidate support points, one may treat the support itself as an unknown
    parameter to be estimated. The PRML approach asymptotically integrates out the
    mixing distribution itself, leaving an approximate marginal likelihood for the
    support, which can be used for estimation.

  188. Essentially ML ASN-Minimax double sampling plans.

    Authors: Eno Vangjeli
    Subjects: Methodology
    Abstract

    The subject of this paper is ASN-Minimax (AM) double sampling plans by variables
    for a normally distributed quality characteristic with unknown standard
    deviation and two-sided specification limits.

  189. Stochastic Search for Semiparametric Linear Regression Models.

    Authors: Richard Samworth, Lutz Duembgen, Dominic Schuhmacher
    Subjects: Methodology
    Abstract

    This paper introduces and analyzes a stochastic search method for parameter
    estimation in linear regression models in the spirit of Beran and Millar
    (1987). The idea is to generate a random finite subset of a parameter space
    which will automatically contain points which are very close to an unknown true
    parameter. The motivation for this procedure comes from recent work of
    Duembgen, Samworth and Schuhmacher (2011) on regression models with log-concave
    error distributions.

  190. Rejoinder.

    Authors: Robert E. Kass
    Subjects: Methodology
    Abstract

    Rejoinder to "Statistical Inference: The Big Picture" by R. E. Kass
    [arXiv:1106.2895]

  191. Discussion of "Statistical Inference: The Big Picture" by R. E. Kass.

    Authors: Hal Stern
    Subjects: Methodology
    Abstract

    Discussion of "Statistical Inference: The Big Picture" by R. E. Kass
    [arXiv:1106.2895]

  192. Discussion of "Statistical Inference: The Big Picture" by R. E. Kass.

    Authors: Robert McCulloch
    Subjects: Methodology
    Abstract

    Discussion of "Statistical Inference: The Big Picture" by R. E. Kass
    [arXiv:1106.2895]

  193. Discussion of "Statistical Inference: The Big Picture" by R. E. Kass.

    Authors: Steven N. Goodman
    Subjects: Methodology
    Abstract

    Discussion of "Statistical Inference: The Big Picture" by R. E. Kass
    [arXiv:1106.2895]

  194. Semiparametric inference in mixture models with predictive recursion marginal likelihood.

    Authors: Surya T. Tokdar, Ryan Martin
    Subjects: Methodology
    Abstract

    Predictive recursion is an accurate and computationally efficient algorithm
    for nonparametric estimation of mixing densities in mixture models. In
    semiparametric mixture models, however, the algorithm fails to account for any
    uncertainty in the additional unknown structural parameter. As an alternative
    to existing profile likelihood methods, we treat predictive recursion as a
    filter approximation to fitting a fully Bayes model, whereby an approximate
    marginal likelihood of the structural parameter emerges and can be used for
    inference.

  195. Variable Selection for Nonparametric Gaussian Process Priors: Models and Computational Strategies.

    Authors: Marina Vannucci, Terrance Savitsky, Naijun Sha
    Subjects: Methodology
    Abstract

    This paper presents a unified treatment of Gaussian process models that
    extends to data from the exponential dispersion family and to survival data.
    Our specific interest is in the analysis of data sets with predictors that have
    an a priori unknown form of possibly nonlinear associations to the response.
    The modeling approach we describe incorporates Gaussian processes in a
    generalized linear model framework to obtain a class of nonparametric
    regression models where the covariance matrix depends on the predictors. We
    consider, in particular, continuous, categorical and count responses.

  196. Estimation of covariance matrices based on hierarchical inverse-Wishart priors.

    Authors: Mathilde Bouriga, Olivier Féron
    Subjects: Methodology
    Abstract

    This paper focuses on Bayesian shrinkage for covariance matrix estimation. We
    examine posterior properties and frequentist risks of Bayesian estimators based
    on new hierarchical inverse-Wishart priors. More precisely, we give the
    existence conditions of the posterior distributions. Advantages in terms of
    numerical simulations of posteriors are shown. A simulation study illustrates
    the performance of the estimation procedures under three loss functions for
    relevant sample sizes and various covariance structures.

  197. Statistical Modeling of RNA-Seq Data.

    Authors: Julia Salzman, Hui Jiang, Wing Hung Wong
    Subjects: Methodology
    Abstract

    Recently, ultra high-throughput sequencing of RNA (RNA-Seq) has been
    developed as an approach for analysis of gene expression. By obtaining tens or
    even hundreds of millions of reads of transcribed sequences, an RNA-Seq
    experiment can offer a comprehensive survey of the population of genes
    (transcripts) in any sample of interest. This paper introduces a statistical
    model for estimating isoform abundance from RNA-Seq data and is flexible enough
    to accommodate both single end and paired end RNA-Seq data and sampling bias
    along the length of the transcript.

  198. Bayesian Statistical Pragmatism.

    Authors: Andrew Gelman
    Subjects: Methodology
    Abstract

    Discussion of "Statistical Inference: The Big Picture" by R. E. Kass
    [arXiv:1106.2895]

  199. Distortion risk measures for sums of dependent losses.

    Authors: Brahim Brahim, Djamel Meraghni, Abdelhakim Necir
    Subjects: Methodology
    Abstract

    We discuss two distinct approaches for distorting risk measures of sums of
    dependent random variables which preserve the property of coherence. The
    first, based on distorted expectations, operates on the survival function of
    the sum. The second simultaneously applies the distortion to the survival
    function of the sum and the dependence structure of risks, represented by
    copulas. Our goal is to propose risk measures that take into account the
    fluctuations of losses and possible correlations between risk components.

  200. Modern Sequential Analysis and its Applications to Computerized Adaptive Testing.

    Authors: Jay Bartroff, Tze Leung Lai, Matthew Finkelman
    Subjects: Methodology
    Abstract

    After a brief review of recent advances in sequential analysis involving
    sequential generalized likelihood ratio tests, we discuss their use in
    psychometric testing and extend the asymptotic optimality theory of these
    sequential tests to the case of sequentially generated experiments, of
    particular interest in computerized adaptive testing. We then show how these
    methods can be used to design adaptive mastery tests, which are asymptotically
    optimal and are also shown to provide substantial improvements over currently
    used sequential and fixed length tests.

  201. Resistant estimates for high dimensional and functional data based on random projections.

    Authors: Ricardo Fraiman, Marcela Svarc
    Subjects: Methodology
    Abstract

    In this paper we propose a new robust estimation method based on random
    projections which is adaptive and produces an automatic robust estimate, while
    being easy to compute for high or infinite dimensional data. Under some
    restricted contamination model, the procedure is robust and attains full
    efficiency. We challenge the method with some simulation data and we apply it
    to a real data example.

  202. Distribution fitting 13. Analysis of independent, multiplicative effect of factors. Application to effect of essential oils extracts from plant species on bacterial species. Application to factors of antibacterial activity of plant species.

    Authors: Lorentz Jäntschi, Sorana D. Bolboacă, Mugur C. Bălan, Radu E. Sestraş
    Subjects: Methodology
    Abstract

    A factor effect study was conducted on a set of observations at the
    contingency of a series of plant species and bacteria species regarding the
    antibacterial activity of essential oil extracts. The study reveals a very good
    agreement between the observations and the hypothesis of independent and
    multiplicative effect of plant and bacteria species factors on the
    antibacterial activity. Shaping of the observable to a Negative Binomial
    distribution allowed the separation of two convoluted Gamma distributions in
    the observable further assigned to the distribution of factors.

  203. Threshold estimation based on a p-value framework in dose-response and regression settings.

    Authors: George Michailidis, Bodhisattva Sen, Atul Mallik, Moulinath Banerjee
    Subjects: Methodology
    Abstract

    We use p-values to identify the threshold level at which a regression
    function takes off from its baseline value, a problem motivated by applications
    in toxicological and pharmacological dose-response studies and environmental
    statistics. We study the problem in two sampling settings: one where multiple
    responses can be obtained at a number of different covariate-levels and the
    other the standard regression setting involving a limited number of response
    values at each covariate.

  204. Confidence Regions for Means of Random Sets using Oriented Distance Functions.

    Authors: Hanna K. Jankowski, Larissa I. Stanberry
    Subjects: Methodology
    Abstract

    Image analysis frequently deals with shape estimation and image
    reconstruction. The objects of interest in these problems may be thought of as
    random sets, and one is interested in finding a representative, or expected,
    set. We consider a definition of set expectation using oriented distance
    functions and study the properties of the associated empirical set. Conditions
    are given such that the empirical average is consistent, and a method to
    calculate a confidence region for the expected set is introduced. The proposed
    method is applied to both real and simulated data examples.

  205. A data-based power transformation for compositional data.

    Authors: Michail T. Tsagris, Simon Preston, Andrew T.A. Wood
    Subjects: Methodology
    Abstract

    Compositional data analysis is carried out either by neglecting the
    compositional constraint and applying standard multivariate data analysis, or
    by transforming the data using the logs of the ratios of the components. In
    this work we examine a more general transformation which includes both
    approaches as special cases. It is a power transformation and involves a single
    parameter, $\alpha$. The transformation has two equivalent versions. The first
    is the stay-in-the-simplex version, which is the power transformation as
    defined by Aitchison in 1986.
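
    A minimal sketch of the stay-in-the-simplex version follows, assuming it
    takes the usual closed-power form $u_i = x_i^\alpha / \sum_j x_j^\alpha$.
    That formula and the function name power_transform are assumptions made for
    this example and are not quoted from the abstract; the second version of
    the transformation is not shown.

```python
def power_transform(composition, alpha):
    """Raise each part to the power alpha, then rescale so the parts sum to 1.

    Hypothetical helper: assumes strictly positive parts, since zero parts
    are not well behaved for alpha <= 0.
    """
    if any(part <= 0 for part in composition):
        raise ValueError("all parts of the composition must be positive")
    powered = [part ** alpha for part in composition]
    total = sum(powered)
    return [value / total for value in powered]


x = [0.2, 0.3, 0.5]
unchanged = power_transform(x, 1.0)  # alpha = 1 leaves a closed composition as is
centre = power_transform(x, 0.0)     # alpha = 0 maps every composition to equal parts
```

    Under this form the output always remains a composition (positive parts
    summing to one), which is what "stay-in-the-simplex" suggests.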

  206. Testing for homogeneity of variance in the wavelet domain.

    Authors: Eric Moulines, François Roueff, Olaf Kouamo
    Subjects: Methodology
    Abstract

    The danger of confusing long-range dependence with non-stationarity has been
    pointed out by many authors. Finding an answer to this difficult question is of
    importance to model time-series showing trend-like behavior, such as river
    run-off in hydrology, historical temperatures in the study of climate change,
    or packet counts in network traffic engineering. The main goal of this paper is
    to develop a test procedure to detect the presence of non-stationarity for a
    class of processes whose $K$-th order difference is stationary.

  207. Multivariate stratified sampling by stochastic multiobjective optimisation.

    Authors: Jose A. Diaz-Garcia, Rogelio Ramos-Quiroga
    Subjects: Methodology
    Abstract

    This work considers the allocation problem for multivariate stratified random
    sampling as a problem of integer non-linear stochastic multiobjective
    mathematical programming. With this goal in mind the asymptotic distribution of
    the vector of sample variances is studied. Two alternative approaches are
    suggested for solving the allocation problem for multivariate stratified random
    sampling. An example is presented by applying the different proposed
    techniques.

  208. Beta processes, stick-breaking, and power laws.

    Authors: Michael I. Jordan, Jim Pitman, Tamara Broderick
    Subjects: Methodology
    Abstract

    The beta-Bernoulli process provides a Bayesian nonparametric prior for models
    involving collections of binary-valued features. A draw from the beta process
    provides an infinite collection of probabilities in the unit interval, and a
    draw from the Bernoulli process turns these into binary-valued features. Recent
    work has shown how to derive stick-breaking representations for the beta
    process, by analogy to Sethuraman's derivation of a stick-breaking
    representation for the Dirichlet process.
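
    For orientation, the Dirichlet-process construction referred to here can be
    sketched as follows: break off a Beta(1, concentration) fraction of the
    remaining stick at each step. This is the Dirichlet-process case only, not the
    beta-process representation the paper studies; the function name, truncation
    level and seed are assumptions made for the example.

```python
import random


def stick_breaking_weights(concentration, num_sticks, rng):
    """Truncated stick-breaking weights for a Dirichlet process.

    Returns the first num_sticks weights and the length of stick left over.
    """
    weights = []
    remaining = 1.0
    for _ in range(num_sticks):
        fraction = rng.betavariate(1.0, concentration)  # share of what is left
        weights.append(remaining * fraction)
        remaining *= 1.0 - fraction
    return weights, remaining


rng = random.Random(0)  # fixed seed so the sketch is reproducible
weights, leftover = stick_breaking_weights(concentration=2.0, num_sticks=50, rng=rng)
```

    The weights and the leftover stick always sum to one, so truncating at a finite
    number of sticks only discards the mass held in the leftover piece.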

  209. Manifold embedding for curve registration.

    Authors: Jean-Michel Loubes, Chloé Dimeglio, Elie Maza
    Subjects: Methodology
    Abstract

    We focus on the problem of finding a good representative of a sample of
    random curves warped from a common pattern f. We first prove that such a
    problem can be moved onto a manifold framework. Then, we propose an estimation
    of the common pattern f based on an approximated geodesic distance on a
    suitable manifold. We then compare the proposed method to more classical
    methods.

  210. spikeSlabGAM: Bayesian Variable Selection, Model Choice and Regularization for Generalized Additive Mixed Models in R.

    Authors: Fabian Scheipl
    Subjects: Methodology
    Abstract

    The R package spikeSlabGAM implements Bayesian variable selection, model
    choice, and regularized estimation in (geo-)additive mixed models for Gaussian,
    binomial, and Poisson responses. Its purpose is to (1) choose an appropriate
    subset of potential covariates and their interactions, (2) determine whether
    linear or more flexible functional forms are required to model the effects of
    the respective covariates, and (3) estimate their shapes.

  211. Spike-and-Slab Priors for Function Selection in Structured Additive Regression Models.

    Authors: Fabian Scheipl, Ludwig Fahrmeir, Thomas Kneib
    Subjects: Methodology
    Abstract

    Structured additive regression provides a general framework for complex
    exponential family regression models, with predictors comprising arbitrary
    combinations of nonlinear functions and surfaces, spatial effects, varying
    coefficients, random effects and further regression terms. The large
    flexibility of structured additive regression makes function selection a
    challenging and important task, aiming at (1) selecting the relevant
    covariates, (2) choosing an appropriate and parsimonious representation of the
    impact of covariates on the predictor and (3) determining the required
    interactions.

  212. M-estimators for Isotonic Regression.

    Authors: Enrique E. Álvarez, Víctor J. Yohai
    Subjects: Methodology
    Abstract

    In this paper we propose a family of robust estimates for isotonic
    regression: isotonic M-estimators. We show that their asymptotic distribution
    is, up to a scalar factor, the same as that of Brunk's classical isotonic
    estimator. We also derive the influence function and the breakdown point of
    these estimates. Finally we perform a Monte Carlo study that shows that the
    proposed family includes estimators that are simultaneously highly efficient
    under Gaussian errors and highly robust when the error distribution has heavy
    tails.

  213. State-Observation Sampling and the Econometrics of Learning Models.

    Authors: Laurent E. Calvet, Veronika Czellar
    Subjects: Methodology
    Abstract

    In nonlinear state-space models, sequential learning about the hidden state
    can proceed by particle filtering when the density of the observation
    conditional on the state is available analytically (e.g. Gordon et al., 1993).
    This condition need not hold in complex environments, such as the
    incomplete-information equilibrium models considered in financial economics. In
    this paper, we make two contributions to the learning literature. First, we
    introduce a new filtering method, the state-observation sampling (SOS) filter,
    for general state-space models with intractable observation densities.

  214. High Dimensional Covariance Matrix Estimation in Approximate Factor Models.

    Authors: Jianqing Fan, Yuan Liao, Martina Mincheva
    Subjects: Methodology
    Abstract

    The variance covariance matrix plays a central role in the inferential
    theories of high dimensional factor models in finance and economics. Popular
    regularization methods of directly exploiting sparsity are not directly
    applicable to many financial problems. Classical methods of estimating the
    covariance matrices are based on the strict factor models, assuming independent
    idiosyncratic components. This assumption, however, is restrictive in practical
    applications.

  215. Corrected portmanteau tests for VAR models with time-varying variance.

    Authors: Valentin Patilea, Hamdi Raïssi
    Subjects: Methodology
    Abstract

    The problem of test of fit for Vector AutoRegressive (VAR) processes with
    unconditionally heteroscedastic errors is studied. The volatility structure is
    deterministic but time-varying and allows for changes that are commonly
    observed in economic or financial multivariate series. Our analysis is based on
    the residual autocovariances and autocorrelations obtained from Ordinary Least
    Squares (OLS), Generalized Least Squares (GLS) and Adaptive Least Squares (ALS)
    estimation of the autoregressive parameters.

  216. Recursive bias estimation for multivariate regression smoothers.

    Authors: P. A. Cornillon, N. Hengartner, E. Matzner-Løber
    Subjects: Methodology
    Abstract

    This paper presents a practical and simple fully nonparametric multivariate
    smoothing procedure that adapts to the underlying smoothness of the true
    regression function. Our estimator is easily computed by successive application
    of existing base smoothers (without the need of selecting an optimal smoothing
    parameter), such as thin-plate spline or kernel smoothers.

  217. Efficient adaptive designs with mid-course sample size adjustment in clinical trials.

    Authors: Jay Bartroff, Tze Leung Lai
    Subjects: Methodology
    Abstract

    Adaptive designs have been proposed for clinical trials in which the nuisance
    parameters or alternative of interest are unknown or likely to be misspecified
    before the trial. Whereas most previous works on adaptive designs and
    mid-course sample size re-estimation have focused on two-stage or group
    sequential designs in the normal case, we consider here a new approach that
    involves at most three stages and is developed in the general framework of
    multiparameter exponential families.

  218. Propensity Score Analysis with Matching Weights.

    Authors: Liang Li
    Subjects: Methodology
    Abstract

    The propensity score analysis is one of the most widely used methods for
    studying the causal treatment effect in observational studies. This paper
    studies treatment effect estimation with the method of matching weights. This
    method resembles propensity score matching but offers a number of new features
    including efficient estimation, rigorous variance calculation, simple
    asymptotics, statistical tests of balance, clearly identified target population
    with optimal sampling property, and no need for choosing matching algorithm and
    caliper size.

  219. Deconvolution of mixing time series on a graph.

    Authors: Edoardo M. Airoldi, Alexander W. Blocker
    Subjects: Methodology
    Abstract

    In many applications we are interested in making inference on latent time
    series from indirect measurements, which are often low-dimensional projections
    resulting from mixing or aggregation. Positron emission tomography,
    super-resolution, and network traffic monitoring are some examples. Inference
    in such settings requires solving a sequence of ill-posed inverse problems,
    y_t = A x_t, where the projection mechanism provides information on A. We
    consider problems in which A specifies mixing on a graph of time series that
    are bursty and sparse.

  220. Large-sample tests of extreme-value dependence for multivariate copulas.

    Authors: Johan Segers, Ivan Kojadinovic, Jun Yan
    Subjects: Methodology
    Abstract

    Starting from the characterization of extreme-value copulas based on
    max-stability, large-sample tests of extreme-value dependence for multivariate
    copulas are studied. The two key ingredients of the proposed tests are the
    empirical copula of the data and a multiplier technique for obtaining
    approximate p-values for the derived statistics. The asymptotic validity of the
    multiplier approach is established, and the finite-sample performance of a
    large number of candidate test statistics is studied through extensive Monte
    Carlo experiments for data sets of dimension two to five.
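
    To make the first ingredient concrete, the sketch below computes an empirical
    copula from rank-based pseudo-observations. Scaling ranks by n + 1 is one
    common convention and is an assumption here, as are the function names and the
    toy sample; the multiplier step for p-values is not shown.

```python
def pseudo_observations(sample):
    """Replace each coordinate by its rank divided by (n + 1).

    Assumes no ties within a coordinate.
    """
    n = len(sample)
    dimension = len(sample[0])
    scaled = [[0.0] * dimension for _ in range(n)]
    for j in range(dimension):
        order = sorted(range(n), key=lambda i: sample[i][j])
        for rank, index in enumerate(order, start=1):
            scaled[index][j] = rank / (n + 1)
    return scaled


def empirical_copula(pseudo, u):
    """Fraction of pseudo-observations that are componentwise <= u."""
    hits = sum(all(p[j] <= u[j] for j in range(len(u))) for p in pseudo)
    return hits / len(pseudo)


sample = [(1.0, 10.0), (2.0, 20.0), (3.0, 30.0), (4.0, 40.0)]  # toy comonotone data
pseudo = pseudo_observations(sample)
value = empirical_copula(pseudo, (0.5, 0.5))
```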

  221. A Bayesian Model of NMR Spectra for the Deconvolution and Quantification of Metabolites in Complex Biological Mixtures.

    Authors: Sylvia Richardson, Maria De Iorio, William Astle, David Stephens, Timothy Ebbels
    Subjects: Methodology
    Abstract

    Nuclear Magnetic Resonance (NMR) spectra are widely used in metabolomics to
    obtain profiles of metabolites dissolved in biofluids such as cell
    supernatants. Methods for estimating metabolite concentrations from these
    spectra are presently confined to manual peak fitting and to binning procedures
    for integrating resonance peaks. Extensive information on the patterns of
    spectral resonance generated by human metabolites is now available in online
    databases.

  222. A Poisson Mixed Model with Nonnormal Random Effect Distribution.

    Authors: Lizandra C. Fabio, Gilberto A. Paula, Mario de Castro
    Subjects: Methodology
    Abstract

    We propose in this paper a random intercept Poisson model in which the random
    effect distribution is assumed to follow a generalized log-gamma (GLG)
    distribution. We derive the first two moments for the marginal distribution as
    well as the intraclass correlation. Even though numerical integration methods
    are in general required for deriving the marginal models, we obtain the
    multivariate negative binomial model for a particular parameter setting of the
    hierarchical model.

  223. Multivariate convex regression with adaptive partitioning.

    Authors: Lauren A. Hannah, David B. Dunson
    Subjects: Methodology
    Abstract

    We propose a new, nonparametric method for multivariate regression subject to
    convexity or concavity constraints on the response function. Convexity
    constraints are common in economics, statistics, operations research and
    financial engineering, but there is currently no multivariate method that is
    computationally feasible for more than a few hundred observations. We introduce
    Convex Adaptive Partitioning (CAP), which creates a globally convex regression
    model from locally linear estimates fit on adaptively selected covariate
    partitions.

  224. Estimation of latent variable models for ordinal data via fully exponential Laplace approximation.

    Authors: Silvia Cagnone, Silvia Bianconcini
    Subjects: Methodology
    Abstract

    Latent variable models for ordinal data represent a useful tool in different
    fields of research in which the constructs of interest are not directly
    observable. In such models, problems related to the integration of the
    likelihood function can arise since analytical solutions do not exist.
    Numerical approximations, like the widely used Gauss Hermite (GH) quadrature,
    are generally applied to solve these problems. However, GH becomes unfeasible
    as the number of latent variables increases. Thus, alternative solutions have
    to be found.

  225. Evaluating the diagnostic powers of variables and their linear combinations when the gold standard is continuous.

    Authors: Zhanfeng Wang, Yuan-chin Ivan Chang
    Subjects: Methodology
    Abstract

    The receiver operating characteristic (ROC) curve is a very useful tool for
    analyzing the diagnostic/classification power of instruments/classification
    schemes as long as a binary-scale gold standard is available. When the gold
    standard is continuous and there is no confirmative threshold, the ROC curve
    becomes less useful. Hence, there are several extensions proposed for
    evaluating the diagnostic potential of variables of interest.
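
    For the binary gold-standard case that these extensions start from, the area
    under the ROC curve can be computed as the proportion of (positive, negative)
    pairs that the score ranks correctly, with ties counted as one half. The sketch
    below shows only that baseline, not the continuous-gold-standard measures the
    paper evaluates; the names and data are made up.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores.

    labels are 1 for diseased/positive and 0 for healthy/negative.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    correct = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                correct += 1.0
            elif p == q:
                correct += 0.5  # ties contribute one half
    return correct / (len(positives) * len(negatives))


scores = [0.1, 0.4, 0.35, 0.8]   # toy marker values
labels = [0, 0, 1, 1]            # toy binary gold standard
area = auc(scores, labels)
```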

  226. Pivotal Estimation of Nonparametric Functions via Square-root Lasso.

    Authors: Victor Chernozhukov, Alexandre Belloni, Lie Wang
    Subjects: Methodology
    Abstract

    In a nonparametric linear regression model we study a variant of LASSO,
    called square-root LASSO, which does not require the knowledge of the scaling
    parameter $\sigma$ of the noise or bounds for it. This work derives new finite
    sample upper bounds for prediction norm rate of convergence, $\ell_1$-rate of
    convergence, $\ell_\infty$-rate of convergence, and sparsity of the square-root
    LASSO estimator. A lower bound for the prediction norm rate of convergence is
    also established.
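
    For reference, the square-root LASSO estimator is usually written as below. The
    notation ($y_i$ for responses, $x_i$ for regressors, $\lambda$ for the penalty
    level, $n$ observations, $p$ coefficients) is assumed here and does not appear
    in the abstract.

```latex
\hat\beta \in \arg\min_{\beta \in \mathbb{R}^p}
  \sqrt{\hat Q(\beta)} + \frac{\lambda}{n}\,\|\beta\|_1,
\qquad
\hat Q(\beta) = \frac{1}{n}\sum_{i=1}^{n} \bigl(y_i - x_i'\beta\bigr)^2 .
```

    The ordinary LASSO penalises $\hat Q(\beta)$ itself, so a good choice of
    $\lambda$ scales with $\sigma$. Taking the square root of the average squared
    residual is what removes that dependence.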

  227. Metamodel-based importance sampling for structural reliability analysis.

    Authors: Vincent Dubourg, François Deheeger, Bruno Sudret
    Subjects: Methodology
    Abstract

    Structural reliability methods aim at computing the probability of failure of
    systems with respect to some prescribed performance functions. In modern
    engineering such functions usually resort to running an expensive-to-evaluate
    computational model (e.g. a finite element model). In this respect simulation
    methods, which may require $10^{3}$ to $10^{6}$ runs, cannot be used directly.

  228. Learning high-dimensional DAGs with latent and selection variables.

    Authors: Thomas S. Richardson, Marloes H. Maathuis, Markus Kalisch, Diego Colombo
    Subjects: Methodology
    Abstract

    We consider the problem of learning causal information between random
    variables in DAGs when allowing arbitrarily many latent and selection
    variables. The FCI algorithm (Spirtes et al., 1999) has been explicitly
    designed to infer conditional independence and causal information in such
    settings. However, FCI is computationally infeasible for large graphs. We
    therefore propose a new algorithm, the RFCI algorithm, which is much faster
    than FCI. In some situations the output of RFCI is slightly less informative,
    in particular with respect to conditional independence information.

  229. Model Selection Consistency for Cointegrating Regressions.

    Authors: Eduardo F. Mendes
    Subjects: Methodology
    Abstract

    We study the asymptotic properties of the adaptive Lasso in cointegration
    regressions, in the case where all covariates are exogenous and the number of
    candidate variables is sub-linear with respect to the sample size but,
    possibly, larger.

  230. On the half-Cauchy prior for a global scale parameter.

    Authors: Nicholas G. Polson, James G. Scott
    Subjects: Methodology
    Abstract

    We generalize the half-Cauchy prior for a global scale parameter to the wider
    class of hypergeometric inverted-beta priors. We derive expressions for
    posterior moments and marginal densities when these priors are used for a
    top-level normal variance in a Bayesian hierarchical model. Finally, we prove a
    result that characterizes the frequentist risk of the Bayes estimators under
    all priors in the class.

  231. Quantile Regression with Censoring and Endogeneity.

    Authors: Victor Chernozhukov, Ivan Fernandez-Val, Amanda Kowalski
    Subjects: Methodology
    Abstract

    In this paper, we develop a new censored quantile instrumental variable
    (CQIV) estimator and describe its properties and computation. The CQIV
    estimator combines Powell's (1986) censored quantile regression (CQR) to deal
    semiparametrically with censoring, with a control variable approach to
    incorporate endogenous regressors. The CQIV estimator is obtained in two stages
    that are nonadditive in the unobservables. The first stage estimates a
    nonadditive model with infinite dimensional parameters for the control
    variable, such as a quantile or distribution regression model.

  232. Intent Inference and Syntactic Tracking with GMTI Measurements.

    Authors: Vikram Krishnamurthy, Alex Wang, Bhashyam Balaji
    Subjects: Methodology
    Abstract

    In conventional target tracking systems, human operators use the estimated
    target tracks to make higher level inference of the target behaviour/intent.
    This paper develops syntactic filtering algorithms that assist human operators
    by extracting spatial patterns from target tracks to identify
    suspicious/anomalous spatial trajectories. The targets' spatial trajectories
    are modeled by a stochastic context free grammar (SCFG) and a switched mode
    state space model.

  233. Testing the Equality of Covariance Operators in Functional Samples.

    Authors: Lajos Horváth, Piotr Kokoszka, Stefan Fremdt, Josef G. Steinebach
    Subjects: Methodology
    Abstract

    We propose a robust test for the equality of the covariance structures in two
    functional samples. The test statistic has a chi-square asymptotic distribution
    with a known number of degrees of freedom, which depends on the level of
    dimension reduction needed to represent the data. Detailed analysis of the
    asymptotic properties is developed. Finite sample performance is examined by a
    simulation study and an application to egg-laying curves of fruit flies.

  234. Nonparametric survival analysis of epidemic data.

    Authors: Eben Kenah
    Subjects: Methodology
    Abstract

    This paper develops nonparametric methods for the survival analysis of
    epidemic data based on contact intervals. The contact interval from person i to
    person j is the time between the onset of infectiousness in i and infectious
    contact from i to j, where we define infectious contact as a contact sufficient
    to infect a susceptible individual. We show that the Nelson-Aalen estimator
    produces an unbiased estimate of the contact interval cumulative hazard
    function when who-infects-whom is observed.
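
    As a reminder of the estimator itself, the sketch below computes the generic
    Nelson-Aalen cumulative hazard from right-censored durations: at each event
    time, add the number of events divided by the number still at risk. It does
    not reproduce the paper's construction of risk sets for contact intervals; the
    function name and the toy data are assumptions.

```python
def nelson_aalen(durations, observed):
    """Nelson-Aalen estimate of the cumulative hazard.

    durations: follow-up times; observed: 1 if the event occurred, 0 if censored.
    Returns a list of (time, cumulative hazard) pairs at the event times.
    """
    data = sorted(zip(durations, observed))
    at_risk = len(data)
    cumulative = 0.0
    curve = []
    i = 0
    while i < len(data):
        time = data[i][0]
        events = 0
        leaving = 0
        # Gather everyone whose follow-up ends at this time.
        while i < len(data) and data[i][0] == time:
            events += data[i][1]
            leaving += 1
            i += 1
        if events > 0:
            cumulative += events / at_risk
            curve.append((time, cumulative))
        at_risk -= leaving
    return curve


durations = [2.0, 3.0, 3.0, 5.0, 8.0]   # toy data
observed = [1, 1, 0, 1, 0]
curve = nelson_aalen(durations, observed)
```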

  235. A Generalization of the Skew-Normal Distribution: The Beta Skew-Normal.

    Authors: Valentina Mameli, Monica Musio
    Subjects: Methodology
    Abstract

    The aim of this article is to introduce a new family of distributions, which
    generalizes the skew normal distribution (SN). This new family, called Beta
    skew-normal (BSN), arises naturally when we consider the distributions of order
    statistics of the SN. The BSN can also be obtained as a special case of the
    Beta generated distribution (Jones (2004)).

  236. Posterior consistency in linear models under shrinkage priors.

    Authors: Waheed U. Bajwa, Artin Armagan, Jaeyong Lee, David B. Dunson
    Subjects: Methodology
    Abstract

    We investigate posterior consistency in linear models with a diverging number
    of parameters. We first propose a parameter-free multivariate generalized
    double Pareto distribution as a default prior choice that preserves some of the
    desired characteristics of a joint double exponential distribution with
    multivariate Cauchy-like tails. We give sufficient conditions for consistency
    when $p/n\rightarrow 0$ and then investigate the behavior of the posterior
    under normal, double exponential and multivariate generalized double Pareto
    priors.

  237. Reliability-based design optimization using kriging surrogates and subset simulation.

    Authors: V. Dubourg, B. Sudret, J.-M. Bourinet
    Subjects: Methodology
    Abstract

    The aim of the present paper is to develop a strategy for solving
    reliability-based design optimization (RBDO) problems that remains applicable
    when the performance models are expensive to evaluate. Starting with the
    premise that simulation-based approaches are not affordable for such problems,
    and that the most-probable-failure-point-based approaches do not allow the
    error on the estimation of the failure probability to be quantified, an
    approach based on both metamodels and advanced simulation techniques is explored.

  238. Reliability-based design optimization of an imperfect submarine pressure hull.

    Authors: V. Dubourg, B. Sudret, J.-M. Bourinet, M. Cazuguel
    Subjects: Methodology
    Abstract

    Reliability-based design optimization (RBDO) has gained much attention in the
    past fifteen years as a way of introducing robustness in the process of
    designing structures and systems in an optimal manner. Indeed classical
    optimization (e.g. minimize some cost under mechanical constraints) usually
    leads to solutions that lie at the boundary of the admissible domain, and that
    are consequently rather sensitive to uncertainty in the design parameters. In
    contrast, RBDO aims at designing the system in a robust way by minimizing some
    cost function under reliability constraints.

  239. Metamodel-based importance sampling for the simulation of rare events.

    Authors: V. Dubourg, F. Deheeger, B. Sudret
    Subjects: Methodology
    Abstract

    In the field of structural reliability, the Monte-Carlo estimator is
    considered as the reference probability estimator. However, it is still
    intractable for real engineering cases since it requires a high number of runs
    of the model. In order to reduce the number of computer experiments, many other
    approaches known as reliability methods have been proposed. A certain approach
    consists in replacing the original experiment by a surrogate which is much
    faster to evaluate. Nevertheless, it is often difficult (or even impossible) to
    quantify the error made by this substitution.

  240. A Statistical Model to Explain the Mendel--Fisher Controversy.

    Authors: Ana M. Pires, João A. Branco
    Subjects: Methodology
    Abstract

    In 1866 Gregor Mendel published a seminal paper containing the foundations of
    modern genetics. In 1936 Ronald Fisher published a statistical analysis of
    Mendel's data concluding that "the data of most, if not all, of the experiments
    have been falsified so as to agree closely with Mendel's expectations." The
    accusation gave rise to a controversy which has reached the present time. There
    are reasonable grounds to assume that a certain unconscious bias was
    systematically introduced in Mendel's experimentation.

  241. Cluster Forests.

    Authors: Michael I. Jordan, Donghui Yan, Aiyou Chen
    Subjects: Methodology
    Abstract

    Inspired by Random Forests (RF) in the context of classification, we propose
    a new clustering ensemble method---Cluster Forests (CF). Geometrically, CF
    randomly probes a high-dimensional data cloud to obtain "good local
    clusterings" and then aggregates via spectral clustering to obtain cluster
    assignments for the whole dataset. The search for good local clusterings is
    guided by a cluster quality measure $\kappa$. CF progressively improves each
    local clustering in a fashion that resembles the tree growth in RF.

  242. A frequentist two-sample test based on Bayesian model selection.

    Authors: Pietro Berkes, Jozsef Fiser
    Subjects: Methodology
    Abstract

    Despite their importance in supporting experimental conclusions, standard
    statistical tests are often inadequate for research areas, like the life
    sciences, where the typical sample size is small and the test assumptions
    difficult to verify. In such conditions, standard tests tend to be overly
    conservative, and thus fail to detect significant effects in the data. Here we
    define a novel statistical test for the two-sample problem.

  243. Characterization and Greedy Learning of Interventional Markov Equivalence Classes of Directed Acyclic Graphs.

    Authors: Peter Bühlmann, Alain Hauser
    Subjects: Methodology
    Abstract

    The investigation of directed acyclic graphs (DAGs) encoding the same Markov
    property, that is the same conditional independence relations of multivariate
    observational distributions, has a long tradition; many algorithms exist for
    model selection and structure learning in Markov equivalence classes. In this
    paper, we extend the notion of Markov equivalence of DAGs to the case of
    interventional distributions arising from multiple intervention experiments.

  244. Appropriate Methodology of Statistical Tests According to Prior Probability and Required Objectivity.

    Authors: Tomokazu Konishi
    Subjects: Methodology
    Abstract

    In contrast to its common definition and calculation, interpretation of
    p-values diverges among statisticians. Since p-value is the basis of various
    methodologies, this divergence has led to a variety of test methodologies and
    evaluations of test results. This chaotic situation has complicated the
    application of tests and decision processes. Here, the origin of the divergence
    is found in the prior probability of the test.

  245. Parameter Expansion and Efficient Inference.

    Authors: Chuanhai Liu, Andrew Lewandowski, Scott Vander Wiel
    Subjects: Methodology
    Abstract

    This EM review article focuses on parameter expansion, a simple technique
    introduced in the PX-EM algorithm to make EM converge faster while maintaining
    its simplicity and stability. The primary objective concerns the connection
    between parameter expansion and efficient inference. It reviews the statistical
    interpretation of the PX-EM algorithm, in terms of efficient inference via bias
    reduction, and further unfolds the PX-EM mystery by looking at PX-EM from
    different perspectives.

  246. Block-Conditional Missing at Random Models for Missing Data.

    Authors: John D. Kalbfleisch, Yan Zhou, Roderick J. A. Little
    Subjects: Methodology
    Abstract

    Two major ideas in the analysis of missing data are (a) the EM algorithm
    [Dempster, Laird and Rubin, J. Roy. Statist. Soc. Ser. B 39 (1977) 1--38] for
    maximum likelihood (ML) estimation, and (b) the formulation of models for the
    joint distribution of the data ${Z}$ and missing data indicators ${M}$, and
    associated "missing at random" (MAR) condition under which a model for ${M}$
    is unnecessary [Rubin, Biometrika 63 (1976) 581--592].

  247. From EM to Data Augmentation: The Emergence of MCMC Bayesian Computation in the 1980s.

    Authors: Wing H. Wong, Martin A. Tanner
    Subjects: Methodology
    Abstract

    It was known from Metropolis et al. [J. Chem. Phys. 21 (1953) 1087--1092]
    that one can sample from a distribution by performing Monte Carlo simulation
    from a Markov chain whose equilibrium distribution is equal to the target
    distribution. However, it took several decades before the statistical community
    embraced Markov chain Monte Carlo (MCMC) as a general computational tool in
    Bayesian inference.
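
    The idea can be sketched with a random-walk Metropolis sampler: propose a
    symmetric move, then accept it with probability given by the ratio of target
    densities. The standard normal target, step size, seed and function names below
    are assumptions chosen for the example.

```python
import math
import random


def metropolis(log_density, start, step, num_samples, rng):
    """Random-walk Metropolis sampler with a symmetric uniform proposal."""
    samples = []
    current = start
    current_logp = log_density(current)
    for _ in range(num_samples):
        proposal = current + rng.uniform(-step, step)
        proposal_logp = log_density(proposal)
        # Accept with probability min(1, target(proposal) / target(current)).
        accept_prob = math.exp(min(0.0, proposal_logp - current_logp))
        if rng.random() < accept_prob:
            current, current_logp = proposal, proposal_logp
        samples.append(current)
    return samples


def standard_normal_log_density(x):
    return -0.5 * x * x  # normalising constant is not needed


rng = random.Random(1)  # fixed seed for reproducibility
draws = metropolis(standard_normal_log_density, start=0.0, step=2.0,
                   num_samples=20000, rng=rng)
mean = sum(draws) / len(draws)
variance = sum((d - mean) ** 2 for d in draws) / len(draws)
```

    Only density ratios enter the acceptance step, which is why the target needs
    to be known only up to a constant.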

  248. The MM Alternative to EM.

    Authors: Kenneth Lange, Tong Tong Wu
    Subjects: Methodology
    Abstract

    The EM algorithm is a special case of a more general algorithm called the MM
    algorithm. Specific MM algorithms often have nothing to do with missing data.
    The first M step of an MM algorithm creates a surrogate function that is
    optimized in the second M step. In minimization, MM stands for
    majorize--minimize; in maximization, it stands for minorize--maximize. This
    two-step process always drives the objective function in the right direction.
    Construction of MM algorithms relies on recognizing and manipulating
    inequalities rather than calculating conditional expectations.
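
    A small worked case may help. To minimise the sum of absolute deviations
    (whose minimiser is the sample median), each |x - theta| can be majorised at
    the current iterate by a quadratic, so the minimisation step reduces to a
    weighted mean. The sketch below assumes that textbook example; the function
    names, the guard constant eps and the data are illustrative.

```python
def absolute_loss(values, theta):
    """Objective being minimised: sum of absolute deviations from theta."""
    return sum(abs(v - theta) for v in values)


def mm_median(values, iterations=200, eps=1e-12):
    """Majorize-minimize iteration for the sample median.

    Majorize: |r| <= r**2 / (2 * |r_k|) + |r_k| / 2, with equality at r = r_k.
    Minimize: the quadratic surrogate is minimised by a weighted mean.
    eps guards against division by zero when theta hits a data point.
    """
    theta = sum(values) / len(values)  # start from the mean
    for _ in range(iterations):
        weights = [1.0 / max(abs(v - theta), eps) for v in values]
        theta = sum(w * v for w, v in zip(weights, values)) / sum(weights)
    return theta


values = [1.0, 2.0, 4.0, 7.0, 100.0]   # toy data with an outlier
start = sum(values) / len(values)
estimate = mm_median(values)
```

    No conditional expectation is computed anywhere; the whole algorithm follows
    from one inequality.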

  249. Modulated oscillations in three dimensions.

    Authors: Jonathan M. Lilly
    Subjects: Methodology
    Abstract

    The analysis of the fully three-dimensional and time-varying polarization
    characteristics of a modulated trivariate, or three-component, oscillation is
    addressed. The use of the analytic operator enables the instantaneous
    three-dimensional polarization state of any square-integrable trivariate signal
    to be uniquely defined. Straightforward expressions are given which permit the
    ellipse parameters to be recovered from data. The notions of instantaneous
    frequency and instantaneous bandwidth, generalized to the trivariate case, are
    related to variations in the ellipse properties.

  250. Analysis of Modulated Multivariate Oscillations.

    Authors: Sofia C. Olhede, Jonathan M. Lilly
    Subjects: Methodology
    Abstract

    The concept of a common modulated oscillation spanning multiple time series
    is formalized, a method for the recovery of such a signal from potentially
    noisy observations is proposed, and the time-varying bias properties of the
    recovery method are derived. The method, an extension of wavelet ridge analysis
    to the multivariate case, identifies the common oscillation by seeking, at each
    point in time, a frequency for which a bandpassed version of the signal obtains
    a local maximum in power.

  251. Learning Active Basis Models by EM-Type Algorithms.

    Authors: Zhangzhang Si, Haifeng Gong, Song-Chun Zhu, Ying Nian Wu
    Subjects: Methodology
    Abstract

    The EM algorithm is a convenient tool for maximum likelihood model fitting when
    the data are incomplete or when there are latent variables or hidden states. In
    this review article we explain that the EM algorithm is a natural computational
    scheme for learning image templates of object categories where the learning is
    not fully supervised.

  252. The EM Algorithm and the Rise of Computational Biology.

    Authors: Jun S. Liu, Yuan Yuan, Xiaodan Fan
    Subjects: Methodology
    Abstract

    In the past decade computational biology has grown from a cottage industry
    with a handful of researchers to an attractive interdisciplinary field,
    catching the attention and imagination of many quantitatively-minded
    scientists. Of interest to us is the key role played by the EM algorithm during
    this transformation. We survey the use of the EM algorithm in a few important
    computational biology problems surrounding the "central dogma" of molecular
    biology: from DNA to RNA and then to proteins.

  253. The EM Algorithm in Genetics, Genomics and Public Health.

    Authors: Nan M. Laird
    Subjects: Methodology
    Abstract

    The popularity of the EM algorithm owes much to the 1977 paper by Dempster,
    Laird and Rubin. That paper gave the algorithm its name, identified the general
    form and some key properties of the algorithm and established its broad
    applicability in scientific research. This review gives a nontechnical
    introduction to the algorithm for a general scientific audience, and presents a
    few examples characteristic of its application.

  254. Cross-Fertilizing Strategies for Better EM Mountain Climbing and DA Field Exploration: A Graphical Guide Book.

    Authors: Xiao-Li Meng, David A. van Dyk
    Subjects: Methodology
    Abstract

    In recent years, a variety of extensions and refinements have been developed
    for data augmentation based model fitting routines. These developments aim to
    extend the application, improve the speed and/or simplify the implementation of
    data augmentation methods, such as the deterministic EM algorithm for mode
    finding and stochastic Gibbs sampler and other auxiliary-variable based methods
    for posterior sampling.

  255. Locally Adaptive Density Estimation on the Unit Sphere Using Needlets.

    Authors: Andre Kueh
    Subjects: Methodology
    Abstract

    The problem of estimating a probability density function f on the
    (d-1)-dimensional unit sphere S^{d-1} from directional data using the needlet
    frame is considered. It is shown that the decay of needlet coefficients
    supported near a point x of a function f depends only on local Hölder
    continuity properties of f at x. This is then used to show that the thresholded
    needlet estimator introduced in Baldi, Kerkyacharian, Marinucci and Picard
    adapts to the local regularity properties of f.

  256. Generalized Isotonic Regression.

    Authors: Ronny Luss, Saharon Rosset
    Subjects: Methodology
    Abstract

    We present a computational and statistical approach for fitting isotonic
    models under convex differentiable loss functions. We offer a recursive
    partitioning algorithm which provably and efficiently solves isotonic
    regression under any such loss function. Models along the partitioning path are
    also isotonic and can be viewed as regularized solutions to the problem.
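
    For orientation, the simplest instance of isotonic fitting can be sketched
    as follows. This is the classical pool-adjacent-violators scheme for
    squared-error loss on a one-dimensional ordering, not the recursive
    partitioning algorithm of the paper; the function name `pava` and its
    arguments are illustrative assumptions.

```python
def pava(y, w=None):
    """Non-decreasing least-squares fit by pooling adjacent violators.

    Illustrative sketch only: squared-error loss, observations already
    sorted by the covariate, optional positive weights w.
    """
    if w is None:
        w = [1.0] * len(y)
    # Each block holds [weighted mean, total weight, number of points].
    blocks = []
    for yi, wi in zip(y, w):
        blocks.append([float(yi), float(wi), 1])
        # Merge backwards while the last two blocks violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, c1 + c2])
    # Expand the block means back to one fitted value per observation.
    fit = []
    for mean, _, count in blocks:
        fit.extend([mean] * count)
    return fit
```

    Calling `pava([1, 3, 2, 4])` pools the violating pair (3, 2) into its mean
    and leaves the other points unchanged.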

  257. Generalized double Pareto shrinkage.

    Authors: Artin Armagan, David Dunson, Jaeyong Lee
    Subjects: Methodology
    Abstract

    We propose a generalized double Pareto prior for Bayesian shrinkage
    estimation and inferences in linear models. The prior can be obtained via a
    scale mixture of Laplace or normal distributions, while forming a bridge
    between the Laplace and Normal-Jeffreys' priors. While it has a spike at zero
    like the Laplace density, it also has a Student-t-like tail behavior. Bayesian
    computation is straightforward via a simple Gibbs sampling algorithm.

  258. Comparison of Weibull tail-coefficient estimators.

    Authors: Stéphane Girard, Laurent Gardes
    Subjects: Methodology
    Abstract

    We address the problem of estimating the Weibull tail-coefficient which is
    the regular variation exponent of the inverse failure rate function. We propose
    a family of estimators of this coefficient and an associated extreme quantile
    estimator. Their asymptotic normality is established and their asymptotic
    mean-square errors are compared. The results are illustrated on some finite
    sample situations.

  259. Small-scale inference: Empirical Bayes and confidence methods for as few as a single comparison.

    Authors: David R. Bickel
    Subjects: Methodology
    Abstract

    By constraining the possible values of the proportion of null hypotheses that
    are true, the local false discovery rate (LFDR) can be estimated using as few
    as one comparison. The proportion of proteins with equivalent abundance was
    estimated to be about 20% for patient group I and about 90% for group II. The
    simultaneously-estimated LFDRs give approximately the same inferences as
    individual-protein confidence levels for group I but are much closer to
    individual-protein LFDR estimates for group II.

  260. On matrix variance inequalities.

    Authors: G. Afendras, N. Papadatos
    Subjects: Methodology
    Abstract

    Olkin and Shepp (2005, J. Statist. Plann. Inference, vol. 130, pp. 351--358)
    presented a matrix form of Chernoff's inequality for Normal and Gamma
    (univariate) distributions. We extend and generalize this result, proving
    Poincaré-type and Bessel-type inequalities, for matrices of arbitrary order and
    for a large class of distributions.

  261. Regularization via Data Augmentation.

    Authors: Nicholas G. Polson, James G. Scott
    Subjects: Methodology
    Abstract

    In this paper we provide a data-augmentation scheme that unifies many common
    regularized estimators into a single class. This leads to simple algorithms
    based on iterative least squares for fitting models involving arbitrary
    combinations of likelihood and penalty functions within the class. The class
    itself is quite large: for example, it includes quantile regression, support
    vector machines, and logistic and multinomial logistic regression, along with
    the usual ridge regression, lasso, bridge estimators, and regression with
    heavy-tailed errors.

  262. Interpretable Clustering using Unsupervised Binary Trees.

    Authors: Ricardo Fraiman, Badih Ghattas, Marcela Svarc
    Subjects: Methodology
    Abstract

    We herein introduce a new method of interpretable clustering that uses
    unsupervised binary trees. It is a three-stage procedure, the first stage of
    which entails a series of recursive binary splits to reduce the heterogeneity
    of the data within the new subsamples. During the second stage (pruning),
    consideration is given to whether adjacent nodes can be aggregated. Finally,
    during the third stage (joining), similar clusters are joined together, even if
    they do not descend from the same node originally.

  263. Logistic Network Regression for Scalable Analysis of Networks with Joint Edge/Vertex Dynamics.

    Authors: Zack W. Almquist, Carter T. Butts
    Subjects: Methodology
    Abstract

    Network dynamics may be viewed as a process of change in the edge structure
    of a network, in the vertex set on which edges are defined, or in both
    simultaneously. Though early studies of such processes were primarily
    descriptive, recent work on this topic has increasingly turned to formal
    statistical models. While showing great promise, many of these modern dynamic
    models are computationally intensive and scale very poorly in the size of the
    network under study and/or the number of time points considered.

  264. ASN-Minimax double sampling plans by variables for two-sided specification limits when σ is unknown.

    Authors: Eno Vangjeli
    Subjects: Methodology
    Abstract

    ASN-minimax double sampling plans by variables for a normally distributed
    quality characteristic with unknown standard deviation and two-sided
    specification limits are introduced. The plans are based on the essentially
    Maximum-Likelihood (ML) estimator p* and the Minimum Variance Unbiased (MVU)
    estimator ^p of the fraction defective p. The operating characteristic (OC) for
    the plans is determined by using the independent random variables p*_1, p*_2
    and ^p_1, ^p_2, which relate to the first and second samples, respectively.

  265. Coupling optional Pólya trees and the two sample problem.

    Authors: Li Ma, Wing H. Wong
    Subjects: Methodology
    Abstract

    Testing and characterizing the difference between two data samples is of
    fundamental interest in statistics. Existing methods such as Kolmogorov-Smirnov
    and Cramér-von Mises tests do not scale well as the dimensionality increases
    and provide no easy way to characterize the difference should it exist.

  266. Learning the Ambiguity function.

    Authors: Sofia Olhede
    Subjects: Methodology
    Abstract

    This paper introduces the class of ambiguity sparse processes, containing
    subsets of popular nonstationary time series such as locally stationary,
    cyclostationary and uniformly modulated processes. The class also contains
    aggregations of the aforementioned processes. Ambiguity sparse processes are
    defined for a fixed sampling regime, in terms of a given number of sample
    points and a fixed sampling period.

  267. 4D Wavelet-Based Regularization for Parallel MRI Reconstruction: Impact on Subject and Group-Levels Statistical Sensitivity in fMRI.

    Authors: Lotfi Chaari, Jean-Christophe Pesquet, Philippe Ciuciu, Sébastien Mériaux, Solveig Badillo
    Subjects: Methodology
    Abstract

    Parallel MRI is a fast imaging technique that enables the acquisition of
    highly resolved images in space. It relies on $k$-space undersampling and
    multiple receiver coils with complementary sensitivity profiles in order to
    reconstruct a full Field-Of-View (FOV) image. The performance of parallel
    imaging mainly depends on the reconstruction algorithm, which can proceed
    either in the original $k$-space (GRAPPA, SMASH) or in the image domain
    (SENSE-like methods).

  268. Interaction patterns of brain activity across space, time and frequency. Part I: methods.

    Authors: Roberto D. Pascual-Marqui, Rolando J. Biscay-Lirio
    Subjects: Methodology
    Abstract

    We consider exploratory methods for the discovery of cortical functional
    connectivity. Typically, data for the i-th subject (i=1...NS) is represented as
    an NVxNT matrix Xi, corresponding to brain activity sampled at NT moments in
    time from NV cortical voxels. A widely used method of analysis first
    concatenates all subjects along the temporal dimension, and then performs an
    independent component analysis (ICA) for estimating the common cortical
    patterns of functional connectivity. There exist many other interesting
    variations of this technique, as reviewed in [Calhoun et al.

  269. Sparsity with sign-coherent groups of variables via the cooperative-Lasso.

    Authors: Camille Charbonnier, Julien Chiquet, Yves Grandvalet
    Subjects: Methodology
    Abstract

    We consider the problems of estimation and selection of parameters endowed
    with a known group structure, when the groups are assumed to be sign-coherent,
    that is, gathering either non-negative, non-positive or null parameters. To
    tackle this problem we propose a new penalty that we call the cooperative-Lasso
    penalty. We derive the optimality conditions defining the cooperative-Lasso
    estimate for generalized linear models and propose an efficient active set
    algorithm suited to high-dimensional problems.

  270. Sequences of regressions and their independences.

    Authors: Nanny Wermuth, Kayvan Sadeghi
    Subjects: Methodology
    Abstract

    Ordered sequences of univariate or multivariate regressions provide
    statistical models for analysing data from randomized, possibly sequential
    interventions, from cohort or multi-wave panel studies, but also from
    cross-sectional or retrospective studies. Conditional independences are
    captured by what we name regression graphs, provided the generated distribution
    shares some properties with a joint Gaussian distribution.

  271. Dynamic Functional Regression.

    Authors: Daniel Gervini
    Subjects: Methodology
    Abstract

    A characteristic feature of samples of curves is the presence of time
    variability in addition to amplitude variability. The existing functional
    regression methods do not handle time variability in an efficient manner. We
    propose in this paper a regression method that incorporates time warping as an
    intrinsic part of the model. In this way, the method attains a high predictive
    power in a parsimonious and efficient manner, avoiding overfitting and
    simplifying statistical inference.

  272. A general class of zero-or-one inflated beta regression models.

    Authors: Raydonal Ospina, Silvia L. P. Ferrari
    Subjects: Methodology
    Abstract

    This paper proposes a general class of regression models for continuous
    proportions when the data contain zeros or ones. The proposed class of models
    assumes that the response variable has a mixed continuous-discrete distribution
    with probability mass at zero or one. The beta distribution is used to describe
    the continuous component of the model, since its density has a wide range of
    different shapes depending on the values of the two parameters that index the
    distribution. We use a suitable parameterization of the beta law in terms of
    its mean and a precision parameter.
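
    The mean and precision parameterization can be illustrated with a short
    sketch. It assumes the common convention in which a beta law with mean mu
    and precision phi has shape parameters mu*phi and (1-mu)*phi; the abstract
    does not spell out the exact form, and the names `beta_shapes`,
    `beta_mean_var`, `inflated_mean`, `alpha` and `c` are hypothetical.

```python
def beta_shapes(mu, phi):
    """Map (mean, precision) to beta shape parameters (a, b).

    Assumed convention: a = mu * phi, b = (1 - mu) * phi.
    """
    if not 0.0 < mu < 1.0 or phi <= 0.0:
        raise ValueError("need 0 < mu < 1 and phi > 0")
    return mu * phi, (1.0 - mu) * phi


def beta_mean_var(a, b):
    """Mean and variance of a Beta(a, b) distribution."""
    total = a + b
    return a / total, a * b / (total ** 2 * (total + 1.0))


def inflated_mean(alpha, c, mu):
    """Mean of a mixture with point mass alpha at c (0 or 1) and a
    beta component with mean mu carrying the remaining mass."""
    return alpha * c + (1.0 - alpha) * mu
```

    Under this convention the variance of the beta component is
    mu*(1-mu)/(1+phi), so a larger phi means less dispersion around the mean.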

  273. A smooth ROC curve estimator based on log-concave density estimates.

    Authors: Kaspar Rufibach
    Subjects: Methodology
    Abstract

    We introduce a new smooth estimator of the ROC curve based on log-concave
    density estimates of the constituent distributions. We show that our estimate
    is asymptotically equivalent to the empirical ROC curve.

  274. Group Lasso for high dimensional sparse quantile regression models.

    Authors: Kengo Kato
    Subjects: Methodology
    Abstract

    This paper studies the statistical properties of the group Lasso estimator
    for high dimensional sparse quantile regression models where the number of
    explanatory variables (or the number of groups of explanatory variables) is
    possibly much larger than the sample size while the number of variables in
    "active" groups is sufficiently small. We establish a non-asymptotic bound on
    the $\ell_{2}$-estimation error of the estimator.

  275. Application of Mathematical Optimization Procedures to Intervention Effects in Structural Equation Models.

    Authors: Atsushi Yagishita, Kentaro Tanaka, Masami Miyakawa
    Subjects: Methodology
    Abstract

    For a given statistical model, it often happens that it is necessary to
    intervene in the model to reduce the variances of the output variables. In
    structural equation models, this can be done by changing the value of the path
    coefficients by intervention. First, we explain that the expectations and
    variance matrix can be decomposed into several parts in terms of the total
    effects. Then, we show that an algorithm to obtain the intervention method
    which minimizes the weighted sum of the variances can be formulated as a convex
    quadratic programming problem.

  276. Markov chain Monte Carlo for exact inference for diffusions.

    Authors: Paul Fearnhead, Omiros Papaspiliopoulos, Gareth O. Roberts, Giorgos Sermaidis, Alex Beskos
    Subjects: Methodology
    Abstract

    We develop exact Markov chain Monte Carlo methods for discretely-sampled,
    directly and indirectly observed diffusions. The qualification "exact" refers
    to the fact that the invariant and limiting distribution of the Markov chains
    is the exact posterior distribution of the parameters of interest. The class of
    processes to which our methods directly apply are those which can be simulated
    using the most general exact simulation algorithm to date. The article
    introduces various methods to boost the performance of the basic scheme,
    including reparametrizations and auxiliary Poisson sampling.

  277. Isotonic Recursive Partitioning.

    Authors: Ronny Luss, Saharon Rosset, Moni Shahar
    Subjects: Methodology
    Abstract

    Isotonic regression is a nonparametric approach for fitting monotonic models
    to data that has been widely studied from both theoretical and practical
    perspectives. However, this approach encounters computational and statistical
    overfitting issues in higher dimensions. To address both concerns we present an
    algorithm, which we term Isotonic Recursive Partitioning (IRP), for isotonic
    regression based on recursively partitioning the covariate space through
    solution of progressively smaller "best cut" subproblems.

  278. Testing for change in mean of heteroskedastic time series.

    Authors: Mohamed Boutahar
    Subjects: Methodology
    Abstract

    In this paper we consider a Lagrange Multiplier-type test (LM) to detect
    change in the mean of time series with heteroskedasticity of unknown form. We
    derive the limiting distribution under the null, and prove the consistency of
    the test against the alternative of either an abrupt or a smooth change in the
    mean. We also perform some Monte Carlo simulations to analyze the size
    distortion and the power of the proposed test. We conclude that for moderate
    sample sizes, the test performs well.

  279. Relational models for contingency tables.

    Authors: Adrian Dobra, Anna Klimova, Tamás Rudas
    Subjects: Methodology
    Abstract

    The paper considers general multiplicative models for complete and incomplete
    contingency tables that generalize log-linear and several other models and are
    entirely coordinate free. Sufficient conditions for the existence of maximum
    likelihood estimates under these models are given, and it is shown that the
    usual equivalence between multinomial and Poisson likelihoods holds if and only
    if an overall effect is present in the model.

  280. Causality as a unifying approach between activation and connectivity analysis of fMRI data.

    Authors: Nevio Dubbini
    Subjects: Methodology
    Abstract

    This paper indicates causality as the tool that unifies the analysis of both
    activations and connectivity of brain areas, obtained with fMRI data. Causality
    analysis is commonly applied to study connectivity, so this work focuses on
    demonstrating that the detection of activations can also be handled with a
    causality analysis. We test our method on finger tapping data, in which GLM and
    Granger Causality approaches are compared in finding activations. Granger
    causality not only performs the task well, but indeed we obtained a better
    localization (i.e. precision) of activations.

  281. Selection models with monotone weight functions in meta analysis.

    Authors: Kaspar Rufibach
    Subjects: Methodology
    Abstract

    Publication bias, the fact that studies identified for inclusion in a meta
    analysis do not represent all studies on the topic of interest, is commonly
    recognized as a threat to the validity of the results of a meta analysis. One
    way to explicitly model publication bias is via selection models or weighted
    probability distributions. We adopt the nonparametric approach initially
    introduced by Dear (1992) but impose that the weight function $w$ is monotonically
    non-increasing as a function of the $p$-value.

  282. Semi-supervised logistic discrimination for functional data.

    Authors: Shuichi Kawano, Sadanori Konishi
    Subjects: Methodology
    Abstract

    Multi-class classification methods based on both labeled and unlabeled
    functional data sets are discussed. We present semi-supervised logistic models
    for classification in the context of functional data analysis. Unknown
    parameters in our proposed models are estimated by regularization with the help
    of the EM algorithm. Crucial points in the modeling procedure are the choices of
    the regularization parameters involved in the semi-supervised functional logistic
    models. In order to select the adjusted parameter, we introduce model selection
    criteria from information-theoretic and Bayesian viewpoints.

  283. Lack of confidence in ABC model choice.

    Authors: Christian P. Robert, Jean-Michel Marin, Jean-Marie Cornuet, Natesh Pillai
    Subjects: Methodology
    Abstract

    Approximate Bayesian computation (ABC) has become an essential tool for the
    analysis of complex stochastic models. Earlier, Grelaud et al. (2009) advocated
    the use of ABC for Bayesian model choice in the specific case of Gibbs random
    fields, relying on an inter-model sufficiency property to show that the
    approximation was legitimate.

  284. Missing Data Imputation and Corrected Statistics for Large-Scale Behavioral Databases.

    Authors: Pierre Courrieu, Arnaud Rey
    Subjects: Methodology
    Abstract

    This paper presents a new methodology to solve problems resulting from
    missing data in large-scale item performance behavioral databases. Useful
    statistics corrected for missing data are described, and a new method of
    imputation for missing data is proposed. This methodology is applied to the DLP
    database recently published by Keuleers et al. (2010), which allows us to
    conclude that this database fulfills the conditions of use of the method
    recently proposed by Courrieu et al.

  285. A Conversation with Myles Hollander.

    Authors: Francisco J. Samaniego
    Subjects: Methodology
    Abstract

    Myles Hollander was born in Brooklyn, New York, on March 21, 1941. He
    graduated from Carnegie Mellon University in 1961 with a B.S. in mathematics.
    In the fall of 1961, he entered the Department of Statistics, Stanford
    University, earning his M.S. in statistics in 1962 and his Ph.D. in statistics
    in 1965. He joined the Department of Statistics, Florida State University in
    1965 and retired on May 31, 2007, after 42 years of service. He was department
    chair for nine years (1978-1981, 1999-2005). He was named Professor Emeritus at
    Florida State upon retirement in 2007.

  286. Handling Covariates in the Design of Clinical Trials.

    Authors: William F. Rosenberger, Oleksandr Sverdlov
    Subjects: Methodology
    Abstract

    There has been a split in the statistics community about the need for taking
    covariates into account in the design phase of a clinical trial. There are many
    advocates of using stratification and covariate-adaptive randomization to
    promote balance on certain known covariates. However, balance does not always
    promote efficiency or ensure more patients are assigned to the better
    treatment.

  287. Multiway Spectral Clustering: A Margin-Based Perspective.

    Authors: Michael I. Jordan, Zhihua Zhang
    Subjects: Methodology
    Abstract

    Spectral clustering is a broad class of clustering procedures in which an
    intractable combinatorial optimization formulation of clustering is "relaxed"
    into a tractable eigenvector problem, and in which the relaxed solution is
    subsequently "rounded" into an approximate discrete solution to the original
    problem. In this paper we present a novel margin-based perspective on multiway
    spectral clustering.

  288. Stochastic Approximation and Newton's Estimate of a Mixing Distribution.

    Authors: Ryan Martin, Jayanta K. Ghosh
    Subjects: Methodology
    Abstract

    Many statistical problems involve mixture models and the need for
    computationally efficient methods to estimate the mixing distribution has
    increased dramatically in recent years. Newton [Sankhya Ser. A 64 (2002)
    306--322] proposed a fast recursive algorithm for estimating the mixing
    distribution, which we study as a special case of stochastic approximation
    (SA). We begin with a review of SA, some recent statistical applications, and
    the theory necessary for analysis of a SA algorithm, which includes Lyapunov
    functions and ODE stability theory.

  289. Spectral estimation of the Lévy density in partially observed affine models.

    Authors: Denis Belomestny
    Subjects: Methodology
    Abstract

    The problem of estimating the Lévy density of a partially observed
    multidimensional affine process from low-frequency and mixed-frequency data is
    considered. The estimation methodology is based on the log-affine
    representation of the conditional characteristic function of an affine process
    and local linear smoothing in time. We derive almost sure uniform rates of
    convergence for the estimated Lévy density both in mixed-frequency and
    low-frequency setups and prove that these rates are optimal in the minimax
    sense.

  290. A Generalized Least Squares Matrix Decomposition.

    Authors: Genevera I. Allen, Jonathan Taylor, Logan Grosenick
    Subjects: Methodology
    Abstract

    Variables in high-dimensional data sets common in neuroimaging, spatial
    statistics, time series and genomics often exhibit complex dependencies.
    Conventional multivariate analysis techniques often ignore these relationships,
    which arise, for example, from spatial and/or temporal processes or network
    structures. We propose a generalization of the singular value decomposition
    that is appropriate for transposable matrix data, or data in which neither the
    rows nor the columns can be considered independent instances.

  291. Rejoinder: Quantifying the Fraction of Missing Information for Hypothesis Testing in Statistical and Genetic Studies.

    Authors: Xiao-Li Meng, Dan L. Nicolae, Augustine Kong
    Subjects: Methodology
    Abstract

    Rejoinder to "Quantifying the Fraction of Missing Information for Hypothesis
    Testing in Statistical and Genetic Studies" [arXiv:1102.2774]

  292. Comment: Quantifying the Fraction of Missing Information for Hypothesis Testing in Statistical and Genetic Studies.

    Authors: Shaw-Hwa Lo, Tian Zheng
    Subjects: Methodology
    Abstract

    Comment on "Quantifying the Fraction of Missing Information for Hypothesis
    Testing in Statistical and Genetic Studies" [arXiv:1102.2774]

  293. Comment: Quantifying the Fraction of Missing Information for Hypothesis Testing in Statistical and Genetic Studies.

    Authors: Li-Chu Chien, I-Shou Chang, Chao A. Hsiung, Chung-Hsing Chen
    Subjects: Methodology
    Abstract

    Comment on "Quantifying the Fraction of Missing Information for Hypothesis
    Testing in Statistical and Genetic Studies" [arXiv:1102.2774]

  294. Comment: Quantifying Information Loss in Survival Studies.

    Authors: Hani Doss
    Subjects: Methodology
    Abstract

    Comment on "Quantifying the Fraction of Missing Information for Hypothesis
    Testing in Statistical and Genetic Studies" [arXiv:1102.2774]

  295. Compatibility of Prior Specifications Across Linear Models.

    Authors: Guido Consonni, Piero Veronese
    Subjects: Methodology
    Abstract

    Bayesian model comparison requires the specification of a prior distribution
    on the parameter space of each candidate model. In this connection two concerns
    arise: on the one hand the elicitation task rapidly becomes prohibitive as the
    number of models increases; on the other hand numerous prior specifications can
    only exacerbate the well-known sensitivity to prior assignments, thus producing
    less dependable conclusions.

  296. Quantifying the Fraction of Missing Information for Hypothesis Testing in Statistical and Genetic Studies.

    Authors: Xiao-Li Meng, Dan L. Nicolae, Augustine Kong
    Subjects: Methodology
    Abstract

    Many practical studies rely on hypothesis testing procedures applied to data
    sets with missing information. An important part of the analysis is to
    determine the impact of the missing data on the performance of the test, and
    this can be done by properly quantifying the relative (to complete data) amount
    of available information. The problem is directly motivated by applications to
    studies, such as linkage analyses and haplotype-based association projects,
    designed to identify genetic contributions to complex diseases.

  297. A Hierarchical Model for Aggregated Functional Data.

    Authors: Nancy L. Garcia, Ronaldo Dias, Alexandra M. Schmidt
    Subjects: Methodology
    Abstract

    In many areas of science one aims to estimate latent sub-population mean
    curves based only on observations of aggregated population curves. By
    aggregated curves we mean linear combinations of functional data that cannot be
    observed individually. We assume that several aggregated curves with linearly
    independent coefficients are available. More specifically, we assume each
    aggregated curve is an independent partial realization of a Gaussian process
    with mean modeled through a weighted linear combination of the disaggregated
    curves.

  298. Modelling time to event with observations made at arbitrary times.

    Authors: Matthew Sperrin, Iain Buchan
    Subjects: Methodology
    Abstract

    We introduce new methods of analysing time to event data via extended
    versions of the proportional hazards and accelerated failure time (AFT) models.
    In many time to event studies, the time of first observation is arbitrary, in
    the sense that no risk modifying event occurs. This is particularly common in
    epidemiological studies. We show formally that, in these situations, it is not
    sensible to take the first observation as the time origin, either in AFT or
    proportional hazards type models. Instead, we advocate using age of the subject
    as the time scale.

  299. Adaptive Thresholding for Sparse Covariance Matrix Estimation.

    Authors: Weidong Liu, Tony Cai
    Subjects: Methodology
    Abstract

    In this paper we consider estimation of sparse covariance matrices and
    propose a thresholding procedure which is adaptive to the variability of
    individual entries. The estimators are fully data driven and enjoy excellent
    performance both theoretically and numerically. It is shown that the estimators
    adaptively achieve the optimal rate of convergence over a large class of sparse
    covariance matrices under the spectral norm. In contrast, the commonly used
    universal thresholding estimators are shown to be sub-optimal over the same
    parameter spaces. Support recovery is also discussed.
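
    The idea of an entry-adaptive threshold can be sketched as below. This is
    a minimal illustration, not the authors' implementation: it assumes hard
    thresholding, a per-entry threshold of the form
    delta * sqrt(theta_ij * log(p) / n) with theta_ij the empirical variance
    of the centered products, a default delta of 2, and an untouched diagonal.
    The function name `adaptive_threshold_cov` is hypothetical.

```python
import math


def adaptive_threshold_cov(X, delta=2.0):
    """Hard-threshold each sample covariance entry at its own level.

    X is a list of n rows, each a list of p floats. The threshold form
    and the default delta are assumptions made for this sketch.
    """
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Z = [[row[j] - means[j] for j in range(p)] for row in X]
    S = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i, p):
            prods = [z[i] * z[j] for z in Z]
            s_ij = sum(prods) / n
            # Variability of this particular entry's estimate.
            theta_ij = sum((q - s_ij) ** 2 for q in prods) / n
            lam = delta * math.sqrt(theta_ij * math.log(p) / n)
            # Diagonal entries are kept as they are in this sketch.
            keep = i == j or abs(s_ij) >= lam
            S[i][j] = S[j][i] = s_ij if keep else 0.0
    return S
```

    A universal rule would instead apply one common threshold to every entry;
    here entries whose products vary more face a higher bar before they are
    retained.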

  300. A Constrained L1 Minimization Approach to Sparse Precision Matrix Estimation.

    Authors: Weidong Liu, Tony Cai, Xi Luo
    Subjects: Methodology
    Abstract

    A constrained L1 minimization method is proposed for estimating a sparse
    inverse covariance matrix based on a sample of $n$ iid $p$-variate random
    variables. The resulting estimator is shown to enjoy a number of desirable
    properties. In particular, it is shown that the rate of convergence between the
    estimator and the true $s$-sparse precision matrix under the spectral norm is
    $s\sqrt{\log p/n}$ when the population distribution has either exponential-type
    tails or polynomial-type tails. Convergence rates under the elementwise
    $L_{\infty}$ norm and Frobenius norm are also presented.

  301. Robust Retrospective Multiple Change-point Estimation for Multivariate Data.

    Authors: Olivier Cappé, Alexandre Lung-Yut-Fong, Céline Lévy-Leduc
    Subjects: Methodology
    Abstract

    We propose a non-parametric statistical procedure for detecting multiple
    change-points in multidimensional signals. The method is based on a test
    statistic that generalizes the well-known Kruskal-Wallis procedure to the
    multivariate setting. The proposed approach does not require any knowledge
    about the distribution of the observations and is parameter-free. It is
    computationally efficient thanks to the use of dynamic programming and can also
    be applied when the number of change-points is unknown.

  302. Asymptotically optimal parameter estimation under quantization constraints.

    Authors: Georgios Fellouris
    Subjects: Methodology
    Abstract

    The problem of decentralized parameter estimation is considered for
    diffusion-type processes whose drift coefficients are linear with respect to
    the unknown parameter. This problem is motivated by applications where remote
    sensors observe coupled stochastic processes and transmit quantized versions of
    their data to a fusion center, for the latter to take the final decision. Novel
    decentralized estimation schemes are suggested, according to which the sensors
    communicate at two-sided exit times of appropriate sufficient statistics.

  303. Ridge parameter for g-prior distribution in Probit mixed model.

    Authors: Meili Baragatti, Denys Pommeret
    Subjects: Methodology
    Abstract

    In the Bayesian variable selection framework, a common prior distribution for
    the regression coefficients is the g-prior of Zellner (1986). However, there
    are two standard cases in which the associated covariance matrix does not
    exist, and the conventional prior of Zellner cannot be used: if the number of
    observations is lower than the number of variables (large p and small n
    paradigm), or if some variables are linear combinations of others. In such
    situations we propose a prior distribution derived from the prior of Zellner,
    by introducing a ridge parameter.
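
    As a rough illustration of the idea, the display below contrasts Zellner's
    g-prior with one possible ridge-stabilised variant. The second formula is an
    assumed form chosen for illustration (the abstract does not give the exact
    expression); $\lambda > 0$ denotes the ridge parameter and $I_p$ the
    $p \times p$ identity matrix.

```latex
% Zellner's g-prior: requires X^T X to be invertible.
\[
  \beta \mid \sigma^2 \sim \mathcal{N}\!\left(0,\; g\,\sigma^2 \left(X^\top X\right)^{-1}\right)
\]

% Assumed ridge variant: the precision g^{-1} X^T X + \lambda I_p has all
% eigenvalues >= \lambda > 0, so the covariance exists even when n < p or
% when some columns of X are linearly dependent.
\[
  \beta \mid \sigma^2 \sim \mathcal{N}\!\left(0,\; \sigma^2 \left(g^{-1} X^\top X + \lambda I_p\right)^{-1}\right)
\]
```

    When $X^\top X$ is invertible, letting $\lambda \to 0$ in the second
    expression recovers the first.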

  304. Fisher information matrix for three-parameter exponentiated-Weibull distribution under type II censoring.

    Authors: Lianfen Qian
    Subjects: Methodology
    Abstract

    This paper considers the three-parameter exponentiated Weibull family under
    type II censoring. It first graphically illustrates the shape property of the
    hazard function. Then, it proposes a simple algorithm for computing the maximum
    likelihood estimator and derives the Fisher information matrix. The latter
    is expressed as a single integral in terms of the hazard function, which
    removes the computational difficulty of constructing inference for the
    maximum likelihood estimator.

  305. Statistical Methods for Analyzing Tissue Microarray Images - Algorithmic Scoring and Co-training.

    Authors: Pei Wang, Donghui Yan, Beatrice S. Knudsen, Michael Linden, Timothy W. Randolph
    Subjects: Methodology
    Abstract

    Recent advances in tissue microarray technology have allowed
    immunohistochemistry to become a powerful medium-to-high throughput analysis
    tool, particularly for the validation of diagnostic and prognostic biomarkers.
    However, as study size grows, the manual evaluation of these assays becomes a
    prohibitive limitation; it vastly reduces throughput and greatly increases
    variability and expense. We propose an algorithm - Tissue Array Co-Occurrence
    Matrix Analysis (TACOMA) - for quantifying cellular phenotypes based on
    textural regularity summarized by local inter-pixel relationships.

  306. Recursive $\ell_{1,\infty}$ Group lasso.

    Authors: Alfred O. Hero III, Yilun Chen
    Subjects: Methodology
    Abstract

    We introduce a recursive adaptive group lasso algorithm for real-time
    penalized least squares prediction that produces a time sequence of optimal
    sparse predictor coefficient vectors. At each time index the proposed algorithm
    computes an exact update of the optimal $\ell_{1,\infty}$-penalized recursive
    least squares (RLS) predictor. Each update solves a convex but
    nondifferentiable optimization problem. We develop an online homotopy
    method to reduce the computational complexity.

  307. Clustering functional data using wavelets.

    Authors: Jean-Michel Poggi, Anestis Antoniadis, Xavier Brosat, Jairo Cugliari
    Subjects: Methodology
    Abstract

    We present two methods for detecting patterns and clusters in high
    dimensional time-dependent functional data. Our methods are based on
    wavelet-based similarity measures, since wavelets are well suited for
    identifying highly discriminant local time and scale features. The
    multiresolution aspect of the wavelet transform provides a time-scale
    decomposition of the signals, making it possible to visualize and cluster the
    functional data into homogeneous groups.

  308. Parallel Tempering with Equi-Energy Moves.

    Authors: Meili Baragatti, Agnès Grimaud, Denys Pommeret
    Subjects: Methodology
    Abstract

    The Equi-Energy Sampler (EES) introduced by Kou et al. [2006] is based on a
    population of chains which are updated by local moves and equi-energy jumps.
    This algorithm has been developed to facilitate global moves between the
    different chains, resulting in a good exploration of the state space by the
    target chain. This method seems to be more efficient than the classical
    Parallel Tempering (PT) algorithm. However, it requires increased storage
    and the convergence of the original EES is not guaranteed (see Andrieu et al.
    [2008]).

  309. Bayesian Variable Selection for Probit Mixed Models Applied to Gene Selection.

    Authors: Meili Baragatti
    Subjects: Methodology
    Abstract

    In computational biology, gene expression datasets are characterized by very
    few individual samples compared to a large number of measurements per sample.
    Thus, it is appealing to merge these datasets in order to increase the number
    of observations and diversify the data, allowing a more reliable selection of
    genes relevant to the biological problem. Besides, the increased size of a
    merged dataset facilitates its re-splitting into training and validation sets.
    This necessitates the introduction of the dataset as a random effect. In this
    context, extending a work of Lee et al.

  310. Inferences in Bayesian variable selection problems with large model spaces.

    Authors: Gonzalo Garcia-Donato, Miguel Angel Martinez-Beneito
    Subjects: Methodology
    Abstract

    An important aspect of Bayesian model selection is how to deal with huge
    model spaces, since exhaustive enumeration of all the models entertained is
    unfeasible and inferences have to be based on the very small proportion of
    models visited. This is the case for the variable selection problem, with a
    moderate to large number of possible explanatory variables being considered in
    this paper.

  311. A Partitioning Deletion/Substitution/Addition Algorithm for Creating Survival Risk Groups.

    Authors: Karen Lostritto, Robert Strawderman, Annette Molinaro
    Subjects: Methodology
    Abstract

    One approach to assessing a patient's risk of a given event is to stratify
    patients into two or more distinct risk groups using both clinical and
    demographic variables. Outcomes may be categorical or continuous in nature;
    important examples in cancer studies might include level of toxicity or time to
    recurrence. Recursive partitioning methods are ideal for building such risk
    groups. Two such methods are Classification and Regression Trees (CART) and a
    more recent competitor known as the "partitioning
    Deletion/Substitution/Addition" (partDSA) algorithm.

  312. A Novel Approach for Fast Detection of Multiple Change Points in Linear Models.

    Authors: Yuehua Wu, Xiaoping Shi, Baisuo Jin
    Subjects: Methodology
    Abstract

    Change point problems occur in many statistical applications. If a model
    contains change points, analyzing it without accounting for them is harmful,
    and the results derived from such an analysis may be misleading. There is a
    rich literature on change point detection. Although many methods have been
    proposed for detecting multiple change points, using them to find multiple
    change points in a large sample does not seem feasible.

  313. Classification under Data Contamination with Application to Remote Sensing Image Mis-registration.

    Authors: Donghui Yan, Peng Gong, Aiyou Chen, Liheng Zhong
    Subjects: Methodology
    Abstract

    This work is motivated by the problem of image mis-registration in remote
    sensing and we are interested in determining the resulting loss in the accuracy
    of pattern classification. A statistical formulation is given where we propose
    to use data contamination to model and understand the phenomenon of image
    mis-registration. This model is widely applicable to many other types of errors
    as well, for example measurement errors and gross errors. The impact of
    data contamination on classification is studied under a statistical learning
    theoretical framework.

  314. Defining a robust biological prior from Pathway Analysis to drive Network Inference.

    Authors: Christophe Ambroise, Marine Jeanmougin, Mickael Guedj
    Subjects: Methodology
    Abstract

    Due to the vast space of possible networks and the relatively small amount of
    data available, inferring genetic networks from gene expression data is one of
    the most challenging tasks in the post-genomic era. In this field, the Gaussian
    Graphical Model (GGM) provides a convenient framework for the discovery of
    biological networks. In this paper, we propose an original approach for
    inferring gene regulation networks using a robust biological prior on structure
    in order to limit the set of candidate networks.

  315. Minimum mean square distance estimation of a subspace.

    Authors: Nicolas Dobigeon, Jean-Yves Tourneret, Olivier Besson
    Subjects: Methodology
    Abstract

    We consider the problem of subspace estimation in a Bayesian setting. Since
    we are operating in the Grassmann manifold, the usual approach which consists
    of minimizing the mean square error (MSE) between the true subspace $U$ and its
    estimate $\hat{U}$ may not be adequate as the MSE is not the natural metric in
    the Grassmann manifold. As an alternative, we propose to carry out subspace
    estimation by minimizing the mean square distance (MSD) between $U$ and its
    estimate, where the considered distance is a natural metric in the Grassmann
    manifold, viz.

  316. An Adjusted Likelihood Ratio Test for Separability in Unbalanced Multivariate Repeated Measures Data.

    Authors: Sean L. Simpson
    Subjects: Methodology
    Abstract

    We propose an adjusted likelihood ratio test of two-factor separability
    (Kronecker product structure) for unbalanced multivariate repeated measures
    data. Here we address the particular case where the within subject correlation
    is believed to decrease exponentially in both dimensions (e.g., temporal and
    spatial dimensions). However, the test can be easily generalized to factor
    specific matrices of any structure. A simulation study is conducted to assess
    the inference accuracy of the proposed test.

  317. Correct ordering in the Zipf-Poisson ensemble.

    Authors: Art B. Owen, Justin S. Dyer
    Subjects: Methodology
    Abstract

    We consider a Zipf-Poisson ensemble in which $X_i\sim\mathrm{Poi}(Ni^{-\alpha})$ for
    $\alpha>1$ and $N>0$ and integers $i\ge 1$. As $N\to\infty$ the first $n'(N)$
    random variables have their proper order $X_1>X_2>\dots>X_{n'}$ relative to each
    other, with probability tending to 1 for $n'$ up to
    $(AN/\log(N))^{1/(\alpha+2)}$ for an explicit constant $A(\alpha)\ge 3/4$. The
    rate $N^{1/(\alpha+2)}$ cannot be achieved. The ordering of the first $n'(N)$
    entities does not preclude $X_m>X_{n'}$ for some interloping $m>n'$.

  318. Estimators of Fractal Dimension: Assessing the Roughness of Time Series and Spatial Data.

    Authors: Tilmann Gneiting, Donald B. Percival, Hana Sevcikova
    Subjects: Methodology
    Abstract

    The fractal or Hausdorff dimension is a measure of roughness (or smoothness)
    for time series and spatial data. The graph of a smooth, differentiable surface
    indexed in R^d has topological and fractal dimension d. If the surface is
    non-differentiable and rough, the fractal dimension takes values between the
    topological dimension, d, and d + 1. We review and assess estimators of fractal
    dimension by their large sample behavior under infill asymptotics, in extensive
    finite sample simulation studies, and in a data example on arctic sea-ice
    profiles.

  319. Optimal detection of changepoints with a linear computational cost.

    Authors: R. Killick, P. Fearnhead, I.A. Eckley
    Subjects: Methodology
    Abstract

    We consider the problem of detecting multiple changepoints in large data
    sets. Our focus is on applications where the number of changepoints will
    increase as we collect more data: for example in genetics as we sequence larger
    regions of the genome, or in finance as we observe time-series over longer
    periods.

  320. Exponential-Family Random Graph Models for Valued Networks.

    Authors: Pavel N. Krivitsky
    Subjects: Methodology
    Abstract

    Exponential-family random graph models (ERGMs) provide a principled and
    flexible way to model and simulate features common in social networks, such as
    propensities for homophily, mutuality, and friend-of-a-friend triad closure,
    through choice of model terms (sufficient statistics). However, those ERGMs
    modeling the more complex features have, to date, been limited to binary data:
    presence or absence of ties. Thus, analysis of valued networks, such as those
    where counts, measurements, or ranks are observed, has necessitated
    dichotomizing them, losing information.

  321. A Family of Generalized Linear Models for Repeated Measures with Normal and Conjugate Random Effects.

    Authors: Clarice G. B. Demétrio, Geert Molenberghs, Geert Verbeke, Afrânio M. C. Vieira
    Subjects: Methodology
    Abstract

    Non-Gaussian outcomes are often modeled using members of the so-called
    exponential family. Well-known members are the Bernoulli model for binary data,
    leading to logistic regression, and the Poisson model for count data, leading
    to Poisson regression.

  322. Bivariate Uniform Deconvolution.

    Authors: Bert van Es, Martina Benešová, Peter Tegelaar
    Subjects: Methodology
    Abstract

    We construct a density estimator in the bivariate uniform deconvolution
    model. For this model we derive four inversion formulas to express the
    bivariate density that we want to estimate in terms of the bivariate density of
    the observations. By substituting a kernel density estimator of the density of
    the observations we then get four different estimators. Next we construct an
    asymptotically optimal convex combination of these four estimators. Expansions
    for the bias, variance, as well as asymptotic normality, are derived. Some
    simulated examples are presented.

  323. A Conversation with George C. Tiao.

    Authors: Daniel Peña, Ruey S. Tsay
    Subjects: Methodology
    Abstract

    George C. Tiao was born in London in 1933. After graduating with a B.A. in
    Economics from National Taiwan University in 1955 he went to the US to obtain
    an M.B.A. from New York University in 1958 and a Ph.D. in Economics from the
    University of Wisconsin, Madison in 1962. From 1962 to 1982 he was Assistant,
    Associate, Professor and Bascom Professor of Statistics and Business at the
    University of Wisconsin, Madison, and in the period 1973-1975 was Chairman of
    the Department of Statistics. He moved to the Graduate School of Business at
    the University of Chicago in 1982 and is the W.

  324. Laplace Approximated EM Microarray Analysis: An Empirical Bayes Approach for Comparative Microarray Experiments.

    Authors: Martin T. Wells, Haim Bar, James Booth, Elizabeth Schifano
    Subjects: Methodology
    Abstract

    A two-groups mixed-effects model for the comparison of (normalized)
    microarray data from two treatment groups is considered. Most competing
    parametric methods that have appeared in the literature are obtained as special
    cases or by minor modification of the proposed model. Approximate maximum
    likelihood fitting is accomplished via a fast and scalable algorithm, which we
    call LEMMA (Laplace approximated EM Microarray Analysis).

  325. Graphical Models for Inference Under Outcome-Dependent Sampling.

    Authors: Niels Keiding, Vanessa Didelez, Svend Kreiner
    Subjects: Methodology
    Abstract

    We consider situations where data have been collected such that the sampling
    depends on the outcome of interest and possibly further covariates, as for
    instance in case-control studies. Graphical models represent assumptions about
    the conditional independencies among the variables. By including a node for the
    sampling indicator, assumptions about sampling processes can be made explicit.
    We demonstrate how to read off such graphs whether consistent estimation of the
    association between exposure and outcome is possible.

  326. On the Sample Information About Parameter and Prediction.

    Authors: Nader Ebrahimi, Ehsan S. Soofi, Refik Soyer
    Subjects: Methodology
    Abstract

    The Bayesian measure of sample information about the parameter, known as
    Lindley's measure, is widely used in various problems such as developing prior
    distributions, models for the likelihood functions and optimal designs. The
    predictive information is defined similarly and used for model selection and
    optimal designs, though to a lesser extent. The parameter and predictive
    information measures are proper utility functions and have also been used in
    combination.

  327. Nonparametric Additive Model-assisted Estimation for Survey Data.

    Authors: Li Wang, Suojin Wang
    Subjects: Methodology
    Abstract

    An additive model-assisted nonparametric method is investigated to estimate
    the finite population totals of massive survey data with the aid of auxiliary
    information. A class of estimators is proposed to improve the precision of the
    well known Horvitz-Thompson estimators by combining the spline and local
    polynomial smoothing methods. These estimators are calibrated, asymptotically
    design-unbiased, consistent, normal and robust in the sense of asymptotically
    attaining the Godambe-Joshi lower bound to the anticipated variance.
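
    For orientation, the baseline Horvitz-Thompson estimator that the proposed
    class aims to improve can be sketched as follows. This is only the classical
    estimator, not the spline or local polynomial method of the paper; the
    function name and the sample values are hypothetical.

```python
def horvitz_thompson_total(values, inclusion_probs):
    """Estimate a finite-population total from a probability sample.

    values: observed y_i for the sampled units.
    inclusion_probs: first-order inclusion probabilities pi_i of those units.
    """
    if len(values) != len(inclusion_probs):
        raise ValueError("values and inclusion_probs must have equal length")
    if any(not 0.0 < p <= 1.0 for p in inclusion_probs):
        raise ValueError("inclusion probabilities must lie in (0, 1]")
    # Each sampled unit is weighted by the inverse of its inclusion probability.
    return sum(y / p for y, p in zip(values, inclusion_probs))


# Hypothetical sample of three units.
sample_y = [10.0, 20.0, 30.0]
sample_pi = [0.5, 0.25, 0.5]
estimate = horvitz_thompson_total(sample_y, sample_pi)
```

    Model-assisted estimators keep this inverse-probability weighting but apply
    it to residuals from a fitted model of the outcome on the auxiliary variables.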

  328. To Explain or to Predict?.

    Authors: Galit Shmueli
    Subjects: Methodology
    Abstract

    Statistical modeling is a powerful tool for developing and testing theories
    by way of causal explanation, prediction, and description. In many disciplines
    there is near-exclusive use of statistical modeling for causal explanation and
    the assumption that models with high explanatory power are inherently of high
    predictive power. Conflation between explanation and prediction is common, yet
    the distinction must be understood for progressing scientific knowledge.

  329. Generalised Wishart Processes.

    Authors: Zoubin Ghahramani, Andrew Gordon Wilson
    Subjects: Methodology
    Abstract

    We introduce a stochastic process with Wishart marginals: the generalised
    Wishart process (GWP). It is a collection of positive semi-definite random
    matrices indexed by any arbitrary dependent variable. We use it to model
    dynamic (e.g. time varying) covariance matrices. Unlike existing models, it can
    capture a diverse class of covariance structures, it can easily handle missing
    data, the dependent variable can readily include covariates other than time,
    and it scales well with dimension; there is no need for free parameters, and
    optional parameters are easy to interpret.

  330. Component Selection in the Additive Regression Model.

    Authors: Xia Cui, Lixing Zhu, Heng Peng, Songqiao Wen
    Subjects: Methodology
    Abstract

    Similar to variable selection in the linear regression model, selecting
    significant components in the popular additive regression model is of great
    interest. However, such components are unknown smooth functions of independent
    variables, which are unobservable. As such, some approximation is needed. In
    this paper, we suggest a combination of penalized regression spline
    approximation and group variable selection, called the lasso-type spline method
    (LSM), to handle this component selection problem with a diverging number of
    strongly correlated variables in each group.

  331. Truncated Stochastic Approximation with Moving Bounds: Convergence.

    Authors: Teo Sharia
    Subjects: Methodology
    Abstract

    In this paper we propose a wide class of truncated stochastic approximation
    procedures with moving random bounds. While we believe that the proposed class
    of procedures will find its way to a wider range of applications, the main
    motivation is to accommodate applications to parametric statistical estimation
    theory. Our class of stochastic approximation procedures has three main
    characteristics: truncations with random moving bounds, a matrix valued random
    step-size sequence, and dynamically changing random regression function.

  332. Adjusting for Network Size and Composition Effects in Exponential-Family Random Graph Models.

    Authors: Mark S. Handcock, Pavel N. Krivitsky, Martina Morris
    Subjects: Methodology
    Abstract

    Exponential-family random graph models (ERGMs) provide a principled way to
    model and simulate features common in human social networks, such as
    propensities for homophily and friend-of-a-friend triad closure. We show that,
    without adjustment, ERGMs preserve density as network size increases. Density
    invariance is often not appropriate for social networks. We suggest a simple
    modification based on an offset which instead preserves the mean degree and
    accommodates changes in network composition asymptotically.
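
    The contrast between density invariance and mean-degree invariance can be
    seen in the simplest case, an edges-only (Bernoulli) graph. The sketch below
    assumes the offset takes the form -log(n) on the edge log-odds; the abstract
    does not state the exact form, so treat that choice and the function name as
    illustrative assumptions.

```python
import math


def mean_degree(n, theta, offset=False):
    """Expected degree in a Bernoulli (edges-only) graph on n nodes.

    Each tie has log-odds theta, or theta - log(n) when the assumed
    size offset is applied.
    """
    log_odds = theta - math.log(n) if offset else theta
    tie_prob = 1.0 / (1.0 + math.exp(-log_odds))
    return (n - 1) * tie_prob


theta = 0.0
sizes = [100, 1000, 10000]
# Without the offset the tie probability is fixed, so mean degree grows with n.
unadjusted = [mean_degree(n, theta) for n in sizes]
# With the offset the mean degree stabilises near exp(theta) as n grows.
adjusted = [mean_degree(n, theta, offset=True) for n in sizes]
```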

  333. Large-scale interval and point estimates from an empirical Bayes extension of confidence posteriors.

    Authors: David R. Bickel
    Subjects: Methodology
    Abstract

    The proposed approach extends the confidence posterior distribution to the
    semi-parametric empirical Bayes setting. Whereas the Bayesian posterior is
    defined in terms of a prior distribution conditional on the observed data, the
    confidence posterior is defined such that the probability that the parameter
    value lies in any fixed subset of parameter space, given the observed data, is
    equal to the coverage rate of the corresponding confidence interval.

  334. Forward Smoothing using Sequential Monte Carlo.

    Authors: Pierre Del Moral, Arnaud Doucet, Sumeetpal Singh
    Subjects: Methodology
    Abstract

    Sequential Monte Carlo (SMC) methods are a widely used set of computational
    tools for inference in non-linear non-Gaussian state-space models. We propose a
    new SMC algorithm to compute the expectation of additive functionals
    recursively. Essentially, it is an online or forward-only implementation of a
    forward filtering backward smoothing SMC algorithm proposed in Doucet et al.
    (2000).

  335. Variance Estimation Using Refitted Cross-validation in Ultrahigh Dimensional Regression.

    Authors: Jianqing Fan, Shaojun Guo, Ning Hao
    Subjects: Methodology
    Abstract

    Variance estimation is a fundamental problem in statistical modeling. In
    ultrahigh dimensional linear regressions where the dimensionality is much
    larger than sample size, traditional variance estimation techniques are not
    applicable. Recent advances in variable selection in ultrahigh dimensional
    linear regressions make this problem accessible. One of the major problems in
    ultrahigh dimensional regression is the high spurious correlation between the
    unobserved realized noise and some of the predictors.

  336. Regularized Least-Mean-Square Algorithms.

    Authors: Alfred O. Hero, Yilun Chen, Yuantao Gu
    Subjects: Methodology
    Abstract

    We consider adaptive system identification problems with convex constraints
    and propose a family of regularized Least-Mean-Square (LMS) algorithms. We show
    that with a properly selected regularization parameter the regularized LMS
    provably dominates its conventional counterpart in terms of mean square
    deviations. We establish simple and closed-form expressions for choosing this
    regularization parameter. For identifying an unknown sparse system we propose
    sparse and group-sparse LMS algorithms, which are special examples of the
    regularized LMS family.

  337. Approximate tail probabilities of the maximum of a chi-square field on multi-dimensional lattice points and their applications to detection of loci interactions.

    Authors: Satoshi Kuriki, Yoshiaki Harushima, Hironori Fujisawa, Nori Kurata
    Subjects: Methodology
    Abstract

    Define a chi-square random field on a multi-dimensional lattice points index
    set with a direct-product covariance structure, and consider the distribution
    of the maximum of this random field. We provide two approximate formulas for
    the upper tail probability of the distribution based on nonlinear renewal
    theory and an integral-geometric approach called the volume-of-tube method. The
    former is accurate when the lattice spacings are approximately equal. The
    latter is a conservative bound, but has the advantage that the lattice spacings
    do not matter.

  338. Control of the False Discovery Rate Under Arbitrary Covariance Dependence.

    Authors: Jianqing Fan, Xu Han, Weijie Gu
    Subjects: Methodology
    Abstract

    Multiple hypothesis testing is a fundamental problem in high dimensional
    inference, with wide applications in many scientific fields. In genome-wide
    association studies, tens of thousands of tests are performed simultaneously to
    determine whether any genes are associated with some traits, and those tests are
    correlated. When test statistics are correlated, false discovery control
    becomes very challenging under arbitrary dependence.

  339. Robust Sure Independence Screening based on Rank Correlation for the Ultrahigh Dimensional Models.

    Authors: Lixing Zhu, Jun Zhang, Gaorong Li, Heng Peng
    Subjects: Methodology
    Abstract

    The variable selection problem for high-dimensional models has become an
    important topic in modern statistics, especially in the setting where the
    number of predictors $p$ is much larger than the number of observations $n$. In
    this paper, we propose rank correlation screening (RCS), a novel method for
    ultra-high dimensional data. We show that the proposed procedure possesses
    a sure independence screening property even when the number of predictor
    variables grows exponentially with the sample size.

  340. Exploring the Consequences of IED Deployment with a Generalized Linear Model Implementation of the Canadian Traveller Problem.

    Authors: Stephen E. Fienberg, Andrew C. Thomas
    Subjects: Methodology
    Abstract

    The deployment of improvised explosive devices (IEDs) along major roadways
    has been a favoured strategy of insurgents in recent war zones, both for their
    ability to cause damage to targets along roadways at minimal cost and as a
    means of controlling the flow of traffic and causing additional expense to
    opposing forces.

  341. Type I error rate control in multiple testing: a survey with proofs.

    Authors: Etienne Roquain
    Subjects: Methodology
    Abstract

    This paper presents a survey on some recent advances in the error rate
    control in multiple testing methodology. We consider the problem of controlling
    the $k$-family-wise error rate (kFWER, the probability of making $k$ or more
    false discoveries) and the false discovery proportion (FDP, the proportion of
    false discoveries among the discoveries), the latter being controlled either
    via its expectation, which is the so-called false discovery rate (FDR), or via
    its upper-tail distribution function.
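
    The two quantities defined above can be made concrete for a single
    realisation of a testing procedure. The sketch below only evaluates the FDP
    and the kFWER event for a given rejection set; it is not a control procedure,
    and the function names and index sets are hypothetical.

```python
def false_discovery_proportion(rejected, true_nulls):
    """FDP: share of rejected hypotheses that are true nulls (0 if none rejected)."""
    rejected = set(rejected)
    if not rejected:
        return 0.0
    return len(rejected & set(true_nulls)) / len(rejected)


def k_fwer_event(rejected, true_nulls, k):
    """True when at least k true nulls are rejected (the event whose probability is the kFWER)."""
    return len(set(rejected) & set(true_nulls)) >= k


true_nulls = {0, 1, 2, 3, 4}      # hypothetical indices of true null hypotheses
rejected = {3, 4, 5, 6, 7, 8}     # hypothetical rejection set
fdp = false_discovery_proportion(rejected, true_nulls)
```

    The FDR is the expectation of this FDP over repeated experiments, and the
    kFWER is the probability of the event checked by the second function.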

  342. Mixture Modeling for Marked Poisson Processes.

    Authors: Matthew A. Taddy, Athanasios Kottas
    Subjects: Methodology
    Abstract

    We propose a general modeling framework for marked Poisson processes observed
    over time or space. The modeling approach exploits the connection of the
    nonhomogeneous Poisson process intensity with a density function. Nonparametric
    Dirichlet process mixtures for this density, combined with nonparametric or
    semiparametric modeling for the mark distribution, yield flexible prior models
    for the marked Poisson process.

  343. Inverse Regression for Analysis of Sentiment in Text.

    Authors: Matthew A. Taddy
    Subjects: Methodology
    Abstract

    Text data, including speeches, stories, and other document forms, is often
    composed with regard to sentiment variables that are of interest for research
    in marketing, economics, and other social research fields. It is also very high
    dimensional and difficult to incorporate into statistical analysis. This
    article introduces a straightforward framework of sentiment-preserving
    dimension reduction for text data. Our aim is to provide a general approach to
    text regression while avoiding the model complexity characterizing much of
    statistical learning for language.

  344. Rejoinder: The Future of Indirect Evidence.

    Authors: Bradley Efron
    Subjects: Methodology
    Abstract

    Rejoinder to "The Future of Indirect Evidence" [arXiv:1012.1161]

  345. Comment: How Should Indirect Evidence Be Used?.

    Authors: Robert E. Kass
    Subjects: Methodology
    Abstract

    Indirect evidence is crucial for successful statistical practice. Sometimes,
    however, it is better used informally. Future efforts should be directed toward
    understanding better the connection between statistical methods and scientific
    problems. [arXiv:1012.1161]

  346. Bayesian Statistics Then and Now.

    Authors: Andrew Gelman
    Subjects: Methodology
    Abstract

    Discussion of "The Future of Indirect Evidence" by Bradley Efron
    [arXiv:1012.1161]

  347. Comment: The Need for Syncretism in Applied Statistics.

    Authors: Sander Greenland
    Subjects: Methodology
    Abstract

    Comment on "The Future of Indirect Evidence" [arXiv:1012.1161]

  348. Weak Convergence of Markov Chain Monte Carlo Methods and its Application to Regular Gibbs Sampler.

    Authors: Kengo Kamatani
    Subjects: Methodology
    Abstract

    In this paper, we introduce the notion of efficiency (consistency) and
    examine some asymptotic properties of Markov chain Monte Carlo methods. We
    apply these results to the Gibbs sampler for independent and identically
    distributed observations. More precisely, we show that if both the sample size
    and the running time of the Gibbs sampler tend to infinity, and if the initial
    guess is not far from the true parameter, the Gibbs sampler estimator tends to
    the Bayesian estimator.

  349. Approximate Dynamic Programming and Its Applications to the Design of Phase I Cancer Trials.

    Authors: Jay Bartroff, Tze Leung Lai
    Subjects: Methodology
    Abstract

    Optimal design of a Phase I cancer trial can be formulated as a stochastic
    optimization problem. By making use of recent advances in approximate dynamic
    programming to tackle the problem, we develop an approximation of the Bayesian
    optimal design.

  350. Bayesian Models and Decision Algorithms for Complex Early Phase Clinical Trials.

    Authors: Peter F. Thall
    Subjects: Methodology
    Abstract

    An early phase clinical trial is the first step in evaluating the effects in
    humans of a potential new anti-disease agent or combination of agents. Usually
    called "phase I" or "phase I/II" trials, these experiments typically have the
    nominal scientific goal of determining an acceptable dose, most often based on
    adverse event probabilities. This arose from a tradition of phase I trials to
    evaluate cytotoxic agents for treating cancer, although some methods may be
    applied in other medical settings, such as treatment of stroke or immunological
    diseases.

  351. Dose Finding with Escalation with Overdose Control (EWOC) in Cancer Clinical Trials.

    Authors: Mourad Tighiouart, André Rogatko
    Subjects: Methodology
    Abstract

    Traditionally, the major objective in phase I trials is to identify a
    working-dose for subsequent studies, whereas the major endpoint in phase II and
    III trials is treatment efficacy. The dose sought is typically referred to as
    the maximum tolerated dose (MTD). Several statistical methodologies have been
    proposed to select the MTD in cancer phase I trials. In this manuscript, we
    focus on a Bayesian adaptive design, known as escalation with overdose control
    (EWOC).

  352. The Random Walk Metropolis: Linking Theory and Practice Through a Case Study.

    Authors: Chris Sherlock, Paul Fearnhead, Gareth O. Roberts
    Subjects: Methodology
    Abstract

    The random walk Metropolis (RWM) is one of the most common Markov chain Monte
    Carlo algorithms in practical use today. Its theoretical properties have been
    extensively explored for certain classes of target, and a number of results
    with important practical implications have been derived. This article draws
    together a selection of new and existing key results and concepts and describes
    their implications. The impact of each new idea on algorithm efficiency is
    demonstrated for the practical example of the Markov modulated Poisson process
    (MMPP).
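
    As a rough illustration of the algorithm this abstract refers to, the sketch
    below runs a one-dimensional random walk Metropolis chain with a Gaussian
    proposal. The function name, the step size, and the standard normal target
    are assumptions chosen for the example; they are not taken from the paper,
    which studies the MMPP.

```python
import math
import random


def random_walk_metropolis(log_target, x0, step, n_iter, seed=0):
    """Minimal 1-D random walk Metropolis sampler (illustrative sketch)."""
    rng = random.Random(seed)
    x = x0
    logp = log_target(x)
    samples = []
    accepted = 0
    for _ in range(n_iter):
        # Symmetric Gaussian proposal centred on the current state.
        proposal = x + rng.gauss(0.0, step)
        logp_prop = log_target(proposal)
        # Accept with probability min(1, pi(proposal) / pi(x)).
        if rng.random() < math.exp(min(0.0, logp_prop - logp)):
            x, logp = proposal, logp_prop
            accepted += 1
        samples.append(x)
    return samples, accepted / n_iter


# Hypothetical target: standard normal, log-density known up to a constant.
def log_std_normal(x):
    return -0.5 * x * x


samples, acceptance_rate = random_walk_metropolis(
    log_std_normal, x0=0.0, step=2.4, n_iter=20000
)
sample_mean = sum(samples) / len(samples)
sample_var = sum((s - sample_mean) ** 2 for s in samples) / len(samples)
```

    The step size controls the trade-off between acceptance rate and move size,
    which is the kind of tuning question the theory summarized above addresses.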

  353. Bivariate Penalized Splines.

    Authors: David Ruppert, Luo Xiao, Yingxing Li
    Subjects: Methodology
    Abstract

    We propose a new penalized spline method for bivariate smoothing. Tensor
    product B-splines with row and column penalties are used as in the bivariate
    P-spline of Marx and Eilers (2005). What is new here is the introduction of a
    third penalty term and a modification of the row and column penalties. We call
    the new estimator a Bivariate Penalized Spline or BPS. The modified penalty
    used by the BPS results in considerable simplifications that speed computations
    and facilitate asymptotic analysis.

  354. Variational approximation for heteroscedastic linear models and matching pursuit algorithms.

    Authors: Minh-Ngoc Tran, David J. Nott, Chenlei Leng
    Subjects: Methodology
    Abstract

    Modern statistical applications involving large data sets have focused
    attention on statistical methodologies which are both efficient computationally
    and able to deal with the screening of large numbers of different candidate
    models. Here we consider computationally efficient variational Bayes approaches
    to inference in high-dimensional heteroscedastic linear regression, where both
    the mean and variance are described in terms of linear functions of the
    predictors and where the number of predictors can be larger than the sample
    size.

  355. Predictor-dependent shrinkage for linear regression via partial factor modeling.

    Authors: Sayan Mukherjee, P. Richard Hahn, Carlos Carvalho
    Subjects: Methodology
    Abstract

    In prediction problems with more predictors than observations, it can
    sometimes be helpful to use a joint probability model, $\pi(Y,X)$, rather than
    a purely conditional model, $\pi(Y \mid X)$, where $Y$ is a scalar response
    variable and $X$ is a vector of predictors. This approach is motivated by the
    fact that in many situations the marginal predictor distribution $\pi(X)$ can
    provide useful information about the parameter values governing the conditional
    regression. However, under very mild misspecification, this marginal
    distribution can also lead conditional inferences astray.

  356. Clustering using Unsupervised Binary Trees: CUBT.

    Authors: Ricardo Fraiman, Badih Ghattas, Marcela Svarc
    Subjects: Methodology
    Abstract

    We introduce a new clustering method based on unsupervised binary trees. It
    is a three-stage procedure. The first stage performs recursive binary splits
    that reduce the heterogeneity of the data within the new subsamples. In the
    second stage (pruning), adjacent nodes are considered for aggregation. Finally,
    in the third stage (joining), similar clusters are joined even if they do not
    descend from the same node. Consistency results are obtained and the procedure
    is tested on simulated and real data sets.

  357. Modeling Non-Stationary Processes Through Dimension Expansion.

    Authors: Luke Bornn, Gavin Shaddick, James V Zidek
    Subjects: Methodology
    Abstract

    In this paper, we propose a novel approach to modeling nonstationary spatial
    fields. The proposed method works by expanding the geographic plane over which
    these processes evolve into higher dimensional spaces, transforming and
    clarifying complex patterns in the physical plane. By combining aspects of
    multi-dimensional scaling, group lasso, and latent variables models, a
    dimensionally sparse projection is found in which the originally nonstationary
    field exhibits stationarity.

  358. A Separable Model for Dynamic Networks.

    Authors: Mark S. Handcock, Pavel N. Krivitsky
    Subjects: Methodology
    Abstract

    Models of dynamic networks - networks that evolve over time - have manifold
    applications. We develop a discrete-time generative model for social network
    evolution that inherits the richness and flexibility of the class of
    exponential-family random graph models. The model facilitates separable
    modeling of the tie duration distributions and the structural dynamics of tie
    formation. We develop likelihood-based inference for the model, and provide
    computational algorithms for maximum likelihood estimation.

  359. Model Selection by Loss Rank for Classification and Unsupervised Learning.

    Authors: Marcus Hutter, Minh-Ngoc Tran
    Subjects: Methodology
    Abstract

    Hutter (2007) recently introduced the loss rank principle (LoRP) as a
    general-purpose principle for model selection. The LoRP enjoys many attractive
    properties and deserves further investigation. The LoRP has been well studied
    for the regression framework in Hutter and Tran (2010). In this paper, we study
    the LoRP for the classification framework, and develop it further for model
    selection problems in unsupervised learning where the main interest is to
    describe the associations between input measurements, as in cluster analysis or
    graphical modelling.

  360. The Loss Rank Criterion for Variable Selection in Linear Regression Analysis.

    Authors: Minh-Ngoc Tran
    Subjects: Methodology
    Abstract

    Lasso and other regularization procedures are attractive methods for variable
    selection, subject to a proper choice of shrinkage parameter. Given a set of
    potential subsets produced by a regularization algorithm, a consistent model
    selection criterion is proposed to select the best one among this preselected
    set. The approach leads to a fast and efficient procedure for variable
    selection, especially in high-dimensional settings.

  361. The Importance of Scale for Spatial-Confounding Bias and Precision of Spatial Regression Estimators.

    Authors: Christopher J. Paciorek
    Subjects: Methodology
    Abstract

    Residuals in regression models are often spatially correlated. Prominent
    examples include studies in environmental epidemiology to understand the
    chronic health effects of pollutants. I consider the effects of residual
    spatial structure on the bias and precision of regression coefficients,
    developing a simple framework in which to understand the key issues and derive
    informative analytic results.

  362. Identification, Inference and Sensitivity Analysis for Causal Mediation Effects.

    Authors: Kosuke Imai, Luke Keele, Teppei Yamamoto
    Subjects: Methodology
    Abstract

    Causal mediation analysis is routinely conducted by applied researchers in a
    variety of disciplines. The goal of such an analysis is to investigate
    alternative causal mechanisms by examining the roles of intermediate variables
    that lie in the causal paths between the treatment and outcome variables. In
    this paper we first prove that under a particular version of the sequential
    ignorability assumption, the average causal mediation effect (ACME) is
    nonparametrically identified. We compare our identification assumption with
    those proposed in the literature.

  363. Particle Learning and Smoothing.

    Authors: Nicholas G. Polson, Carlos M. Carvalho, Michael S. Johannes, Hedibert F. Lopes
    Subjects: Methodology
    Abstract

    Particle learning (PL) provides state filtering, sequential parameter
    learning and smoothing in a general class of state space models. Our approach
    extends existing particle methods by incorporating the estimation of static
    parameters via a fully-adapted filter that utilizes conditional sufficient
    statistics for parameters and/or states as particles. State smoothing in the
    presence of parameter uncertainty is also solved as a by-product of PL. In a
    number of examples, we show that PL outperforms existing particle filtering
    alternatives and proves to be a competitor to MCMC.

  364. Improving Estimates of Monotone Functions by Rearrangement.

    Authors: Alfred Galichon, Victor Chernozhukov, Ivan Fernandez-Val
    Subjects: Methodology
    Abstract

    Suppose that a target function is monotonic, namely, weakly increasing, and
    an original estimate of the target function is available, which is not weakly
    increasing. Many common estimation methods used in statistics produce such
    estimates. We show that these estimates can always be improved with no harm
    using rearrangement techniques: The rearrangement methods, univariate and
    multivariate, transform the original estimate to a monotonic estimate, and the
    resulting estimate is closer to the true curve in common metrics than the
    original estimate.
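
    A minimal univariate sketch of the idea: on an equally spaced grid,
    rearranging an estimate amounts to sorting its values. The target curve, the
    wiggly estimate, and all names below are made up for illustration and are not
    taken from the paper.

```python
import numpy as np


def rearrange(values):
    """Monotone rearrangement on an equally spaced grid: sort the values."""
    return np.sort(np.asarray(values, dtype=float))


# Hypothetical example: a weakly increasing target and a non-monotone estimate.
grid = np.linspace(0.0, 1.0, 101)
target = grid ** 2
estimate = target + 0.1 * np.sin(12 * np.pi * grid)

rearranged = rearrange(estimate)


def l2_error(curve):
    """Root mean squared distance to the target on the grid."""
    return float(np.sqrt(np.mean((curve - target) ** 2)))


error_before = l2_error(estimate)
error_after = l2_error(rearranged)
```

    Because the target values are themselves sorted on the grid, the sorted
    estimate cannot be farther from the target in this discrete L2 distance than
    the original estimate.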

  365. Dempster-Shafer Theory and Statistical Inference with Weak Beliefs.

    Authors: Chuanhai Liu, Ryan Martin, Jianchun Zhang
    Subjects: Methodology
    Abstract

    The Dempster-Shafer (DS) theory is a powerful tool for probabilistic
    reasoning based on a formal calculus for combining evidence. DS theory has been
    widely used in computer science and engineering applications, but has yet to
    reach the statistical mainstream, perhaps because the DS belief functions do
    not satisfy long-run frequency properties. Recently, two of the authors
    proposed an extension of DS, called the weak belief (WB) approach, that can
    incorporate desirable frequency properties into the DS framework by
    systematically enlarging the focal elements.

  366. Make Research Data Public? Not Always so Simple: A Dialogue for Statisticians and Science Editors.

    Authors: Nell Sedransk, Lawrence H. Cox, Deborah Nolan, Keith Soper, Cliff Spiegelman, Linda J. Young, Katrina L. Kelner, Robert A. Moffitt, Ani Thakar, Jordan Raddick, Edward J. Ungvarsky, Richard W. Carlson, Rolf Apweiler
    Subjects: Methodology
    Abstract

    Putting data into the public domain is not the same thing as making those
    data accessible for intelligent analysis. A distinguished group of editors and
    experts who were already engaged in one way or another with the issues inherent
    in making research data public came together with statisticians to initiate a
    dialogue about policies and practicalities of requiring published research to
    be accompanied by publication of the research data.

  367. Assumptions of IV Methods for Observational Epidemiology.

    Authors: Vanessa Didelez, Sha Meng, Nuala A. Sheehan
    Subjects: Methodology
    Abstract

    Instrumental variable (IV) methods are becoming increasingly popular as they
    seem to offer the only viable way to overcome the problem of unobserved
    confounding in observational studies. However, some attention has to be paid to
    the details, as not all such methods target the same causal parameters and some
    rely on more restrictive parametric assumptions than others. We therefore
    discuss and contrast the most common IV approaches with relevance to typical
    applications in observational epidemiology.

  368. A Note on an R^2 Measure for Fixed Effects in the Generalized Linear Mixed Model.

    Authors: Lloyd J. Edwards
    Subjects: Methodology
    Abstract

    Using the LRT statistic, a model R^2 is proposed for the generalized linear
    mixed model for assessing the association between the correlated outcomes and
    fixed effects. The R^2 compares the full model to a null model with all fixed
    effects deleted.

  369. Control of the False Discovery Rate Under Arbitrary Covariance Dependence.

    Authors: Jianqing Fan, Xu Han, Weijie Gu
    Subjects: Methodology
    Abstract

    Multiple hypothesis testing is a fundamental problem in high dimensional
    inference, with wide applications in many scientific fields. In genome-wide
    association studies, tens of thousands of tests are performed simultaneously to
    find if any genes are associated with some traits and those tests are
    correlated. When test statistics are correlated, false discovery control
    becomes very challenging under arbitrary dependence.

  370. Random threshold for linear model selection, revisited.

    Authors: Merlin Keller, Marc Lavielle
    Subjects: Methodology
    Abstract

    In [Lavielle and Ludena 07], a random thresholding method is introduced to
    select the significant, or non null, mean terms among a collection of
    independent random variables, and applied to the problem of recovering the
    significant coefficients in non ordered model selection. We introduce a simple
    modification which removes the dependency of the proposed estimator on a
    window parameter while maintaining its asymptotic properties.

  371. Large-scale simultaneous testing with hypergeometric inverted-beta priors.

    Authors: Nicholas G. Polson, James G. Scott
    Subjects: Methodology
    Abstract

    We develop a new class of distributions for use in large-scale simultaneous
    testing. These priors are based on hypergeometric inverted-beta priors, and
    have two main attractive features: heavy tails, and computational tractability.
    The family is a four-parameter generalization of the normal/inverted-beta
    prior, and is the natural conjugate prior for shrinkage coefficients in a
    hierarchical normal model.

  372. Replication in Genome-Wide Association Studies.

    Authors: Peter Kraft, Eleftheria Zeggini, John P. A. Ioannidis
    Subjects: Methodology
    Abstract

    Replication helps ensure that a genotype-phenotype association observed in a
    genome-wide association (GWA) study represents a credible association and is
    not a chance finding or an artifact due to uncontrolled biases. We discuss
    prerequisites for exact replication, issues of heterogeneity, advantages and
    disadvantages of different methods of data synthesis across multiple studies,
    frequentist vs. Bayesian inferences for replication, and challenges that arise
    from multi-team collaborations.

  373. On Combining Data From Genome-Wide Association Studies to Discover Disease-Associated SNPs.

    Authors: Ruth M. Pfeiffer, Mitchell H. Gail, David Pee
    Subjects: Methodology
    Abstract

    Combining data from several case-control genome-wide association (GWA)
    studies can yield greater efficiency for detecting associations of disease with
    single nucleotide polymorphisms (SNPs) than separate analyses of the component
    studies. We compared several procedures to combine GWA study data both in terms
    of the power to detect a disease-associated SNP while controlling the
    genome-wide significance level, and in terms of the detection probability
    (DP).

  374. Robust Tests in Genome-Wide Scans under Incomplete Linkage Disequilibrium.

    Authors: Gang Zheng, Jungnam Joo, Dmitri Zaykin, Colin Wu, Nancy Geller
    Subjects: Methodology
    Abstract

    Under complete linkage disequilibrium (LD), robust tests often have greater
    power than Pearson's chi-square test and trend tests for the analysis of
    case-control genetic association studies. Robust statistics have been used in
    candidate-gene and genome-wide association studies (GWAS) when the genetic
    model is unknown. We consider here a more general incomplete LD model, and
    examine the impact of penetrances at the marker locus when the genetic models
    are defined at the disease locus.

  375. Using GWAS Data to Identify Copy Number Variants Contributing to Common Complex Diseases.

    Authors: Sebastian Zöllner, Tanya M. Teslovich
    Subjects: Methodology
    Abstract

    Copy number variants (CNVs) account for more polymorphic base pairs in the
    human genome than do single nucleotide polymorphisms (SNPs). CNVs encompass
    genes as well as noncoding DNA, making these polymorphisms good candidates for
    functional variation. Consequently, most modern genome-wide association studies
    test CNVs along with SNPs, after inferring copy number status from the data
    generated by high-throughput genotyping platforms. Here we give an overview of
    CNV genomics in humans, highlighting patterns that inform methods for
    identifying CNVs.

  376. Estimating Effects and Making Predictions from Genome-Wide Marker Data.

    Authors: Michael E. Goddard, Naomi R. Wray, Klara Verbyla, Peter M. Visscher
    Subjects: Methodology
    Abstract

    In genome-wide association studies (GWAS), hundreds of thousands of genetic
    markers (SNPs) are tested for association with a trait or phenotype. Reported
    effects tend to be larger in magnitude than the true effects of these markers,
    the so-called "winner's curse." We argue that the classical definition of
    unbiasedness is not useful in this context and propose to use a different
    definition of unbiasedness that is a property of the estimator we advocate.

  377. Analysis of Case-Control Association Studies: SNPs, Imputation and Haplotypes.

    Authors: Raymond J. Carroll, Nilanjan Chatterjee, Yi-Hau Chen, Sheng Luo
    Subjects: Methodology
    Abstract

    Although prospective logistic regression is the standard method of analysis
    for case-control data, it has been recently noted that in genetic epidemiologic
    studies one can use the "retrospective" likelihood to gain major power by
    incorporating various population genetics model assumptions such as
    Hardy-Weinberg equilibrium (HWE), gene-gene and gene-environment independence.
    In this article we review these modern methods and contrast them with the more
    classical approaches through two types of applications: (i) association tests
    for typed and untyped single nucleotide polymorphisms (SNPs) a

  378. Structures and Assumptions: Strategies to Harness Gene $\times$ Gene and Gene $\times$ Environment Interactions in GWAS.

    Authors: Charles Kooperberg, Michael LeBlanc, James Y. Dai, Indika Rajapakse
    Subjects: Methodology
    Abstract

    Genome-wide association studies, in which as many as a million single
    nucleotide polymorphisms (SNP) are measured on several thousand samples, are
    quickly becoming a common type of study for identifying genetic factors
    associated with many phenotypes. There is a strong assumption that interactions
    between SNPs or genes and interactions between genes and environmental factors
    substantially contribute to the genetic risk of a disease.

  379. Population Structure and Cryptic Relatedness in Genetic Association Studies.

    Authors: William Astle, David J. Balding
    Subjects: Methodology
    Abstract

    We review the problem of confounding in genetic association studies, which
    arises principally because of population structure and cryptic relatedness.
    Many treatments of the problem consider only a simple "island" model of
    population structure. We take a broader approach, which views population
    structure and cryptic relatedness as different aspects of a single confounder:
    the unobserved pedigree defining the (often distant) relationships among the
    study subjects.

  380. A Bayesian Method for Detecting and Characterizing Allelic Heterogeneity and Boosting Signals in Genome-Wide Association Studies.

    Authors: Peter Donnelly, Jonathan Marchini, Zhan Su, Niall Cardin, Wellcome Trust Case Control Consortium
    Subjects: Methodology
    Abstract

    The standard paradigm for the analysis of genome-wide association studies
    involves carrying out association tests at both typed and imputed SNPs. These
    methods will not be optimal for detecting the signal of association at SNPs
    that are not currently known or in regions where allelic heterogeneity occurs.
    We propose a novel association test, complementary to the SNP-based approaches,
    that attempts to extract further signals of association by explicitly modeling
    and estimating both unknown SNPs and allelic heterogeneity at a locus.

  381. Methodological Issues in Multistage Genome-Wide Association Studies.

    Authors: Duncan C. Thomas, Graham Casey, David V. Conti, Robert W. Haile, Juan Pablo Lewinger, Daniel O. Stram
    Subjects: Methodology
    Abstract

    Because of the high cost of commercial genotyping chip technologies, many
    investigations have used a two-stage design for genome-wide association
    studies, using part of the sample for an initial discovery of "promising"
    SNPs at a less stringent significance level and the remainder in a joint
    analysis of just these SNPs using custom genotyping.

  382. The Role of Family-Based Designs in Genome-Wide Association Studies.

    Authors: Christoph Lange, Nan M. Laird
    Subjects: Methodology
    Abstract

    Genome-Wide Association Studies (GWAS) offer an exciting and promising new
    research avenue for finding genes for complex diseases. Traditional
    case-control and cohort studies offer many advantages for such designs.
    Family-based association designs have long been attractive for their robustness
    properties, but robustness can mean a loss of power. In this paper we discuss
    some of the special features of family designs and their relevance in the era
    of GWAS.

  383. Genome-Wide Significance Levels and Weighted Hypothesis Testing.

    Authors: Larry Wasserman, Kathryn Roeder
    Subjects: Methodology
    Abstract

    Genetic investigations often involve the testing of vast numbers of related
    hypotheses simultaneously. To control the overall error rate, a substantial
    penalty is required, making it difficult to detect signals of moderate
    strength. To improve the power in this setting, a number of authors have
    considered using weighted $p$-values, with the motivation often based upon the
    scientific plausibility of the hypotheses. We review this literature, derive
    optimal weights and show that the power is remarkably robust to
    misspecification of these weights.
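
    One common way to use weighted p-values is a weighted Bonferroni rule: with
    nonnegative weights that average one, hypothesis i is rejected when
    p_i <= alpha * w_i / m. The sketch below assumes that rule; the function
    name, p-values, and weights are hypothetical, and the code does not compute
    the optimal weights derived in the paper.

```python
def weighted_bonferroni(p_values, weights, alpha=0.05):
    """Reject hypothesis i when p_i <= alpha * w_i / m, with mean weight 1."""
    m = len(p_values)
    if len(weights) != m:
        raise ValueError("p_values and weights must have the same length")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be nonnegative")
    mean_weight = sum(weights) / m
    if mean_weight == 0:
        raise ValueError("at least one weight must be positive")
    # Rescale so the weights average one, which keeps the overall level at alpha.
    normalized = [w / mean_weight for w in weights]
    return [p <= alpha * w / m for p, w in zip(p_values, normalized)]


# Hypothetical p-values; the second hypothesis is judged more plausible a priori.
p_values = [0.004, 0.02, 0.03, 0.5]
unweighted = weighted_bonferroni(p_values, [1.0, 1.0, 1.0, 1.0])
weighted = weighted_bonferroni(p_values, [0.5, 2.5, 0.5, 0.5])
```

    Up-weighting a plausible hypothesis loosens its threshold at the cost of
    tightening the thresholds of the others.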

  384. Introduction to the Special Issue: Genome-Wide Association Studies.

    Authors: Gang Zheng, Jonathan Marchini, Nancy L. Geller
    Subjects: Methodology
    Abstract

    Introduction to the Special Issue: Genome-Wide Association Studies

  385. Non-Euclidean statistical analysis of covariance matrices and diffusion tensors.

    Authors: Ian L. Dryden, Diwei Zhou, Alexey Kolydenko, Bai Li
    Subjects: Methodology
    Abstract

    The statistical analysis of covariance matrices occurs in many important
    applications, e.g. in diffusion tensor imaging and longitudinal data analysis.
    We consider the situation where it is of interest to estimate an average
    covariance matrix, to describe its anisotropy, to carry out principal geodesic
    analysis and to interpolate between covariance matrices. There are many choices
    of metric available, each with its advantages. The particular choice of what is
    best will depend on the particular application.

  386. Local shrinkage rules, Lévy processes, and regularized regression.

    Authors: Nicholas G. Polson, James G. Scott
    Subjects: Methodology
    Abstract

    We use Lévy processes to generate joint prior distributions for a location
    parameter $\boldsymbol{\beta} = (\beta_1,\ldots,\beta_p)$ as $p$ grows large. This leads to
    the class of local-global shrinkage rules. We extend this framework to
    large-scale regularized regression for $p>n$ problems, and provide thorough
    comparisons with current methodologies.

  387. Estimating animal densities and home range in regions with irregular boundaries and holes: a lattice-based alternative to the kernel density estimator.

    Authors: Ronald P. Barry, Julie McIntyre
    Subjects: Methodology
    Abstract

    Density estimates based on point processes are often restrained to regions
    with irregular boundaries or holes. We propose a density estimator, the
    lattice-based density estimator, which produces reasonable density estimates
    under these circumstances. The estimation process starts with overlaying the
    region with nodes, linking these together in a lattice and then computing the
    density of random walks of length k on the lattice. We use an approximation to
    the unbiased crossvalidation criterion to find the optimal walk length k.

  388. An Alternative Prior Process for Nonparametric Bayesian Clustering.

    Authors: Shane T. Jensen, Hanna M. Wallach, Lee Dicker, Katherine A. Heller
    Subjects: Methodology
    Abstract

    Prior distributions play a crucial role in Bayesian approaches to clustering.
    Two commonly-used prior distributions are the Dirichlet and Pitman-Yor
    processes. In this paper, we investigate the predictive probabilities that
    underlie these processes, and the implicit "rich-get-richer" characteristic of
    the resulting partitions. We explore an alternative prior for nonparametric
    Bayesian clustering -- the uniform process -- for applications where the
    "rich-get-richer" property is undesirable.

  389. A factor mixture analysis model for multivariate binary data.

    Authors: Cinzia Viroli, Silvia Cagnone
    Subjects: Methodology
    Abstract

    The paper proposes a latent variable model for binary data coming from an
    unobserved heterogeneous population. The heterogeneity is taken into account by
    replacing the traditional assumption of Gaussian distributed factors by a
    finite mixture of multivariate Gaussians. The aim of the proposed model is
    twofold: it achieves dimension reduction when the data are dichotomous
    and, simultaneously, it performs model based clustering in the latent space.
    Model estimation is obtained by means of a maximum likelihood method via a
    generalized version of the EM algorithm.

  390. Stochastic model selection for Mixtures of Matrix-Normals.

    Authors: Cinzia Viroli
    Subjects: Methodology
    Abstract

    Finite mixtures of matrix normal distributions are a powerful tool for
    classifying three-way data in unsupervised problems. The distribution of each
    component is assumed to be a matrix variate normal density. The mixture model
    can be estimated through the EM algorithm under the assumption that the number
    of components is known and fixed. In this work we introduce, develop and
    explore a Bayesian analysis of the model in order to provide a tool for
    simultaneous model estimation and model selection.

  391. Testing Parallelism of Nonparametric Regression Curves.

    Authors: Wei Biao Wu, David Degras, Zhiwei Xu, Ting Zhang
    Subjects: Methodology
    Abstract

    This paper considers the inference of regression functions in the context of
    multiple time series. For an arbitrary number of time series observed at a
    large number of time points, we test the hypothesis that the regression curves
    are parallel to each other. A central limit theorem is obtained for a
    parallelism index based on the distances between the estimates of the
    regression curves and their average. To implement the testing procedure, we
    propose a simulation-based approach that significantly improves upon the normal
    approximation to the test statistic.

  392. Rejoinder: Likelihood Inference for Models with Unobservables: Another View.

    Authors: Youngjo Lee, John A. Nelder
    Subjects: Methodology
    Abstract

    Rejoinder to "Likelihood Inference for Models with Unobservables: Another
    View" by Youngjo Lee and John A. Nelder [arXiv:1010.0303]

  393. Decoding the H-likelihood.

    Authors: Xiao-Li Meng
    Subjects: Methodology
    Abstract

    Discussion of "Likelihood Inference for Models with Unobservables: Another
    View" by Youngjo Lee and John A. Nelder [arXiv:1010.0303]

  394. Discussion of Likelihood Inference for Models with Unobservables: Another View.

    Authors: Geert Molenberghs, Michael G. Kenward, Geert Verbeke
    Subjects: Methodology
    Abstract

    Discussion of "Likelihood Inference for Models with Unobservables: Another
    View" by Youngjo Lee and John A. Nelder [arXiv:1010.0303]

  395. Discussion of Likelihood Inference for Models with Unobservables: Another View.

    Authors: Thomas A. Louis
    Subjects: Methodology
    Abstract

    Discussion of "Likelihood Inference for Models with Unobservables: Another
    View" by Youngjo Lee and John A. Nelder [arXiv:1010.0303]

  396. A Conversation with Leo Goodman.

    Authors: Mark P. Becker
    Subjects: Methodology
    Abstract

    Leo A. Goodman was born on August 7, 1928 in New York City. He received his
    A.B. degree, summa cum laude, in 1948 from Syracuse University, majoring in
    mathematics and sociology. He went on to pursue graduate studies in
    mathematics, with an emphasis on mathematical statistics, in the Mathematics
    Department at Princeton University, and in 1950 he was awarded the M.A. and
    Ph.D. degrees. His statistics professors at Princeton were the late Sam Wilks
    and John Tukey.

  397. The Impact of Levene's Test of Equality of Variances on Statistical Theory and Practice.

    Authors: Joseph L. Gastwirth, Yulia R. Gel, Weiwen Miao
    Subjects: Methodology
    Abstract

    In many applications, the underlying scientific question concerns whether the
    variances of $k$ samples are equal. There are a substantial number of tests for
    this problem. Many of them rely on the assumption of normality and are not
    robust to its violation. In 1960 Professor Howard Levene proposed a new
    approach to this problem by applying the $F$-test to the absolute deviations of
    the observations from their group means. Levene's approach is powerful and
    robust to nonnormality and became a very popular tool for checking the
    homogeneity of variances.
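
    The procedure described above can be sketched in a few lines: take absolute
    deviations from each group mean, then compute the usual one-way ANOVA F
    statistic on those deviations. The function name and the data are
    hypothetical. The sketch stops at the statistic, which would then be compared
    with an F distribution on (k - 1, n - k) degrees of freedom.

```python
from statistics import mean


def levene_statistic(groups):
    """F statistic of a one-way ANOVA on |x - group mean| for k groups."""
    # Step 1: absolute deviations of each observation from its own group mean.
    deviations = []
    for group in groups:
        centre = mean(group)
        deviations.append([abs(x - centre) for x in group])

    k = len(deviations)
    n = sum(len(d) for d in deviations)
    grand_mean = sum(sum(d) for d in deviations) / n

    # Step 2: between-group and within-group mean squares of the deviations.
    between = sum(len(d) * (mean(d) - grand_mean) ** 2 for d in deviations) / (k - 1)
    within = sum((z - mean(d)) ** 2 for d in deviations for z in d) / (n - k)
    return between / within


# Hypothetical data: same spread in both groups, then very different spreads.
f_equal_spread = levene_statistic([[1, 2, 3, 4, 5], [11, 12, 13, 14, 15]])
f_unequal_spread = levene_statistic([[1, 2, 3, 4, 5], [10, 20, 30, 40, 50]])
```

    A shift in location alone leaves the statistic unchanged, since only the
    deviations from each group's own mean enter the calculation.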

  398. Interval Estimation for Messy Observational Data.

    Authors: Sander Greenland, Paul Gustafson
    Subjects: Methodology
    Abstract

    We review some aspects of Bayesian and frequentist interval estimation,
    focusing first on their relative strengths and weaknesses when used in "clean"
    or "textbook" contexts. We then turn attention to observational-data situations
    which are "messy," where modeling that acknowledges the limitations of study
    design and data collection leads to nonidentifiability.

  399. Inference and Modeling with Log-concave Distributions.

    Authors: Guenther Walther
    Subjects: Methodology
    Abstract

    Log-concave distributions are an attractive choice for modeling and
    inference, for several reasons: The class of log-concave distributions contains
    most of the commonly used parametric distributions and thus is a rich and
    flexible nonparametric class of distributions. Further, the MLE exists and can
    be computed with readily available algorithms. Thus, no tuning parameter, such
    as a bandwidth, is necessary for estimation. Due to these attractive
    properties, there has been considerable recent research activity concerning the
    theory and applications of log-concave distributions.

  400. Model Assessment Tools for a Model False World.

    Authors: Jiawei Liu, Bruce Lindsay
    Subjects: Methodology
    Abstract

    A standard goal of model evaluation and selection is to find a model that
    approximates the truth well while at the same time is as parsimonious as
    possible. In this paper we emphasize the point of view that the models under
    consideration are almost always false, if viewed realistically, and so we
    should analyze model adequacy from that point of view. We investigate this
    issue in large samples by looking at a model credibility index, which is
    designed to serve as a one-number summary measure of model adequacy.

  401. Likelihood Inference for Models with Unobservables: Another View.

    Authors: Youngjo Lee, John A. Nelder
    Subjects: Methodology
    Abstract

    There have been controversies among statisticians on (i) what to model and
    (ii) how to make inferences from models with unobservables. One such
    controversy concerns the difference between estimation methods for the marginal
    means not necessarily having a probabilistic basis and statistical models
    having unobservables with a probabilistic basis. Another concerns
    likelihood-based inference for statistical models with unobservables. This
    needs an extended-likelihood framework, and we show how one such extension,
    hierarchical likelihood, allows this to be done.

  402. Regularization in regression: comparing Bayesian and frequentist methods in a poorly informative situation.

    Authors: Christian P. Robert, Jean-Michel Marin, Gilles Celeux, Mohammed El Anbari
    Subjects: Methodology
    Abstract

    We propose a global noninformative approach for Bayesian variable selection
    that builds on Zellner's g-priors and is similar to Liang et al. (2008). Our
    proposal does not require any kind of calibration. In the case of a benchmark,
    we compare Bayesian and frequentist regularization approaches under a low
    informative constraint when the number of variables is almost equal to the
    number of observations. The simulated and real dataset experiments we present
    here highlight the appeal of Bayesian regularization methods, when compared
    with alternatives.

  403. Group-Lasso on Splines for Spectrum Cartography.

    Authors: Georgios B. Giannakis, Juan A. Bazerque, Gonzalo Mateos
    Subjects: Methodology
    Abstract

    The unceasing demand for continuous situational awareness calls for
    innovative and large-scale signal processing algorithms, complemented by
    collaborative and adaptive sensing platforms to accomplish the objectives of
    layered sensing and control. Towards this goal, the present paper develops a
    spline-based approach to field estimation, which relies on a basis expansion
    model of the field of interest. The model entails known bases, weighted by
    generic functions estimated from the field's noisy samples.

  404. Validated Intraclass Correlation Statistics to Test Item Performance Models.

    Authors: Pierre Courrieu, Muriele Brand-D'Abrescia, Ronald Peereman, Daniel Spieler, Arnaud Rey
    Subjects: Methodology
    Abstract

    A new method, with an application program in Matlab code, is proposed for
    testing item performance models on empirical databases. This method uses data
    intraclass correlation statistics as expected correlations to which one
    compares simple functions of correlations between model predictions and
    observed item performance. The method rests on a data population model whose
    validity for the considered data is suitably tested, and has been verified for
    three behavioural measure databases.

  405. Minimum description length methods of medium-scale simultaneous inference.

    Authors: David R. Bickel
    Subjects: Methodology
    Abstract

    Nonparametric statistical methods developed for analyzing data for high
    numbers of genes, SNPs, or other biological features tend to overfit data with
    smaller numbers of features such as proteins, metabolites, or, when expression
    is measured with conventional instruments, genes. For this medium-scale
    inference problem, the minimum description length (MDL) framework quantifies
    the amount of information in the data supporting a null or alternative
    hypothesis for each feature in terms of parametric model selection. Two new MDL
    techniques are proposed.

  406. Square-Root Lasso: Pivotal Recovery of Sparse Signals via Conic Programming.

    Authors: Victor Chernozhukov, Alexandre Belloni, Lie Wang
    Subjects: Methodology
    Abstract

    We propose a pivotal method for estimating high-dimensional sparse linear
    regression models, where the overall number of regressors $p$ is large,
    possibly much larger than $n$, but only $s$ regressors are significant. The
    method is a modification of LASSO, called square-root LASSO. The method neither
    relies on the knowledge of the standard deviation $\sigma$ of the regression
    errors nor does it need to pre-estimate $\sigma$.
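
    A minimal sketch of why no estimate of $\sigma$ is needed, in pure Python.
    The abstract does not spell out the criterion, so the form used here (root
    mean squared residual plus (lam / n) times the l1 norm of the coefficients)
    is an assumption based on how the square-root LASSO is commonly stated; the
    function names and toy data are hypothetical. The code only evaluates the
    objective: rescaling the response and the coefficients by a constant c
    rescales both terms by c, whereas the ordinary LASSO objective mixes a c^2
    term with a c term, so its penalty level has to track the noise scale.

```python
import math


def _residuals(X, y, beta):
    # Residuals y_i - x_i' beta for a design given as a list of rows.
    return [yi - sum(xij * bj for xij, bj in zip(xi, beta))
            for xi, yi in zip(X, y)]


def sqrt_lasso_objective(X, y, beta, lam):
    """Assumed form: sqrt(mean squared residual) + (lam / n) * ||beta||_1."""
    n = len(y)
    rms = math.sqrt(sum(r * r for r in _residuals(X, y, beta)) / n)
    return rms + (lam / n) * sum(abs(b) for b in beta)


def lasso_objective(X, y, beta, lam):
    """Ordinary LASSO for comparison: mean squared residual + (lam / n) * ||beta||_1."""
    n = len(y)
    mse = sum(r * r for r in _residuals(X, y, beta)) / n
    return mse + (lam / n) * sum(abs(b) for b in beta)


# Hypothetical toy data, chosen only for illustration.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0]]
y = [1.1, -0.2, 0.7, 2.4]
beta = [1.0, -0.1]
lam = 0.5
c = 10.0  # rescaling of the response, i.e. of the noise level

y_scaled = [c * v for v in y]
beta_scaled = [c * b for b in beta]

base = sqrt_lasso_objective(X, y, beta, lam)
scaled = sqrt_lasso_objective(X, y_scaled, beta_scaled, lam)

lasso_base = lasso_objective(X, y, beta, lam)
lasso_scaled = lasso_objective(X, y_scaled, beta_scaled, lam)

print("sqrt-LASSO objective scales by c:", math.isclose(scaled, c * base))
print("LASSO objective scales by c:", math.isclose(lasso_scaled, c * lasso_base))
```

    This checks a scaling property only; it is not a solver, and the conic
    programming formulation named in the title is not reproduced here.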

  407. Robust Shrinkage Estimation of High-dimensional Covariance Matrices.

    Authors: Alfred O. Hero III, Yilun Chen, Ami Wiesel
    Subjects: Methodology
    Abstract

    We address high dimensional covariance estimation for elliptically distributed
    samples, which are also known as spherically invariant random vectors (SIRV) or
    compound-Gaussian processes. Specifically we consider shrinkage methods that
    are suitable for high dimensional problems with a small number of samples
    (large $p$ small $n$). We start from a classical robust covariance estimator
    [Tyler(1987)], which is distribution-free within the family of elliptical
    distributions but inapplicable when $n<p$. Using a shrinkage coefficient, we
    regularize Tyler's fixed point iterations.

  408. On the identification of discrete graphical models with hidden nodes.

    Authors: Elena Stanghellini, Barbara Vantaggi
    Subjects: Methodology
    Abstract

    Conditions are presented for local identifiability of discrete undirected
    graphical models with a binary hidden node. These models can be obtained by
    extending the latent class model to allow for conditional associations between
    the observed variables. We establish a necessary and sufficient condition for
    the model to be locally identified almost everywhere in the parameter space and
    we provide expressions of the subspace where identifiability breaks down. The
    condition is based on the topology of the undirected graph and relies on the
    faithfulness assumption.

  409. Gaussian process single-index models as emulators for computer experiments.

    Authors: Heng Lian, Robert B. Gramacy
    Subjects: Methodology
    Abstract

    A single-index model (SIM) provides for parsimonious multi-dimensional
    nonlinear regression by combining parametric (linear) projection with
    univariate nonparametric (non-linear) regression models. We show that a
    particular Gaussian process (GP) formulation is simple to work with and ideal
    as an emulator for some types of computer experiment as it can outperform the
    canonical separable GP regression model commonly used in this setting.

  410. Robust Graphical Modeling with Classical and Alternative T-Distributions.

    Authors: Mathias Drton, Michael Finegold
    Subjects: Methodology
    Abstract

    Graphical Gaussian models have proven to be useful tools for exploring
    network structures based on multivariate data. Applications to studies of gene
    expression have generated substantial interest in these models, and resulting
    recent progress includes the development of fitting methodology involving
    penalization of the likelihood function. In this paper we advocate the use of
    multivariate t-distributions for more robust inference of graphs.

  411. Intrinsic Inference on the Mean Geodesic of Planar Shapes and Tree Discrimination by Leaf Growth.

    Authors: Stephan Huckemann
    Subjects: Methodology
    Abstract

    For planar landmark based shapes, taking into account the non-Euclidean
    geometry of the shape space, a statistical test for a common mean first
    geodesic principal component (GPC) is devised. It rests on one of two
    asymptotic scenarios, both of which are identical in a Euclidean geometry. For
    both scenarios, strong consistency and central limit theorems are established,
    along with an algorithm for the computation of a Ziezold mean geodesic.

  412. Power Euclidean metrics for covariance matrices with application to diffusion tensor imaging.

    Authors: Ian L. Dryden, Xavier Pennec, Jean-Marc Peyrat
    Subjects: Methodology
    Abstract

    Various metrics for comparing diffusion tensors have been recently proposed
    in the literature. We consider a broad family of metrics which is indexed by a
    single power parameter. A likelihood-based procedure is developed for choosing
    the most appropriate metric from the family for a given dataset at hand. The
    approach is analogous to using the Box-Cox transformation that is frequently
    investigated in regression analysis. The methodology is illustrated with a
    simulation study and an application to a real dataset of diffusion tensor
    images of canine hearts.

  413. Estimation of distribution functions in measurement error models.

    Authors: I. Dattner, B. Reiser
    Subjects: Methodology
    Abstract

    Many practical problems are related to the estimation of distribution
    functions when data contains measurement errors. For example, consider the
    estimation of the prevalence of a disease which is determined by some
    underlying biomarker, measured with error, having value greater than some known
    constant.

  414. Testing hypotheses in the Birnbaum-Saunders distribution under type-II censored samples.

    Authors: Silvia L.P. Ferrari, Artur J. Lemonte
    Subjects: Methodology
    Abstract

    The two-parameter Birnbaum-Saunders distribution has been used successfully to
    model fatigue failure times. Although censoring is typical in reliability and
    survival studies, little work has been published on the analysis of censored
    data for this distribution. In this paper, we address the issue of performing
    testing inference on the two parameters of the Birnbaum-Saunders distribution
    under type-II right censored samples.

  415. Robust Bayesian variable selection with sub-harmonic priors.

    Authors: Yuzo Maruyama, William E. Strawderman
    Subjects: Methodology
    Abstract

    This paper studies Bayesian variable selection in linear models with
    spherically symmetric error distributions. We give a series of proper prior
    distributions which converge in a certain sense to an improper prior
    distribution and for which the Bayes factor for each possible sub-model
    converges to the Bayes factor for the improper prior. This convergence
    justifies the use of the improper prior in variable selection. We also show
    that the resulting improper Bayes factors are independent of the particular
    sampling model when all sub-models are assumed to have the same error
    distribution.

  416. A Hierarchical Bayesian Framework for Constructing Sparsity-inducing Priors.

    Authors: Arnaud Doucet, Anthony Lee, Francois Caron, Chris Holmes
    Subjects: Methodology
    Abstract

    Variable selection techniques have become increasingly popular amongst
    statisticians due to an increased number of regression and classification
    applications involving high-dimensional data where we expect some predictors to
    be unimportant.

  417. Kullback Leibler Divergence for Bayesian Networks with Complex Mean Structure.

    Authors: Jessica Kasza, Patty Solomon
    Subjects: Methodology
    Abstract

    In this paper, we compare two methods for the estimation of Bayesian networks
    given data containing exogenous variables. Firstly, we consider a fully
    Bayesian approach, where a prior distribution is placed upon the effects of
    exogenous variables, and secondly, we consider a restricted maximum likelihood
    approach to account for the effects of exogenous variables. We investigate the
    differences between these two approaches on posterior inference using the
    Kullback Leibler divergence.

  418. A Mixed Effects Model for Longitudinal Relational and Network Data, with Applications to International Trade and Conflict.

    Authors: Peter D. Hoff, Anton H. Westveld
    Subjects: Methodology
    Abstract

    The focus of this paper is an approach to the modeling of longitudinal social
    relational or network data. Such data arise from measurements on pairs of
    objects or actors made at regular temporal intervals, resulting in a social
    network for each point in time. In this article we represent the network and
    temporal dependencies with a random effects model, resulting in a stochastic
    process defined by a set of stationary covariance matrices.

  419. Model selection for weakly dependent time series forecasting.

    Authors: Olivier Wintenberger, Pierre Alquier
    Subjects: Methodology
    Abstract

    Observing a stationary time series, we propose a two-step procedure for the
    prediction of the next value of the time series. The first step follows the
    machine learning theory paradigm and consists in determining a set of possible
    predictors as randomized estimators in (possibly numerous) different predictive
    models. The second step follows the model selection paradigm and consists in
    choosing one predictor with good properties among all the predictors of the
    first step.

  420. Predicting Sequences of Progressive Events Times with Time-dependent Covariates.

    Authors: Song Cai, James V. Zidek, Nathaniel Newlands
    Subjects: Methodology
    Abstract

    This paper presents an approach to modeling progressive event-history data
    when the overall objective is prediction based on time-dependent covariates.
    This approach does not model the hazard function directly. Instead, it models
    the process of the state indicators of the event history so that the
    time-dependent covariates can be incorporated and predictors of the future
    events easily formulated. Our model can be applied to a range of real-world
    problems in medical and agricultural science.

  421. A log-Birnbaum-Saunders Regression Model with Asymmetric Errors.

    Authors: Artur J. Lemonte
    Subjects: Methodology
    Abstract

    The paper by Leiva et al. (2010) introduced a skewed version of the
    sinh-normal distribution, discussed some of its properties and characterized an
    extension of the Birnbaum-Saunders distribution associated with this
    distribution. In this paper, we introduce a skewed log-Birnbaum-Saunders
    regression model based on the skewed sinh-normal distribution. Some influence
    methods, such as the local influence and generalized leverage, are presented.
    Additionally, we derive the normal curvatures of local influence under some
    perturbation schemes.

  422. Dynamic interactions in terms of senders, hubs, and receivers (SHR) using the singular value decomposition of time series: Theory and brain connectivity applications.

    Authors: Roberto D. Pascual-Marqui, Rolando J. Biscay-Lirio
    Subjects: Methodology
    Abstract

    Understanding of normal and pathological brain function requires the
    identification and localization of functional connections between specialized
    regions. The availability of high time resolution signals of electric neuronal
    activity at several regions offers information for quantifying the connections
    in terms of information flow. When the signals cover the whole cortex, the
    number of connections is very large, making visualization and interpretation
    very difficult.

  423. Combining individually valid and conditionally i.i.d. P-variables.

    Authors: Lutz Mattner
    Subjects: Methodology
    Abstract

    For a given testing problem, let $U_1,...,U_n$ be individually valid and
    conditionally on the data i.i.d. P-variables (often called P-values). For
    example, the data could come in groups, and each $U_i$ could be based on
    subsampling just one datum from each group in order to satisfy an independence
    assumption under the hypothesis. The problem is then to deterministically
    combine the $U_i$ into a valid summary P-variable.

  424. A J-function for inhomogeneous point processes.

    Authors: M.N.M. van Lieshout
    Subjects: Methodology
    Abstract

    We propose new summary statistics for intensity-reweighted moment stationary
    point processes that generalise the well known J-, empty space, and
    nearest-neighbour distance distribution functions, represent them in terms of
    generating functionals and conditional intensities, and relate them to the
    inhomogeneous reduced second moment function. Extensions to space time and
    marked point processes are briefly discussed.

  425. A-Collapsibility of Distribution Dependence and Quantile Regression Coefficients.

    Authors: Mark M. Meerschaert, P. Vellaisamy
    Subjects: Methodology
    Abstract

    The Yule-Simpson paradox notes that an association between random variables
    can be reversed when averaged over a background variable. Cox and Wermuth
    (2003) introduced the concept of distribution dependence between two random
    variables X and Y, and developed two dependence conditions, each of which
    guarantees that reversal cannot occur. Ma, Xie and Geng (2006) studied the
    collapsibility of distribution dependence over a background variable W, under a
    rather strong homogeneity condition.
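
    The reversal described by the Yule-Simpson paradox can be seen in a small
    worked example. The sketch below uses hypothetical counts (not data from
    any of the cited papers): within each level of a binary background variable
    W, group X = 1 has the higher success rate, yet after pooling over W the
    ordering flips because the two groups are distributed very differently
    across the levels of W.

```python
# Hypothetical (successes, trials) by background stratum W and group X.
counts = {
    "w0": {"x1": (9, 10), "x0": (80, 100)},
    "w1": {"x1": (30, 100), "x0": (2, 10)},
}


def rate(successes, trials):
    return successes / trials


def pooled_rate(group):
    # Success rate for one group after summing counts over all strata of W.
    successes = sum(counts[w][group][0] for w in counts)
    trials = sum(counts[w][group][1] for w in counts)
    return successes / trials


# Difference in success rates (X = 1 minus X = 0) within each stratum of W.
within = {
    w: rate(*counts[w]["x1"]) - rate(*counts[w]["x0"])
    for w in counts
}

# The same difference after pooling over W.
pooled = pooled_rate("x1") - pooled_rate("x0")

for w, diff in within.items():
    print(f"stratum {w}: difference = {diff:+.3f}")
print(f"pooled over W: difference = {pooled:+.3f}")
```

    The within-stratum differences are both positive while the pooled
    difference is negative, which is the kind of reversal that collapsibility
    conditions are meant to rule out.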

  426. Modelling coloured residual noise in gravitational-wave signal processing.

    Authors: Christian Röver, Renate Meyer, Nelson Christensen
    Subjects: Methodology
    Abstract

    We introduce a signal processing model for signals in non-white noise, where
    the exact noise spectrum is a priori unknown. The model is based on a Student's
    t distribution and constitutes a natural generalization of the widely used
    normal (Gaussian) model. This way, it allows for uncertainty in the noise
    spectrum, or more generally is also able to accommodate outliers (heavy-tailed
    noise) in the data. Examples are given pertaining to data from gravitational
    wave detectors.

  427. Efficient and Robust Estimation for a Class of Generalized Linear Longitudinal Mixed Models.

    Authors: René Holst, Bent Jørgensen
    Subjects: Methodology
    Abstract

    We propose a versatile and computationally efficient estimating equation
    method for a class of hierarchical multiplicative generalized linear mixed
    models with additive dispersion components, based on explicit modelling of the
    covariance structure. The class combines longitudinal and random effects models
    and retains a marginal as well as a conditional interpretation.

  428. Active Set and EM Algorithms for Log-Concave Densities Based on Complete and Censored Data.

    Authors: Lutz Duembgen, Kaspar Rufibach, Andre Huesler
    Subjects: Methodology
    Abstract

    We develop an active set algorithm for the maximum likelihood estimation of a
    log-concave density based on complete data. Building on this fast algorithm, we
    indicate an EM algorithm to treat arbitrarily censored or binned data.

  429. Flexible Shrinkage Estimation in High-Dimensional Varying Coefficient Models.

    Authors: Heng Lian
    Subjects: Methodology
    Abstract

    We consider the problem of simultaneous variable selection and constant
    coefficient identification in high-dimensional varying coefficient models based
    on B-spline basis expansion. Both objectives can be considered as some type of
    model selection problems and we show that they can be achieved by a double
    shrinkage strategy. We apply the adaptive group Lasso penalty in models
    involving a diverging number of covariates, which can be much larger than the
    sample size, but we assume the number of relevant variables is smaller than the
    sample size via model sparsity.

  430. Combining spatial information sources while accounting for systematic errors in proxies.

    Authors: Christopher J. Paciorek
    Subjects: Methodology
    Abstract

    Environmental research increasingly uses high-dimensional remote sensing and
    numerical model output to help fill space-time gaps between traditional
    observations. Such output is often a noisy proxy for the process of interest.
    Thus one needs to separate and assess the signal and noise (often called
    discrepancy) in the proxy, given sparse observations and complicated
    spatio-temporal dependences. Here I extend a popular two-likelihood
    hierarchical model using a more flexible representation for the discrepancy.

  431. A Sticky HDP-HMM with Application to Speaker Diarization.

    Authors: Michael I. Jordan, Alan S. Willsky, Emily B. Fox, Erik B. Sudderth
    Subjects: Methodology
    Abstract

    We consider the problem of speaker diarization, the problem of segmenting an
    audio recording of a meeting into temporal segments corresponding to individual
    speakers. The problem is rendered particularly difficult by the fact that we
    are not allowed to assume knowledge of the number of people participating in
    the meeting. To address this problem, we take a Bayesian nonparametric approach
    to speaker diarization that builds on the hierarchical Dirichlet process hidden
    Markov model (HDP-HMM) of Teh et al. (2006).

  432. Separable covariance arrays via the Tucker product, with applications to multivariate relational data.

    Authors: Peter D. Hoff
    Subjects: Methodology
    Abstract

    Modern datasets are often in the form of matrices or arrays, potentially
    having correlations along each set of data indices. For example, data involving
    repeated measurements of several variables over time may exhibit temporal
    correlation as well as correlation among the variables. A possible model for
    matrix-valued data is the class of matrix normal distributions, which is
    parametrized by two covariance matrices, one for each index set of the data. In
    this article we describe an extension of the matrix normal model to accommodate
    multidimensional data arrays, or tensors.

  433. Peak Detection as Multiple Testing.

    Authors: Robert J. Adler, Armin Schwartzman, Yulia Gavrilov
    Subjects: Methodology
    Abstract

    This paper considers the problem of detecting equal-shaped non-overlapping
    unimodal peaks in the presence of Gaussian ergodic stationary noise, where the
    number, location and heights of the peaks are unknown. A multiple testing
    approach is proposed in which, after kernel smoothing, the presence of a peak
    is tested at each observed local maximum.
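
    The first two steps named above, kernel smoothing followed by locating the
    local maxima that become test candidates, can be sketched as follows. This
    is an illustration under stated assumptions, not the authors' procedure:
    the Gaussian kernel, the bandwidth, the helper names and the noise-free toy
    signal are all hypothetical choices, and the p-value computed at each
    candidate is omitted because it depends on theory not given in the abstract.

```python
import math


def gaussian_smooth(y, bandwidth):
    """Smooth y with a truncated Gaussian kernel, renormalized near the edges."""
    radius = int(3 * bandwidth)
    offsets = range(-radius, radius + 1)
    weights = [math.exp(-0.5 * (k / bandwidth) ** 2) for k in offsets]
    smoothed = []
    for i in range(len(y)):
        num = den = 0.0
        for k, w in zip(offsets, weights):
            j = i + k
            if 0 <= j < len(y):
                num += w * y[j]
                den += w
        smoothed.append(num / den)
    return smoothed


def local_maxima(z):
    """Indices of strict interior local maxima of the sequence z."""
    return [i for i in range(1, len(z) - 1) if z[i - 1] < z[i] > z[i + 1]]


def bump(i, center, height, width=3.0):
    # Unimodal peak of a common shape, placed at `center`.
    return height * math.exp(-0.5 * ((i - center) / width) ** 2)


# Hypothetical noise-free signal with two equal-shaped peaks of different heights.
signal = [bump(i, 20, 1.0) + bump(i, 60, 0.5) for i in range(80)]

smoothed = gaussian_smooth(signal, bandwidth=2.0)
candidates = local_maxima(smoothed)
print("candidate peak locations:", candidates)
```

    With noisy data many spurious local maxima would appear among the
    candidates, which is why the choice among them is treated as a multiple
    testing problem.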

  434. Efficient statistical analysis of large correlated multivariate datasets: a case study on brain connectivity matrices.

    Authors: Djalel Eddine Meskaldji, Leila Cammoun, Patric Hagmann, Reto Meuli, Jean Philippe Thiran, Stephan Morgenthaler
    Subjects: Methodology
    Abstract

    In neuroimaging, a large number of correlated tests are routinely performed
    to detect active voxels in single-subject experiments or to detect regions that
    differ between individuals belonging to different groups. In order to bound the
    probability of a false discovery of pair-wise differences, a Bonferroni or
    other correction for multiplicity is necessary. These corrections greatly
    reduce the power of the comparisons, which means that small signals
    (differences) remain hidden; they have therefore been more or less successful
    depending on the application.
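
    The power loss mentioned above is easy to see in a minimal sketch of the
    Bonferroni correction, which compares each of m p-values with alpha / m
    instead of alpha. The p-values below are hypothetical and the function name
    is chosen for illustration only.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Reject hypothesis i when p_i <= alpha / m, with m the number of tests."""
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]


# Hypothetical p-values from five pairwise comparisons.
p_values = [0.0004, 0.003, 0.02, 0.04, 0.3]

uncorrected = [p <= 0.05 for p in p_values]
corrected = bonferroni_reject(p_values, alpha=0.05)

print("rejected without correction:", sum(uncorrected))
print("rejected with Bonferroni:   ", sum(corrected))
```

    Two of the four nominally significant comparisons no longer pass once the
    threshold drops from 0.05 to 0.01, and the threshold shrinks further as the
    number of tests grows.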

  435. Variable importance and model selection by decorrelation.

    Authors: Korbinian Strimmer, Verena Zuber
    Subjects: Methodology
    Abstract

    We introduce the CAR score, a simple criterion for ranking and selecting
    variables in linear regression that arises naturally in the best predictor
    formulation of the linear model. The CAR score measures the correlation between
    the response and the Mahalanobis-decorrelated predictors and reduces to
    marginal correlation if the predictors are uncorrelated. As a population
    quantity, the CAR score can be used irrespective of the choice of inference
    paradigm.

  436. Censoring Out-Degree Compromises Inferences of Social Network Contagion and Autocorrelation.

    Authors: Andrew C. Thomas
    Subjects: Methodology
    Abstract

    I examine the consequences of modelling contagious influence in a social
    network with incomplete edge information, namely in the situation where each
    individual may name a limited number of friends, so that extra outbound ties
    are censored. In particular, I consider a prototypical time series
    configuration where a property of the ``ego'' is affected in a causal fashion
    by the properties of their ``alters'' at a previous time point, both in the
    total number of alters as well as the deviation from a central value.

  437. Gaussian Process Models for Nonparametric Functional Regression with Functional Responses.

    Authors: Heng Lian
    Subjects: Methodology
    Abstract

    Recently, a nonparametric functional model with functional responses has been
    proposed within the functional reproducing kernel Hilbert spaces (fRKHS)
    framework. Motivated by its superior performance and also its limitations, we
    propose a Gaussian process model whose posterior mode coincides with the fRKHS
    estimator. The Bayesian approach has several advantages compared to its
    predecessor. Firstly, the multiple unknown parameters can be inferred together
    with the regression function in a unified framework.

  438. Hyper-g Priors for Generalized Linear Models.

    Authors: Daniel Sabanés Bové, Leonhard Held
    Subjects: Methodology
    Abstract

    We develop an extension of the classical Zellner's g-prior to generalized
    linear models. The prior on the hyperparameter g is handled in a flexible way,
    so that any continuous proper hyperprior f(g) can be used, giving rise to a
    large class of hyper-g priors. Connections with the literature are described in
    detail. A fast and accurate integrated Laplace approximation of the marginal
    likelihood makes inference in large model spaces feasible. For posterior
    parameter estimation we propose an efficient and tuning-free
    Metropolis-Hastings sampler.

  439. Towards Nonstationary, Nonparametric Independent Process Analysis with Unknown Source Component Dimensions.

    Authors: Zoltan Szabo
    Subjects: Methodology
    Abstract

    The goal of this paper is to extend independent subspace analysis (ISA) to
    the case of (i) nonparametric, not strictly stationary source dynamics and (ii)
    unknown source component dimensions. We make use of functional autoregressive
    (fAR) processes to model the temporal evolution of the hidden sources.

  440. Adaptive post-Dantzig estimation and prediction for non-sparse "large $p$ and small $n$" models.

    Authors: Lu Lin, Lixing Zhu, Yujie Gai
    Subjects: Methodology
    Abstract

    For consistency (even oracle properties) of estimation and model prediction,
    almost all existing methods of variable/feature selection critically depend on
    sparsity of models. However, for ``large $p$ and small $n$'' models, the
    sparsity assumption is hard to check and, in particular, when this assumption
    is violated, the consistency of all existing estimations is usually impossible
    because working models selected by existing methods such as the LASSO and the
    Dantzig selector are usually biased. To attack this problem, we propose in this
    paper adaptive post-Dantzig estimation and model prediction.

  441. Penalized Likelihood Regression in Reproducing Kernel Hilbert Spaces with Randomized Covariate Data.

    Authors: Xiwen Ma, Bin Dai, Ronald Klein, Barbara E.K. Klein, Kristine E. Lee, Grace Wahba
    Subjects: Methodology
    Abstract

    Classical penalized likelihood regression problems deal with the case that
    the independent variables data are known exactly. In practice, however, it is
    common to observe data with incomplete covariate information. We are concerned
    with a fundamentally important case where some of the observations do not
    represent the exact covariate information, but only a probability distribution.
    In this case, the maximum penalized likelihood method can be still applied to
    estimating the regression function. We first show that the maximum penalized
    likelihood estimate exists under a mild condition.

  442. Calibrating Weibull priors using virtual data in reliability and risk assessment.

    Authors: Nicolas Bousquet
    Subjects: Methodology
    Abstract

    Based on expert opinions, informative prior elicitation for the common
    Weibull lifetime distribution usually presents some difficulties since it
    requires eliciting a two-dimensional joint prior. We consider here a
    reliability framework where the available expert information is stated directly
    in terms of prior predictive values (lifetimes) and not parameter values, which
    are less intuitive. The novelty of our procedure is to weigh the expert
    information by the size m of a virtual sample yielding similar information,
    the prior being seen as a reference posterior.

  443. A decision theoretic approach for segmental classification using Hidden Markov models.

    Authors: Christopher Yau, Christopher C. Holmes
    Subjects: Methodology
    Abstract

    This paper is concerned with statistical methods for the analysis of linear
    sequence data using Hidden Markov Models (HMMs) where the task is to segment
    and classify the data according to the underlying hidden state sequence. Such
    analysis is commonplace in the empirical sciences including genomics, finance
    and speech processing.

  444. Assessing Characteristic Scales Using Wavelets.

    Authors: Michael J. Keim, Donald B. Percival
    Subjects: Methodology
    Abstract

    Characteristic scale is a notion that pervades the geophysical sciences, but
    it has no widely accepted precise definition. The wavelet transform decomposes
    a time series into coefficients that are associated with different scales. The
    variance of these coefficients can be used to decompose the variance of the
    time series across different scales. A practical definition for characteristic
    scale can be formulated in terms of peaks in plots of the wavelet variance
    versus scale.
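
    The variance decomposition and the peak-based definition can be sketched in
    a few lines. The example below is a simplification under stated
    assumptions: it uses a plain orthonormal Haar discrete wavelet transform on
    a series whose length is a power of two, which may differ from the
    transform used in the paper, and the square-wave series is a hypothetical
    toy input. Each scale's contribution is the sum of its squared wavelet
    coefficients divided by the series length, and these contributions add up
    to the sample variance (with divisor n).

```python
import math


def haar_wavelet_variance(x):
    """Map each scale (1, 2, 4, ...) to its share of the sample variance of x."""
    n = len(x)
    if n < 2 or n & (n - 1):
        raise ValueError("series length must be a power of two")
    approx = list(x)
    contributions = {}
    scale = 1
    while len(approx) > 1:
        pairs = range(0, len(approx), 2)
        # Wavelet (detail) coefficients capture changes at the current scale.
        detail = [(approx[i + 1] - approx[i]) / math.sqrt(2) for i in pairs]
        # Scaling coefficients carry the smoother part to the next, coarser scale.
        approx = [(approx[i + 1] + approx[i]) / math.sqrt(2) for i in pairs]
        contributions[scale] = sum(d * d for d in detail) / n
        scale *= 2
    return contributions


# Hypothetical square wave: four values at +1, then four at -1, repeated.
series = ([1.0] * 4 + [-1.0] * 4) * 4

contributions = haar_wavelet_variance(series)
mean = sum(series) / len(series)
variance = sum((v - mean) ** 2 for v in series) / len(series)

# Characteristic scale taken as the scale with the largest contribution.
characteristic_scale = max(contributions, key=contributions.get)

print("variance by scale:", contributions)
print("sum of contributions:", sum(contributions.values()), "variance:", variance)
print("characteristic scale:", characteristic_scale)
```

    For this input all of the variance sits at scale 4, matching the length of
    each run of constant values in the square wave.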

  445. Reconstruction of a Low-rank Matrix in the Presence of Gaussian Noise.

    Authors: Andrey Shabalin, Andrew Nobel
    Subjects: Methodology
    Abstract

    In this paper we study the problem of reconstruction of a low-rank matrix
    observed with additive Gaussian noise. First we show that under mild
    assumptions (about the prior distribution of the signal matrix) we can restrict
    our attention to reconstruction methods that are based on the singular value
    decomposition of the observed matrix and act only on its singular values
    (preserving the singular vectors). Then we determine the effect of noise on the
    SVD of low-rank matrices by building a connection between the matrix
    reconstruction problem and the spiked population model in random matrix theory.

  446. Quasi-Concave Density Estimation.

    Authors: Ivan Mizera, Roger Koenker
    Subjects: Methodology
    Abstract

    Maximum likelihood estimation of a log-concave probability density is
    formulated as a convex optimization problem and shown to have an equivalent
    dual formulation as a constrained maximum Shannon entropy problem. Closely
    related maximum Renyi entropy estimators that impose weaker concavity
    restrictions on the fitted density are also considered, notably a minimum
    Hellinger discrepancy estimator that constrains the reciprocal of the
    square-root of the density to be concave. A limiting form of these estimators
    constrains solutions to the class of quasi-concave densities.

  447. Bayesian nonparametric estimation of the spectral density of a long or intermediate memory Gaussian process.

    Authors: Judith Rousseau, Nicolas Chopin, Brunero Liseo
    Subjects: Methodology
    Abstract

    A stationary Gaussian process is said to be long-range dependent (resp.
    anti-persistent) if its spectral density $f(\lambda)$ can be written as
    $f(\lambda)=|\lambda|^{-2d}g(|\lambda|)$, where $0 < d < 1/2$ (resp.
    $-1/2 < d < 0$), and $g$ is continuous. We propose a novel Bayesian
    nonparametric approach for the estimation of the spectral density of such
    processes. Within this approach, we prove posterior consistency for both $d$
    and $g$, under appropriate conditions on the prior distribution.

  448. Nonparametric quantile regression for twice censored data.

    Authors: Holger Dette, Stanislav Volgushev
    Subjects: Methodology
    Abstract

    We consider the problem of nonparametric quantile regression for twice
    censored data. Two new estimates are presented, which are constructed by
    applying concepts of monotone rearrangements to estimates of the conditional
    distribution function. The proposed methods avoid the problem of crossing
    quantile curves. Weak uniform consistency and weak convergence are established
    for both estimates and their finite sample properties are investigated by means
    of a simulation study.
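
    The idea of monotone rearrangement can be sketched on a grid: sorting a
    non-monotone estimate of a conditional distribution function makes it
    nondecreasing, so the quantiles obtained by inverting it cannot cross.
    The grid, the raw estimate, and the helper names below are hypothetical,
    and censoring is not modelled at all.

```python
import numpy as np

def rearrange_cdf(cdf_values):
    """Monotone rearrangement on a grid: sort the estimated CDF values."""
    return np.sort(np.asarray(cdf_values, dtype=float))

def quantile_from_cdf(grid, cdf_values, tau):
    """Generalized inverse: smallest grid point whose CDF value is >= tau."""
    idx = np.searchsorted(cdf_values, tau, side="left")
    return grid[min(idx, len(grid) - 1)]

# Hypothetical raw estimate that dips from 0.4 to 0.3 (not a valid CDF).
grid = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
raw_cdf = np.array([0.1, 0.4, 0.3, 0.8, 1.0])

monotone_cdf = rearrange_cdf(raw_cdf)
quantiles = [quantile_from_cdf(grid, monotone_cdf, t) for t in (0.25, 0.5, 0.75)]
```

    Because `monotone_cdf` is nondecreasing, `quantiles` is nondecreasing in
    the quantile level by construction.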

  449. The Spread of Evidence-Poor Medicine via Flawed Social-Network Analysis.

    Authors: Russell Lyons
    Subjects: Methodology
    Abstract

    We present cautionary examples of what can go wrong when assumptions behind
    statistical procedures are insufficiently examined, even when the analysis is
    performed by highly reputed and otherwise careful practitioners. Our examples
    come from a series of recent papers by Christakis and Fowler that claim to have
    demonstrated the existence of transmission via social networks of various
    personal characteristics, including obesity, smoking cessation, happiness, and
    loneliness. Those papers also assert that such influence extends to three
    degrees of separation in social networks.

  450. Small-sample corrections for score tests in Birnbaum-Saunders regressions.

    Authors: Silvia L.P. Ferrari, Artur J. Lemonte
    Subjects: Methodology
    Abstract

    In this paper we deal with the issue of performing accurate small-sample
    inference in the Birnbaum-Saunders regression model, which can be useful for
    modeling lifetime or reliability data. We derive a Bartlett-type correction for
    the score test and numerically compare the corrected test with the usual score
    test, the likelihood ratio test and its Bartlett-corrected version. Our
    simulation results suggest that the corrected test we propose is more reliable
    than the other tests.

  451. Adaptive estimation of vector autoregressive models with time-varying variance: application to testing linear causality in mean.

    Authors: Valentin Patilea, Hamdi Raïssi
    Subjects: Methodology
    Abstract

    Linear Vector AutoRegressive (VAR) models where the innovations could be
    unconditionally heteroscedastic and serially dependent are considered. The
    volatility structure is deterministic and quite general, including breaks or
    trending variances as special cases. In this framework we propose Ordinary
    Least Squares (OLS), Generalized Least Squares (GLS) and Adaptive Least Squares
    (ALS) procedures.

  452. A new lifetime model with decreasing failure rate.

    Authors: Wagner Barreto-Souza, Hassan S. Bakouch
    Subjects: Methodology
    Abstract

    In this paper we introduce a new lifetime distribution by compounding
    exponential and Poisson-Lindley distributions, named exponential
    Poisson-Lindley distribution. Several properties are derived, such as density,
    failure rate, mean lifetime, moments, order statistics and Rényi entropy.
    Furthermore, estimation by maximum likelihood and large-sample inference
    are discussed. The paper is motivated by two applications to real data sets,
    and we hope that this model will attract wider application in survival and
    reliability.

  453. Statistical Inference in Dynamic Treatment Regimes.

    Authors: Eric Laber, Min Qian, Dan J. Lizotte, Susan A. Murphy
    Subjects: Methodology
    Abstract

    Dynamic treatment regimes, also known as treatment policies, are increasingly
    being used to operationalize clinical decision making associated with long-term
    patient care. Common approaches to constructing a dynamic treatment regime from
    data, such as Q-learning, employ non-smooth functionals of the data. Therefore,
    simple inferential tasks such as constructing a confidence interval for the
    parameters in the Q-function are complicated by non-regular asymptotics under
    certain commonly-encountered generative models.

  454. Noise Invalidation Denoising.

    Authors: Soosan Beheshti, Masoud Hashemi, Xiao-Ping Zhang, Nima Nikvand
    Subjects: Methodology
    Abstract

    A denoising technique based on noise invalidation is proposed. The adaptive
    approach derives a noise signature from the noise order statistics and utilizes
    the signature to denoise the data. The novelty of this approach is in
    presenting a general-purpose denoising in the sense that it does not need to
    employ any particular assumption on the structure of the noise-free signal,
    such as data smoothness or sparsity of the coefficients. An advantage of the
    method is in denoising the corrupted data in any complete basis transformation
    (orthogonal or non-orthogonal).

  455. A Statistical Social Network Model for Consumption Data in Food Webs.

    Authors: Anton H. Westveld, Grace S. Chiu
    Subjects: Methodology
    Abstract

    We adapt existing statistical modelling techniques for social networks to
    study consumption data observed in food webs. These data describe the feeding
    among organisms grouped into nodes that form the food web. Model complexity
    arises due to the extensive amount of zeros in the data, as each node in the
    web is predator / prey to only a small number of other nodes.

  456. About incoherent inference.

    Authors: Christian P. Robert
    Subjects: Methodology
    Abstract

    In Templeton (2010), the Approximate Bayesian Computation (ABC) algorithm
    (see, e.g., Pritchard et al., 1999, Beaumont et al., 2002, Marjoram et al.,
    2003, Ratmann et al., 2009) is criticised on mathematical and logical grounds:
    "the [Bayesian] inference is mathematically incorrect and formally illogical".
    Since those criticisms turn out to be bearing on Bayesian foundations rather
    than on the computational methodology they are primarily directed at, we
    endeavour to point out in this note the statistical errors and inconsistencies
    in Templeton (2010), referring to Beaumont et al.

  457. Adaptive Optimal Scaling of Metropolis-Hastings Algorithms Using the Robbins-Monro Process.

    Authors: Y. Fan, S. A. Sisson, P. H. Garthwaite
    Subjects: Methodology
    Abstract

    We present an adaptive method for the automatic scaling of Random-Walk
    Metropolis-Hastings algorithms, which quickly and robustly identifies the
    scaling factor that yields a specified overall sampler acceptance probability.
    Our method relies on the use of the Robbins-Monro search process, whose
    performance is determined by an unknown steplength constant. We give a very
    simple estimator of this constant for proposal distributions that are
    univariate or multivariate normal, together with a sampling algorithm for
    automating the method.
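
    A minimal sketch of Robbins-Monro scaling for a one-dimensional
    random-walk Metropolis sampler is given below. The update on the log
    scale, the steplength constant `c = 3.0`, and the target acceptance rate
    of 0.44 are assumptions chosen for illustration; the paper's estimator of
    the steplength constant is not reproduced here.

```python
import numpy as np

def adaptive_rwm(log_target, x0, target_accept=0.44, c=3.0, n_iter=5000, seed=0):
    """Random-walk Metropolis whose proposal scale follows a Robbins-Monro search."""
    rng = np.random.default_rng(seed)
    x, log_sigma = float(x0), 0.0
    log_p = log_target(x)
    for i in range(1, n_iter + 1):
        proposal = x + np.exp(log_sigma) * rng.normal()
        log_p_prop = log_target(proposal)
        accepted = np.log(rng.uniform()) < log_p_prop - log_p
        if accepted:
            x, log_p = proposal, log_p_prop
        # Robbins-Monro step: enlarge the scale after an acceptance,
        # shrink it after a rejection, with step size c / i.
        log_sigma += c * (float(accepted) - target_accept) / i
    return float(np.exp(log_sigma))

# Standard normal target (assumed example).
sigma_hat = adaptive_rwm(lambda x: -0.5 * x * x, x0=0.0)
```

    Working on the log scale keeps the proposal scale positive; the returned
    `sigma_hat` is the scale reached after `n_iter` updates.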

  458. Redescending M-estimators and Deterministic Annealing, with Applications to Robust Regression and Tail Index Estimation.

    Authors: Rudolf Frühwirth, Wolfgang Waltenberger
    Subjects: Methodology
    Abstract

    A new type of redescending M-estimators is constructed, based on data
    augmentation with an unspecified outlier model. Necessary and sufficient
    conditions for the convergence of the resulting estimators to the Huber-type
    skipped mean are derived. By introducing a temperature parameter the concept of
    deterministic annealing can be applied, making the estimator insensitive to the
    starting point of the iteration. The properties of the annealing M-estimator as
    a function of the temperature are explored. Finally, two applications are
    presented.

  459. G1-Renewal Process as Repairable System Model.

    Authors: Mark Kaminskiy, Vasiliy Krivtsov
    Subjects: Methodology
    Abstract

    This paper considers a point process model with a monotonically decreasing or
    increasing ROCOF and the underlying distributions from the location-scale
    family. In terms of repairable system reliability analysis, the process is
    capable of modeling various restoration types including "better-than-new",
    i.e., the one not covered by the popular G-Renewal model (Kijima & Sumita,
    1986).

  460. Multi-color Randomly Reinforced Urn for Adaptive Designs.

    Authors: Feifang Hu, Li-Xin Zhang, Siu Hung Cheung, Wei Sum Chan
    Subjects: Methodology
    Abstract

    This paper is withdrawn

  461. On signal and extraneous roots in Singular Spectrum Analysis.

    Authors: Konstantin Usevich
    Subjects: Methodology
    Abstract

    In the present paper we study properties of roots of characteristic
    polynomials for the linear recurrent formulae (LRF) that govern time series. We
    also investigate how the values of these roots affect Singular Spectrum
    Analysis implications, in what concerns separation of components, SSA
    forecasting and related signal parameter estimation methods. The roots of the
    characteristic polynomial for an LRF comprise the signal roots, which determine
    the structure of the time series, and extraneous roots.

  462. Group Variable Selection via a Hierarchical Lasso and Its Oracle Property.

    Authors: Ji Zhu, Nengfeng Zhou
    Subjects: Methodology
    Abstract

    In many engineering and scientific applications, prediction variables are
    grouped, for example, in biological applications where assayed genes or
    proteins can be grouped by biological roles or biological pathways. Common
    statistical analysis methods such as ANOVA, factor analysis, and functional
    modeling with basis sets also exhibit natural variable groupings.

  463. LASSO ISOtone for High Dimensional Additive Isotonic Regression.

    Authors: Nicolai Meinshausen, Zhou Fang
    Subjects: Methodology
    Abstract

    Additive isotonic regression attempts to determine the relationship between a
    multi-dimensional observation variable and a response, under the constraint
    that the estimate is the additive sum of univariate component effects that are
    monotonically increasing. In this article, we present a new method for such
    regression called LASSO Isotone (LISO). LISO adapts ideas from sparse linear
    modelling to additive isotonic regression. Thus, it is viable in many
    situations with high dimensional predictor variables, where selection of
    significant versus insignificant variables is required.

  464. Outlier Detection Using Nonconvex Penalized Regression.

    Authors: Art B. Owen, Yiyuan She
    Subjects: Methodology
    Abstract

    This paper studies the outlier detection problem from the point of view of
    penalized regressions. Our regression model adds one mean shift parameter for
    each of the $n$ data points. We then apply a regularization favoring a sparse
    vector of mean shift parameters. The usual $L_1$ penalty yields a convex
    criterion, but we find that it fails to deliver a robust estimator. The $L_1$
    penalty corresponds to soft thresholding. We introduce a thresholding (denoted
    by $\Theta$) based iterative procedure for outlier detection ($\Theta$-IPOD).
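
    The mean-shift formulation can be sketched as an alternating scheme: fit
    the regression to the shift-corrected responses, then threshold the
    residuals to update the shifts. The hard-thresholding rule, the value
    `lam = 3.0`, and the toy data below are assumptions for illustration, not
    the tuning used in the paper.

```python
import numpy as np

def hard_threshold(r, lam):
    """Keep residuals larger than lam in absolute value, zero the rest."""
    return np.where(np.abs(r) > lam, r, 0.0)

def ipod_sketch(X, y, lam, threshold=hard_threshold, n_iter=50):
    """Alternate least squares on y - gamma with thresholding of the residuals."""
    gamma = np.zeros_like(y)  # one mean-shift parameter per observation
    for _ in range(n_iter):
        beta, *_ = np.linalg.lstsq(X, y - gamma, rcond=None)
        gamma = threshold(y - X @ beta, lam)
    return beta, gamma

# Toy data (assumed): a straight line with one gross outlier at index 5.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 30)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x + 0.1 * rng.normal(size=x.size)
y[5] += 10.0

beta_hat, gamma_hat = ipod_sketch(X, y, lam=3.0)
flagged = np.flatnonzero(gamma_hat)  # indices with a nonzero mean shift
```

    Passing a soft-thresholding function as `threshold` gives the variant
    that corresponds to the $L_1$ penalty mentioned in the abstract.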

  465. A Probabilistic Perspective on Gaussian Filtering and Smoothing.

    Authors: Marc Peter Deisenroth, Henrik Ohlsson
    Subjects: Methodology
    Abstract

    We present a general probabilistic perspective on Gaussian filtering and
    smoothing. We show that different approaches to Gaussian filtering/smoothing
    can be distinguished solely by their methods of computing means and covariances
    of joint probabilities. New filters and smoothers can therefore be derived
    easily by providing methods for computing these moments. From the probabilistic
    perspective, we additionally derive general sufficient conditions for
    unbiasedness and optimality of Gaussian filters in linear and nonlinear dynamic
    systems.

  466. Characterization of a subclass of Tweedie distributions by a property of generalized stability.

    Authors: Lev B. Klebanov, Grigory Temnov
    Subjects: Methodology
    Abstract

    We introduce a class of distributions originating from an exponential family
    and having a property related to the strict stability property. A
    characteristic function representation for this family is obtained and its
    properties are investigated. The proposed class relates to stable distributions
    and includes Inverse Gaussian distribution and Levy distribution as special
    cases.

  467. Auxiliary Particle filtering within adaptive Metropolis-Hastings Sampling.

    Authors: Ralph Silva, Paolo Giordani, Robert Kohn, Michael Pitt
    Subjects: Methodology
    Abstract

    Our article deals with Bayesian inference for a general state space model
    with the simulated likelihood computed by the particle filter. We show
    empirically that the partially or fully adapted particle filters can be much
    more efficient than the standard particle filter, especially when the signal to noise
    ratio is high. This is especially important because using the particle filter
    within MCMC sampling is O(T^2), where T is the sample size.

  468. Particle Filter-Based On-Line Estimation of Spot Volatility with Nonlinear Market Microstructure Noise Models.

    Authors: Rainer Dahlhaus, Jan C. Neddermeyer
    Subjects: Methodology
    Abstract

    A new technique for the on-line estimation of spot volatility for
    high-frequency data is developed. The algorithm works directly on the
    transaction data and updates the volatility estimate immediately after the
    occurrence of a new transaction. We make a clear distinction between volatility
    per time unit and volatility per transaction and provide estimators for both. A
    new nonlinear market microstructure noise model is proposed that reproduces the
    major stylized facts of high-frequency data.

  469. Asymptotic Properties of Self-Normalized Linear Processes with Long Memory.

    Authors: Magda Peligrad, Hailin Sang
    Subjects: Methodology
    Abstract

    In this paper we study the central limit theorem in its functional form for
    time series with long memory having independent innovations with infinite
    second moment. For the sake of applications we derive the self-normalized
    version of this theorem. The study is motivated by models arising in economic
    applications where often the linear processes have long memory, the innovations
    have long tails and coefficients are not summable.

  470. Testing randomness of spatial point patterns with the Ripley statistic.

    Authors: Gabriel Lang, Eric Marcon
    Subjects: Methodology
    Abstract

    Aggregation patterns are often visually detected in sets of location data.
    These clusters may be the result of interesting dynamics or the effect of pure
    randomness. We build an asymptotically Gaussian test for the hypothesis of
    randomness corresponding to a Poisson point process. We first compute the exact
    first and second moment of the Ripley K-statistic under the homogeneous Poisson
    point process model. Then we prove the asymptotic normality of a vector of such
    statistics for different scales and compute its covariance matrix.
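
    For orientation, a naive estimator of the Ripley K-statistic (without
    edge correction) can be sketched as follows. Under a homogeneous Poisson
    process the theoretical value is $K(r) = \pi r^2$. The function name, the
    unit-square window, and the sample size are assumptions; the exact
    moments and the test statistic of the paper are not computed here.

```python
import numpy as np

def ripley_k(points, radii, area):
    """Naive K estimate: area * (#ordered pairs within r) / (n * (n - 1))."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    off_diag = dist[~np.eye(n, dtype=bool)]  # drop self-distances
    return np.array(
        [area * np.count_nonzero(off_diag <= r) / (n * (n - 1)) for r in radii]
    )

# Uniform points on the unit square (assumed example of complete randomness).
rng = np.random.default_rng(0)
points = rng.uniform(size=(200, 2))
radii = np.array([0.05, 0.10])
k_hat = ripley_k(points, radii, area=1.0)
k_csr = np.pi * radii ** 2  # reference values under a Poisson process
```

    Without edge correction the estimate tends to fall below `k_csr` near the
    window boundary, which is one reason exact moments matter for a test.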

  471. The coalescent and its descendants.

    Authors: Peter Donnelly, Stephen Leslie
    Subjects: Methodology
    Abstract

    The coalescent revolutionised theoretical population genetics, simplifying,
    or making possible for the first time, many analyses, proofs, and derivations,
    and offering crucial insights about the way in which the structure of data in
    samples from populations depends on the demographic history of the population.
    However statistical inference under the coalescent model is extremely
    challenging, effectively because no explicit expressions are available for key
    sampling probabilities.

  472. Copula Processes.

    Authors: Zoubin Ghahramani, Andrew Gordon Wilson
    Subjects: Methodology
    Abstract

    We define a copula process which describes the dependencies between
    arbitrarily many random variables independently of their marginal
    distributions. As an example, we develop a stochastic volatility model,
    Gaussian Copula Process Volatility (GCPV), to predict the latent standard
    deviations of a sequence of random variables. To learn the parameters of GCPV
    we use Bayesian inference, with the Laplace approximation, and with Markov
    chain Monte Carlo as an alternative. We find both methods comparable. We also
    find our model can outperform GARCH, on simulated and financial data.

  473. Sparse covariance thresholding for high-dimensional variable selection.

    Authors: X. Jessie Jeng, Z. John Daye
    Subjects: Methodology
    Abstract

    In high-dimensions, many variable selection methods, such as the lasso, are
    often limited by excessive variability and rank deficiency of the sample
    covariance matrix. Covariance sparsity is a natural phenomenon in
    high-dimensional applications, such as microarray analysis, image processing,
    etc., in which a large number of predictors are independent or weakly
    correlated. In this paper, we propose the covariance-thresholded lasso, a new
    class of regression methods that can utilize covariance sparsity to improve
    variable selection.

  474. Tree-Structured Stick Breaking Processes for Hierarchical Data.

    Authors: Michael I. Jordan, Ryan Prescott Adams, Zoubin Ghahramani
    Subjects: Methodology
    Abstract

    Many data are naturally modeled by an unobserved hierarchical structure. In
    this paper we propose a flexible nonparametric prior over unknown data
    hierarchies. The approach uses nested stick-breaking processes to allow for
    trees of unbounded width and depth, where data can live at any node and are
    infinitely exchangeable. One can view our model as providing infinite mixtures
    where the components have a dependency structure corresponding to an
    evolutionary diffusion down a tree.

  475. On Particle Learning.

    Authors: Christian P. Robert, Nicolas Chopin, Jean-Michel Marin, Alessandra Iacobucci, Kerrie Mengersen, Robin Ryder, Christian Schäfer
    Subjects: Methodology
    Abstract

    This document is the aggregation of several discussions of Lopes et al.
    (2010) we submitted to the proceedings of the Ninth Valencia Meeting, held in
    Benidorm, Spain, on June 3-8, 2010, in conjunction with Hedibert Lopes' talk at
    this meeting. The main point in those discussions is the potential for
    degeneracy in the particle learning methodology, related with the exponential
    forgetting of the past simulations. We illustrate the resulting difficulties in
    the case of mixtures.

  476. A generalized Multiple-try Metropolis version of the Reversible Jump algorithm.

    Authors: S. Pandolfi, F. Bartolucci, N. Friel
    Subjects: Methodology
    Abstract

    The Reversible Jump (RJ) algorithm (Green, 1995) is one of the most used
    Markov chain Monte Carlo algorithms for Bayesian estimation and model
    selection. We propose a generalized Multiple-try version of this algorithm
    which is based on drawing several proposals at each step and randomly choosing
    one of them on the basis of weights (selection probabilities) that may be
    arbitrarily chosen. Along the same lines as in Pandolfi et al. (2010), we exploit,
    among the possible choices, a method based on selection probabilities depending
    on a quadratic approximation of the posterior distribution.

  477. Semiparametric Regression in Testicular Germ Cell Data.

    Authors: Anastasia Voulgaraki, Benjamin Kedem, Barry I. Graubard
    Subjects: Methodology
    Abstract

    It is possible to approach regression analysis with random covariates from a
    semiparametric perspective where information is combined from multiple
    multivariate sources. The approach assumes a semiparametric density ratio model
    where multivariate distributions are "regressed" on a reference distribution.
    Each multivariate distribution and a corresponding conditional
    expectation (the regression of interest) are then estimated from the combined data
    from all sources. Graphical and quantitative diagnostic tools are suggested to
    assess model validity.

  478. Hierarchical multilinear models for multiway data.

    Authors: Peter Hoff
    Subjects: Methodology
    Abstract

    Reduced-rank decompositions provide descriptions of the variation among the
    elements of a matrix or array. In such decompositions, the elements of an array
    are expressed as products of low-dimensional latent factors. This article
    presents a model-based version of such a decomposition, extending the scope of
    reduced rank methods to accommodate a variety of data types such as
    longitudinal social networks and continuous multivariate data that are
    cross-classified by categorical variables.

  479. Bayesian clustering of decomposable graphs.

    Authors: Luke Bornn, François Caron
    Subjects: Methodology
    Abstract

    In this paper we propose a class of prior distributions on decomposable
    graphs, allowing for improved modeling flexibility. While existing methods
    solely penalize the number of edges, the proposed work empowers practitioners
    to control clustering, level of separation, and other features of the graph.
    Emphasis is placed on a particular prior distribution which derives its
    motivation from the class of product partition models; the properties of this
    prior relative to existing priors are examined through theory and simulation.

  480. RIP-Based Near-Oracle Performance Guarantees for Subspace-Pursuit, CoSaMP, and Iterative Hard-Thresholding.

    Authors: Michael Elad, Raja Giryes
    Subjects: Methodology
    Abstract

    This paper presents an average case denoising performance analysis for the
    Subspace Pursuit (SP), the CoSaMP and the IHT algorithms. This analysis
    considers the recovery of a noisy signal, with the assumptions that (i) it is
    corrupted by an additive random white Gaussian noise; and (ii) it has a
    K-sparse representation with respect to a known dictionary D. The proposed
    analysis is based on the Restricted-Isometry-Property (RIP), establishing a
    near-oracle performance guarantee for each of these algorithms.
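
    Of the three algorithms, iterative hard thresholding is the simplest to
    sketch: take a gradient step on the least-squares misfit, then keep the K
    largest coefficients. The step size `1 / ||D||_2^2`, the random
    dictionary, and the toy signal below are assumptions for illustration;
    the sketch makes no claim about the RIP-based guarantees.

```python
import numpy as np

def keep_largest(v, k):
    """Hard thresholding: keep the k largest-magnitude entries of v."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

def iht(D, y, k, n_iter=200):
    """Iterative hard thresholding for y ~ D @ x with x assumed k-sparse."""
    step = 1.0 / np.linalg.norm(D, 2) ** 2  # assumed conservative step size
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        x = keep_largest(x + step * D.T @ (y - D @ x), k)
    return x

# Toy problem (assumed): Gaussian dictionary, 3-sparse signal, white noise.
rng = np.random.default_rng(0)
n, m, k = 100, 30, 3
D = rng.normal(size=(n, m)) / np.sqrt(n)
alpha = np.zeros(m)
alpha[[2, 11, 20]] = [3.0, -2.5, 2.0]
y = D @ alpha + 0.01 * rng.normal(size=n)

alpha_hat = iht(D, y, k)
```

    With this step size each iteration cannot increase the residual norm, so
    `alpha_hat` fits `y` at least as well as the all-zero starting point.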

  481. The role of the nugget term in the Gaussian process method.

    Authors: Andrey Pepelyshev
    Subjects: Methodology
    Abstract

    The maximum likelihood estimate of the correlation parameter of a Gaussian
    process with and without a nugget term is studied in the case of the
    analysis of deterministic models.

  482. Bayesian inference for general Gaussian graphical models with application to multivariate lattice data.

    Authors: Adrian Dobra, Alex Lenkoski, Abel Rodriguez
    Subjects: Methodology
    Abstract

    We introduce efficient Markov chain Monte Carlo methods for inference and
    model determination in multivariate and matrix-variate Gaussian graphical
    models. Our framework is based on the G-Wishart prior for the precision matrix
    associated with graphs that can be decomposable or non-decomposable. We extend
    our sampling algorithms to a novel class of conditionally autoregressive models
    for sparse estimation in multivariate lattice data, with a special emphasis on
    the analysis of spatial data.

  483. Measures of Variability for Bayesian Network Graphical Structures.

    Authors: Marco Scutari
    Subjects: Methodology
    Abstract

    The structure of a Bayesian network includes a great deal of information
    about the probability distribution of the data, which is uniquely identified
    given some general distributional assumptions. Therefore it's important to
    study its variability, which can be used to compare the performance of
    different learning algorithms and to measure the strength of any arbitrary
    subset of arcs.

  484. On the Estimation of the Heavy-Tail Exponent in Time Series using the Max-Spectrum.

    Authors: George Michailidis, Stilian A Stoev
    Subjects: Methodology
    Abstract

    This paper addresses the problem of estimating the tail index of
    distributions with heavy, Pareto-type tails for dependent data, which is of
    interest in the areas of finance, insurance, environmental monitoring and
    teletraffic analysis. A novel approach based on the max self-similarity scaling
    behavior of block maxima is introduced. The method exploits the increasing lack
    of dependence of maxima over large size blocks, which proves useful for time
    series data.

  485. On the estimation of the extremal index based on scaling and resampling.

    Authors: Stilian A. Stoev, George Michailidis, Kamal Hamidieh
    Subjects: Methodology
    Abstract

    The extremal index parameter theta characterizes the degree of local
    dependence in the extremes of a stationary time series and has important
    applications in a number of areas, such as hydrology, telecommunications,
    finance and environmental studies. In this study, a novel estimator for theta
    based on the asymptotic scaling of block-maxima and resampling is introduced.
    It is shown to be consistent and asymptotically normal for a large class of
    m-dependent time series.

  486. On the choice of parameters in Singular Spectrum Analysis and related subspace-based methods.

    Authors: Nina Golyandina
    Subjects: Methodology
    Abstract

    In the present paper we investigate methods related to both the Singular
    Spectrum Analysis (SSA) and subspace-based methods. We describe common and
    specific features of these methods and consider different kinds of problems
    solved by them such as signal reconstruction, forecasting and parameter
    estimation. General recommendations on the choice of parameters to obtain
    minimal errors are provided. We demonstrate that the optimal choice depends on
    the particular problem.

  487. Sensitivity of health-related scales is a non-decreasing function of their classes.

    Authors: Vasileios Maroulas, Demosthenes B. Panagiotakos
    Subjects: Methodology
    Abstract

    In biomedical research, discrete scales that describe characteristics of
    individuals are widely used for the evaluation of clinical conditions.
    However, the number of classes (partitions) used in a discrete scale has
    never been mathematically evaluated against the accuracy of a scale to
    predict the true cases. This work, using sensitivity and specificity as
    accuracy markers, revealed that the number of classes of a discrete scale
    affects its ability to correctly classify the truly diseased.

  488. Simulation-based Regularized Logistic Regression.

    Authors: Robert B. Gramacy, Nicholas G. Polson
    Subjects: Methodology
    Abstract

    We develop simulation-based methods for regularized logistic regression by
    exploiting normal mixtures in two ways: using z-distributions to represent the
    logistic likelihood, and using mixtures of stable distributions to implement
    regularization penalties including the lasso. By carefully choosing the
    z-distribution parameterization, and choosing how regularization is applied, we
    obtain subtly different MCMC sampling schemes with varying efficiency depending
    on the data type (binary v.

  489. Size and power properties of some tests in the Birnbaum-Saunders regression model.

    Authors: Artur Lemonte, Silvia Ferrari
    Subjects: Methodology
    Abstract

    The Birnbaum-Saunders distribution has been used quite effectively to model
    times to failure for materials subject to fatigue and for modeling lifetime
    data. In this paper we obtain asymptotic expansions, up to order $n^{-1/2}$ and
    under a sequence of Pitman alternatives, for the nonnull distribution functions
    of the likelihood ratio, Wald, score and gradient test statistics in the
    Birnbaum-Saunders regression model. The asymptotic distributions of all four
    statistics are obtained for testing a subset of regression parameters and for
    testing the shape parameter.

  490. Community extraction for social networks.

    Authors: Ji Zhu, Yunpeng Zhao, Elizaveta Levina
    Subjects: Methodology
    Abstract

    Analysis of networks and in particular discovering communities within
    networks has been a focus of recent work in several fields, with applications
    ranging from citation and friendship networks to food webs and gene regulatory
    networks. Most of the existing community detection methods focus on
    partitioning the entire network into communities, with the expectation of many
    ties within communities and few ties between. However, many networks contain
    nodes that do not fit in with any of the communities, and forcing every node
    into a community can distort results.

  491. Random nonlinear model with missing responses.

    Authors: Gabriela Ciuperca
    Subjects: Methodology
    Abstract

    A nonlinear model with response variable missing at random is studied. In
    order to improve the coverage accuracy, the empirical likelihood ratio (EL)
    method is considered. The asymptotic distribution of the EL statistic and also of
    its approximation is $\chi^2$ if the parameters are estimated using the least
    squares (LS) or least absolute deviation (LAD) method on complete data. When the
    responses are reconstituted using a semiparametric method, the empirical
    log-likelihood associated with the imputed data is also asymptotically $\chi^2$.

  492. Ecological non-linear state space model selection via adaptive particle Markov chain Monte Carlo (AdPMCMC).

    Authors: Gareth W. Peters, Geoff R. Hosack, Keith R. Hayes
    Subjects: Methodology
    Abstract

    We develop a novel advanced Particle Markov chain Monte Carlo algorithm that
    is capable of sampling from the posterior distribution of non-linear state
    space models for both the unobserved latent states and the unknown model
    parameters. We apply this novel methodology to five population growth models,
    including models with strong and weak Allee effects, and test if it can
    efficiently sample from the complex likelihood surface that is often associated
    with these models.

  493. Robustness of Optimal Designs for 2^2 Experiments with Binary Response.

    Authors: Jie Yang, Abhyuday Mandal, Dibyen Majumdar
    Subjects: Methodology
    Abstract

    We consider an experiment with two qualitative factors at 2 levels each and a
    binary response, that follows a generalized linear model. In Mandal, Yang and
    Majumdar (2010) we obtained basic results and characterizations of locally
    D-optimal designs for special cases. As locally optimal designs depend on the
    assumed parameter values, a critical issue is the sensitivity of the design to
    misspecification of these values. In this paper we study the sensitivity
    theoretically and by simulation, and show that the optimal designs are quite
    robust.

  494. A self-normalized approach to confidence interval construction in time series.

    Authors: Xiaofeng Shao
    Subjects: Methodology
    Abstract

    We propose a new method to construct confidence intervals for quantities that
    are associated with a stationary time series, which avoids direct estimation of
    the asymptotic variances. Unlike the existing tuning-parameter-dependent
    approaches, our method has the attractive convenience of not requiring
    any user-chosen number or smoothing parameter. The interval is constructed on
    the basis of an asymptotically distribution-free self-normalized statistic, in
    which the normalizing matrix is computed using recursive estimates.

  495. On the Estimation of Integrated Covariance Matrices of High Dimensional Diffusion Processes.

    Authors: Xinghua Zheng, Yingying Li
    Subjects: Methodology
    Abstract

    We consider the estimation of integrated covariance matrices of high
    dimensional diffusion processes by using high frequency data. We start by
    studying the most commonly used estimator, the realized covariance matrix
    (RCV). We show that in the high dimensional case when the dimension p and the
    observation frequency n grow at the same rate, the limiting empirical spectral
    distribution of RCV depends on the covolatility processes not only through the
    underlying integrated covariance matrix Sigma, but also on how the covolatility
    processes vary in time.

  496. Yet another breakdown point notion: EFSBP.

    Authors: Peter Ruckdeschel, Nataliya Horbenko
    Subjects: Methodology
    Abstract

    The breakdown point is one of the central notions to quantify the global
    robustness of a procedure. Since its introduction in Hampel (1968), several
    variants of this definition have been given in the literature.

  497. Parametric inference in a perturbed gamma degradation process.

    Authors: Laurent Bordes, Christian Paroissin, Ali Salami
    Subjects: Methodology
    Abstract

    We consider the gamma process perturbed by a Brownian motion (independent of
    the gamma process) as a degradation model. Parameter estimation is studied
    here. We assume that $n$ independent items are observed at irregular instants.
    From these observations, we estimate the parameters using the moments method.
    Then, we study the asymptotic properties of the estimators. Furthermore we
    derive some particular cases of items observed at regular or non-regular
    instants. Finally, some numerical simulations and two real data applications
    are provided to illustrate our method.

  498. Reconstructing DNA Copy Number by Penalized Estimation and Imputation.

    Authors: Kenneth Lange, Zhongyang Zhang, Roel Ophoff, Chiara Sabatti
    Subjects: Methodology
    Abstract

    Recent advances in genomics have underscored the surprising ubiquity of DNA
    copy number variation (CNV). Fortunately, modern genotyping platforms also
    detect CNVs with fairly high reliability. Hidden Markov models and algorithms
    have played a dominant role in the interpretation of CNV data. Here we explore
    CNV reconstruction via estimation with a fused-lasso penalty as suggested by
    Tibshirani and Wang (2008).

  499. Strong uniform consistency and asymptotic normality of a kernel based error density estimator in functional autoregressive models.

    Authors: Nadine Hilgert, Bruno Portier
    Subjects: Methodology
    Abstract

    Estimating the innovation probability density is an important issue in any
    regression analysis. This paper focuses on functional autoregressive models. A
    residual-based kernel estimator is proposed for the innovation density.
    Asymptotic properties of this estimator depend on the average prediction error
    of the functional autoregressive function. Sufficient conditions are studied to
    provide strong uniform consistency and asymptotic normality of the kernel
    density estimator.

  500. A Majorization-Minimization Approach to Variable Selection Using Spike and Slab Priors.

    Authors: Tso-Jung Yen
    Subjects: Methodology
    Abstract

    We develop a method to carry out MAP estimation for a class of Bayesian
    regression models in which coefficients are assigned with Gaussian-based spike
    and slab priors weighted by Bernoulli variables. Unlike simulation-based
    inference methods, the proposed method directly optimizes the logarithm of the
    joint posterior density for parameter estimation. The corresponding
    optimization problem has an objective function in Lagrangian form in that
    regression coefficients are regularized by a mixture of squared $l_{2}$ and
    $l_{0}$ norms.

  501. Missing values: sparse inverse covariance estimation and an extension to sparse regression.

    Authors: Peter Bühlmann, Nicolas Städler
    Subjects: Methodology
    Abstract

    We propose an l1-regularized likelihood method for estimating the inverse
    covariance matrix in the high-dimensional multivariate normal model in the presence
    of missing data. Our method is based on the assumption that the data are
    missing at random (MAR) which entails also the completely missing at random
    case. The implementation of the method is non-trivial as the observed negative
    log-likelihood generally is a complicated and non-convex function. We propose
    an efficient EM-algorithm for optimization with provable numerical convergence
    properties.

  502. Pattern Alternating Maximization Algorithm for High-Dimensional Missing Data.

    Authors: Nicolas Staedler, Peter Buehlmann
    Subjects: Methodology
    Abstract

    We propose a new and computationally efficient algorithm for maximizing the
    observed log-likelihood for a multivariate normal data matrix with missing
    values. We show that our procedure, based on iteratively regressing the missing
    on the observed variables, generalizes the traditional EM algorithm by
    alternating between different complete data spaces and performing the E-Step
    incrementally. In this non-standard setup we prove numerical convergence to a
    stationary point of the observed log-likelihood.

  503. Locally most powerful sequential tests of a simple hypothesis vs. One-sided alternatives for independent observations.

    Authors: Andrey Novikov, Petr Novikov
    Subjects: Methodology
    Abstract

    Let $X_1,X_2,..., X_n,...$ be a stochastic process with independent values
    whose distribution $P_\theta$ depends on an unknown parameter $\theta$,
    $\theta\in\Theta$, where $\Theta$ is an open subset of the real line. The
    problem of testing $H_0:$ $\theta=\theta_0$ vs. a composite alternative $H_1:$
    $\theta>\theta_0$ is considered, where $\theta_0\in\Theta$ is a fixed value of
    the parameter. The main objective of this work is the characterization of the
    structure of the locally most powerful (in the sense of Berk) sequential tests
    in this problem.

  504. Optimization Under Unknown Constraints.

    Authors: Robert B. Gramacy, Herbert K. H. Lee
    Subjects: Methodology
    Abstract

    Optimization of complex functions, such as the output of computer simulators,
    is a difficult task that has received much attention in the literature. A less
    studied problem is that of optimization under unknown constraints, i.e., when
    the simulator must be invoked both to determine the typical real-valued
    response and to determine if a constraint has been violated, either for
    physical or policy reasons. We develop a statistical approach based on Gaussian
    processes and Bayesian learning to both approximate the unknown function and
    estimate the probability of meeting the constraints.

  505. A nonparametric approach for relevance determination.

    Authors: Babak Shahbaba
    Subjects: Methodology
    Abstract

    The objective of many high throughput studies is to identify factors that are
    relevant to an outcome of interest. Such studies are abundant in genetics,
    image processing, astrophysics, and neuroscience. In this paper, we argue that
    treating these problems as large-scale hypothesis tests does not reflect the
    motivation behind the studies. Instead, we suggest treating these studies as a
    decision problem, where our primary concern is selecting the most relevant set
    of factors for a more focused follow up.

  506. M-decomposability, elliptical unimodal densities, and applications to clustering and kernel density estimation.

    Authors: Nicholas Chia, Junji Nakano
    Subjects: Methodology
    Abstract

    Chia and Nakano (2009) introduced the concept of M-decomposability of
    probability densities in one-dimension. In this paper, we generalize
    M-decomposability to any dimension. We prove that all elliptical unimodal
    densities are M-undecomposable. We also derive an inequality to show that it is
    better to represent an M-decomposable density via a mixture of unimodal
    densities. Finally, we demonstrate the application of M-decomposability to
    clustering and kernel density estimation, using real and simulated data.

  507. Estimation of the autocovariance function with missing observations.

    Authors: Eric Moulines, Paul Doukhan, Natalia Bahamonde
    Subjects: Methodology
    Abstract

    We propose a novel estimator of the autocorrelation function in the presence
    of missing observations. We establish the consistency, the asymptotic normality,
    and we derive deviation bounds for various classes of weakly dependent
    stationary time series, including causal or non causal models. In addition, we
    introduce a modified version of the periodogram defined from these
    autocorrelation estimators and derive the asymptotic distribution of linear
    functionals of this estimator.

  508. Using a priori knowledge to construct copulas.

    Authors: Dominique Drouet Mari, Valerie Monbet
    Subjects: Methodology
    Abstract

    Our purpose is to model the dependence between two random variables, taking
    into account a priori knowledge on these variables. For example, in many
    applications (oceanography, finance...), there exists an order relation between
    the two variables; when one takes high values, the other cannot take low
    values, but the contrary is possible. The dependence for the high values of the
    two variables is, therefore, not symmetric.

  509. Approximate Methods for State-Space Models.

    Authors: Cosma Rohilla Shalizi, Shinsuke Koyama, Lucia Castellanos Pérez-Bolde, Robert E. Kass
    Subjects: Methodology
    Abstract

    State-space models provide an important body of techniques for analyzing
    time-series, but their use requires estimating unobserved states. The optimal
    estimate of the state is its conditional expectation given the observation
    histories, and computing this expectation is hard when there are
    nonlinearities. Existing filtering methods, including sequential Monte Carlo,
    tend to be either inaccurate or slow.

  510. Design and analysis of fractional factorial experiments from the viewpoint of computational algebraic statistics.

    Authors: Akimichi Takemura, Satoshi Aoki
    Subjects: Methodology
    Abstract

    We give an expository review of applications of computational algebraic
    statistics to design and analysis of fractional factorial experiments based on
    our recent works. For the purpose of design, the techniques of Gröbner bases
    and indicator functions allow us to treat fractional factorial designs without
    distinction between regular designs and non-regular designs. For the purpose of
    analysis of data from fractional factorial designs, the techniques of Markov
    bases allow us to handle discrete observations.

  511. Semi-automatic Approximate Bayesian Computation.

    Authors: Paul Fearnhead, Dennis Prangle
    Subjects: Methodology
    Abstract

    Many modern statistical applications involve inference for complex stochastic
    models, where it is easy to simulate from the models, but impossible to
    calculate likelihoods. Approximate Bayesian Computation (ABC) is a method of
    inference for such models. It replaces calculation of the likelihood by a step
    which involves simulating artificial data for different parameter values, and
    comparing summary statistics of the simulated data to summary statistics of the
    observed data. Here we show how to construct appropriate summary statistics for
    ABC in a semi-automatic manner.
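
    The simulate-and-compare step described above can be sketched as plain
    rejection ABC. This is only an illustration under assumed settings: the
    normal model, the uniform prior, the sample mean as summary statistic, the
    tolerance and the function name abc_rejection are all hypothetical choices,
    and the sketch does not implement the semi-automatic construction of
    summary statistics that the paper proposes.

```python
import random
import statistics


def abc_rejection(observed, simulate, summary, prior_sample, tol, n_draws, rng):
    """Keep the prior draws whose simulated summary lands within tol of the observed one."""
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample(rng)                # draw a candidate parameter
        s_sim = summary(simulate(theta, rng))    # simulate artificial data, then summarise
        if abs(s_sim - s_obs) <= tol:            # compare the two summaries
            accepted.append(theta)
    return accepted


# Toy setting (assumed): data are N(theta, 1), prior is Uniform(-5, 5).
rng = random.Random(0)
observed = [rng.gauss(2.0, 1.0) for _ in range(50)]

accepted = abc_rejection(
    observed,
    simulate=lambda theta, r: [r.gauss(theta, 1.0) for _ in range(50)],
    summary=statistics.fmean,
    prior_sample=lambda r: r.uniform(-5.0, 5.0),
    tol=0.1,
    n_draws=5000,
    rng=rng,
)

# The accepted draws form an approximate posterior sample for theta.
posterior_mean = statistics.fmean(accepted)
```

    The accepted values approximate the posterior only as well as the chosen
    summary and tolerance allow, which is the sensitivity that motivates a
    principled choice of summary statistics.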

  512. A New Generalized Kumaraswamy Distribution.

    Authors: Gauss M. Cordeiro, Jalmar M.F. Carrasco, Silvia L.P. Ferrari
    Subjects: Methodology
    Abstract

    A new five-parameter continuous distribution which generalizes the
    Kumaraswamy and the beta distributions as well as some other well-known
    distributions is proposed and studied. The model has as special cases new four-
    and three-parameter distributions on the standard unit interval. Moments, mean
    deviations, Rényi's entropy and the moments of order statistics are obtained
    for the new generalized Kumaraswamy distribution. The score function is given
    and estimation is performed by maximum likelihood. Hypothesis testing is also
    discussed.

  513. Inference with Transposable Data: Modeling the Effects of Row and Column Correlations.

    Authors: Genevera I. Allen, Robert Tibshirani
    Subjects: Methodology
    Abstract

    We consider the problem of large-scale inference on the row or column
    variables of data in the form of a matrix. Often this data is transposable,
    meaning that both the row variables and column variables are of potential
    interest. An example of this scenario is detecting significant genes in
    microarrays when the samples or arrays may be dependent due to underlying
    relationships.

  514. Stochastic Stepwise Ensembles for Variable Selection.

    Authors: Mu Zhu, Lu Xin
    Subjects: Methodology
    Abstract

    In this article, we advocate the "ensemble approach" for variable selection.
    We point out that the stochastic mechanism used to generate the
    variable-selection ensemble (VSE) must be picked with care. We construct a VSE
    using a stochastic stepwise algorithm, and compare its performance with
    numerous state-of-the-art algorithms.

  515. Implementing Bayesian predictive procedures: The K-prime and K-square distributions.

    Authors: Jacques Poitevineau, Bruno Lecoutre
    Subjects: Methodology
    Abstract

    The implementation of Bayesian predictive procedures under standard normal
    models is considered. Two distributions are of particular interest, the K-prime
    and K-square distributions. They also give exact inferences for simple and
    multiple correlation coefficients. Their cumulative distribution functions can
    be expressed in terms of infinite series of multiples of incomplete beta
    function ratios, thus adequate for recursive calculations. Efficient algorithms
    are provided.

  516. Colouring and breaking sticks: random distributions and heterogeneous clustering.

    Authors: Peter J. Green
    Subjects: Methodology
    Abstract

    We begin by reviewing some probabilistic results about the Dirichlet Process
    and its close relatives, focussing on their implications for statistical
    modelling and analysis. We then introduce a class of simple mixture models in
    which clusters are of different `colours', with statistical characteristics
    that are constant within colours, but different between colours. Thus cluster
    identities are exchangeable only within colours.

  517. A longest run test for heteroscedasticity in univariate regression model.

    Authors: Jean-Baptiste Aubin, Samuela Leoni-Aubin
    Subjects: Methodology
    Abstract

    This paper presents a test for detecting heteroscedasticity in univariate
    regression models. The test is simple to
    compute and very general since no hypothesis is made on the regularity of the
    response function or on the normality of errors. Simulations show that our test
    fares well with respect to other less general nonparametric tests.

  518. Bayesian Nonparametric Inference of Switching Linear Dynamical Systems.

    Authors: Michael I. Jordan, Alan S. Willsky, Emily B. Fox, Erik B. Sudderth
    Subjects: Methodology
    Abstract

    Many complex dynamical phenomena can be effectively modeled by a system that
    switches among a set of conditionally linear dynamical modes. We consider two
    such models: the switching linear dynamical system (SLDS) and the switching
    vector autoregressive (VAR) process. Our Bayesian nonparametric approach
    utilizes a hierarchical Dirichlet process prior to learn an unknown number of
    persistent, smooth dynamical modes.

  519. Probability distributions with summary graph structure.

    Authors: Nanny Wermuth
    Subjects: Methodology
    Abstract

    A set of independence statements may define the independence structure of
    interest in a family of joint probability distributions. This structure is
    often captured by a graph that consists of nodes representing the random
    variables and of edges that couple node pairs. One important class consists of
    multivariate regression chain graphs. They describe the independences of
    stepwise processes, in which at each step single or joint responses are
    generated given the relevant explanatory variables in their past.

  520. A Simple Misspecification Test for Regression Models.

    Authors: Jean-Baptiste Aubin, Samuela Leoni-Aubin
    Subjects: Methodology
    Abstract

    A simple test is proposed for examining the correctness of a given response
    function against unspecified general alternatives in the context of univariate
    regression. The usual diagnostic tools based on residuals plots are useful but
    heuristic. We introduce a formal statistical test supplementing the graphical
    analysis. Technically, the test statistic is the maximum length of the
    sequences of ordered (with respect to the covariate) observations that are
    consecutively overestimated or underestimated by the candidate regression
    function.
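
    The statistic described above can be computed in a few lines. The sketch
    below is a minimal illustration: the function name longest_run, the toy
    data and the candidate function are assumptions, zero residuals are
    treated as breaking a run (a convention chosen here, not taken from the
    paper), and the null distribution needed to turn the statistic into a
    test is not covered.

```python
def longest_run(x, y, fitted):
    """Longest stretch of same-sign residuals after sorting observations by x."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    best = current = 0
    prev_sign = 0
    for i in order:
        resid = y[i] - fitted(x[i])
        sign = (resid > 0) - (resid < 0)         # +1, -1, or 0
        if sign != 0 and sign == prev_sign:
            current += 1                         # the run continues
        else:
            current = 1 if sign != 0 else 0      # a new run starts (or a zero breaks it)
        prev_sign = sign
        best = max(best, current)
    return best


# Toy data (assumed): a quadratic response, observed in arbitrary order.
xs = [3, 0, 7, 1, 9, 4, 2, 8, 5, 6]
ys = [v * v for v in xs]

# A straight-line candidate misses the curvature, so residual signs cluster.
run_linear = longest_run(xs, ys, lambda v: 9 * v - 12)

# The true response leaves only zero residuals, hence no run at all.
run_exact = longest_run(xs, ys, lambda v: v * v)
```

    Here the linear candidate underestimates at both ends of the covariate
    range and overestimates in the middle, which produces one long run of
    same-sign residuals.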

  521. A Statistical View of Learning in the Centipede Game.

    Authors: Peter D. Hoff, Anton H. Westveld
    Subjects: Methodology
    Abstract

    In this article we evaluate the statistical evidence that a population of
    students learn about the sub-game perfect Nash equilibrium of the centipede
    game via repeated play of the game. This is done by formulating a model in
    which a player's error in assessing the utility of decisions changes as they
    gain experience with the game. We first estimate parameters in a statistical
    model where the probabilities of choices of the players are given by a Quantal
    Response Equilibrium (QRE) (McKelvey and Palfrey, 1995, 1996, 1998), but are
    allowed to change with repeated play.

  522. The exp-$G$ family of probability distributions.

    Authors: Alexandre B. Simas, Wagner Barreto-Souza
    Subjects: Methodology
    Abstract

    In this paper we introduce a new method to add a parameter to a family of
    distributions. The additional parameter is completely studied and a full
    description of its behaviour in the distribution is given. We obtain several
    mathematical properties of the new class of distributions such as
    Kullback-Leibler divergence, Shannon entropy, moments, order statistics,
    estimation of the parameters and inference for large samples. Further, we show
    that the new distribution has the reference distribution as a special case, and
    that the usual inference procedures also hold in this case.

  523. Optimal Designs for Two-Level Factorial Experiments.

    Authors: Jie Yang, Abhyuday Mandal, Dibyen Majumdar
    Subjects: Methodology
    Abstract

    We consider the problem of obtaining locally D-optimal designs for factorial
    experiments with qualitative factors at two levels each with binary response.
    For the 2^2 factorial experiment with main effects model we obtain optimal
    designs analytically in special cases and demonstrate how to obtain a solution
    in the general case using cylindrical algebraic decomposition.

  524. Beta-binomial/gamma-Poisson regression models for repeated counts with random parameters.

    Authors: Mayra Ivanoff Lora, Julio M Singer
    Subjects: Methodology
    Abstract

    Beta-binomial/Poisson models have been used by many authors to model
    multivariate count data. Lora and Singer (Statistics in Medicine, 2008)
    extended such models to accommodate repeated multivariate count data with
    overdispersion in the binomial component. To overcome some of the limitations of
    that model, we consider a beta-binomial/gamma-Poisson alternative that also
    allows for both overdispersion and different covariances between the Poisson
    counts.

  525. Gaussian Process Models and Interpolators for Deterministic Computer Simulators.

    Authors: Pritam Ranjan, Ronald Haynes, Richard Karsten
    Subjects: Methodology
    Abstract

    For many expensive deterministic computer simulators, the outputs do not have
    replication error and the desired metamodel (or emulator) is an interpolator of
    the observed data. Realizations of Gaussian spatial processes (GP) are commonly
    used to model such simulator outputs. Fitting a GP model to $n$ data points
    requires inversion of $n \times n$ correlation matrices, $R$, that are
    sometimes computationally unstable due to near-singularity of $R$. This happens
    if any pair of design points are very close together in the input space.
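
    The near-singularity can be seen already with two design points. The
    sketch below assumes a Gaussian correlation function exp(-theta * d^2)
    with theta = 1, which is an illustrative choice and not necessarily the
    one used in the paper; the function names are hypothetical. For the
    2 x 2 matrix R = [[1, r], [r, 1]] the eigenvalues are 1 + r and 1 - r, so
    the condition number is (1 + r) / (1 - r).

```python
import math


def gaussian_corr(x1, x2, theta):
    """Assumed Gaussian correlation between two one-dimensional inputs."""
    return math.exp(-theta * (x1 - x2) ** 2)


def cond_2x2(r):
    """Condition number of [[1, r], [r, 1]] for 0 <= r < 1 (eigenvalues 1 + r and 1 - r)."""
    return (1.0 + r) / (1.0 - r)


# Move the second design point closer and closer to the first one.
gaps = [1.0, 0.1, 0.01]
conds = [cond_2x2(gaussian_corr(0.0, gap, theta=1.0)) for gap in gaps]

for gap, c in zip(gaps, conds):
    print(f"gap={gap:<5} condition number={c:.1f}")
```

    As the gap shrinks, r approaches 1 and the condition number grows without
    bound, so inverting R becomes numerically unstable.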

  526. Extending The Range of Application of Permutation Tests: the Expected Permutation p-value Approach.

    Authors: Daniel Commenges
    Subjects: Methodology
    Abstract

    The limitation of permutation tests is that they assume exchangeability. It
    is shown that in generalized linear models one can construct permutation tests
    from score statistics in particular cases. When under the null hypothesis the
    observations are not exchangeable, a representation in terms of Cox-Snell
    residuals allows an approach based on an expected permutation p-value (Eppv)
    to be developed; this is applied to the logistic regression model. A small
    simulation study and an illustration with real data are given.

  527. Branch and Bound Algorithms for Maximizing Expected Improvement Functions.

    Authors: Pritam Ranjan, Mark Franey, Hugh Chipman
    Subjects: Methodology
    Abstract

    Deterministic computer simulations are often used as a replacement for
    complex physical experiments. Although less expensive than physical
    experimentation, computer codes can still be time-consuming to run. An
    effective strategy for exploring the response surface of the deterministic
    simulator is the use of an approximation to the computer code, such as a
    Gaussian process (GP) model, coupled with a sequential sampling strategy for
    choosing design points that can be used to build the GP model.

  528. Kernel methods and minimum contrast estimators for empirical deconvolution.

    Authors: Peter Hall, Aurore Delaigle
    Subjects: Methodology
    Abstract

    We survey classical kernel methods for providing nonparametric solutions to
    problems involving measurement error. In particular we outline kernel-based
    methodology in this setting, and discuss its basic properties. Then we point to
    close connections that exist between kernel methods and much newer approaches
    based on minimum contrast techniques. The connections are through use of the
    sinc kernel for kernel-based inference.

  529. Statistical inference for multidimensional time-changed Lévy processes based on low-frequency data.

    Authors: Denis Belomestny
    Subjects: Methodology
    Abstract

    In this article we study the problem of semi-parametric inference on the
    parameters of a multidimensional Lévy process $L_{t}$ based on the
    low-frequency observations of the corresponding time-changed Lévy process
    $L_{\mathcal{T}(t)}$, where $\mathcal{T}$ is a non-negative, non-decreasing
    real-valued process independent of $L_{t}$. We prove strong uniform
    consistency of the proposed estimate for the Lévy density of $L_{t}$ and
    derive the convergence rates in a weighted $L_{\infty}$ norm.

  530. Perfect simulation using dominated coupling from the past with application to area-interaction point processes and wavelet thresholding.

    Authors: Bernard W. Silverman, Graeme K. Ambler
    Subjects: Methodology
    Abstract

    We consider perfect simulation algorithms for locally stable point processes
    based on dominated coupling from the past, and apply these methods in two
    different contexts. A new version of the algorithm is developed which is
    feasible for processes which are neither purely attractive nor purely
    repulsive. Such processes include multiscale area-interaction processes, which
    are capable of modelling point patterns whose clustering structure varies
    across scales.

  531. History of applications of martingales in survival analysis.

    Authors: Richard D. Gill, Odd O. Aalen, Per Kragh Andersen, Ørnulf Borgan, Niels Keiding
    Subjects: Methodology
    Abstract

    The paper traces the development of the use of martingale methods in survival
    analysis from the mid 1970's to the early 1990's.

  532. A copula based approach to adaptive sampling.

    Authors: Ralph Silva, Paolo Giordani, Robert Kohn, Xiuyan Mun
    Subjects: Methodology
    Abstract

    Our article is concerned with adaptive sampling schemes for Bayesian
    inference that update the proposal densities using previous iterates. We
    introduce a copula based proposal density which is made more efficient by
    combining it with antithetic variable sampling. We compare the copula based
    proposal to an adaptive proposal density based on a multivariate mixture of
    normals and an adaptive random walk Metropolis proposal. We also introduce a
    refinement of the random walk proposal which performs better for multimodal
    target distributions.

  533. The distribution and quantiles of functionals of weighted empirical distributions when observations have different distributions.

    Authors: S. Nadarajah, C. S. Withers
    Subjects: Methodology
    Abstract

    This paper extends Edgeworth-Cornish-Fisher expansions for the distribution
    and quantiles of nonparametric estimates in two ways. Firstly it allows
    observations to have different distributions. Secondly it allows the
    observations to be weighted in a predetermined way. The use of weighted
    estimates has a long history including applications to regression, rank
    statistics and Bayes theory. However, asymptotic results have generally been
    only first order (the CLT and weak convergence).

  534. Mixed membership stochastic blockmodels.

    Authors: Stephen E Fienberg, Edoardo M Airoldi, David M Blei, Eric P Xing
    Subjects: Methodology
    Abstract

    Observations consisting of measurements on relationships for pairs of objects
    arise in many settings, such as protein interaction and gene regulatory
    networks, collections of author-recipient email, and social networks. Analyzing
    such data with probabilistic models can be delicate because the simple
    exchangeability assumptions underlying many boilerplate models no longer hold.
    In this paper, we describe a latent variable model of such data called the
    mixed membership stochastic blockmodel.

  535. Gamma-based clustering via ordered means with application to gene-expression analysis.

    Authors: Michael A. Newton, Lisa M. Chung
    Subjects: Methodology
    Abstract

    Discrete mixture models provide a well-known basis for effective clustering
    algorithms, although technical challenges have limited their scope. In the
    context of gene-expression data analysis, a model is presented that mixes over
    a finite catalog of structures, each one representing equality and inequality
    constraints among latent expected values. Computations depend on the
    probability that independent gamma-distributed variables attain each of their
    possible orderings.

  536. The Degrees of Freedom of Partial Least Squares Regression.

    Authors: Nicole Kraemer, Masashi Sugiyama
    Subjects: Methodology
    Abstract

    The derivation of statistical properties for Partial Least Squares regression
    can be a challenging task. The reason is that the construction of latent
    components from the predictor variables also depends on the response variable.
    While this typically leads to good performance and interpretable models in
    practice, it makes the statistical analysis more involved. In this work, we
    study the intrinsic complexity of Partial Least Squares Regression. Our
    contribution is an unbiased estimate of its Degrees of Freedom.

  537. Estimation for High-Dimensional Linear Mixed-Effects Models Using $\ell_1$-Penalization.

    Authors: Peter Bühlmann, Jürg Schelldorfer
    Subjects: Methodology
    Abstract

    We propose an $\ell_1$-penalized estimation procedure for high-dimensional
    linear mixed-effects models. The models are useful whenever there is a grouping
    structure among high-dimensional observations, i.e. for clustered data. We
    prove a consistency and an oracle optimality result and we develop an algorithm
    with provable numerical convergence. Furthermore, we demonstrate the
    performance of the method on simulated data and on a real high-dimensional dataset.

  538. Structured, Sparse Regression With Application to HIV Drug Resistance.

    Authors: Larry Wasserman, Kathryn Roeder, Daniel Percival, Roni Rosenfeld
    Subjects: Methodology
    Abstract

    We introduce a new version of forward stepwise regression. Our modification
    finds solutions to regression problems where the selected predictors appear in
    a structured pattern, with respect to a predefined distance measure over the
    candidate predictors. Our method is motivated by the problem of predicting
    HIV-1 drug resistance from protein sequences. We find that our methods improve
    the interpretability of drug resistance while producing comparable predictive
    accuracy to standard methods.

  539. Estimating Bayesian Networks for High-dimensional Data with Complex Mean Structure.

    Authors: Jessica Kasza, Gary Glonek, Patty Solomon
    Subjects: Methodology
    Abstract

    The estimation of Bayesian networks given high-dimensional data sets, in
    particular given gene expression data sets, has been the focus of much recent
    research. While there are many methods available for the estimation of such
    structures, these methods typically assume that the data set consists of
    independent and identically distributed samples. However, often the data
    available will have a more complex mean structure and additional components of
    variance.

  540. Bayesian Inference.

    Authors: Judith Rousseau, Christian P. Robert, Jean-Michel Marin
    Subjects: Methodology
    Abstract

    This chapter provides an overview of Bayesian inference, mostly emphasising
    that it is a universal method for summarising uncertainty and making estimates
    and predictions using probability statements conditional on observed data and
    an assumed model (Gelman 2008).

  541. Classification and categorical inputs with treed Gaussian process models.

    Authors: Tamara Broderick, Robert B. Gramacy
    Subjects: Methodology
    Abstract

    Recognizing the successes of treed Gaussian process (TGP) models as an
    interpretable and thrifty model for nonparametric regression, we seek to extend
    the model to classification. Both treed models and Gaussian processes (GPs)
    have, separately, enjoyed great success in application to classification
    problems. An example of the former is Bayesian CART. In the latter, real-valued
    GP output may be utilized for classification via latent variables, which
    provide classification rules by means of a softmax function.

  542. Enhancing hyperspectral image unmixing with spatial correlations.

    Authors: Nicolas Dobigeon, Jean-Yves Tourneret, Olivier Eches
    Subjects: Methodology
    Abstract

    This paper describes a new algorithm for hyperspectral image unmixing. Most
    of the unmixing algorithms proposed in the literature do not take into account
    the possible spatial correlations between the pixels. In this work, a Bayesian
    model is introduced to exploit these correlations. The image to be unmixed is
    assumed to be partitioned into regions (or classes) where the statistical
    properties of the abundance coefficients are homogeneous. A Markov random field
    is then proposed to model the spatial dependency of the pixels within any
    class.

  543. Inference based on Procrustes means for the 3D shape of tree boles and mildly rank-deficient diffusion tensors.

    Authors: Stephan Huckemann
    Subjects: Methodology
    Abstract

    A method for inference on cylindricity of tree boles based on 3D Kendall
    shapes is presented. This method employs Procrustes means which are highly
    popular in the shape community. While for extrinsic means (such as Procrustes
    means for 2D Kendall shapes) and intrinsic means, a strong law of large numbers
    (SLLN) and a central limit theorem (CLT) are available, Procrustes means,
    according to a common belief based on a perturbation paradigm, could not be
    used for inference because they are "consistent estimators" only under specific
    distributional assumptions.

  544. On the meaning of mean shape.

    Authors: Stephan Huckemann
    Subjects: Methodology
    Abstract

    A stability result for intrinsic means on the quotient due to an isometric
    and proper Lie group action on a Riemannian manifold is derived, stating that
    intrinsic means are contained in the highest dimensional manifold stratum
    assumed with non-zero probability. In consequence, the Central Limit Theorem
    (CLT) for manifold valued random elements can be extended to non-manifold shape
    spaces. The relationship to other types of means is discussed: the CLT extends
    to Ziezold means but not in general to Procrustes means, since the latter may
    be contained in lower dimensional manifold strata.

  545. Dynamic shape analysis and comparison of leaf growth.

    Authors: Stephan Huckemann
    Subjects: Methodology
    Abstract

    In the statistical analysis of shape a goal beyond the analysis of static
    shapes lies in the quantification of `same' deformation of different shapes.
    Typically, shape spaces are modelled as Riemannian manifolds on which parallel
    transport along geodesics naturally qualifies as a measure for the `similarity'
    of deformation.

  546. Estimation error for blind Gaussian time series filtering.

    Authors: Fabrice Gamboa, Jean-Michel Loubes, Thibault Espinasse
    Subjects: Methodology
    Abstract

    In the frame of time series analysis, we compute the quadratic error in the
    blind estimation of the projection operator for prediction with infinite past.
    The estimation is made using only a single finite sample of the time series. It
    is performed by plugging the empirical covariance in a clever Schur complement
    decomposition of the projector.

  547. Locally adaptive image denoising by a statistical multiresolution criterion.

    Authors: Axel Munk, Zakhar Kabluchko, Thomas Hotz, Philipp Marnitz, Rahel Stichtenoth, Laurie Davies
    Subjects: Methodology
    Abstract

    We demonstrate how one can choose the smoothing parameter in image denoising
    by a statistical multiresolution criterion, both globally and locally. Using
    inhomogeneous diffusion and total variation regularization as examples for
    localized regularization schemes, we present an efficient method for locally
    adaptive image denoising. As expected, the smoothing parameter serves as an
    edge detector in this framework. Numerical examples illustrate the usefulness
    of our approach. We also present an application in confocal microscopy.

  548. Target Detection via Network Filtering.

    Authors: Shu Yang, Eric D. Kolaczyk
    Subjects: Methodology
    Abstract

    A method of `network filtering' has been proposed recently to detect the
    effects of certain external perturbations on the interacting members in a
    network. However, with large networks, the goal of detection seems a priori
    difficult to achieve, especially since the number of observations available
    often is much smaller than the number of variables describing the effects of
    the underlying network.

  549. "Additivity" versus "Maxitivity" at the heart of the paradoxical and efficient nature of Statistics.

    Authors: M. Rémon
    Subjects: Methodology
    Abstract

    Unlike the Probability Theory based on additivity, Statistical Inference
    seems to hesitate between "Additivity" and a so-called "Maxitivity" approach.
    After a brief overview of three types of principles for any (parametric)
    statistical theory and the proof that these principles are mutually exclusive,
    the paper shows that two kinds of support measures are conceivable, an additive
    one and a maxitive one (based on maximization operators).

  550. On Bayesian Data Analysis.

    Authors: Judith Rousseau, Christian P. Robert
    Subjects: Methodology
    Abstract

    This introduction to Bayesian statistics presents the main concepts as well
    as the principal reasons advocated in favour of Bayesian modelling. We cover
    the various approaches to prior determination as well as the basic asymptotic
    arguments in favour of using Bayes estimators. The testing aspects of Bayesian
    inference are also examined in detail.

  551. Relative Age Effect in Elite Sports: Methodological Bias or Real Discrimination?.

    Authors: Nicolas Delorme, Julie Boiché, Michel Raspaud
    Subjects: Methodology
    Abstract

    Sport sciences researchers talk about a relative age effect when they observe
    a biased distribution of elite athletes' birthdates, with an
    over-representation of those born at the beginning of the competitive year and
    an under-representation of those born at the end. Using the whole sample of the
    French male licensed soccer players (n = 1,831,524), our study suggests that
    there could be an important bias in the statistical test of this effect. This
    bias could in turn lead to the false conclusion that there is systemic
    discrimination in the recruitment of professional players.

  552. Robustness and accuracy of methods for high dimensional data analysis based on Student's t statistic.

    Authors: Peter Hall, Jiashun Jin, Aurore Delaigle
    Subjects: Methodology
    Abstract

    Student's $t$ statistic is finding applications today that were never
    envisaged when it was introduced more than a century ago. Many of these
    applications rely on properties, for example robustness against heavy tailed
    sampling distributions, that were not explicitly considered until relatively
    recently. In this paper we explore these features of the $t$ statistic in the
    context of its application to very high dimensional problems, including feature
    selection and ranking, highly multiple hypothesis testing, and sparse, high
    dimensional signal detection.

  553. Non-Gaussian Quasi Maximum Likelihood Estimation of GARCH Models.

    Authors: Jianqing Fan, Lei Qi, Dacheng Xiu
    Subjects: Methodology
    Abstract

    The non-Gaussian quasi maximum likelihood estimator is frequently used in
    GARCH models with the intention of improving the efficiency of the GARCH
    parameter estimates. However, the method is usually inconsistent unless the
    quasi-likelihood happens to be the true one. We identify an unknown scale
    parameter that is critical to the consistent estimation of non-Gaussian QMLE.
    As a part of estimating this unknown parameter, a two-step non-Gaussian QMLE
    (2SNG-QMLE) is proposed for estimating the GARCH parameters.

  554. Strict Monotonicity and Convergence Rate of Titterington's Algorithm for Computing D-optimal Designs.

    Authors: Yaming Yu
    Subjects: Methodology
    Abstract

    We study a class of multiplicative algorithms introduced by Silvey et al.
    (1978) for computing D-optimal designs. Strict monotonicity is established for
    a variant considered by Titterington (1978). A formula for the rate of
    convergence is also derived. This is used to explain why modifications
    considered by Titterington (1978) and Dette et al. (2008) usually converge
    faster.
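
    As a rough illustration of the kind of multiplicative algorithm discussed
    above, the sketch below implements only the basic update
    w_i <- w_i * d_i(w) / p, where d_i(w) = x_i^T M(w)^{-1} x_i and
    M(w) = sum_i w_i x_i x_i^T. It is not the Titterington (1978) variant or any
    later modification. The function name, the iteration count and the
    five-point quadratic regression example are assumptions made here for
    illustration.

```python
import numpy as np


def multiplicative_d_optimal(X, n_iter=500):
    """Basic multiplicative update w_i <- w_i * d_i(w) / p for design weights.

    X has one row per candidate design point and p columns (regressors).
    """
    n, p = X.shape
    w = np.full(n, 1.0 / n)  # start from the uniform design
    for _ in range(n_iter):
        # Information matrix M(w) = sum_i w_i x_i x_i^T
        M = X.T @ (w[:, None] * X)
        # Variance function d_i = x_i^T M(w)^{-1} x_i for every candidate point
        d = np.einsum("ij,jk,ik->i", X, np.linalg.inv(M), X)
        # sum_i w_i d_i = p, so the updated weights still sum to one
        w = w * d / p
    return w


# Hypothetical example: quadratic regression on five candidate points in [-1, 1].
points = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
X = np.column_stack([np.ones_like(points), points, points**2])
weights = multiplicative_d_optimal(X)
```

    In this toy problem the weights should concentrate on the points -1, 0 and
    1 with mass 1/3 each, which is the classical D-optimal design for quadratic
    regression on [-1, 1].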

  555. A Conversation with Shayle R. Searle.

    Authors: Martin T. Wells
    Subjects: Methodology
    Abstract

    Born in New Zealand, Shayle Robert Searle earned a bachelor's degree (1949)
    and a master's degree (1950) from Victoria University, Wellington, New Zealand.
    After working for an actuary, Searle went to Cambridge University where he
    earned a Diploma in mathematical statistics in 1953. Searle won a Fulbright
    travel award to Cornell University, where he earned a doctorate in animal
    breeding, with a strong minor in statistics in 1959, studying under Professor
    Charles Henderson.

  556. Bayesian Thought in Early Modern Detective Stories: Monsieur Lecoq, C. Auguste Dupin and Sherlock Holmes.

    Authors: Joseph B. Kadane
    Subjects: Methodology
    Abstract

    This paper reviews the maxims used by three early modern fictional
    detectives: Monsieur Lecoq, C. Auguste Dupin and Sherlock Holmes. It finds
    similarities between these maxims and Bayesian thought. Poe's Dupin uses ideas
    very similar to Bayesian game theory. Sherlock Holmes' statements also show
    thought patterns justifiable in Bayesian terms.

  557. An Active Set Algorithm to Estimate Parameters in Generalized Linear Models with Ordered Predictors.

    Authors: Kaspar Rufibach
    Subjects: Methodology
    Abstract

    In biomedical studies, researchers are often interested in assessing the
    association between one or more ordinal explanatory variables and an outcome
    variable, at the same time adjusting for covariates of any type. The outcome
    variable may be continuous, binary, or represent censored survival times. In
    the absence of precise knowledge of the response function, using monotonicity
    constraints on the ordinal variables improves efficiency in estimating
    parameters, especially when sample sizes are small.

  558. Introducing Monte Carlo Methods with R Solutions to Odd-Numbered Exercises.

    Authors: George Casella, Christian P. Robert
    Subjects: Methodology
    Abstract

    This is the solution manual to the odd-numbered exercises in our book
    "Introducing Monte Carlo Methods with R", published by Springer Verlag on
    December 10, 2009, and made freely available to everyone.

  559. Comment on "Harold Jeffreys's Theory of Probability Revisited".

    Authors: Dennis Lindley
    Subjects: Methodology
    Abstract

    Comment on "Harold Jeffreys's Theory of Probability Revisited"
    [arXiv:0804.3173]

  560. A Multivariate Variance Components Model for Analysis of Covariance in Designed Experiments.

    Authors: James G. Booth, Walter T. Federer, Martin T. Wells, Russell D. Wolfinger
    Subjects: Methodology
    Abstract

    Traditional methods for covariate adjustment of treatment means in designed
    experiments are inherently conditional on the observed covariate values. In
    order to develop a coherent general methodology for analysis of covariance, we
    propose a multivariate variance components model for the joint distribution of
    the response and covariates. It is shown that, if the design is orthogonal with
    respect to (random) blocking factors, then appropriate adjustments to treatment
    means can be made using the univariate variance components model obtained by
    conditioning on the observed covariate values.

  561. Comment on "Harold Jeffreys's Theory of Probability Revisited".

    Authors: José M. Bernardo
    Subjects: Methodology
    Abstract

    Comment on "Harold Jeffreys's Theory of Probability Revisited"
    [arXiv:0804.3173]

  562. Bayes, Jeffreys, Prior Distributions and the Philosophy of Statistics.

    Authors: Andrew Gelman
    Subjects: Methodology
    Abstract

    Discussion of "Harold Jeffreys's Theory of Probability revisited," by
    Christian Robert, Nicolas Chopin, and Judith Rousseau, for Statistical Science
    [arXiv:0804.3173]

  563. Comment: The Importance of Jeffreys's Legacy.

    Authors: Robert Kass
    Subjects: Methodology
    Abstract

    Theory of Probability is distinguished by several high-level philosophical
    attitudes, some stressed by Jeffreys, some implicit. By reviewing these we may
    recognize the importance of this work in the historical development of
    statistics. [arXiv:0804.3173]

  564. Comment on "Harold Jeffreys's Theory of Probability Revisited".

    Authors: Stephen Senn
    Subjects: Methodology
    Abstract

    Comment on "Harold Jeffreys's Theory of Probability Revisited"
    [arXiv:0804.3173]

  565. Comment on "Harold Jeffreys's Theory of Probability Revisited".

    Authors: Arnold Zellner
    Subjects: Methodology
    Abstract

    Comment on "Harold Jeffreys's Theory of Probability Revisited"
    [arXiv:0804.3173]

  566. Longitudinal Data with Follow-up Truncated by Death: Match the Analysis Method to Research Aims.

    Authors: Brenda F. Kurland, Laura L. Johnson, Brian L. Egleston, Paula H. Diehr
    Subjects: Methodology
    Abstract

    Diverse analysis approaches have been proposed to distinguish data missing
    due to death from nonresponse, and to summarize trajectories of longitudinal
    data truncated by death. We demonstrate how these analysis approaches arise
    from factorizations of the distribution of longitudinal data and survival
    information. Models are illustrated using cognitive functioning data for older
    adults. For unconditional models, deaths do not occur, deaths are independent
    of the longitudinal response, or the unconditional longitudinal response is
    averaged over the survival distribution.

  567. Relaxation Penalties and Priors for Plausible Modeling of Nonidentified Bias Sources.

    Authors: Sander Greenland
    Subjects: Methodology
    Abstract

    In designed experiments and surveys, known laws or design features provide
    checks on the most relevant aspects of a model and identify the target
    parameters. In contrast, in most observational studies in the health and social
    sciences, the primary study data do not identify and may not even bound target
    parameters. Discrepancies between target and analogous identified parameters
    (biases) are then of paramount concern, which forces a major shift in modeling
    strategies.

  568. Kernel Partial Least Squares is Universally Consistent.

    Authors: Nicole Kraemer, Gilles Blanchard
    Subjects: Methodology
    Abstract

    We prove the statistical consistency of kernel Partial Least Squares
    Regression applied to a bounded regression learning problem on a reproducing
    kernel Hilbert space. Partial Least Squares stands out from well-known
    classical approaches such as Ridge Regression or Principal Components
    Regression, as it is not defined as the solution of a global cost minimization
    procedure over a fixed model, nor is it a linear estimator. Instead,
    approximate solutions are constructed by projections onto a nested set of
    data-dependent subspaces.

  569. Shrinkage regression for multivariate inference with missing data, and an application to portfolio balancing.

    Authors: Robert B. Gramacy, Ester Pantaleo
    Subjects: Methodology
    Abstract

    Portfolio balancing requires estimates of covariance between asset returns.
    Returns data have histories which greatly vary in length, since assets begin
    public trading at different times. This can lead to a huge amount of missing
    data--too much for the conventional imputation-based approach. Fortunately, a
    well-known factorization of the MVN likelihood under the prevailing historical
    missingness pattern leads to a simple algorithm of OLS regressions that is much
    more reliable. When there are more assets than returns, however, OLS becomes
    unstable. Gramacy, et al.

  570. Likelihood-free Markov chain Monte Carlo.

    Authors: Y Fan, S A Sisson
    Subjects: Methodology
    Abstract

    To appear in MCMC handbook, S. P. Brooks, A. Gelman, G. Jones and X.-L. Meng
    (eds), Chapman & Hall.

  571. Statistical tests for whether a given set of independent, identically distributed draws does not come from a specified probability density.

    Authors: Mark Tygert
    Subjects: Methodology
    Abstract

    We discuss several tests for whether a given set of independent and
    identically distributed (i.i.d.) draws does not come from a specified
    probability density function. The most commonly used are Kolmogorov-Smirnov
    tests, particularly Kuiper's variant, which focus on discrepancies between the
    cumulative distribution function for the specified probability density and the
    empirical cumulative distribution function for the given set of i.i.d.
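
    To make the statistics named above concrete, here is a minimal sketch that
    computes the Kolmogorov-Smirnov statistic and Kuiper's variant from the
    discrepancy between a specified cumulative distribution function and the
    empirical one. The function name and the three-point sample are assumptions
    for illustration. The sketch computes the statistics only, not p-values.

```python
def ks_and_kuiper(sample, cdf):
    """Return (Kolmogorov-Smirnov statistic, Kuiper statistic) for a sample.

    `cdf` is the cumulative distribution function of the specified density.
    """
    xs = sorted(sample)
    n = len(xs)
    # Largest amount by which the empirical CDF exceeds the specified CDF
    d_plus = max((i + 1) / n - cdf(x) for i, x in enumerate(xs))
    # Largest amount by which the specified CDF exceeds the empirical CDF
    d_minus = max(cdf(x) - i / n for i, x in enumerate(xs))
    return max(d_plus, d_minus), d_plus + d_minus


# Hypothetical example: three draws tested against the uniform density on [0, 1].
ks, kuiper = ks_and_kuiper([0.1, 0.4, 0.7], lambda x: x)
```

    Kuiper's statistic adds the two one-sided discrepancies, whereas the
    Kolmogorov-Smirnov statistic takes the larger of the two.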

  572. Reversible jump Markov chain Monte Carlo.

    Authors: Y Fan, S A Sisson
    Subjects: Methodology
    Abstract

    To appear in MCMC handbook, S. P. Brooks, A. Gelman, G. Jones and X.-L. Meng
    (eds), Chapman & Hall.

  573. Improved estimators for dispersion models with dispersion covariates.

    Authors: Alexandre B. Simas, Andréa V. Rocha, Wagner Barreto-Souza
    Subjects: Methodology
    Abstract

    In this paper we discuss improved estimators for the regression and the
    dispersion parameters in an extended class of dispersion models (Jørgensen,
    1996). This class extends the regular dispersion models by letting the
    dispersion parameter vary throughout the observations, and contains the
    dispersion models as a particular case. General formulae for the second-order
    bias are obtained explicitly in dispersion models with dispersion covariates,
    which generalize previous results by Botter and Cordeiro (1998), Cordeiro and
    McCullagh (1991), Cordeiro and Vasconcellos (1999), and Paula (1992).

  574. Skewness of maximum likelihood estimators in dispersion models.

    Authors: Alexandre B. Simas, Gauss M. Cordeiro, Andréa V. Rocha
    Subjects: Methodology
    Abstract

    We introduce the dispersion models with a regression structure to extend the
    generalized linear models, the exponential family nonlinear models (Cordeiro
    and Paula, 1989) and the proper dispersion models (Jørgensen, 1997a). We
    provide a matrix expression for the skewness of the maximum likelihood
    estimators of the regression parameters in dispersion models. The formula is
    suitable for computer implementation and can be applied for several important
    submodels discussed in the literature.

  575. A Binary Control Chart to Detect Small Jumps.

    Authors: Ansgar Steland, Ewaryst Rafalowicz
    Subjects: Methodology
    Abstract

    The classic Np chart gives a signal if the number of successes in a sequence
    of independent binary variables exceeds a control limit. Motivated by
    engineering applications in industrial image processing and, to some extent,
    financial statistics, we study a simple modification of this chart, which uses
    only the most recent observations. Our aim is to construct a control chart for
    detecting a shift of an unknown size, allowing for an unknown distribution of
    the error terms.
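
    The idea of a chart that uses only the most recent observations can be
    sketched as a moving-window success count. The window length, the control
    limit, the function name and the data are hypothetical choices made here.
    The sketch does not reproduce how the authors construct their binary
    variables or choose the limit.

```python
from collections import deque


def windowed_np_chart(observations, window, limit):
    """Return the first index at which the number of successes among the last
    `window` binary observations exceeds `limit`, or None if no signal occurs.
    """
    recent = deque(maxlen=window)
    count = 0
    for t, y in enumerate(observations):
        if len(recent) == window:
            count -= recent[0]  # the oldest value is about to be dropped
        recent.append(y)
        count += y
        if len(recent) == window and count > limit:
            return t
    return None


# Hypothetical data: signal when more than 2 of the last 4 observations are successes.
data = [0, 1, 0, 0, 1, 0, 1, 1, 1, 0]
signal_time = windowed_np_chart(data, window=4, limit=2)
```

    Keeping a running count avoids re-summing the window at every step, so each
    new observation is processed in constant time.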

  576. Outlier detection and trimmed estimation in general functional spaces.

    Authors: Daniel Gervini
    Subjects: Methodology
    Abstract

    This article introduces trimmed estimators for the mean and covariance
    functional of data in general Hilbert spaces. The estimators are based on a
    data depth measure that can be computed on any Hilbert space, because it is
    defined only in terms of the interdistances between data points. We show that
    the estimators can attain the maximum breakdown point by properly choosing the
    tuning parameters, and that they possess better outlier resistance properties
    than alternative estimators, as shown by a comparative Monte Carlo study.

  577. Graphically dependent and spatially varying Dirichlet process mixtures.

    Authors: XuanLong Nguyen
    Subjects: Methodology
    Abstract

    We consider the problem of clustering grouped and functional data, which are
    indexed by a covariate, and assessing the dependency of the clustered groups on
    the covariate. We assume that each observation within a group is a draw from a
    mixture model. The mixture components and the number of such components can
    change with the covariate, and are assumed to be unknown a priori. In addition
    to learning the "local" clusters within each group we also assume the existence
    of "global clusters" indexed over the covariate domain when the observations
    across the groups are jointly analyzed.

  578. A survey of statistical network models.

    Authors: Anna Goldenberg, Alice X Zheng, Stephen E Fienberg, Edoardo M Airoldi
    Subjects: Methodology
    Abstract

    Networks are ubiquitous in science and have become a focal point for
    discussion in everyday life. Formal statistical models for the analysis of
    network data have emerged as a major topic of interest in diverse areas of
    study, and most of these involve a form of graphical representation.
    Probability models on graphs date back to 1959. Along with empirical studies in
    social psychology and sociology from the 1960s, these early works generated an
    active network community and a substantial literature in the 1970s.

  579. Selection models under generalized symmetry settings.

    Authors: Adelchi Azzalini
    Subjects: Methodology
    Abstract

    An active stream of literature has followed up the idea of skew-elliptical
    densities initiated by Azzalini and Capitanio (1999). Their original
    formulation was based on a general lemma which is however of broader
    applicability than usually perceived. This note examines new directions of its
    use, and illustrates them with the construction of some probability
    distributions falling outside the family of the so-called skew-symmetric
    densities.

  580. Penalized Composite Quasi-Likelihood for Ultrahigh-Dimensional Variable Selection.

    Authors: Jianqing Fan, Jelena Bradic, Weiwei Wang
    Subjects: Methodology
    Abstract

    In high-dimensional model selection problems, penalized simple least-square
    approaches have been extensively used. This paper addresses the question of
    both robustness and efficiency of penalized model selection methods, and
    proposes a data-driven weighted linear combination of convex loss functions,
    together with weighted $L_1$-penalty. It is completely data-adaptive and does
    not require prior knowledge of the error distribution. The weighted
    $L_1$-penalty is used both to ensure the convexity of the penalty term and to
    ameliorate the bias caused by the $L_1$-penalty.

  581. Ranking Relations using Analogies in Biological and Information Networks.

    Authors: Ricardo Silva, Katherine Heller, Zoubin Ghahramani, Edoardo M. Airoldi
    Subjects: Methodology
    Abstract

    Analogical reasoning depends fundamentally on the ability to learn and
    generalize about relations between objects. We develop an approach to
    relational learning which, given a set of pairs of objects S = {A1:B1, A2:B2,
    ..., AN:BN}, measures how well other pairs A:B fit in with the set S. Our work
    addresses the question: is the relation between objects A and B analogous to
    those relations found in S? Such questions are particularly relevant in
    information retrieval, where an investigator might want to search for analogous
    pairs of objects that match the query set of interest.

  582. Inference for Extremal Conditional Quantile Models, with an Application to Market and Birthweight Risks.

    Authors: Victor Chernozhukov, Ivan Fernandez-Val
    Subjects: Methodology
    Abstract

    Quantile regression is an increasingly important empirical tool in economics
    and other sciences for analyzing the impact of a set of regressors on the
    conditional distribution of an outcome. Extremal quantile regression, or
    quantile regression applied to the tails, is of interest in many economic and
    financial applications, such as conditional value-at-risk, production
    efficiency, and adjustment bands in (S,s) models.

  583. Inferring Multiple Graphical Models.

    Authors: Julien Chiquet, Christophe Ambroise, Yves Grandvalet
    Subjects: Methodology
    Abstract

    Gaussian Graphical Models provide a convenient framework for representing
    dependencies between variables. Recently, this tool has received considerable
    interest for the discovery of biological networks. The literature focuses on
    the case where a single network is inferred from a set of measurements, but, as
    wetlab data is typically scarce, several assays, where the experimental
    conditions affect interactions, are usually merged to infer a single network.

  584. The assessment and planning of non-inferiority trials for retention of effect hypotheses - towards a general approach.

    Authors: M. Mielke, A. Munk
    Subjects: Methodology
    Abstract

    The objective of this paper is to develop statistical methodology for
    planning and evaluating three-armed non-inferiority trials for general
    retention of effect hypotheses, where the endpoint of interest may follow any
    (regular) parametric distribution family. This generalizes and unifies specific
    results for binary, normally and exponentially distributed endpoints. We
    propose a Wald-type test procedure for the retention of effect hypothesis
    (RET), which assures that the test treatment maintains at least a proportion
    $\Delta$ of reference treatment effect compared to placebo.

  585. On the de la Garza Phenomenon.

    Authors: Min Yang
    Subjects: Methodology
    Abstract

    Deriving optimal designs for nonlinear models is in general challenging. One
    crucial step is to determine the number of support points needed. Current tools
    handle this on a case-by-case basis. Each combination of model, optimality
    criterion and objective requires its own proof. The celebrated de la Garza
    Phenomenon states that under a (p-1)th-degree polynomial regression model, any
    optimal design can be based on at most p design points, the minimum number of
    support points such that all parameters are estimable. Does this conclusion
    also hold for nonlinear models?

  586. Nonparametric Bayesian Estimation of a Bivariate Copula Using the Jeffreys Prior.

    Authors: Simon Guillotte, Francois Perron
    Subjects: Methodology
    Abstract

    A bivariate distribution with continuous margins can be uniquely decomposed
    via a copula and its marginal distributions. We consider the problem of
    estimating the copula function and adopt a nonparametric Bayesian approach. On
    the space of copula functions, we construct a finite dimensional approximation
    subspace which is parameterized by a doubly stochastic matrix. A major problem
    here is the selection of a prior distribution on the space of doubly stochastic
    matrices also known as the Birkhoff polytope.

  587. Horvitz-Thompson estimators for functional data: asymptotic confidence bands and optimal allocation for stratification.

    Authors: Hervé Cardot, Etienne Josserand
    Subjects: Methodology
    Abstract

    When one has very large datasets of functional data, survey sampling
    approaches are interesting techniques to get estimators of simple functional
    quantities such as the mean curve without being obliged to store all the data.
    We propose here a Horvitz-Thompson estimator of the mean trajectory.
    Introducing a superpopulation framework, we first prove that we get consistent
    estimators of the mean function as well as its covariance function.
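
    For orientation, the standard Horvitz-Thompson estimator of a mean curve
    weights each sampled curve by the inverse of its inclusion probability and
    divides by the population size. The sketch below assumes curves observed on
    a common time grid. The function name, the array layout and the numbers are
    illustrative assumptions, and the confidence bands and stratified
    allocation mentioned in the title are not covered.

```python
import numpy as np


def horvitz_thompson_mean_curve(sampled_curves, inclusion_probs, population_size):
    """Estimate the population mean curve from a probability sample.

    sampled_curves: array of shape (n_sampled, n_timepoints)
    inclusion_probs: first-order inclusion probability of each sampled unit
    population_size: number of units N in the full population
    """
    Y = np.asarray(sampled_curves, dtype=float)
    pi = np.asarray(inclusion_probs, dtype=float)
    # Inverse-probability weighting of each curve, then division by N
    return (Y / pi[:, None]).sum(axis=0) / population_size


# Hypothetical example: 2 curves sampled from N = 4 units, each with probability 0.5.
curves = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]]
mean_hat = horvitz_thompson_mean_curve(curves, [0.5, 0.5], population_size=4)
```

    Only the sampled curves and their inclusion probabilities are needed, which
    is why the full dataset does not have to be stored.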

  588. Bootstrapping confidence levels for hypotheses about regression models.

    Authors: Michael Wood
    Subjects: Methodology
    Abstract

    This paper shows how bootstrapping (using a spreadsheet) can be used to
    derive confidence levels for hypotheses about features of regression models -
    such as their shape, and the location of optimum values. The data used as an
    example leads to a confidence level of 67% that the sample comes from a
    population which displays the hypothesized inverted U shape. There is no
    obvious and satisfactory alternative way of deriving this result, or an
    equivalent result. In particular, null hypothesis tests cannot provide adequate
    support for this type of hypothesis.
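
    The abstract describes a spreadsheet implementation. The same idea can be
    sketched in Python under stated assumptions: resample the data with
    replacement, refit a quadratic regression each time, and report the
    proportion of resamples whose fitted curve has an inverted U shape. Here
    "inverted U" is taken to mean a negative quadratic coefficient with the
    peak inside the observed range of x. That criterion, the function name and
    the synthetic data are assumptions for illustration, not the paper's own
    definitions.

```python
import numpy as np


def inverted_u_confidence(x, y, n_boot=2000, seed=0):
    """Bootstrap proportion of resamples whose quadratic fit is an inverted U."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    hits = 0
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample cases with replacement
        a, b, _c = np.polyfit(x[idx], y[idx], deg=2)  # y ~ a*x^2 + b*x + c
        if a < 0:
            peak = -b / (2 * a)  # location of the fitted maximum
            if x.min() < peak < x.max():
                hits += 1
    return hits / n_boot


# Synthetic example data (not from the paper): a noisy inverted U peaking near x = 5.
x_obs = np.linspace(0.0, 10.0, 40)
y_obs = -(x_obs - 5.0) ** 2 + np.random.default_rng(1).normal(0.0, 2.0, size=40)
level = inverted_u_confidence(x_obs, y_obs)
```

    The returned proportion plays the role of the confidence level for the
    shape hypothesis. Other shape hypotheses can be assessed by changing the
    check applied to each refitted model.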

  589. Liberating research from null hypotheses: confidence levels for substantive hypotheses instead of p values.

    Authors: Michael Wood
    Subjects: Methodology
    Abstract

    Null hypothesis tests, which result in p values or significance levels, are
    widely used in analyzing and reporting research results, despite very strong
    arguments against their use in many contexts. One suggested alternative is the
    use of confidence intervals. However, this does not directly achieve the
    objective of assessing the credibility of a hypothesis. This paper presents a
    third alternative - assessing confidence levels for hypotheses.

  590. Notes to Robert et al.: Model criticism informs model choice and model comparison.

    Authors: Oliver Ratmann, Christophe Andrieu, Carsten Wiuf, Sylvia Richardson
    Subjects: Methodology
    Abstract

    In their letter to PNAS and a comprehensive set of notes on arXiv
    [arXiv:0909.5673v2], Christian Robert, Kerrie Mengersen and Carla Chen (RMC)
    represent our approach to model criticism in situations when the likelihood
    cannot be computed as a way to "contrast several models with each other". In
    addition, RMC argue that model assessment with Approximate Bayesian Computation
    under model uncertainty (ABCmu) is unduly challenging and question its Bayesian
    foundations.

  591. Projection Pursuit Through $\Phi$-Divergence Minimisation.

    Authors: Jacques Touboul
    Subjects: Methodology
    Abstract

    Consider a defined density on a set of very large dimension. It is quite
    difficult to find an estimate of this density from a data set. However, it is
    possible through a projection pursuit methodology to solve this problem.
    Touboul's article "Projection Pursuit Through Relative Entropy Minimization",
    2009, demonstrates the interest of the author's method in a very simple given
    case. He considers the factorization of a density through an Elliptical
    component and some residual density. Touboul's work cited above is based on
    minimizing relative entropy.

  592. Modeling sparse connectivity between underlying brain sources for EEG/MEG.

    Authors: Ryota Tomioka, Motoaki Kawanabe, Klaus-Robert Mueller, Stefan Haufe, Guido Nolte
    Subjects: Methodology
    Abstract

    We propose a novel technique to assess functional brain connectivity in
    EEG/MEG signals. Our method, called Sparsely-Connected Sources Analysis (SCSA),
    can overcome the problem of volume conduction by modeling neural data
    innovatively with the following ingredients: (a) the EEG is assumed to be a
    linear mixture of correlated sources following a multivariate autoregressive
    (MVAR) model, (b) the demixing is estimated jointly with the source MVAR
    parameters, (c) overfitting is avoided by using the Group Lasso penalty.

  593. Nonparametric Independence Screening in Sparse Ultra-High Dimensional Additive Models.

    Authors: Rui Song, Jianqing Fan, Yang Feng
    Subjects: Methodology
    Abstract

    A variable screening procedure via correlation learning was proposed by Fan and
    Lv (2008) to reduce dimensionality in sparse ultra-high dimensional models.
    Even when the true model is linear, the marginal regression can be highly
    nonlinear. To address this issue, we further extend the correlation learning to
    marginal nonparametric learning. Our nonparametric independence screening is
    called NIS, a specific member of the sure independence screening. Several
    closely related variable screening procedures are proposed.

  594. Sequential Regression Trees for Learning and Design.

    Authors: Robert B. Gramacy, Nicholas G. Polson, Matthew A. Taddy
    Subjects: Methodology
    Abstract

    Sequential regression trees are an attractive option for automatic regression
    and classification with complicated response surfaces in on-line application
    settings. We create a dynamic tree model whose state changes in time with the
    accumulation of new data, and provide particle learning algorithms that allow
    for the efficient on-line posterior filtering of tree-states. A major advantage
    of tree regression is that it allows for the use of very simple models within
    each partition.

  595. Robust Fitting of Ellipses and Spheroids.

    Authors: H. Vincent Poor, Jieqi Yu, Sanjeev R. Kulkarni
    Subjects: Methodology
    Abstract

    Ellipse and ellipsoid fitting has been extensively researched and widely
    applied. Although traditional fitting methods provide accurate estimation of
    ellipse parameters in the low-noise case, their performance is compromised when
    the noise level or the ellipse eccentricity is high. A series of robust
    fitting algorithms are proposed that perform well in high-noise,
    high-eccentricity ellipse/spheroid (a special class of ellipsoid) cases. The
    new algorithms are based on the geometric definition of an ellipse/spheroid,
    and improved using global statistical properties of the data.

  596. Vector Autoregressive Models With Measurement Errors for Testing Granger Causality.

    Authors: Alexandre G. Patriota, Joao R. Sato, Betsabe G. Blas
    Subjects: Methodology
    Abstract

    This paper develops a method for estimating parameters of a vector
    autoregression (VAR) observed in white noise. The estimation method assumes the
    noise variance matrix is known and does not require any iterative process. This
    study provides consistent estimators and shows the asymptotic distribution of
    the parameters required for conducting tests of Granger causality.
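
    To illustrate why a known noise variance permits a non-iterative estimate,
    here is a minimal sketch for the scalar AR(1) special case. It uses a simple
    moment correction, which is an assumption of this sketch and not necessarily
    the estimator developed in the paper; the function names `autocov` and
    `corrected_ar1` are hypothetical, and the series is assumed to have zero mean.

```python
def autocov(y, lag):
    """Sample autocovariance at the given lag, assuming a zero-mean series."""
    n = len(y)
    return sum(y[t] * y[t + lag] for t in range(n - lag)) / n


def corrected_ar1(y, noise_var):
    """AR(1) coefficient estimate for a series observed in white noise.

    If y_t = x_t + e_t with Var(e_t) = noise_var known, white noise inflates
    the lag-0 autocovariance of y but leaves lag 1 unchanged, so subtracting
    noise_var from the lag-0 term removes the attenuation in closed form.
    """
    return autocov(y, 1) / (autocov(y, 0) - noise_var)
```

    The uncorrected ratio `autocov(y, 1) / autocov(y, 0)` is shrunk toward zero
    by the measurement noise; the subtraction undoes that shrinkage in one step.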

  597. Bayesian Inference from Composite Likelihoods, with an Application to Spatial Extremes.

    Authors: Mathieu Ribatet, Daniel Cooley, Anthony C. Davison
    Subjects: Methodology
    Abstract

    Composite likelihoods are increasingly used in applications where the full
    likelihood is analytically unknown or computationally prohibitive. Although the
    maximum composite likelihood estimator has frequentist properties akin to those
    of the usual maximum likelihood estimator, Bayesian inference based on
    composite likelihoods has yet to be explored. In this paper we investigate the
    use of the Metropolis--Hastings algorithm to compute a pseudo-posterior
    distribution based on the composite likelihood.

  598. A test for second order stationarity of a time series based on the Discrete Fourier Transform (Technical Report).

    Authors: Yogesh Dwivedi, Suhasini Subba Rao
    Subjects: Methodology
    Abstract

    We consider a zero mean discrete time series, and define its discrete Fourier
    transform at the canonical frequencies. It is well known that the discrete
    Fourier transform is asymptotically uncorrelated at the canonical frequencies
    if and only if the time series is second order stationary. Exploiting this
    important property, we construct a Portmanteau type test statistic for testing
    stationarity of the time series. It is shown that under the null of
    stationarity, the test statistic approximately follows a chi-square distribution.
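
    The quantity underlying such a test can be sketched as follows: compute the
    DFT at the canonical frequencies $2\pi k/n$ and average the lagged
    cross-products of neighbouring DFT values. This is only an illustration of
    the building block, not the authors' test statistic; the names `dft` and
    `dft_lag_product` and the $1/\sqrt{2\pi n}$ normalisation are assumptions.

```python
import cmath
import math


def dft(x):
    """DFT at the canonical frequencies 2*pi*k/n, scaled by 1/sqrt(2*pi*n)."""
    n = len(x)
    scale = 1.0 / math.sqrt(2 * math.pi * n)
    return [
        scale * sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def dft_lag_product(x, r):
    """Average of J(w_k) * conj(J(w_{k+r})) over k, with indices taken modulo n."""
    j = dft(x)
    n = len(j)
    return sum(j[k] * j[(k + r) % n].conjugate() for k in range(n)) / n
```

    Values of `dft_lag_product(x, r)` close to zero for small nonzero `r` are
    what one would expect under stationarity; a series whose variance changes
    over time tends to produce values further from zero.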

  599. Some optimal criteria of model-robustness for two-level non-regular fractional factorial designs.

    Authors: Satoshi Aoki
    Subjects: Methodology
    Abstract

    We present some optimal criteria to evaluate model-robustness of non-regular
    two-level fractional factorial designs. Our method is based on minimizing the
    sum of squares of all the off-diagonal elements in the information matrix, and
    considering expectation under appropriate distribution functions for unknown
    contamination of the interaction effects. By considering uniform distributions
    on symmetric support, our criteria can be expressed as linear combinations of
    the $B_s(d)$ characteristics, which are used to characterize the generalized
    minimum aberration.

  600. Improving the Convergence Properties of the Data Augmentation Algorithm with an Application to Bayesian Mixture Modelling.

    Authors: Christian P. Robert, James P. Hobert, Vivekananda Roy
    Subjects: Methodology
    Abstract

    Every reversible Markov chain defines an operator whose spectrum encodes the
    convergence properties of the chain. When the state space is finite, the
    spectrum is just the set of eigenvalues of the corresponding Markov transition
    matrix. However, when the state space is infinite, the spectrum may be
    uncountable, and is nearly always impossible to calculate. In most applications
    of the data augmentation (DA) algorithm, the state space of the DA Markov chain
    is infinite.

  601. Comments on "Particle Markov Chain Monte Carlo" by C. Andrieu, A. Doucet and R. Holenstein.

    Authors: Julien Cornebise, Gareth W. Peters
    Subjects: Methodology
    Abstract

    We merge in this note our two discussions about the Read Paper "Particle
    Markov chain Monte Carlo" (Andrieu, Doucet, and Holenstein, 2010) presented on
    October 16th 2009 at the Royal Statistical Society, appearing in the Journal of
    the Royal Statistical Society Series B. We also present a more detailed version
    of the ABC extension.

  602. A random-projection based procedure to test if a stationary process is Gaussian.

    Authors: Juan A. Cuesta-Albertos, Fabrice Gamboa, Alicia Nieto-Reyes
    Subjects: Methodology
    Abstract

    In this paper we address the statistical problem of testing if a stationary
    process is Gaussian. The observation consists of a finite sample path of the
    process. Using a random projection technique introduced and studied in
    Cuesta-Albertos et al. (2007) in the framework of goodness-of-fit tests for
    functional data, we construct some decision rules. These rules rely on the
    whole distribution of the process and not only on its marginal distribution
    at a fixed order. The main idea is to test the Gaussianity of the marginal
    distribution of some random linear combinations of the process.

  603. Decomposition and Model Selection for Large Contingency Tables.

    Authors: Markus Kalisch, Peter Bühlmann, Corinne Dahinden
    Subjects: Methodology
    Abstract

    Large contingency tables summarizing categorical variables arise in many
    areas, for example in biology, when a large number of biomarkers are
    cross-tabulated according to their discrete expression level. Interactions of
    the variables are generally studied with log-linear models and the structure of
    a log-linear model can be visually represented by a graph from which the
    conditional independence structure can then be read off.

  604. Spatial Analysis of Opportunistic Downlink Relaying in a Two-Hop Cellular System.

    Authors: Martin Haenggi, Radha Krishna Ganti
    Subjects: Methodology
    Abstract

    We consider a two-hop cellular system in which the mobile nodes help the base
    station by relaying information to the dead spots. While two-hop cellular
    schemes have been analyzed previously, the distribution of the node locations
    has not been explicitly taken into account. In this paper, we model the node
    locations of the base stations and the mobile stations as a point process on
    the plane and then analyze the performance of two different two-hop schemes in
    the downlink.

  605. A Hierarchical Bayesian Model for Frame Representation.

    Authors: L. Chaâri, J.-C. Pesquet, J.-Y. Tourneret, Ph. Ciuciu, A. Benazza-Benyahia
    Subjects: Methodology
    Abstract

    In many signal processing problems, it may be fruitful to represent the
    signal under study in a frame. If a probabilistic approach is adopted, it
    then becomes necessary to estimate the hyper-parameters characterizing the
    probability distribution of the frame coefficients. This problem is difficult
    since in general the frame synthesis operator is not bijective. Consequently,
    the frame coefficients are not directly observable. This paper introduces a
    hierarchical Bayesian model for frame representation.

  606. Local statistical modeling by cluster-weighted.

    Authors: Giorgio Vittadini, Salvatore Ingrassia, Simona C. Minotti
    Subjects: Methodology
    Abstract

    We investigate statistical properties of Cluster-Weighted Modeling, which is
    a framework for supervised learning originally developed in order to recreate a
    digital violin with traditional inputs and realistic sound. The analysis is
    carried out in comparison with Finite Mixtures of Regression models. Based on
    some geometrical arguments, we highlight that Cluster-Weighted Modeling provides
    a quite general framework for local statistical modeling. Theoretical results
    are illustrated on the basis of some numerical simulations.

  607. High dimensional sparse covariance estimation via directed acyclic graphs.

    Authors: Peter Bühlmann, Philipp Rütimann
    Subjects: Methodology
    Abstract

    We present a graph-based technique for estimating sparse covariance matrices
    and their inverse from high-dimensional data. The method is based on learning a
    directed acyclic graph (DAG) and estimating parameters of a multivariate
    Gaussian distribution based on a DAG. For inferring the underlying DAG we use
    the PC-algorithm and for estimating the DAG-based covariance matrix and its
    inverse, we use a Cholesky decomposition approach which provides a positive
    (semi-)definite sparse estimate.
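
    The link between a DAG and a sparse inverse covariance can be sketched as
    follows, assuming a linear Gaussian model $x_i = \sum_{j<i} B_{ij} x_j + e_i$
    with the variables already in topological order and $\mathrm{Var}(e_i) = d_i$.
    The inverse covariance is then $(I-B)^\top D^{-1} (I-B)$, a Cholesky-type
    factorisation. The function name `dag_precision` and this parametrisation are
    assumptions of the sketch; structure learning with the PC-algorithm is not
    shown.

```python
def dag_precision(B, d):
    """Inverse covariance (I - B)^T diag(1/d) (I - B) of a linear Gaussian DAG.

    B is a strictly lower-triangular list of lists of edge weights (B[i][j] is
    the weight of the edge j -> i) and d holds the positive noise variances.
    """
    n = len(d)
    # A = I - B is unit lower-triangular, so the product below is positive definite.
    A = [[(1.0 if i == j else 0.0) - B[i][j] for j in range(n)] for i in range(n)]
    return [
        [sum(A[k][i] * A[k][j] / d[k] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]
```

    Every missing edge in the DAG is a zero in `B`, so the factor `I - B` is
    sparse by construction, and the product of a factor with its own transpose
    cannot have negative eigenvalues.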

  608. Distributions generated by perturbation of symmetry with emphasis on a multivariate skew $t$ distribution.

    Authors: Adelchi Azzalini, Antonella Capitanio
    Subjects: Methodology
    Abstract

    A fairly general procedure is studied to perturb a multivariate density
    satisfying a weak form of multivariate symmetry, and to generate a whole set of
    non-symmetric densities. The approach is general enough to encompass a number
    of recent proposals in the literature, variously related to the skew normal
    distribution. The special case of skew elliptical densities is examined in
    detail, establishing connections with existing similar work.

  609. Statistical applications of the multivariate skew-normal distribution.

    Authors: Adelchi Azzalini, Antonella Capitanio
    Subjects: Methodology
    Abstract

    Azzalini & Dalla Valle (1996) have recently discussed the multivariate
    skew-normal distribution which extends the class of normal distributions by the
    addition of a shape parameter. The first part of the present paper examines
    further probabilistic properties of the distribution, with special emphasis on
    aspects of statistical relevance. Inferential and other statistical issues are
    discussed in the following part, with applications to some multivariate
    statistics problems, illustrated by numerical examples.

  610. Nonparametric Benchmarking Using the Extended Rank Likelihood.

    Authors: James G. Scott
    Subjects: Methodology
    Abstract

    This paper introduces a novel class of methods for statistical benchmarking
    in data sets with awkward marginals and complicated dependence structures. We
    develop both a semiparametric version of the method based on Gaussian copulas,
    and a nonparametric version using Bayesian tree models in conjunction with the
    extended rank likelihood of Hoff (2007). Two simple examples are used to study
    the method and compare it to other logical benchmarking methods. Finally, the
    method is applied to a very large database on corporate performance over the
    last four decades.

  611. Regression on a Graph.

    Authors: Arne Kovac, Andrew D.A.C. Smith
    Subjects: Methodology
    Abstract

    The `Signal plus Noise' model for nonparametric regression can be extended to
    the case of observations taken at the vertices of a graph. This model includes
    many familiar regression problems. This article discusses the use of the edges
    of a graph to measure roughness in penalized regression. Distance between
    estimate and observation is measured at every vertex in the $L_2$ norm, and
    roughness is penalized on every edge in the $L_1$ norm. Thus the ideas of
    total-variation penalization can be extended to a graph.
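
    The criterion described above can be written down directly. The sketch below
    evaluates it for a candidate fit, assuming the fidelity term is the plain sum
    of squared differences (any constant factor is an assumption) and that the
    tuning parameter `lam` multiplies the edge penalty; the function name
    `graph_tv_objective` is hypothetical, and no minimisation algorithm is shown.

```python
def graph_tv_objective(y, f, edges, lam):
    """Squared L2 misfit at the vertices plus lam times L1 roughness on the edges.

    y and f are sequences of observations and fitted values indexed by vertex;
    edges is an iterable of (i, j) vertex-index pairs.
    """
    fit = sum((yi - fi) ** 2 for yi, fi in zip(y, f))
    roughness = sum(abs(f[i] - f[j]) for i, j in edges)
    return fit + lam * roughness
```

    On a path graph with edges `(0, 1), (1, 2), ...` this reduces to the usual
    one-dimensional total-variation criterion.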

  612. On Bayesian Curve Fitting Via Auxiliary Variables.

    Authors: Y. Fan, J.-L. Dortet-Bernadet, S. A. Sisson
    Subjects: Methodology
    Abstract

    In this article we revisit the auxiliary variable method introduced in Smith
    and Kohn (1996) for the fitting of P-th order spline regression models with an
    unknown number of knot points. We introduce modifications which allow the
    location of knot points to be random, and we further consider an extension of
    the method to handle models with non-Gaussian errors. We provide a new
    algorithm for the MCMC sampling of such models. Simulated data examples are
    used to compare the performance of our method with existing ones.

  613. Minimax rank estimation for subspace tracking.

    Authors: Patrick J. Wolfe, Patrick O. Perry
    Subjects: Methodology
    Abstract

    Rank estimation is a classical model order selection problem that arises in a
    variety of important statistical signal and array processing systems, yet is
    addressed relatively infrequently in the extant literature. Here we present
    sample covariance asymptotics stemming from random matrix theory, and bring
    them to bear on the problem of optimal rank estimation in the context of the
    standard array observation model with additive white Gaussian noise.

  614. Identification and quantification of Granger causality between gene sets.

    Authors: Andre Fujita, Joao Ricardo Sato, Kaname Kojima, Luciana Rodrigues Gomes, Masao Nagasaki, Mari Cleide Sogayar, Satoru Miyano
    Subjects: Methodology
    Abstract

    Wiener and Granger have introduced an intuitive concept of causality between
    two variables which is based on the idea that an effect never occurs before its
    cause. Later, Geweke has generalized this concept to a multivariate Granger
    causality, i.e., n variables Granger-cause another variable. Although Granger
    causality is not "effective causality", this concept is useful to infer
    directionality and information flow in observational data. Granger causality is
    usually identified by using VAR models due to their simplicity.

  615. Irregular sets and Central Limit Theorems for dependent triangular arrays.

    Authors: Beatriz Marron, Ana Tablar
    Subjects: Methodology
    Abstract

    In previous papers, we studied the asymptotic behaviour of
    $S_N(A,X)=(2N+1)^{-d/2}\sum_{n \in A_N} X_n,$ where $X$ is a centered,
    stationary and weakly dependent random field, and $A_N=A \cap [-N,N]^d$, $A
    \subset \mathbb{Z}^d$. This leads to the definition of asymptotically
    measurable sets, which enjoy the property that $S_N(A,X)$ has a Gaussian weak
    limit for any $X$ belonging to a certain class. Here we extend results of this
    type to the case of weakly dependent triangular arrays and present an
    application of this technique to regression models.

  616. Empirically corrected estimation of complete-data population summaries under model misspecification.

    Authors: Arseni Seregin, Vladimir N. Minin, John D. O'Brien
    Subjects: Methodology
    Abstract

    Inference problems with incomplete observations often aim at estimating
    population-level properties of complete data. We introduce a simple empirical
    correction that provides partial protection against model misspecification
    during such estimation. Unlike nonparametric or semiparametric techniques, our
    empirical correction does not produce consistent estimates. Instead, our method
    first fits a misspecified parametric model, whose plug-in estimate of the
    quantity of interest is naturally inconsistent.

  617. Local likelihood estimation of local parameters for nonstationary random fields.

    Authors: Ethan Anderes, Michael Stein
    Subjects: Methodology
    Abstract

    We develop a weighted local likelihood estimate for the parameters that
    govern the local spatial dependency of a locally stationary random field. The
    advantage of this local likelihood estimate is that it smoothly downweights the
    influence of far away observations, works for irregular sampling locations, and
    when designed appropriately, can trade off bias and variance to reduce
    estimation error. This paper starts with an exposition of our technique on the
    problem of estimating an unknown positive function when multiplied by a
    stationary random field.

  618. Statistical Inference for Disordered Sphere Packings.

    Authors: Jeffrey Picka
    Subjects: Methodology
    Abstract

    Sphere packings are essential to the development of physical models for
    powders, composite materials, and the atomic structure of the liquid state.
    There is a strong scientific need to be able to assess the fit of packing
    models to data, but this is complicated by the lack of formal probabilistic
    models for packings. Without formal models, simulation algorithms and
    collections of physical objects must be used as models.

  619. Estimation of safety areas for epidemic spread.

    Authors: Beatriz Marron, Ana Tablar
    Subjects: Methodology
    Abstract

    In this work we study safety areas in epidemic spread. The aim of this work
    is, given the evolution of an epidemic at time $t$, to find a safety set at
    time $t+h$, that is, a random set $K_{t+h}$ such that the probability that
    infection reaches $K_{t+h}$ at time $t+h$ is small.

  620. Two-sample Bayesian nonparametric hypothesis testing.

    Authors: C.C. Holmes, F. Caron, J. E. Griffin, D. A. Stephens
    Subjects: Methodology
    Abstract

    In this article we describe Bayesian nonparametric procedures for two-sample
    hypothesis testing. Namely, given two sets of samples
    $y^{(1)} \overset{iid}{\sim} F^{(1)}$ and $y^{(2)} \overset{iid}{\sim} F^{(2)}$,
    with $F^{(1)}, F^{(2)}$ unknown, we wish to evaluate the evidence for the null
    hypothesis $H_{0}: F^{(1)} = F^{(2)}$ versus the alternative. Our method is
    based upon a nonparametric Polya tree prior centered
    either subjectively or using an empirical procedure.

  621. Nonparametric methods for volatility density estimation.

    Authors: Bert van Es, Peter Spreij, Harry van Zanten
    Subjects: Methodology
    Abstract

    Stochastic volatility modelling of financial processes has become
    increasingly popular. The proposed models usually contain a stationary
    volatility process. We will motivate and review several nonparametric methods
    for estimation of the density of the volatility process. Both models based on
    discretely sampled continuous time processes and discrete time models will be
    discussed.

  622. Outlier Elimination for Robust Ellipse and Ellipsoid Fitting.

    Authors: H. Vincent Poor, Jieqi Yu, Haipeng Zheng, Sanjeev R. Kulkarni
    Subjects: Methodology
    Abstract

    In this paper, an outlier elimination algorithm for ellipse/ellipsoid fitting
    is proposed. This two-stage algorithm employs a proximity-based outlier
    detection algorithm (using the graph Laplacian), followed by a model-based
    outlier detection algorithm similar to random sample consensus (RANSAC). These
    two stages compensate for each other so that outliers of various types can be
    eliminated with reasonable computation. The outlier elimination algorithm
    considerably improves the robustness of ellipse/ellipsoid fitting as
    demonstrated by simulations.

  623. Bayesian Core: The Complete Solution Manual.

    Authors: Christian P. Robert, Jean-Michel Marin
    Subjects: Methodology
    Abstract

    This solution manual contains the unabridged and original solutions to all
    the exercises proposed in Bayesian Core, along with R programs when necessary.

  624. Effect of indirect dependencies on "A mutual information minimization approach for a class of nonlinear recurrent separating systems".

    Authors: Yannick Deville, Alain Deville, Shahram Hosseini
    Subjects: Methodology
    Abstract

    In a recent paper [4], Duarte and Jutten investigated the Blind Source
    Separation (BSS) problem, for the nonlinear mixing model that they introduced
    in that paper. They proposed to solve this problem by using
    information-theoretic tools, more precisely by minimizing the mutual
    information (MI) of the outputs of the separating structure. When applying the
    MI approach to BSS problems, one usually determines the analytical expressions
    of the derivatives of the MI with respect to the parameters of the considered
    separating model.

  625. Comment: The Essential Role of Pair Matching.

    Authors: Jennifer Hill, Marc Scott
    Subjects: Methodology
    Abstract

    Comment on "The Essential Role of Pair Matching in Cluster-Randomized
    Experiments, with Application to the Mexican Universal Health Insurance
    Evaluation" [arXiv:0910.3752]

  626. Comment: The Essential Role of Pair Matching in Cluster-Randomized Experiments, with Application to the Mexican Universal Health Insurance Evaluation.

    Authors: Dylan S. Small, Kai Zhang
    Subjects: Methodology
    Abstract

    Comment on "The Essential Role of Pair Matching in Cluster-Randomized
    Experiments, with Application to the Mexican Universal Health Insurance
    Evaluation" [arXiv:0910.3752]

  627. Rejoinder: Matched Pairs and the Future of Cluster-Randomized Experiments.

    Authors: Kosuke Imai, Gary King, Clayton Nall
    Subjects: Methodology
    Abstract

    Rejoinder to "The Essential Role of Pair Matching in Cluster-Randomized
    Experiments, with Application to the Mexican Universal Health Insurance
    Evaluation" [arXiv:0910.3752]

  628. The Essential Role of Pair Matching in Cluster-Randomized Experiments, with Application to the Mexican Universal Health Insurance Evaluation.

    Authors: Kosuke Imai, Gary King, Clayton Nall
    Subjects: Methodology
    Abstract

    A basic feature of many field experiments is that investigators are only able
    to randomize clusters of individuals--such as households, communities, firms,
    medical practices, schools or classrooms--even when the individual is the unit
    of interest. To recoup the resulting efficiency loss, some studies pair similar
    clusters and randomize treatment within pairs. However, many other studies
    avoid pairing, in part because of claims in the literature, echoed by clinical
    trials standards organizations, that this matched-pair, cluster-randomization
    design has serious problems.

  629. Comment: Citation Statistics.

    Authors: Sune Lehmann, Benny E. Lautrup, Andrew D. Jackson
    Subjects: Methodology
    Abstract

    We discuss the paper "Citation Statistics" by the Joint Committee on
    Quantitative Assessment of Research [arXiv:0910.3529]. In particular, we focus
    on a necessary feature of "good" measures for ranking scientific authors: that
    good measures must be able to accurately distinguish between authors.

  630. Citation Statistics.

    Authors: Robert Adler, John Ewing, Peter Taylor
    Subjects: Methodology
    Abstract

    This is a report about the use and misuse of citation data in the assessment
    of scientific research. The idea that research assessment must be done using
    "simple and objective" methods is increasingly prevalent today. The "simple
    and objective" methods are broadly interpreted as bibliometrics, that is,
    citation data and the statistics derived from them. There is a belief that
    citation statistics are inherently more accurate because they substitute simple
    numbers for complex judgments, and hence overcome the possible subjectivity of
    peer review. But this belief is unfounded.

  631. Rejoinder: Citation Statistics.

    Authors: Robert Adler, John Ewing, Peter Taylor
    Subjects: Methodology
    Abstract

    Rejoinder to "Citation Statistics" [arXiv:0910.3529]

  632. Least Squares estimation of two ordered monotone regression curves.

    Authors: Fadoua Balabdaoui, Filippo Santambrogio, Kaspar Rufibach
    Subjects: Methodology
    Abstract

    In this paper, we consider the problem of finding the Least Squares
    estimators of two isotonic regression curves $g^\circ_1$ and $g^\circ_2$ under
    the additional constraint that they are ordered; e.g., $g^\circ_1 \le
    g^\circ_2$.

  633. Comment: Bibliometrics in the Context of the UK Research Assessment Exercise.

    Authors: Bernard W. Silverman
    Subjects: Methodology
    Abstract

    Research funding and reputation in the UK have, for over two decades, been
    increasingly dependent on a regular peer review of all UK departments. This
    system is set to move to one based more on bibliometrics. Assessment exercises of this
    kind influence the behavior of institutions, departments and individuals, and
    therefore bibliometrics will have effects beyond simple measurement.
    [arXiv:0910.3529]

  634. Comment: Citation Statistics.

    Authors: David Spiegelhalter, Harvey Goldstein
    Subjects: Methodology
    Abstract

    Comment on "Citation Statistics" [arXiv:0910.3529]

  635. Comment: Citation Statistics.

    Authors: Peter Gavin Hall
    Subjects: Methodology
    Abstract

    Comment on "Citation Statistics" [arXiv:0910.3529]

  636. An Evening Spent with Bill van Zwet.

    Authors: R. J. Beran, N. I. Fisher
    Subjects: Methodology
    Abstract

    Willem Rutger van Zwet was born in Leiden, the Netherlands, on March 31,
    1934. He received his high school education at the Gymnasium Haganum in The
    Hague and obtained his Masters degree in Mathematics at the University of
    Leiden in 1959. After serving in the army for almost two years, he obtained his
    Ph.D. at the University of Amsterdam in 1964, with Jan Hemelrijk as advisor. In
    1965, he was appointed Associate Professor of Statistics at the University of
    Leiden and promoted to Full Professor in 1968.

  637. A Conversation with Murray Rosenblatt.

    Authors: David R. Brillinger, Richard A. Davis
    Subjects: Methodology
    Abstract

    On an exquisite March day in 2006, David Brillinger and Richard Davis sat
    down with Murray and Ady Rosenblatt at their home in La Jolla, California for
    an enjoyable day of reminiscences and conversation. Our mentor, Murray
    Rosenblatt, was born on September 7, 1926 in New York City and attended City
    College of New York before entering graduate school at Cornell University in
    1946. After completing his Ph.D.

  638. Maximum Entropy Edgeworth Estimates of Volumes of Polytopes.

    Authors: Alexander Barvinok, J. A. Hartigan
    Subjects: Methodology
    Abstract

    The number of points $(x_1,\ldots,x_n)$ that lie in an integer cube $C$ in
    $\mathbb{R}^n$ and satisfy the constraints $S_i=\sum_j h_{ij}(x_j)$ is
    approximated by an Edgeworth corrected Gaussian approximation based on the
    maximum entropy density $p$ on $C$ that satisfies $E S = s$. Under $p$, the
    variables $X_1,\ldots,X_n$ are independent with densities of exponential
    form. Conditional on $S = s$, $X$ is uniformly distributed over the integers
    in $C$ that satisfy $S = s$. The number of points in $C$ satisfying $S=s$ is
    $p\{S=s\} \exp(I(p))$, where $I(p)$ is the entropy of the density $p$.
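
    The counting formula in the last sentence follows from a short calculation,
    sketched here under the assumption (not spelled out in the abstract) that the
    maximum entropy density has the exponential form
    $p(x) = \exp(\lambda^\top S(x))/Z$ on $C$; the symbols $\lambda$ and $Z$ are
    introduced only for this illustration.

```latex
\begin{align*}
I(p) &= -E \log p(X) = \log Z - \lambda^\top E S = \log Z - \lambda^\top s, \\
p(x) &= \exp(\lambda^\top s - \log Z) = \exp(-I(p))
      \quad \text{for every } x \in C \text{ with } S(x) = s, \\
p\{S = s\} &= \#\{x \in C : S(x) = s\} \, \exp(-I(p)).
\end{align*}
```

    Rearranging the last line gives the count as $p\{S=s\} \exp(I(p))$, and the
    middle line shows why $X$ is uniform conditional on $S = s$.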

  639. Variable Selection and Updating In Model-Based Discriminant Analysis for High Dimensional Data with Food Authenticity Applications.

    Authors: Thomas Brendan Murphy, Nema Dean, Adrian E. Raftery
    Subjects: Methodology
    Abstract

    Food authenticity studies are concerned with determining if food samples have
    been correctly labelled or not. Discriminant analysis methods are an integral
    part of the methodology for food authentication. Motivated by food authenticity
    applications, a model-based discriminant analysis method that includes variable
    selection is presented. The discriminant analysis model is fitted in a
    semi-supervised manner using both labeled and unlabeled data.

  640. Overlapping Stochastic Block Models.

    Authors: Christophe Ambroise, Pierre Latouche, Etienne Birmelé
    Subjects: Methodology
    Abstract

    Complex systems in nature and in society are often represented as networks,
    describing the rich set of interactions between objects of interest. Many
    deterministic and probabilistic clustering methods have been developed to
    analyze such structures. Given a network, almost all of them partition the
    vertices into disjoint clusters, according to their connection profile.
    However, recent studies have shown that these techniques were too restrictive
    and that most of the existing networks contained overlapping clusters.

  641. Bayesian testing of many hypotheses $\times$ many genes: A study of sleep apnea.

    Authors: Dylan S. Small, Shane T. Jensen, Ibrahim Erkan, Erna S. Arnardottir
    Subjects: Methodology
    Abstract

    Substantial statistical research has recently been devoted to the analysis of
    large-scale microarray experiments which provide a measure of the simultaneous
    expression of thousands of genes in a particular condition. A typical goal is
    the comparison of gene expression between two conditions (e.g., diseased vs.
    nondiseased) to detect genes which show differential expression. Classical
    hypothesis testing procedures have been applied to this problem and more recent
    work has employed sophisticated models that allow for the sharing of
    information across genes.

  642. Sure Independence Screening in Generalized Linear Models with NP-Dimensionality.

    Authors: Rui Song, Jianqing Fan
    Subjects: Methodology
    Abstract

    Ultrahigh dimensional variable selection plays an increasingly important role
    in contemporary scientific discoveries and statistical research. Among others,
    Fan and Lv (2008) propose an independent screening framework by ranking the
    marginal correlations. They showed that the correlation ranking procedure
    possesses a sure independence screening property within the context of the
    linear model with Gaussian covariates and responses.
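
    The correlation ranking step attributed to Fan and Lv (2008) can be sketched
    as follows: compute the marginal Pearson correlation of each covariate with
    the response and keep the `d` covariates with the largest absolute values.
    This is a minimal illustration for the linear-model setting; the function
    names `pearson` and `sis_screen` and the column-wise data layout are
    assumptions, and the generalized linear model extension is not shown.

```python
import math


def pearson(u, v):
    """Sample Pearson correlation of two equal-length sequences (0.0 if degenerate)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    if var_u == 0 or var_v == 0:
        return 0.0
    return cov / math.sqrt(var_u * var_v)


def sis_screen(columns, y, d):
    """Indices of the d covariates with the largest absolute marginal correlation.

    columns is a list of covariate columns, each the same length as y. Indices
    are returned in decreasing order of absolute correlation.
    """
    scores = [abs(pearson(col, y)) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return ranked[:d]
```

    In practice `d` is chosen well below the sample size, and a more refined
    selection method is then applied to the retained covariates.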

  644. Moment analysis of the Delaunay tessellation field estimator.

    Authors: M.N.M. van Lieshout
    Subjects: Methodology
    Abstract

    The Campbell--Mecke theorem is used to derive explicit expressions for the
    mean and variance of Schaap and Van de Weygaert's Delaunay tessellation field
    estimator. Special attention is paid to Poisson processes.

  645. Variable selection in high-dimensional linear models: partially faithful distributions and the PC-simple algorithm.

    Authors: Marloes H. Maathuis, Markus Kalisch, Peter Bühlmann
    Subjects: Methodology
    Abstract

    We consider variable selection in high-dimensional linear models where the
    number of covariates greatly exceeds the sample size. We introduce the new
    concept of partial faithfulness and use it to infer associations between the
    covariates and the response.

  647. Estimating the null distribution for conditional inference and genome-scale screening.

    Authors: David R. Bickel
    Subjects: Methodology
    Abstract

    In a novel approach to the multiple testing problem, Efron (2004; 2007)
    formulated estimators of the distribution of test statistics or nominal
    p-values under a null distribution suitable for modeling the data of thousands
    of unaffected genes, non-associated single-nucleotide polymorphisms, or other
    biological features. Estimators of the null distribution can improve not only
    the empirical Bayes procedure for which it was originally intended, but also
    many other multiple comparison procedures.

  648. The Rank of the Covariance Matrix of an Evanescent Field.

    Authors: M. Kliger, J. M. Francos
    Subjects: Methodology
    Abstract

    Evanescent random fields arise as a component of the 2-D Wold decomposition
    of homogeneous random fields. Besides their theoretical importance, evanescent
    random fields have a number of practical applications, such as in modeling the
    observed signal in the space-time adaptive processing (STAP) of airborne radar
    data.

  649. A Bounded Derivative Method for the Maximum Likelihood Estimation on Weibull Parameters.

    Authors: DeTao Mao, Wenyuan Li
    Subjects: Methodology
    Abstract

    For the basic maximum likelihood estimating function of the two-parameter
    Weibull distribution, a simple proof of its global monotonicity is given to
    ensure the existence and uniqueness of its solution. The bound on the
    function's first-order derivative is defined based on its scale-free property.
    With a bounded derivative, the possible range of the root of this function can
    be determined. A novel root-finding algorithm employing these established
    results is proposed accordingly, and its convergence is proved analytically.
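
    As an illustration only, the Python sketch below solves the standard
    two-parameter Weibull shape equation by plain bisection, which is valid
    because the estimating function is monotone. It is not the authors'
    bounded-derivative algorithm. The function names, the starting bracket and
    the tolerance are assumptions made for this example.

```python
import math


def weibull_shape_equation(k, xs):
    """Profile-likelihood estimating function for the Weibull shape k.

    g(k) = sum(x^k * ln x) / sum(x^k) - 1/k - mean(ln x)
    The MLE of the shape is the root of g; g is increasing in k.
    """
    logs = [math.log(x) for x in xs]
    weights = [x ** k for x in xs]
    weighted_mean_log = sum(w * l for w, l in zip(weights, logs)) / sum(weights)
    return weighted_mean_log - 1.0 / k - sum(logs) / len(logs)


def weibull_mle(xs, tol=1e-10):
    """Return (shape, scale) MLEs using bisection (hypothetical helper)."""
    if min(xs) <= 0:
        raise ValueError("observations must be positive")
    if len(set(xs)) < 2:
        raise ValueError("need at least two distinct observations")

    # g is scale-free, so dividing by max(xs) leaves the root unchanged
    # and keeps x ** k in (0, 1], which avoids overflow for large k.
    top = max(xs)
    scaled = [x / top for x in xs]

    # Bracket the root: g is negative near zero and eventually positive.
    lo, hi = 1e-6, 1.0
    while weibull_shape_equation(hi, scaled) < 0:
        hi *= 2.0

    # Bisection: keep g(lo) < 0 <= g(hi) until the bracket is narrow.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if weibull_shape_equation(mid, scaled) < 0:
            lo = mid
        else:
            hi = mid

    k = 0.5 * (lo + hi)
    scale = (sum(x ** k for x in xs) / len(xs)) ** (1.0 / k)
    return k, scale
```

    Calling weibull_mle on a list of positive observations returns the shape
    and scale estimates; bisection is slower than a derivative-based method but
    cannot leave the bracket.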

  650. Model choice versus model criticism.

    Authors: Christian P. Robert, Kerrie L. Mengersen, Carla Chen
    Subjects: Methodology
    Abstract

    The new perspectives on ABC and Bayesian model criticisms presented in
    Ratmann et al. (2009) are challenging standard approaches to Bayesian model
    choice. We discuss here some issues arising from the authors' approach,
    including prior influence, model assessment and criticism, and the meaning of
    error in ABC.

  651. A generalized algorithm for estimating the parameters of a multivariate, coupled diffusion system.

    Authors: Melvin M. Varughese
    Subjects: Methodology
    Abstract

    Diffusion processes are used to model a wide range of real-world phenomena.
    Consequently, the ability to estimate the diffusion parameters from available
    time series data would greatly further our understanding of such phenomena. A
    new algorithm is proposed to estimate the parameters for a Fokker-Planck
    system. The algorithm is applicable for a wide class of multivariate diffusion
    systems. Not only does the method provide reliable parameter estimates in a
    highly computationally efficient manner, it can also produce credibility
    intervals for these parameters.

  652. On the relevance of the Bayesian approach to Statistics.

    Authors: Christian P. Robert
    Subjects: Methodology
    Abstract

    We argue here about the relevance and the ultimate unity of the Bayesian
    approach in a non-conflicting and non-antagonistic manner. Our main theme is
    that Bayesian data analysis is an effective tool for handling complex models,
    as proven by the increasing proportion of Bayesian studies in the applied
    sciences. We disregard in this essay the philosophical debates on the deeper
    meaning of probability and on the random nature of parameters as things of the
    past that do a disservice to the approach and are incomprehensible to most
    bystanders.

  653. Errors-in-variables models: a generalized functions approach.

    Authors: Victoria Zinde-Walsh
    Subjects: Methodology
    Abstract

    Identification in errors-in-variables regression models was recently extended
    to wide classes of models by S. Schennach (Econometrica, 2007) (S) via use of
    generalized functions. In this paper the problems of non- and semiparametric
    identification in such models are re-examined. Nonparametric identification
    holds under weaker assumptions than in (S); the proof here does not rely on
    decomposition of generalized functions into ordinary and singular parts, which
    may not hold.

  654. Nonparametric inference for competing risks current status data with continuous, discrete or grouped observation times.

    Authors: Marloes H. Maathuis, Michael G. Hudgens
    Subjects: Methodology
    Abstract

    New methods and theory have recently been developed to nonparametrically
    estimate cumulative incidence functions for competing risks survival data
    subject to current status censoring. In particular, the limiting distribution
    of the nonparametric maximum likelihood estimator (MLE) and a simplified "naive
    estimator" have been established under certain smoothness conditions. In this
    paper, we establish the large-sample behavior of these estimators in two
    additional models, namely when the observation time distribution has finite
    discrete support and when the observation times are grouped.

  655. Prediction of Ordered Random Effects in a Simple Small Area Model.

    Authors: Yaakov Malinovsky, Yosef Rinott
    Subjects: Methodology
    Abstract

    Prediction of a vector of ordered parameters or part of it arises naturally
    in the context of Small Area Estimation (SAE). For example, one may want to
    estimate the parameters associated with the top ten areas, the best or worst
    area, or a certain percentile. We use a simple SAE model to show that
    estimation of ordered parameters by the corresponding ordered estimates of each
    area separately does not yield good results with respect to MSE.

  656. An Alternating l1 approach to the compressed sensing problem.

    Authors: Stephane Chretien
    Subjects: Methodology
    Abstract

    Compressed sensing is a new methodology for constructing sensors which allow
    sparse signals to be efficiently recovered using only a small number of
    observations. The recovery problem can often be stated as that of finding
    the solution of an underdetermined system of linear equations with the smallest
    possible support. The most studied relaxation of this hard combinatorial
    problem is the $l_1$-relaxation consisting of searching for solutions with
    smallest $l_1$-norm.

  657. Bayesian separation of spectral sources under non-negativity and full additivity constraints.

    Authors: Nicolas Dobigeon, Jean-Yves Tourneret, Said Moussaoui, Cedric Carteret
    Subjects: Methodology
    Abstract

    This paper addresses the problem of separating spectral sources which are
    linearly mixed with unknown proportions. The main difficulty of the problem is
    to ensure the full additivity (sum-to-one) of the mixing coefficients and
    non-negativity of sources and mixing coefficients. A Bayesian estimation
    approach based on Gamma priors was recently proposed to handle the
    non-negativity constraints in a linear mixture model. However, incorporating
    the full additivity constraint requires further developments.

  658. Simultaneous confidence bands for nonparametric regression with repeated measurements data.

    Authors: David A. Degras
    Subjects: Methodology
    Abstract

    We look into nonparametric regression with repeated measurements collected on
    a fine grid. An asymptotic normality result is obtained in a function space.
    This result can be used to build simultaneous confidence bands (SCB) for
    various tasks in statistical exploration, estimation and inference. Two
    applications are proposed: one is a SCB procedure for the regression function
    and the other is a goodness-of-fit test for linear regression models.

  659. Maximum Entropy Estimation for Survey sampling.

    Authors: Fabrice Gamboa, Jean-Michel Loubes, Paul Rochet
    Subjects: Methodology
    Abstract

    Calibration methods have been widely studied in survey sampling over the last
    decades. Viewing calibration as an inverse problem, we extend the calibration
    technique by using a maximum entropy method. Finding the optimal weights is
    achieved by considering random weights and looking for a discrete distribution
    which maximizes an entropy under the calibration constraint. This method
    provides a new framework for computing such estimates and investigating their
    statistical properties.

  660. FDR Control with adaptive procedures and FDR monotonicity.

    Authors: Amit Zeisel, Or Zuk, Eytan Domany
    Subjects: Methodology
    Abstract

    The steep rise in availability and usage of high-throughput technologies in
    biology brought with it a clear need for methods to control the False Discovery
    Rate (FDR) in multiple tests. Benjamini and Hochberg (BH) introduced in 1995 a
    simple procedure and proved that it provided a bound on the expected value, FDR
    < q. Since then, many authors have tried to improve the BH bound, with one
    approach being to design adaptive procedures, which aim at estimating the
    number of true null hypotheses in order to get a better FDR bound.
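
    For reference, a minimal Python sketch of the classical (non-adaptive) BH
    step-up procedure mentioned above. The function name and the default level
    q = 0.05 are assumptions for this example; the adaptive variants studied in
    the paper are not reproduced here.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans, True where the hypothesis is rejected."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Step-up rule: find the largest rank k (1-based) with p_(k) <= k * q / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k_max = rank

    # Reject the k_max hypotheses with the smallest p-values.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected
```

    Note the step-up behaviour: a p-value that misses its own threshold is still
    rejected when a larger p-value meets the threshold at a higher rank.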

  661. Cross-Validation for Unsupervised Learning.

    Authors: Patrick O. Perry
    Subjects: Methodology
    Abstract

    Cross-validation (CV) is a popular method for model-selection. Unfortunately,
    it is not immediately obvious how to apply CV to unsupervised or exploratory
    contexts. This thesis discusses some extensions of cross-validation to
    unsupervised learning, specifically focusing on the problem of choosing how
    many principal components to keep. We introduce the latent factor model, define
    an objective criterion, and show how CV can be used to estimate the intrinsic
    dimensionality of a data set.

  662. Structure Variability in Bayesian Networks.

    Authors: Marco Scutari
    Subjects: Methodology
    Abstract

    The structure of a Bayesian network encodes most of the information about the
    probability distribution of the data, which is uniquely identified given some
    general distributional assumptions. Therefore it's important to study the
    variability of its network structure, which can be used to compare the
    performance of different learning algorithms and to measure the strength of any
    arbitrary subset of arcs.

  663. Estimating migration proportions from discretely observed continuous diffusion processes.

    Authors: V. Calian, G. Stefansson, L. P. Folkow, A.S. Blix
    Subjects: Methodology
    Abstract

    We model discrete observations at two time and space scales using a single
    continuous diffusion process with a time-dependent coefficient. We define new
    parameters for the large scale model as functions of the small scale
    distribution cumulants. We use the non-uniform distribution of the
    observation time intervals to obtain consistent and unbiased estimators for
    these parameters. Closed form expressions for migration proportions between
    spatial domains are derived as functions of these parameters. The models are
    applied to estimate migration patterns from satellite tag data.

  664. Shrinkage Tuning Parameter Selection in Precision Matrices Estimation.

    Authors: Heng Lian
    Subjects: Methodology
    Abstract

    Recent literature provides many computational and modeling approaches for
    covariance matrix estimation in penalized Gaussian graphical models, but
    relatively little work has been done on the choice of the tuning
    parameter. This paper tries to fill this gap by focusing on the problem of
    shrinkage parameter selection when estimating sparse precision matrices using
    the penalized likelihood approach. Previous approaches typically used K-fold
    cross-validation in this regard.

  665. Harold Jeffreys' Theory of Probability revisited: a reply.

    Authors: Judith Rousseau, Christian P. Robert, Nicolas Chopin
    Subjects: Methodology
    Abstract

    We are grateful to all discussants (Bernardo, Gelman, Kass, Lindley, Senn,
    and Zellner) of our re-visitation for their strong support in our enterprise
    and for their overall agreement with our perspective. Further discussions with
    them and other leading statisticians showed that the legacy of Theory of
    Probability is alive and lasting.

  667. Tuning parameter selection for penalized likelihood estimation of inverse covariance matrix.

    Authors: Xin Gao, Daniel Q. Pu, Yuehua Wu, Hong Xu
    Subjects: Methodology
    Abstract

    In a Gaussian graphical model, conditional independence between two
    variables is characterized by the corresponding zero entry in the inverse
    covariance matrix. Maximum likelihood methods using the smoothly clipped
    absolute deviation (SCAD) penalty (Fan and Li, 2001) and the adaptive LASSO
    penalty (Zou, 2006) have been proposed in the literature. In this article, we
    establish the result that using Bayesian information criterion (BIC) to select
    the tuning parameter in penalized likelihood estimation with both types of
    penalties can lead to consistent graphical model selection.

  668. Estimating high-dimensional intervention effects from observational data.

    Authors: Marloes H. Maathuis, Markus Kalisch, Peter Bühlmann
    Subjects: Methodology
    Abstract

    We assume that we have observational data generated from an unknown
    underlying directed acyclic graph (DAG) model. A DAG is typically not
    identifiable from observational data, but it is possible to consistently
    estimate the equivalence class of a DAG. Moreover, for any given DAG, causal
    effects can be estimated using intervention calculus. In this paper, we combine
    these two parts. For each DAG in the estimated equivalence class, we use
    intervention calculus to estimate the causal effects of the covariates on the
    response.

  669. The distribution of the maximum of a first order moving average: the continuous case.

    Authors: Christopher S. Withers, Saralees Nadarajah
    Subjects: Methodology
    Abstract

    We give the distribution of $M_n$, the maximum of a sequence of $n$
    observations from a moving average of order 1. Solutions are first given in
    terms of repeated integrals and then for the case where the underlying
    independent random variables have an absolutely continuous density.

  670. Regularized estimation of large-scale gene association networks using graphical Gaussian models.

    Authors: Nicole Kraemer, Juliane Schaefer, Anne-Laure Boulesteix
    Subjects: Methodology
    Abstract

    Graphical Gaussian models are popular tools for the estimation of
    (undirected) gene association networks from microarray data. A key issue when
    the number of variables greatly exceeds the number of samples is the estimation
    of the matrix of partial correlations. Since the (Moore-Penrose) inverse of the
    sample covariance matrix leads to poor estimates in this scenario, standard
    methods are inappropriate and adequate regularization techniques are needed.

  671. Bayesian orthogonal component analysis for sparse representation.

    Authors: Nicolas Dobigeon, Jean-Yves Tourneret
    Subjects: Methodology
    Abstract

    This paper addresses the problem of identifying a lower dimensional space
    where observed data can be sparsely represented. This under-complete dictionary
    learning task can be formulated as a blind separation problem of sparse sources
    linearly mixed with an unknown orthogonal mixing matrix. This issue is
    formulated in a Bayesian framework. First, the unknown sparse sources are
    modeled as Bernoulli-Gaussian processes. To promote sparsity, a weighted
    mixture of an atom at zero and a Gaussian distribution is proposed as prior
    distribution for the unobserved sources.

  672. Decentralized Sequential Hypothesis Testing using Asynchronous Communication.

    Authors: Georgios Fellouris, George V. Moustakides
    Subjects: Methodology
    Abstract

    We present a test for the problem of decentralized sequential hypothesis
    testing, which is asymptotically optimum. By selecting a suitable sampling
    mechanism at each sensor, communication between sensors and fusion center is
    asynchronous and limited to 1-bit data. The proposed SPRT-like test turns out
    to be order-2 asymptotically optimum in the case of continuous time and
    continuous path signals, while in discrete time this strong asymptotic
    optimality property is preserved under proper conditions. If these conditions
    do not hold, then we can show optimality of order-1.

  673. Self-consistent method for density estimation.

    Authors: Alberto Bernacchia, Simone Pigolotti
    Subjects: Methodology
    Abstract

    The estimation of a density profile from experimental data points is a
    challenging problem, usually tackled by plotting a histogram. Prior assumptions
    on the nature of the density, from its smoothness to the specification of its
    form, allow the design of accurate estimation procedures, such as Maximum
    Likelihood. Our aim is to construct a procedure that makes the smallest
    possible number of assumptions while still providing an accurate estimate of
    the density.

  674. Learning networks from high dimensional binary data: An application to genomic instability data.

    Authors: Pei Wang, Dennis L. Chao, Li Hsu
    Subjects: Methodology
    Abstract

    Genomic instability, the propensity of aberrations in chromosomes, plays a
    critical role in the development of many diseases. High throughput genotyping
    experiments have been performed to study genomic instability in diseases. The
    output of such experiments can be summarized as high dimensional binary
    vectors, where each binary variable records aberration status at one marker
    locus. It is of keen interest to understand how these aberrations interact with
    each other. In this paper, we propose a novel method, LogitNet, to
    infer the interactions among aberration events.

  675. A nonparametric independence test using random permutations.

    Authors: Jesus E. Garcia, Veronica A. Gonzalez-Lopez
    Subjects: Methodology
    Abstract

    We propose a new nonparametric test for the supposition of independence
    between two continuous random variables. The test is based on the size of the
    longest increasing subsequence of a random permutation. We identify the
    independence assumption between the two continuous variables with the space of
    permutations equipped with the uniform distribution, and we show the exact
    distribution of the statistic. We calculate the distribution for several sample
    sizes.
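
    The statistic named above can be computed as in the following Python
    sketch. The way the permutation is built from a paired sample (ranks of y
    after sorting by x) is an assumption made for illustration, as are the
    function names; the paper's exact construction and null distribution are
    not reproduced here.

```python
from bisect import bisect_left


def lis_length(seq):
    """Length of the longest strictly increasing subsequence, O(n log n)."""
    # tails[j] is the smallest possible last element of an increasing
    # subsequence of length j + 1 found so far.
    tails = []
    for x in seq:
        pos = bisect_left(tails, x)
        if pos == len(tails):
            tails.append(x)   # x extends the longest subsequence so far
        else:
            tails[pos] = x    # x gives a smaller tail for length pos + 1
    return len(tails)


def permutation_from_sample(xs, ys):
    """Ranks of ys listed in increasing order of xs (assumes no ties)."""
    by_x = sorted(range(len(xs)), key=lambda i: xs[i])
    by_y = sorted(range(len(ys)), key=lambda i: ys[i])
    y_rank = {i: r for r, i in enumerate(by_y)}
    return [y_rank[i] for i in by_x]
```

    For continuous variables ties occur with probability zero, so under
    independence the permutation returned by permutation_from_sample is
    uniformly distributed and lis_length of it is the test statistic.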

  676. Estimation of Ambiguity Functions With Limited Spread.

    Authors: Heidi Hindberg, Sofia C. Olhede
    Subjects: Methodology
    Abstract

    This paper proposes a new estimation procedure for the ambiguity function of
    a non-stationary time series. The stochastic properties of the empirical
    ambiguity function calculated from a single sample in time are derived.
    Different thresholding procedures are introduced for the estimation of the
    ambiguity function. Such estimation methods are suitable if the ambiguity
    function is only non-negligible in a limited region of the ambiguity plane.

  677. Nonparametric estimation of the volatility function in a high-frequency model corrupted by noise.

    Authors: Axel Munk, Johannes Schmidt-Hieber
    Subjects: Methodology
    Abstract

    We consider the models
    $Y_{i,n}=\int_0^{i/n}\sigma(s)dW_s+\tau(i/n)\epsilon_{i,n}$ and
    $\tilde Y_{i,n}=\sigma(i/n)W_{i/n}+\tau(i/n)\epsilon_{i,n}$, $i=1,...,n$,
    where $W_t$ denotes a standard Brownian motion and $\epsilon_{i,n}$ are
    centered i.i.d. random variables with $E(\epsilon_{i,n}^2)=1$ and finite
    fourth moment. Furthermore, $\sigma$ and $\tau$ are unknown deterministic
    functions and $W_t$ and $(\epsilon_{1,n},...,\epsilon_{n,n})$ are assumed to
    be independent processes. Based on a spectral decomposition of the covariance
    structures we derive series estimators for $\sigma^2$ and $\tau^2$ and
    investigate t

  678. Approximation of Average Run Length of Moving Sum Algorithms Using Multivariate Probabilities.

    Authors: Swarnendu Kar, Kishan G. Mehrotra, Pramod K. Varshney
    Subjects: Methodology
    Abstract

    Among the various procedures used to detect potential changes in a stochastic
    process, the moving sum algorithms are very popular due to their intuitive
    appeal and good statistical performance. One of the important design parameters
    of a change detection algorithm is the expected interval between false
    positives, also known as the average run length (ARL). Computation of the ARL
    usually involves numerical procedures but in some cases it can be approximated
    using a series involving multivariate probabilities.
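
    To make the ARL concrete, here is a small Python sketch of a moving sum
    detector together with a crude Monte Carlo estimate of its ARL. It is not
    the series approximation developed in the paper. The standard Gaussian
    in-control model, the function names, and the run and horizon sizes are all
    assumptions for this example.

```python
import random


def first_alarm_time(observations, window, threshold):
    """1-based time at which the moving sum first exceeds threshold, else None."""
    running = 0.0
    for t, x in enumerate(observations, start=1):
        running += x
        if t > window:
            # Drop the observation that just left the window.
            running -= observations[t - window - 1]
        if t >= window and running > threshold:
            return t
    return None


def estimate_arl(window, threshold, n_runs=100, horizon=2000, seed=0):
    """Monte Carlo ARL under i.i.d. N(0, 1) noise (no change present)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_runs):
        xs = [rng.gauss(0.0, 1.0) for _ in range(horizon)]
        t = first_alarm_time(xs, window, threshold)
        # Runs without an alarm are censored at the horizon, which biases
        # the estimate downward when the true ARL is close to the horizon.
        total += t if t is not None else horizon
    return total / n_runs
```

    Simulation of this kind becomes expensive for large thresholds, which is
    one reason an analytical approximation of the ARL is attractive.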

  679. Convergence of Nonparametric Long-Memory Phase I Designs.

    Authors: Assaf P. Oron, Peter D. Hoff
    Subjects: Methodology
    Abstract

    We examine Phase I cancer clinical trial designs that use toxicity estimates
    based on all available data at each dose-allocation decision, but refrain from
    employing parametric models or Bayesian decision rules. We show that one such
    design family, called here "interval designs", converges almost surely to the
    maximum tolerated dose under fairly general conditions. Another family called
    "point designs" does not converge.

  680. Sequential Quantile Prediction of Time Series.

    Authors: Gérard Biau, Benoît Patra
    Subjects: Methodology
    Abstract

    Motivated by a broad range of potential applications, we address the quantile
    prediction problem of real-valued time series. We present a sequential quantile
    forecasting model based on the combination of a set of elementary nearest
    neighbor-type predictors called "experts" and show its consistency under a
    minimum of conditions. Our approach builds on the methodology developed in
    recent years for prediction of individual sequences and exploits the quantile
    structure as a minimizer of the so-called pinball loss function.
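
    As a brief illustration of the last point, the Python sketch below defines
    the pinball loss and recovers an empirical quantile by minimizing the total
    loss over the sample points. The function names are assumptions for this
    example, and the expert-aggregation forecaster of the paper is not
    implemented.

```python
def pinball_loss(y, q, tau):
    """Pinball (quantile) loss of predicting q for outcome y at level tau."""
    diff = y - q
    # Under-prediction is weighted by tau, over-prediction by (1 - tau).
    return tau * diff if diff >= 0 else (tau - 1) * diff


def empirical_quantile_by_pinball(sample, tau):
    """Sample point minimizing the summed pinball loss (first one if tied)."""
    return min(sample, key=lambda q: sum(pinball_loss(y, q, tau) for y in sample))
```

    With tau = 0.5 the loss is half the absolute error, so the minimizer is a
    sample median.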
