Applications

  1. Wiki surveys: Open and quantifiable social data collection.

    Authors: Matthew J. Salganik, Karen E. C. Levy
    Subjects: Applications
    Abstract

    Research about attitudes and opinions is central to social science and relies
    on two common methodological approaches: surveys and interviews. While surveys
    enable the quantification of large amounts of information quickly and at a
    reasonable cost, they are routinely criticized for being "top-down" and rigid.
    In contrast, interviews allow unanticipated information to "bubble up" directly
    from respondents, but are slow, expensive, and difficult to quantify.

  2. Partial Sliced Inverse Regression for Quality-Relevant Multivariate Statistical Process Monitoring.

    Authors: Yue Yu, Zhijie Sun
    Subjects: Applications
    Abstract

    This paper introduces a popular dimension reduction method, sliced inverse
    regression (SIR), into multivariate statistical process monitoring. It provides an
    extension of SIR for the single-index model by adopting the idea from partial
    least squares (PLS). Our partial sliced inverse regression (PSIR) method has
    the merit of incorporating information from both predictors (x) and responses
    (y), and it is capable of handling large, nonlinear, or "n<p" datasets.

  3. P-values, q-values and posterior probabilities for equivalence in genomics studies.

    Authors: J. Tuke, G. F. V. Glonek, P. J. Solomon
    Subjects: Applications
    Abstract

    Equivalence testing is of emerging importance in genomics studies but has
    hitherto been little studied in this context. In this paper, we define the
    notion of equivalence of gene expression and determine a "strength of evidence"
    measure for gene equivalence. It is common practice in genome-wide studies to
    rank genes according to observed gene-specific P-values or adjusted P-values,
    which are assumed to measure the strength of evidence against the null
    hypothesis of no differential gene expression.

  4. Propensity score matching in SPSS.

    Authors: Felix Thoemmes
    Subjects: Applications
    Abstract

    Propensity score matching is a tool for causal inference in non-randomized
    studies that allows for conditioning on large sets of covariates. The use of
    propensity scores in the social sciences is currently experiencing a tremendous
    increase; however, it is far from a commonly used tool. One impediment towards a
    more widespread use of propensity score methods is the reliance on specialized
    software, because many social scientists still use SPSS as their main analysis
    tool. The current paper presents an implementation of various propensity score
    matching methods in SPSS.
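
    As a rough illustration of the matching step described above, the sketch
    below performs greedy 1:1 nearest-neighbor matching without replacement on
    propensity scores that are assumed to have been estimated already. The
    function name `match_nearest`, the caliper value, and the scores are
    hypothetical; this is not the SPSS implementation presented in the paper.

```python
def match_nearest(treated, control, caliper=0.1):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    treated, control: dicts mapping unit id -> estimated propensity score.
    Returns a list of (treated_id, control_id) pairs.
    """
    available = dict(control)  # controls that have not been matched yet
    pairs = []
    # Process treated units in a fixed order so the result is reproducible.
    for t_id in sorted(treated):
        if not available:
            break
        score = treated[t_id]
        # Closest remaining control on the propensity score.
        c_id = min(available, key=lambda c: abs(available[c] - score))
        # Accept the match only if it lies within the caliper.
        if abs(available[c_id] - score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]
    return pairs


# Hypothetical propensity scores (illustrative numbers only).
treated = {"t1": 0.80, "t2": 0.55, "t3": 0.30}
control = {"c1": 0.78, "c2": 0.50, "c3": 0.52, "c4": 0.05}
pairs = match_nearest(treated, control, caliper=0.1)
print(pairs)
```

    Greedy matching depends on the order in which treated units are processed;
    unit t3 stays unmatched here because no remaining control lies within the
    caliper.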

  5. Technical Report #SEHIR-IE-VA-12-1: Optimal Obstacle Placement with Disambiguations.

    Authors: Elvan Ceyhan, Vural Aksakalli
    Subjects: Applications
    Abstract

    We introduce the optimal obstacle placement with disambiguations problem
    wherein the goal is to place true obstacles in an environment cluttered with
    false obstacles so as to maximize the total traversal length of a navigating
    agent (NAVA). Prior to the traversal, NAVA is given location information and
    probabilistic estimates of each disk-shaped hindrance (hereinafter referred to
    as disk) being a true obstacle. The NAVA can disambiguate a disk's status only
    when situated on its boundary. There exists an obstacle placing agent (OPA)
    that locates obstacles prior to NAVA's traversal.

  6. Modelling the effects of air pollution on health using Bayesian Dynamic Generalised Linear Models.

    Authors: Gavin Shaddick, Duncan Lee
    Subjects: Applications
    Abstract

    The relationship between short-term exposure to air pollution and mortality
    or morbidity has been the subject of much recent research, in which the
    standard method of analysis uses Poisson linear or additive models.

  7. Signal extraction and breakpoint identification for array CGH data using robust state space model.

    Authors: Bin Zhu, Jeremy M. G. Taylor, Peter X.-K. Song
    Subjects: Applications
    Abstract

    Array comparative genomic hybridization (CGH) is a high-resolution technique
    to assess DNA copy number variation. Identifying breakpoints where copy number
    changes will enhance the understanding of the pathogenesis of human diseases,
    such as cancers. However, the biological variation and experimental errors
    contained in array CGH data may lead to false positive identification of
    breakpoints.

  8. Fossil fuel consumption and economic growth: causality relationship in the world.

    Authors: Hazuki Ishida
    Subjects: Applications
    Abstract

    Fossil fuels are major sources of energy, and have several advantages over
    other primary energy sources. Without extensive dependence on fossil fuels, it
    is questionable whether our economic prosperity can continue or not. This paper
    analyzes cointegration and causality between fossil fuel consumption and
    economic growth in the world over the period 1971--2008. The estimation results
    indicate that fossil fuel consumption and GDP are cointegrated and there exists
    long-run unidirectional causality from fossil fuel consumption to GDP.

  9. Database likelihood ratios and familial DNA searching.

    Authors: Ronald Meester, Klaas Slooten
    Subjects: Applications
    Abstract

    Familial Searching is the process of searching in a DNA database for
    relatives of a given individual. It is well known that in order to evaluate the
    genetic evidence in favour of a certain given form of relatedness between two
    individuals, one needs to calculate the appropriate likelihood ratio, which is
    in this context called a Kinship Index. Suppose that the database contains, for
    a given type of relative, at most one related individual.

  10. Dynamic Decision Making for Graphical Models Applied to Oil Exploration.

    Authors: Gabriele Martinelli, Jo Eidsvik, Ragnar Hauge
    Subjects: Applications
    Abstract

    We present a framework for sequential decision making in problems described
    by graphical models. The setting is given by dependent discrete random
    variables with associated costs or revenues. In our examples, the dependent
    variables are the potential outcomes (oil, gas or dry) when drilling a
    petroleum well. The goal is to develop an optimal selection strategy that
    incorporates a chosen utility function within an approximated dynamic
    programming scheme.

  11. Reliability-based design optimization of imperfect shells using adaptive kriging meta-models.

    Authors: Vincent Dubourg, Bruno Sudret, Jean-Marc Bourinet
    Subjects: Applications
    Abstract

    The optimal and robust design of structures has gained much attention in the
    past ten years due to the ever increasing need for manufacturers to build
    robust systems at the lowest cost. Reliability-based design optimization (RBDO)
    allows the analyst to minimize some cost function while ensuring some minimal
    performances cast as admissible probabilities of failure for a set of
    performance functions. In order to address real-world problems in which the
    performance is assessed through computational models (e.g.

  12. Confidence bounds for the sensitivity lack of a less specific diagnostic test, without gold standard.

    Authors: Lutz Mattner, Frauke Mattner
    Subjects: Applications
    Abstract

    We consider the problem of comparing two diagnostic tests based on a sample
    of paired test results without true state determinations, in cases where the
    second test can reasonably be assumed to be at least as specific as the first.
    For such cases, we provide two informative confidence bounds: A lower one for
    the prevalence times the sensitivity gain of the second test with respect to
    the first, and an upper one for the sensitivity of the first test. Neither
    conditional independence of the two tests nor perfectness of any of them needs
    to be assumed.

  13. Covariance Eigenvector Sparsity for Compression and Denoising.

    Authors: Georgios B. Giannakis, Ioannis D. Schizas
    Subjects: Applications
    Abstract

    Sparsity in the eigenvectors of signal covariance matrices is exploited in
    this paper for compression and denoising. Dimensionality reduction (DR) and
    quantization modules present in many practical compression schemes such as
    transform codecs, are designed to capitalize on this form of sparsity and
    achieve improved reconstruction performance compared to existing
    sparsity-agnostic codecs.

  14. Comment on `Large-scale dynamic gene regulatory network inference combining differential equation models with local dynamic Bayesian network analysis'.

    Authors: Sach Mukherjee, Chris J. Oates, Steven M. Hill
    Subjects: Applications
    Abstract

    Recently, Li et al. (Bioinformatics 27(19), 2686-91, 2011) proposed a method,
    called Differential Equation-based Local Dynamic Bayesian Network (DELDBN), for
    reverse engineering gene regulatory networks from time-course data. We commend
    the authors for an interesting paper that draws attention to the close
    relationship between dynamic Bayesian networks (DBNs) and differential
    equations (DEs). Their central claim is that modifying a DBN to model Euler
    approximations to the gradient rather than expression levels themselves is
    beneficial for network inference.

  15. Futures pricing in electricity markets based on stable CARMA spot models.

    Authors: Claudia Klüppelberg, Fred Espen Benth, Gernot Müller, Linda Vos
    Subjects: Applications
    Abstract

    We present a new model for the electricity spot price dynamics, which is able
    to capture seasonality, low-frequency dynamics and the extreme spikes in the
    market. Instead of the usual purely deterministic trend we introduce a
    non-stationary independent increments process for the low-frequency dynamics,
    and model the large fluctuations by a non-Gaussian stable CARMA process. The
    model allows for analytic futures prices, and we apply these to model and
    estimate the whole market consistently.

  16. Multidimensional Wavelet-based Regularized Reconstruction for Parallel Acquisition in Neuroimaging.

    Authors: Lotfi Chaari, Jean-Christophe Pesquet, Philippe Ciuciu, Sébastien Mériaux, Solveig Badillo
    Subjects: Applications
    Abstract

    Parallel MRI is a fast imaging technique that enables the acquisition of
    highly resolved images in space and/or in time. The performance of parallel
    imaging strongly depends on the reconstruction algorithm, which can proceed
    either in the original k-space (GRAPPA, SMASH) or in the image domain
    (SENSE-like methods). To improve the performance of the widely used SENSE
    algorithm, 2D- or slice-specific regularization in the wavelet domain has been
    deeply investigated.

  17. Economic Determinants of Happiness.

    Authors: Teng Guo, Lingyi Hu
    Subjects: Applications
    Abstract

    Many scholars have recently begun to dispute the assumed link between
    individual wellbeing and economic conditions and the extent to which the latter
    matters (Easterlin, 1995; Stevenson and Wolfers 2008; Tella and MacCulloch
    2008). This dilemma is empirically demonstrated in the Latin America Public
    Opinion Project (LAPOP, 2011), which surveyed North and Latin America in terms
    of perceived life satisfaction. Higher measures found in the less developed
    countries of Brazil, Costa Rica, and Panama than in North America pose an
    intriguing quandary to traditional economic theory.

  18. BATMAN-an R package for the automated quantification of metabolites from NMR spectra using a Bayesian Model.

    Authors: Maria De Iorio, William Astle, Timothy Ebbels, Jie Hao
    Subjects: Applications
    Abstract

    Motivation: NMR spectra are widely used in metabolomics to obtain metabolite
    profiles in complex biological mixtures. Common methods used to assign and
    estimate concentrations of metabolites involve either an expert manual peak
    fitting or extra pre-processing steps, such as peak alignment and binning. Peak
    fitting is very time consuming and is subject to human error. Conversely,
    alignment and binning can introduce artifacts and limit immediate biological
    interpretation of models.

  19. Variational approximation for mixtures of linear mixed models.

    Authors: David J. Nott, Siew Li Tan
    Subjects: Applications
    Abstract

    Mixtures of linear mixed models (MLMMs) are useful for clustering grouped
    data in applications such as gene expression time course experiments. These
    models can be estimated by likelihood maximization through the EM algorithm and
    the optimal number of components determined by comparing different mixture
    models using penalized log-likelihood criteria such as BIC. In this paper, we
    propose fitting MLMMs with variational methods which can perform parameter
    estimation and model selection simultaneously.

  20. Spatially-explicit models for inference about density in unmarked populations.

    Authors: Richard B. Chandler, J. Andrew Royle
    Subjects: Applications
    Abstract

    Spatial capture-recapture (SCR) methods represent a major advance over
    traditional capture-recapture methods because they yield explicit estimates of
    animal density instead of population size within an unknown area, and they
    account for heterogeneity in capture probability arising from the juxtaposition
    of individuals and sample locations. However, the requirement that all
    individuals can be uniquely identified excludes their use in many contexts.

  21. A Bayesian Joinpoint regression model with an unknown number of break-points.

    Authors: Miguel A. Martinez-Beneito, Gonzalo García-Donato, Diego Salmerón
    Subjects: Applications
    Abstract

    Joinpoint regression is used to determine the number of segments needed to
    adequately explain the relationship between two variables. This methodology can
    be widely applied to real problems, but we focus on epidemiological data, the
    main goal being to uncover changes in the mortality time trend of a specific
    disease under study. Traditionally, Joinpoint regression problems have paid
    little or no attention to the quantification of uncertainty in the estimation
    of the number of change-points. In this context, we found a satisfactory way to
    handle the problem in the Bayesian methodology.

  22. Multivariate integer-valued autoregressive models applied to earthquake counts.

    Authors: Mathieu Boudreault, Arthur Charpentier
    Subjects: Applications
    Abstract

    In various situations in the insurance industry, in finance, in epidemiology,
    etc., one needs to represent the joint evolution of the number of occurrences
    of an event. In this paper, we present a multivariate integer-valued
    autoregressive (MINAR) model, derive its properties and apply the model to
    earthquake occurrences across various pairs of tectonic plates. The model is an
    extension of Pedeli & Karlis (2011) where cross autocorrelation (spatial
    contagion in a seismic context) is considered.
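
    To make the integer-valued autoregressive idea concrete, the sketch below
    simulates a univariate INAR(1) process, X_t = alpha o X_{t-1} + eps_t,
    where "o" denotes binomial thinning and eps_t is Poisson. This is only the
    one-dimensional building block under assumed parameter values; it is not
    the multivariate MINAR model of the paper, and all function names are
    illustrative.

```python
import math
import random


def poisson(rng, lam):
    """Draw a Poisson(lam) variate with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def thin(rng, count, alpha):
    """Binomial thinning: each of `count` events survives with probability alpha."""
    return sum(1 for _ in range(count) if rng.random() < alpha)


def simulate_inar1(n, alpha, lam, seed=0):
    """Simulate X_t = alpha o X_{t-1} + eps_t with eps_t ~ Poisson(lam)."""
    rng = random.Random(seed)
    # Start from Poisson(lam / (1 - alpha)), whose mean is the stationary mean.
    x = poisson(rng, lam / (1.0 - alpha))
    path = [x]
    for _ in range(n - 1):
        # Survivors of the previous count plus new arrivals.
        x = thin(rng, x, alpha) + poisson(rng, lam)
        path.append(x)
    return path


# Assumed parameter values, chosen only for illustration.
series = simulate_inar1(n=200, alpha=0.5, lam=2.0)
print(series[:10], sum(series) / len(series))
```

    With these values the stationary mean is lam / (1 - alpha) = 4, so the
    sample mean of a long simulated path should be close to 4.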

  23. Weighted KS Statistics for Inference on Conditional Moment Inequalities.

    Authors: Timothy B. Armstrong
    Subjects: Applications
    Abstract

    This paper proposes confidence regions for the identified set in conditional
    moment inequality models using Kolmogorov-Smirnov statistics with a truncated
    inverse variance weighting with increasing truncation points. The new weighting
    differs from those proposed in the literature in two important ways. First,
    confidence regions based on KS tests with the weighting function I propose
    converge to the identified set at a faster rate than existing procedures based
    on bounded weight functions in a broad class of models.

  24. Asymptotically Exact Inference in Conditional Moment Inequality Models.

    Authors: Timothy B. Armstrong
    Subjects: Applications
    Abstract

    This paper derives the rate of convergence and asymptotic distribution for a
    class of Kolmogorov-Smirnov style test statistics for conditional moment
    inequality models for parameters on the boundary of the identified set under
    general conditions. In contrast to other moment inequality settings, the rate
    of convergence is faster than root-$n$, and the asymptotic distribution depends
    entirely on nonbinding moments. The results require the development of new
    techniques that draw a connection between moment selection, irregular
    identification, bandwidth selection and nonstandard M-estimation.

  25. Network Inference and Biological Dynamics.

    Authors: Chris J. Oates, Sach Mukherjee
    Subjects: Applications
    Abstract

    Network inference approaches are now widely used in biological applications
    to probe regulatory relationships between molecular components such as genes or
    proteins. Many methods have been proposed for this setting, but the connections
    and differences between their statistical formulations have received less
    attention. In this paper, we show how a broad class of statistical network
    inference methods, including a number of existing approaches, can be described
    in terms of variable selection for the linear model.

  26. Spatio-temporal Compressed Sensing with Coded Apertures and Keyed Exposures.

    Authors: Zachary T. Harmany, Roummel F. Marcia, Rebecca M. Willett
    Subjects: Applications
    Abstract

    Optical systems which measure independent random projections of a scene
    according to compressed sensing (CS) theory face a myriad of practical
    challenges related to the size of the physical platform, photon efficiency, the
    need for high temporal resolution, and fast reconstruction in video settings.
    This paper describes a coded aperture and keyed exposure approach to
    compressive measurement in optical systems.

  27. Semiparametric modeling of autonomous nonlinear dynamical systems with application to plant growth.

    Authors: Prabir Burman, Jie Peng, Debashis Paul
    Subjects: Applications
    Abstract

    We propose a semiparametric model for autonomous nonlinear dynamical systems
    and devise an estimation procedure for model fitting. This model incorporates
    subject-specific effects and can be viewed as a nonlinear semiparametric mixed
    effects model. We also propose a computationally efficient model selection
    procedure. We show by simulation studies that the proposed estimation as well
    as model selection procedures can efficiently handle sparse and noisy
    measurements.

  28. Spatial modeling of extreme snow depth.

    Authors: Anthony C. Davison, Juliette Blanchet
    Subjects: Applications
    Abstract

    The spatial modeling of extreme snow is important for adequate risk
    management in Alpine and high altitude countries. A natural approach to such
    modeling is through the theory of max-stable processes, an infinite-dimensional
    extension of multivariate extreme value theory. In this paper we describe the
    application of such processes in modeling the spatial dependence of extreme
    snow depth in Switzerland, based on data for the winters 1966--2008 at 101
    stations.

  29. Efficient methods for sampling spike trains in networks of coupled neurons.

    Authors: Liam Paninski, Yuriy Mishchenko
    Subjects: Applications
    Abstract

    Monte Carlo approaches have recently been proposed to quantify connectivity
    in neuronal networks. The key problem is to sample from the conditional
    distribution of a single neuronal spike train, given the activity of the other
    neurons in the network. Dependencies between neurons are usually relatively
    weak; however, temporal dependencies within the spike train of a single neuron
    are typically strong. In this paper we develop several specialized
    Metropolis--Hastings samplers which take advantage of this dependency
    structure.

  30. On Bayesian "central clustering": Application to landscape classification of Western Ghats.

    Authors: Sabyasachi Mukhopadhyay, Sourabh Bhattacharya, Kajal Dihidar
    Subjects: Applications
    Abstract

    Landscape classification of the well-known biodiversity hotspot, Western
    Ghats (mountains), on the west coast of India, is an important part of a
    world-wide program of monitoring biodiversity. To this end, a massive
    vegetation data set, consisting of 51,834 4-variate observations has been
    clustered into different landscapes by Nagendra and Gadgil [Current Sci. 75
    (1998) 264--271]. But a study of such importance may be affected by
    nonuniqueness of cluster analysis and the lack of methods for quantifying
    uncertainty of the clusterings obtained.

  31. A space--time varying coefficient model: The equity of service accessibility.

    Authors: Nicoleta Serban
    Subjects: Applications
    Abstract

    Research in examining the equity of service accessibility has emerged as
    economic and social equity advocates recognized that where people live
    influences their opportunities for economic development, access to quality
    health care and political participation. In this research paper, service
    accessibility equity is concerned with where and when services have been and
    are accessed by different groups of people, identified by location or
    underlying socioeconomic variables.

  32. A method for visual identification of small sample subgroups and potential biomarkers.

    Authors: Magnus Fontes, Charlotte Soneson
    Subjects: Applications
    Abstract

    In order to find previously unknown subgroups in biomedical data and generate
    testable hypotheses, visually guided exploratory analysis can be of tremendous
    importance. In this paper we propose a new dissimilarity measure that can be
    used within the Multidimensional Scaling framework to obtain a joint
    low-dimensional representation of both the samples and variables of a
    multivariate data set, thereby providing an alternative to conventional
    biplots.

  33. Estimating principal components of covariance matrices using the Nyström method.

    Authors: Patrick J. Wolfe, Nicholas Arcolano
    Subjects: Applications
    Abstract

    Covariance matrix estimates are an essential part of many signal processing
    algorithms, and are often used to determine a low-dimensional principal
    subspace via their spectral decomposition. However, exact eigenanalysis is
    computationally intractable for sufficiently high-dimensional matrices, and in
    the case of small sample sizes, sample eigenvalues and eigenvectors are known
    to be poor estimators of their true counterparts. To address these issues, we
    propose a covariance estimator that is computationally efficient while also
    performing shrinkage on the sample eigenvalues.

  34. Current Trends in Evolving Specialization in UK Universities.

    Authors: Fionn Murtagh
    Subjects: Applications
    Abstract

    There are very significant changes taking place in the university sector and
    in related higher education institutes in many parts of the world. In this work
    we look at financial data from 2010 and 2011 from the UK higher education
    sector. Situating ourselves to begin with in the context of teaching versus
    research in universities, we look at the data in order to explore the new
    divergence between the broad agendas of teaching and research in universities.
    The innovation agenda has become at least equal to the research and teaching
    objectives of universities.

  35. Modelling comonotonic group-life under dependent decrement causes.

    Authors: Dabuxilatu Wang
    Subjects: Applications
    Abstract

    Comonotonicity is an extreme case of dependency between random
    variables. This article considers an extension of the single-life model under
    multiple dependent decrement causes to the case of comonotonic group-life.

  36. Feature selection for high-dimensional integrated data.

    Authors: Charles Zheng, Scott Schwartz, Robert Chapkin, Raymond Carroll, Ivan Ivanov
    Subjects: Applications
    Abstract

    Motivated by the problem of identifying correlations between genes or
    features of two related biological systems, we propose a model of feature
    selection in which only a subset of the predictors $X_t$ are dependent on the
    multidimensional variate $Y$, and the remainder of the predictors constitute a
    "noise set" $X_u$ independent of $Y$.

  37. Block-based Bayesian epistasis association mapping with application to WTCCC type 1 diabetes data.

    Authors: Jun S. Liu, Jing Zhang, Yu Zhang
    Subjects: Applications
    Abstract

    Interactions among multiple genes across the genome may contribute to the
    risks of many complex human diseases. Whole-genome single nucleotide
    polymorphisms (SNPs) data collected for many thousands of SNP markers from
    thousands of individuals under the case--control design promise to shed light
    on our understanding of such interactions. However, nearby SNPs are highly
    correlated due to linkage disequilibrium (LD) and the number of possible
    interactions is too large for exhaustive evaluation.

  38. Estimating within-household contact networks from egocentric data.

    Authors: Mark S. Handcock, Gail E. Potter, Ira M. Longini Jr., M. Elizabeth Halloran
    Subjects: Applications
    Abstract

    Acute respiratory diseases are transmitted over networks of social contacts.
    Large-scale simulation models are used to predict epidemic dynamics and
    evaluate the impact of various interventions, but the contact behavior in these
    models is based on simplistic and strong assumptions which are not informed by
    survey data. These assumptions are also used for estimating transmission
    measures such as the basic reproductive number and secondary attack rates.
    Development of methodology to infer contact networks from survey data could
    improve these models and estimation methods.

  39. Adverse Subpopulation Regression for Multivariate Outcomes with High-Dimensional Predictors.

    Authors: Bin Zhu, David B. Dunson, Allison E. Ashley-Koch
    Subjects: Applications
    Abstract

    Biomedical studies have a common interest in assessing relationships between
    multiple related health outcomes and high-dimensional predictors. For example,
    in reproductive epidemiology, one may collect pregnancy outcomes such as length
    of gestation and birth weight and predictors such as single nucleotide
    polymorphisms in multiple candidate genes and environmental exposures. In such
    settings, there is a need for simple yet flexible methods for selecting true
    predictors of adverse health responses from a high-dimensional set of candidate
    predictors.

  40. Generalized Admixture Mapping for Complex Traits.

    Authors: Bin Zhu, David B. Dunson, Allison E. Ashley-Koch
    Subjects: Applications
    Abstract

    Admixture mapping is a popular tool to identify regions of the genome
    associated with traits in a recently admixed population. Existing methods have
    been developed primarily for identification of a single locus influencing a
    dichotomous trait within a case-control study design. We propose a generalized
    admixture mapping (GLEAM) approach, a flexible and powerful regression method
    for both quantitative and qualitative traits, which is able to test for
    association between the trait and local ancestries in multiple loci
    simultaneously and adjust for covariates.

  41. Generalized genetic association study with samples of related individuals.

    Authors: Xin Gao, Zeny Feng, William W. L. Wong, Flavio Schenkel
    Subjects: Applications
    Abstract

    Genetic association study is an essential step to discover genetic factors
    that are associated with a complex trait of interest. In this paper we present
    a novel generalized quasi-likelihood score (GQLS) test that is suitable for a
    study with either a quantitative trait or a binary trait. We use a logistic
    regression model to link the phenotypic value of the trait to the distribution
    of allelic frequencies. In our model, the allele frequencies are treated as a
    response and the trait is treated as a covariate that allows us to leave the
    distribution of the trait values unspecified.

  42. Risk prediction for prostate cancer recurrence through regularized estimation with simultaneous adjustment for nonlinear clinical effects.

    Authors: Qi Long, Matthias Chung, Carlos S. Moreno, Brent A. Johnson
    Subjects: Applications
    Abstract

    In biomedical studies it is of substantial interest to develop risk
    prediction scores using high-dimensional data such as gene expression data for
    clinical endpoints that are subject to censoring. In the presence of
    well-established clinical risk factors, investigators often prefer a procedure
    that also adjusts for these clinical variables. While accelerated failure time
    (AFT) models are a useful tool for the analysis of censored outcome data, they
    assume that covariate effects on the logarithm of time-to-event are linear,
    which is often unrealistic in practice.

  43. Incorporating biological information into linear models: A Bayesian approach to the selection of pathways and genes.

    Authors: Francesco C. Stingo, Yian A. Chen, Marina Vannucci, Mahlet G. Tadesse
    Subjects: Applications
    Abstract

    The vast amount of biological knowledge accumulated over the years has
    allowed researchers to identify various biochemical interactions and define
    different families of pathways. There is an increased interest in identifying
    pathways and pathway elements involved in particular biological processes. Drug
    discovery efforts, for example, are focused on identifying biomarkers as well
    as pathways related to a disease. We propose a Bayesian model that addresses
    this question by incorporating information on pathways and gene networks in the
    analysis of DNA microarray data.

  44. Cooperative Sequential Spectrum Sensing Based on Level-triggered Sampling.

    Authors: Xiaodong Wang, Yasin Yilmaz, George Moustakides
    Subjects: Applications
    Abstract

    We propose a new framework for cooperative spectrum sensing in cognitive
    radio networks, that is based on a novel class of non-uniform samplers, called
    the event-triggered samplers, and sequential detection. In the proposed scheme,
    each secondary user computes its local sensing decision statistic based on its
    own channel output; and whenever such decision statistic crosses certain
    predefined threshold values, the secondary user will send one (or several) bits
    of information to the fusion center.

  45. Towards Uncertainty Quantification and Inference in the stochastic SIR Epidemic Model.

    Authors: J. Andrés Christen, Marcos A. Capistrán, Jorge X. Velasco-Hernández
    Subjects: Applications
    Abstract

    In this paper we introduce a novel method to conduct inference with models
    defined through a continuous-time Markov process, and we apply these results to
    a classical stochastic SIR model as a case study. Using the inverse-size
    expansion of van Kampen we obtain approximations for first and second moments
    for the state variables. These approximate moments are in turn matched to the
    moments of an inputed generic discrete distribution aimed at generating an
    approximate likelihood that is valid for both low-count and high-count data.
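
    For readers unfamiliar with the underlying process, the sketch below
    simulates one path of a stochastic SIR model as a continuous-time Markov
    chain using the standard Gillespie algorithm. It illustrates the model
    only, not the moment-matching inference method of the paper; the rates
    beta and gamma and the initial state are assumed values.

```python
import random


def gillespie_sir(s, i, r, beta, gamma, seed=0):
    """Simulate one path of the stochastic SIR model.

    Events: infection (S, I) -> (S-1, I+1) at rate beta * S * I / N,
            removal   (I, R) -> (I-1, R+1) at rate gamma * I.
    Returns a list of (time, S, I, R) tuples, one per event.
    """
    rng = random.Random(seed)
    n = s + i + r
    t = 0.0
    path = [(t, s, i, r)]
    while i > 0:
        rate_inf = beta * s * i / n
        rate_rem = gamma * i
        total = rate_inf + rate_rem
        t += rng.expovariate(total)  # exponential waiting time to the next event
        # Choose the event type with probability proportional to its rate.
        if rng.random() * total < rate_inf:
            s, i = s - 1, i + 1
        else:
            i, r = i - 1, r + 1
        path.append((t, s, i, r))
    return path


# Assumed parameter values, chosen only for illustration.
path = gillespie_sir(s=99, i=1, r=0, beta=1.5, gamma=0.5)
t_end, s_end, i_end, r_end = path[-1]
print(f"epidemic ended at t={t_end:.2f} with {r_end} removed")
```

    The simulation stops when no infectives remain; the population size
    S + I + R is conserved at every step.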

  46. Fréchet means of curves for signal averaging and application to ECG data analysis.

    Authors: Jérémie Bigot
    Subjects: Applications
    Abstract

    Signal averaging is the process that consists in computing a mean shape from
    a set of noisy signals. In the presence of geometric variability in time in the
    data, the usual Euclidean mean of the raw data yields a mean pattern that does
    not reflect the typical shape of the observed signals. In this setting, it is
    necessary to use alignment techniques for a precise synchronization of the
    signals, and then to average the aligned data to obtain a consistent mean
    shape.

  47. Analysis of a capture-recapture estimator for the size of populations with heterogenous catchability, and its evaluation on RDS data from rural Uganda.

    Authors: Yakir Berchenko, Richard G. White, Cyprian Wejnert, Simon D.W. Frost
    Subjects: Applications
    Abstract

    In this paper, we consider capture-recapture experiments with heterogenous
    catchability. In the setting we consider, the widespread Huggins-Alho estimator
    is not very suitable and we introduce and study a new generalized
    Horvitz-Thompson estimator. Our motivation is Respondent Driven Sampling (RDS),
    a prime example for such a setting where the capture probability is dependent
    on both the unknown population size as well as on an observable covariate, the
    network degree of an individual, due to peer recruitment.

  48. The potential for bias in principal causal effect estimation when treatment received depends on a key covariate.

    Authors: Corwin M. Zigler, Thomas R. Belin
    Subjects: Applications
    Abstract

    Motivated by a potential-outcomes perspective, the idea of principal
    stratification has been widely recognized for its relevance in settings
    susceptible to posttreatment selection bias such as randomized clinical trials
    where treatment received can differ from treatment assigned. In one such
    setting, we address subtleties involved in inference for causal effects when
    using a key covariate to predict membership in latent principal strata.

  49. Modeling item-item similarities for personalized recommendations on Yahoo! front page.

    Authors: Liang Zhang, Rahul Mazumder, Deepak Agarwal
    Subjects: Applications
    Abstract

    We consider the problem of algorithmically recommending items to users on a
    Yahoo! front page module. Our approach is based on a novel multilevel
    hierarchical model that we refer to as a User Profile Model with Graphical
    Lasso (UPG). The UPG provides a personalized recommendation to users by
    simultaneously incorporating both user covariates and historical user
    interactions with items in a model-based way. In fact, we build a per-item
    regression model based on a rich set of user covariates and estimate individual
    user affinity to items by introducing a latent random vector for each user.

  50. A penalized likelihood approach to estimate within-household contact networks from egocentric data.

    Authors: Niel Hens, Gail E. Potter
    Subjects: Applications
    Abstract

    Acute infectious diseases are transmitted over networks of social contacts.
    Epidemic models are used to predict the spread of emergent pathogens and
    compare intervention strategies. Many of these models assume equal probability
    of contact within mixing groups (homes, schools, etc.), but little work has
    inferred the actual contact network, which may influence epidemic estimates. We
    develop a penalized likelihood method to infer contact networks within
    households, a key area for disease transmission.

  51. Explosive Volatility: A Model of Financial Contagion.

    Authors: Nicholas G. Polson, James G. Scott
    Subjects: Applications
    Abstract

    This paper proposes a model of financial contagion that accounts for
    explosive, mutually exciting shocks to market volatility. We fit the model
    using country-level data during the European sovereign debt crisis, which has
    its roots in the period 2008-2010, and was continuing to affect global markets
    as of October, 2011.

  52. Causal modeling and inference for electricity markets.

    Authors: Egil Ferkingstad, Anders Løland, Mathilde Wilhelmsen
    Subjects: Applications
    Abstract

    How does dynamic price information flow among Northern European electricity
    spot prices and prices of major electricity generation fuel sources? We use
    time series models combined with new advances in causal inference to answer
    these questions. Applying our methods to weekly Nordic and German electricity
    prices, and oil, gas and coal prices, with German wind power and Nordic water
    reservoir levels as exogenous variables, we estimate a causal model for the
    price dynamics, both for contemporaneous and lagged relationships.

  53. Dynamic Bit Allocation for Object Tracking in Bandwidth Limited Sensor Networks.

    Authors: Pramod K. Varshney, Engin Masazade, Ruixin Niu
    Subjects: Applications
    Abstract

    In this paper, we study the target tracking problem in wireless sensor
    networks (WSNs) using quantized sensor measurements under limited bandwidth
    availability. At each time step of tracking, the available bandwidth $R$ needs
    to be distributed among the $N$ sensors in the WSN for the next time step. The
    optimal solution for the bandwidth allocation problem can be obtained by using
    a combinatorial search which may become computationally prohibitive for large
    $N$ and $R$.

  54. Measuring reproducibility of high-throughput experiments.

    Authors: Peter J. Bickel, James B. Brown, Haiyan Huang, Qunhua Li
    Subjects: Applications
    Abstract

    Reproducibility is essential to reliable scientific discovery in
    high-throughput experiments. In this work we propose a unified approach to
    measure the reproducibility of findings identified from replicate experiments
    and identify putative discoveries using reproducibility. Unlike the usual
    scalar measures of reproducibility, our approach creates a curve, which
    quantitatively assesses when the findings are no longer consistent across
    replicates.

  55. Modelling the impact of human activity on nitrogen dioxide concentrations in Europe.

    Authors: Gavin Shaddick, Haojie Yan, Danielle Vienneau
    Subjects: Applications
    Abstract

    Ambient concentrations of many pollutants are associated with emissions due
    to human activity, such as road transport and other combustion sources. In this
    paper we consider air pollution as a multi-level phenomenon within a Bayesian
    hierarchical model. We examine different scales of variation in pollution
    concentrations ranging from large scale transboundary effects to more localised
    effects which are directly related to human activity.

  56. Asymptotically Optimal Tests when Parameters are Estimated.

    Authors: Tewfik Lounis
    Subjects: Applications
    Abstract

    The main purpose of this paper is to provide an asymptotically optimal test.
    The proposed statistic is of Neyman-Pearson-type when the parameters are
    estimated with a particular kind of estimators. It is shown that the proposed
    estimators enable us to achieve this end. Two particular cases, AR(1) and ARCH
    models, were studied and the asymptotic power function was derived.

  57. Identification of Demand through Statistical Distribution Modeling for Improved Demand Forecasting.

    Authors: Murphy Choy, Michelle L.F. Cheong
    Subjects: Applications
    Abstract

    Demand functions for goods are generally cyclical in nature with
    characteristics such as trend or stochasticity. Most existing demand
    forecasting techniques in the literature are designed to manage and forecast
    this type of demand function. However, if the demand function is lumpy in nature,
    then the general demand forecasting techniques may fail given the unusual
    characteristics of the function.

  58. Deriving the number of jobs in proximity services from the number of inhabitants in French rural municipalities.

    Authors: Sylvie Huet, Guillaume Deffuant, Maxime Lenormand
    Subjects: Applications
    Abstract

    We use a minimum requirement approach to derive the number of jobs of
    proximity services per inhabitant in a municipality from its number of
    inhabitants. We apply this approach to four different subsets of
    municipalities, each defined by a specific range of distance to the
    municipality where the inhabitants go the most frequently to get services
    (called MFM). For each subset, we get satisfactory results in regression.

  59. Likelihood Consensus-Based Distributed Particle Filtering with Distributed Proposal Density Adaptation.

    Authors: Franz Hlawatsch, Ondrej Hlinka, Petar M. Djuric
    Subjects: Applications
    Abstract

    We present a consensus-based distributed particle filter (PF) for wireless
    sensor networks. Each sensor runs a local PF to compute a global state estimate
    that takes into account the measurements of all sensors. The local PFs use the
    joint (all-sensors) likelihood function, which is calculated in a distributed
    way by a novel generalization of the likelihood consensus scheme. A performance
    improvement (or a reduction of the required number of particles) is achieved by
    a novel distributed, consensus-based method for adapting the proposal densities
    of the local PFs.

  60. Genetic Testing for Complex Diseases: a Simulation Study Perspective.

    Authors: Nguyen Xuan Vinh
    Subjects: Applications
    Abstract

    It is widely recognized nowadays that complex diseases are caused by, among
    other things, multiple genetic factors. The recent advent of genome-wide
    association study (GWA) has triggered a wave of research aimed at discovering
    genetic factors underlying common complex diseases. While the number of
    reported susceptible genetic variants is increasing steadily, the application
    of such findings to disease prognosis for the general population is still
    unclear, and there are doubts about whether the size of the contribution by
    such factors is significant.

  61. Evolutionary Model of Non-Durable Markets.

    Authors: Joachim Kaldasch
    Subjects: Applications
    Abstract

    Presented is an evolutionary model of consumer non-durable markets, which is
    an extension of a previously published paper on consumer durables. The model
    suggests that the repurchase process is governed by preferential growth.
    Applying statistical methods it can be shown that in a competitive market the
    mean price declines according to an exponential law towards a natural price,
    while the corresponding price distribution is approximately given by a Laplace
    distribution for independent price decisions of the manufacturers.

  62. A new estimator for the tail-dependence coefficient.

    Authors: Marta Ferreira
    Subjects: Applications
    Abstract

    Recently, the concept of tail dependence has been discussed in financial
    applications related to market or credit risk. The multivariate extreme value
    theory is a proper tool to measure and model dependence, for example, of large
    loss events. A common measure of tail dependence is given by the so-called
    tail-dependence coefficient. We present a simple estimator of the latter that
    avoids the drawbacks of the estimation procedure that has been used so far. We
    prove strong consistency and asymptotic normality and analyze the finite sample
    behavior through simulation.

  63. Multi-Attribute Networks and the Impact of Partial Information on Inference and Characterization.

    Authors: Eric D. Kolaczyk, Natallia Katenka
    Subjects: Applications
    Abstract

    Association networks represent systems of interacting elements, where a link
    between two different elements indicates a sufficient level of similarity
    between element attributes. While in reality relational ties between elements
    can be expected to be based on similarity across multiple attributes, the vast
    majority of work to date on association networks involves ties defined with
    respect to only a single attribute.

  64. Optimal R-Estimation of a Spherical Location.

    Authors: Baba Thiam, Christophe Ley, Yvik Swan, Thomas Verdebout
    Subjects: Applications
    Abstract

    In this paper, we provide R-estimators of the location of a rotationally
    symmetric distribution on the unit sphere of $R^k$. In order to do so we first
    prove the local asymptotic normality property of a sequence of rotationally
    symmetric models; this is a non standard result due to the curved nature of the
    unit sphere. We then construct our estimators by adapting the Le Cam one-step
    methodology to spherical statistics and ranks. We show that they are
    asymptotically normal under any rotationally symmetric distribution and achieve
    the efficiency bound under a specific density.

  65. Hierarchical Bayesian modelling of the electricity load.

    Authors: Anne Philippe, Tristan Launay, Sophie Lamarche
    Subjects: Applications
    Abstract

    In this paper, we study a non-linear model used to estimate and forecast the
    electricity load, which usually requires four or more years' worth of data to
    avoid any overfitting phenomenon. We first propose a non-informative prior to
    be used when the number of observations is large enough. When the observations
    are too few, we propose a hierarchical prior to include information coming from
    another bigger, similar, sample. The posterior densities associated with these
    two priors are derived and an MCMC algorithm is provided in each case.

  66. Estimation and Selection for Topic Models.

    Authors: Matthew A. Taddy
    Subjects: Applications
    Abstract

    Topic modeling is a mixed-membership framework for dimension reduction that
    is widely applied in text-mining, among other areas. This article describes an
    algorithm for posterior maximization under such models, identifying
    computational and conceptual gains that come from working with an alternative
    model parameterization. We then show that fitted parameters can be used as the
    basis for a novel approach to marginal likelihood estimation, founded on
    block-diagonal approximation to the information matrix, that facilitates
    choosing the number of latent topics.

  67. Pricing Weather Derivatives for Extreme Events.

    Authors: Richard L. Smith, Robert J. Erhardt
    Subjects: Applications
    Abstract

    We consider pricing weather derivatives for use as protection against weather
    extremes. The method described utilizes results from spatial statistics and
    extreme value theory to first model extremes in the weather as a max-stable
    process, and then use these models to simulate payments for a general
    collection of weather derivatives. These simulations capture the spatial
    dependence of payments. Incorporating results from catastrophe ratemaking, we
    show how this method can be used to compute risk loads and premiums for weather
    derivatives which are renewal-additive.

  68. Sensor Management: Past, Present, and Future.

    Authors: Douglas Cochran, Alfred O. Hero III
    Subjects: Applications
    Abstract

    Sensor systems typically operate under resource constraints that prevent the
    simultaneous use of all resources all of the time. Sensor management becomes
    relevant when the sensing system has the capability of actively managing these
    resources; i.e., changing its operating configuration during deployment in
    reaction to previous measurements. Examples of systems in which sensor
    management is currently used or is likely to be used in the near future include
    autonomous robots, surveillance and reconnaissance networks, and waveform-agile
    radars.

  69. Reverse engineering gene regulatory networks using approximate Bayesian computation.

    Authors: Andrea Rau, Florence Jaffrézic, Jean-Louis Foulley, R.W. Doerge
    Subjects: Applications
    Abstract

    Gene regulatory networks are collections of genes that interact with one
    another and with other substances in the cell. By measuring gene expression over
    time using high-throughput technologies, it may be possible to reverse
    engineer, or infer, the structure of the gene network involved in a particular
    cellular process.

  70. Likelihood Consensus and Its Application to Distributed Particle Filtering.

    Authors: Franz Hlawatsch, Ondrej Hlinka, Ondrej Sluciak, Petar M. Djuric, Markus Rupp
    Subjects: Applications
    Abstract

    We consider distributed state estimation in a wireless sensor network without
    a fusion center. Each sensor performs a global estimation task - based on the
    past and current measurements of all sensors - using only local processing and
    local communications with its neighbors. In this task, the joint (all-sensors)
    likelihood function (JLF) plays a central role as it epitomizes the
    measurements of all sensors. We propose a distributed method for computing an
    approximation of the JLF by means of consensus algorithms.

  71. A shrinkage probability hypothesis density filter for multitarget tracking.

    Authors: Hao Zhang, Huadong Meng, Xiqin Wang, Huisi Tong
    Subjects: Applications
    Abstract

    In radar systems, tracking targets in low signal-to-noise ratio (SNR)
    environments is a very important task. There are some algorithms designed for
    multitarget tracking. Their performances, however, are not satisfactory in low
    SNR environments. Track-before-detect (TBD) algorithms have been developed as a
    class of improved methods for tracking in low SNR environments. However,
    multitarget TBD is still an open issue. In this paper, multitarget TBD
    measurements are modeled, and a highly efficient filter in the framework of
    finite set statistics (FISST) is designed.

  72. Off-grid Direction of Arrival Estimation Using Sparse Bayesian Inference.

    Authors: Zai Yang, Lihua Xie, Cishen Zhang
    Subjects: Applications
    Abstract

    This paper is focused on solving the narrowband direction of arrival
    estimation problem from a sparse signal reconstruction perspective. Existing
    sparsity-based methods have shown advantages over conventional ones but exhibit
    limitations in practical situations where the true directions are not in the
    sampling grid. A so-called off-grid model is broached to reduce the modeling
    error caused by the off-grid directions.

  73. Biosensor Arrays for Estimating Molecular Concentration in Fluid Flows.

    Authors: Vikram Krishnamurthy, Maryam Abolfath-Beygi
    Subjects: Applications
    Abstract

    This paper constructs dynamical models and estimation algorithms for the
    concentration of target molecules in a fluid flow using an array of novel
    biosensors. Each biosensor is constructed out of protein molecules embedded in
    a synthetic cell membrane. The concentration evolves according to an
    advection-diffusion partial differential equation which is coupled with
    chemical reaction equations on the biosensor surface.

  74. A sentiment analysis of Singapore Presidential Election 2011 using Twitter data with census correction.

    Authors: Murphy Choy, Michelle L.F. Cheong, Ma Nang Laik, Koo Ping Shung
    Subjects: Applications
    Abstract

    Sentiment analysis is a new area in text analytics that focuses on the
    analysis and understanding of emotions from text patterns. This new
    form of analysis has been widely adopted in customer relation management,
    especially in the context of complaint management. With the increasing level of
    interest in this technology, more and more companies are adopting it and using
    it to champion their marketing efforts. However, sentiment analysis using
    Twitter has remained extremely difficult to manage due to sampling bias.

  75. Correlation Angles and Inner Products: Application to a Problem from Physics.

    Authors: Jonathan Pakianathan, David H. Douglass, Adam Towsley
    Subjects: Applications
    Abstract

    Covariance is used as an inner product on a formal vector space built on n
    random variables to define measures of correlation M_d across a set of vectors
    in a d-dimensional space. For d = 1, one has the diameter; for d = 2, one has
    an area. These concepts are directly applied to correlation studies in climate
    science.

  76. Variable Selection and Sensitivity Analysis via Dynamic Trees with an Application to Computer Code Performance Tuning.

    Authors: Robert B. Gramacy, Matthew A. Taddy, Stefan M. Wild
    Subjects: Applications
    Abstract

    We show how the newly developed dynamic tree model can support variable
    selection and a sensitivity analysis of inputs, two tasks usually requiring
    disparate model structure. To this end, we adapt methods used in conjunction
    with static tree models and Gaussian process models (GPs).

  77. Designing Attractive Models via Automated Identification of Chaotic and Oscillatory Dynamical Regimes.

    Authors: Tina Toni, Michael P.H. Stumpf, Daniel Silk, Paul D.W. Kirk, Chris P. Barnes, Anna Rose, Simon Moon, Margaret J. Dallman
    Subjects: Applications
    Abstract

    Chaos and oscillations continue to capture the interest of both the
    scientific and public domains. Yet despite the importance of these qualitative
    features, most attempts at constructing mathematical models of such phenomena
    have taken an indirect, quantitative approach, e.g. by fitting models to a
    finite number of data-points. Here we develop a qualitative inference framework
    that allows us to both reverse engineer and design systems exhibiting these and
    other dynamical behaviours by directly specifying the desired characteristics
    of the underlying dynamical attractor.

  78. Automated analysis of quantitative image data using isomorphic functional mixed models, with application to proteomics data.

    Authors: Jeffrey S. Morris, Veerabhadran Baladandayuthapani, Richard C. Herrick, Pietro Sanna, Howard Gutstein
    Subjects: Applications
    Abstract

    Image data are increasingly encountered and are of growing importance in many
    areas of science. Much of these data are quantitative image data, which are
    characterized by intensities that represent some measurement of interest in the
    scanned images. The data typically consist of multiple images on the same
    domain and the goal of the research is to combine the quantitative information
    across images to make inference about populations or interventions.

  79. The generalized shrinkage estimator for the analysis of functional connectivity of brain signals.

    Authors: Mark Fiecas, Hernando Ombao
    Subjects: Applications
    Abstract

    We develop a new statistical method for estimating functional connectivity
    between neurophysiological signals represented by a multivariate time series.
    We use partial coherence as the measure of functional connectivity. Partial
    coherence identifies the frequency bands that drive the direct linear
    association between any pair of channels. To estimate partial coherence, one
    would first need an estimate of the spectral density matrix of the multivariate
    time series.

  80. An adaptively weighted statistic for detecting differential gene expression when combining multiple transcriptomic studies.

    Authors: Jia Li, George C. Tseng
    Subjects: Applications
    Abstract

    Global expression analyses using microarray technologies are becoming more
    common in genomic research; therefore, new statistical challenges associated
    with combining information from multiple studies must be addressed. In this
    paper we will describe our proposal for an adaptively weighted (AW) statistic
    to combine multiple genomic studies for detecting differentially expressed
    genes. We will also present our results from comparisons of our proposed AW
    statistic to Fisher's equally weighted (EW), Tippett's minimum $p$-value (minP)
    and Pearson's (PR) statistics.

  81. Detecting simultaneous variant intervals in aligned sequences.

    Authors: Nancy R. Zhang, David Siegmund, Benjamin Yakir
    Subjects: Applications
    Abstract

    Given a set of aligned sequences of independent noisy observations, we are
    concerned with detecting intervals where the mean values of the observations
    change simultaneously in a subset of the sequences. The intervals of changed
    means are typically short relative to the length of the sequences, the subset
    where the change occurs, the "carriers," can be relatively small, and the sizes
    of the changes can vary from one sequence to another. This problem is motivated
    by the scientific problem of detecting inherited copy number variants in
    aligned DNA samples.

  82. Adaptive Markov Chain Monte Carlo Forward Simulation for Statistical Analysis in Epidemic Modelling of Human Papillomavirus.

    Authors: Julien Cornebise, Gareth W. Peters, Igor A. Korostil, David G. Regan
    Subjects: Applications
    Abstract

    We develop a Bayesian statistical model and estimation methodology based on
    Forward Projection Adaptive Markov chain Monte Carlo in order to perform the
    calibration of a high-dimensional non-linear system of Ordinary Differential
    Equations representing an epidemic model for Human Papillomavirus types 6 and
    11 (HPV-6, HPV-11). The model is compartmental and involves stratification by
    age, gender and sexual activity-group.

  83. Partition Decomposition for Roll Call Data.

    Authors: Scott Pauls, Daniel N. Rockmore, Greg Leibon, Robert Savell
    Subjects: Applications
    Abstract

    In this paper we bring to bear some new tools from statistical learning on
    the analysis of roll call data. We present a new data-driven model for roll
    call voting that is geometric in nature. We construct the model by adapting the
    "Partition Decoupling Method," an unsupervised learning technique originally
    developed for the analysis of families of time series, to produce a multiscale
    geometric description of a weighted network associated to a set of roll call
    votes.

  84. On the visualisation, verification and recalibration of ternary probabilistic forecasts.

    Authors: Tim E. Jupp, Rachel Lowe, Caio A.S. Coelho, David B. Stephenson
    Subjects: Applications
    Abstract

    We develop a geometrical interpretation of ternary probabilistic forecasts in
    which forecasts and observations are regarded as points inside a triangle.
    Within the triangle, we define a continuous colour palette in which hue and
    colour saturation are defined with reference to the observed climatology. In
    contrast to current methods, forecast maps created with this colour scheme
    convey all of the information present in each ternary forecast.

  85. Latent Protein Trees.

    Authors: Ricardo Henao, J. Will Thompson, M. Arthur Moseley, Geoffrey S. Ginsburg, Lawrence Carin, Joseph E. Lucas
    Subjects: Applications
    Abstract

    Unbiased, label-free proteomics is becoming a powerful technique for
    measuring protein expression in almost any biological sample. The output of
    these measurements after preprocessing is a collection of features (tens to
    hundreds of thousands) and their associated intensities for each sample. Subsets
    of features within the data are from the same peptide, subsets of peptides are
    from the same protein, and subsets of proteins are in the same biological
    pathways; therefore, there is the potential for very complex and informative
    correlational structure inherent in these data.

  86. Two-stage empirical likelihood for longitudinal neuroimaging data.

    Authors: Joseph G. Ibrahim, Xiaoyan Shi, Jeffrey Lieberman, Martin Styner, Yimei Li, Hongtu Zhu
    Subjects: Applications
    Abstract

    Longitudinal imaging studies are essential to understanding the neural
    development of neuropsychiatric disorders, substance use disorders, and the
    normal brain. The main objective of this paper is to develop a two-stage
    adjusted exponentially tilted empirical likelihood (TETEL) for the spatial
    analysis of neuroimaging data from longitudinal studies. The TETEL method as a
    frequentist approach allows us to efficiently analyze longitudinal data without
    modeling temporal correlation and to classify different time-dependent
    covariate types.

  87. A simple and objective method for reproducible resting state network (RSN) detection in fMRI.

    Authors: Gautam V. Pendse, David Borsook, Lino Becerra
    Subjects: Applications
    Abstract

    Spatial Independent Component Analysis (ICA) decomposes the time by space
    functional MRI (fMRI) matrix into a set of 1-D basis time courses and their
    associated 3-D spatial maps that are optimized for mutual independence. When
    applied to resting state fMRI (rsfMRI), ICA produces several spatial
    independent components (ICs) that seem to have biological relevance - the
    so-called resting state networks (RSNs). The ICA problem is well posed when the
    true data generating process follows a linear mixture of ICs model in terms of
    the identifiability of the mixing matrix.

  88. Missing data in value-added modeling of teacher effects.

    Authors: Daniel F. McCaffrey, J. R. Lockwood
    Subjects: Applications
    Abstract

    The increasing availability of longitudinal student achievement data has
    heightened interest among researchers, educators and policy makers in using
    these data to evaluate educational inputs, as well as for school and possibly
    teacher accountability. Researchers have developed elaborate "value-added
    models" of these longitudinal data to estimate the effects of educational
    inputs (e.g., teachers or schools) on student achievement while using prior
    achievement to adjust for nonrandom assignment of students to schools and
    classes.

  89. Confounding of three binary-variables counterfactual model.

    Authors: Jingwei Liu, Shuang Hu
    Subjects: Applications
    Abstract

    Confounding in a counterfactual model with three binary variables is discussed
    in this paper. According to the relationship between the control variable and
    the covariate variable, we investigate three counterfactual models: the control
    variable is independent of the covariate variable, the control variable has an
    effect on the covariate variable, and the covariate variable affects the
    control variable.

  90. Graph Classification using Signal Subgraphs: Applications in Statistical Connectomics.

    Authors: Joshua T. Vogelstein, William R. Gray, R. Jacob Vogelstein, Carey E. Priebe
    Subjects: Applications
    Abstract

    This manuscript considers the following "graph classification" question:
    given a collection of graphs and associated classes, how can one predict the
    class of a newly observed graph? To address this question we propose a
    statistical model for graph/class pairs. This model naturally leads to a set of
    estimators to identify the class-conditional signal, or "signal subgraph,"
    defined as the collection of edges that are probabilistically different between
    the classes.

  91. Variance estimation for nearest neighbor imputation for US Census long form data.

    Authors: Jae Kwang Kim, Wayne A. Fuller, William R. Bell
    Subjects: Applications
    Abstract

    Variance estimation for estimators of state, county, and school district
    quantities derived from the Census 2000 long form is discussed. The variance
    estimator must account for (1) uncertainty due to imputation, and (2) raking to
    census population controls.

  92. Mean--variance portfolio optimization when means and covariances are unknown.

    Authors: Tze Leung Lai, Haipeng Xing, Zehao Chen
    Subjects: Applications
    Abstract

    Markowitz's celebrated mean--variance portfolio optimization theory assumes
    that the means and covariances of the underlying asset returns are known.

  93. The effect of winning an Oscar Award on survival: Correcting for healthy performer survivor bias with a rank preserving structural accelerated failure time model.

    Authors: Dylan S. Small, Xu Han, Dean P. Foster, Vishal Patel
    Subjects: Applications
    Abstract

    We study the causal effect of winning an Oscar Award on an actor or actress's
    survival. Does the increase in social rank from a performer winning an Oscar
    increase the performer's life expectancy? Previous studies of this issue have
    suffered from healthy performer survivor bias, that is, candidates who are
    healthier will be able to act in more films and have a greater chance of
    winning Oscar Awards. To correct this bias, we adapt Robins' rank preserving
    structural accelerated failure time model and $g$-estimation method.

  94. Bayesian hierarchical modeling for signaling pathway inference from single cell interventional data.

    Authors: Hongyu Zhao, Ruiyan Luo
    Subjects: Applications
    Abstract

    Recent technological advances have made it possible to simultaneously measure
    multiple protein activities at the single cell level. With such data collected
    under different stimulatory or inhibitory conditions, it is possible to infer
    the causal relationships among proteins from single cell interventional data.
    In this article we propose a Bayesian hierarchical modeling framework to infer
    the signaling pathway based on the posterior distributions of parameters in the
    model.

  95. The mortality of the Italian population: Smoothing techniques on the Lee--Carter model.

    Authors: Valeria D'Amato, Gabriella Piscopo, Maria Russolillo
    Subjects: Applications
    Abstract

    Several approaches have been developed for forecasting mortality using
    stochastic models. In particular, the Lee-Carter model has become widely used
    and there have been various extensions and modifications proposed to attain a
    broader interpretation and to capture the main features of the dynamics of the
    mortality intensity.

  96. Quantum Monte Carlo simulation.

    Authors: Yazhen Wang
    Subjects: Applications
    Abstract

    Contemporary scientific studies often rely on the understanding of complex
    quantum systems via computer simulation. This paper initiates the statistical
    study of quantum simulation and proposes a Monte Carlo method for estimating
    analytically intractable quantities. We derive the bias and variance for the
    proposed Monte Carlo quantum simulation estimator and establish the asymptotic
    theory for the estimator. The theory is used to design a computational scheme
    for minimizing the mean square error of the estimator.

  97. Point process modeling of wildfire hazard in Los Angeles County, California.

    Authors: Haiyong Xu, Frederic Paik Schoenberg
    Subjects: Applications
    Abstract

    The Burning Index (BI) produced daily by the United States government's
    National Fire Danger Rating System is commonly used in forecasting the hazard
    of wildfire activity in the United States. However, recent evaluations have
    shown the BI to be less effective at predicting wildfires in Los Angeles
    County, compared to simple point process models incorporating similar
    meteorological information.

  98. Bayesian Synthesis: Combining subjective analyses, with an application to ozone data.

    Authors: Qingzhao Yu, Steven N. MacEachern, Mario Peruggia
    Subjects: Applications
    Abstract

    Bayesian model averaging enables one to combine the disparate predictions of
    a number of models in a coherent fashion, leading to superior predictive
    performance. The improvement in performance arises from averaging models that
    make different predictions. In this work, we tap into perhaps the biggest
    driver of different predictions---different analysts---in order to gain the
    full benefits of model averaging.

  99. Hierarchical Bayesian estimation of inequality measures with nonrectangular censored survey data with an application to wealth distribution of French households.

    Authors: Eric Gautier
    Subjects: Applications
    Abstract

    We consider the estimation of wealth inequality measures with their
    confidence interval, based on survey data with interval censoring. We rely on a
    Bayesian hierarchical model. It consists of a model where, due to survey
    sampling and unit nonresponse, the summaries of the wealth distribution of
    households are observed with error; a mixture of multivariate models for the
    wealth components where groups correspond to portfolios of assets; and a prior
    on the parameters. A Gibbs sampler is used for numerical purposes to do the
    inference. We apply this strategy to the French 2004 Wealth Survey.

  100. Response-adaptive dose-finding under model uncertainty.

    Authors: Holger Dette, Björn Bornkamp, Frank Bretz, José Pinheiro
    Subjects: Applications
    Abstract

    Dose-finding studies are frequently conducted to evaluate the effect of
    different doses or concentration levels of a compound on a response of
    interest. Applications include the investigation of a new medicinal drug, a
    herbicide or fertilizer, a molecular entity, an environmental toxin, or an
    industrial chemical. In pharmaceutical drug development, dose-finding studies
    are of critical importance because of regulatory requirements that marketed
    doses are safe and provide clinically relevant efficacy.

  101. Assessment of synchrony in multiple neural spike trains using loglinear point process models.

    Authors: Robert E. Kass, Wei-Liem Loh, Ryan C. Kelly
    Subjects: Applications
    Abstract

    Neural spike trains, which are sequences of very brief jumps in voltage
    across the cell membrane, were one of the motivating applications for the
    development of point process methodology. Early work required the assumption of
    stationarity, but contemporary experiments often use time-varying stimuli and
    produce time-varying neural responses. More recently, many statistical methods
    have been developed for nonstationary neural point process data.

  102. Degradation modeling applied to residual lifetime prediction using functional data analysis.

    Authors: Rensheng R. Zhou, Nicoleta Serban, Nagi Gebraeel
    Subjects: Applications
    Abstract

    Sensor-based degradation signals measure the accumulation of damage of an
    engineering system using sensor technology. Degradation signals can be used to
    estimate, for example, the distribution of the remaining life of partially
    degraded systems and/or their components. In this paper we present a
    nonparametric degradation modeling framework for making inference on the
    evolution of degradation signals that are observed sparsely or over short
    intervals of time.

  103. Testing the isotropy of high energy cosmic rays using spherical needlets.

    Authors: Dominique Picard, Gilles Faÿ, Jacques Delabrouille, Gérard Kerkyacharyan
    Subjects: Applications
    Abstract

    For many decades, ultra-high energy charged particles have been a puzzle for
    particle physicists and astrophysicists. Neither the sites of production nor
    the mechanism responsible for the generation of these ultra-energetic
    `cosmic rays' (CR) is currently known. They seem to arrive from random
    directions in the sky,
    although the most energetic ones, which are not deflected much by the magnetic
    fields, are supposed to point towards their source with good accuracy.

  104. Discussion of "Network routing in a dynamic environment".

    Authors: Stephen E. Fienberg, Andrew C. Thomas
    Subjects: Applications
    Abstract

    Discussion of "Network routing in a dynamic environment" by N.D. Singpurwalla
    [arXiv:1107.4852]

  105. A Bayesian approach for inferring neuronal connectivity from calcium fluorescent imaging data.

    Authors: Yuriy Mishchencko, Joshua T. Vogelstein, Liam Paninski
    Subjects: Applications
    Abstract

    Deducing the structure of neural circuits is one of the central problems of
    modern neuroscience. Recently introduced calcium fluorescent imaging methods
    permit experimentalists to observe network activity in large populations of
    neurons, but these techniques provide only indirect observations of neural
    spike trains, with limited time resolution and signal quality. In this work we
    present a Bayesian approach for inferring neural circuitry given this type of
    imaging data.

  106. State-space solutions to the dynamic magnetoencephalography inverse problem using high performance computing.

    Authors: Christopher J. Long, Patrick L. Purdon, Simona Temereanca, Neil U. Desai, Matti S. Hämäläinen, Emery N. Brown
    Subjects: Applications
    Abstract

    Determining the magnitude and location of neural sources within the brain
    that are responsible for generating magnetoencephalography (MEG) signals
    measured on the surface of the head is a challenging problem in functional
    neuroimaging. The number of potential sources within the brain exceeds by an
    order of magnitude the number of recording sites. As a consequence, the
    estimates for the magnitude and location of the neural sources will be
    ill-conditioned because of the underdetermined nature of the problem.

  107. A nonstationary nonparametric Bayesian approach to dynamically modeling effective connectivity in functional magnetic resonance imaging experiments.

    Authors: Sourabh Bhattacharya, Ranjan Maitra
    Subjects: Applications
    Abstract

    Effective connectivity analysis provides an understanding of the functional
    organization of the brain by studying how activated regions influence one
    another. We propose a nonparametric Bayesian approach to model effective
    connectivity assuming a dynamic nonstationary neuronal system. Our approach
    uses the Dirichlet process to specify an appropriate (most plausible according
    to our prior beliefs) dynamic model as the "expectation" of a set of plausible
    models upon which we assign a probability distribution. This addresses model
    uncertainty associated with dynamic effective connectivity.

  108. Bernoulli Runs: Using "Book Cricket" to Evaluate Cricketers.

    Authors: Anand Ramalingam
    Subjects: Applications
    Abstract

    This paper proposes a simple method to evaluate batsmen and bowlers in
    cricket. The idea in this paper refines "book cricket" and evaluates a batsman
    by answering the question: How many runs would a team consisting of the same
    player replicated eleven times score?

  109. Variance Decomposition and Replication In Scrabble: When You Can Blame Your Tiles?.

    Authors: Andrew C. Thomas
    Subjects: Applications
    Abstract

    In the game of Scrabble, letter tiles are drawn uniformly at random from a
    bag. The variability of possible draws as the game progresses is a source of
    variation that makes it more likely for an inferior player to win a
    head-to-head match against a superior player, and more difficult to determine
    the true ability of a player in a tournament or contest.

  110. Comparison of SCIPUFF Plume Prediction with Particle Filter Assimilated Prediction for Dipole Pride 26 Data.

    Authors: Gabriel Terejanu, Yang Cheng, Tarunraj Singh, Peter D. Scott
    Subjects: Applications
    Abstract

    This paper presents the application of a particle filter for data
    assimilation in the context of puff-based dispersion models. Particle filters
    provide estimates of the higher moments, and are well suited for strongly
    nonlinear and/or non-Gaussian models. The Gaussian puff model SCIPUFF is used
    in predicting the chemical concentration field after a chemical incident. This
    model is highly nonlinear and evolves with variable state dimension and, after
    sufficient time, high dimensionality.

  111. Characteristic Characteristics.

    Authors: Scott Pauls, Sean Brocklebank, Daniel Rockmore, Timothy C. Bates
    Subjects: Applications
    Abstract

    While five-factor models of personality are widespread, there is still not
    universal agreement on this as a structural framework. Part of the reason for
    the lingering debate is its dependence on factor analysis. In particular,
    derivation or refutation of the model via other statistical means is a
    worthwhile project.

  112. On Daryl Bem's Feeling the Future Paper.

    Authors: Akhila Raman
    Subjects: Applications
    Abstract

    It has been argued by Daryl Bem in his 2011 paper that 8 out of 9 experiments
    yielded statistically significant results in favour of the psi effect. It is
    pointed out in this short communication that many of the results in the above
    mentioned paper could be explained by using well known concepts in statistics
    such as Confidence Level and Standard Error of the Sample Mean. This short
    communication also discusses implied confidence level and confidence intervals
    in polling results.

  113. KARMA: Kalman-based autoregressive moving average modeling and inference for formant and antiformant tracking.

    Authors: Patrick J. Wolfe, Daniel Rudoy, Daryush D. Mehta
    Subjects: Applications
    Abstract

    Vocal tract resonance characteristics in acoustic speech signals are
    classically tracked using frame-by-frame point estimates of formant frequencies
    followed by candidate selection and smoothing using dynamic programming methods
    that minimize ad hoc cost functions. The goal of the current work is to provide
    both point estimates and associated uncertainties of center frequencies and
    bandwidths in a statistically principled state-space framework.

  114. Statistical Distribution of Crystallographic Groups for Inorganic Crystal Structure Database.

    Authors: Miyako Fujiwara, Yoshiaki Itoh, Takeo Matsumoto, Hiroshi Takeda
    Subjects: Applications
    Abstract

    We introduce a method that defines the species (representatives) of inorganic
    compounds, and study the statistical distribution of the defined species
    among space groups (distribution of space groups) using the ICSD (Inorganic
    Crystal Structure Database). Here we show that the number of formula units in a
    unit cell gives a natural classification to understand the statistical
    distribution of crystallographic groups.

  115. Distribution fitting 12. Sampling distribution of compounds abundance from plant species measured by instrumentation. Application to plants metabolism classification.

    Authors: Radu E. Sestraş, Lorentz Jätschi, Sorana D. Bolboacă
    Subjects: Applications
    Abstract

    A series of ten plant species belonging to the Magnoliopsida (Dicotyledons)
    class was analyzed in terms of the abundance distribution of their chemical
    compounds, starting from the assumption that these distributions should give a
    picture of similarities and differences in plant metabolism. From a pool of
    theoretical distributions, the log-normal distribution was selected as giving
    the best accuracy for the modeled phenomena and the best agreement with the
    observed data.

  116. A Markov Chain approach to determine the optimal performance period and bad definition for credit scorecard.

    Authors: Choy
    Subjects: Applications
    Abstract

    Determining the performance period and the bad definition for a credit
    scorecard has been a mixed fortune for the typical data modeler. The lack of
    literature on these matters has led to a proliferation of approaches and
    techniques to solve the problems. However, the most commonly accepted approach
    involves subjective interpretations of the performance period and bad
    definition, and is also a chicken-and-egg problem. These complications result
    in poorly developed credit scorecards with minimal benefits to the banks.

  117. A Practical Implementation of the Bernoulli Factory.

    Authors: Andrew C. Thomas, Jose H. Blanchet
    Subjects: Applications
    Abstract

    The Bernoulli Factory is an algorithm that takes as input a series of i.i.d.
    Bernoulli random variables with an unknown but fixed success probability $p$,
    and outputs a corresponding series of Bernoulli random variables with success
    probability $f(p)$, where the function $f$ is known and defined on the interval
    $[0,1]$. While several practical uses of the method have been proposed in Monte
    Carlo applications, these require an implementation framework that is flexible,
    general and efficient.
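
    As a minimal illustration of the input/output contract described above, and
    not the implementation framework the abstract refers to, the sketch below
    uses the classical von Neumann construction, which can be viewed as the
    special case $f(p) = 1/2$: flips of a coin with unknown bias $p$ are turned
    into exact fair bits. The helper names `make_coin` and
    `von_neumann_fair_coin`, the bias 0.8, and the sample size are assumptions
    chosen for illustration only.

```python
import random


def make_coin(p, rng):
    """Hypothetical helper: return a zero-argument function that flips a p-coin."""
    return lambda: 1 if rng.random() < p else 0


def von_neumann_fair_coin(flip):
    """Turn flips of a coin with unknown bias p (0 < p < 1) into one fair bit.

    Flip twice; if the two outcomes differ, return the first one, otherwise
    discard the pair and try again. The outcomes (1, 0) and (0, 1) both occur
    with probability p * (1 - p), so the returned bit is 1 with probability 1/2.
    """
    while True:
        a, b = flip(), flip()
        if a != b:
            return a


# Illustration with an assumed bias of 0.8; the algorithm never uses this value.
rng = random.Random(0)
coin = make_coin(0.8, rng)
draws = [von_neumann_fair_coin(coin) for _ in range(20000)]
estimate = sum(draws) / len(draws)
print(f"empirical success frequency of output coin: {estimate:.3f}")
```

    The output frequency should be close to 1/2 whatever bias is passed to
    `make_coin`, since the procedure only consumes coin flips and never reads
    $p$ itself. Factories for more general functions $f$ need considerably more
    machinery than this special case.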

  118. Assessment of Aortic Aneurysm Rupture Risk.

    Authors: Rafael Izbicki, Ann B. Lee, Ender A. Finol
    Subjects: Applications
    Abstract

    The rupture of an abdominal aortic aneurysm (AAA) is associated with a high
    mortality. When an AAA ruptures, 50% of the patients die before reaching the
    hospital. Of the patients that are able to reach the operating room, only 50%
    have it successfully repaired (Fillinger et al., 2003). Therefore, it is
    important to find good predictors for immediate risk of rupture. Clinically,
    the size of the aneurysm is the variable vascular surgeons usually use to
    evaluate this risk.

  119. Bayesian Sparsity-Path-Analysis of Genetic Association Signal using Generalized t Priors.

    Authors: Arnaud Doucet, Anthony Lee, Francois Caron, Chris Holmes
    Subjects: Applications
    Abstract

    We explore the use of generalized t priors on regression coefficients to help
    understand the nature of association signal within "hit regions" of genome-wide
    association studies. The particular generalized t distribution we adopt is a
    Student distribution on the absolute value of its argument. For low degrees of
    freedom we show that the generalized t exhibits 'sparsity-prior' properties
    with some attractive features over other common forms of sparse priors and
    includes the well known double-exponential distribution as the degrees of
    freedom tends to infinity.

  120. Bootstrapping Manski's Maximum Score Estimator.

    Authors: Emilio Seijo, Bodhisattva Sen
    Subjects: Applications
    Abstract

    In this paper we study the applicability of the bootstrap to do inference on
    Manski's maximum score estimator under the full generality of the model. We
    propose three new, model-based bootstrap procedures for this problem and show
    their consistency. Simulation experiments are carried out to evaluate their
    performance and to compare them with subsampling methods. Additionally, we
    prove a uniform convergence theorem for triangular arrays of random variables
    coming from binary choice models, which may be of independent interest.

  121. Rejoinder: A statistical analysis of multiple temperature proxies: Are reconstructions of surface temperatures over the last 1000 years reliable?.

    Authors: Blakeley B. McShane, Abraham J. Wyner
    Subjects: Applications
    Abstract

    Rejoinder to "A statistical analysis of multiple temperature proxies: Are
    reconstructions of surface temperatures over the last 1000 years reliable?" by
    B.B. McShane and A.J. Wyner [arXiv:1104.4002]

  122. Discussion of: A statistical analysis of multiple temperature proxies: Are reconstructions of surface temperatures over the last 1000 years reliable?.

    Authors: Gavin A. Schmidt, Michael E. Mann, Scott D. Rutherford
    Subjects: Applications
    Abstract

    Discussion of "A statistical analysis of multiple temperature proxies: Are
    reconstructions of surface temperatures over the last 1000 years reliable?" by
    B.B. McShane and A.J. Wyner [arXiv:1104.4002]

  123. Matrix Variate Logistic Regression Analysis.

    Authors: Hung Hung, Chen-Chien Wang
    Subjects: Applications
    Abstract

    Logistic regression has been widely applied in the field of biostatistics for
    a long time. It aims to model the conditional success probability of an event
    of interest as the logit function of a linear combination of covariates, for
    the sake of further interpretation of covariates and prediction of new
    observation. In some applications, however, covariates of interest have a
    natural structure, such as being a matrix, at the time of being collected. The
    rows and columns of the covariate matrix would have different meanings, and
    they must contain useful information regarding the response.

  124. HIV dynamics and natural history studies: Joint modeling with doubly interval-censored event time and infrequent longitudinal data.

    Authors: Li Su, Joseph W. Hogan
    Subjects: Applications
    Abstract

    Hepatitis C virus (HCV) coinfection has become one of the most challenging
    clinical situations to manage in HIV-infected patients. Recently the effect of
    HCV coinfection on HIV dynamics following initiation of highly active
    antiretroviral therapy (HAART) has drawn considerable attention. Post-HAART HIV
    dynamics are commonly studied in short-term clinical trials with frequent data
    collection design. For example, the elimination process of plasma virus during
    treatment is closely monitored with daily assessments in viral dynamics studies
    of AIDS clinical trials.

  125. Discussion of: A statistical analysis of multiple temperature proxies: Are reconstructions of surface temperatures over the last 1000 years reliable?.

    Authors: Stephen McIntyre, Ross McKitrick
    Subjects: Applications
    Abstract

    Discussion of "A statistical analysis of multiple temperature proxies: Are
    reconstructions of surface temperatures over the last 1000 years reliable?" by
    B.B. McShane and A.J. Wyner [arXiv:1104.4002]

  126. Discussion of: A statistical analysis of multiple temperature proxies: Are reconstructions of surface temperatures over the last 1000 years reliable?.

    Authors: Jonathan Rougier
    Subjects: Applications
    Abstract

    Discussion of "A statistical analysis of multiple temperature proxies: Are
    reconstructions of surface temperatures over the last 1000 years reliable?" by
    B.B. McShane and A.J. Wyner [arXiv:1104.4002]

  127. Discussion of: A statistical analysis of multiple temperature proxies: Are reconstructions of surface temperatures over the last 1000 years reliable?.

    Authors: Bo Li, Doug Nychka
    Subjects: Applications
    Abstract

    Discussion of "A statistical analysis of multiple temperature proxies: Are
    reconstructions of surface temperatures over the last 1000 years reliable?" by
    B.B. McShane and A.J. Wyner [arXiv:1104.4002]

  128. Estimate the Occurrence Rate of the DNA Palindromes.

    Authors: I-Ping Tu, Yuan-Fu Huang, Shao-Hsuan Wang
    Subjects: Applications
    Abstract

    A DNA palindrome is a segment of double-stranded DNA sequence with inversion
    symmetry which may form secondary structures conferring significant biological
    functions ranging from RNA transcription to DNA replication. Testing whether
    clusters of DNA palindromes are distributed randomly is an interesting
    bioinformatics problem, where the occurrence rate of the DNA palindromes is a
    key estimator for setting up a test.

  129. Discussion of: A statistical analysis of multiple temperature proxies: Are reconstructions of surface temperatures over the last 1000 years reliable?.

    Authors: Eugene R. Wahl, Caspar M. Ammann
    Subjects: Applications
    Abstract

    Discussion of "A statistical analysis of multiple temperature proxies: Are
    reconstructions of surface temperatures over the last 1000 years reliable?" by
    B.B. McShane and A.J. Wyner [arXiv:1104.4002]

  130. Discussion of: A statistical analysis of multiple temperature proxies: Are reconstructions of surface temperatures over the last 1000 years reliable?.

    Authors: Bala Rajaratnam, Peter Craigmile
    Subjects: Applications
    Abstract

    Discussion of "A statistical analysis of multiple temperature proxies: Are
    reconstructions of surface temperatures over the last 1000 years reliable?" by
    B.B. McShane and A.J. Wyner [arXiv:1104.4002]

  131. Spurious predictions with random time series: The Lasso in the context of paleoclimatic reconstructions. Discussion of: A statistical analysis of multiple temperature proxies: Are reconstructions of surface temperatures over the last 1000 years reliable?.

    Authors: Martin P. Tingley
    Subjects: Applications
    Abstract

    Discussion of "A statistical analysis of multiple temperature proxies: Are
    reconstructions of surface temperatures over the last 1000 years reliable?" by
    B.B. McShane and A.J. Wyner [arXiv:1104.4002]

  132. Discussion of: A statistical analysis of multiple temperature proxies: Are reconstructions of surface temperatures over the last 1000 years reliable?.

    Authors: Jason E. Smerdon
    Subjects: Applications
    Abstract

    Discussion of "A statistical analysis of multiple temperature proxies: Are
    reconstructions of surface temperatures over the last 1000 years reliable?" by
    B.B. McShane and A.J. Wyner [arXiv:1104.4002]

  133. Discussion of: A statistical analysis of multiple temperature proxies: Are reconstructions of surface temperatures over the last 1000 years reliable?.

    Authors: Richard A. Davis, Jingchen Liu
    Subjects: Applications
    Abstract

    Discussion of "A statistical analysis of multiple temperature proxies: Are
    reconstructions of surface temperatures over the last 1000 years reliable?" by
    B.B. McShane and A.J. Wyner [arXiv:1104.4002]

  134. Discussion of: A statistical analysis of multiple temperature proxies: Are reconstructions of surface temperatures over the last 1000 years reliable?.

    Authors: Murali Haran, Nathan M. Urban
    Subjects: Applications
    Abstract

    Discussion of "A statistical analysis of multiple temperature proxies: Are
    reconstructions of surface temperatures over the last 1000 years reliable?" by
    B.B. McShane and A.J. Wyner [arXiv:1104.4002]

  135. Discussion of: A statistical analysis of multiple temperature proxies: Are reconstructions of surface temperatures over the last 1000 years reliable?.

    Authors: Lasse Holmström
    Subjects: Applications
    Abstract

    Discussion of "A statistical analysis of multiple temperature proxies: Are
    reconstructions of surface temperatures over the last 1000 years reliable?" by
    B.B. McShane and A.J. Wyner [arXiv:1104.4002]

  136. Discussion of: A statistical analysis of multiple temperature proxies: Are reconstructions of surface temperatures over the last 1000 years reliable?.

    Authors: Alexey Kaplan
    Subjects: Applications
    Abstract

    Discussion of "A statistical analysis of multiple temperature proxies: Are
    reconstructions of surface temperatures over the last 1000 years reliable?" by
    B.B. McShane and A.J. Wyner [arXiv:1104.4002]

  137. Discussion of: A statistical analysis of multiple temperature proxies: Are reconstructions of surface temperatures over the last 1000 years reliable?.

    Authors: L. Mark Berliner
    Subjects: Applications
    Abstract

    Discussion of "A statistical analysis of multiple temperature proxies: Are
    reconstructions of surface temperatures over the last 1000 years reliable?" by
    B.B. McShane and A.J. Wyner [arXiv:1104.4002]

  138. RESID: A Practical Stochastic Model for Software Reliability.

    Authors: Arnab Chakraborty
    Subjects: Applications
    Abstract

    A new approach called RESID is proposed in this paper for estimating the
    reliability of software allowing for imperfect debugging. Unlike earlier
    approaches based on counting the number of bugs or modelling inter-failure time
    gaps, RESID focuses on the probability of "bugginess" of different parts of a
    program. This perspective allows an easy way to incorporate the structure
    of the software under test, as well as imperfect debugging. One main design
    objective behind RESID is ease of implementation in practical scenarios.

  139. Principal arc analysis on direct product manifolds.

    Authors: Sungkyu Jung, J. S. Marron, Mark Foskey
    Subjects: Applications
    Abstract

    We propose a new approach to analyze data that naturally lie on manifolds. We
    focus on a special class of manifolds, called direct product manifolds, whose
    intrinsic dimension could be very high. Our method finds a low-dimensional
    representation of the manifold that can be used to find and visualize the
    principal modes of variation of the data, as Principal Component Analysis (PCA)
    does in linear spaces. The proposed method improves upon earlier manifold
    extensions of PCA by more concisely capturing important nonlinear modes.

  140. A dynamic Bayesian nonlinear mixed-effects model of HIV response incorporating medication adherence, drug resistance and covariates.

    Authors: Hulin Wu, Yangxin Huang, Jeanne Holden-Wiltse, Edward P. Acosta
    Subjects: Applications
    Abstract

    HIV dynamic studies have contributed significantly to the understanding of
    HIV pathogenesis and antiviral treatment strategies for AIDS patients.
    Establishing the relationship of virologic responses with clinical factors and
    covariates during long-term antiretroviral (ARV) therapy is important to the
    development of effective treatments. Medication adherence is an important
    predictor of the effectiveness of ARV treatment, but an appropriate determinant
    of adherence rate based on medication event monitoring system (MEMS) data is
    critical to predict virologic outcomes.

  141. Spatial models generated by nested stochastic partial differential equations, with an application to global ozone mapping.

    Authors: David Bolin, Finn Lindgren
    Subjects: Applications
    Abstract

    A new class of stochastic field models is constructed using nested stochastic
    partial differential equations (SPDEs). The model class is computationally
    efficient, applicable to data on general smooth manifolds, and includes both
    the Gaussian Matérn fields and a wide family of fields with oscillating
    covariance functions. Nonstationary covariance models are obtained by spatially
    varying the parameters in the SPDEs, and the model parameters are estimated
    using direct numerical optimization, which is more efficient than standard
    Markov Chain Monte Carlo procedures.

  142. Orthogonal simple component analysis: A new, exploratory approach.

    Authors: Karim Anaya-Izquierdo, Frank Critchley, Karen Vines
    Subjects: Applications
    Abstract

    Combining principles with pragmatism, a new approach and accompanying
    algorithm are presented to a longstanding problem in applied statistics: the
    interpretation of principal components. Following Rousson and Gasser [53 (2004)
    539--555], the ultimate goal is not to propose a method that leads
    automatically to a unique solution, but rather to develop tools for assisting
    the user in his or her choice of an interpretable solution. Accordingly, our
    approach is essentially exploratory.

  143. Random lasso.

    Authors: Ji Zhu, Bin Nan, Saharon Rosset, Sijian Wang
    Subjects: Applications
    Abstract

    We propose a computationally intensive method, the random lasso method, for
    variable selection in linear models. The method consists of two major steps. In
    step 1, the lasso method is applied to many bootstrap samples, each using a set
    of randomly selected covariates. A measure of importance is yielded from this
    step for each covariate. In step 2, a similar procedure to the first step is
    implemented with the exception that for each bootstrap sample, a subset of
    covariates is randomly selected with unequal selection probabilities determined
    by the covariates' importance.
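
    The two steps above can be sketched as follows. This is a minimal,
    standard-library-only illustration, not the authors' implementation: the
    importance measure (absolute value of the bootstrap-averaged coefficient),
    the plain lasso solver, and the names lasso_cd, random_lasso, q1, q2 and
    n_boot are assumptions made for the example.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator used by the lasso coordinate update."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=50):
    """Lasso by cyclic coordinate descent for (1/2n)||y - Xb||^2 + lam*||b||_1.

    X is a list of rows; no intercept is fitted, so data should be centered.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Inner product of column j with the partial residual.
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n))
            new = soft_threshold(rho, n * lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


def weighted_sample(rng, weights, q):
    """Draw up to q distinct indices with probability proportional to weights."""
    pool = [j for j, w in enumerate(weights) if w > 0]
    w = [weights[j] for j in pool]
    chosen = []
    while pool and len(chosen) < q:
        k = rng.choices(range(len(pool)), weights=w)[0]
        chosen.append(pool.pop(k))
        w.pop(k)
    return chosen


def random_lasso(X, y, lam, q1, q2, n_boot, seed=0):
    """Return (final coefficients, step-1 importance) for the two-step scheme."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])

    def one_pass(weights, q):
        total = [0.0] * p
        for _ in range(n_boot):
            rows = [rng.randrange(n) for _ in range(n)]   # bootstrap sample
            cols = weighted_sample(rng, weights, q)        # random covariates
            Xb = [[X[i][j] for j in cols] for i in rows]
            yb = [y[i] for i in rows]
            for j, c in zip(cols, lasso_cd(Xb, yb, lam)):
                total[j] += c
        return [t / n_boot for t in total]

    # Step 1: equal selection probabilities; importance = |average coefficient|.
    importance = [abs(b) for b in one_pass([1.0] * p, q1)]
    if sum(importance) == 0.0:
        importance = [1.0] * p
    # Step 2: selection probabilities proportional to step-1 importance.
    return one_pass(importance, q2), importance
```

    A call such as random_lasso(X, y, lam=0.1, q1=3, q2=3, n_boot=30) returns
    the averaged step-2 coefficients together with the step-1 importances.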

  144. A generalized linear mixed model for longitudinal binary data with a marginal logit link function.

    Authors: Michael Parzen, Souparno Ghosh, Stuart Lipsitz, Debajyoti Sinha, Garrett M. Fitzmaurice, Bani K. Mallick, Joseph G. Ibrahim
    Subjects: Applications
    Abstract

    Longitudinal studies of a binary outcome are common in the health, social,
    and behavioral sciences. In general, a feature of random effects logistic
    regression models for longitudinal binary data is that the marginal functional
    form, when integrated over the distribution of the random effects, is no longer
    of logistic form. Recently, Wang and Louis [Biometrika 90 (2003) 765--775]
    proposed a random intercept model in the clustered binary data setting where
    the marginal model has a logistic form.

  145. Improved variable selection with Forward-Lasso adaptive shrinkage.

    Authors: Gareth M. James, Peter Radchenko
    Subjects: Applications
    Abstract

    Recently, considerable interest has focused on variable selection methods in
    regression situations where the number of predictors, $p$, is large relative to
    the number of observations, $n$. Two commonly applied variable selection
    approaches are the Lasso, which computes highly shrunk regression coefficients,
    and Forward Selection, which uses no shrinkage. We propose a new approach,
    "Forward-Lasso Adaptive SHrinkage" (FLASH), which includes the Lasso and
    Forward Selection as special cases, and can be used in both the linear
    regression and the Generalized Linear Model domains.

  146. The Maximal Likelihood Transition Path in a Delayed Stochastic System.

    Authors: Hao Wu, Huijun Jiang, Zhonghuai Hou
    Subjects: Applications
    Abstract

    The maximal likelihood transition path (MLP) is informative for explaining the
    mechanism of noise-induced transitions, such as chemical reactions,
    biological switches, nucleation processes, etc. In the present paper, we
    investigate the MLP between two metastable states in a delayed stochastic
    system by employing a recently developed minimum action method. A modified
    version of the Maier-Stein model with linear delayed feedback is considered as
    an example.

  147. Nonlinear tube-fitting for the analysis of anatomical and functional structures.

    Authors: Jeff Goldsmith, Brian Caffo, Ciprian Crainiceanu, Daniel Reich, Yong Du, Craig Hendrix
    Subjects: Applications
    Abstract

    We are concerned with the estimation of the exterior surface and interior
    summaries of tube-shaped anatomical structures. This interest is motivated by
    two distinct scientific goals, one dealing with the distribution of HIV
    microbicide in the colon and the other with measuring degradation in
    white-matter tracts in the brain. Our problem is posed as the estimation of the
    support of a distribution in three dimensions from a sample from that
    distribution, possibly measured with error. We propose a novel tube-fitting
    algorithm to construct such estimators.

  148. Ratings and rankings: Voodoo or Science?

    Authors: Paolo Paruolo, Andrea Saltelli, Michaela Saisana
    Subjects: Applications
    Abstract

    Composite indicators aggregate a set of variables using weights which are
    understood to reflect the variables' importance in the index. In this paper we
    propose to measure the importance of a given variable within existing composite
    indicators via Karl Pearson's `correlation ratio'; we call this measure `main
    effect'.
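
    As a rough illustration of the correlation ratio, the sketch below computes
    the share of the variance of an index y explained by the group means of one
    variable. It assumes the variable has already been discretized into groups,
    which is a simplification for the example; the abstract does not say how the
    conditional mean is estimated, and the function name is hypothetical.

```python
from collections import defaultdict
from statistics import fmean


def correlation_ratio(groups, y):
    """Pearson's correlation ratio eta^2 = Var(E[y | group]) / Var(y).

    groups: group label of each observation (a discretized variable).
    y:      the corresponding values of the composite index.
    """
    by_group = defaultdict(list)
    for g, v in zip(groups, y):
        by_group[g].append(v)
    grand_mean = fmean(y)
    # Between-group sum of squares: variation explained by the group means.
    between = sum(len(v) * (fmean(v) - grand_mean) ** 2 for v in by_group.values())
    total = sum((v - grand_mean) ** 2 for v in y)
    return between / total if total > 0 else 0.0
```

    A value near 1 means the index is almost determined by that variable; a
    value near 0 means the variable has little influence on the index.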

  149. Latent rank change detection for analysis of splice-junction microarrays with nonlinear effects.

    Authors: Jonathan Gelfond, Lee Ann Zarzabal, Tarea Burton, Suzanne Burns, Mari Sogayar, Luiz O. F. Penalva
    Subjects: Applications
    Abstract

    Alternative splicing of gene transcripts greatly expands the functional
    capacity of the genome, and certain splice isoforms may indicate specific
    disease states such as cancer. Splice junction microarrays interrogate
    thousands of splice junctions, but data analysis is difficult and error prone
    because of the increased complexity compared to differential gene expression
    analysis.

  150. Detecting multiple authorship of United States Supreme Court legal decisions using function words.

    Authors: Jeffrey S. Rosenthal, Albert H. Yoon
    Subjects: Applications
    Abstract

    This paper uses statistical analysis of function words used in legal
    judgments written by United States Supreme Court justices, to determine which
    justices have the most variable writing style (which may indicate greater
    reliance on their law clerks when writing opinions), and also the extent to
    which different justices' writing styles are distinguishable from each other.

  151. Structured penalties for functional linear models---partially empirical eigenvectors for regression.

    Authors: Timothy W. Randolph, Jaroslaw Harezlak, Ziding Feng
    Subjects: Applications
    Abstract

    One of the challenges with functional data is incorporating spatial
    structure, or local correlation, into the analysis. This structure is inherent
    in the output from an increasing number of biomedical technologies, and a
    functional linear model is often used to estimate the relationship between the
    predictor functions and scalar responses. Common approaches to the ill-posed
    problem of estimating a coefficient function typically involve two stages:
    regularization and estimation.

  152. On Intrinsic Geometric Stability of Controller.

    Authors: Bhupendra Nath Tiwari, Stefano Bellucci, N. Amuthan, S. Krishnakumar
    Subjects: Applications
    Abstract

    This work explores the role of the intrinsic fluctuations in finite parameter
    controller configurations characterizing an ensemble of arbitrary irregular
    filter circuits. Our analysis illustrates that the parametric intrinsic
    geometric description exhibits a set of exact pair correction functions and
    global correlation volume with and without the variation of the mismatch
    factor. The present consideration shows that the canonical fluctuations can
    precisely be depicted without any approximation.

  153. Coordinate descent algorithms for nonconvex penalized regression, with applications to biological feature selection.

    Authors: Jian Huang, Patrick Breheny
    Subjects: Applications
    Abstract

    A number of variable selection methods have been proposed involving nonconvex
    penalty functions. These methods, which include the smoothly clipped absolute
    deviation (SCAD) penalty and the minimax concave penalty (MCP), have been
    demonstrated to have attractive theoretical properties, but model fitting is
    not a straightforward task, and the resulting solutions may be unstable.
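
    For reference, the two penalties named above can be written as short
    functions. The sketch uses the standard textbook forms of SCAD and MCP; the
    parameter names lam, a and gamma are chosen for the example, and a = 3.7 is
    only a conventional default, not a value taken from the paper.

```python
def scad_penalty(t, lam, a=3.7):
    """SCAD penalty evaluated at coefficient t (requires a > 2)."""
    u = abs(t)
    if u <= lam:
        return lam * u                      # lasso-like near zero
    if u <= a * lam:
        return (2 * a * lam * u - u * u - lam * lam) / (2 * (a - 1))
    return lam * lam * (a + 1) / 2          # constant: no further shrinkage


def mcp_penalty(t, lam, gamma=3.0):
    """Minimax concave penalty evaluated at coefficient t (requires gamma > 1)."""
    u = abs(t)
    if u <= gamma * lam:
        return lam * u - u * u / (2 * gamma)
    return 0.5 * gamma * lam * lam          # constant beyond gamma * lam
```

    Both penalties flatten out for large |t|, which is what makes them
    nonconvex and distinguishes them from the lasso penalty lam * |t|.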

  154. Encoding and decoding V1 fMRI responses to natural images with sparse nonparametric models.

    Authors: Bin Yu, Pradeep Ravikumar, Vincent Q Vu, Thomas Naselaris, Kendrick N Kay, Jack L Gallant
    Subjects: Applications
    Abstract

    Functional MRI (fMRI) has become the most common method for investigating the
    human brain. However, fMRI data present some complications for statistical
    analysis and modeling. One recently developed approach to these data focuses on
    estimation of computational encoding models that describe how stimuli are
    transformed into brain activity measured in individual voxels. Here we aim at
    building encoding models for fMRI signals recorded in primary visual cortex of
    the human brain. We use residual analyses to reveal systematic nonlinearity
    across voxels not taken into account by previous models.

  155. An autoregressive approach to house price modeling.

    Authors: Lawrence D. Brown, Chaitra H. Nagaraja, Linda H. Zhao
    Subjects: Applications
    Abstract

    A statistical model for predicting individual house prices and constructing a
    house price index is proposed utilizing information regarding sale price, time
    of sale and location (ZIP code). This model is composed of a fixed time effect
    and a random ZIP (postal) code effect combined with an autoregressive
    component. The former two components are applied to all home sales, while the
    latter is applied only to homes sold repeatedly. The time effect can be
    converted into a house price index.

  156. Estimating the number of neurons in multi-neuronal spike trains.

    Authors: Mengxin Li, Wei-Liem Loh
    Subjects: Applications
    Abstract

    A common way of studying the relationship between neural activity and
    behavior is through the analysis of neuronal spike trains that are recorded
    using one or more electrodes implanted in the brain. Each spike train typically
    contains spikes generated by multiple neurons. A natural question that arises
    is "what is the number of neurons $\nu$ generating the spike train?" This
    article proposes a method-of-moments technique for estimating $\nu$.

  157. A spatial analysis of multivariate output from regional climate models.

    Authors: Stephan R. Sain, Reinhard Furrer, Noel Cressie
    Subjects: Applications
    Abstract

    Climate models have become an important tool in the study of climate and
    climate change, and ensemble experiments consisting of multiple climate-model
    runs are used in studying and quantifying the uncertainty in climate-model
    output. However, there are often only a limited number of model runs available
    for a particular experiment, and one of the statistical challenges is to
    characterize the distribution of the model output. To that end, we have
    developed a multivariate hierarchical approach, at the heart of which is a new
    representation of a multivariate Markov random field.

  158. Editorial.

    Authors: Michael L. Stein
    Subjects: Applications
    Abstract

    Many of you reading these words will have been attracted by the discussion
    paper [McShane and Wyner (2011)], in which case, this may be the first, but
    hopefully not the last, time you will have read anything in a statistics
    journal. I would like to take this opportunity to discuss the review process in
    our journal and to make some comments about the role of statistics and
    uncertainty assessment in paleoclimatology and the broader debate about climate
    change.

  159. Biometric Cards for Indian Population: Role of Mathematical Models in Assisting and Planning.

    Authors: Arni S.R. Srinivasa Rao
    Subjects: Applications
    Abstract

    Mathematical models could be helpful in assisting the Indian Government's new
    initiative of issuing biometric cards to its citizens. In this note, we look
    into the role of mathematical models in estimating the missing, non-enumerated
    population numbers, estimating annual numbers of cards required by age, gender
    and regions in India. The linkage between National Population Register and
    biometric cards is also highlighted. There are other scientific issues, namely,
    electronic, data storage management, identity verification, etc., which we do not
    address in this paper.

  160. Density-based Monte Carlo Filter and its Application in Pharmacokinetic Parameter Estimation for Stochastic Differential Equation Models.

    Authors: Guanghui Huang, Jianping Wan, Hui Chen
    Subjects: Applications
    Abstract

    A genetic algorithm based on a density-based Monte Carlo filter is proposed
    to estimate the unknown parameters in the stochastic differential equations
    involved in PKPD modelling. The resulting estimation procedure is robust. The
    performances of extended Kalman filter and the proposed filter are compared
    through a simulation test. It is found that the proposed method is more
    accurate than the method based on the extended Kalman filter to estimate the
    values of unobservable variables and the unknown parameters with respect to
    mean absolute error.

  161. Estimating infectious disease parameters from data on social contacts and serological status.

    Authors: Nele Goeyvaerts, Niel Hens, Benson Ogunjimi, Marc Aerts, Ziv Shkedy, Pierre Van Damme, Philippe Beutels
    Subjects: Applications
    Abstract

    In dynamic models of infectious disease transmission, typically various
    mixing patterns are imposed on the so-called Who-Acquires-Infection-From-Whom
    matrix (WAIFW). These imposed mixing patterns are based on prior knowledge of
    age-related social mixing behavior rather than observations. Alternatively, one
    can assume that transmission rates for infections transmitted predominantly
    through non-sexual social contacts, are proportional to rates of conversational
    contact which can be estimated from a contact survey.

  162. Modeling Gaussian Random Fields by Anchored Inversion and Monte Carlo Sampling.

    Authors: Zepu Zhang
    Subjects: Applications
    Abstract

    It is common and convenient to treat distributed physical parameters as
    Gaussian random fields and model them in an "inverse procedure" using
    measurements of various properties of the fields. This article presents a
    general method for this problem based on a flexible parameterization device
    called "anchors", which captures local or global features of the fields. A
    classification of all relevant data into two categories closely cooperates with
    the anchor concept to enable systematic use of datasets of different sources
    and disciplinary natures.

  163. On a recent development in stochastic inversion with applications to hydrogeology.

    Authors: Zepu Zhang
    Subjects: Applications
    Abstract

    We comment on a recent approach to stochastic inversion, which centers on a
    concept known as "anchors" and conducts nonparametric estimation of the
    likelihood of the anchors (along with other model parameters) with respect to
    data obtained from field processes. The method is called "anchored inversion"
    or (less accurately) "method of anchored distribution". Conceptual and
    technical observations are made regarding the development, interpretation, and
    use of this approach.

  164. A Generic Multivariate Distribution for Counting Data.

    Authors: Marcos Capistrán, J. Andrés Christen
    Subjects: Applications
    Abstract

    Motivated by the need, in some Bayesian likelihood free inference problems,
    of imputing a multivariate counting distribution based on its vector of means
    and variance-covariance matrix, we define a generic multivariate discrete
    distribution. Based on blending the Binomial, Poisson and Negative-Binomial
    distributions, and using a normal multivariate copula, the required
    distribution is defined. This distribution tends to the Multivariate Normal for
    large counts and has an approximate pmf version that is quite simple to
    evaluate.

  165. Comparing air quality statistical models.

    Authors: Michela Cameletti, Rosaria Ignaccolo, Stefano Bande
    Subjects: Applications
    Abstract

    Air pollution is a great concern because of its impact on human health and on
    the environment. Statistical models play an important role in improving
    knowledge of this complex spatio-temporal phenomenon and in supporting public
    agencies and policy makers. We focus on the class of hierarchical models that
    provides a flexible framework for incorporating spatio-temporal interactions at
    different hierarchical levels. The challenge is to choose a model that is
    satisfactory in terms of goodness of fit, interpretability, parsimoniousness,
    prediction capability and computational costs.

  166. Nonparametric Methodology for the Time-Dependent Partial Area under the ROC Curve.

    Authors: Hung Hung, Chin-Tsang Chiang
    Subjects: Applications
    Abstract

    To assess the classification accuracy of a continuous diagnostic result, the
    receiver operating characteristic (ROC) curve is commonly used in applications.
    The partial area under the ROC curve (pAUC) is one of the widely accepted summary
    measures due to its generality and ease of probability interpretation. In the
    field of life science, a direct extension of the pAUC into the time-to-event
    setting can be used to measure the usefulness of a biomarker for disease
    detection over time.

  167. A note on logistic regression and logistic kernel machine models.

    Authors: Pei Wang, Jie Peng, Ru Wang
    Subjects: Applications
    Abstract

    This is a note on logistic regression models and logistic kernel machine
    models. It contains derivations of some of the expressions in a paper -- SNP
    Set Analysis for Detecting Disease Association Using Exon Sequence Data --
    submitted to BMC proceedings by these authors.

  168. On Properties of the Minimum Entropy Sub-tree to Compute Lower Bounds on the Partition Function.

    Authors: Mehdi Molkaraie, Payam Pakzad
    Subjects: Applications
    Abstract

    Computing the partition function and the marginals of a global probability
    distribution are two important issues in any probabilistic inference problem.
    In a previous work, we presented sub-tree based upper and lower bounds on the
    partition function of a given probabilistic inference problem. Using the
    entropies of the sub-trees we proved an inequality that compares the lower
    bounds obtained from different sub-trees. In this paper we investigate the
    properties of one specific lower bound, namely the lower bound computed by the
    minimum entropy sub-tree.

  169. Instant Replay: Investigating statistical Analysis in Sports.

    Authors: Gagan Sidhu
    Subjects: Applications
    Abstract

    Technology has had an unquestionable impact on the way people watch sports.
    As technology has evolved, so too has the knowledge of a casual sports fan. A
    direct result of this evolution is the amount of statistical analysis in sport.
    The goal of statistical analysis in sports is a simple one: to eliminate
    subjective analysis. Over the past four decades, statistics have slowly
    pervaded the viewing experience of sports. In this paper, we analyze previous
    work that proposed metrics and models that seek to evaluate various aspects of
    sports.

  170. A Dynamic Spatio-temporal Precipitation Model.

    Authors: Hans R. Künsch, Fabio Sigrist, Werner A. Stahel
    Subjects: Applications
    Abstract

    A spatio-temporal model for precipitation is presented. Modeling the
    continuous and the discrete part of rainfall together, it is assumed that
    precipitation has a censored and power-transformed normal distribution. The
    mean of this distribution is linked to covariates. Spatio-temporal correlations
    are accounted for by a latent Gaussian variable that follows a Markovian
    temporal evolution combined with spatially correlated innovations. We propose
    to specify the temporal evolution using a vector autoregression that is
    motivated by an autoregressive convolution approach.
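
    One common way to write a censored, power-transformed normal distribution
    is given below. The symbols $W$, $\mu$, $\sigma$ and $\lambda$ are notation
    assumed for this sketch; the abstract itself fixes no notation.

```latex
Y(s,t) =
\begin{cases}
  0, & W(s,t) \le 0, \\
  W(s,t)^{\lambda}, & W(s,t) > 0,
\end{cases}
\qquad
W(s,t) \sim N\bigl(\mu(s,t), \sigma^{2}\bigr), \quad \lambda > 0 .
```

    The point mass at zero represents dry periods (the discrete part), while
    the positive branch describes rainfall amounts (the continuous part).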

  171. Scaling and Hierarchy in Urban Economies.

    Authors: Cosma Rohilla Shalizi
    Subjects: Applications
    Abstract

    In several recent publications, Bettencourt, West and collaborators claim
    that properties of cities such as gross economic production, personal income,
    numbers of patents filed, number of crimes committed, etc., show super-linear
    power-law scaling with total population, while measures of resource use show
    sub-linear power-law scaling.

  172. The Banff Challenge: Statistical Detection of a Noisy Signal.

    Authors: A. C. Davison, N. Sartori
    Subjects: Applications
    Abstract

    Particle physics experiments such as those run in the Large Hadron Collider
    result in huge quantities of data, which are boiled down to a few numbers from
    which it is hoped that a signal will be detected. We discuss a simple
    probability model for this and derive frequentist and noninformative Bayesian
    procedures for inference about the signal. Both are highly accurate in
    realistic cases, with the frequentist procedure having the edge for interval
    estimation, and the Bayesian procedure yielding slightly better point
    estimates.

  173. Multivariate Goodness of Fit Procedures for Unbinned Data: An Annotated Bibliography.

    Authors: Giulio Palombo
    Subjects: Applications
    Abstract

    Unbinned maximum likelihood is a common procedure for parameter estimation.
    After parameters have been estimated, it is crucial to know whether the fit
    model adequately describes the experimental data. Univariate Goodness of Fit
    procedures have been thoroughly analyzed. In multi-dimensions, Goodness of Fit
    test powers have rarely been studied on realistic problems. There is no
    definitive answer regarding which method is better. Test performance is
    strictly related to specific analysis characteristics. In this work, a review
    of multi-variate Goodness of Fit techniques is presented.

  174. A New Methodology for Real Estate Appraisal using GAMLSS Models.

    Authors: Lutemberg Florencio, Francisco Cribari-Neto, Raydonal Ospina
    Subjects: Applications
    Abstract

    The valuation of real estate (e.g., houses and land, among others) is of extreme
    importance for decision making. Its singular characteristics make valuation
    through hedonic pricing methods difficult since the theory does not specify the
    correct regression functional form nor which explanatory variables should be
    included in the hedonic equation. In this article we perform real estate
    appraisal using a class of regression models proposed by Rigby & Stasinopoulos
    (2005): generalized additive models for location, scale and shape (GAMLSS).

  175. Improving PSF calibration in confocal microscopic imaging---estimating and exploiting bilateral symmetry.

    Authors: Nicolai Bissantz, Hajo Holzmann, Mirosław Pawlak
    Subjects: Applications
    Abstract

    A method for estimating the axis of reflectional symmetry of an image
    $f(x,y)$ on the unit disc $D=\{(x,y):x^2+y^2\leq1\}$ is proposed, given that
    noisy data of $f(x,y)$ are observed on a discrete grid of edge width $\Delta$.
    Our estimation procedure is based on minimizing over $\beta\in[0,\pi)$ the
    $L_2$ distance between empirical versions of $f$ and $\tau_{\beta}f$, the image
    of $f$ after reflection at the axis along $(\cos\beta,\sin\beta)$. Here, $f$
    and $\tau_{\beta}f$ are estimated using truncated radial series of the Zernike
    type.

  176. No alignment of cattle along geomagnetic field lines found.

    Authors: J. Hert, L. Jelinek, L. Pekarek, A. Pavlicek
    Subjects: Applications
    Abstract

    This paper presents a study of the body orientation of domestic cattle on
    free pastures in several European states, based on Google satellite
    photographs. In sum, 232 herds with 3412 individuals were evaluated. Two
    independent groups participated in our study and came to the same conclusion
    that, in contradiction to the recent findings of other researchers, no
    alignment of the animals and of their herds along geomagnetic field lines could
    be found.

  177. Joint Detection and Estimation: Optimum Tests and Applications.

    Authors: George V. Moustakides, Xiaodong Wang, Ali Tajer, Guido H. Jajamovich
    Subjects: Applications
    Abstract

    We consider a well defined joint detection and parameter estimation problem.
    By combining the Bayesian formulation of the estimation subproblem with suitable
    constraints on the detection subproblem we develop optimum one- and two-step
    tests for the joint detection/estimation case. The proposed combined strategies
    have the very desirable characteristic to allow for the trade-off between
    detection power and estimation efficiency. Our theoretical developments are
    then applied to the problems of retrospective changepoint detection and MIMO
    radar.

  178. Statistical Multiresolution Estimation in Imaging: Fundamental Concepts and Algorithmic Framework.

    Authors: Axel Munk, Philipp Marnitz, Klaus Frick
    Subjects: Applications
    Abstract

    In this paper we introduce a general class of statistical multiresolution
    estimators and develop an algorithmic framework for computing those. By this we
    mean estimators that are defined as solutions of convex optimization problems
    with $\ell_\infty$-type constraints. We employ a combination of an alternating
    direction augmented Lagrangian technique with Dykstra's algorithm for computing
    orthogonal projections onto intersections of convex sets. The capability of the
    proposed method is illustrated by various examples from imaging.

  179. An exponential random graph modeling approach to creating group-based representative whole-brain connectivity networks.

    Authors: Sean L. Simpson, Malaak N. Moussa, Paul J. Laurienti
    Subjects: Applications
    Abstract

    Group-based brain connectivity networks have great appeal for researchers
    interested in gaining further insight into complex brain function and how it
    changes across different mental states and disease conditions. Accurately
    constructing these networks presents a daunting challenge given the
    difficulties associated with accounting for inter-subject topological
    variability. Viable approaches to this task must engender networks that capture
    the constitutive topological properties of the group of subjects' networks that
    it is aiming to represent.

  180. Valued Ties Tell Fewer Lies, II: Why Not To Dichotomize Network Edges With Bounded Outdegrees.

    Authors: Andrew C. Thomas, Joseph K. Blitzstein
    Subjects: Applications
    Abstract

    Various methods have been proposed for creating a binary version of a complex
    network with valued ties. Rather than the default method of choosing a single
    threshold value about which to dichotomize, we consider a method of choosing
    the highest k outbound arcs from each person and assigning a binary tie, as
    this has the advantage of minimizing the isolation of nodes that may otherwise
    be weakly connected. However, simulations and real data sets establish that
    this method is worse than the default thresholding method and should not be
    generally considered to deal with valued networks.
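
    The two dichotomization schemes compared above can be sketched as follows.
    The matrix layout (W[i][j] is the value of the tie from i to j) and the
    function names are assumptions made for this example.

```python
def dichotomize_threshold(W, cutoff):
    """Binary tie i -> j whenever the valued tie exceeds a single global cutoff."""
    n = len(W)
    return [[1 if i != j and W[i][j] > cutoff else 0 for j in range(n)]
            for i in range(n)]


def dichotomize_top_k(W, k):
    """Binary tie from i to each of its k highest-valued outbound arcs."""
    n = len(W)
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        # Candidate targets: every other node with a positive tie value.
        targets = sorted((j for j in range(n) if j != i and W[i][j] > 0),
                         key=lambda j: W[i][j], reverse=True)
        for j in targets[:k]:
            A[i][j] = 1
    return A
```

    With W = [[0, 5, 1], [4, 0, 3], [0.5, 0.2, 0]], a cutoff of 2 leaves the
    third node without outbound ties, whereas k = 1 keeps its strongest arc.
    This is the bounded-outdegree behavior the abstract describes.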

  181. Modeling heterogeneity in ranked responses by nonparametric maximum likelihood: How do Europeans get their scientific knowledge?

    Authors: Brian Francis, Regina Dittrich, Reinhold Hatzinger
    Subjects: Applications
    Abstract

    This paper is motivated by a Eurobarometer survey on science knowledge. As
    part of the survey, respondents were asked to rank sources of science
    information in order of importance. The official statistical analysis of these
    data however failed to use the complete ranking information. We instead propose
    a method which treats ranked data as a set of paired comparisons which places
    the problem in the standard framework of generalized linear models and also
    allows respondent covariates to be incorporated. An extension is proposed to
    allow for heterogeneity in the ranked responses.
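
    Treating a ranking as a set of paired comparisons can be illustrated with
    a short helper. This is only a sketch of the data transformation, not of
    the model fitting; the function name and the item labels in the usage note
    are hypothetical.

```python
from itertools import combinations


def ranking_to_paired_comparisons(ranking):
    """Expand one respondent's ranking into (preferred, other) pairs.

    ranking: dict mapping item -> rank, where rank 1 is the most important.
    Tied items produce no pair.
    """
    pairs = []
    for a, b in combinations(sorted(ranking), 2):
        if ranking[a] < ranking[b]:
            pairs.append((a, b))
        elif ranking[b] < ranking[a]:
            pairs.append((b, a))
    return pairs
```

    For example, {"tv": 1, "press": 2, "radio": 3} yields the three pairs
    ("press", "radio"), ("tv", "press") and ("tv", "radio"), so a full ranking
    of J items contributes J(J-1)/2 comparisons.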

  182. Sparse modeling of categorial explanatory variables.

    Authors: Gerhard Tutz, Jan Gertheiss
    Subjects: Applications
    Abstract

    Shrinking methods in regression analysis are usually designed for metric
    predictors. In this article, however, shrinkage methods for categorial
    predictors are proposed. As an application we consider data from the Munich
    rent standard, where, for example, urban districts are treated as a categorial
    predictor. If independent variables are categorial, some modifications to usual
    shrinking procedures are necessary. Two $L_1$-penalty based methods for factor
    selection and clustering of categories are presented and investigated.

  183. Bayesian semiparametric inference for multivariate doubly-interval-censored data.

    Authors: Maria De Iorio, Alejandro Jara, Emmanuel Lesaffre, Fernando Quintana
    Subjects: Applications
    Abstract

    Based on a data set obtained in a dental longitudinal study, conducted in
    Flanders (Belgium), the joint time to caries distribution of permanent first
    molars was modeled as a function of covariates. This involves an analysis of
    multivariate continuous doubly-interval-censored data since: (i) the emergence
    time of a tooth and the time it experiences caries were recorded yearly, and
    (ii) events on teeth of the same child are dependent. To model the joint
    distribution of the emergence times and the times to caries, we propose a
    dependent Bayesian semiparametric model.

  184. Detection of treatment effects by covariate-adjusted expected shortfall.

    Authors: Xuming He, Ya-Hui Hsu, Mingxiu Hu
    Subjects: Applications
    Abstract

    The statistical tests that are commonly used for detecting mean or median
    treatment effects suffer from low power when the two distribution functions
    differ only in the upper (or lower) tail, as in the assessment of the Total
    Sharp Score (TSS) under different treatments for rheumatoid arthritis. In this
    article, we propose a more powerful test that detects treatment effects through
    the expected shortfalls.
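
    To see why a tail-based summary can detect differences that a mean or
    median comparison misses, the sketch below computes an empirical
    upper-tail expected shortfall. It is unadjusted (no covariates) and is not
    the authors' test statistic; the function name and the choice of empirical
    quantile are assumptions made for the example.

```python
import math


def upper_expected_shortfall(values, tau):
    """Mean of the observations in the upper (1 - tau) tail of the sample.

    tau: quantile level in [0, 1); tau = 0 returns the ordinary mean.
    """
    if not values:
        raise ValueError("values must be non-empty")
    if not 0 <= tau < 1:
        raise ValueError("tau must lie in [0, 1)")
    ordered = sorted(values)
    # Index of the first observation counted as part of the upper tail.
    start = min(math.floor(tau * len(ordered)), len(ordered) - 1)
    tail = ordered[start:]
    return sum(tail) / len(tail)
```

    Two samples such as [1, 2, 3, 4] and [1, 2, 3, 40] share the same median,
    yet their expected shortfalls at tau = 0.5 are 3.5 and 21.5.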

  185. Testing affiliation in private-values models of first-price auctions using grid distributions.

    Authors: Luciano I. de Castro, Harry J. Paarsch
    Subjects: Applications
    Abstract

    Within the private-values paradigm, we construct a tractable empirical model
    of equilibrium behavior at first-price auctions when bidders' valuations are
    potentially dependent, but not necessarily affiliated. We develop a test of
    affiliation and apply our framework to data from low-price, sealed-bid auctions
    held by the Department of Transportation in the State of Michigan to procure
    road-resurfacing services: we do not reject the hypothesis of affiliation in
    cost signals.

  186. Model-robust regression and a Bayesian "sandwich" estimator.

    Authors: Adam A. Szpiro, Kenneth M. Rice, Thomas Lumley
    Subjects: Applications
    Abstract

    We present a new Bayesian approach to model-robust linear regression that
    leads to uncertainty estimates with the same robustness properties as the
    Huber--White sandwich estimator. The sandwich estimator is known to provide
    asymptotically correct frequentist inference, even when standard modeling
    assumptions such as linearity and homoscedasticity in the data-generating
    mechanism are violated. Our derivation provides a compelling Bayesian
    justification for using this simple and popular tool, and it also clarifies
    what is being estimated when the data-generating mechanism is not linear.
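
    For orientation, the classical (frequentist) Huber--White sandwich
    estimator for least squares is usually written as below. This is the
    standard textbook form with notation assumed for this sketch ($X$ the
    design matrix with rows $x_i^{\top}$, $y$ the response); it is not the
    Bayesian derivation proposed in the paper.

```latex
\hat{\beta} = (X^{\top}X)^{-1}X^{\top}y,
\qquad
\hat{e}_i = y_i - x_i^{\top}\hat{\beta},
\qquad
\widehat{\operatorname{Var}}(\hat{\beta})
  = (X^{\top}X)^{-1}
    \Bigl(\sum_{i=1}^{n} \hat{e}_i^{2}\, x_i x_i^{\top}\Bigr)
    (X^{\top}X)^{-1}.
```

    The outer factors are the "bread" and the middle sum is the "meat"; the
    formula does not require the errors to have constant variance.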

  187. A Bayesian graphical modeling approach to microRNA regulatory network inference.

    Authors: Francesco C. Stingo, Yian A. Chen, Marina Vannucci, Marianne Barrier, Philip E. Mirkes
    Subjects: Applications
    Abstract

    It has been estimated that about 30% of the genes in the human genome are
    regulated by microRNAs (miRNAs). These are short RNA sequences that can
    down-regulate the levels of mRNAs or proteins in animals and plants. Genes
    regulated by miRNAs are called targets.

  188. Generalized extreme value regression for binary response data: An application to B2B electronic payments system adoption.

    Authors: Xia Wang, Dipak K. Dey
    Subjects: Applications
    Abstract

    In information systems research, a question of particular interest is to
    interpret and to predict the probability of a firm adopting a new technology,
    so that market promotions are targeted only to those firms that are more
    likely to adopt the technology. Typically, there is a significant difference
    between the observed numbers of "adopters" and "nonadopters," which are
    usually coded as a binary response. A critical issue involved in modeling such
    binary response data is the appropriate choice of link functions in a
    regression model.
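
    To illustrate why the choice of link matters for unbalanced binary data,
    the sketch below compares the symmetric logistic link with a skewed link
    built from the generalized extreme value (GEV) distribution function. The
    parametrization p = 1 - G(-eta; xi), with G the GEV cdf, is an assumption
    made for this example and may differ in detail from the paper's model; the
    function names are hypothetical.

```python
import math


def logistic_prob(eta):
    """Inverse logit link: symmetric around eta = 0."""
    return 1.0 / (1.0 + math.exp(-eta))


def gev_link_prob(eta, xi):
    """Response probability 1 - G(-eta; xi), with G the standard GEV cdf.

    Assumed parametrization; xi = 0 reduces to the complementary log-log link.
    """
    if abs(xi) < 1e-12:
        return 1.0 - math.exp(-math.exp(eta))
    base = 1.0 - xi * eta
    if base <= 0.0:
        # Outside the GEV support the probability saturates at 1 or 0.
        return 1.0 if xi > 0 else 0.0
    return 1.0 - math.exp(-base ** (-1.0 / xi))


# Moderate linear-predictor values only; extreme values can overflow.
etas = [-2.0, -1.0, 0.0, 1.0]
logit_curve = [logistic_prob(e) for e in etas]
gev_curve = [gev_link_prob(e, xi=0.5) for e in etas]
```

    The shape parameter xi controls the skewness of the response curve, which
    is the kind of flexibility a fixed symmetric link cannot offer.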

  189. An imputation-based approach for parameter estimation in the presence of ambiguous censoring with application in industrial supply chain.

    Authors: Samiran Ghosh
    Subjects: Applications
    Abstract

    This paper describes a novel approach based on "proportional imputation" when
    identical units produced in a batch have random but independent installation
    and failure times. The current problem is motivated by a real life industrial
    production-delivery supply chain where identical units are shipped after
    production to a third party warehouse and then sold at a future date for
    possible installation.

  190. Bayesian Analysis of Loss Ratios Using the Reversible Jump Algorithm.

    Authors: Garfield Brown, Steve Brooks
    Subjects: Applications
    Abstract

    In this paper we consider the problem of model choice for a set of insurance
    loss ratios. We use a reversible jump algorithm for our model discrimination
    and show how the vanilla reversible jump algorithm can be improved on using
    recent methodological advances in reversible jump computation.

  191. Nonparametric inference of doubly stochastic Poisson process data via the kernel method.

    Authors: Tingting Zhang, S. C. Kou
    Subjects: Applications
    Abstract

    Doubly stochastic Poisson processes, also known as the Cox processes,
    frequently occur in various scientific fields. In this article, motivated
    primarily by analyzing Cox process data in biophysics, we propose a
    nonparametric kernel-based inference method. We conduct a detailed study,
    including an asymptotic analysis, of the proposed method, and provide
    guidelines for its practical use, introducing a fast and stable regression
    method for bandwidth selection.
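
    As a rough illustration of kernel-based inference from point-process data,
    the sketch below smooths a set of event times with a Gaussian kernel to
    estimate an arrival intensity. It is a generic kernel smoother, not the
    authors' estimator or their bandwidth-selection rule; the function name,
    event times, and bandwidth are assumptions.

```python
import math


def kernel_intensity(event_times, t, bandwidth):
    """Gaussian-kernel estimate of the event intensity at time t."""
    norm = 1.0 / (bandwidth * math.sqrt(2.0 * math.pi))
    return sum(
        norm * math.exp(-0.5 * ((t - s) / bandwidth) ** 2)
        for s in event_times
    )


# Hypothetical arrival times and bandwidth.
events = [1.0, 2.0, 2.5]
bandwidth = 0.3
grid = [-3.0 + 0.01 * i for i in range(1001)]
intensity = [kernel_intensity(events, t, bandwidth) for t in grid]
```

    Each event contributes one unit of mass, so the estimated intensity
    integrates to the number of events; the bandwidth trades variance against
    smoothing bias.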

  192. A nonlinear mixed effects directional model for the estimation of the rotation axes of the human ankle.

    Authors: Mohammed Haddou, Louis-Paul Rivest, Michael Pierrynowski
    Subjects: Applications
    Abstract

    This paper suggests a nonlinear mixed effects model for data points in
    $\mathit{SO}(3)$, the set of $3\times3$ rotation matrices, collected according
    to a repeated measure design. Each sample individual contributes a sequence of
    rotation matrices giving the relative orientations of the right foot with
    respect to the right lower leg as its ankle moves. The random effects are the
    five angles characterizing the orientation of the two rotation axes of a
    subject's right ankle. The fixed parameters are the average value of these
    angles and their variances within the population.

  193. Modeling the dynamics of biomarkers during primary HIV infection taking into account the uncertainty of infection date.

    Authors: D. Commenges, J. Drylewicz, J. Guedj, R. Thiébaut
    Subjects: Applications
    Abstract

    During primary HIV infection, the kinetics of plasma virus concentrations and
    CD4+ cell counts is very complex. Parametric and nonparametric models have been
    suggested for fitting repeated measurements of these markers. Alternatively,
    mechanistic approaches based on ordinary differential equations have also been
    proposed. These latter models are constructed according to biological knowledge
    and take into account the complex nonlinear interactions between viruses and
    cells. However, estimating the parameters of these models is difficult.

  194. Zero-inflated truncated generalized Pareto distribution for the analysis of radio audience data.

    Authors: Dominique-Laurent Couturier, Maria-Pia Victoria-Feser
    Subjects: Applications
    Abstract

    Extreme value data with a high clump-at-zero occur in many domains. Moreover,
    it might happen that the observed data are truncated below a given
    threshold and/or are not reliable enough below that threshold because of
    the recording devices. These situations occur, in particular, with radio
    audience data measured using personal meters that record environmental noise
    every minute, which is then matched to one of several radio programs.

  195. Liquid chromatography mass spectrometry-based proteomics: Biological and technological aspects.

    Authors: Yuliya V. Karpievitch, Ashoka D. Polpitiya, Gordon A. Anderson, Richard D. Smith, Alan R. Dabney
    Subjects: Applications
    Abstract

    Mass spectrometry-based proteomics has become the tool of choice for
    identifying and quantifying the proteome of an organism. Though recent years
    have seen a tremendous improvement in instrument performance and the
    computational tools used, significant challenges remain, and there are many
    opportunities for statisticians to make important contributions.

  196. Remembering Leo Breiman.

    Authors: Richard A. Olshen
    Subjects: Applications
    Abstract

    I published an interview of Leo Breiman in Statistical Science [Olshen
    (2001)], and also the solution to a problem concerning almost sure convergence
    of binary tree-structured estimators in regression [Olshen (2007)]. The former
    summarized much of my thinking about Leo up to five years before his death. I
    discussed the latter with Leo and dedicated that paper to his memory.
    Therefore, this note is on other topics. In preparing it I am reminded how much
    I miss this man of so many talents and interests.

  197. Remembering Leo.

    Authors: Bin Yu
    Subjects: Applications
    Abstract

    I do not remember when I first met Leo, but I have a clear
    memory of going to Leo's office on the 4th floor of Evans Hall to talk to him
    in my second year in Berkeley's Ph.D. program in 1986. The details of the
    conversation are not retained but a visual image of his clean and orderly
    office remains, in a stark contrast to a high entropy state of the same office
    now being used by myself.

  198. Exit polling and racial bloc voting: Combining individual-level and R$\times$C ecological data.

    Authors: D. James Greiner, Kevin M. Quinn
    Subjects: Applications
    Abstract

    Despite its shortcomings, cross-level or ecological inference remains a
    necessary part of some areas of quantitative inference, including in United
    States voting rights litigation. Ecological inference suffers from a lack of
    identification that, most agree, is best addressed by incorporating
    individual-level data into the model.

  199. Reuse, recycle, reweigh: Combating influenza through efficient sequential Bayesian computation for massive data.

    Authors: Marc A. Suchard, Jennifer A. Tom, Janet S. Sinsheimer
    Subjects: Applications
    Abstract

    Massive datasets in the gigabyte and terabyte range combined with the
    availability of increasingly sophisticated statistical tools yield analyses at
    the boundary of what is computationally feasible. Compromising in the face of
    this computational burden by partitioning the dataset into more tractable sizes
    results in stratified analyses, removed from the context that justified the
    initial data collection.

  200. Multicategory vertex discriminant analysis for high-dimensional data.

    Authors: Kenneth Lange, Tong Tong Wu
    Subjects: Applications
    Abstract

    In response to the challenges of data mining, discriminant analysis continues
    to evolve as a vital branch of statistics. Our recently introduced method of
    vertex discriminant analysis (VDA) is ideally suited to handle multiple
    categories and an excess of predictors over training cases. The current paper
    explores an elaboration of VDA that conducts classification and variable
    selection simultaneously. Adding lasso ($\ell_1$-norm) and Euclidean penalties
    to the VDA loss function eliminates unnecessary predictors.

  201. Subsampling Methods for genomic inference.

    Authors: Peter J. Bickel, Nathan Boley, James B. Brown, Haiyan Huang, Nancy R. Zhang
    Subjects: Applications
    Abstract

    Large-scale statistical analysis of data sets associated with genome
    sequences plays an important role in modern biology. A key component of such
    statistical analyses is the computation of $p$-values and confidence bounds for
    statistics defined on the genome. Currently such computation is commonly
    achieved through ad hoc simulation measures. The method of randomization, which
    is at the heart of these simulation procedures, can significantly affect the
    resulting statistical conclusions.

  202. Leo and me.

    Authors: Jacob Feldman
    Subjects: Applications
    Abstract

    I arrived in Berkeley in 1957, at which time Leo was an Acting Assistant
    Professor of Mathematics here. He had recently proven the "individual ergodic
    theorem of information theory"---a triumph---and since this was becoming
    central to my own interests, it would have been natural for us to work
    together. However, Leo's interests shifted to more applied work, specifically
    statistics, and he soon moved to UCLA. So we never became collaborators, but we
    did become good friends, especially after 1980 when he returned to Berkeley as
    a Professor of Statistics.

  203. Selected recollections of my relationship with Leo Breiman.

    Authors: Charles J. Stone
    Subjects: Applications
    Abstract

    During the period 1962--1964, I had a tenure track Assistant Professorship in
    Mathematics at Cornell University in Ithaca, New York, where I did research in
    probability theory, especially on linear diffusion processes. Being somewhat
    lonely there and not liking the cold winter weather, I decided around the
    beginning of 1964 to try to get a job in the Mathematics Department at UCLA, in
    the city in which I was born and raised. At that time, Leo Breiman was an
    Associate Professor in that department.

  204. Leo Breiman: An important intellectual and personal force in statistics, my life and that of many others.

    Authors: Peter J. Bickel
    Subjects: Applications
    Abstract

    I first met Leo Breiman in 1979 at the beginning of his third career,
    Professor of Statistics at Berkeley. He obtained his PhD with Loève at
    Berkeley in 1957. His first career was as a probabilist in the Mathematics
    Department at UCLA. After distinguished research, including the
    Shannon--Breiman--McMillan Theorem and getting tenure, he decided that his
    real interest was in applied statistics, so he resigned his position at UCLA
    and set up as a consultant. Before doing so he produced two classic texts,
    Probability, now reprinted as a SIAM Classic in Applied Mathematics, and
    Statistics.

  205. Remembering Leo.

    Authors: Jerome H. Friedman
    Subjects: Applications
    Abstract

    Leo Breiman was a unique character. There will not be another like him. I
    consider it one of my great fortunes in life to have known and worked with him.
    Along with John Tukey, Leo had the greatest influence on shaping my approach to
    statistical problems. I did some of my best work collaborating with Leo, but
    more importantly, we both had great fun doing it. I look back on those years
    when we worked closely together with great fondness and regard them as among
    the happiest and most fruitful of my professional career.

  206. Leo Breiman.

    Authors: Michael I. Jordan
    Subjects: Applications
    Abstract

    Statistics is a uniquely difficult field to convey to the uninitiated. It
    sits astride the abstract and the concrete, the theoretical and the applied. It
    has a mathematical flavor and yet it is not simply a branch of mathematics. Its
    core problems blend into those of the disciplines that probe into the nature of
    intelligence and thought, in particular philosophy, psychology and artificial
    intelligence. Debates over foundational issues have waxed and waned, but the
    field has not yet arrived at a single foundational perspective.

  207. Remembrance of Leo Breiman.

    Authors: Peter Bühlmann
    Subjects: Applications
    Abstract

    In 1994, I came to Berkeley and was fortunate to stay there three years,
    first as a postdoctoral researcher and then as Neyman Visiting Assistant
    Professor. For me, this period was a unique opportunity to see other aspects
    and learn many more things about statistics: the Department of Statistics at
    Berkeley was much bigger and hence broader than my home at ETH Zürich and I
    enjoyed very much that the science was perhaps a bit more speculative.

  208. Remembering Leo Breiman.

    Authors: Adele Cutler
    Subjects: Applications
    Abstract

    Leo Breiman was a highly creative, influential researcher with a
    down-to-earth personal style and an insistence on working on important real
    world problems and producing useful solutions. This paper is a short review of
    Breiman's extensive contributions to the field of applied statistics.

  209. Evolution of Chinese airport network.

    Authors: Jun Zhang, Xian-Bin Cao, Wen-Bo Du, Kai-Quan Cai
    Subjects: Applications
    Abstract

    With the rapid development of the economy and the accelerated globalization
    process, the aviation industry plays an increasingly critical role in today's
    world, in both developed and developing countries. As the infrastructure of
    the aviation industry, the airport network is one of the most important
    indicators of economic growth. In this paper, we investigate the evolution of
    the Chinese airport network (CAN) via complex network theory.

  210. Frame theory in directional statistics.

    Authors: Martin Ehler, Jennifer Galanis
    Subjects: Applications
    Abstract

    Distinguishing between uniform and non-uniform sample distributions is a
    common problem in directional data analysis; however, for many tests,
    non-uniform distributions exist that fail uniformity rejection. By merging
    directional statistics with frame theory, we find that probabilistic tight
    frames yield non-uniform distributions that minimize directional potentials,
    leading to failure of uniformity rejection for the Bingham test. Finally, we
    apply our results to model patterns found in granular rod experiments.

  211. Scalable Inference of Customer Similarities from Interactions Data using Dirichlet Processes.

    Authors: Michael Braun, André Bonfrer
    Subjects: Applications
    Abstract

    Under the sociological theory of homophily, people who are similar to one
    another are more likely to interact with one another. Marketers often have
    access to data on interactions among customers from which, with homophily as a
    guiding principle, inferences could be made about the underlying similarities.
    However, larger networks face a quadratic explosion in the number of potential
    interactions that need to be modeled. This scalability problem renders
    probability models of social interactions computationally infeasible for all
    but the smallest networks.

  212. Experience Rating with Poisson Mixtures.

    Authors: Garfield Brown, Winston Buckley, Steve Brooks
    Subjects: Applications
    Abstract

    We present a mixture Poisson model for claims counts in which the number of
    components in the mixture is estimated by reversible jump MCMC methods.
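
    For readers unfamiliar with the model class, a finite Poisson mixture for
    claim counts has the probability mass function sketched below. The number
    of components is fixed at two here purely for illustration, whereas the
    paper estimates it by reversible jump MCMC; the weights and rates are
    hypothetical values.

```python
import math


def poisson_pmf(k, rate):
    """Probability of k events under a Poisson distribution."""
    return math.exp(-rate) * rate ** k / math.factorial(k)


def mixture_pmf(k, weights, rates):
    """Probability of k claims under a finite mixture of Poissons."""
    return sum(w * poisson_pmf(k, r) for w, r in zip(weights, rates))


# Hypothetical portfolio: mostly low-risk policies, some high-risk ones.
weights = [0.8, 0.2]
rates = [0.2, 3.0]
claim_probs = [mixture_pmf(k, weights, rates) for k in range(60)]
```

    A mixture of this kind can place more mass on both zero claims and large
    claim counts than a single Poisson with the same mean.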

  213. Discrimination for Two Way Models with Insurance Application.

    Authors: Garfield Brown, Winston Buckley
    Subjects: Applications
    Abstract

    In this paper, we review and apply several approaches to model selection for
    analysis of variance models which are used in a credibility and insurance
    context. The reversible jump algorithm is employed for model selection, where
    posterior model probabilities are computed. We then apply this method to
    insurance data from workers' compensation insurance schemes. The reversible
    jump results are compared with the Deviance Information Criterion, and are
    shown to be consistent.

  214. The Effect of Differential Recruitment, Non-response and Non-recruitment on Estimators for Respondent-Driven Sampling.

    Authors: Krista J. Gile, Amber Tomas
    Subjects: Applications
    Abstract

    Respondent-driven sampling is a widely-used network sampling technique,
    designed to sample from hard-to-reach populations. Estimation from the
    resulting samples is an area of active research, with software available to
    compute at least four estimators of a population proportion. Each estimator is
    claimed to address deficiencies in previous estimators; however, those claims
    are often unsubstantiated. In this study we provide a simulation-based
    comparison of five existing estimators, focussing on sampling conditions which
    a recent estimator is designed to address.

  215. Multiple change-point Poisson model for threshold exceedances of air pollution concentrations.

    Authors: Janos Gyarmati-Szabo, Leonid V. Bogachev, Haibo Chen
    Subjects: Applications
    Abstract

    A Bayesian multiple change-point model is proposed to analyse violations of
    air quality standards by pollutants such as nitrogen oxides (NO2 and NO) and
    carbon monoxide (CO). The model is built on the assumption that the occurrence
    of threshold exceedances may be described by a non-homogeneous Poisson process
    with a step rate function. Unlike earlier approaches, our model is not
    restricted by a predetermined number of change-points, nor does it involve any
    covariates.
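
    The modeling assumption, a non-homogeneous Poisson process with a step
    rate function, can be made concrete with a small simulation. The sketch
    below generates exceedance times for known change-points and rates; it
    does not perform the Bayesian inference of the paper, and the function
    name and all numerical values are assumptions.

```python
import random


def simulate_step_poisson(change_points, rates, t_end, rng):
    """Simulate event times on [0, t_end) for a piecewise-constant rate.

    rates must contain one positive value per segment, that is,
    len(change_points) + 1 values.
    """
    edges = [0.0] + list(change_points) + [t_end]
    events = []
    for left, right, rate in zip(edges[:-1], edges[1:], rates):
        # Within a segment the process is homogeneous, so gaps are
        # exponential; memorylessness allows a restart at each edge.
        t = left
        while True:
            t += rng.expovariate(rate)
            if t >= right:
                break
            events.append(t)
    return events


# Hypothetical setting: the exceedance rate jumps at time 50.
rng = random.Random(1)
events = simulate_step_poisson([50.0], [0.5, 20.0], 100.0, rng)
```

    Inference runs in the opposite direction: given only the event times, the
    number and locations of the change-points and the segment rates are
    unknown.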

  216. A Bayesian Statistical Approach for Inference on Static Origin-Destination Matrices.

    Authors: Luis Carvalho
    Subjects: Applications
    Abstract

    We address the problem of static OD matrix estimation from a formal
    statistical viewpoint. We adopt a novel Bayesian framework to develop a class
    of models that explicitly cast trip configurations in the study region as
    random variables. As a consequence, classical solutions from growth factor,
    gravity, and maximum entropy models are identified with specific estimators under
    the proposed models.

  217. Seasonal fractional long-memory processes. A semiparametric estimation approach.

    Authors: Valderio A. Reisen, Wilfredo Palma, Josu Arteche, Bartolomeu Zamprogno
    Subjects: Applications
    Abstract

    This paper explores seasonal and long-memory time series properties by using
    the seasonal fractional ARIMA model when the seasonal data has one and two
    seasonal periods and short-memory counterparts. The stationarity and
    invertibility parameter conditions are established for the model studied. To
    estimate the memory parameters, the method given in Reisen, Rodrigues and Palma
    (2006 a,b) is generalized here to deal with a time series with two seasonal
    fractional long-memory parameters.
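
    The building block of a seasonal fractional ARIMA model is the seasonal
    fractional difference filter (1 - B^s)^d, whose binomial-expansion weights
    can be computed recursively. The sketch below applies a truncated version
    of that filter; it does not implement the semiparametric estimator of the
    memory parameters, and the function names and the example series are
    assumptions.

```python
def frac_diff_weights(d, n_terms):
    """First n_terms coefficients of the expansion of (1 - B)^d."""
    weights = [1.0]
    for k in range(1, n_terms):
        weights.append(weights[-1] * (k - 1 - d) / k)
    return weights


def seasonal_frac_diff(series, d, period, n_terms):
    """Apply a truncated (1 - B^period)^d filter to a series.

    Lags that reach before the start of the series are dropped.
    """
    weights = frac_diff_weights(d, n_terms)
    filtered = []
    for t in range(len(series)):
        total = 0.0
        for k, w in enumerate(weights):
            lag = k * period
            if lag > t:
                break
            total += w * series[t - lag]
        filtered.append(total)
    return filtered


# With d = 1 the filter reduces to the ordinary seasonal difference.
series = [float(v) for v in range(12)]
seasonal_diff = seasonal_frac_diff(series, d=1.0, period=4, n_terms=5)
long_memory_weights = frac_diff_weights(0.3, 3)
```

    For fractional d the weights decay slowly rather than cutting off, which
    is what produces long memory at the seasonal lags.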

  218. Random stress and Omori's law.

    Authors: Yan Y. Kagan
    Subjects: Applications
    Abstract

    We consider two statistical regularities that were used to explain Omori's
    law of the aftershock rate decay: the Levy and Inverse Gaussian (IGD)
    distributions. These distributions are thought to describe stress behavior
    influenced by various random factors: post-earthquake stress time history is
    described by a Brownian motion. Both distributions decay to zero for time
    intervals close to zero. But this feature contradicts the high immediate
    aftershock level according to Omori's law.
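
    The contradiction can be seen numerically. The sketch below compares the
    modified Omori rate K / (t + c)^p with the Levy density near t = 0; the
    parameter values are illustrative assumptions, not values from the paper.

```python
import math


def omori_rate(t, K=1.0, c=0.01, p=1.0):
    """Modified Omori aftershock rate K / (t + c)^p."""
    return K / (t + c) ** p


def levy_pdf(t, scale=1.0):
    """Levy density; it vanishes as t approaches 0 from above."""
    if t <= 0.0:
        return 0.0
    return (
        math.sqrt(scale / (2.0 * math.pi))
        * math.exp(-scale / (2.0 * t))
        / t ** 1.5
    )


times = [0.001, 0.01, 0.1, 1.0, 10.0]
omori_values = [omori_rate(t) for t in times]
levy_values = [levy_pdf(t) for t in times]
```

    The Omori rate is largest immediately after the mainshock, while the Levy
    density is essentially zero there and only matches a power-law decay at
    later times.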

  219. Detection of network motifs by local concentration.

    Authors: Etienne Birmele
    Subjects: Applications
    Abstract

    Studying the topology of so-called real networks, that is networks
    obtained from sociological or biological data for instance, has become a major
    field of interest in the last decade. One way to deal with it is to consider
    that networks are built from small functional units called motifs, which
    can be found by looking for small subgraphs whose numbers of occurrences in the
    whole network of interest are surprisingly high.
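
    The basic idea of flagging a subgraph as a motif when it occurs
    surprisingly often can be sketched with triangles. The example below
    counts triangles in a small undirected graph and compares the count with
    its expectation under an Erdos-Renyi graph of the same edge density. This
    global-count baseline is an assumption chosen for simplicity; it is not
    the local concentration criterion named in the title.

```python
from itertools import combinations
from math import comb


def build_adjacency(edges):
    """Adjacency sets of an undirected graph given as an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def count_triangles(adj):
    """Number of node triples that are mutually connected."""
    return sum(
        1
        for a, b, c in combinations(sorted(adj), 3)
        if b in adj[a] and c in adj[a] and c in adj[b]
    )


def expected_triangles_er(n_nodes, n_edges):
    """Expected triangle count in an Erdos-Renyi graph of equal density."""
    p = n_edges / comb(n_nodes, 2)
    return comb(n_nodes, 3) * p ** 3


# Hypothetical toy network.
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4)]
adj = build_adjacency(edges)
observed = count_triangles(adj)
expected = expected_triangles_er(len(adj), len(edges))
```

    In a real analysis the observed count would be judged against the
    distribution of counts under the null model, not only against its mean.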

  220. Characterization of differentially expressed genes using high-dimensional co-expression networks.

    Authors: Gabriel C. G. de Abreu, Rodrigo Labouriau
    Subjects: Applications
    Abstract

    We present a technique to characterize differentially expressed genes in
    terms of their position in a high-dimensional co-expression network. The set-up
    of Gaussian graphical models is used to construct representations of the
    co-expression network in such a way that redundancy and the propagation of
    spurious information along the network are avoided. The proposed inference
    procedure is based on the minimization of the Bayesian Information Criterion
    (BIC) in the class of decomposable graphical models.

  221. Non-Parametric Tests of Structure for High Angular Resolution Diffusion Imaging in Q-Space.

    Authors: Sofia C. Olhede, Brandon Whitcher
    Subjects: Applications
    Abstract

    High angular resolution diffusion imaging data is the observed characteristic
    function for the local diffusion of water molecules in tissue. This data is
    used to infer structural information in brain imaging. Non-parametric scalar
    measures are proposed to summarize such data, and to locally characterize
    spatial features of the diffusion probability density function (PDF), relying
    on the geometry of the characteristic function. Summary statistics are defined
    so that their distributions are, to first order, both independent of nuisance
    parameters and also analytically tractable.

  222. Backward estimation of stochastic processes with failure events as time origins.

    Authors: Kwun Chuen Gary Chan, Mei-Cheng Wang
    Subjects: Applications
    Abstract

    Stochastic processes often exhibit sudden systematic changes in pattern a
    short time before certain failure events. Examples include increase in medical
    costs before death and decrease in CD4 counts before AIDS diagnosis. To study
    such terminal behavior of stochastic processes, a natural and direct way is to
    align the processes using failure events as time origins.

  223. Sparse logistic principal components analysis for binary data.

    Authors: Jianhua Z. Huang, Seokho Lee, Jianhua Hu
    Subjects: Applications
    Abstract

    We develop a new principal components analysis (PCA) type dimension reduction
    method for binary data. Unlike standard PCA, which is defined on the
    observed data, the proposed PCA is defined on the logit transform of the
    success probabilities of the binary observations. Sparsity is introduced to the
    principal component (PC) loading vectors for enhanced interpretability and more
    stable extraction of the principal components. Our sparse PCA is formulated as
    solving an optimization problem with a criterion function motivated from a
    penalized Bernoulli likelihood.

  224. Accounting for choice of measurement scale in extreme value modeling.

    Authors: J. L. Wadsworth, J. A. Tawn, P. Jonathan
    Subjects: Applications
    Abstract

    We investigate the effect that the choice of measurement scale has upon
    inference and extrapolation in extreme value analysis. Separate analyses of
    variables from a single process on scales which are linked by a nonlinear
    transformation may lead to discrepant conclusions concerning the tail behavior
    of the process. We propose the use of a Box--Cox power transformation
    incorporated as part of the inference procedure to account parametrically for
    the uncertainty surrounding the scale of extrapolation.
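
    For reference, the Box--Cox power transformation mentioned above maps a
    positive value x to (x^lambda - 1) / lambda, with the logarithm as the
    limiting case lambda = 0. The sketch below only evaluates the
    transformation; how it is embedded in the extreme value inference is not
    shown, and the function name is an assumption.

```python
import math


def box_cox(x, lam):
    """Box-Cox power transformation of a positive value x."""
    if x <= 0.0:
        raise ValueError("Box-Cox requires x > 0")
    if abs(lam) < 1e-12:
        return math.log(x)
    return (x ** lam - 1.0) / lam


# The same observations on three candidate measurement scales.
observations = [0.5, 1.0, 2.0, 8.0]
original_scale = [box_cox(x, 1.0) for x in observations]
sqrt_scale = [box_cox(x, 0.5) for x in observations]
log_scale = [box_cox(x, 0.0) for x in observations]
```

    Because the transformation is monotone, the ordering of the observations
    is preserved on every scale; what changes is the spacing in the upper
    tail, which is what drives extrapolation.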

  225. Bayesian inference for double Pareto lognormal queues.

    Authors: Pepa Ramirez-Cobo, Rosa E. Lillo, Simon Wilson, Michael P. Wiper
    Subjects: Applications
    Abstract

    In this article we describe a method for carrying out Bayesian estimation for
    the double Pareto lognormal (dPlN) distribution which has been proposed as a
    model for heavy-tailed phenomena. We apply our approach to estimate the
    $\mathit{dPlN}/M/1$ and $M/\mathit{dPlN}/1$ queueing systems. These systems
    cannot be analyzed using standard techniques due to the fact that the dPlN
    distribution does not possess a Laplace transform in closed form.

  226. An approach for jointly modeling multivariate longitudinal measurements and discrete time-to-event data.

    Authors: Paul S. Albert, Joanna H. Shih
    Subjects: Applications
    Abstract

    In many medical studies, patients are followed longitudinally and interest is
    in assessing the relationship between longitudinal measurements and time to an
    event. Recently, various authors have proposed joint modeling approaches for
    longitudinal and time-to-event data for a single longitudinal variable. These
    joint modeling approaches become intractable with even a few longitudinal
    variables. In this paper we propose a regression calibration approach for
    jointly modeling multiple longitudinal measurements and discrete time-to-event
    data.

  227. A smoothing approach for masking spatial data.

    Authors: Thomas A. Louis, Yijie Zhou, Francesca Dominici
    Subjects: Applications
    Abstract

    Individual-level health data are often not publicly available due to
    confidentiality; masked data are released instead. Therefore, it is important
    to evaluate the utility of using the masked data in statistical analyses such
    as regression. In this paper we propose a data masking method which is based on
    spatial smoothing techniques. The proposed method allows for selecting both the
    form and the degree of masking, thus resulting in a large degree of
    flexibility.

  228. Variable selection and regression analysis for graph-structured covariates with an application to genomics.

    Authors: Caiyan Li, Hongzhe Li
    Subjects: Applications
    Abstract

    Graphs and networks are common ways of depicting biological information. In
    biology, many different biological processes are represented by graphs, such as
    regulatory networks, metabolic pathways and protein--protein interaction
    networks. This kind of a priori use of graphs is a useful supplement to the
    standard numerical data such as microarray gene expression data. In this paper
    we consider the problem of regression analysis and variable selection when the
    covariates are linked on a graph.

  229. Optimal designs for random effect models with correlated errors with applications in population pharmacokinetics.

    Authors: Holger Dette, Tim Holland-Letz, Andrey Pepelyshev
    Subjects: Applications
    Abstract

    We consider the problem of constructing optimal designs for population
    pharmacokinetics which use random effect models. It is common practice in the
    design of experiments in such studies to assume uncorrelated errors for each
    subject.

  230. Prediction-based classification for longitudinal biomarkers.

    Authors: Andrea S. Foulkes, Livio Azzoni, Xiaohong Li, Margaret A. Johnson, Colette Smith, Karam Mounzer, Luis J. Montaner
    Subjects: Applications
    Abstract

    Assessment of circulating CD4 count change over time in HIV-infected subjects
    on antiretroviral therapy (ART) is a central component of disease monitoring.
    The increasing number of HIV-infected subjects starting therapy and the limited
    capacity to support CD4 count testing within resource-limited settings have
    fueled interest in identifying correlates of CD4 count change such as total
    lymphocyte count, among others.

  231. Modeling large scale species abundance with latent spatial processes.

    Authors: Alan E. Gelfand, Avishek Chakraborty, Adam M. Wilson, Andrew M. Latimer, John A. Silander Jr
    Subjects: Applications
    Abstract

    Modeling species abundance patterns using local environmental features is an
    important, current problem in ecology. The Cape Floristic Region (CFR) in South
    Africa is a global hot spot of diversity and endemism, and provides a rich
    class of species abundance data for such modeling. Here, we propose a
    multi-stage Bayesian hierarchical model for explaining species abundance over
    this region. Our model is specified at areal level, where the CFR is divided
    into roughly $37{,}000$ one minute grid cells; species abundance is observed at
    some locations within some cells.

  232. Poisson point process models solve the "pseudo-absence problem" for presence-only data in ecology.

    Authors: David I. Warton, Leah C. Shepherd
    Subjects: Applications
    Abstract

    Presence-only data, point locations where a species has been recorded as
    being present, are often used in modeling the distribution of a species as a
    function of a set of explanatory variables---whether to map species occurrence,
    to understand its association with the environment, or to predict its response
    to environmental change.

  233. Analysis of spatial distribution of marker expression in cells using boundary distance plots.

    Authors: Kingshuk Roy Choudhury, Limian Zheng, John J. Mackrill
    Subjects: Applications
    Abstract

    Boundary distance (BD) plotting is a technique for making orientation
    invariant comparisons of the spatial distribution of biochemical markers within
    and across cells/nuclei. Marker expression is aggregated over points with the
    same distance from the boundary. We present a suite of tools for improved data
    analysis and statistical inference using BD plotting. BD is computed using the
    Euclidean distance transform after presmoothing and oversampling of nuclear
    boundaries. Marker distribution profiles are averaged using smoothing with
    linearly decreasing bandwidth.
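
    The aggregation step, averaging marker expression over points at the same
    distance from the boundary, can be sketched as below. The brute-force
    distance computation stands in for a proper Euclidean distance transform
    and omits the presmoothing, oversampling, and profile smoothing described
    above; the function name, the rounding of distances to integers, and the
    toy image are assumptions.

```python
import math
from collections import defaultdict


def boundary_distance_profile(mask, marker):
    """Mean marker value per rounded distance to the nearest outside pixel.

    mask[r][c] is truthy inside the cell; at least one pixel must be outside.
    """
    rows, cols = len(mask), len(mask[0])
    outside = [
        (r, c) for r in range(rows) for c in range(cols) if not mask[r][c]
    ]
    sums = defaultdict(float)
    counts = defaultdict(int)
    for r in range(rows):
        for c in range(cols):
            if mask[r][c]:
                dist = min(math.hypot(r - ro, c - co) for ro, co in outside)
                key = round(dist)
                sums[key] += marker[r][c]
                counts[key] += 1
    return {k: sums[k] / counts[k] for k in sorted(sums)}


# Toy 5x5 image: a 3x3 cell whose marker is concentrated at the center.
mask = [[1 if 1 <= r <= 3 and 1 <= c <= 3 else 0 for c in range(5)]
        for r in range(5)]
marker = [[10.0 if (r, c) == (2, 2) else 1.0 for c in range(5)]
          for r in range(5)]
profile = boundary_distance_profile(mask, marker)
```

    The resulting profile depends only on distance from the boundary, so it
    does not change when the cell is rotated, which is the orientation
    invariance the abstract refers to.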

  234. Intrinsic Geometric Analysis of the Network Reliability and Voltage Stability.

    Authors: N. Gupta, B. N. Tiwari, S. Bellucci
    Subjects: Applications
    Abstract

    This paper presents the intrinsic geometric model for the solution of power
    system planning and its operation. This problem is large-scale and nonlinear,
    in general. Thus, we have developed the intrinsic geometric model for the
    network reliability and voltage stability, and examined it for the IEEE 5 bus
    system. The robustness of the proposed model is illustrated by introducing
    variations of the network parameters. Exact analytical results show the
    accuracy as well as the efficiency of the proposed solution technique.

  235. Geometric Design and Stability of Power Networks.

    Authors: Neeraj Gupta, Bhupendra Nath Tiwari, Stefano Bellucci
    Subjects: Applications
    Abstract

    From the perspective of the network theory, the present work illustrates how
    the parametric intrinsic geometric description exhibits an exact set of pair
    correction functions and global correlation volume with and without the
    inclusion of the imaginary power flow. The Gaussian fluctuations about the
    equilibrium basis accomplish well-defined, non-degenerate, curved regular
    intrinsic Riemannian surfaces for the purely real and the purely imaginary
    power flows and their linear combinations.

  236. Assessing the protection provided by misclassification-based disclosure limitation methods for survey microdata.

    Authors: Natalie Shlomo, Chris Skinner
    Subjects: Applications
    Abstract

    Government statistical agencies often apply statistical disclosure limitation
    techniques to survey microdata to protect the confidentiality of respondents.
    There is a need for valid and practical ways to assess the protection provided.
    This paper develops some simple methods for disclosure limitation techniques
    which perturb the values of categorical identifying variables. The methods are
    applied in numerical experiments based upon census data from the United Kingdom
    which are subject to two perturbation techniques: data swapping (random and
    targeted) and the post randomization method.

  237. Topological inference for EEG and MEG.

    Authors: James M. Kilner, Karl J. Friston
    Subjects: Applications
    Abstract

    Neuroimaging produces data that are continuous in one or more dimensions.
    This calls for an inference framework that can handle data that approximate
    functions of space, for example, anatomical images, time--frequency maps and
    distributed source reconstructions of electromagnetic recordings over time.
    Statistical parametric mapping (SPM) is the standard framework for whole-brain
    inference in neuroimaging: SPM uses random field theory to furnish $p$-values
    that are adjusted to control family-wise error or false discovery rates, when
    making topological inferences over large volumes of space.

  238. Detection of radioactive material entering national ports: A Bayesian approach to radiation portal data.

    Authors: Siddhartha R. Dalal, Bing Han
    Subjects: Applications
    Abstract

    Given the potential for illicit nuclear material being used for terrorism,
    most ports now inspect a large number of goods entering national borders for
    radioactive cargo. The U.S. Department of Homeland Security is moving toward
    one hundred percent inspection of all containers entering the U.S. at various
    ports of entry for nuclear material. We propose a Bayesian classification
    approach for the real-time data collected by the inline Polyvinyl Toluene
    radiation portal monitors.

  239. Small area estimation of the homeless in Los Angeles: An application of cost-sensitive stochastic gradient boosting.

    Authors: Brian Kriegler, Richard Berk
    Subjects: Applications
    Abstract

    In many metropolitan areas efforts are made to count the homeless to ensure
    proper provision of social services. Some areas are very large, which makes
    spatial sampling a viable alternative to an enumeration of the entire terrain.
    Counts are observed in sampled regions but must be imputed in unvisited areas.
    Along with the imputation process, the costs of underestimating and
    overestimating may be different. For example, if precise estimation in areas
    with large homeless counts is critical, then underestimation should be
    penalized more than overestimation in the loss function.
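
    The asymmetric penalty described above can be illustrated with a minimal
    sketch. It uses a weighted absolute error as one possible cost-sensitive
    loss; the function name and the cost weights (3.0 and 1.0) are illustrative
    assumptions, not the loss function specified by the authors.

```python
def asymmetric_loss(observed, predicted, under_cost=3.0, over_cost=1.0):
    """Weighted absolute error that charges more for underestimates.

    under_cost and over_cost are hypothetical weights chosen for illustration.
    """
    error = observed - predicted
    if error > 0:
        # The prediction is below the observed count: an underestimate.
        return under_cost * error
    # The prediction is at or above the observed count: an overestimate.
    return over_cost * (-error)


def mean_asymmetric_loss(observed, predicted, under_cost=3.0, over_cost=1.0):
    """Average the asymmetric loss over paired observed and predicted counts."""
    losses = [
        asymmetric_loss(o, p, under_cost, over_cost)
        for o, p in zip(observed, predicted)
    ]
    return sum(losses) / len(losses)
```

    With these weights, missing an observed count by two from below costs three
    times as much as missing it by two from above.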

  240. A general statistical framework for dissecting parent-of-origin effects underlying endosperm traits in flowering plants.

    Authors: Gengxin Li, Yuehua Cui
    Subjects: Applications
    Abstract

    Genomic imprinting has been thought to play an important role in seed
    development in flowering plants. A seed in a flowering plant normally contains
    a diploid embryo and a triploid endosperm. Empirical studies have shown that some
    economically important endosperm traits are genetically controlled by imprinted
    genes. However, the exact number and location of the imprinted genes are
    largely unknown due to the lack of efficient statistical mapping methods.

  241. Using linear predictors to impute allele frequencies from summary or pooled genotype data.

    Authors: Xiaoquan Wen, Matthew Stephens
    Subjects: Applications
    Abstract

    Recently-developed genotype imputation methods are a powerful tool for
    detecting untyped genetic variants that affect disease susceptibility in
    genetic association studies. However, existing imputation methods require
    individual-level genotype data, whereas, in practice, it is often the case that
    only summary data are available. For example, this may occur because, for
    reasons of privacy or politics, only summary data are made available to the
    research community at large; or because only summary data are collected, as in
    DNA pooling experiments.

  242. Age- and time-varying proportional hazards models for employment discrimination.

    Authors: George Woodworth, Joseph Kadane
    Subjects: Applications
    Abstract

    We use a discrete-time proportional hazards model of time to involuntary
    employment termination. This model enables us to examine both the continuous
    effect of the age of an employee and whether that effect has varied over time,
    generalizing earlier work [Kadane and Woodworth J. Bus. Econom. Statist. 22
    (2004) 182--193]. We model the log hazard surface (over age and time) as a
    thin-plate spline, a Bayesian smoothness-prior implementation of penalized
    likelihood methods of surface-fitting [Wahba (1990) Spline Models for
    Observational Data. SIAM].

  243. A hierarchical Bayesian approach to record linkage and size population problems.

    Authors: Brunero Liseo, Andrea Tancredi
    Subjects: Applications
    Abstract

    We propose and illustrate a hierarchical Bayesian approach for matching
    statistical records observed on different occasions. We show how this model can
    be profitably adopted both in record linkage problems and in capture-recapture
    setups, where the size of a finite population is the real object of interest.
    There are at least two important differences between the proposed model-based
    approach and the current practice in record linkage.

  244. Intervention analysis with state-space models to estimate discontinuities due to a survey redesign.

    Authors: Jan van den Brakel, Joeri Roels
    Subjects: Applications
    Abstract

    An important quality aspect of official statistics produced by national
    statistical institutes is comparability over time. To maintain uninterrupted
    time series, surveys conducted by national statistical institutes are often
    kept unchanged as long as possible. To improve the quality or efficiency of a
    survey process, however, it remains inevitable to adjust methods or redesign
    this process from time to time. Adjustments in the survey process generally
    affect survey characteristics such as response bias and therefore have a
    systematic effect on the parameter estimates of a sample survey.

  245. Exact asymptotic distribution of change-point mle for change in the mean of Gaussian sequences.

    Authors: Stergios B. Fotopoulos, Venkata K. Jandhyala, Elena Khapalova
    Subjects: Applications
    Abstract

    We derive exact computable expressions for the asymptotic distribution of the
    change-point mle when a change in the mean occurred at an unknown point of a
    sequence of time-ordered independent Gaussian random variables. The derivation,
    which assumes that nuisance parameters such as the amount of change and
    variance are known, is based on ladder heights of Gaussian random walks hitting
    the half-line. We then show that the exact distribution easily extends to the
    distribution of the change-point mle when a change occurs in the mean vector of
    a multivariate Gaussian process.
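
    As a rough illustration of the estimator whose distribution is studied, the
    sketch below computes a change-point mle for a univariate Gaussian sequence.
    It assumes the means before and after the change are known and the variance
    is common (so it drops out of the maximization); the function name is
    hypothetical, and the code does not reproduce the asymptotic distribution
    derived in the paper.

```python
def change_point_mle(xs, mean_before, mean_after):
    """Return tau maximizing the Gaussian log-likelihood of a mean shift.

    The first tau observations are modeled with mean_before and the rest with
    mean_after. With a known common variance, maximizing the log-likelihood
    is the same as minimizing the total squared error.
    """
    best_tau = None
    best_sse = float("inf")
    for tau in range(1, len(xs)):  # at least one observation on each side
        sse = sum((x - mean_before) ** 2 for x in xs[:tau])
        sse += sum((x - mean_after) ** 2 for x in xs[tau:])
        if sse < best_sse:
            best_tau, best_sse = tau, sse
    return best_tau
```

    This direct search recomputes the sums for every candidate and so costs
    quadratic time; cumulative sums would reduce it to linear time.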

  246. Feature selection guided by structural information.

    Authors: Martin Slawski, Wolfgang zu Castell, Gerhard Tutz
    Subjects: Applications
    Abstract

    In generalized linear regression problems with an abundant number of
    features, lasso-type regularization which imposes an $\ell^1$-constraint on the
    regression coefficients has become a widely established technique. Deficiencies
    of the lasso in certain scenarios, notably strongly correlated design, were
    unmasked when Zou and Hastie [J. Roy. Statist. Soc. Ser. B 67 (2005) 301--320]
    introduced the elastic net. In this paper we propose to extend the elastic net
    by admitting general nonnegative quadratic constraints as a second form of
    regularization.
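
    One way to write such an estimator, in penalized (Lagrangian) form, is
    sketched below. The symbols are assumptions introduced for illustration:
    $L(\beta)$ is the negative log-likelihood of the generalized linear model,
    $\lambda_1, \lambda_2 \ge 0$ are tuning parameters, and $\Lambda$ is a
    symmetric positive semidefinite matrix encoding the structural information.
    Taking $\Lambda = I$ gives the elastic net penalty.

```latex
\hat{\beta} \;=\; \arg\min_{\beta}\;
  L(\beta) \;+\; \lambda_1 \lVert \beta \rVert_1
  \;+\; \lambda_2\, \beta^{\top} \Lambda\, \beta ,
\qquad \Lambda \succeq 0 .
```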

  247. A geometric interpretation of the permutation $p$-value and its application in eQTL studies.

    Authors: Wei Sun, Fred A. Wright
    Subjects: Applications
    Abstract

    Permutation $p$-values have been widely used to assess the significance of
    linkage or association in genetic studies. However, the application in
    large-scale studies is hindered by a heavy computational burden. We propose a
    geometric interpretation of permutation $p$-values, and based on this geometric
    interpretation, we develop an efficient permutation $p$-value estimation method
    in the context of regression with binary predictors.
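
    For reference, the brute-force Monte Carlo permutation $p$-value that makes
    large-scale studies expensive can be sketched as follows. This is the
    standard baseline, not the authors' geometric estimation method; the
    statistic (absolute difference in group means for a binary predictor), the
    function names, and the default number of permutations are illustrative
    assumptions.

```python
import random


def mean_difference(y, x):
    """Absolute difference in mean response between the x == 1 and x == 0 groups."""
    group1 = [yi for yi, xi in zip(y, x) if xi == 1]
    group0 = [yi for yi, xi in zip(y, x) if xi == 0]
    return abs(sum(group1) / len(group1) - sum(group0) / len(group0))


def permutation_p_value(y, x, n_perm=2000, seed=0):
    """Monte Carlo permutation p-value for association between y and binary x."""
    rng = random.Random(seed)
    observed = mean_difference(y, x)
    shuffled = list(x)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # permute the predictor labels
        if mean_difference(y, shuffled) >= observed:
            hits += 1
    # Adding one to both counts keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)
```

    Each test costs n_perm passes over the data, which is the burden that
    becomes prohibitive when repeated over many transcripts and markers.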

  248. DISCO analysis: A nonparametric extension of analysis of variance.

    Authors: Gábor J. Székely, Maria L. Rizzo
    Subjects: Applications
    Abstract

    In classical analysis of variance, dispersion is measured by considering
    squared distances of sample elements from the sample mean. We consider a
    measure of dispersion for univariate or multivariate response based on all
    pairwise distances between sample elements, and derive an analogous distance
    components (DISCO) decomposition for powers of distance in $(0,2]$. The ANOVA F
    statistic is obtained when the index (exponent) is 2.
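
    The decomposition can be illustrated for univariate samples with the
    minimal sketch below. It assumes the usual form in which total dispersion
    is N/2 times the mean pairwise distance (raised to the power alpha) in the
    pooled sample, and within-sample dispersion is the analogous sum over
    groups; the function names are hypothetical, and the multivariate case
    would replace the absolute difference with a Euclidean norm.

```python
def mean_pairwise(a, b, alpha):
    """Mean of |x - y| ** alpha over all pairs with x in a and y in b."""
    return sum(abs(x - y) ** alpha for x in a for y in b) / (len(a) * len(b))


def disco(groups, alpha=1.0):
    """Return (between, within, F-type ratio) for a list of univariate samples."""
    pooled = [x for g in groups for x in g]
    n_total, k = len(pooled), len(groups)
    total = (n_total / 2) * mean_pairwise(pooled, pooled, alpha)
    within = sum((len(g) / 2) * mean_pairwise(g, g, alpha) for g in groups)
    between = total - within
    # Assumes within > 0, i.e. at least one group has non-identical values.
    ratio = (between / (k - 1)) / (within / (n_total - k))
    return between, within, ratio
```

    With alpha set to 2, the between and within components equal the classical
    treatment and error sums of squares, so the ratio is the ANOVA F statistic.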

  249. Censored Gamma Regression Models for Limited Dependent Variables with an Application to Loss Given Default.

    Authors: Fabio Sigrist, Werner A. Stahel
    Subjects: Applications
    Abstract

    Regression models for limited continuous dependent variables having a
    non-negligible probability of attaining exactly their limits are presented. The
    models differ in the number of parameters and in their flexibility. It is shown
    how to fit these models and they are applied to a Loss Given Default dataset
    from insurance to which they provide a good fit.

  250. Smoothed ANOVA with spatial effects as a competitor to MCAR in multivariate spatial smoothing.

    Authors: Sudipto Banerjee, Yufen Zhang, James S. Hodges
    Subjects: Applications
    Abstract

    Rapid developments in geographical information systems (GIS) continue to
    generate interest in analyzing complex spatial datasets. One area of activity
    is in creating smoothed disease maps to describe the geographic variation of
    disease and generate hypotheses for apparent differences in risk. With multiple
    diseases, a multivariate conditionally autoregressive (MCAR) model is often
    used to smooth across space while accounting for associations between the
    diseases. The MCAR, however, imposes complex covariance structures that are
    difficult to interpret and estimate.

  251. Semi-parametric dynamic time series modelling with applications to detecting neural dynamics.

    Authors: Jim Q. Smith, Fabio Rigat
    Subjects: Applications
    Abstract

    This paper illustrates novel methods for nonstationary time series modeling
    along with their applications to selected problems in neuroscience. These
    methods are semi-parametric in that inferences are derived by combining
    sequential Bayesian updating with a non-parametric change-point test. As a test
    statistic, we propose a Kullback--Leibler (KL) divergence between posterior
    distributions arising from different sets of data.

  252. Detecting and handling outlying trajectories in irregularly sampled functional datasets.

    Authors: Daniel Gervini
    Subjects: Applications
    Abstract

    Outlying curves often occur in functional or longitudinal datasets, and can
    be very influential on parameter estimators and very hard to detect visually.
    In this article we introduce estimators of the mean and the principal
    components that are resistant to, and then can be used for detection of,
    outlying sample trajectories.

  253. Bayesian inference and model choice in a hidden stochastic two-compartment model of hematopoietic stem cell fate decisions.

    Authors: Youyi Fong, Peter Guttorp, Janis Abkowitz
    Subjects: Applications
    Abstract

    Despite rapid advances in experimental cell biology, the in vivo behavior of
    hematopoietic stem cells (HSC) cannot be directly observed and measured.
    Previously we modeled feline hematopoiesis using a two-compartment hidden
    Markov process that had birth and emigration events in the first compartment.
    Here we perform Bayesian statistical inference on models which contain two
    additional events in the first compartment in order to determine if HSC fate
    decisions are linked to cell division or occur independently.

  254. Structured variable selection and estimation.

    Authors: Ming Yuan, V. Roshan Joseph, Hui Zou
    Subjects: Applications
    Abstract

    In linear regression problems with related predictors, it is desirable to do
    variable selection and estimation by maintaining the hierarchical or structural
    relationships among predictors. In this paper we propose non-negative garrote
    methods that can naturally incorporate such relationships defined through
    effect heredity principles or marginality principles. We show that the methods
    are very easy to compute and enjoy nice theoretical properties. We also show
    that the methods can be easily extended to deal with more general regression
    problems such as generalized linear models.

  255. Improving the precision of classification trees.

    Authors: Wei-Yin Loh
    Subjects: Applications
    Abstract

    Besides serving as prediction models, classification trees are useful for
    finding important predictor variables and identifying interesting subgroups in
    the data. These functions can be compromised by weak split selection algorithms
    that have variable selection biases or that fail to search beyond local main
    effects at each node of the tree. The resulting models may include many
    irrelevant variables or select too few of the important ones. Either
    eventuality can lead to erroneous conclusions.

  256. Adaptive Density Estimation in the Pile-up Model Involving Measurement Errors.

    Authors: Fabienne Comte, Tabea Rebafka
    Subjects: Applications
    Abstract

    Motivated by fluorescence lifetime measurements this paper considers the
    problem of nonparametric density estimation in the pile-up model. Adaptive
    nonparametric estimators are proposed for the pile-up model in its simple form
    as well as in the case of additional measurement errors. Furthermore, oracle
    type risk bounds for the mean integrated squared error (MISE) are provided.
    Finally, the estimation methods are assessed by a simulation study and the
    application to real fluorescence lifetime data.

  257. Two switching multiple disorder problems for Brownian motions.

    Authors: Pavel V. Gapeev
    Subjects: Applications
    Abstract

    The multiple disorder problem seeks to determine a sequence of stopping times
    which are as close as possible to the unknown times of disorders at which the
    observation process changes its probability characteristics. We derive closed
    form solutions in two formulations of the multiple disorder problem for an
    observable Brownian motion with switching constant drift rates. The method of
    proof is based on the reduction of the initial problems to appropriate optimal
    switching problems and the analysis of the associated coupled free-boundary
    problems.

  258. Introduction to papers on the modeling and analysis of network data.

    Authors: Stephen E. Fienberg
    Subjects: Applications
    Abstract

    Introduction to papers on the modeling and analysis of network data

  259. A nonparametric urn-based approach to interacting failing systems with an application to credit risk modeling.

    Authors: Pasquale Cirillo, Jürg Hüsler, Pietro Muliere
    Subjects: Applications
    Abstract

    In this paper we propose a new nonparametric approach to interacting failing
    systems (FS), that is, systems whose probability of failure is not negligible in
    a fixed time horizon, a typical example being firms and financial bonds. The
    main purpose when studying a FS is to calculate the probability of default and
    the distribution of the number of failures that may occur during the
    observation period. A model used to study a failing system is called a default
    model. In particular, we present a general recursive model constructed by
    means of interacting urns.

  260. Introducing the discussion paper by Székely and Rizzo.

    Authors: Michael A. Newton
    Subjects: Applications
    Abstract

    Introducing the discussion paper by Székely and Rizzo

  261. Forecasting with Neural Networks: A comparative study using the data of emergency service.

    Authors: Muhammad Noor-Ul-Amin
    Subjects: Applications
    Abstract

    This is a case study of supervised artificial neural networks (ANNs) for
    forecasting, compared with the Box-Jenkins methodology, using data from the
    well-known emergency service Rescue 1122. We fit a variety of neural network
    (NN) models, and many problems were revealed while fitting the ANN models
    to achieve the local minima. Moreover, the ANN model gives much better
    out-of-sample forecasts than the ARIMA model. However, we use diagnostic
    checks for the comparison of models.

  262. Predicting Inflation: Professional Experts Versus No-Change Forecasts.

    Authors: Tilmann Gneiting, Thordis L. Thorarinsdottir
    Subjects: Applications
    Abstract

    We compare forecasts of United States inflation from the Survey of
    Professional Forecasters (SPF) to predictions made by simple statistical
    techniques. In nowcasting, economic expertise is persuasive. When projecting
    beyond the current quarter, novel yet simplistic probabilistic no-change
    forecasts are equally competitive. We further interpret surveys as ensembles of
    forecasts, and show that they can be used similarly to the ways in which
    ensemble prediction systems have transformed weather forecasting.

  263. Feature selection in omics prediction problems using cat scores and false nondiscovery rate control.

    Authors: Korbinian Strimmer, Miika Ahdesmäki
    Subjects: Applications
    Abstract

    We revisit the problem of feature selection in linear discriminant analysis
    (LDA), that is, when features are correlated. First, we introduce a pooled
    centroids formulation of the multiclass LDA predictor function, in which the
    relative weights of Mahalanobis-transformed predictors are given by
    correlation-adjusted $t$-scores (cat scores).

  264. E-loyalty networks in online auctions.

    Authors: Wolfgang Jank, Inbal Yahav
    Subjects: Applications
    Abstract

    Creating a loyal customer base is one of the most important, and at the same
    time, most difficult tasks a company faces. Creating loyalty online (e-loyalty)
    is especially difficult since customers can ``switch'' to a competitor with the
    click of a mouse. In this paper we investigate e-loyalty in online auctions.
    Using a unique data set of over 30,000 auctions from one of the main
    consumer-to-consumer online auction houses, we propose a novel measure of
    e-loyalty via the associated network of transactions between bidders and
    sellers.

  265. Nonparametric inference procedure for percentiles of the random effects distribution in meta-analysis.

    Authors: Rui Wang, Lu Tian, Tianxi Cai, L. J. Wei
    Subjects: Applications
    Abstract

    To investigate whether treating cancer patients with
    erythropoiesis-stimulating agents (ESAs) would increase the mortality risk,
    Bennett et al. [Journal of the American Medical Association 299 (2008)
    914--924] conducted a meta-analysis with the data from 52 phase III trials
    comparing ESAs with placebo or standard of care. With a standard parametric
    random effects modeling approach, the study concluded that ESA administration
    was significantly associated with increased average mortality risk.

  266. Downscaling extremes: A comparison of extreme value distributions in point-source and gridded precipitation data.

    Authors: Daniel Cooley, Elizabeth C. Mannshardt-Shamseldin, Richard L. Smith, Stephan R. Sain, Linda O. Mearns
    Subjects: Applications
    Abstract

    There is substantial empirical and climatological evidence that precipitation
    extremes have become more extreme during the twentieth century, and that this
    trend is likely to continue as global warming becomes more intense. However,
    understanding these issues is limited by a fundamental issue of spatial
    scaling: most evidence of past trends comes from rain gauge data, whereas
    trends into the future are produced by climate models, which rely on gridded
    aggregates.

  267. A multivariate adaptive stochastic search method for dimensionality reduction in classification.

    Authors: Gareth M. James, Tian Siva Tian, Rand R. Wilcox
    Subjects: Applications
    Abstract

    High-dimensional classification has become an increasingly important problem.
    In this paper we propose a "Multivariate Adaptive Stochastic Search" (MASS)
    approach which first reduces the dimension of the data space and then applies a
    standard classification method to the reduced space. One key advantage of MASS
    is that it automatically adjusts to mimic variable selection type methods, such
    as the Lasso, variable combination methods, such as PCA, or methods that
    combine these two approaches.

  268. Causal graphical models in systems genetics: A unified framework for joint inference of causal network and genetic architecture for correlated phenotypes.

    Authors: Elias Chaibub Neto, Mark P. Keller, Alan D. Attie, Brian S. Yandell
    Subjects: Applications
    Abstract

    Causal inference approaches in systems genetics exploit quantitative trait
    loci (QTL) genotypes to infer causal relationships among phenotypes. The
    genetic architecture of each phenotype may be complex, and poorly estimated
    genetic architectures may compromise the inference of causal relationships
    among phenotypes. Existing methods assume QTLs are known or inferred without
    regard to the phenotype network structure.

  269. An MDL approach to the climate segmentation problem.

    Authors: QiQi Lu, Robert Lund, Thomas C. M. Lee
    Subjects: Applications
    Abstract

    This paper proposes an information theory approach to estimate the number of
    changepoints and their locations in a climatic time series. A model is
    introduced that has an unknown number of changepoints and allows for series
    autocorrelations, periodic dynamics, and a mean shift at each changepoint time.
    An objective function gauging the number of changepoints and their locations,
    based on a minimum description length (MDL) information criterion, is derived.
    A genetic algorithm is then developed to optimize the objective function.

  270. Sequential Monte Carlo pricing of American-style options under stochastic volatility models.

    Authors: Bhojnarine R. Rambharat, Anthony E. Brockwell
    Subjects: Applications
    Abstract

    We introduce a new method to price American-style options on underlying
    investments governed by stochastic volatility (SV) models. The method does not
    require the volatility process to be observed. Instead, it exploits the fact
    that the optimal decision functions in the corresponding dynamic programming
    problem can be expressed as functions of conditional distributions of
    volatility, given observed data. By constructing statistics summarizing
    information about these conditional distributions, one can obtain high quality
    approximate solutions.

  271. Model misspecification in peaks over threshold analysis.

    Authors: Anthony C. Davison, Mária Süveges
    Subjects: Applications
    Abstract

    Classical peaks over threshold analysis is widely used for statistical
    modeling of sample extremes, and can be supplemented by a model for the sizes
    of clusters of exceedances. Under mild conditions a compound Poisson process
    model allows the estimation of the marginal distribution of threshold
    exceedances and of the mean cluster size, but requires the choice of a
    threshold and of a run parameter, $K$, that determines how exceedances are
    declustered.

  272. Modeling social networks from sampled data.

    Authors: Mark S. Handcock, Krista J. Gile
    Subjects: Applications
    Abstract

    Network models are widely used to represent relational information among
    interacting units and the structural implications of these relations. Recently,
    social network studies have focused a great deal of attention on random graph
    models of networks whose nodes represent individual social actors and whose
    edges represent a specified relationship between the actors. Most inference for
    social network models assumes that the presence or absence of all possible
    links is observed, that the information is completely reliable, and that there
    are no measurement (e.g., recording) errors.

  273. Rejoinder: Brownian distance covariance.

    Authors: Gábor J. Székely, Maria L. Rizzo
    Subjects: Applications
    Abstract

    Rejoinder to "Brownian distance covariance" by Gábor J. Székely and Maria
    L. Rizzo [arXiv:1010.0297]

  274. Discussion of: Brownian distance covariance.

    Authors: Bruno Rémillard
    Subjects: Applications
    Abstract

    Discussion on "Brownian distance covariance" by Gábor J. Székely and
    Maria L. Rizzo [arXiv:1010.0297]

  275. Discussion of: Brownian distance covariance.

    Authors: Christopher R. Genovese
    Subjects: Applications
    Abstract

    Discussion on "Brownian distance covariance" by Gábor J. Székely and
    Maria L. Rizzo [arXiv:1010.0297]

  276. Discussion of: Brownian distance covariance.

    Authors: Kenji Fukumizu, Bharath K. Sriperumbudur, Arthur Gretton
    Subjects: Applications
    Abstract

    Discussion on "Brownian distance covariance" by Gábor J. Székely and
    Maria L. Rizzo [arXiv:1010.0297]

  277. Discussion of: Brownian distance covariance.

    Authors: Andrey Feuerverger
    Subjects: Applications
    Abstract

    Discussion on "Brownian distance covariance" by Gábor J. Székely,
    Maria L. Rizzo [arXiv:1010.0297]

  278. Discussion of: Brownian distance covariance.

    Authors: Leslie Cope
    Subjects: Applications
    Abstract

    Discussion on "Brownian distance covariance" by Gábor J. Székely,
    Maria L. Rizzo [arXiv:1010.0297]

  279. Discussion of: Brownian distance covariance.

    Authors: Michael R. Kosorok
    Subjects: Applications
    Abstract

    We discuss briefly the very interesting concept of Brownian distance
    covariance developed by Székely and Rizzo [Ann. Appl. Statist. (2009), to
    appear] and describe two possible extensions. The first extension is for high
    dimensional data that can be coerced into a Hilbert space, including certain
    high throughput screening and functional data settings. The second extension
    involves very simple modifications that may yield increased power in some
    settings.

  280. Brownian distance covariance.

    Authors: Gábor J. Székely, Maria L. Rizzo
    Subjects: Applications
    Abstract

    Distance correlation is a new class of multivariate dependence coefficients
    applicable to random vectors of arbitrary and not necessarily equal dimension.
    Distance covariance and distance correlation are analogous to product-moment
    covariance and correlation, but generalize and extend these classical bivariate
    measures of dependence. Distance correlation characterizes independence: it is
    zero if and only if the random vectors are independent.
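
    A minimal sketch of the sample version of this coefficient follows. It
    assumes the standard construction from double-centered Euclidean distance
    matrices; the function names are hypothetical, and observations are passed
    as tuples so that the two samples may have different dimensions. Note that
    the characterization of independence quoted above is a population
    property; the sample statistic below only estimates it.

```python
import math


def _centered_distances(points):
    """Pairwise Euclidean distances with row, column, and grand means removed."""
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    row_mean = [sum(r) / n for r in d]  # equals the column means by symmetry
    grand = sum(row_mean) / n
    return [
        [d[i][j] - row_mean[i] - row_mean[j] + grand for j in range(n)]
        for i in range(n)
    ]


def distance_correlation(xs, ys):
    """Sample distance correlation between paired observations xs and ys."""
    a = _centered_distances(xs)
    b = _centered_distances(ys)
    n = len(xs)
    dcov2 = sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n**2
    dvar_x = sum(v * v for r in a for v in r) / n**2
    dvar_y = sum(v * v for r in b for v in r) / n**2
    denom = math.sqrt(dvar_x * dvar_y)
    if denom == 0:
        return 0.0  # one sample is constant, so the coefficient is set to zero
    return math.sqrt(max(dcov2, 0.0) / denom)
```

    For example, pairing x values symmetric about zero with their squares
    gives a Pearson correlation of zero but a strictly positive sample
    distance correlation.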

  281. A model selection approach to genome wide association studies.

    Authors: Malgorzata Bogdan, Florian Frommlet, Felix Ruhaltinger, Piotr Twarog
    Subjects: Applications
    Abstract

    For the vast majority of genome wide association studies (GWAS) published so
    far, statistical analysis was performed by testing markers individually. In
    this article we present some elementary statistical considerations which
    clearly show that in case of complex traits the approach based on multiple
    regression or generalized linear models is preferable to multiple testing. We
    introduce a model selection approach to GWAS based on modifications of Bayesian
    Information Criterion (BIC) and develop some simple search strategies to deal
    with the huge number of potential models.

  282. Nonparametric Bayesian multiple testing for longitudinal performance stratification.

    Authors: James G. Scott
    Subjects: Applications
    Abstract

    This paper describes a framework for flexible multiple hypothesis testing of
    autoregressive time series. The modeling approach is Bayesian, though a blend
    of frequentist and Bayesian reasoning is used to evaluate procedures.
    Nonparametric characterizations of both the null and alternative hypotheses
    will be shown to be the key robustification step necessary to ensure reasonable
    Type-I error performance.

  283. Inference on low-rank data matrices with applications to microarray data.

    Authors: Xuming He, Xingdong Feng
    Subjects: Applications
    Abstract

    Probe-level microarray data are usually stored in matrices, where the row and
    column correspond to array and probe, respectively. Scientists routinely
    summarize each array by a single index as the expression level of each probe
    set (gene). We examine the adequacy of a unidimensional summary for
    characterizing the data matrix of each probe set. To do so, we propose a
    low-rank matrix model for the probe-level intensities, and develop a useful
    framework for testing the adequacy of unidimensionality against targeted
    alternatives.

  284. Approximate null distribution of the largest root in multivariate analysis.

    Authors: Iain M. Johnstone
    Subjects: Applications
    Abstract

    The greatest root distribution occurs everywhere in classical multivariate
    analysis, but even under the null hypothesis the exact distribution has
    required extensive tables or special purpose software. We describe a simple
    approximation, based on the Tracy--Widom distribution, that in many cases can
    be used instead of tables or software, at least for initial screening. The
    quality of approximation is studied, and its use illustrated in a variety of
    settings.

  285. An outlier map for Support Vector Machine classification.

    Authors: Michiel Debruyne
    Subjects: Applications
    Abstract

    Support Vector Machines are a widely used classification technique. They are
    computationally efficient and provide excellent predictions even for
    high-dimensional data. Moreover, Support Vector Machines are very flexible due
    to the incorporation of kernel functions. The latter make it possible to model
    nonlinearity and also to deal with nonnumerical data such as protein strings.
    However, Support Vector Machines can suffer a lot from unclean data containing,
    for example, outliers or mislabeled observations.

  286. Profiling time course expression of virus genes---an illustration of Bayesian inference under shape restrictions.

    Authors: Li-Chu Chien, I-Shou Chang, Shih Sheng Jiang, Pramod K. Gupta, Chi-Chung Wen, Yuh-Jenn Wu, Chao A. Hsiung
    Subjects: Applications
    Abstract

    There have been several studies of the genome-wide temporal transcriptional
    program of viruses, based on microarray experiments, which are generally useful
    in the construction of gene regulation networks. It seems that biological
    interpretations in these studies are directly based on the normalized data and
    some crude statistics, which provide rough estimates of limited features of the
    profile and may incur biases. This paper introduces a hierarchical Bayesian
    shape restricted regression method for making inference on the time course
    expression of virus genes.

  287. Hierarchical mixture models for assessing fingerprint individuality.

    Authors: Sarat C. Dass, Mingfei Li
    Subjects: Applications
    Abstract

    The study of fingerprint individuality aims to determine to what extent a
    fingerprint uniquely identifies an individual. Recent court cases have
    highlighted the need for measures of fingerprint individuality when a person is
    identified based on fingerprint evidence. The main challenge in studies of
    fingerprint individuality is to adequately capture the variability of
    fingerprint features in a population. In this paper hierarchical mixture models
    are introduced to infer the extent of individualization.

  288. Use of multiple singular value decompositions to analyze complex intracellular calcium ion signals.

    Authors: Josue G. Martinez, Jianhua Z. Huang, Robert C. Burghardt, Rola Barhoumi, Raymond J. Carroll
    Subjects: Applications
    Abstract

    We compare calcium ion signaling ($\mathrm {Ca}^{2+}$) between two exposures;
    the data are presented as movies, or, more prosaically, time series of images.
    This paper describes novel uses of singular value decompositions (SVD) and
    weighted versions of them (WSVD) to extract the signals from such movies, in a
    way that is semi-automatic and tuned closely to the actual data and their many
    complexities. These complexities include the following.

  289. A branching process model for flow cytometry and budding index measurements in cell synchrony experiments.

    Authors: David A. Orlando, Edwin S. Iversen Jr., Alexander J. Hartemink, Steven B. Haase
    Subjects: Applications
    Abstract

    We present a flexible branching process model for cell population dynamics in
    synchrony/time-series experiments used to study important cellular processes.
    Its formulation is constructive, based on an accounting of the unique cohorts
    in the population as they arise and evolve over time, allowing it to be written
    in closed form. The model can attribute effects to subsets of the population,
    providing flexibility not available using the models historically applied to
    these populations.

  290. Discovering influential variables: A method of partitions.

    Authors: Herman Chernoff, Shaw-Hwa Lo, Tian Zheng
    Subjects: Applications
    Abstract

    A trend in all scientific disciplines, based on advances in technology, is
    the increasing availability of high dimensional data in which important
    information is buried. A current urgent challenge to statisticians is to
    develop effective methods of finding the useful information from the vast
    amounts of messy and noisy data available, most of which are noninformative.
    This paper presents a general computer intensive approach, based on a method
    pioneered by Lo and Zheng for detecting which, of many potential explanatory
    variables, have an influence on a dependent variable $Y$.

  291. Assessing uncertainty in the American Indian Trust Fund.

    Authors: Edward Mulrow, Hee-Choon Shin, Fritz Scheuren
    Subjects: Applications
    Abstract

    Fiscal year-end balances of the Individual Indian Money System (a part of the
    Indian Trust) were constructed from data related to money collected in the
    system and disbursed by the system from 1887 to 2007. The data set of fiscal
    year accounting information had a high proportion of missing values, and much
    of the available data did not satisfy basic accounting relationships.

  292. Functional data analytic approach of modeling ECG T-wave shape to measure cardiovascular behavior.

    Authors: Yingchun Zhou, Nell Sedransk
    Subjects: Applications
    Abstract

    The T-wave of an electrocardiogram (ECG) represents the ventricular
    repolarization that is critical in restoration of the heart muscle to a
    pre-contractile state prior to the next beat. Alterations in the T-wave reflect
    various cardiac conditions; and links between abnormal (prolonged) ventricular
    repolarization and malignant arrhythmias have been documented. Cardiac safety
    testing prior to approval of any new drug currently relies on two points of the
    ECG waveform: onset of the Q-wave and termination of the T-wave; and only a few
    beats are measured.

  293. Workload forecasting for a call center: Methodology and a case study.

    Authors: Avishai Mandelbaum, Sivan Aldor-Noiman, Paul D. Feigin
    Subjects: Applications
    Abstract

    Today's call center managers face multiple operational decision-making tasks.
    One of the most common is determining the weekly staffing levels to ensure
    customer satisfaction and meet customer needs while minimizing service costs.
    An initial step for producing the weekly schedule is forecasting the future
    system loads which involves predicting both arrival counts and average service
    times. We introduce an arrival count model which is based on a mixed Poisson
    process approach. The model is applied to data from an Israeli Telecom company
    call center.

  294. Efficient delay-tolerant particle filtering.

    Authors: Boris N. Oreshkin, Mark J. Coates, Xuan Liu
    Subjects: Applications
    Abstract

    This paper proposes a novel framework for delay-tolerant particle filtering
    that is computationally efficient and has limited memory requirements. Within
    this framework the informativeness of a delayed (out-of-sequence) measurement
    (OOSM) is estimated using a lightweight procedure and uninformative
    measurements are immediately discarded.

  295. Using epidemic prevalence data to jointly estimate reproduction and removal.

    Authors: Jan van den Broek, Hiroshi Nishiura
    Subjects: Applications
    Abstract

    This study proposes a nonhomogeneous birth--death model which captures the
    dynamics of a directly transmitted infectious disease. Our model accounts for
    an important aspect of observed epidemic data in which only symptomatic
    infecteds are observed. The nonhomogeneous birth--death process depends on
    survival distributions of reproduction and removal, which jointly yield an
    estimate of the effective reproduction number $R(t)$ as a function of epidemic
    time.

  296. On the Role of Decision Theory in Uncertainty Analysis.

    Authors: Merlin Keller, Alberto Pasanisi, Eric Parent
    Subjects: Applications
    Abstract

    Maximum likelihood estimation (MLE) and heuristic predictive estimation (HPE)
    are two widely used approaches in industrial uncertainty analysis. We review
    them from the point of view of decision theory, using Bayesian inference as a
    gold standard for comparison. The main drawback of MLE is that it may fail to
    properly account for the uncertainty on the physical process generating the
    data, especially when only a small amount of data is available. HPE offers an
    improvement in that it takes this uncertainty into account.

  297. The bullwhip effect under a generalized demand process: an R implementation.

    Authors: Marlene Silva Marchena
    Subjects: Applications
    Abstract

    The measure of the bullwhip effect, a phenomenon in which demand variability
    increases as one moves up the supply chain, is a major issue in Supply Chain
    Management. Although it is simply defined (it is the ratio of the unconditional
    variance of the order process to that of the demand process), explicit formulas
    are difficult to obtain. In this paper we investigate the theoretical and
    practical issues of Zhang (2004b) with the purpose of quantifying the bullwhip
    effect.
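
    The ratio definition quoted in this abstract can be illustrated with a
    minimal sketch. It uses sample variances of two observed series as an
    empirical stand-in for the unconditional variances; the function name and
    the toy numbers are hypothetical and are not taken from the paper or its R
    implementation.

```python
from statistics import pvariance


def bullwhip_ratio(orders, demand):
    """Empirical bullwhip measure: Var(orders) / Var(demand)."""
    var_demand = pvariance(demand)
    if var_demand == 0:
        raise ValueError("demand has zero variance; the ratio is undefined")
    return pvariance(orders) / var_demand


# Hypothetical series: orders swing more widely than the demand they follow.
demand = [10, 12, 9, 11, 13, 8, 10, 12]
orders = [10, 14, 6, 13, 15, 3, 12, 14]
ratio = bullwhip_ratio(orders, demand)
print(f"bullwhip ratio: {ratio:.2f}")
```

    A ratio above 1 indicates that order variability exceeds demand
    variability, which is the amplification the abstract describes.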

  298. Predicting phenological events using event-history analysis.

    Authors: Song Cai, James V. Zidek, Nathaniel Newlands
    Subjects: Applications
    Abstract

    This paper presents an approach to phenology, one based on the use of a
    method developed by the authors for event history data. Of specific interest is
    the prediction of the so-called "bloom--date" of fruit trees in the agriculture
    industry and it is this application which we consider, although the method is
    much more broadly applicable. Our approach provides a sensible estimate for a
    parameter that interests phenologists -- Tbase, the thresholding parameter in
    the definition of the growing degree days (GDD).
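
    To show the role of Tbase as a threshold, here is a minimal sketch of one
    common way to accumulate growing degree days: the daily mean of minimum and
    maximum temperature, minus Tbase, floored at zero. This averaging rule is an
    assumption for illustration; the paper's exact GDD definition may differ,
    and the temperatures below are hypothetical.

```python
def growing_degree_days(daily_min, daily_max, t_base):
    """Accumulate max(0, daily mean temperature - t_base) over all days."""
    total = 0.0
    for t_min, t_max in zip(daily_min, daily_max):
        mean_temp = (t_min + t_max) / 2.0
        # Days whose mean temperature is below t_base contribute nothing.
        total += max(0.0, mean_temp - t_base)
    return total


# Hypothetical four-day temperature record in degrees Celsius.
mins = [2, 4, 6, 8]
maxs = [10, 12, 16, 18]
gdd_low = growing_degree_days(mins, maxs, t_base=5)
gdd_high = growing_degree_days(mins, maxs, t_base=10)
print(gdd_low, gdd_high)
```

    Raising the threshold lowers the accumulated total, which is why the
    estimate of Tbase matters for predicting when a bloom date is reached.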

  299. Full Open Population Capture-Recapture Models with Individual Covariates.

    Authors: Matthew R. Schofield, Richard J. Barker
    Subjects: Applications
    Abstract

    Traditional analyses of capture-recapture data are based on likelihood
    functions that explicitly integrate out all missing data. We use a complete
    data likelihood (CDL) to show how a wide range of capture-recapture models can
    be easily fitted using readily available software JAGS/BUGS even when there are
    individual-specific time-varying covariates. The models we describe extend
    those that condition on first capture to include abundance parameters, or
    parameters related to abundance, such as population size, birth rates or
    lifetime.

  300. Data Augmentation and Reversible Jump MCMC for Multinomial Index Problems.

    Authors: Matthew R. Schofield, Richard J. Barker
    Subjects: Applications
    Abstract

    A feature of multinomial models with unknown index $N$ is that the dimension
    of the parameter space potentially depends on $N$, a complication when fitting
    models by Markov chain Monte Carlo (MCMC). Two commonly used approaches to this
    problem are: (i) trans-dimensional reversible jump MCMC and (ii)
    superpopulation data augmentation. A distinguishing feature of the two
    approaches is that $N$, and combinatorial terms involving $N$, are not explicit
    in the superpopulation likelihood. To resolve ambiguity about the relationship
    between the two approaches we compare them analytically.

  301. The "Unfriending" Problem: The Consequences of Homophily in Friendship Retention for Causal Estimates of Social Influence.

    Authors: Hans Noel, Brendan Nyhan
    Subjects: Applications
    Abstract

    An increasing number of scholars are using longitudinal social network data
    to try to obtain estimates of peer or social influence effects. These data may
    provide additional statistical leverage, but they can introduce new inferential
    problems. In particular, while the confounding effects of homophily in
    friendship formation are widely appreciated, homophily in friendship retention
    may also confound causal estimates of social influence in longitudinal network
    data.

  302. The 155-day periodicity of the sunspot area fluctuations in the solar cycle 16 is an alias.

    Authors: Ryszarda Getko
    Subjects: Applications
    Abstract

    The short-term periodicities of the daily sunspot area fluctuations from
    August 1923 to October 1933 are discussed. For these data the correlative
    analysis indicates negative correlation for the periodicity of about 155 days,
    but the power spectrum analysis indicates a statistically significant peak in
    this time interval. A new method for diagnosing an echo-effect in the spectrum
    is proposed and it is stated that the 155-day periodicity is a harmonic of the
    periodicities from the interval of [400,500] days.

  303. User Interest and Interaction Structure in Online Forums.

    Authors: Stephen E. Fienberg, Daniel Percival, Di Liu
    Subjects: Applications
    Abstract

    We present a new similarity measure tailored to posts in an online forum. Our
    measure takes into account all the available information about user interest
    and interaction --- the content of posts, the threads in the forum, and the
    author of the posts. We use this post similarity to build a similarity between
    users, based on principal coordinate analysis. This allows easy visualization
    of the user activity as well. Similarity between users has numerous
    applications, such as clustering or classification.

  304. Incompatibility of trends in multi-year estimates from the American Community Survey.

    Authors: Tucker McElroy
    Subjects: Applications
    Abstract

    The American Community Survey (ACS) provides one-year (1y), three-year (3y)
    and five-year (5y) multi-year estimates (MYEs) of various demographic and
    economic variables for each "community", although the 1y and 3y may not be
    available for communities with a small population. These survey estimates are
    not truly measuring the same quantities, since they each cover different time
    spans. Using some simplistic models, we demonstrate that comparing different
    period-length MYEs results in spurious conclusions about trend movements.

  305. Elementary Statistics on Trial (the case of Lucia de Berk).

    Authors: Piet Groeneboom, Richard D. Gill, Peter de Jong
    Subjects: Applications
    Abstract

    In the conviction of Lucia de Berk an important role was played by a simple
    hypergeometric model, used by the expert consulted by the court, which produced
    very small probabilities of occurrences of certain numbers of incidents. We
    want to draw attention to the fact that, if we take into account the variation
    among nurses in incidents they experience during their shifts, these
    probabilities can become considerably larger. This points to the danger of
    using an oversimplified discrete probability model in these circumstances.
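
    For readers unfamiliar with the hypergeometric calculation mentioned here,
    the following is a minimal sketch of an upper-tail probability under that
    model. All argument names and numbers are hypothetical and are not the
    figures from the case; the sketch also does not model the between-nurse
    variation that the abstract argues for.

```python
from math import comb


def hypergeom_tail(total_shifts, incident_shifts, nurse_shifts, observed):
    """P(X >= observed) when nurse_shifts are drawn without replacement
    from total_shifts, of which incident_shifts contain an incident."""
    upper = min(incident_shifts, nurse_shifts)
    favourable = sum(
        comb(incident_shifts, k)
        * comb(total_shifts - incident_shifts, nurse_shifts - k)
        for k in range(observed, upper + 1)
    )
    return favourable / comb(total_shifts, nurse_shifts)


# Hypothetical toy numbers: 10 shifts, 3 with incidents, one nurse works 4.
p_all_three = hypergeom_tail(10, 3, 4, 3)
print(p_all_three)
```

    Because the model treats every shift as exchangeable, it gives no way to
    account for nurses who differ in their baseline incident rates.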

  306. A Novel Chronic Disease Policy Model.

    Authors: Nathan Green, Duncan Smith, Matthew Sperrin, Iain Buchan
    Subjects: Applications
    Abstract

    We develop a simulation tool to support policy-decisions about healthcare for
    chronic diseases in defined populations. Incident disease-cases are generated
    in-silico from an age-sex characterised general population using standard
    epidemiological approaches. A novel disease-treatment model then simulates
    continuous life courses for each patient using discrete event simulation.
    Ideally, the discrete event simulation model would be inferred from complete
    longitudinal healthcare data via a likelihood or Bayesian approach.

  307. Detection of brain functional-connectivity difference in post-stroke patients using group-level covariance modeling.

    Authors: Gaël Varoquaux, Bertrand Thirion, Flore Baronnet, Andreas Kleinschmidt, Pierre Fillard
    Subjects: Applications
    Abstract

    Functional brain connectivity, as revealed through distant correlations in
    the signals measured by functional Magnetic Resonance Imaging (fMRI), is a
    promising source of biomarkers of brain pathologies. However, establishing and
    using diagnostic markers requires probabilistic inter-subject comparisons.
    Principled comparison of functional-connectivity structures is still a
    challenging issue. We give a new matrix-variate probabilistic model suitable
    for inter-subject comparison of functional connectivity matrices on the
    manifold of Symmetric Positive Definite (SPD) matrices.

  308. An investigation of the discriminant power and dimensionality of items used for assessing health condition of elderly people.

    Authors: Francesco Bartolucci, Giorgio d'Agostino, Giorgio E. Montanari
    Subjects: Applications
    Abstract

    With reference to the questionnaire adopted within the Italian project
    "Ulisse" to assess health condition of elderly people, we investigate two
    important issues: discriminant power and actual number of dimensions measured
    by the items composing the questionnaire. The adopted statistical approach is
    based on the joint use of the latent class model and a multidimensional item
    response theory model based on the 2PL parametrization. The latter allows us to
    account for the different discriminant power of these items.

  309. Inference and Optimal Design for Nearest-Neighbour Interaction Models.

    Authors: Andrei Iu. Bejan, Gavin J. Gibson, Stan Zachary
    Subjects: Applications
    Abstract

    We consider problems of Bayesian inference for a spatial epidemic on a graph,
    where the final state of the epidemic corresponds to bond percolation, and
    where only the set or number of finally infected sites is observed. We develop
    appropriate Markov chain Monte Carlo algorithms, demonstrating their
    effectiveness, and we study problems of optimal experimental design. In
    particular, we demonstrate that for lattice-based processes an experiment on a
    sparsified lattice can yield more information on model parameters than one
    conducted on a complete lattice.

  310. Universal effect of preferential selection on consensus in opinion dynamics.

    Authors: Bing-Hong Wang, Han-Xin Yang, Wen-Xu Wang, Ying-Cheng Lai
    Subjects: Applications
    Abstract

    We investigate the opinion dynamics by extending the majority rule model to a
    preferential selection model, in which agents choose opinions with some
    probability rather than absolutely follow the majority. In the model, agent $i$
    agrees with one of binary opinions with the probability that is a power
    function of the number of agents holding this opinion among agent $i$ and its
    nearest neighbors, where an adjustable parameter $\alpha$ controls the degree
    of preferential selection. We find that global consensus cannot be reached
    if $\alpha<1$.

  311. On Estimating the Ability of NBA Players.

    Authors: Paul Fearnhead, Benjamin M. Taylor
    Subjects: Applications
    Abstract

    This paper introduces a new model and methodology for estimating the ability
    of NBA players. The main idea is to directly measure how good a player is by
    comparing how their team performs when they are on the court as opposed to when
    they are off it. This is achieved in such a way as to control for the
    changing abilities of the other players on court at different times during a
    match.

  312. Bayesian inference for exponential random graph models.

    Authors: Nial Friel, Alberto Caimo
    Subjects: Applications
    Abstract

    Exponential random graph models are extremely difficult models to handle from
    a statistical viewpoint, since their normalising constant, which depends on
    model parameters, is available only in very trivial cases. We show how
    inference can be carried out in a Bayesian framework using an MCMC algorithm,
    which circumvents the need to calculate the normalising constants. We use a
    population MCMC approach which accelerates convergence and improves mixing of
    the Markov chain.

  313. Bayesian Post-Processing Methods for Jitter Mitigation in Sampling.

    Authors: Vivek K Goyal, Daniel S. Weller
    Subjects: Applications
    Abstract

    Minimum mean squared error (MMSE) estimators of signals from samples
    corrupted by jitter (timing noise) and additive noise are nonlinear, even when
    the signal prior and additive noise have normal distributions. This paper
    develops stochastic algorithms based on Gibbs sampling and slice sampling to
    approximate optimal MMSE estimators in this Bayesian formulation. Simulations
    demonstrate that these nonlinear algorithms can improve significantly upon the
    linear MMSE estimator.

  314. On the Estimation of Nonrandom Signal Coefficients from Jittered Samples.

    Authors: Vivek K Goyal, Daniel S. Weller
    Subjects: Applications
    Abstract

    This paper examines the problem of estimating the parameters of a bandlimited
    signal from samples corrupted by random jitter (timing noise) and additive iid
    Gaussian noise, where the signal lies in the span of a finite basis. For the
    presented classical estimation problem, the Cramer-Rao lower bound (CRB) is
    computed, and an Expectation-Maximization (EM) algorithm approximating the
    maximum likelihood (ML) estimator is developed. Simulations are performed to
    study the convergence properties of the EM algorithm and compare the
    performance both against the CRB and a basic linear estimator.

  315. Bayesian Segmentation of Oceanic SAR Images: Application to Oil Spill Detection.

    Authors: José M. Bioucas-Dias, Sónia Pelizzari
    Subjects: Applications
    Abstract

    This paper introduces Bayesian supervised and unsupervised segmentation
    algorithms aimed at oceanic segmentation of SAR images. The data term,
    i.e., the density of the observed backscattered signal given the region,
    is modeled by a finite mixture of Gamma densities with a given predefined
    number of components. To estimate the parameters of the class conditional
    densities, a new expectation maximization algorithm was developed. The prior is
    a multi-level logistic Markov random field enforcing local continuity in a
    statistical sense.

  316. Bayesian Symbol Detection in Wireless Relay Networks via Likelihood-Free Inference.

    Authors: Jinhong Yuan, Ido Nevat, Gareth W. Peters, Scott A. Sisson, Yanan Fan
    Subjects: Applications
    Abstract

    This paper presents a general stochastic model developed for a class of
    cooperative wireless relay networks, in which imperfect knowledge of the
    channel state information at the destination node is assumed. The framework
    incorporates multiple relay nodes operating under general known non-linear
    processing functions. When a non-linear relay function is considered, the
    likelihood function is generally intractable resulting in the maximum
    likelihood and the maximum a posteriori detectors not admitting closed form
    solutions.

  317. Development and Validation of a Teaching Practice Scale (TISS) for Instructors of Introductory Statistics at the College Level.

    Authors: Rossi A. Hassad
    Subjects: Applications
    Abstract

    This study examined the teaching practices of 227 college instructors of
    introductory statistics (from the health and behavioral sciences). Using
    primarily multidimensional scaling (MDS) techniques, a two-dimensional, 10-item
    teaching practice scale, TISS (Teaching of Introductory Statistics Scale), was
    developed and validated. The two dimensions (subscales) were characterized as
    constructivist, and behaviorist, and are orthogonal to each other.

  318. Detecting local network motifs.

    Authors: Etienne Birmele
    Subjects: Applications
    Abstract

    Studying the topology of so-called real networks, that is networks obtained
    from sociological or biological data for instance, has become a major field of
    interest in the last decade. One way to deal with it is to consider that
    networks are built from small functional units called motifs, which can be
    found by looking for small subgraphs whose numbers of occurrences in the whole
    network are surprisingly high. In this article, we propose to define motifs
    through a local overrepresentation in the network and develop a statistic to
    detect them without relying on simulations.

  319. On building and fitting a spatio-temporal change-point model for settlement and growth at Bourewa, Fiji Islands.

    Authors: Geoff K. Nicholls, Patrick D. Nunn
    Subjects: Applications
    Abstract

    The Bourewa beach site on the Rove Peninsula of Viti Levu is the earliest
    known human settlement in the Fiji Islands. How did the settlement at Bourewa
    develop in space and time? We have radiocarbon dates on sixty specimens, found
    in association with evidence for human presence, taken from pits across the
    site. Owing to the lack of diagnostic stratigraphy, there is no direct
    archaeological evidence for distinct phases of occupation through the period of
    interest.

  320. Validation of credit default probabilities via multiple testing procedures.

    Authors: Sebastian Döhler
    Subjects: Applications
    Abstract

    We apply multiple testing procedures to the validation of estimated default
    probabilities in credit rating systems. The goal is to identify rating classes
    for which the probability of default is estimated inaccurately, while still
    maintaining a predefined level of committing type I errors as measured by the
    familywise error rate (FWER) and the false discovery rate (FDR). For FWER, we
    also consider procedures that take possible discreteness of the data or test
    statistics into account.
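
    The two error criteria named in this abstract can be contrasted with a
    minimal sketch using two standard procedures: Bonferroni for FWER and
    Benjamini-Hochberg for FDR. These are textbook choices picked for
    illustration and are not necessarily the procedures used in the paper; the
    p-values, one per hypothetical rating class, are invented.

```python
def bonferroni_reject(p_values, alpha):
    """Reject H_i when p_i <= alpha / m (controls the FWER)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg_reject(p_values, alpha):
    """Step-up rule: reject the k smallest p-values, where k is the largest
    rank with p_(k) <= k * alpha / m (controls the FDR under independence)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            cutoff_rank = rank
    reject = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            reject[idx] = True
    return reject


# Hypothetical p-values for five rating classes.
p_values = [0.001, 0.012, 0.025, 0.038, 0.20]
fwer_flags = bonferroni_reject(p_values, alpha=0.05)
fdr_flags = benjamini_hochberg_reject(p_values, alpha=0.05)
print(fwer_flags)
print(fdr_flags)
```

    On these invented inputs the FDR procedure flags more rating classes than
    the FWER procedure, reflecting the weaker error criterion it controls.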

  321. Detecting epistasis via Markov bases.

    Authors: Anna-Sapfo Malaspinas, Caroline Uhler
    Subjects: Applications
    Abstract

    Rapid research progress in genotyping techniques has allowed large
    genome-wide association studies. Existing methods often focus on determining
    associations between single loci and a specific phenotype. However, a
    particular phenotype is usually the result of complex relationships between
    multiple loci and the environment. In this paper, we describe a two-stage
    method for detecting epistasis by combining the traditionally used single-locus
    search with a search for multiway interactions. Our method is based on an
    extended version of Fisher's exact test.

  322. Large gaps imputation in remote sensed imagery of the environment.

    Authors: Valeria Rulloni, Oscar Bustos, Ana Georgina Flesia
    Subjects: Applications
    Abstract

    Imputation of missing data in large regions of satellite imagery is necessary
    when the acquired image has been damaged by shadows due to clouds, or
    information gaps produced by sensor failure.

  323. Using Integrated Nested Laplace Approximation for Modeling Spatial Healthcare Utilization.

    Authors: Erik A. Sauleau, Valentina Mameli, Monica Musio
    Subjects: Applications
    Abstract

    In recent years, spatial and spatio-temporal modeling have become an
    important area of research in many fields (epidemiology, environmental studies,
    disease mapping). In this work we propose different spatial models to study
    hospital recruitment, including some potentially explicative variables.
    Interest is on the distribution per geographical unit of the ratio between the
    number of patients living in this geographical unit and the population in the
    same unit. Models considered are within the framework of Bayesian Latent
    Gaussian models.

  324. ICA-based sparse feature recovery from fMRI datasets.

    Authors: Philippe Ciuciu, Gaël Varoquaux, Jean Baptiste Poline, Bertrand Thirion, Merlin Keller
    Subjects: Applications
    Abstract

    Spatial Independent Components Analysis (ICA) is increasingly used in the
    context of functional Magnetic Resonance Imaging (fMRI) to study cognition and
    brain pathologies. Salient features present in some of the extracted
    Independent Components (ICs) can be interpreted as brain networks, but the
    segmentation of the corresponding regions from ICs is still ill-controlled.
    Here we propose a new ICA-based procedure for extraction of sparse features
    from fMRI datasets. Specifically, we introduce a new thresholding procedure
    that controls the deviation from isotropy in the ICA mixing model.

  325. A group model for stable multi-subject ICA on fMRI datasets.

    Authors: G. Varoquaux, S. Sadaghiani, P. Pinel, A. Kleinschmidt, J. B. Poline, B. Thirion
    Subjects: Applications
    Abstract

    Spatial Independent Component Analysis (ICA) is an increasingly used
    data-driven method to analyze functional Magnetic Resonance Imaging (fMRI)
    data. To date, it has been used to extract sets of mutually correlated brain
    regions without prior information on the time course of these regions. Some of
    these sets of regions, interpreted as functional networks, have recently been
    used to provide markers of brain diseases and open the road to paradigm-free
    population comparisons.

  326. Computational Tools for Evaluating Phylogenetic and Hierarchical Clustering Trees.

    Authors: Susan Holmes, John Chakerian
    Subjects: Applications
    Abstract

    Inferential summaries of tree estimates are useful in the setting of
    evolutionary biology, where phylogenetic trees have been built from DNA data
    since the 1960's. In bioinformatics, psychometrics and data mining,
    hierarchical clustering techniques output the same mathematical objects, and
    practitioners have similar questions about the stability and `generalizability'
    of these summaries.

  327. Development of the signals complex elaboration system (application for analysis of occupational injuries process).

    Authors: Boris A. Zyryanov
    Subjects: Applications
    Abstract

    The theoretical basis of research on occupational injuries is the idea of
    the process as a Markov chain of random variables. However, an exact proof of
    this position has not been carried out, whereas experimental testing of the
    hypothesis is always tied to the chosen confidence limits and
    consequently leaves room for alternative assumptions. In this research
    some databases of occupational injuries were studied using spectral
    analysis techniques and a representation of occupational injuries as a
    temporal sequence of cases ("telegraph wave" process type).

  328. Error Analysis of Approximated PCRLBs for Nonlinear Dynamics.

    Authors: Pierre Del Moral, Ming Lei, Christophe Baehr
    Subjects: Applications
    Abstract

    In practical nonlinear filtering, the assessment of achievable filtering
    performance is important. In this paper, we focus on the problem of efficiently
    approximating the posterior Cramer-Rao lower bound (CRLB) in a recursive manner.
    By using Gaussian assumptions, two types of approximations for calculating the
    CRLB are proposed: an exact model using the state estimate, and a
    Taylor-series-expanded model using both the state estimate and its error
    covariance. Moreover, the difference between the two approximated
    CRLBs is also formulated analytically.

  329. Network-wide Statistical Modeling and Prediction of Computer Traffic.

    Authors: Stilian A. Stoev, George Michailidis, Joel Vaughan
    Subjects: Applications
    Abstract

    In order to maintain consistent quality of service, computer network
    engineers face the task of monitoring the traffic fluctuations on the
    individual links making up the network.

  330. Profile Likelihood Intervals for Quantiles in Extreme Value Distributions.

    Authors: A. Bolívar, E. Díaz-Francés, J. Ortega, E. Vilchis
    Subjects: Applications
    Abstract

    Profile likelihood intervals of large quantiles in Extreme Value
    distributions provide a good way to estimate these parameters of interest since
    they take into account the asymmetry of the likelihood surface in the case of
    small and moderate sample sizes; however they are seldom used in practice. In
    contrast, maximum likelihood asymptotic (mla) intervals are commonly used
    without respect to sample size.

  331. Selection of a Model of Cerebral Activity for fMRI Group Data Analysis.

    Authors: Merlin Keller, Alexis Roche, Marc Lavielle
    Subjects: Applications
    Abstract

    This thesis is dedicated to the statistical analysis of multi-subject fMRI
    data, with the purpose of identifying brain structures involved in certain
    cognitive or sensori-motor tasks, in a reproducible way across subjects. To
    overcome certain limitations of standard voxel-based testing methods, as
    implemented in the Statistical Parametric Mapping (SPM) software, we introduce
    a Bayesian model selection approach to this problem, meaning that the most
    probable model of cerebral activity given the data is selected from a
    pre-defined collection of possible models.

  332. The construction of variance estimators for particulate material sampling.

    Authors: B. Geelhoed
    Subjects: Applications
    Abstract

    The variance of the concentration in a sample can be estimated using
    knowledge of the particle masses, concentrations and the parameter for the
    dependent selection of particles. A number of variance estimators are
    constructed including a class of hybrid estimators.

  333. HIV with contact-tracing: a case study in Approximate Bayesian Computation.

    Authors: Viet Chi Tran, Michael G.B. Blum
    Subjects: Applications
    Abstract

    Missing data is a recurrent issue in epidemiology where the infection process
    may be partially observed. Approximate Bayesian Computation, an alternative to
    data imputation methods such as Markov Chain Monte Carlo integration, is
    proposed for making inference in epidemiological models. It is a
    likelihood-free method that relies exclusively on numerical simulations. ABC
    consists in computing a distance between simulated and observed summary
    statistics and weighting the simulations according to this distance.
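
    The distance-and-weight step described above can be illustrated with a
    minimal sketch. It is not the authors' epidemic model: the toy Gaussian
    model, the uniform prior, the sample mean as summary statistic, and the
    Epanechnikov kernel with bandwidth `h` are all assumptions made for
    illustration, and the function name is hypothetical.

```python
import random
import statistics

def abc_posterior_mean(observed, n_sims=5000, h=0.3, seed=0):
    """Toy ABC for the mean of a unit-variance Gaussian (illustrative only)."""
    rng = random.Random(seed)
    s_obs = statistics.fmean(observed)            # observed summary statistic
    n = len(observed)
    thetas, weights = [], []
    for _ in range(n_sims):
        theta = rng.uniform(-5.0, 5.0)            # draw from an assumed uniform prior
        sim = [rng.gauss(theta, 1.0) for _ in range(n)]   # simulate a data set
        dist = abs(statistics.fmean(sim) - s_obs)         # distance between summaries
        # Epanechnikov kernel: zero weight beyond bandwidth h, larger when closer.
        w = 1.0 - (dist / h) ** 2 if dist < h else 0.0
        thetas.append(theta)
        weights.append(w)
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no simulation within bandwidth h; increase h or n_sims")
    # Weighted average of the simulated parameters approximates the posterior mean.
    return sum(w * t for w, t in zip(weights, thetas)) / total, weights

data_rng = random.Random(1)
observed = [data_rng.gauss(1.5, 1.0) for _ in range(30)]
post_mean, weights = abc_posterior_mean(observed)
print(round(post_mean, 2))
```

    No likelihood is evaluated anywhere in the sketch: only simulation from
    the model and a comparison of summaries are needed, which is what makes
    the approach usable when the infection process is only partially observed.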

  334. The Best Linear Unbiased Estimator for Continuation of a Function.

    Authors: Ya'Acov Ritov, Yair Goldberg, Avishai Mandelbaum
    Subjects: Applications
    Abstract

    We show how to construct the best linear unbiased predictor (BLUP) for the
    continuation of a curve in a spline-function model. We assume that the entire
    curve is drawn from some smooth random process and that the curve is given up
    to some cut point. We demonstrate how to compute the BLUP efficiently.
    Confidence bands for the BLUP are discussed. Finally, we apply the proposed
    BLUP to real-world call center data. Specifically, we forecast the continuation
    of both the call arrival counts and the workload process at the call center of
    a commercial bank.

  335. Extracting abundance indices from longline surveys: method to account for hook competition and unbaited hooks.

    Authors: Marie-Pierre Etienne, Shannon Obradovich, Lynne Yamanaka, Murdoch Mcallister
    Subjects: Applications
    Abstract

    The most commonly used relative abundance index in stock assessments of
    longline fisheries is catch per unit effort (CPUE), here defined as the number
    of fish of the targeted species caught per hook and minute of soak time.
    Longline CPUE can be affected by interspecific competition and the retrieval of
    unbaited or empty hooks, and interannual variation in these can lead to biases
    in the apparent abundance trends in the CPUE. Interspecific competition on
    longlines has been previously studied but the return of empty hooks is ignored
    in all current treatments of longline CPUE.

  336. Spatial clustering of array CGH features in combination with hierarchical multiple testing.

    Authors: Etienne Roquain, Kyung In Kim, Mark Van De Wiel
    Subjects: Applications
    Abstract

    We propose a new approach for clustering DNA features using array CGH data
    from multiple tumor samples. We distinguish data-collapsing: joining contiguous
    DNA clones or probes with extremely similar data into regions, from clustering:
    joining contiguous, correlated regions based on a maximum likelihood principle.
    The model-based clustering algorithm accounts for the apparent spatial patterns
    in the data. We evaluate the randomness of the clustering result by a cluster
    stability score in combination with cross-validation.

  337. Evidence and Evolution: A Review.

    Authors: Christian P. Robert
    Subjects: Applications
    Abstract

    "Evidence and Evolution: the Logic behind the Science" was published in 2008
    by Elliott Sober. It examines the philosophical foundations of the statistical
    arguments used to evaluate hypotheses in evolutionary biology, based on simple
    examples and likelihood ratios. The difficulty with reading the book from a
    statistician's perspective is the reluctance of the author to engage in model
    building, and even more so in parameter estimation.

  338. Vast Volatility Matrix Estimation using High Frequency Data for Portfolio Selection.

    Authors: Jianqing Fan, Yingying Li, Ke Yu
    Subjects: Applications
    Abstract

    Portfolio allocation with gross-exposure constraint is an effective method to
    increase the efficiency and stability of selected portfolios among a vast pool
    of assets, as demonstrated in Fan et al (2008). The required high-dimensional
    volatility matrix can be estimated by using high frequency financial data. This
    enables us to better adapt to the local volatilities and local correlations
    among a vast number of assets and to increase significantly the sample size for
    estimating the volatility matrix.

  339. Homophily and Contagion Are Generically Confounded in Observational Social Network Studies.

    Authors: Cosma Rohilla Shalizi, Andrew C. Thomas
    Subjects: Applications
    Abstract

    We consider processes on social networks that can potentially involve three
    phenomena: homophily, or the formation of social ties due to matching
    individual traits; social contagion, also known as social influence; and the
    causal effect of an individual's covariates on their behavior or other
    measurable responses. We show that, generically, all of these are confounded
    with each other. Distinguishing them from one another requires strong
    assumptions on the parametrization of the social process or on the adequacy of
    the covariates used (or both).

  340. Positive and Negative Affect Balance Trajectories in the Treatment of Depression.

    Authors: Jonathan Touboul, Robert M. Schwartz
    Subjects: Applications
    Abstract

    To account for the complex interplay between positive and negative dimensions
    of experience in a well-defined framework, psychology needs theoretical models
    associated with mathematical tools that integrate both dimensions. To this end,
    we drew upon the Balanced States of Mind Model, an information-processing model
    that relates quantitatively precise emotional balances of positive and negative
    affects to psychopathology and optimal functioning.

  341. A bivariate space-time downscaler under space and time misalignment.

    Authors: Alan E. Gelfand, Veronica J. Berrocal, David M. Holland
    Subjects: Applications
    Abstract

    Ozone and particulate matter PM2.5 are co-pollutants that have long been
    associated with increased public health risks. Information on concentration
    levels for both pollutants comes from two sources: monitoring sites and output
    from complex numerical models that produce concentration surfaces over large
    spatial regions.

  342. Strategic Random Networks: Why Social Networking Technology Matters.

    Authors: Benjamin Golub, Yair Livne
    Subjects: Applications
    Abstract

    This paper develops strategic foundations for an important statistical model
    of random networks with heterogeneous expected degrees. Based on this, we show
    how social networking services that subtly alter the costs and indirect
    benefits of relationships can cause large changes in behavior and welfare. In
    the model, agents who value friends and friends of friends choose how much to
    socialize, which increases the probabilities of links but is costly.

  343. Accuracy and Decision Time for a Class of Sequential Decision Aggregation Rule.

    Authors: Francesco Bullo, Sandra H. Dandach, Ruggero Carli
    Subjects: Applications
    Abstract

    This work focuses on decentralized decision making in a population of
    individuals each implementing the sequential probability ratio test. The
    individual decisions are combined into a decentralized decision via an
    aggregation rule chosen from a family of aggregation rules, denoted q out of
    N rules. We study how the population size affects the performance of the
    decentralized decision making, i.e., the decision accuracy and time. In a group
    applying the q out of N rule, a global decision is reached as soon as q out of
    the N decision makers agree on an answer.
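
    The aggregation rule can be sketched in a few lines. The example below
    is a simplified illustration, not the authors' analysis: the Gaussian
    observation model, the symmetric thresholds, and the function names
    `sprt` and `q_out_of_n` are assumptions, and ties in decision time are
    broken by individual index.

```python
import random

def sprt(rng, mu=0.2, threshold=2.0, max_steps=10_000):
    """One sequential probability ratio test of mean=+mu against mean=-mu.

    Data are unit-variance Gaussian generated with mean +mu, so answer 1 is
    the correct one. Returns (decision, stopping time)."""
    llr = 0.0
    for t in range(1, max_steps + 1):
        x = rng.gauss(mu, 1.0)
        llr += 2.0 * mu * x           # log-likelihood ratio increment
        if llr >= threshold:
            return 1, t
        if llr <= -threshold:
            return 0, t
    return (1 if llr > 0 else 0), max_steps

def q_out_of_n(decisions, times, q):
    """Global decision: the first answer announced by q individuals.

    Returns (answer, time at which the q-th agreeing decision arrives),
    or (None, None) if no answer ever collects q votes."""
    counts = {0: 0, 1: 0}
    for i in sorted(range(len(times)), key=lambda k: times[k]):
        counts[decisions[i]] += 1
        if counts[decisions[i]] >= q:
            return decisions[i], times[i]
    return None, None

rng = random.Random(0)
N = 9
results = [sprt(rng) for _ in range(N)]
decisions = [d for d, _ in results]
times = [t for _, t in results]
fastest = q_out_of_n(decisions, times, q=1)               # "fastest" rule
majority = q_out_of_n(decisions, times, q=(N + 1) // 2)   # "majority" rule
print("fastest rule:", fastest, "majority rule:", majority)
```

    Setting q = 1 lets the quickest individual decide for the group, while
    larger q trades a longer decision time for agreement among more
    individuals, which is the accuracy and time trade-off the abstract studies.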

  344. The use of statistical methods in management research: a critique and some suggestions based on a case study.

    Authors: Michael Wood
    Subjects: Applications
    Abstract

    I discuss the statistical methods used in a paper in a respected management
    journal, in order to present a critique of how statistics is typically used in
    this type of research. Three themes emerge. The value of any statistical
    approach is limited by various factors, especially the restricted nature of the
    population sampled.

  345. Statistical File Matching of Flow Cytometry Data.

    Authors: Clayton Scott, Gyemin Lee, William Finn
    Subjects: Applications
    Abstract

    Flow cytometry is a technology that rapidly measures antigen-based markers
    associated with cells in a cell population. Although analysis of flow cytometry
    data has traditionally considered one or two markers at a time, there has been
    increasing interest in multidimensional analysis. However, flow cytometers are
    limited in the number of markers they can jointly observe, which is typically a
    fraction of the number of markers of interest. For this reason, practitioners
    often perform multiple assays based on different, overlapping combinations of
    markers.

  346. An Extreme Value Theory approach for the early detection of time clusters with application to the surveillance of Salmonella.

    Authors: Armelle Guillou, Marie Kratz, Yann Le Strat
    Subjects: Applications
    Abstract

    We propose a method to generate a warning system for the early detection of
    time clusters applied to public health surveillance data. This new method
    relies on the evaluation of a return period associated with any new count of a
    particular infection reported to a surveillance system. The method is applied
    to Salmonella surveillance in France and compared to the model developed by
    Farrington et al.

  347. Axiomatic Quantification of Co-authors' Relative Contributions.

    Authors: Ge Wang, Jiansheng Yang
    Subjects: Applications
    Abstract

    Over the past decades, the competition for academic resources has gradually
    intensified, and worsened with the current financial crisis. To optimize the
    resource allocation, individualized assessment of research results is being
    actively studied, but the current indices, such as the number of papers, the
    number of citations, the h-factor and its variants, have limitations, especially
    their inability to determine co-authors' credit shares fairly.

  348. Semiparametric curve alignment and shift density estimation for biological data.

    Authors: T. Trigano, U. Isserles, Y. Ritov
    Subjects: Applications
    Abstract

    Assume that we observe a large number of curves, all of them with identical,
    although unknown, shape, but with a different random shift. The objective is to
    estimate the individual time shifts and their distribution. Such an objective
    appears in several biological applications, such as neuroscience or ECG signal
    processing, where the quantity of interest is the distribution of the elapsed
    time between repetitive pulses, possibly with a low signal-to-noise ratio and
    without knowledge of the pulse shape.

  349. Approaches for Multi-step Density Forecasts with Application to Aggregated Wind Power.

    Authors: Ada Lau, Patrick McSharry
    Subjects: Applications
    Abstract

    The generation of multi-step density forecasts for non-Gaussian data mostly
    relies on Monte-Carlo simulations which are computationally intensive. Using
    aggregated wind power in Ireland, we study two approaches of multi-step density
    forecasts which can be obtained from simple iterations so that intensive
    computations are avoided. In the first approach, we apply a logistic
    transformation to normalize the data approximately and describe the transformed
    data using ARIMA-GARCH models so that multi-step forecasts can be iterated
    easily.

  350. Product-limit estimators of the gap time distribution of a renewal process under different sampling patterns.

    Authors: Richard D. Gill, Niels Keiding
    Subjects: Applications
    Abstract

    Nonparametric estimation of the gap time distribution in a simple renewal
    process may be considered a problem in survival analysis under particular
    sampling frames corresponding to how the renewal process is observed. This note
    describes several such situations where simple product limit estimators, though
    inefficient, may still be useful.
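
    For readers unfamiliar with the product-limit (Kaplan-Meier) estimator,
    the sketch below shows its standard form for right-censored durations.
    It covers only plain right censoring, not the specific sampling patterns
    discussed in the note; the function name and the toy gap times are
    hypothetical.

```python
def product_limit(times, observed):
    """Product-limit estimate of S(t) = P(T > t) from right-censored data.

    times:    recorded durations (gap times)
    observed: True if the gap ended with an event, False if it was censored
    Returns (sorted distinct event times, survival estimate at each of them)."""
    event_times = sorted({t for t, seen in zip(times, observed) if seen})
    survival, s = [], 1.0
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)     # still under observation at t
        events = sum(1 for u, seen in zip(times, observed) if seen and u == t)
        s *= 1.0 - events / at_risk                   # multiply conditional survival
        survival.append(s)
    return event_times, survival

gaps = [2, 3, 3, 5, 6, 8]
seen = [True, True, False, True, False, True]
t, s = product_limit(gaps, seen)
for ti, si in zip(t, s):
    print(ti, round(si, 4))
```

    Censored gaps contribute to the risk set up to their censoring time but
    never count as events, which is how the estimator uses partial
    information without assuming a parametric form.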

  351. Non-Central Limit Theorem Statistical Analysis for the "Long-tailed" Internet Society.

    Authors: Kazutaka Kurihara, Yohei Tutiya
    Subjects: Applications
    Abstract

    This article presents a statistical analysis method and introduces the
    corresponding software package "tailstat," which is believed to be widely
    applicable to today's internet society. The proposed method facilitates
    statistical analyses with small sample sets from given populations, which
    render the central limit theorem inapplicable. A large-scale case study
    demonstrates the effectiveness of the method and provides implications for
    applying similar analyses to other cases.

  352. The Three Doors Problem...-s.

    Authors: Richard D. Gill
    Subjects: Applications
    Abstract

    I argue that we must distinguish between:

    (0) the Three-Doors-Problem Problem [sic], which is to make sense of some
    real world question of a real person.

    (1) a large number of solutions to this meta-problem, i.e., many specific
    Three-Doors-Problem problems, which are competing mathematizations of the
    meta-problem (0).

  353. Sparse Regression Learning by Aggregation and Langevin Monte-Carlo.

    Authors: Alexandre B. Tsybakov, Arnak Dalalyan
    Subjects: Applications
    Abstract

    We consider the problem of regression learning for deterministic design and
    independent random errors. We start by proving a sharp PAC-Bayesian type bound
    for the exponentially weighted aggregate (EWA) under the expected squared
    empirical loss. For a broad class of noise distributions the presented bound is
    valid whenever the temperature parameter $\beta$ of the EWA is larger than or
    equal to $4\sigma^2$, where $\sigma^2$ is the noise variance.
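
    A small sketch may help fix ideas about exponential weighting. The
    abstract does not give the formula, so the form used here, weights
    proportional to prior_j * exp(-||y - f_j||^2 / beta) over a finite set
    of candidate fits, is an assumption based on the usual way exponential
    weights are written. The candidate set, the uniform prior, and all
    names are hypothetical; only the choice beta = 4 * sigma^2 is taken
    from the text.

```python
import math
import random

def ewa_weights(y, candidates, beta, prior=None):
    """Exponential weights over candidate fits (each a list as long as y)."""
    m = len(candidates)
    prior = [1.0 / m] * m if prior is None else list(prior)
    log_w = []
    for p, f in zip(prior, candidates):
        sq_err = sum((yi - fi) ** 2 for yi, fi in zip(y, f))
        log_w.append(math.log(p) - sq_err / beta)
    top = max(log_w)                         # subtract the max for numerical stability
    w = [math.exp(v - top) for v in log_w]
    total = sum(w)
    return [v / total for v in w]

rng = random.Random(0)
n, sigma = 100, 0.5
xs = [i / (n - 1) for i in range(n)]         # deterministic design points
truth = [math.sin(2 * math.pi * x) for x in xs]
y = [v + rng.gauss(0.0, sigma) for v in truth]
candidates = [truth, [math.cos(2 * math.pi * x) for x in xs], [0.0] * n]

w = ewa_weights(y, candidates, beta=4 * sigma ** 2)
# The aggregate is the weighted average of the candidate fits.
aggregate = [sum(wj * f[i] for wj, f in zip(w, candidates)) for i in range(n)]
print([round(v, 4) for v in w])
```

    A larger beta flattens the weights toward the prior, while a smaller
    beta concentrates them on the best-fitting candidate.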

  354. Risk Quantification Associated with Wind Energy Intermittency in California.

    Authors: Sam O. George, H. Bola George, Scott V. Nguyen
    Subjects: Applications
    Abstract

    As compared to load demand, frequent wind energy intermittencies produce
    large short-term (sub 1-hr to 3-hr) deficits (and surpluses) in the energy
    supply. These intermittent deficits pose systemic and structural risks that
    will likely lead to energy deficits that have significant reliability
    implications for energy system operators and consumers. This work provides a
    toolset to help policy makers quantify these first-order risks. The thinking
    methodology / framework shows that increasing wind energy penetration
    significantly increases the risk of loss in California.

  355. Compressed Sensing for Sparse Underwater Channel Estimation: Some Practical Considerations.

    Authors: Sushil Subramanian
    Subjects: Applications
    Abstract

    We examine the use of a structured thresholding algorithm for sparse
    underwater channel estimation using compressed sensing. This method shows some
    improvements over standard algorithms for sparse channel estimation such as
    matching pursuit, iterative detection and least squares.

  356. The Sensitivity of Respondent-driven Sampling Method.

    Authors: Tom Britton, Xin Lu, Linus Bengtsson, Martin Camitz, Beom Jun Kim, Anna Thorson, Fredrik Liljeros
    Subjects: Applications
    Abstract

    Researchers in many scientific fields make inferences from individuals to
    larger groups. For many groups however, there is no list of members from which
    to take a random sample. Respondent-driven sampling (RDS) is a relatively new
    sampling methodology that circumvents this difficulty by using the social
    networks of the groups under study. The RDS method has been shown to provide
    unbiased estimates of population proportions given certain conditions. The
    method is now widely used in the study of HIV-related high-risk populations
    globally.

  357. Effect of Wind Intermittency on the Electric Grid: Mitigating the Risk of Energy Deficits.

    Authors: Sam O. George, H. Bola George, Scott V. Nguyen
    Subjects: Applications
    Abstract

    Successful implementation of California's Renewable Portfolio Standard (RPS)
    mandating 33 percent renewable energy generation by 2020 requires inclusion of
    a robust strategy to mitigate increased risk of energy deficits (blackouts) due
    to short time-scale (sub 1 hour) intermittencies in renewable energy sources.
    Of these RPS sources, wind energy has the fastest growth rate--over 25%
    year-over-year. If these growth trends continue, wind energy could make up 15
    percent of California's energy portfolio by 2016 (wRPS15).

  358. Reference priors for high energy physics.

    Authors: Luc Demortier, Supriya Jain, Harrison B. Prosper
    Subjects: Applications
    Abstract

    Bayesian inferences in high energy physics often use uniform prior
    distributions for parameters about which little or no information is available
    before data are collected. The resulting posterior distributions are therefore
    sensitive to the choice of parametrization for the problem and may even be
    improper if this choice is not carefully considered. Here we describe an
    extensively tested methodology, known as reference analysis, which allows one
    to construct parametrization-invariant priors that embody the notion of minimal
    informativeness in a mathematically well-defined sense.

  359. Efficient Bayesian Learning in Social Networks with Gaussian Estimators.

    Authors: Elchanan Mossel, Omer Tamuz
    Subjects: Applications
    Abstract

    We propose a simple and efficient Bayesian model of iterative learning on
    social networks. This model is efficient in two senses: the process both
    results in an optimal belief, and can be carried out with modest computational
    resources for large networks. This result extends Condorcet's Jury Theorem to
    general social networks, while preserving rationality and computational
    feasibility.

  360. A Spectral Analysis of Business Cycle Patterns in UK Sectoral Output.

    Authors: Peijie Wang, Trefor Jones
    Subjects: Applications
    Abstract

    This paper studies business cycle patterns in UK sectoral output. It analyzes
    the distinction between white noise processes and their non-white noise
    counterparts in the frequency domain and further examines the associated
    features and patterns for the process where white noise conditions are
    violated. The characteristics of these sectors, arising from their
    institutional features that may influence business cycle behavior and
    patterns, are discussed.

  361. Visualizing the Structure of Large Trees.

    Authors: Burcu Aydin, Gabor Pataki, Haonan Wang, Alim Ladha, Elizabeth Bullitt, J.S. Marron
    Subjects: Applications
    Abstract

    This study introduces a new method of visualizing complex tree structured
    objects. The usefulness of this method is illustrated in the context of
    detecting unexpected features in a data set of very large trees. The major
    contribution is a novel two-dimensional graphical representation of each tree,
    with a covariate coded by color. The motivating data set contains three
    dimensional representations of brain artery systems of 105 subjects.

  362. Lambert W Random Variables - A New Family of Generalized Skewed Distributions.

    Authors: Georg M. Goerg
    Subjects: Applications
    Abstract

    A lot of financial data present slight negative skewness and excess kurtosis.
    Whereas the kurtosis is usually addressed via student-t distributions, the
    evident asymmetry in the data is often tacitly ignored. Here I introduce a new
    class of generalized skewed distribution functions, which allows a very
    flexible approach to model skewed data. Originating from a system-theory and an
    input/output point of view, a non-linear transformation converts a random
    variable X into a so-called Lambert W random variable Y. Its skewness depends
    on the skewness of X and a skew parameter delta.
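
    The effect of the skew parameter can be checked numerically. The
    abstract does not state the transformation, so the sketch below assumes
    the form y = x * exp(delta * x), whose inverse involves the Lambert W
    function; the Gaussian input and the function names are likewise
    illustrative assumptions.

```python
import math
import random

def lambert_w_transform(x, delta):
    """Assumed input/output map: y = x * exp(delta * x)."""
    return x * math.exp(delta * x)

def sample_skewness(values):
    """Moment-based sample skewness m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(20000)]   # symmetric input X
skew = {}
for delta in (-0.1, 0.0, 0.1):
    y = [lambert_w_transform(v, delta) for v in x]
    skew[delta] = sample_skewness(y)
    print(delta, round(skew[delta], 2))
```

    With a symmetric input, a negative delta yields negative skewness and a
    positive delta yields positive skewness, while delta = 0 returns X
    unchanged.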

  363. Contact intervals, survival analysis of epidemic data, and estimation of R_0.

    Authors: Eben Kenah
    Subjects: Applications
    Abstract

    We argue that the time from the onset of infectiousness to infectious
    contact, which we call the contact interval, is a better basis for inference in
    epidemic data than generation or serial intervals. Since an infectious person
    might recover before making infectious contact or make infectious contact with
    previously infected persons, infectious contact intervals can be right-censored
    and survival analysis is the natural approach to estimation.

  364. Assessing a mixture model for graphs with a non asymptotic approximation of the marginal likelihood.

    Authors: Christophe Ambroise, Pierre Latouche, Etienne Birmele
    Subjects: Applications
    Abstract

    It is now widely accepted that knowledge can be acquired from networks by
    clustering their vertices according to connection profiles. Many methods have
    been proposed. In this paper, we concentrate on a mixture model for graphs, the
    so-called MixNet model, which is closely related to the stochastic block model.
    The clustering of vertices and the estimation of MixNet model parameters have
    been subject to previous work and numerous inference strategies such as
    variational Expectation Maximization (EM) and classification EM have been
    proposed.

  365. Bayesian model search and multilevel inference for SNP association studies.

    Authors: Melanie A. Wilson, Edwin S. Iversen, Merlise A. Clyde, Scott C. Schmidler, Joellen M. Schildkraut
    Subjects: Applications
    Abstract

    Technological advances in genotyping have given rise to hypothesis-based
    association studies of increasing scope. As a result, the scientific hypotheses
    addressed by these studies have become more complex and more difficult to
    address using existing analytic methodologies. Obstacles to analysis include
    inference in the face of multiple comparisons, complications arising from
    correlations among the SNPs (single nucleotide polymorphisms), choice of their
    genetic parameterization and missing data.

  366. A Bayesian Variable Selection Approach to Major League Baseball Hitting Metrics.

    Authors: Shane T. Jensen, Blakeley B. McShane, Alexander Braunstein, James Piette
    Subjects: Applications
    Abstract

    Numerous statistics have been proposed for the measure of offensive ability
    in major league baseball. While some of these measures may offer moderate
    predictive power in certain situations, it is unclear which simple offensive
    metrics are the most reliable or consistent. We address this issue with a
    Bayesian hierarchical model for variable selection to capture which offensive
    metrics are most predictive within players across time.

  367. A Markovian event-based framework for stochastic spiking neural networks.

    Authors: Olivier Faugeras, Jonathan Touboul
    Subjects: Applications
    Abstract

    In this article we introduce and study a mathematical framework for
    characterizing and simulating networks of noisy integrate-and-fire neurons
    based on the spike times. We show that the firing times of the neurons in the
    networks constitute a Markov chain, whose transition probability is related to
    the probability distribution of the interspike interval of the neurons in the
    network.

  368. Transposable Regularized Covariance Models with an Application to Missing Data Imputation.

    Authors: Genevera I. Allen, Robert Tibshirani
    Subjects: Applications
    Abstract

    Missing data estimation is an important challenge with high-dimensional data
    arranged in the form of a matrix. Typically this data matrix is transposable,
    meaning that either the rows, columns or both can be treated as features. To
    model transposable data, we present a modification of the matrix-variate
    normal, the mean-restricted matrix-variate normal, in which the rows and
    columns each have a separate mean vector and covariance matrix.

  369. Time-Varying Autoregressions in Speech: Detection Theory and Applications.

    Authors: Patrick J. Wolfe, Daniel Rudoy, Thomas F. Quatieri
    Subjects: Applications
    Abstract

    This article develops a general detection theory for speech analysis based on
    time-varying autoregressive models, which themselves generalize the classical
    linear predictive speech analysis framework. This theory leads to a
    computationally efficient decision-theoretic procedure that may be applied to
    detect the presence of vocal tract variation in speech waveform data.

  370. Variable Second-Order Inclusion Probabilities as a Tool to Predict the Sampling Variance.

    Authors: Bastiaan Geelhoed
    Subjects: Applications
    Abstract

    A generalization of Gy's theory for the variance of the fundamental sampling
    error is reviewed. Practical situations where the generalized model potentially
    leads to more accurate variance estimates are identified as: clustering of
    particles, differences in densities or sizes of the particles or repulsive
    inter-particle forces. Two general approaches for estimating an input parameter
    for the generalized model are discussed. The first approach consists of
    modelling based on physical properties of particles such as size, density and
    electrostatic forces between particles.

  371. Point target detection and subpixel position estimation in optical imagery.

    Authors: Vincent Samson, Frédéric Champagnat, Jean-François Giovannelli
    Subjects: Applications
    Abstract

    This paper addresses the issue of detecting point objects in a clutter
    background and estimating their position by image processing. We are interested
    in the specific context where the object signature significantly varies with
    its random subpixel location because of aliasing. The conventional matched
    filter neglects this phenomenon and causes a consistent loss of detection
    performance.
    Thus, alternative detectors are proposed and numerical results show the
    improvement brought by approximate and generalized likelihood ratio tests in
    comparison with pixel matched filtering.

  372. Straight to the Source: Detecting Aggregate Objects in Astronomical Images with Proper Error Control.

    Authors: Christopher R. Genovese, David A. Friedenberg
    Subjects: Applications
    Abstract

    The next generation of telescopes will acquire terabytes of image data on a
    nightly basis. Collectively, these large images will contain billions of
    interesting objects, which astronomers call sources. The astronomers' task is
    to construct a catalog detailing the coordinates and other properties of the
    sources. The source catalog is the primary data product for most telescopes and
    is an important input for testing new astrophysical theories, but to construct
    the catalog one must first detect the sources.

  373. Identification and Characterisation of Technological Topics in the Field of Molecular Biology.

    Authors: Ivana Roche, Dominique Besagni, Claire François, Marianne Hörlesberger, Edgar L Schiebel
    Subjects: Applications
    Abstract

    This paper focuses on methodological approaches for characterising the
    specific topics within a technological field based on scientific literature
    data. We introduce a diachronic clustering analysis approach and some
    bibliometric indicators. The results are visualised with the software-tool
    Stanalyst [1]. We are applying our methods to the field "Molecular Biology".
    This field has grown a great deal in the last decade.

  374. Clustering based on Random Graph Model embedding Vertex Features.

    Authors: Christophe Ambroise, Hugo Zanghi, Stevenn Volant
    Subjects: Applications
    Abstract

    Large datasets with interactions between objects are common to numerous
    scientific fields (e.g., social science, the internet, biology). The
    interactions naturally define a graph, and a common way to explore or summarize
    such a dataset is graph clustering. Most techniques for clustering graph
    vertices use only the topology of connections, ignoring information in the
    vertex features.

  375. Doubly stochastic continuous-time hidden Markov approach for analyzing genome tiling arrays.

    Authors: W. Evan Johnson, X. Shirley Liu, Jun S. Liu
    Subjects: Applications
    Abstract

    Microarrays have been developed that tile the entire nonrepetitive genomes of
    many different organisms, allowing for the unbiased mapping of active
    transcription regions or protein binding sites across the entire genome. These
    tiling array experiments produce massive correlated data sets that have many
    experimental artifacts, presenting many challenges to researchers that require
    innovative analysis methods and efficient computational algorithms.

  376. Strategies for Online Inference of Model-Based Clustering in large Networks.

    Authors: Christophe Ambroise, Hugo Zanghi, Franck Picard, Vincent Miele
    Subjects: Applications
    Abstract

    The statistical analysis of complex networks is a challenging task, given
    that appropriate statistical models and efficient computational procedures are
    required in order for structures to be learned. One line of research has aimed
    at developing mixture models for random graphs, and this strategy has been
    successful in revealing structures in social and biological networks. The
    principle of these models is to assume that the distribution of the edge values
    follows a parametric distribution, conditionally on a latent structure which is
    used to detect connectivity patterns.

  377. Weighted-Lasso for Structured Network Inference from Time Course Data.

    Authors: Camille Charbonnier, Julien Chiquet, Christophe Ambroise
    Subjects: Applications
    Abstract

    We present a weighted-Lasso method to infer the parameters of a first-order
    vector auto-regressive model that describes time course expression data
    generated by directed gene-to-gene regulation networks. These networks are
    assumed to have a priori internal structures of connectivity which drive the
    inference method. The solution to the optimization problem is efficiently
    computed using an active-set algorithm. We illustrate the performance both on
    synthetic data and on the yeast regulation network by analyzing the dataset of
    Spellman et al.

  378. Simulation from endpoint-conditioned, continuous-time Markov chains on a finite state space, with applications to molecular evolution.

    Authors: Asger Hobolth, Eric A. Stone
    Subjects: Applications
    Abstract

    Analyses of serially-sampled data often begin with the assumption that the
    observations represent discrete samples from a latent continuous-time
    stochastic process. The continuous-time Markov chain (CTMC) is one such
    generative model whose popularity extends to a variety of disciplines ranging
    from computational finance to human genetics and genomics. A common theme among
    these diverse applications is the need to simulate sample paths of a CTMC
    conditional on realized data that is discretely observed.
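
    The simulation task described above can be sketched with the simplest
    possible strategy, naive rejection sampling: simulate forward from the
    starting state and keep the first path that ends in the required state. This
    is an illustrative sketch only, not necessarily the method studied in the
    paper; the function names and the `max_tries` cutoff are hypothetical, and
    `Q` is assumed to be a valid rate matrix (rows summing to zero).

```python
import numpy as np

def simulate_forward(Q, start, T, rng):
    """Simulate a CTMC with rate matrix Q from `start` over [0, T].

    Returns a list of (jump_time, new_state) pairs, beginning with (0.0, start).
    """
    t, state = 0.0, start
    path = [(0.0, start)]
    while True:
        exit_rate = -Q[state, state]
        if exit_rate <= 0.0:                 # absorbing state: no more jumps
            return path
        t += rng.exponential(1.0 / exit_rate)
        if t >= T:                           # next jump falls outside the window
            return path
        jump_probs = Q[state].astype(float)  # astype returns a copy
        jump_probs[state] = 0.0
        jump_probs /= jump_probs.sum()
        state = int(rng.choice(len(jump_probs), p=jump_probs))
        path.append((t, state))

def sample_conditioned_path(Q, start, end, T, rng, max_tries=100_000):
    """Naive rejection sampler: keep the first forward path that ends in `end`."""
    for _ in range(max_tries):
        path = simulate_forward(Q, start, T, rng)
        if path[-1][1] == end:
            return path
    raise RuntimeError("no accepted path; the endpoint may be too unlikely")
```

    Rejection sampling becomes inefficient when the required endpoint is
    unlikely given the starting state and the time interval, which is one reason
    more refined samplers are of interest.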

  379. Assessing the association between trends in a biomarker and risk of event with an application in pediatric HIV/AIDS.

    Authors: Elizabeth R. Brown
    Subjects: Applications
    Abstract

    We present a new joint longitudinal and survival model aimed at estimating
    the association between the risk of an event and the change in and history of a
    biomarker that is repeatedly measured over time. We use cubic B-spline models
    for the longitudinal component that lend themselves to straightforward
    formulations of the slope and integral of the trajectory of the biomarker. The
    model is applied to data collected in a long-term follow-up study of
    HIV-infected infants in Uganda. Estimation is carried out using MCMC methods.

  380. Maximum likelihood estimates under $\mathbf{k}$-allele models with selection can be numerically unstable.

    Authors: Erkan Ozge Buzbas, Paul Joyce
    Subjects: Applications
    Abstract

    The stationary distribution of allele frequencies under a variety of
    Wright--Fisher $k$-allele models with selection and parent independent mutation
    is well studied. However, the statistical properties of maximum likelihood
    estimates of parameters under these models are not well understood. Under each
    of these models there is a point in data space which carries the strongest
    possible signal for selection, yet, at this point, the likelihood is unbounded.
    This result remains valid even if all of the mutation parameters are assumed to
    be known.

  382. A new latent cure rate marker model for survival data.

    Authors: Sungduk Kim, Yingmei Xi, Ming-Hui Chen
    Subjects: Applications
    Abstract

    To address an important risk classification issue that arises in clinical
    practice, we propose a new mixture model via latent cure rate markers for
    survival data with a cure fraction. In the proposed model, the latent cure rate
    markers are modeled via a multinomial logistic regression and patients who
    share the same cure rate are classified into the same risk group. Compared to
    available cure rate models, the proposed model fits better to data from a
    prostate cancer clinical trial.

  384. Non-Euclidean statistics for covariance matrices, with applications to diffusion tensor imaging.

    Authors: Ian L. Dryden, Alexey Koloydenko, Diwei Zhou
    Subjects: Applications
    Abstract

    The statistical analysis of covariance matrix data is considered and, in
    particular, methodology is discussed which takes into account the non-Euclidean
    nature of the space of positive semi-definite symmetric matrices. The main
    motivation for the work is the analysis of diffusion tensors in medical image
    analysis. The primary focus is on estimation of a mean covariance matrix and,
    in particular, on the use of Procrustes size-and-shape space. Comparisons are
    made with other estimation techniques, including using the matrix logarithm,
    matrix square root and Cholesky decomposition.
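
    Three of the alternative estimators named above can be sketched in a few
    lines each. The sketch assumes symmetric positive definite inputs (the
    logarithm and the Cholesky factor both need this); the function names are
    hypothetical and the Procrustes size-and-shape estimator is not covered.

```python
import numpy as np

def _apply_to_eigenvalues(S, func):
    """Apply func to the eigenvalues of a symmetric matrix S."""
    vals, vecs = np.linalg.eigh(S)
    return (vecs * func(vals)) @ vecs.T

def log_euclidean_mean(mats):
    """Average the matrix logarithms, then map back with the exponential."""
    logs = [_apply_to_eigenvalues(S, np.log) for S in mats]
    return _apply_to_eigenvalues(np.mean(logs, axis=0), np.exp)

def root_euclidean_mean(mats):
    """Average the symmetric matrix square roots, then square the result."""
    roots = [_apply_to_eigenvalues(S, np.sqrt) for S in mats]
    R = np.mean(roots, axis=0)
    return R @ R

def cholesky_mean(mats):
    """Average the lower-triangular Cholesky factors, then form L L^T."""
    L = np.mean([np.linalg.cholesky(S) for S in mats], axis=0)
    return L @ L.T
```

    For the two matrices diag(1, 4) and diag(4, 1), the ordinary Euclidean mean
    is diag(2.5, 2.5), the log-Euclidean mean is diag(2, 2), and both the
    square-root and Cholesky means are diag(2.25, 2.25), which shows how the
    choice of geometry changes the estimate.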

  386. Hierarchical spatial models for predicting tree species assemblages across large domains.

    Authors: Andrew O. Finley, Sudipto Banerjee, Ronald E. McRoberts
    Subjects: Applications
    Abstract

    Spatially explicit data layers of tree species assemblages, referred to as
    forest types or forest type groups, are a key component in large-scale
    assessments of forest sustainability, biodiversity, timber biomass, carbon
    sinks and forest health monitoring. This paper explores the utility of coupling
    georeferenced national forest inventory (NFI) data with readily available and
    spatially complete environmental predictor variables through spatially-varying
    multinomial logistic regression models to predict forest type groups across
    large forested landscapes.

  387. GaGa: A parsimonious and flexible model for differential expression analysis.

    Authors: David Rossell
    Subjects: Applications
    Abstract

    Hierarchical models are a powerful tool for high-throughput data with a small
    to moderate number of replicates, as they allow sharing information across
    units of information, for example, genes. We propose two such models and show
    their increased sensitivity in microarray differential expression applications.
    We build on the gamma--gamma hierarchical model introduced by Kendziorski et
    al. [Statist. Med. 22 (2003) 3899--3914] and Newton et al. [Biostatistics 5
    (2004) 155--176], by addressing important limitations that may have hampered
    its performance and its more widespread use.

  388. Statistical modeling of the time course of tantrum anger.

    Authors: Peihua Qiu, Rong Yang, Michael Potegal
    Subjects: Applications
    Abstract

    Although anger is an important emotion that underlies much overt aggression
    at great social cost, little is known about how to quantify anger or to specify
    the relationship between anger and the overt behaviors that express it. This
    paper proposes a novel statistical model which provides both a metric for the
    intensity of anger and an approach to determining the quantitative relationship
    between anger intensity and the specific behaviors that it controls.

  389. Analysis of Minnesota colon and rectum cancer point patterns with spatial and nonspatial covariate information.

    Authors: Shengde Liang, Bradley P. Carlin, Alan E. Gelfand
    Subjects: Applications
    Abstract

    Colon and rectum cancer share many risk factors, and are often tabulated
    together as ``colorectal cancer'' in published summaries. However, recent work
    indicating that exercise, diet, and family history may have differential
    impacts on the two cancers encourages analyzing them separately, so that
    corresponding public health interventions can be more efficiently targeted. We
    analyze colon and rectum cancer data from the Minnesota Cancer Surveillance
    System from 1998--2002 over the 16-county Twin Cities (Minneapolis--St. Paul)
    metro and exurban area.

  390. State price density estimation via nonparametric mixtures.

    Authors: Ming Yuan
    Subjects: Applications
    Abstract

    We consider nonparametric estimation of the state price density encapsulated
    in option prices. Unlike usual density estimation problems, we only observe
    option prices and their corresponding strike prices rather than samples from
    the state price density. We propose to model the state price density directly
    with a nonparametric mixture and estimate it using least squares. We show that
    although the minimization is taken over an infinite-dimensional function
    space, the minimizer always admits a finite-dimensional representation and can
    be computed efficiently.

  391. Are a set of microarrays independent of each other?

    Authors: Bradley Efron
    Subjects: Applications
    Abstract

    Having observed an $m\times n$ matrix $X$ whose rows are possibly correlated,
    we wish to test the hypothesis that the columns are independent of each other.
    Our motivation comes from microarray studies, where the rows of $X$ record
    expression levels for $m$ different genes, often highly correlated, while the
    columns represent $n$ individual microarrays, presumably obtained
    independently. The presumption of independence underlies all the familiar
    permutation, cross-validation and bootstrap methods for microarray analysis, so
    it is important to know when independence fails.

  392. Maximum likelihood estimation of cloud height from multi-angle satellite imagery.

    Authors: E. Anderes, B. Yu, V. Jovanovic, C. Moroney, M. Garay, A. Braverman, E. Clothiaux
    Subjects: Applications
    Abstract

    We develop a new estimation technique for recovering depth-of-field from
    multiple stereo images. Depth-of-field is estimated by determining the shift in
    image location resulting from different camera viewpoints. When this shift is
    not divisible by pixel width, the multiple stereo images can be combined to
    form a super-resolution image. By modeling this super-resolution image as a
    realization of a random field, one can view the recovery of depth as a
    likelihood estimation problem.

  393. Error-free milestones in error prone measurements.

    Authors: Dylan S. Small, Paul R. Rosenbaum
    Subjects: Applications
    Abstract

    A predictor variable or dose that is measured with substantial error may
    possess an error-free milestone, such that it is known with negligible error
    whether the value of the variable is to the left or right of the milestone.
    Such a milestone provides a basis for estimating a linear relationship between
    the true but unknown value of the error-free predictor and an outcome, because
    the milestone creates a strong and valid instrumental variable. The inferences
    are nonparametric and robust, and in the simplest cases, they are exact and
    distribution free.

  394. A New Approach to Modeling Choice with Limited Data.

    Authors: Devavrat Shah, Vivek F. Farias, Srikanth Jagabathula
    Subjects: Applications
    Abstract

    We visit the following problem: For a `generic' model of consumer choice
    (namely, distributions over preference lists) and a limited amount of data on
    how consumers actually make decisions (such as marginal preference
    information), how may one predict revenues from offering a particular
    assortment of choices? This is a central problem in operations research and
    marketing. We present a framework to answer such questions and design a number
    of algorithms for this purpose that are tractable from both a data and a
    computational standpoint.

  395. Distributed detection/localization of change-points in high-dimensional network traffic data.

    Authors: Olivier Cappé, Alexandre Lung-Yut-Fong, Céline Lévy-Leduc
    Subjects: Applications
    Abstract

    We propose a novel approach for distributed statistical detection of
    change-points in high-volume network traffic. We consider more specifically the
    task of detecting and identifying the targets of Distributed Denial of Service
    (DDoS) attacks. The proposed algorithm, called DTopRank, performs distributed
    network anomaly detection by aggregating the partial information gathered in a
    set of network monitors.

  396. Bayesian changepoint analysis for atomic force microscopy and soft material indentation.

    Authors: Patrick J. Wolfe, Daniel Rudoy, Shelten G. Yuen, Robert D. Howe
    Subjects: Applications
    Abstract

    Material indentation studies, in which a probe is brought into controlled
    physical contact with an experimental sample, have long been a primary means by
    which scientists characterize the mechanical properties of materials. More
    recently, the advent of atomic force microscopy, which operates on the same
    fundamental principle, has in turn revolutionized the nanoscale analysis of
    soft biomaterials such as cells and tissues.

  397. Assessment of school performance through a multilevel latent Markov Rasch model.

    Authors: Francesco Bartolucci, Fulvia Pennoni, Giorgio Vittadini
    Subjects: Applications
    Abstract

    An extension of the latent Markov Rasch model is described for the analysis
    of binary longitudinal data with covariates when subjects are collected in
    clusters, e.g. students clustered in classes. For each subject, the latent
    process is used to represent the characteristic of interest (e.g. ability)
    conditional on the effect of the cluster to which he/she belongs. The latter
    effect is modeled by a discrete latent variable associated with each cluster.
    For the maximum likelihood estimation of the model parameters we outline an EM
    algorithm.

  398. Hierarchical Relational Models for Document Networks.

    Authors: David M. Blei, Jonathan Chang
    Subjects: Applications
    Abstract

    We develop the relational topic model (RTM), a hierarchical model of both
    network structure and node attributes. We focus on document networks, where the
    attributes of each document are its words, i.e., discrete observations taken
    from a fixed vocabulary. For each pair of documents, the RTM models their link
    as a binary random variable that is conditioned on their contents. The model
    can be used to summarize a network of documents, predict links between them,
    and predict words within them.

  399. Efficient Calculation of P-value and Power for Quadratic Form Statistics in Multilocus Association Testing.

    Authors: Liping Tong, Jie Yang, Richard S. Cooper
    Subjects: Applications
    Abstract

    We address the asymptotic and approximate distributions of a large class of
    test statistics with quadratic forms used in association studies. The
    statistics of interest do not necessarily follow a chi-square distribution and
    take the general form $D=X^T A X$, where $X$ follows the multivariate normal
    distribution, and $A$ is a general similarity matrix which may or may not be
    positive semi-definite.
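
    To make the form $D=X^T A X$ concrete: if $X$ is multivariate normal with
    mean zero and covariance $\Sigma = LL^T$, then $D$ has the same distribution
    as $\sum_i \lambda_i Z_i^2$, where the $Z_i$ are independent standard normals
    and the $\lambda_i$ are the eigenvalues of $L^T A L$; these may be negative
    when $A$ is indefinite. The sketch below uses this fact for a brute-force
    Monte Carlo p-value. The zero-mean assumption, the function names and the
    simulation approach are illustrative choices, not the efficient calculation
    the paper proposes.

```python
import numpy as np

def quadratic_form_weights(A, Sigma):
    """Eigenvalues lambda_i such that X^T A X ~ sum_i lambda_i * chi2_1."""
    A_sym = (A + A.T) / 2.0              # only the symmetric part of A matters
    L = np.linalg.cholesky(Sigma)        # Sigma = L L^T, so X = L z with z ~ N(0, I)
    return np.linalg.eigvalsh(L.T @ A_sym @ L)

def mc_pvalue(d_obs, A, Sigma, n_draws=100_000, seed=0):
    """Monte Carlo estimate of P(D >= d_obs) for zero-mean X."""
    rng = np.random.default_rng(seed)
    lam = quadratic_form_weights(A, Sigma)
    z2 = rng.chisquare(df=1, size=(n_draws, lam.size))
    return float(np.mean(z2 @ lam >= d_obs))
```

    The weights sum to trace($A\Sigma$), the expected value of $D$, which gives
    a quick sanity check on any implementation.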

  400. On Goodness of Fit Tests For Models of Neuronal Spike Trains Considered as Counting Processes.

    Authors: Christophe Pouzat, Antoine Chaffiol
    Subjects: Applications
    Abstract

    After an elementary derivation of the "time transformation", mapping a
    counting process onto a homogeneous Poisson process with rate one, a brief
    review of Ogata's goodness of fit tests is presented and a new test, the
    "Wiener process test", is proposed. This test is based on a straightforward
    application of Donsker's Theorem to the intervals of time transformed counting
    processes. The finite sample properties of the test are studied by Monte Carlo
    simulations.
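
    The time transformation mentioned above maps each event time $t_i$ to
    $\Lambda(t_i)$, the intensity integrated from the start of observation. A
    minimal sketch follows, assuming observation starts at time 0 and that the
    integrated intensity is available as a Python callable; both function names
    are hypothetical. The second function only builds the rescaled partial-sum
    path to which Donsker's theorem applies; it does not reproduce the paper's
    test boundaries.

```python
import numpy as np

def time_transform(event_times, cumulative_intensity):
    """Map event times t_i to Lambda(t_i), Lambda being the integrated intensity."""
    return np.array([cumulative_intensity(t) for t in event_times])

def rescaled_partial_sums(transformed_times):
    """Centered, scaled partial sums of the transformed inter-event intervals.

    Under a correct model the intervals are i.i.d. exponential with mean 1 and
    variance 1, so by Donsker's theorem the points (k/n, S_k) resemble a
    standard Wiener process on [0, 1] for large n.
    """
    intervals = np.diff(np.concatenate(([0.0], transformed_times)))
    n = len(intervals)
    grid = np.arange(1, n + 1) / n
    sums = np.cumsum(intervals - 1.0) / np.sqrt(n)
    return grid, sums
```

    For a homogeneous Poisson process with rate 2, `cumulative_intensity` is
    simply `lambda t: 2.0 * t`, so events at times 0.5, 1.0 and 2.0 map to 1, 2
    and 4.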

  401. Parameter Estimation in multiple-hidden i.i.d. models from biological multiple alignment.

    Authors: Ana Arribas-Gil
    Subjects: Applications
    Abstract

    In this work we deal with parameter estimation in a latent variable model,
    namely the multiple-hidden i.i.d. model, which is derived from multiple
    alignment algorithms. We first provide a rigorous formalism for the homology
    structure of k sequences related by a star-shaped phylogenetic tree in the
    context of multiple alignment based on indel evolution models. We discuss
    possible definitions of likelihoods and compare them to the criterion used in
    multiple alignment algorithms.

  402. Co-occurrence Matrix and Fractal Dimension for Image Segmentation.

    Authors: Beatriz Marron
    Subjects: Applications
    Abstract

    One of the most important tasks in image processing and machine vision is
    object recognition, and the success of many proposed methods relies on a
    suitable choice of algorithm for the segmentation of an image. This paper
    focuses on how to apply texture operators based on the concepts of fractal
    dimension and co-occurrence matrix to the problem of object recognition, and
    a new method based on fractal dimension is introduced.

  403. New approaches for increasing the reliability of the h index research performance measurement.

    Authors: Lutz Bornmann, Ruediger Mutz, Hans-Dieter Daniel
    Subjects: Applications
    Abstract

    In the year 2005 Jorge Hirsch introduced the h index for quantifying the
    research output of scientists. Today, the h index is a widely accepted
    indicator of research performance. The h index has been criticized for its
    insufficient reliability - the ability to discriminate reliably between
    meaningful amounts of research performance.
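
    For reference, the h index is the largest number h such that h of a
    scientist's papers have at least h citations each. A minimal sketch of the
    standard definition follows; the function name is hypothetical, and the
    sketch does not implement the reliability improvements the paper proposes.

```python
def h_index(citations):
    """Largest h such that at least h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h
```

    For citation counts 10, 8, 5, 4 and 3 the function returns 4: four papers
    have at least four citations each, but there are not five papers with at
    least five.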

  404. Using administrative data to improve the estimation of immigration to local areas in England.

    Authors: Peter Boden, Phil Rees
    Subjects: Applications
    Abstract

    International migration is now a significant driver of population change
    across Europe but the methods available to estimate its true impact upon
    sub-national areas remain inconsistent, constrained by inadequate systems of
    measurement and data capture. In the absence of a population register for
    England, official statistics on immigration and emigration are derived from a
    combination of survey and census sources.

  405. Estimating limits from Poisson counting data using Dempster--Shafer analysis.

    Authors: Paul T. Edlefsen, Chuanhai Liu, Arthur P. Dempster
    Subjects: Applications
    Abstract

    We present a Dempster--Shafer (DS) approach to estimating limits from Poisson
    counting data with nuisance parameters. Dempster--Shafer is a statistical
    framework that generalizes Bayesian statistics. DS calculus augments
    traditional probability by allowing mass to be distributed over power sets of
    the event space. This eliminates the Bayesian dependence on prior distributions
    while allowing the incorporation of prior information when it is available. We
    use the Poisson Dempster--Shafer model (DSM) to derive a posterior DSM for the
    ``Banff upper limits challenge'' three-Poisson model.

  406. Simple Error Scattering Model for improved Information Reconciliation.

    Authors: Stefan Rass
    Subjects: Applications
    Abstract

    Currently available implementations of quantum key distribution suffer from
    inefficiencies due to post-processing of the raw key, which severely cuts down
    the final secure key rate. We present a simple model for the error scattering
    across the raw key and derive "closed form" expressions for the probability of
    a parity check failure, or of experiencing more than some fixed number of
    errors. Our results can help improve key establishment, as information
    reconciliation via interactive error correction and privacy amplification rests
    on mostly unproven assumptions.
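
    To give a feel for the two quantities mentioned above, here is a sketch
    under the simplest possible assumption, namely that bit errors occur
    independently with a common rate p. This baseline is an assumption made for
    illustration and is not the error scattering model of the paper. A parity
    check on a block of n bits is taken to fail when the block contains an odd
    number of errors, which has probability $(1 - (1 - 2p)^n)/2$; the function
    names are hypothetical.

```python
from math import comb

def parity_failure_prob(n, p):
    """P(odd number of errors among n bits), errors i.i.d. with rate p."""
    return 0.5 * (1.0 - (1.0 - 2.0 * p) ** n)

def prob_more_than_k_errors(n, p, k):
    """P(more than k errors among n bits) under the same i.i.d. assumption."""
    at_most_k = sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k + 1))
    return 1.0 - at_most_k
```

    For a single bit the parity failure probability reduces to p itself, and
    it tends to 1/2 as the block length grows.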
