Research about attitudes and opinions is central to social science and relies
on two common methodological approaches: surveys and interviews. While surveys
enable the quantification of large amounts of information quickly and at a
reasonable cost, they are routinely criticized for being "top-down" and rigid.
In contrast, interviews allow unanticipated information to "bubble up" directly
from respondents, but are slow, expensive, and difficult to quantify.
This paper introduces a popular dimension reduction method, sliced inverse
regression (SIR), into multivariate statistical process monitoring. It provides an
extension of SIR for the single-index model by adopting the idea from partial
least squares (PLS). Our partial sliced inverse regression (PSIR) method has
the merit of incorporating information from both predictors (x) and responses
(y), and it is capable of handling large, nonlinear, or "n<p" datasets.
Equivalence testing is of emerging importance in genomics studies but has
hitherto been little studied in this context. In this paper, we define the
notion of equivalence of gene expression and determine a `strength of evidence'
measure for gene equivalence. It is common practice in genome-wide studies to
rank genes according to observed gene-specific P-values or adjusted P-values,
which are assumed to measure the strength of evidence against the null
hypothesis of no differential gene expression.
Propensity score matching is a tool for causal inference in non-randomized
studies that allows for conditioning on large sets of covariates. The use of
propensity scores in the social sciences is currently experiencing a tremendous
increase; however, it is still far from being a commonly used tool. One impediment to
more widespread use of propensity score methods is the reliance on specialized
software, because many social scientists still use SPSS as their main analysis
tool. The current paper presents an implementation of various propensity score
matching methods in SPSS.
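
As a rough illustration of what a matching step does, the sketch below performs greedy 1:1 nearest-neighbor matching without replacement within a caliper. It is written in Python rather than SPSS and is not the implementation described in the paper. It assumes the propensity scores have already been estimated; the function name, the caliper value, and the scores are all hypothetical.

```python
def nearest_neighbor_match(treated_scores, control_scores, caliper=0.1):
    """Greedy 1:1 matching without replacement on the propensity score.

    Returns a list of (treated_index, control_index) pairs. Treated units
    with no unused control within the caliper stay unmatched.
    """
    available = set(range(len(control_scores)))
    pairs = []
    for t_index, t_score in enumerate(treated_scores):
        if not available:
            break
        # Closest unused control; ties are broken by the smaller index.
        c_index = min(available, key=lambda j: (abs(control_scores[j] - t_score), j))
        if abs(control_scores[c_index] - t_score) <= caliper:
            pairs.append((t_index, c_index))
            available.remove(c_index)
    return pairs


# Made-up propensity scores for three treated and four control units.
pairs = nearest_neighbor_match([0.80, 0.30, 0.55], [0.28, 0.50, 0.82, 0.95])
```

Greedy matching depends on the order of the treated units; optimal matching and matching with replacement are common alternatives that this sketch does not cover.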
We introduce the optimal obstacle placement with disambiguations problem
wherein the goal is to place true obstacles in an environment cluttered with
false obstacles so as to maximize the total traversal length of a navigating
agent (NAVA). Prior to the traversal, NAVA is given location information and
probabilistic estimates of each disk-shaped hindrance (hereinafter referred to
as disk) being a true obstacle. The NAVA can disambiguate a disk's status only
when situated on its boundary. There exists an obstacle placing agent (OPA)
that locates obstacles prior to NAVA's traversal.
The relationship between short-term exposure to air pollution and mortality
or morbidity has been the subject of much recent research, in which the
standard method of analysis uses Poisson linear or additive models.
Array comparative genomic hybridization (CGH) is a high-resolution technique
to assess DNA copy number variation. Identifying breakpoints where copy number
changes will enhance the understanding of the pathogenesis of human diseases,
such as cancers. However, the biological variation and experimental errors
contained in array CGH data may lead to false positive identification of
breakpoints.
Fossil fuels are major sources of energy, and have several advantages over
other primary energy sources. Without extensive dependence on fossil fuels, it
is questionable whether our economic prosperity can continue or not. This paper
analyzes cointegration and causality between fossil fuel consumption and
economic growth in the world over the period 1971--2008. The estimation results
indicate that fossil fuel consumption and GDP are cointegrated and there exists
long-run unidirectional causality from fossil fuel consumption to GDP.
Familial Searching is the process of searching in a DNA database for
relatives of a given individual. It is well known that in order to evaluate the
genetic evidence in favour of a certain given form of relatedness between two
individuals, one needs to calculate the appropriate likelihood ratio, which is
in this context called a Kinship Index. Suppose that the database contains, for
a given type of relative, at most one related individual.
We present a framework for sequential decision making in problems described
by graphical models. The setting is given by dependent discrete random
variables with associated costs or revenues. In our examples, the dependent
variables are the potential outcomes (oil, gas or dry) when drilling a
petroleum well. The goal is to develop an optimal selection strategy that
incorporates a chosen utility function within an approximated dynamic
programming scheme.
The optimal and robust design of structures has gained much attention in the
past ten years due to the ever increasing need for manufacturers to build
robust systems at the lowest cost. Reliability-based design optimization (RBDO)
allows the analyst to minimize some cost function while ensuring some minimal
performances cast as admissible probabilities of failure for a set of
performance functions. In order to address real-world problems in which the
performance is assessed through computational models (e.g.
We consider the problem of comparing two diagnostic tests based on a sample
of paired test results without true state determinations, in cases where the
second test can reasonably be assumed to be at least as specific as the first.
For such cases, we provide two informative confidence bounds: A lower one for
the prevalence times the sensitivity gain of the second test with respect to
the first, and an upper one for the sensitivity of the first test. Neither
conditional independence of the two tests nor perfectness of any of them needs
to be assumed.
Sparsity in the eigenvectors of signal covariance matrices is exploited in
this paper for compression and denoising. Dimensionality reduction (DR) and
quantization modules present in many practical compression schemes, such as
transform codecs, are designed to capitalize on this form of sparsity and
achieve improved reconstruction performance compared to existing
sparsity-agnostic codecs.
Recently, Li et al. (Bioinformatics 27(19), 2686-91, 2011) proposed a method,
called Differential Equation-based Local Dynamic Bayesian Network (DELDBN), for
reverse engineering gene regulatory networks from time-course data. We commend
the authors for an interesting paper that draws attention to the close
relationship between dynamic Bayesian networks (DBNs) and differential
equations (DEs). Their central claim is that modifying a DBN to model Euler
approximations to the gradient rather than expression levels themselves is
beneficial for network inference.
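
To make the phrase "Euler approximations to the gradient" concrete, here is a minimal sketch of the forward-difference computation on a single time course. It is not taken from DELDBN; the function name and the toy data are hypothetical.

```python
def euler_gradients(values, times):
    """Forward-difference (Euler) approximation of dx/dt between consecutive samples."""
    if len(values) != len(times):
        raise ValueError("values and times must have the same length")
    return [
        (values[i + 1] - values[i]) / (times[i + 1] - times[i])
        for i in range(len(values) - 1)
    ]


# Toy expression time course for one gene (made-up numbers, uneven sampling).
times = [0.0, 1.0, 2.0, 4.0]
expression = [1.0, 1.5, 2.5, 2.5]
gradients = euler_gradients(expression, times)
```

In a gradient-based DBN these differences, rather than the expression levels at the next time point, would serve as the response variable.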
We present a new model for the electricity spot price dynamics, which is able
to capture seasonality, low-frequency dynamics and the extreme spikes in the
market. Instead of the usual purely deterministic trend we introduce a
non-stationary independent increments process for the low-frequency dynamics,
and model the large fluctuations by a non-Gaussian stable CARMA process. The
model allows for analytic futures prices, and we apply these to model and
estimate the whole market consistently.
Parallel MRI is a fast imaging technique that enables the acquisition of
highly resolved images in space and/or in time. The performance of parallel
imaging strongly depends on the reconstruction algorithm, which can proceed
either in the original k-space (GRAPPA, SMASH) or in the image domain
(SENSE-like methods). To improve the performance of the widely used SENSE
algorithm, 2D- or slice-specific regularization in the wavelet domain has been
deeply investigated.
Many scholars have recently begun to dispute the assumed link between
individual wellbeing and economic conditions and the extent to which the latter
matters (Easterlin, 1995; Stevenson and Wolfers, 2008; Tella and MacCulloch,
2008). This dilemma is empirically demonstrated in the Latin America Public
Opinion Project (LAPOP, 2011), which surveyed North and Latin America in terms
of perceived life satisfaction. Higher measures found in the less developed
countries of Brazil, Costa Rica, and Panama than in North America pose an
intriguing quandary to traditional economic theory.
Motivation: NMR spectra are widely used in metabolomics to obtain metabolite
profiles in complex biological mixtures. Common methods used to assign and
estimate concentrations of metabolites involve either expert manual peak
fitting or extra pre-processing steps, such as peak alignment and binning. Peak
fitting is very time consuming and is subject to human error. Conversely,
alignment and binning can introduce artifacts and limit immediate biological
interpretation of models.
Mixtures of linear mixed models (MLMMs) are useful for clustering grouped
data in applications such as gene expression time course experiments. These
models can be estimated by likelihood maximization through the EM algorithm and
the optimal number of components determined by comparing different mixture
models using penalized log-likelihood criteria such as BIC. In this paper, we
propose fitting MLMMs with variational methods which can perform parameter
estimation and model selection simultaneously.
Spatial capture-recapture (SCR) methods represent a major advance over
traditional capture-recapture methods because they yield explicit estimates of
animal density instead of population size within an unknown area, and they
account for heterogeneity in capture probability arising from the juxtaposition
of individuals and sample locations. However, the requirement that all
individuals can be uniquely identified excludes their use in many contexts.
Joinpoint regression is used to determine the number of segments needed to
adequately explain the relationship between two variables. This methodology can
be widely applied to real problems, but we focus on epidemiological data, the
main goal being to uncover changes in the mortality time trend of a specific
disease under study. Traditionally, approaches to joinpoint regression have paid
little or no attention to the quantification of uncertainty in the estimation
of the number of change-points. In this context, we found a satisfactory way to
handle the problem within Bayesian methodology.
In various situations in the insurance industry, in finance, in epidemiology,
etc., one needs to represent the joint evolution of the number of occurrences
of an event. In this paper, we present a multivariate integer-valued
autoregressive (MINAR) model, derive its properties and apply the model to
earthquake occurrences across various pairs of tectonic plates. The model is an
extension of Pedelis & Karlis (2011) where cross autocorrelation (spatial
contagion in a seismic context) is considered.
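
For readers unfamiliar with integer-valued autoregression, the sketch below simulates the standard univariate INAR(1) recursion, in which the previous count is thinned binomially and a Poisson innovation is added. This is a simplification for illustration only: it is not the multivariate MINAR model of the paper, and the parameter values and function names are hypothetical.

```python
import math
import random


def poisson(rng, lam):
    """Draw a Poisson(lam) variate with Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def binomial_thinning(rng, count, alpha):
    """Each of `count` events survives independently with probability alpha."""
    return sum(1 for _ in range(count) if rng.random() < alpha)


def simulate_inar1(n_steps, alpha, lam, x0=0, seed=0):
    """Simulate X_t = (alpha thinned X_{t-1}) + Poisson(lam) innovation."""
    rng = random.Random(seed)
    path = [x0]
    for _ in range(n_steps):
        path.append(binomial_thinning(rng, path[-1], alpha) + poisson(rng, lam))
    return path


path = simulate_inar1(n_steps=200, alpha=0.5, lam=2.0)
```

With these settings the stationary mean is lam / (1 - alpha) = 4, so the simulated counts should fluctuate around that level.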
This paper proposes confidence regions for the identified set in conditional
moment inequality models using Kolmogorov-Smirnov statistics with a truncated
inverse variance weighting with increasing truncation points. The new weighting
differs from those proposed in the literature in two important ways. First,
confidence regions based on KS tests with the weighting function I propose
converge to the identified set at a faster rate than existing procedures based
on bounded weight functions in a broad class of models.
This paper derives the rate of convergence and asymptotic distribution for a
class of Kolmogorov-Smirnov style test statistics for conditional moment
inequality models for parameters on the boundary of the identified set under
general conditions. In contrast to other moment inequality settings, the rate
of convergence is faster than root-$n$, and the asymptotic distribution depends
entirely on nonbinding moments. The results require the development of new
techniques that draw a connection between moment selection, irregular
identification, bandwidth selection and nonstandard M-estimation.
Network inference approaches are now widely used in biological applications
to probe regulatory relationships between molecular components such as genes or
proteins. Many methods have been proposed for this setting, but the connections
and differences between their statistical formulations have received less
attention. In this paper, we show how a broad class of statistical network
inference methods, including a number of existing approaches, can be described
in terms of variable selection for the linear model.
Optical systems which measure independent random projections of a scene
according to compressed sensing (CS) theory face a myriad of practical
challenges related to the size of the physical platform, photon efficiency, the
need for high temporal resolution, and fast reconstruction in video settings.
This paper describes a coded aperture and keyed exposure approach to
compressive measurement in optical systems.
We propose a semiparametric model for autonomous nonlinear dynamical systems
and devise an estimation procedure for model fitting. This model incorporates
subject-specific effects and can be viewed as a nonlinear semiparametric mixed
effects model. We also propose a computationally efficient model selection
procedure. We show by simulation studies that the proposed estimation as well
as model selection procedures can efficiently handle sparse and noisy
measurements.
The spatial modeling of extreme snow is important for adequate risk
management in Alpine and high altitude countries. A natural approach to such
modeling is through the theory of max-stable processes, an infinite-dimensional
extension of multivariate extreme value theory. In this paper we describe the
application of such processes in modeling the spatial dependence of extreme
snow depth in Switzerland, based on data for the winters 1966--2008 at 101
stations.
Monte Carlo approaches have recently been proposed to quantify connectivity
in neuronal networks. The key problem is to sample from the conditional
distribution of a single neuronal spike train, given the activity of the other
neurons in the network. Dependencies between neurons are usually relatively
weak; however, temporal dependencies within the spike train of a single neuron
are typically strong. In this paper we develop several specialized
Metropolis--Hastings samplers which take advantage of this dependency
structure.
Landscape classification of the well-known biodiversity hotspot, Western
Ghats (mountains), on the west coast of India, is an important part of a
world-wide program of monitoring biodiversity. To this end, a massive
vegetation data set, consisting of 51,834 4-variate observations has been
clustered into different landscapes by Nagendra and Gadgil [Current Sci. 75
(1998) 264--271]. But a study of such importance may be affected by
nonuniqueness of cluster analysis and the lack of methods for quantifying
uncertainty of the clusterings obtained.
Research examining the equity of service accessibility has emerged as
economic and social equity advocates recognized that where people live
influences their opportunities for economic development, access to quality
health care and political participation. In this research paper, service
accessibility equity is concerned with where and when services have been and
are accessed by different groups of people, identified by location or
underlying socioeconomic variables.
In order to find previously unknown subgroups in biomedical data and generate
testable hypotheses, visually guided exploratory analysis can be of tremendous
importance. In this paper we propose a new dissimilarity measure that can be
used within the Multidimensional Scaling framework to obtain a joint
low-dimensional representation of both the samples and variables of a
multivariate data set, thereby providing an alternative to conventional
biplots.
Covariance matrix estimates are an essential part of many signal processing
algorithms, and are often used to determine a low-dimensional principal
subspace via their spectral decomposition. However, exact eigenanalysis is
computationally intractable for sufficiently high-dimensional matrices, and in
the case of small sample sizes, sample eigenvalues and eigenvectors are known
to be poor estimators of their true counterparts. To address these issues, we
propose a covariance estimator that is computationally efficient while also
performing shrinkage on the sample eigenvalues.
There are very significant changes taking place in the university sector and
in related higher education institutes in many parts of the world. In this work
we look at financial data from 2010 and 2011 from the UK higher education
sector. Situating ourselves to begin with in the context of teaching versus
research in universities, we look at the data in order to explore the new
divergence between the broad agendas of teaching and research in universities.
The innovation agenda has become at least equal to the research and teaching
objectives of universities.
Comonotonicity is an extreme case of dependence between random
variables. This article considers an extension of the single-life model under
multiple dependent decrement causes to the case of comonotonic group life.
Motivated by the problem of identifying correlations between genes or
features of two related biological systems, we propose a model of \emph{feature
selection} in which only a subset of the predictors $X_t$ are dependent on the
multidimensional variate $Y$, and the remainder of the predictors constitute a
"noise set" $X_u$ independent of $Y$.
Interactions among multiple genes across the genome may contribute to the
risks of many complex human diseases. Whole-genome single nucleotide
polymorphism (SNP) data collected for many thousands of SNP markers from
thousands of individuals under the case--control design promise to shed light
on our understanding of such interactions. However, nearby SNPs are highly
correlated due to linkage disequilibrium (LD) and the number of possible
interactions is too large for exhaustive evaluation.
Acute respiratory diseases are transmitted over networks of social contacts.
Large-scale simulation models are used to predict epidemic dynamics and
evaluate the impact of various interventions, but the contact behavior in these
models is based on simplistic and strong assumptions which are not informed by
survey data. These assumptions are also used for estimating transmission
measures such as the basic reproductive number and secondary attack rates.
Development of methodology to infer contact networks from survey data could
improve these models and estimation methods.
Biomedical studies have a common interest in assessing relationships between
multiple related health outcomes and high-dimensional predictors. For example,
in reproductive epidemiology, one may collect pregnancy outcomes such as length
of gestation and birth weight and predictors such as single nucleotide
polymorphisms in multiple candidate genes and environmental exposures. In such
settings, there is a need for simple yet flexible methods for selecting true
predictors of adverse health responses from a high-dimensional set of candidate
predictors.
Admixture mapping is a popular tool to identify regions of the genome
associated with traits in a recently admixed population. Existing methods have
been developed primarily for identification of a single locus influencing a
dichotomous trait within a case-control study design. We propose a generalized
admixture mapping (GLEAM) approach, a flexible and powerful regression method
for both quantitative and qualitative traits, which is able to test for
association between the trait and local ancestries in multiple loci
simultaneously and adjust for covariates.
A genetic association study is an essential step in discovering genetic factors
that are associated with a complex trait of interest. In this paper we present
a novel generalized quasi-likelihood score (GQLS) test that is suitable for a
study with either a quantitative trait or a binary trait. We use a logistic
regression model to link the phenotypic value of the trait to the distribution
of allelic frequencies. In our model, the allele frequencies are treated as a
response and the trait is treated as a covariate that allows us to leave the
distribution of the trait values unspecified.
In biomedical studies it is of substantial interest to develop risk
prediction scores using high-dimensional data such as gene expression data for
clinical endpoints that are subject to censoring. In the presence of
well-established clinical risk factors, investigators often prefer a procedure
that also adjusts for these clinical variables. While accelerated failure time
(AFT) models are a useful tool for the analysis of censored outcome data, they
assume that covariate effects on the logarithm of time-to-event are linear,
which is often unrealistic in practice.
The vast amount of biological knowledge accumulated over the years has
allowed researchers to identify various biochemical interactions and define
different families of pathways. There is an increased interest in identifying
pathways and pathway elements involved in particular biological processes. Drug
discovery efforts, for example, are focused on identifying biomarkers as well
as pathways related to a disease. We propose a Bayesian model that addresses
this question by incorporating information on pathways and gene networks in the
analysis of DNA microarray data.
We propose a new framework for cooperative spectrum sensing in cognitive
radio networks that is based on a novel class of non-uniform samplers, called
the event-triggered samplers, and sequential detection. In the proposed scheme,
each secondary user computes its local sensing decision statistic based on its
own channel output; and whenever this decision statistic crosses certain
predefined threshold values, the secondary user will send one (or several) bits
of information to the fusion center.
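
The threshold-crossing idea can be sketched in a few lines. The code below is an illustrative simplification, not the scheme proposed in the paper: it assumes a single symmetric threshold `delta`, one bit per crossing, and a local statistic that accumulates log-likelihood-ratio increments. All names and numbers are hypothetical.

```python
def event_triggered_bits(llr_increments, delta):
    """Return (time_index, bit) pairs for one secondary user.

    A bit is sent only when the statistic has moved by at least `delta`
    since the last transmission: 1 for an upward crossing, 0 for a downward one.
    """
    messages = []
    accumulated = 0.0  # change in the local statistic since the last transmission
    for t, increment in enumerate(llr_increments):
        accumulated += increment
        if accumulated >= delta:
            messages.append((t, 1))
            accumulated = 0.0
        elif accumulated <= -delta:
            messages.append((t, 0))
            accumulated = 0.0
    return messages


# Made-up log-likelihood-ratio increments observed at six sampling instants.
messages = event_triggered_bits([0.4, 0.4, 0.4, -0.9, -0.3, 0.1], delta=1.0)
```

Here only two of the six sampling instants trigger a transmission, which is the bandwidth saving that motivates non-uniform sampling.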
In this paper we introduce a novel method to conduct inference with models
defined through a continuous-time Markov process, and we apply these results to
a classical stochastic SIR model as a case study. Using the inverse-size
expansion of van Kampen we obtain approximations for first and second moments
for the state variables. These approximate moments are in turn matched to the
moments of an inputed generic discrete distribution aimed at generating an
approximate likelihood that is valid for both low-count and high-count data.
Signal averaging is the process that consists in computing a mean shape from
a set of noisy signals. In the presence of geometric variability in time in the
data, the usual Euclidean mean of the raw data yields a mean pattern that does
not reflect the typical shape of the observed signals. In this setting, it is
necessary to use alignment techniques for a precise synchronization of the
signals, and then to average the aligned data to obtain a consistent mean
shape.
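
The difference between naive averaging and align-then-average can be seen in a toy example. The sketch below assumes the only geometric variability is an integer circular time shift and aligns each signal to the first one by maximizing their inner product. This is a deliberate simplification of the alignment techniques the abstract refers to; the function names and data are hypothetical.

```python
def circular_shift(signal, shift):
    """Shift a signal to the right by `shift` samples, wrapping around."""
    n = len(signal)
    return [signal[(i - shift) % n] for i in range(n)]


def best_shift(signal, reference):
    """Integer shift that maximizes the inner product with the reference."""
    def score(shift):
        shifted = circular_shift(signal, shift)
        return sum(a * b for a, b in zip(shifted, reference))
    return max(range(len(signal)), key=score)


def aligned_mean(signals):
    """Align every signal to the first one, then average pointwise."""
    reference = signals[0]
    aligned = [circular_shift(s, best_shift(s, reference)) for s in signals]
    return [sum(column) / len(aligned) for column in zip(*aligned)]


# Three copies of the same bump, observed with different time shifts.
bump = [0, 0, 1, 3, 1, 0, 0, 0]
signals = [bump, circular_shift(bump, 2), circular_shift(bump, 5)]
naive = [sum(column) / len(signals) for column in zip(*signals)]
mean_shape = aligned_mean(signals)
```

The naive mean smears the bump across several positions and lowers its peak, whereas the aligned mean recovers the original shape.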
In this paper, we consider capture-recapture experiments with heterogeneous
catchability. In the setting we consider, the widespread Huggins-Alho estimator
is not very suitable and we introduce and study a new generalized
Horvitz-Thompson estimator. Our motivation is Respondent Driven Sampling (RDS),
a prime example of such a setting where the capture probability is dependent
on both the unknown population size and an observable covariate, the
network degree of an individual, due to peer recruitment.
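
As background, the classical Horvitz-Thompson idea estimates population size by summing the inverse capture probabilities of the individuals actually observed. The sketch below shows only that classical form and assumes the capture probabilities are known; it does not implement the generalized estimator of the paper, where the probabilities depend on the unknown population size. The function name and numbers are hypothetical.

```python
def horvitz_thompson_size(capture_probabilities):
    """Estimate population size as the sum of inverse capture probabilities
    over the individuals that were actually captured."""
    if any(not 0.0 < p <= 1.0 for p in capture_probabilities):
        raise ValueError("capture probabilities must lie in (0, 1]")
    return sum(1.0 / p for p in capture_probabilities)


# Hypothetical capture probabilities for four observed individuals.
estimate = horvitz_thompson_size([0.5, 0.5, 0.25, 0.25])
```

Each captured individual "stands in" for 1/p members of the population, so hard-to-catch individuals carry more weight.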
Motivated by a potential-outcomes perspective, the idea of principal
stratification has been widely recognized for its relevance in settings
susceptible to posttreatment selection bias such as randomized clinical trials
where treatment received can differ from treatment assigned. In one such
setting, we address subtleties involved in inference for causal effects when
using a key covariate to predict membership in latent principal strata.
We consider the problem of algorithmically recommending items to users on a
Yahoo! front page module. Our approach is based on a novel multilevel
hierarchical model that we refer to as a User Profile Model with Graphical
Lasso (UPG). The UPG provides a personalized recommendation to users by
simultaneously incorporating both user covariates and historical user
interactions with items in a model based way. In fact, we build a per-item
regression model based on a rich set of user covariates and estimate individual
user affinity to items by introducing a latent random vector for each user.
Acute infectious diseases are transmitted over networks of social contacts.
Epidemic models are used to predict the spread of emergent pathogens and
compare intervention strategies. Many of these models assume equal probability
of contact within mixing groups (homes, schools, etc.), but little work has
inferred the actual contact network, which may influence epidemic estimates. We
develop a penalized likelihood method to infer contact networks within
households, a key area for disease transmission.
This paper proposes a model of financial contagion that accounts for
explosive, mutually exciting shocks to market volatility. We fit the model
using country-level data during the European sovereign debt crisis, which has
its roots in the period 2008--2010, and was continuing to affect global markets
as of October 2011.
How does dynamic price information flow among Northern European electricity
spot prices and prices of major electricity generation fuel sources? We use
time series models combined with new advances in causal inference to answer
these questions. Applying our methods to weekly Nordic and German electricity
prices, and oil, gas and coal prices, with German wind power and Nordic water
reservoir levels as exogenous variables, we estimate a causal model for the
price dynamics, both for contemporaneous and lagged relationships.
In this paper, we study the target tracking problem in wireless sensor
networks (WSNs) using quantized sensor measurements under limited bandwidth
availability. At each time step of tracking, the available bandwidth $R$ needs
to be distributed among the $N$ sensors in the WSN for the next time step. The
optimal solution for the bandwidth allocation problem can be obtained by using
a combinatorial search which may become computationally prohibitive for large
$N$ and $R$.
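
To see why the combinatorial search becomes prohibitive, note that distributing $R$ bits among $N$ sensors admits $\binom{R+N-1}{N-1}$ allocations. The sketch below enumerates them and picks the best one under a placeholder utility. The utility function is purely hypothetical (the paper's actual performance criterion is not given in this abstract), as are the function names.

```python
from itertools import combinations
from math import comb


def allocations(total_bits, n_sensors):
    """Yield every way to split total_bits among n_sensors (stars and bars)."""
    slots = total_bits + n_sensors - 1
    for bars in combinations(range(slots), n_sensors - 1):
        previous = -1
        parts = []
        for bar in bars + (slots,):
            parts.append(bar - previous - 1)
            previous = bar
        yield tuple(parts)


def exhaustive_search(total_bits, n_sensors, utility):
    """Return the allocation that maximizes the given utility."""
    return max(allocations(total_bits, n_sensors), key=utility)


def toy_utility(allocation):
    """Hypothetical utility with diminishing returns per extra bit."""
    return sum(1.0 - 2.0 ** (-bits) for bits in allocation)


best = exhaustive_search(3, 3, toy_utility)
n_candidates = comb(3 + 3 - 1, 3 - 1)
```

Even for moderate values such as $N = 20$ and $R = 40$ the number of candidates exceeds $10^{15}$, which is what motivates approximate allocation methods.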
Reproducibility is essential to reliable scientific discovery in
high-throughput experiments. In this work we propose a unified approach to
measure the reproducibility of findings identified from replicate experiments
and identify putative discoveries using reproducibility. Unlike the usual
scalar measures of reproducibility, our approach creates a curve, which
quantitatively assesses when the findings are no longer consistent across
replicates.
Ambient concentrations of many pollutants are associated with emissions due
to human activity, such as road transport and other combustion sources. In this
paper we consider air pollution as a multi-level phenomenon within a Bayesian
hierarchical model. We examine different scales of variation in pollution
concentrations ranging from large scale transboundary effects to more localised
effects which are directly related to human activity.
The main purpose of this paper is to provide an asymptotically optimal test.
The proposed statistic is of Neyman-Pearson-type when the parameters are
estimated with a particular kind of estimators. It is shown that the proposed
estimators enable us to achieve this end. Two particular cases, AR(1) and ARCH
models, were studied and the asymptotic power function was derived.
Demand functions for goods are generally cyclical in nature with
characteristics such as trend or stochasticity. Most existing demand
forecasting techniques in the literature are designed to manage and forecast this
type of demand function. However, if the demand function is lumpy in nature,
then the general demand forecasting techniques may fail given the unusual
characteristics of the function.
We use a minimum requirement approach to derive the number of jobs of
proximity services per inhabitant in a municipality from its number of
inhabitants. We apply this approach to four different subsets of
municipalities, each defined by a specific range of distance to the
municipality where the inhabitants go most frequently to get services
(called MFM). For each subset, we get satisfactory results in regression.
We present a consensus-based distributed particle filter (PF) for wireless
sensor networks. Each sensor runs a local PF to compute a global state estimate
that takes into account the measurements of all sensors. The local PFs use the
joint (all-sensors) likelihood function, which is calculated in a distributed
way by a novel generalization of the likelihood consensus scheme. A performance
improvement (or a reduction of the required number of particles) is achieved by
a novel distributed, consensus-based method for adapting the proposal densities
of the local PFs.
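
Consensus schemes of this kind build on iterative neighbor-to-neighbor averaging. The sketch below shows plain average consensus on a small ring network, in which every node converges to the network-wide mean using only local exchanges. It illustrates the underlying primitive only and is not the likelihood consensus scheme of the paper; the topology, step size, and values are hypothetical.

```python
def consensus_step(values, neighbors, step_size):
    """One synchronous update: each node moves toward its neighbors' values."""
    return [
        v + step_size * sum(values[j] - v for j in neighbors[i])
        for i, v in enumerate(values)
    ]


def average_consensus(values, neighbors, step_size=0.25, n_iterations=50):
    """Repeat the local update; the step size must be small enough for convergence."""
    for _ in range(n_iterations):
        values = consensus_step(values, neighbors, step_size)
    return values


# Four sensors on a ring; each holds one made-up local quantity.
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
local_values = [1.0, 2.0, 3.0, 6.0]
estimates = average_consensus(local_values, ring)
```

Because the update preserves the sum of the values, every node ends up near the mean 3.0, from which a global sum can be recovered if the number of sensors is known.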
It is widely recognized nowadays that complex diseases are caused by, among
other things, multiple genetic factors. The recent advent of genome-wide
association (GWA) studies has triggered a wave of research aimed at discovering
genetic factors underlying common complex diseases. While the number of
reported susceptible genetic variants is increasing steadily, the application
of such findings to disease prognosis for the general population is still
unclear, and there are doubts about whether the size of the contribution by
such factors is significant.
Presented is an evolutionary model of consumer non-durable markets, which is
an extension of a previously published paper on consumer durables. The model
suggests that the repurchase process is governed by preferential growth.
Applying statistical methods it can be shown that in a competitive market the
mean price declines according to an exponential law towards a natural price,
while the corresponding price distribution is approximately given by a Laplace
distribution for independent price decisions of the manufacturers.
Recently, the concept of tail dependence has been discussed in financial
applications related to market or credit risk. The multivariate extreme value
theory is a proper tool to measure and model dependence, for example, of large
loss events. A common measure of tail dependence is given by the so-called
tail-dependence coefficient. We present a simple estimator of the latter that
avoids the drawbacks of the estimation procedure that has been used so far. We
prove strong consistency and asymptotic normality and analyze the finite sample
behavior through simulation.
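
For orientation, the upper tail-dependence coefficient is the limit of P(U > u | V > u) as u approaches 1, and a common nonparametric estimate counts how often both variables fall among their k largest observations. The sketch below implements that rank-based estimate; it is not the estimator proposed in the paper, and the choice of k and the function names are hypothetical.

```python
def ranks(values):
    """Ranks 1..n (ties broken by order of appearance)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def upper_tail_dependence(x, y, k):
    """Fraction of the k largest x-observations that are also among the k largest y-observations."""
    n = len(x)
    rx, ry = ranks(x), ranks(y)
    joint = sum(1 for a, b in zip(rx, ry) if a > n - k and b > n - k)
    return joint / k


# Perfectly dependent and perfectly opposed toy samples.
x = list(range(100))
same = upper_tail_dependence(x, x, k=10)
opposite = upper_tail_dependence(x, x[::-1], k=10)
```

In practice the estimate is sensitive to k, which trades bias (large k) against variance (small k).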
Association networks represent systems of interacting elements, where a link
between two different elements indicates a sufficient level of similarity
between element attributes. While in reality relational ties between elements
can be expected to be based on similarity across multiple attributes, the vast
majority of work to date on association networks involves ties defined with
respect to only a single attribute.
In this paper, we provide R-estimators of the location of a rotationally
symmetric distribution on the unit sphere of $R^k$. In order to do so we first
prove the local asymptotic normality property of a sequence of rotationally
symmetric models; this is a nonstandard result due to the curved nature of the
unit sphere. We then construct our estimators by adapting the Le Cam one-step
methodology to spherical statistics and ranks. We show that they are
asymptotically normal under any rotationally symmetric distribution and achieve
the efficiency bound under a specific density.
In this paper, we study a non-linear model used to estimate and forecast the
electricity load, which usually requires four or more years' worth of data to
avoid any overfitting phenomenon. We first propose a non-informative prior to
be used when the number of observations is large enough. When the observations
are too few, we propose a hierarchical prior to include information coming from
another larger, similar sample. The posterior densities associated with these
two priors are derived and an MCMC algorithm is provided in each case.
Topic modeling is a mixed-membership framework for dimension reduction that
is widely applied in text-mining, among other areas. This article describes an
algorithm for posterior maximization under such models, identifying
computational and conceptual gains that come from working with an alternative
model parameterization. We then show that fitted parameters can be used as the
basis for a novel approach to marginal likelihood estimation, founded on
block-diagonal approximation to the information matrix, that facilitates
choosing the number of latent topics.
We consider pricing weather derivatives for use as protection against weather
extremes. The method described utilizes results from spatial statistics and
extreme value theory to first model extremes in the weather as a max-stable
process, and then use these models to simulate payments for a general
collection of weather derivatives. These simulations capture the spatial
dependence of payments. Incorporating results from catastrophe ratemaking, we
show how this method can be used to compute risk loads and premiums for weather
derivatives which are renewal-additive.
Sensor systems typically operate under resource constraints that prevent the
simultaneous use of all resources all of the time. Sensor management becomes
relevant when the sensing system has the capability of actively managing these
resources; i.e., changing its operating configuration during deployment in
reaction to previous measurements. Examples of systems in which sensor
management is currently used or is likely to be used in the near future include
autonomous robots, surveillance and reconnaissance networks, and waveform-agile
radars.
Gene regulatory networks are collections of genes that interact with one
another and with other substances in the cell. By measuring gene expression over
time using high-throughput technologies, it may be possible to reverse
engineer, or infer, the structure of the gene network involved in a particular
cellular process.
We consider distributed state estimation in a wireless sensor network without
a fusion center. Each sensor performs a global estimation task - based on the
past and current measurements of all sensors - using only local processing and
local communications with its neighbors. In this task, the joint (all-sensors)
likelihood function (JLF) plays a central role as it epitomizes the
measurements of all sensors. We propose a distributed method for computing an
approximation of the JLF by means of consensus algorithms.
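To make the role of consensus concrete, the following is a minimal sketch of
plain average consensus, in which every node ends up with the network-wide
average using only neighbor-to-neighbor exchanges. It is not the JLF
approximation itself; the function name, the step-size rule, and the ring
topology are assumptions chosen for illustration.

```python
def average_consensus(values, neighbors, step=None, iterations=200):
    """Run synchronous average consensus on an undirected, connected graph.

    values:    initial local value held by each node.
    neighbors: dict mapping node index -> list of neighbor indices.
    step:      consensus step size; defaults to 1 / (max degree + 1),
               a conservative choice that keeps the iteration stable.
    """
    n = len(values)
    if step is None:
        max_degree = max(len(neighbors[i]) for i in range(n))
        step = 1.0 / (max_degree + 1)
    x = list(values)
    for _ in range(iterations):
        # Each node moves toward its neighbors using only local information.
        x = [
            x[i] + step * sum(x[j] - x[i] for j in neighbors[i])
            for i in range(n)
        ]
    return x


# Five sensors on a ring, each starting from its own local value.
local = [1.0, 4.0, 2.0, 8.0, 5.0]
ring = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}
estimates = average_consensus(local, ring)
```

Because the update is symmetric, the sum of the node values is preserved at
every iteration, so all nodes converge to the average of the initial values.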
In radar systems, tracking targets in low signal-to-noise ratio (SNR)
environments is a very important task. There are some algorithms designed for
multitarget tracking. Their performances, however, are not satisfactory in low
SNR environments. Track-before-detect (TBD) algorithms have been developed as a
class of improved methods for tracking in low SNR environments. However,
multitarget TBD is still an open issue. In this paper, multitarget TBD
measurements are modeled, and a highly efficient filter in the framework of
finite set statistics (FISST) is designed.
This paper is focused on solving the narrowband direction of arrival
estimation problem from a sparse signal reconstruction perspective. Existing
sparsity-based methods have shown advantages over conventional ones but exhibit
limitations in practical situations where the true directions are not in the
sampling grid. A so-called off-grid model is broached to reduce the modeling
error caused by the off-grid directions.
This paper constructs dynamical models and estimation algorithms for the
concentration of target molecules in a fluid flow using an array of novel
biosensors. Each biosensor is constructed out of protein molecules embedded in
a synthetic cell membrane. The concentration evolves according to an
advection-diffusion partial differential equation which is coupled with
chemical reaction equations on the biosensor surface.
Sentiment analysis is a new area of text analytics that focuses on the
analysis and understanding of the emotions expressed in text patterns. This new
form of analysis has been widely adopted in customer relation management,
especially in the context of complaint management. With the increasing level of
interest in this technology, more and more companies are adopting it and using
it to champion their marketing efforts. However, sentiment analysis using
Twitter has remained extremely difficult to manage due to sampling bias.
Covariance is used as an inner product on a formal vector space built on n
random variables to define measures of correlation $M_d$ across a set of vectors
in a d-dimensional space. For d = 1, one has the diameter; for d = 2, one has
an area. These concepts are directly applied to correlation studies in climate
science.
We show how the newly developed dynamic tree model can support variable
selection and a sensitivity analysis of inputs, two tasks usually requiring
disparate model structure. To this end, we adapt methods used in conjunction
with static tree models and Gaussian process models (GPs).
Chaos and oscillations continue to capture the interest of both the
scientific and public domains. Yet despite the importance of these qualitative
features, most attempts at constructing mathematical models of such phenomena
have taken an indirect, quantitative approach, e.g. by fitting models to a
finite number of data-points. Here we develop a qualitative inference framework
that allows us to both reverse engineer and design systems exhibiting these and
other dynamical behaviours by directly specifying the desired characteristics
of the underlying dynamical attractor.
Image data are increasingly encountered and are of growing importance in many
areas of science. Much of these data are quantitative image data, which are
characterized by intensities that represent some measurement of interest in the
scanned images. The data typically consist of multiple images on the same
domain and the goal of the research is to combine the quantitative information
across images to make inference about populations or interventions.
We develop a new statistical method for estimating functional connectivity
between neurophysiological signals represented by a multivariate time series.
We use partial coherence as the measure of functional connectivity. Partial
coherence identifies the frequency bands that drive the direct linear
association between any pair of channels. To estimate partial coherence, one
would first need an estimate of the spectral density matrix of the multivariate
time series.
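For reference, one standard definition (assumed here, since the text above does
not state which form is used) expresses partial coherence through the inverse
of the spectral density matrix. Writing $f(\omega)$ for the spectral density
matrix and $g(\omega)$ for its inverse, the partial coherence between channels
$j$ and $k$ at frequency $\omega$ is:

```latex
\[
  \rho^2_{jk}(\omega)
    \;=\; \frac{\lvert g_{jk}(\omega) \rvert^{2}}
               {g_{jj}(\omega)\, g_{kk}(\omega)},
  \qquad g(\omega) = f(\omega)^{-1}.
\]
```

This makes explicit why an estimate of the spectral density matrix, and one
that can be inverted stably, is needed first.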
Global expression analyses using microarray technologies are becoming more
common in genomic research; therefore, new statistical challenges associated
with combining information from multiple studies must be addressed. In this
paper we will describe our proposal for an adaptively weighted (AW) statistic
to combine multiple genomic studies for detecting differentially expressed
genes. We will also present our results from comparisons of our proposed AW
statistic to Fisher's equally weighted (EW), Tippett's minimum $p$-value (minP)
and Pearson's (PR) statistics.
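As a point of comparison, two of the classical baselines named above can be
sketched in a few lines. The code assumes independent studies with $p$-values
strictly between 0 and 1; the function names are illustrative, and the AW
statistic itself is not implemented because its definition is not given here.

```python
import math


def fisher_combined(p_values):
    """Fisher's equally weighted statistic, -2 * sum(log p_i), and its p-value.

    Under the null with independent studies the statistic is chi-square with
    2k degrees of freedom, whose upper tail has a closed form for even
    degrees of freedom.
    """
    k = len(p_values)
    stat = -2.0 * sum(math.log(p) for p in p_values)
    half = stat / 2.0
    tail = math.exp(-half) * sum(
        half ** j / math.factorial(j) for j in range(k)
    )
    return stat, tail


def tippett_min_p(p_values):
    """Tippett's minimum p-value statistic and its p-value under independence."""
    k = len(p_values)
    stat = min(p_values)
    return stat, 1.0 - (1.0 - stat) ** k


study_p_values = [0.01, 0.04, 0.20]
fisher_stat, fisher_p = fisher_combined(study_p_values)
tippett_stat, tippett_p = tippett_min_p(study_p_values)
```

Fisher's statistic rewards consistent moderate evidence across studies, while
the minimum $p$-value reacts to a single strong study.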
Given a set of aligned sequences of independent noisy observations, we are
concerned with detecting intervals where the mean values of the observations
change simultaneously in a subset of the sequences. The intervals of changed
means are typically short relative to the length of the sequences, the subset
where the change occurs, the "carriers," can be relatively small, and the sizes
of the changes can vary from one sequence to another. This problem is motivated
by the scientific problem of detecting inherited copy number variants in
aligned DNA samples.
We develop a Bayesian statistical model and estimation methodology based on
Forward Projection Adaptive Markov chain Monte Carlo in order to perform the
calibration of a high-dimensional non-linear system of Ordinary Differential
Equations representing an epidemic model for Human Papillomavirus types 6 and
11 (HPV-6, HPV-11). The model is compartmental and involves stratification by
age, gender and sexual activity-group.
In this paper we bring to bear some new tools from statistical learning on
the analysis of roll call data. We present a new data-driven model for roll
call voting that is geometric in nature. We construct the model by adapting the
"Partition Decoupling Method," an unsupervised learning technique originally
developed for the analysis of families of time series, to produce a multiscale
geometric description of a weighted network associated to a set of roll call
votes.
We develop a geometrical interpretation of ternary probabilistic forecasts in
which forecasts and observations are regarded as points inside a triangle.
Within the triangle, we define a continuous colour palette in which hue and
colour saturation are defined with reference to the observed climatology. In
contrast to current methods, forecast maps created with this colour scheme
convey all of the information present in each ternary forecast.
Unbiased, label-free proteomics is becoming a powerful technique for
measuring protein expression in almost any biological sample. The output of
these measurements after preprocessing is a collection of features (tens to
hundreds of thousands) and their associated intensities for each sample. Subsets
of features within the data are from the same peptide, subsets of peptides are
from the same protein, and subsets of proteins are in the same biological
pathways; therefore, there is the potential for very complex and informative
correlational structure inherent in these data.
Longitudinal imaging studies are essential to understanding the neural
development of neuropsychiatric disorders, substance use disorders, and the
normal brain. The main objective of this paper is to develop a two-stage
adjusted exponentially tilted empirical likelihood (TETEL) for the spatial
analysis of neuroimaging data from longitudinal studies. The TETEL method as a
frequentist approach allows us to efficiently analyze longitudinal data without
modeling temporal correlation and to classify different time-dependent
covariate types.
Spatial Independent Component Analysis (ICA) decomposes the time by space
functional MRI (fMRI) matrix into a set of 1-D basis time courses and their
associated 3-D spatial maps that are optimized for mutual independence. When
applied to resting state fMRI (rsfMRI), ICA produces several spatial
independent components (ICs) that seem to have biological relevance - the
so-called resting state networks (RSNs). The ICA problem is well posed when the
true data generating process follows a linear mixture of ICs model in terms of
the identifiability of the mixing matrix.
The increasing availability of longitudinal student achievement data has
heightened interest among researchers, educators and policy makers in using
these data to evaluate educational inputs, as well as for school and possibly
teacher accountability. Researchers have developed elaborate "value-added
models" of these longitudinal data to estimate the effects of educational
inputs (e.g., teachers or schools) on student achievement while using prior
achievement to adjust for nonrandom assignment of students to schools and
classes.
Confounding in a counterfactual model with three binary variables is discussed
in this paper. According to the effect between the control variable and the
covariate, we investigate three counterfactual models: the control variable is
independent of the covariate, the control variable has an effect on the
covariate, and the covariate affects the control variable.
This manuscript considers the following "graph classification" question:
given a collection of graphs and associated classes, how can one predict the
class of a newly observed graph? To address this question we propose a
statistical model for graph/class pairs. This model naturally leads to a set of
estimators to identify the class-conditional signal, or "signal subgraph,"
defined as the collection of edges that are probabilistically different between
the classes.
Variance estimation for estimators of state, county, and school district
quantities derived from the Census 2000 long form is discussed. The variance
estimator must account for (1) uncertainty due to imputation, and (2) raking to
census population controls.
Markowitz's celebrated mean--variance portfolio optimization theory assumes
that the means and covariances of the underlying asset returns are known.
We study the causal effect of winning an Oscar Award on an actor or actress's
survival. Does the increase in social rank from a performer winning an Oscar
increase the performer's life expectancy? Previous studies of this issue have
suffered from healthy performer survivor bias, that is, candidates who are
healthier will be able to act in more films and have more chance to win Oscar
Awards. To correct this bias, we adapt Robins' rank preserving structural
accelerated failure time model and $g$-estimation method.
Recent technological advances have made it possible to simultaneously measure
multiple protein activities at the single cell level. With such data collected
under different stimulatory or inhibitory conditions, it is possible to infer
the causal relationships among proteins from single cell interventional data.
In this article we propose a Bayesian hierarchical modeling framework to infer
the signaling pathway based on the posterior distributions of parameters in the
model.
Several approaches have been developed for forecasting mortality using
stochastic models. In particular, the Lee-Carter model has become widely used
and there have been various extensions and modifications proposed to attain a
broader interpretation and to capture the main features of the dynamics of the
mortality intensity.
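For orientation, the Lee-Carter model in its usual textbook form (stated here
as background, with standard notation that does not appear in the text above)
describes the log central death rate at age $x$ in year $t$ as:

```latex
\[
  \log m_{x,t} \;=\; a_x + b_x\, k_t + \varepsilon_{x,t},
  \qquad \sum_x b_x = 1, \quad \sum_t k_t = 0,
\]
```

where $a_x$ is the average age profile, $k_t$ is a time-varying mortality
index, $b_x$ measures the sensitivity of each age to that index, and the two
constraints make the parameters identifiable. Forecasts are typically obtained
by modeling $k_t$ as a time series.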
Contemporary scientific studies often rely on the understanding of complex
quantum systems via computer simulation. This paper initiates the statistical
study of quantum simulation and proposes a Monte Carlo method for estimating
analytically intractable quantities. We derive the bias and variance for the
proposed Monte Carlo quantum simulation estimator and establish the asymptotic
theory for the estimator. The theory is used to design a computational scheme
for minimizing the mean square error of the estimator.
The Burning Index (BI) produced daily by the United States government's
National Fire Danger Rating System is commonly used in forecasting the hazard
of wildfire activity in the United States. However, recent evaluations have
shown the BI to be less effective at predicting wildfires in Los Angeles
County, compared to simple point process models incorporating similar
meteorological information.
Bayesian model averaging enables one to combine the disparate predictions of
a number of models in a coherent fashion, leading to superior predictive
performance. The improvement in performance arises from averaging models that
make different predictions. In this work, we tap into perhaps the biggest
driver of different predictions---different analysts---in order to gain the
full benefits of model averaging.
We consider the estimation of wealth inequality measures with their
confidence interval, based on survey data with interval censoring. We rely on a
Bayesian hierarchical model. It consists of a model where, due to survey
sampling and unit nonresponse, the summaries of the wealth distribution of
households are observed with error; a mixture of multivariate models for the
wealth components where groups correspond to portfolios of assets; and a prior
on the parameters. A Gibbs sampler is used for numerical purposes to do the
inference. We apply this strategy to the French 2004 Wealth Survey.
Dose-finding studies are frequently conducted to evaluate the effect of
different doses or concentration levels of a compound on a response of
interest. Applications include the investigation of a new medicinal drug, a
herbicide or fertilizer, a molecular entity, an environmental toxin, or an
industrial chemical. In pharmaceutical drug development, dose-finding studies
are of critical importance because of regulatory requirements that marketed
doses are safe and provide clinically relevant efficacy.
Neural spike trains, which are sequences of very brief jumps in voltage
across the cell membrane, were one of the motivating applications for the
development of point process methodology. Early work required the assumption of
stationarity, but contemporary experiments often use time-varying stimuli and
produce time-varying neural responses. More recently, many statistical methods
have been developed for nonstationary neural point process data.
Sensor-based degradation signals measure the accumulation of damage of an
engineering system using sensor technology. Degradation signals can be used to
estimate, for example, the distribution of the remaining life of partially
degraded systems and/or their components. In this paper we present a
nonparametric degradation modeling framework for making inference on the
evolution of degradation signals that are observed sparsely or over short
intervals of times.
For many decades, ultra-high energy charged particles have been a puzzle for
particle physicists and astrophysicists. Neither the sites of production nor the
mechanism responsible for the generation of these ultra-energetic `cosmic rays'
(CR) is currently known. They seem to arrive from random directions in the sky,
although the most energetic ones, which are not deflected much by the magnetic
fields, are supposed to point towards their source with good accuracy.
Discussion of "Network routing in a dynamic environment" by N.D. Singpurwalla
[arXiv:1107.4852]
Deducing the structure of neural circuits is one of the central problems of
modern neuroscience. Recently-introduced calcium fluorescent imaging methods
permit experimentalists to observe network activity in large populations of
neurons, but these techniques provide only indirect observations of neural
spike trains, with limited time resolution and signal quality. In this work we
present a Bayesian approach for inferring neural circuitry given this type of
imaging data.
Determining the magnitude and location of neural sources within the brain
that are responsible for generating magnetoencephalography (MEG) signals
measured on the surface of the head is a challenging problem in functional
neuroimaging. The number of potential sources within the brain exceeds by an
order of magnitude the number of recording sites. As a consequence, the
estimates for the magnitude and location of the neural sources will be
ill-conditioned because of the underdetermined nature of the problem.
Effective connectivity analysis provides an understanding of the functional
organization of the brain by studying how activated regions influence one
another. We propose a nonparametric Bayesian approach to model effective
connectivity assuming a dynamic nonstationary neuronal system. Our approach
uses the Dirichlet process to specify an appropriate (most plausible according
to our prior beliefs) dynamic model as the "expectation" of a set of plausible
models upon which we assign a probability distribution. This addresses model
uncertainty associated with dynamic effective connectivity.
This paper proposes a simple method to evaluate batsmen and bowlers in
cricket. The idea in this paper refines "book cricket" and evaluates a batsman
by answering the question: How many runs would a team consisting of the same
player replicated eleven times score?
In the game of Scrabble, letter tiles are drawn uniformly at random from a
bag. The variability of possible draws as the game progresses is a source of
variation that makes it more likely for an inferior player to win a
head-to-head match against a superior player, and more difficult to determine
the true ability of a player in a tournament or contest.
This paper presents the application of a particle filter for data
assimilation in the context of puff-based dispersion models. Particle filters
provide estimates of the higher moments, and are well suited for strongly
nonlinear and/or non-Gaussian models. The Gaussian puff model SCIPUFF is used
in predicting the chemical concentration field after a chemical incident. This
model is highly nonlinear and evolves with variable state dimension and, after
sufficient time, high dimensionality.
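The basic mechanics of a particle filter can be sketched on a toy problem. The
example below is a generic bootstrap particle filter for a one-dimensional
random walk observed with Gaussian noise; it has nothing to do with SCIPUFF or
puff models, and the state model, noise levels, and function name are all
assumptions made for illustration.

```python
import math
import random


def bootstrap_particle_filter(observations, n_particles=500,
                              process_sd=1.0, obs_sd=1.0, seed=0):
    """Return the filtered posterior mean after each observation.

    Toy model (assumed): x_t = x_{t-1} + N(0, process_sd^2),
                         y_t = x_t + N(0, obs_sd^2).
    """
    rng = random.Random(seed)
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    means = []
    for y in observations:
        # 1. Propagate each particle through the state model.
        particles = [x + rng.gauss(0.0, process_sd) for x in particles]
        # 2. Weight each particle by the likelihood of the observation.
        weights = [
            math.exp(-0.5 * ((y - x) / obs_sd) ** 2) for x in particles
        ]
        total = sum(weights)
        if total == 0.0:
            # Fall back to uniform weights if every likelihood underflows.
            weights = [1.0] * n_particles
            total = float(n_particles)
        # 3. Record the weighted mean as the state estimate.
        means.append(
            sum(w * x for w, x in zip(weights, particles)) / total
        )
        # 4. Resample (multinomial) to avoid weight degeneracy.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return means


filtered = bootstrap_particle_filter([3.0] * 30)
```

Because the particles carry the full posterior, higher moments can be read off
the same weighted sample, at the cost of needing many particles as the state
dimension grows.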
While five-factor models of personality are widespread, there is still no
universal agreement on this as a structural framework. Part of the reason for
the lingering debate is its dependence on factor analysis. In particular,
derivation or refutation of the model via other statistical means is a
worthwhile project.
It has been argued by Daryl Bem in his 2011 paper that 8 out of 9 experiments
yielded statistically significant results in favour of the psi effect. It is
pointed out in this short communication that many of the results in the
above-mentioned paper could be explained using well-known concepts in statistics
such as Confidence Level and Standard Error of the Sample Mean. This short
communication also discusses implied confidence level and confidence intervals
in polling results.
Vocal tract resonance characteristics in acoustic speech signals are
classically tracked using frame-by-frame point estimates of formant frequencies
followed by candidate selection and smoothing using dynamic programming methods
that minimize ad hoc cost functions. The goal of the current work is to provide
both point estimates and associated uncertainties of center frequencies and
bandwidths in a statistically principled state-space framework.
We introduce a method that defines the species (representatives) of inorganic
compounds, and study the statistical distribution of the defined species
among space groups (distribution of space groups) using the ICSD (Inorganic
Crystal Structure Database). Here we show that the number of formula units in a
unit cell gives a natural classification to understand the statistical
distribution of crystallographic groups.
A series of ten plant species belonging to the Magnoliopsida (Dicotyledons)
class was analyzed in terms of the abundance distribution of their chemical
compounds, starting from the assumption that these distributions should give a
picture of similarities and differences between the plants' metabolism. From a
pool of theoretical distributions, the log-normal distribution was selected as
giving the best accuracy for the modeled phenomena and agreement with the
observed data.
Performance period determination and bad definition for credit scorecards have
been a matter of mixed fortune for the typical data modeler. The lack of
literature on these matters has led to a proliferation of approaches and
techniques to solve the problems. However, the most commonly accepted approach
involves subjective interpretations of the performance period and bad
definition, and is also a chicken-and-egg problem. These complications result
in poorly developed credit scorecards with minimal benefits to the banks.
The Bernoulli Factory is an algorithm that takes as input a series of i.i.d.
Bernoulli random variables with an unknown but fixed success probability $p$,
and outputs a corresponding series of Bernoulli random variables with success
probability $f(p)$, where the function $f$ is known and defined on the interval
$[0,1]$. While several practical uses of the method have been proposed in Monte
Carlo applications, these require an implementation framework that is flexible,
general and efficient.
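Two textbook special cases give a feel for what a Bernoulli Factory does. The
sketch below implements $f(p) = p^2$ and von Neumann's $f(p) = 1/2$ using only
flips of the input coin; neither factory ever reads $p$ directly. The helper
names are illustrative, and this is not the general framework discussed above.

```python
import random


def make_coin(p, rng):
    """Return a zero-argument coin with success probability p.

    The factories below only call the coin; they never see p itself.
    """
    return lambda: rng.random() < p


def factory_square(coin):
    """f(p) = p^2: succeed only if two independent flips both succeed."""
    first = coin()
    second = coin()
    return first and second


def factory_half(coin):
    """f(p) = 1/2 (von Neumann): flip pairs until they differ.

    Requires 0 < p < 1, otherwise the loop never terminates.
    """
    while True:
        a, b = coin(), coin()
        if a != b:
            return a


rng = random.Random(42)
coin = make_coin(0.3, rng)
n = 20000
square_rate = sum(factory_square(coin) for _ in range(n)) / n
half_rate = sum(factory_half(coin) for _ in range(n)) / n
```

With $p = 0.3$, the empirical rates should be close to $0.09$ and $0.5$
respectively. The von Neumann factory also shows that the number of input
flips consumed per output can itself be random.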
The rupture of an abdominal aortic aneurysm (AAA) is associated with high
mortality. When an AAA ruptures, 50% of the patients die before reaching the
hospital. Of the patients that are able to reach the operating room, only 50%
have it successfully repaired (Fillinger et al., 2003). Therefore, it is
important to find good predictors for immediate risk of rupture. Clinically,
the size of the aneurysm is the variable vascular surgeons usually use to
evaluate this risk.
We explore the use of generalized t priors on regression coefficients to help
understand the nature of association signal within "hit regions" of genome-wide
association studies. The particular generalized t distribution we adopt is a
Student distribution on the absolute value of its argument. For low degrees of
freedom we show that the generalized t exhibits 'sparsity-prior' properties
with some attractive features over other common forms of sparse priors and
includes the well-known double-exponential distribution as the degrees of
freedom tend to infinity.