We develop inference procedures for policy analysis based on regression
methods. We consider policy interventions that correspond to either changes in
the distribution of covariates, or changes in the conditional distribution of
the outcome given covariates, or both. Under either of these policy scenarios,
we derive functional central limit theorems for regression-based estimators of
the status quo and counterfactual marginal distributions.
High-dimensional tests are applied to find relevant sets of variables and
relevant models. If variables are selected by analyzing the sums of products
matrices and a corresponding mean-value test is performed, there is the danger
that the nominal error of first kind is exceeded. In the paper, well-known
multivariate tests receive a new mathematical interpretation such that the
error of first kind of the combined testing and selecting procedure can more
easily be kept.
Misperceptions about extreme dependencies between different financial assets
have been an important element of the recent financial crisis. This paper
studies inhomogeneity in dependence structures using Markov switching regular
vine copulas. These account for asymmetric dependencies and tail dependencies
in high dimensional data. We develop methods for fast maximum likelihood as
well as Bayesian inference. Our algorithms are validated in simulations and
applied to financial data.
We consider the problem of modeling the dependence among many time series. We
build high dimensional time-varying copula models by combining pair-copula
constructions (PCC) with stochastic autoregressive copula (SCAR) models to
capture dependence that changes over time. We show how the estimation of this
highly complex model can be broken down into the estimation of a sequence of
bivariate SCAR models, which can be achieved by using the method of simulated
maximum likelihood.
Since Aas et al. (2009) introduced inference of multivariate copulae
constructed through pair-copula decompositions to the statistical community,
interest in these models has been growing steadily and they are finding
successful applications in various fields. Research so far has however been
concentrating on so-called canonical and D-vine copulae. In this article, we
discuss the more general class of regular vines.
While there is substantial need for dependence models in high dimensions,
most existing models strongly suffer from the curse of dimensionality and
barely balance parsimony and flexibility. In this paper, the new class of
hierarchical Kendall copulas is proposed which tackles these problems.
Constructed with flexible copulas specified for groups of variables in
different hierarchical levels, hierarchical Kendall copulas are able to model
complex dependence patterns without severe restrictions.
Finding an unconstrained and statistically interpretable reparameterization
of a covariance matrix is still an open problem in statistics. Its solution is
of central importance in covariance estimation, particularly in the recent
high-dimensional data environment where enforcing the positive-definiteness
constraint could be computationally expensive.
A new family of tree models is proposed, which we call "differential trees."
A differential tree model is constructed from multiple data sets and aims to
detect distributional differences between them. The new methodology differs
from the existing difference and change detection techniques in its
nonparametric nature, model construction from multiple data sets, and
applicability to high-dimensional data.
We study a Bayesian approach to nonparametric estimation of the periodic
drift function of a one-dimensional diffusion from continuous-time data. We
rewrite the likelihood in terms of Riemann integrals, by introducing the local
time of the process, and specify a centered Gaussian prior on the drift with a
precision operator that is of differential form. It is proved that this is a
conjugate prior for the likelihood and hence that the posterior is also
Gaussian.
A general approach for Bayesian filtering of multi-object systems is studied,
with particular emphasis on the model where each object generates observations
independently of other objects. The approach is based on variational calculus
applied to generating functionals, using the general version of Faa di Bruno's
formula for Gateaux differentials. This result enables us to determine some
general formulae for the updated generating functional after the application of
a multi-object analogue of Bayes' rule.
Many regularization schemes for high-dimensional regression have been put
forward. Most require the choice of a tuning parameter, using model selection
criteria or cross-validation schemes. We show that a simple non-negative or
sign-constrained least squares is a very simple and effective regularization
technique for a certain class of high-dimensional regression problems. The sign
constraint has to be derived via prior knowledge or an initial estimator but no
further tuning or cross-validation is necessary. The success depends on
conditions that are easy to check in practice.
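As a rough illustration of sign-constrained least squares, the sketch below
minimizes the residual sum of squares subject to non-negative coefficients by
projected gradient descent. The solver, the function name
`nnls_projected_gradient`, the step size and the iteration count are
assumptions chosen for illustration; the abstract does not prescribe an
algorithm.

```python
def nnls_projected_gradient(X, y, n_iter=500):
    """Minimize ||y - X b||^2 subject to b >= 0 (illustrative solver only)."""
    n, p = len(X), len(X[0])
    # Step size 1/L, where L bounds the largest eigenvalue of X^T X
    # by its trace (the sum of all squared entries of X).
    lipschitz = sum(v * v for row in X for v in row)
    if lipschitz == 0.0:
        return [0.0] * p
    step = 1.0 / lipschitz
    b = [0.0] * p
    for _ in range(n_iter):
        resid = [sum(X[i][j] * b[j] for j in range(p)) - y[i] for i in range(n)]
        grad = [sum(X[i][j] * resid[i] for i in range(n)) for j in range(p)]
        # Gradient step followed by projection onto the non-negative orthant.
        b = [max(0.0, b[j] - step * grad[j]) for j in range(p)]
    return b


# Usage: the second coefficient would be negative without the constraint.
coef = nnls_projected_gradient([[1.0, 0.0], [0.0, 1.0]], [1.0, -1.0])
```

Note that the only inputs are the design and the response: no penalty level
has to be chosen, which is the point made in the abstract.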
Meta-analysis involves combining summary information for related but
independent studies. It uses different relationships to combine measures of
position as well as measures of dispersion. The objective of this study is to
discuss a relationship between the standard deviation of a data set and the
standard deviations and means of two parts of this set. The problem arose in a
systematic review with meta-analysis that combined two studies with missing
data.
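One standard relationship of this kind recovers the mean and sample standard
deviation of a full data set from the sizes, means and sample standard
deviations of its two parts. The sketch below assumes sample standard
deviations (denominator n - 1); the function name `combine_mean_sd` and the
toy numbers are hypothetical, and the formula is not necessarily the one
derived in the paper.

```python
import math
import statistics


def combine_mean_sd(n1, mean1, sd1, n2, mean2, sd2):
    """Mean and sample SD of the union of two parts, from their summaries."""
    n = n1 + n2
    mean = (n1 * mean1 + n2 * mean2) / n
    # Total sum of squares = within-part sums of squares
    # plus the between-part contribution of each part mean.
    ss = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2
          + n1 * (mean1 - mean) ** 2 + n2 * (mean2 - mean) ** 2)
    return mean, math.sqrt(ss / (n - 1))


# Usage: compare with a direct computation on the pooled raw data.
part_a = [1.0, 2.0, 3.0, 4.0]
part_b = [10.0, 12.0, 14.0]
mean, sd = combine_mean_sd(
    len(part_a), statistics.mean(part_a), statistics.stdev(part_a),
    len(part_b), statistics.mean(part_b), statistics.stdev(part_b),
)
```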
Don Fraser has given an interesting account of the agreements and
disagreements between Bayesian posterior probabilities and confidence levels.
In this comment I discuss some cases where the lack of such agreement is
extreme. I then discuss a few cases where it is possible to have Bayes
procedures with frequentist validity. Such frequentist-Bayesian---or
Frasian---methods deserve more attention [arXiv:1112.5582].
Discussion of "Is Bayes Posterior just Quick and Dirty Confidence?" by D. A.
S. Fraser [arXiv:1112.5582].
This paper derives the asymptotic distribution of variance weighted
Kolmogorov-Smirnov statistics for conditional moment inequality models for the
case of a one dimensional covariate. The asymptotic distribution depends on the
data generating process only through the variance of a single random variable,
leading to critical values that can be calculated analytically. By arguments in
Armstrong (2011b), the resulting tests achieve the best minimax rate for local
alternatives out of available approaches in a broad class of settings.
We consider the simulation of distributions that are a mixture of discrete
and continuous components. We extend a Metropolis-Hastings-based perfect
sampling algorithm of Corcoran and Tweedie to allow for a broader class of
transition candidate densities. The resulting algorithm, known as a "class
coupler", is fast to implement and is applicable to purely discrete or purely
continuous densities as well. Our work is motivated by the study of a composite
hypothesis test in a Bayesian setting via posterior simulation and we give
simulation results for some problems in this area.
The ultimate goal of regression analysis is to obtain information about the
conditional distribution of a response given a set of explanatory variables.
This goal is, however, seldom achieved because most established regression
models only estimate the conditional mean as a function of the explanatory
variables and assume that higher moments are not affected by the regressors.
The underlying reason for such a restriction is the assumption of additivity of
signal and noise. We propose to relax this common assumption in the framework
of transformation models.
Where causal SNPs (single nucleotide polymorphisms) tend to accumulate within
biological pathways, the incorporation of prior pathways information into a
statistical model is expected to increase the power to detect true associations
in a genetic association study. Most existing pathways-based methods rely on
marginal SNP statistics and do not fully exploit the dependence patterns among
SNPs within pathways.
Cluster analysis of biological samples using gene expression measurements is
a common task which aids the discovery of heterogeneous biological
sub-populations having distinct mRNA profiles. Several model-based clustering
algorithms have been proposed in which the distribution of gene expression
values within each sub-group is assumed to be Gaussian. In the presence of
noise and extreme observations, a mixture of Gaussian densities may over-fit
and overestimate the true number of clusters.
Data collection at a massive scale is becoming ubiquitous in a wide variety
of settings, from vast offline databases to streaming real-time information.
Learning algorithms deployed in such contexts must rely on single-pass
inference, where the data history is never revisited. In streaming contexts,
learning must also be temporally adaptive to remain up-to-date against
unforeseen changes in the data generating mechanism. Although rapidly growing,
the online Bayesian inference literature remains challenged by massive data and
transient, evolving data streams.
We discuss a parametric family of binary distributions for modelling and
sampling high-dimensional binary data with strong dependencies. We extend the
linear conditionals family proposed by Qaqish (2003) to a non-linear
conditionals family which we show to encompass every feasible combination of
mean vector and correlation matrix. We can both sample from this parametric
family and evaluate its mass function point-wise which allows for immediate use
in the context of stochastic optimization, importance sampling or Markov chain
algorithms.
In this paper we study a bootstrap strategy for estimating the variance of a
mean taken over large multifactor crossed random effects data sets. We apply
bootstrap reweighting independently to the levels of each factor, giving each
observation the product of independently sampled factor weights. No exact
bootstrap exists for this problem (McCullagh, 2000). We show that the proposed
bootstrap is mildly conservative, meaning biased towards overestimating the
variance, under sufficient conditions that allow very unbalanced and
heteroscedastic inputs.
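The reweighting scheme can be sketched for two crossed factors as follows.
Each level of each factor receives an independent random weight, each
observation is weighted by the product of its two level weights, and the
variance of the reweighted means is reported. The weight distribution
(uniform on {0, 2}, so mean 1 and variance 1), the function name and the data
layout are assumptions made for illustration.

```python
import random


def product_weight_bootstrap_var(obs, n_boot=1000, seed=0):
    """obs: list of (row_level, col_level, value) triples."""
    rng = random.Random(seed)
    rows = sorted({r for r, _, _ in obs})
    cols = sorted({c for _, c, _ in obs})
    means = []
    for _ in range(n_boot):
        # Independent weights for the levels of each factor (assumed law).
        wr = {r: rng.choice((0.0, 2.0)) for r in rows}
        wc = {c: rng.choice((0.0, 2.0)) for c in cols}
        total_w = sum(wr[r] * wc[c] for r, c, _ in obs)
        if total_w == 0.0:
            continue  # every observation got weight zero; skip this replicate
        means.append(sum(wr[r] * wc[c] * x for r, c, x in obs) / total_w)
    if len(means) < 2:
        raise ValueError("too few usable bootstrap replicates")
    centre = sum(means) / len(means)
    return sum((m - centre) ** 2 for m in means) / (len(means) - 1)
```

Only one pass over the observations is needed per replicate, which is what
makes this kind of reweighting attractive for large data sets.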
A pair-copula construction is a decomposition of a multivariate copula into a
structured system, called regular vine, of bivariate copulae or pair-copulae.
The standard practice is to model these pair-copulae parametrically, which
comes at the cost of a large model risk, with errors propagating throughout the
vine structure. The empirical pair-copula proposed in the paper provides a
nonparametric alternative still achieving the parametric convergence rate.
We propose a nested Gaussian process (nGP) as a locally adaptive prior for
Bayesian nonparametric regression. Specified through a set of stochastic
differential equations (SDEs), the nGP imposes a Gaussian process prior for the
function's $m$th-order derivative. The nesting comes in through including a
local instantaneous mean function, which is drawn from another Gaussian process
inducing adaptivity to locally-varying smoothness. We discuss the support of
the nGP prior in terms of the closure of a reproducing kernel Hilbert space,
and consider theoretical properties of the posterior.
Early detection of disease outbreaks is of paramount importance to
implementing intervention strategies to mitigate the severity and duration of
the outbreak. We build methodology that utilizes the characteristic profile of
disease outbreaks to reduce the time to detection and false positive rate. We
model daily counts through a Poisson distribution with additive background plus
outbreak components. The outbreak component has a parametric form with unknown
underlying parameters. A mixture likelihood ratio scan statistic is developed
to maximize parameters over a window in time.
Consider a one-way analysis of covariance model. Suppose that the parameter
of interest theta is a specified linear contrast of the expected responses, for
a given value of the covariate. Also suppose that the inference of interest is
a 1-alpha confidence interval for theta. The following two-stage procedure has
been proposed to determine the form of the model. In Stage 1, we carry out an F
test of the null hypothesis that the slopes are all zero against the
alternative hypothesis that they are not all zero.
We are concerned with the problem of detecting whether an association of any
kind exists between random vectors of any dimension. Few tests of independence
exist to date that are consistent against all dependent alternatives. We
propose a powerful test that is applicable in all dimensions, is robust to
outliers, and is consistent against all alternatives. The test has a simple
form and is easy to implement. We demonstrate its good power properties in
simulations and on an example.
During the last decade, Lévy processes with jumps have gained increasing
popularity for modelling market behaviour for both derivative pricing and risk
management purposes. Chan et al. (2009) introduced the use of empirical
likelihood methods to estimate the parameters of various diffusion processes
via their characteristic functions, which are readily available in most cases.
Return series from the market are used for estimation.
Inference for causal effects can benefit from the availability of an
instrumental variable (IV) which, by definition, is associated with the given
exposure, but not with the outcome of interest other than through a causal
exposure effect.
Because the decomposability property is a very practical one in welfare
analysis, most researchers and users favor decomposable poverty indices such as
the Foster-Greer-Thorbecke poverty index. This may lead to neglect of important
weighted indices, such as those of Kakwani and Shorrocks, which have other
interesting properties in welfare analysis.
David Ross Brillinger was born on the 27th of October 1937, in Toronto,
Canada. In 1955, he entered the University of Toronto, graduating with a B.A.
with Honours in Pure Mathematics in 1959, while also serving as a Lieutenant in
the Royal Canadian Naval Reserve. He was one of the five winners of the Putnam
mathematical competition in 1958. He then went on to obtain his M.A. and Ph.D.
in Mathematics at Princeton University, in 1960 and 1961, the latter under the
guidance of John W. Tukey.
Statistical models that include random effects are commonly used to analyze
longitudinal and correlated data, often with strong and parametric assumptions
about the random effects distribution. There is marked disagreement in the
literature as to whether such parametric assumptions are important or
innocuous.
We propose a simple and intuitive algorithm for clustering analysis. This
algorithm takes the viewpoint of the elements to be clustered and simulates
the process by which they cluster themselves. At the end of the process,
elements that belong to the same cluster converge to the same position, which
represents the cluster's location in a p-dimensional space. The algorithm also
manages to isolate noise and is therefore able to produce satisfactory clustering
results even when the level of noise is high enough to obscure or distort the
underlying patterns in the data.
Respondent-driven sampling (RDS) is a commonly used substitute for random
sampling when studying hidden populations, such as injecting drug users or men
who have sex with men, for which no sampling frame is known. The method works
like a snowball sample but can, given that some assumptions are met, generate
unbiased population estimates. One key assumption, not likely to be met, is
that the acquaintance network in which the recruitment process takes place is
undirected, meaning that all recruiters should have the potential to be
recruited by the person they recruit.
Models for distributions of shapes contained within images can be widely used
in biomedical applications ranging from tumor tracking for targeted radiation
therapy to classifying cells in a blood sample. Our focus is on hierarchical
probability models for the shape and size of simply connected 2D closed curves,
avoiding the need to specify landmarks through modeling the entire curve while
borrowing information across curves for related objects.
We use bias-reduced estimators of high quantiles, of heavy-tailed
distributions, to introduce a new estimator of the mean in the case of infinite
second moment. The asymptotic normality of the proposed estimator is
established and checked, in a simulation study, by four of the most popular
goodness-of-fit tests for different sample sizes. Moreover, we compare, in
terms of bias and mean squared error, our estimator with Peng's estimator
(Peng, 2001) and we evaluate the accuracy of some resulting confidence
intervals.
In survey statistics, the usual technique for estimating a population total
consists in summing appropriately weighted variable values for the units in the
sample. Different weighting systems exist: sampling weights, GREG weights or
calibration weights, for example. In this article, we propose to use the inverse
of conditional inclusion probabilities as a weighting system. We study examples
where auxiliary information makes it possible to perform an a posteriori
stratification of the population. We show that, in these cases, exact
computations of the conditional weights are possible.
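A simple case where the conditional weights are explicit is post-stratification
under simple random sampling without replacement: given the realized stratum
counts n_h, a unit in post-stratum h has conditional inclusion probability
n_h / N_h, so its weight is N_h / n_h. The sketch below assumes that sampling
design; the function name and data layout are hypothetical and this need not
be one of the examples treated in the article.

```python
from collections import Counter


def poststratified_total(sample, stratum_sizes):
    """sample: list of (stratum, y); stratum_sizes: dict stratum -> N_h."""
    n_h = Counter(h for h, _ in sample)
    # Conditional on the realized n_h, the inclusion probability of a unit
    # in stratum h is n_h / N_h, so each sampled unit carries N_h / n_h.
    return sum(stratum_sizes[h] / n_h[h] * y for h, y in sample)


# Usage: two post-strata of sizes 10 and 5, three sampled units.
total = poststratified_total(
    [("a", 1.0), ("a", 3.0), ("b", 10.0)], {"a": 10, "b": 5}
)
```

A post-stratum with no sampled unit contributes nothing here, so in practice
such strata would have to be merged with a neighbour before weighting.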
The model for homogeneity of proportions in a two-way
contingency-table/cross-tabulation is the same as the model of independence,
except that the probabilistic process generating the data is viewed as fixing
the column totals (but not the row totals).
Goodness-of-fit tests based on the Euclidean distance often outperform
chi-square and other classical tests (including the standard exact tests) by at
least an order of magnitude when the model being tested for goodness-of-fit is
a discrete probability distribution that is not close to uniform. The present
article discusses numerous examples of this.
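A minimal sketch of such a test for a fully specified discrete model is given
below: the statistic is the Euclidean distance between the observed
proportions and the model probabilities, and the p-value is estimated by
simulating data sets from the model. The function names, the Monte Carlo
calibration and the number of simulations are assumptions made for
illustration.

```python
import math
import random


def euclid_stat(counts, probs):
    """Euclidean distance between observed proportions and model probabilities."""
    n = sum(counts)
    return math.sqrt(sum((c / n - p) ** 2 for c, p in zip(counts, probs)))


def euclid_gof_pvalue(counts, probs, n_sim=2000, seed=0):
    """Monte Carlo p-value under the fully specified model `probs`."""
    rng = random.Random(seed)
    n = sum(counts)
    observed = euclid_stat(counts, probs)
    cells = range(len(probs))
    hits = 0
    for _ in range(n_sim):
        sim = [0] * len(probs)
        for cell in rng.choices(cells, weights=probs, k=n):
            sim[cell] += 1
        if euclid_stat(sim, probs) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_sim + 1)
```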
Rejoinder to "Feature Matching in Time Series Modeling" by Y. Xia and H. Tong
[arXiv:1104.3073]
Nowadays, the high-precision estimation of nonlinear parameters such as
quantiles, Gini indices or other measures of inequality is particularly
crucial. In the present paper, we propose a general class of estimators for
such parameters that take into account complete univariate auxiliary
information. We construct unique survey weights through a nonparametric
model-assisted approach that can be used by means of the plug-in principle to
estimate the nonlinear parameters.
Discussion of "Feature Matching in Time Series Modeling" by Y. Xia and H.
Tong [arXiv:1104.3073]
This report is a collection of comments on the Read Paper of Fearnhead and
Prangle (2011), to appear in the Journal of the Royal Statistical Society
Series B, along with a reply from the authors.
In applied sciences, we often deal with deterministic simulation models that
are too slow for simulation-intensive tasks such as calibration or real-time
control. In this paper, an emulator for a generic dynamic model, given by a
system of ordinary non-linear differential equations, is developed. The
non-linear differential equations are linearized and Gaussian white noise is
added to account for the non-linearities. The resulting linear stochastic
system is conditioned on a set of solutions of the non-linear equations that
have been calculated prior to the emulation.
This paper has been withdrawn by the author because it has been substantially
modified.
Bayes linear analysis and approximate Bayesian computation (ABC) are
techniques commonly used in the Bayesian analysis of complex models. In this
article we connect these ideas by demonstrating that regression-adjustment ABC
algorithms produce samples for which first and second order moment summaries
approximate adjusted expectation and variance for a Bayes linear analysis. This
gives regression-adjustment methods a useful interpretation and role in
exploratory analysis in high-dimensional problems.
Benjamini and Hochberg (1995) proposed the false discovery rate (FDR) as an
alternative to the FWER in multiple testing problems, and proposed a procedure
to control the FDR. For discrete data this procedure may be highly
conservative. We investigate alternative, more powerful, procedures that
exploit the discreteness of the tests and have FDR levels closer in magnitude
to the desired nominal level. Moreover, we develop a novel step-down procedure
that dominates the step-down procedure of Benjamini and Liu (1999) for discrete
data.
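For reference, the baseline step-up procedure of Benjamini and Hochberg (1995)
can be sketched as follows. This is the classical procedure only; it does not
implement the discrete-data refinements or the step-down procedure
investigated in the paper, and the function name is hypothetical.

```python
def benjamini_hochberg(pvalues, q=0.05):
    """Return a list of booleans: True where the hypothesis is rejected."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Step-up rule: find the largest rank k with p_(k) <= k * q / m.
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * q / m:
            k = rank
    rejected = [False] * m
    for i in order[:k]:
        rejected[i] = True
    return rejected


# Usage: only the smallest p-value passes its threshold here.
decisions = benjamini_hochberg([0.01, 0.04, 0.03, 0.5], q=0.05)
```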
This paper outlines a unified framework for high dimensional variable
selection for classification problems. Traditional approaches to finding
interesting variables mostly utilize only partial information through moments
(like mean difference). On the contrary, in this paper we address the question
of variable selection in full generality from a distributional point of view.
If a variable is not important for classification, then it will have similar
distributional aspect under different classes.
We propose a relaxed privacy definition called random differential
privacy (RDP). Differential privacy requires that adding any new observation
to a database will have a small effect on the output of the data-release
procedure. Random differential privacy requires that adding a randomly
drawn new observation to a database will have a small effect on the output. We
show an analog of the composition property of differentially private procedures
which applies to our new definition.
In this paper we propose a sparse coefficient estimation procedure for
autoregressive (AR) models based on penalized conditional maximum likelihood.
The penalized conditional maximum likelihood estimator (PCMLE) thus developed
has the advantage of performing simultaneous coefficient estimation and model
selection. Mild conditions are given on the penalty function and the innovation
process, under which the PCMLE satisfies strong consistency, local $N^{-1/2}$
consistency, and the oracle property, where $N$ is the sample size.
Signal identification in large-dimensional settings is a challenging problem
in biostatistics. Recently, the method of higher criticism (HC) was shown to be
an effective means for determining appropriate decision thresholds. Here, we
study HC from a false discovery rate (FDR) perspective. We show that the HC
threshold is best viewed as an approximation to a natural Bayesian decision
threshold which in turn is expressible as a specific FDR threshold.
We present a new computational approach to approximating a large, noisy data
table by a low-rank matrix with sparse singular vectors. The approximation is
obtained from thresholded subspace iterations that produce the singular vectors
simultaneously, rather than successively as in competing proposals. We
introduce novel ways to estimate thresholding parameters which obviate the need
for computationally expensive cross-validation.
In this paper, I present a new solution method for sparse regression using L0
regularization. The model introduces a sparseness mechanism in the likelihood,
instead of in the prior, as is done in the spike and slab model. The posterior
probability is computed in the variational approximation. The variational
parameters appear in the approximate model in a way that is similar to
Breiman's Garrote model. I refer to this method as the variational Garrote
(VG). The VG is compared numerically with the Lasso method and with ridge
regression.
In this paper we have demonstrated a complete framework for the analysis of
microarray time series data. The unique characteristics of microarray data lend
themselves well to a functional data analysis approach and we have shown how
this naturally extends to the inclusion of covariates such as age and sex.
When animals are transported and pass through customs, some of them may have
dangerous infectious diseases. Typically, due to the cost of testing, not all
animals are tested: a reasonable selection must be made. How to test
effectively, yet avoid cataclysmic events?
In this paper we discuss the variable selection method from $\ell_0$-norm
constrained regression, which is equivalent to the problem of finding the best
subset of a fixed size. Our study focuses on two aspects, consistency and
computation. We prove that the sparse estimator from such a method can retain
all of the important variables asymptotically for even exponentially growing
dimensionality under regularity conditions.
For variable selection to work, a sparse model structure has become a basic
assumption of all existing methods. However, this assumption is questionable,
as it rarely holds in practice, and none of the existing methods may provide
consistent estimation and accurate model prediction in non-sparse scenarios.
In this article two methods to distinguish between polynomial and exponential
tails are introduced. The methods are mainly based on the properties of the
residual coefficient of variation for the exponential and non-exponential
distributions. A graphical method, called CV-plot, shows departures from
exponentiality in the tails. It is, in fact, the empirical coefficient of
variation of the conditional exceedance over a threshold. The plot is applied to
the daily log-returns of exchange rates of the US dollar and the Japanese yen.
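The quantity plotted in a CV-plot can be sketched as below: for each
threshold, the empirical coefficient of variation of the excesses over that
threshold is computed, and values close to 1 are consistent with an
exponential tail. Taking the thresholds at the order statistics, the minimum
number of exceedances and the function name are assumptions made for
illustration.

```python
import math


def cv_plot_points(data, min_exceedances=10):
    """Return (threshold, coefficient of variation of excesses) pairs."""
    xs = sorted(data)
    points = []
    for i in range(len(xs) - min_exceedances):
        t = xs[i]
        excess = [x - t for x in xs[i + 1:] if x > t]
        m = len(excess)
        if m < min_exceedances:
            continue
        mean = sum(excess) / m
        var = sum((e - mean) ** 2 for e in excess) / (m - 1)
        points.append((t, math.sqrt(var) / mean))
    return points


# Usage: exponential quantiles on a regular grid give values close to 1.
n = 500
grid = [-math.log(1.0 - (i - 0.5) / n) for i in range(1, n + 1)]
points = cv_plot_points(grid)
```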
A powerful study design in the fields of genomics and metabolomics is the
'replicated time course experiment' where individual time series are observed
for a sample of biological units, such as human patients, termed replicates.
Standard practice for analysing these data sets is to fit each variable (e.g.
gene transcript) independently with a functional mixed-effects model to account
for between-replicate variance. However, such an independence assumption is
biologically implausible given that the variables are known to be highly
correlated.
The most popular approach in extreme value statistics is the modelling of
threshold exceedances using the asymptotically motivated generalised Pareto
distribution. This approach involves the selection of a high threshold above
which the model fits the data well. Sometimes, few observations of a
measurement process might be recorded in applications and so selecting a high
quantile of the sample as the threshold leads to almost no exceedances.
We study a class of semiparametric time series models with innovations
following a log-concave distribution. We propose a general maximum likelihood
framework which allows us to estimate simultaneously the parameters of a model
and the density of the innovations. This framework can be easily adapted to
many well-known models, including ARMA and GARCH. Furthermore, we show that the
estimator under our new framework is consistent in both ARMA and GARCH
settings.
In this paper linear canonical correlation analysis (LCCA) is generalized by
applying a structured transform to the joint probability distribution of the
considered pair of random vectors, i.e., a transformation of the joint
probability measure defined on their joint observation space. This framework,
called measure transformed canonical correlation analysis (MTCCA), applies LCCA
to the data after transformation of the joint probability measure.
In the typical analysis of a data set, a single method is selected for
statistical reporting even when equally applicable methods yield very different
results. Examples of equally applicable methods can correspond to those of
different ancillary statistics in frequentist inference and of different prior
distributions in Bayesian inference. More broadly, choices are made between
parametric and nonparametric methods and between frequentist and Bayesian
methods.
In this paper, the authors propose to estimate the density of a targeted
population with a weighted kernel density estimator (wKDE) based on a weighted
sample. Bandwidth selection for wKDE is discussed. Three mean integrated
squared error based bandwidth estimators are introduced and their performance
is illustrated via Monte Carlo simulation. The least-squares cross-validation
method and the adaptive weight kernel density estimator are also studied.
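A weighted kernel density estimator can be sketched as a weighted sum of
kernels centred at the sample points. The Gaussian kernel, the normalization
of the weights to sum to one and the function name are assumptions made for
illustration; the bandwidth is supplied by the caller rather than selected by
any of the methods studied in the paper.

```python
import math


def weighted_kde(x, sample, weights, bandwidth):
    """Weighted Gaussian kernel density estimate at the point x."""
    total = sum(weights)
    norm = 1.0 / (bandwidth * math.sqrt(2.0 * math.pi))
    return sum(
        (w / total) * norm * math.exp(-0.5 * ((x - xi) / bandwidth) ** 2)
        for xi, w in zip(sample, weights)
    )


# Usage: the second observation counts three times as much as the first.
density_at_zero = weighted_kde(0.0, [-1.0, 2.0], [1.0, 3.0], bandwidth=0.5)
```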
This note is an extended review of the book Error and Inference, edited by
Deborah Mayo and Aris Spanos, about their frequentist and philosophical
perspective on hypothesis testing and on the criticisms of alternatives like
the Bayesian approach.
Regularization techniques are widely used for tackling
high-dimension-low-sample-size problems. Yet, finding the right amount of
regularization can be challenging, especially in the unsupervised setting such
as structure learning problems where traditional methods such as BIC or
cross-validation often do not work well. In this paper, we propose a new method
--- Bootstrap Inference for Network COnstruction (BINCO) --- to infer networks
by directly controlling the false discovery rates (FDRs) of the selected edges.
This method utilizes the idea of model aggregation.
Let $X_{1},X_{2},\ldots$ be a sequence of independent copies (s.i.c.) of a real
random variable (r.v.) $X\geq 1$, with distribution function (df)
$F(x)=\mathbb{P}(X\leq x)$, and let $X_{1,n}\leq X_{2,n} \leq \cdots \leq
X_{n,n}$ be the order statistics based on the first $n\geq 1$ of these
observations.
We introduce a method for aggregating many least squares estimators so that
the resulting estimate has two properties: sparsity and structure. That is,
only a few candidate covariates are used in the resulting model, and the
selected covariates follow some structure over the candidate covariates that is
assumed to be known a priori. While sparsity is well studied in many settings,
including aggregation, structured sparse methods are still emerging.
Recent work has focused on the problem of conducting linear regression when
the number of covariates is very large, potentially greater than the sample
size. To facilitate this, one useful tool is to assume that the model can be
well approximated by a fit involving only a small number of covariates -- a
so-called sparsity assumption, which leads to the Lasso and other methods.
We consider the Pickands process
\begin{equation*}
P_{n}(s)=\left(\log (1/s)\right)^{-1}\log
\frac{X_{n-k+1,n}-X_{n-[k/s]+1,n}}{X_{n-[k/s]+1,n}-X_{n-[k/s^{2}]+1,n}},
\qquad \frac{k}{n}\leq s^{2}\leq 1,
\end{equation*}
which is a generalization of the classical Pickands estimate $P_{n}(1/2)$ of
the extremal index. We undertake here a purely stochastic process view for the
asymptotic theory of that process by using the
Csörgő-Csörgő-Horváth-Mason (1986) weighted approximation of the empirical
and quantile processes to suitable Brownian bridges.
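The Pickands process can be evaluated directly from the order statistics, as
in the sketch below. The function name is hypothetical, $[\cdot]$ is read as
the integer part, and the boundary case $s=1$ is excluded in the code because
$\log(1/s)=0$ there.

```python
import math


def pickands_process(data, k, s=0.5):
    """Evaluate P_n(s) from the order statistics of `data`."""
    xs = sorted(data)
    n = len(xs)
    # s = 1 is excluded here because log(1/s) = 0 in the denominator.
    if not (k >= 1 and k / n <= s * s < 1.0):
        raise ValueError("need k >= 1 and k/n <= s^2 < 1")

    def order_stat(m):
        # X_{m,n} with the 1-based indexing used in the formula.
        return xs[m - 1]

    a = order_stat(n - k + 1)
    b = order_stat(n - math.floor(k / s) + 1)
    c = order_stat(n - math.floor(k / (s * s)) + 1)
    return math.log((a - b) / (b - c)) / math.log(1.0 / s)


# Usage: exact Pareto-type quantiles with tail index 0.5 on a regular grid.
n, gamma = 1000, 0.5
pareto_grid = [((n + 1 - i) / (n + 1)) ** (-gamma) for i in range(1, n + 1)]
estimate = pickands_process(pareto_grid, k=50, s=0.5)
```

With `s=0.5` this reduces to the classical Pickands estimate, which uses the
order statistics at ranks k, 2k and 4k from the top.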
This technical report accompanies the manuscript "Conditional Modeling and
the Jitter Method of Spike Re-sampling." It contains further details, comments,
references, and equations concerning various simulations and data analyses
presented in that manuscript, as well as a self-contained Mathematical Appendix
that provides a formal treatment of jitter-based spike re-sampling methods.
We propose a Bayesian nonparametric approach to the problem of jointly
modeling multiple related time series. Our approach is based on the discovery
of a set of latent, shared dynamical behaviors. Using a beta process prior, the
size of the set and the sharing pattern are both inferred from data. We develop
efficient Markov chain Monte Carlo methods based on the Indian buffet process
representation of the predictive distribution of the beta process, without
relying on a truncated model.
We are concerned in this paper with the functional asymptotic behaviour of
the sequence of stochastic processes $T_{n}(f)=\sum_{j=1}^{k}f(j)(\log
X_{n-j+1,n}-\log X_{n-j,n})$, indexed by some classes $\mathcal{F}$ of functions
$f:\mathbb{N} \backslash \{0\} \longmapsto \mathbb{R}_{+}$ and where $k=k(n)$
satisfies $1\leq k\leq n$, $k/n\rightarrow 0$ as $n\rightarrow \infty$. This is a
functional generalized Hill process that includes many new estimators of the
extremal index when $F$ is in the extremal domain.
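The link to the Hill estimator can be checked numerically. The sketch below computes $T_{n}(f)$ for a user-supplied $f$ and compares the choice $f(j)=j$ with the usual Hill estimator, since summation by parts gives $T_{n}(f)/k$ equal to the Hill estimator in that case. The function names are hypothetical, and the code requires $k<n$ so that $X_{n-k,n}$ exists.

```python
import math


def generalized_hill(sample, k, f):
    """T_n(f) = sum_{j=1}^{k} f(j) * (log X_{n-j+1,n} - log X_{n-j,n})."""
    x = sorted(sample)  # x[i - 1] holds X_{i,n}; all values must be positive
    n = len(x)
    if not 1 <= k < n:
        raise ValueError("need 1 <= k < n")
    return sum(
        f(j) * (math.log(x[n - j]) - math.log(x[n - j - 1]))
        for j in range(1, k + 1)
    )


def hill(sample, k):
    """Classical Hill estimator based on the k largest observations."""
    x = sorted(sample)
    n = len(x)
    mean_log = sum(math.log(x[n - j]) for j in range(1, k + 1)) / k
    return mean_log - math.log(x[n - k - 1])
```

For example, `generalized_hill(sample, k, lambda j: j) / k` agrees with `hill(sample, k)` up to rounding error.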
We give an overview of several aspects arising in the statistical analysis of
extreme risks with actuarial applications in view. In particular it is
demonstrated that empirical process theory is a very powerful tool, both for
the asymptotic analysis of extreme value estimators and to devise tools for the
validation of the underlying model assumptions. While the focus of the paper is
on univariate tail risk analysis, the basic ideas of the analysis of the
extremal dependence between different risks are also outlined.
In the Bayesian community, an ongoing imperative is to develop efficient
algorithms. An appealing approach is to form a hybrid algorithm by combining
ideas from competing existing techniques. This paper addresses issues in
designing hybrid methods by considering selected case studies: the delayed
rejection algorithm, the pinball sampler, the Metropolis adjusted Langevin
algorithm, and the population Monte Carlo algorithm. We observe that even if
each component of a hybrid algorithm has individual strengths, they may not
contribute equally or even positively when they are combined.
We present a model of voting behaviour based on a version of aggregated
overdispersed multinomial distributions; relative to a similar model by
\citet{BP86}, our model is based on more realistic assumptions and free from
certain shortcomings of the previous model.
Consider a linear regression model with n-dimensional response vector,
regression parameter $\beta = (\beta_1, \ldots, \beta_p)$ and independent and
identically $N(0, \sigma^2)$ distributed errors. Suppose that the parameter of
interest is $\theta = a^T \beta$ where $a$ is a specified vector. Define the
parameter $\tau = c^T \beta - t$ where $c$ and $t$ are specified and $a$ and $c$ are
linearly independent. Also suppose that we have uncertain prior information
that $\tau = 0$.
The model evidence is a vital quantity in the comparison of statistical
models under the Bayesian paradigm. This paper presents a review of commonly
used methods. We outline some guidelines and offer some practical advice. The
reviewed methods are compared for two examples; non-nested Gaussian linear
regression and covariate subset selection in logistic regression.
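As a point of reference for what the model evidence is, the sketch below estimates it in the simplest possible way: by averaging the likelihood over draws from the prior. The toy model (one observation $y \mid \theta \sim N(\theta, 1)$ with prior $\theta \sim N(0, 1)$) and all function names are assumptions chosen so that the exact answer, the $N(0, 2)$ density at $y$, is available for comparison. This is not one of the review's own examples.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of N(mean, var) at x."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def evidence_prior_mc(y, n_draws=20000, seed=0):
    """Naive Monte Carlo evidence: average likelihood over prior draws."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        theta = rng.gauss(0.0, 1.0)         # draw from the prior N(0, 1)
        total += normal_pdf(y, theta, 1.0)  # likelihood of y given theta
    return total / n_draws


def evidence_exact(y):
    """Closed form for this toy model: marginally y ~ N(0, 2)."""
    return normal_pdf(y, 0.0, 2.0)
```

Prior sampling works here because the prior and the likelihood overlap well. When the prior is diffuse relative to the likelihood, most draws contribute almost nothing to the average, which is one reason more refined estimators are used.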
In this work, we establish novel connections between the Bayesian
nonparametric clustering and featural paradigms by considering the problem of
admixture modeling. We examine the Dirichlet process (and its unnormalized
Poisson point process generation via the gamma process) on the traditional
clustering side of Bayesian nonparametrics. On the featural side, we examine
the beta process and introduce a new model, the beta negative binomial process
(BNBP), for admixture modeling.
For many decades, statisticians have made attempts to prepare the Bayesian
omelette without breaking the Bayesian eggs; that is, to obtain probabilistic
likelihood-based inferences without relying on informative prior distributions.
A recent example is Murray Aitkin's book, {\em Statistical Inference},
which presents an approach to statistical hypothesis testing based on
comparisons of posterior distributions of likelihoods under competing models.
Aitkin develops and illustrates his method using some simple examples of
inference from iid data and two-way tests of independence.
In genetic association analyses, it is often desired to analyze data from
multiple potentially-heterogeneous subgroups. The amount of expected
heterogeneity can vary from modest (as might typically be expected in a
meta-analysis of multiple studies of the same phenotype, for example), to large
(e.g. a strong gene-environment interaction, where the environmental exposure
defines discrete subgroups). Here, we consider a flexible set of Bayesian
models and priors that can capture these different levels of heterogeneity.
This paper introduces a general framework of covariance structures that can
be verified in many popular statistical models, such as factor and random
effect models. The new structure is a summation of low rank and sparse
matrices. We propose a LOw Rank and sparsE Covariance estimator (LOREC) to
exploit this general structure in the high-dimensional setting. Analysis of
this estimator shows that it recovers exactly the rank and support of the two
components respectively. Convergence rates under various norms are also
presented.
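The structure can be written out as follows. The symbols $L$, $S$, $B$, $\Psi$, $\hat{\Sigma}_n$, $\lambda$ and $\rho$ are notation assumed here for illustration. The factor-model identity is standard; the penalized least squares program is one natural convex formulation of a low rank plus sparse estimator and is not quoted from the abstract.

```latex
\begin{align*}
\Sigma &= L + S, \qquad \operatorname{rank}(L) = r \ll p, \quad S \text{ sparse}, \\
\Sigma &= B B^{T} + \Psi
  \quad \text{(factor model: } L = B B^{T},\ S = \Psi \text{ diagonal)}, \\
(\hat{L}, \hat{S}) &\in \arg\min_{L,\,S}\;
  \tfrac{1}{2}\,\bigl\| L + S - \hat{\Sigma}_n \bigr\|_{F}^{2}
  + \lambda\, \| L \|_{*} + \rho\, | S |_{1}.
\end{align*}
```

Here $\hat{\Sigma}_n$ denotes the sample covariance matrix, $\|\cdot\|_{*}$ the nuclear norm, which encourages low rank, and $|\cdot|_{1}$ the elementwise $\ell_1$ norm, which encourages sparsity.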
Large-scale multiple testing problems require the simultaneous assessment of
many p-values. This paper compares several methods to assess the evidence in
multiple binomial counts of p-values: the maximum of the binomial counts after
standardization (the `higher-criticism statistic'), the maximum of the binomial
counts after a log-likelihood ratio transformation (the `Berk-Jones
statistic'), and a newly introduced average of the binomial counts after a
likelihood ratio transformation.
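For orientation, the sketch below computes a higher-criticism statistic in one widely used form. The count of p-values at or below $t$ is Binomial$(n, t)$ under the global null, so it is standardized and evaluated at each ordered p-value $t = p_{(i)}$. The exact range over which the maximum is taken varies between authors; taking it over all ordered p-values strictly inside $(0, 1)$ is an assumption made here, and the function name is hypothetical.

```python
import math


def higher_criticism(pvalues):
    """Maximum standardized binomial count over the ordered p-values."""
    p_sorted = sorted(pvalues)
    n = len(p_sorted)
    best = -math.inf
    for i, p in enumerate(p_sorted, start=1):
        if not 0.0 < p < 1.0:
            continue  # the standardization is undefined at 0 or 1
        # i of the n p-values are <= p; under the null that count has
        # mean n * p and variance n * p * (1 - p).
        z = math.sqrt(n) * (i / n - p) / math.sqrt(p * (1.0 - p))
        best = max(best, z)
    return best
```

A few very small p-values among many unremarkable ones produce a large value, which is the sparse-signal situation this kind of statistic is designed to detect.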
We consider the problem of estimating multiple related but distinct graphical
models on the basis of a high-dimensional data set with observations that
belong to distinct classes. A motivating example occurs in the analysis of gene
expression data for tissue samples with and without cancer. In this case, we
might wish to estimate a gene expression network for the normal tissue and a
gene expression network for the tumor tissue.
Gaussian factor models have proven widely useful for parsimoniously
characterizing dependence in multivariate data. There is a rich literature on
their extension to mixed categorical and continuous variables, using latent
Gaussian variables or through generalized latent trait models accommodating
measurements in the exponential family. However, when generalizing to
non-Gaussian measured variables the latent variables typically influence both
the dependence structure and the form of the marginal distributions,
complicating interpretation and introducing artifacts.
Extremely contagious, acute, immunizing childhood infections like measles can
exhibit spatiotemporal dynamics that depend on the nature of spatial contagion
and spatiotemporal variations in population structure and demography. We study
a metapopulation model for regional measles dynamics that uses a gravity
coupling model and a time series susceptible-infected-recovered (TSIR) model
for local dynamics.
We consider the problem of estimating a mean shape from a set of J planar
configurations described by a sequence of k landmarks. We study the consistency
of a smoothed Procrustean mean when the observations obey a deformable model
including some nuisance parameters such as random translations, rotations and
scaling. The main contribution of the paper is to analyze the influence of the
dimension k of the data and of the number J of observed configurations on the
convergence of the smoothed Procrustean estimator to the mean pattern of the
model.
The generalizations of instantaneous frequency and instantaneous bandwidth to
a bivariate signal are derived. These are uniquely defined whether the signal
is represented as a pair of real-valued signals, or as one analytic and one
anti-analytic signal. A nonstationary but oscillatory bivariate signal has a
natural representation as an ellipse whose properties evolve in time, and this
representation provides a simple geometric interpretation for the bivariate
instantaneous moments.
We introduce a theoretical framework for performing statistical hypothesis
testing simultaneously over a fairly general, possibly uncountably infinite,
set of null hypotheses. This extends the standard statistical setting for
multiple hypotheses testing, which is restricted to a finite set. This work is
motivated by numerous modern applications where the observed signal is modeled
by a stochastic process over a continuum. As a measure of type I error, we
extend the concept of false discovery rate (FDR) to this setting.
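For comparison with the finite setting that this framework extends, the sketch below implements the Benjamini-Hochberg step-up procedure, the standard FDR-controlling method for a finite set of p-values. It illustrates the finite case only and is not the continuum procedure of the paper; the function name is hypothetical.

```python
def benjamini_hochberg(pvalues, q=0.05):
    """Return the sorted indices of hypotheses rejected at FDR level q."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        # Step-up rule: keep the largest rank whose p-value is under the line.
        if pvalues[idx] <= rank * q / m:
            cutoff = rank
    return sorted(order[:cutoff])
```

Because the rule is step-up, a p-value that misses its own threshold can still be rejected when a larger p-value meets a later threshold, as in `benjamini_hochberg([0.03, 0.04, 0.045])`, which rejects all three hypotheses.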
When a posterior distribution has multiple modes, unconditional expectations,
such as the posterior mean, may not offer informative summaries of the
distribution. Motivated by this problem, we propose to decompose the sample
space of a multimodal distribution into domains of attraction of local modes.
Domain-based representations are defined to summarize the probability masses of
and conditional expectations on domains of attraction, which are much more
informative than the mean and other unconditional expectations.
In statistical modeling, the Akaike information criterion (AIC) is a
widely known and extensively used tool for model choice. The $\phi$-divergence
test statistic is a recently developed tool for statistical model selection.
The popularity of the divergence criterion is, however, tempered by its known
lack of robustness in small samples. In this paper the penalized minimum
Hellinger distance type statistics are considered and some properties are
established.
The purpose of this paper is to propose methodologies for statistical
inference of low-dimensional parameters with high-dimensional data. We focus on
constructing confidence intervals for individual coefficients and linear
combinations of several of them in a linear regression model, although our
ideas are applicable in a much broader context. The theoretical results presented
here provide sufficient conditions for the asymptotic normality of the proposed
estimators along with a consistent estimator for their finite-dimensional
covariance matrices.
In data mining, it is usual to describe a set of individuals using some
summaries (means, standard deviations, histograms, confidence intervals) that
generalize individual descriptions into a typology description. In this case,
data can be described by several values. In this paper, we propose an approach
for computing basic statistics for such data, and, in particular, for data
described by numerical multi-valued variables (interval, histograms, discrete
multi-valued descriptions). We propose to treat all numerical multi-valued
variables as distributional data, i.e.
The proposed smooth blockwise iterative thresholding estimator (SBITE) is a
model selection technique defined as a fixed point reached by iterating a
likelihood gradient-based thresholding function. The smooth James-Stein
thresholding function has two regularization parameters $\lambda$ and $\nu$,
and a smoothness parameter $s$. It enjoys smoothness like ridge regression and
selects variables like lasso.
Network data often take the form of repeated interactions between senders and
receivers tabulated over time. A primary question to ask of such data is which
traits and behaviors are predictive of interaction. To answer this question, a
model is introduced for treating directed interactions as a multivariate point
process: a Cox multiplicative intensity model using covariates that depend on
the history of the process.
We study properties of the Fisher distribution (von Mises-Fisher distribution,
matrix Langevin distribution) on the rotation group SO(3). In particular we
apply the holonomic gradient descent, introduced by Nakayama et al. (2011), and
a method of series expansion for evaluating the normalizing constant of the
distribution and for computing the maximum likelihood estimate. The rotation
group can be identified with the Stiefel manifold of two orthonormal vectors.
Therefore from the viewpoint of statistical modeling, it is of interest to
compare Fisher distributions on these manifolds.
The Statistics Consortium at the University of Maryland, College Park, hosted
a two-day workshop on Bayesian Methods that Frequentists Should Know during
April 30--May 1, 2008. The event was co-sponsored by the Institute of
Mathematical Statistics (IMS), Office of Research and Methodology, National
Center for Health Statistics, Survey Research Methods Section (SRMS) of the
American Statistical Association, and Washington Statistical Society.
This paper is devoted to the theory and application of a novel class of
models for binary data, which we call log-mean linear (LML) models. The
characterizing feature of these models is that they are specified by linear
constraints on the LML parameter, defined as a log-linear expansion of the mean
parameter of the multivariate Bernoulli distribution. We show that marginal
independence relationships between variables can be specified by setting
certain LML interactions to zero and, more specifically, that graphical models
of marginal independence are LML models.
We investigate a robust penalized logistic regression algorithm based on a
minimum distance criterion. Influential outliers are often associated with the
explosion of parameter vector estimates, but in the context of standard
logistic regression, the bias due to outliers always causes the parameter
vector to implode, that is, to shrink towards the zero vector. Thus, using
LASSO-like penalties to perform variable selection in the presence of outliers
can result in missed detections of relevant covariates.
Choice models, which capture popular preferences over objects of interest,
play a key role in making decisions whose eventual outcome is impacted by human
choice behavior. In most scenarios, the choice model, which can effectively be
viewed as a distribution over permutations, must be learned from observed data.
The observed data, in turn, may frequently be viewed as (partial, noisy)
information about marginals of this distribution over permutations.
Random projection is widely used as a method of dimension reduction. In
recent years, its combination with standard techniques of regression and
classification has been explored. Here we examine its use with principal
component analysis (PCA) and subspace detection methods. Specifically, we show
that, under appropriate conditions, with high probability the magnitude of the
residuals of a PCA analysis of randomly projected data behaves comparably to
that of the residuals of a similar PCA analysis of the original data.
This paper proposes a strategy for regularized estimation in multi-way
contingency tables, which are common in meta-analyses and multi-center clinical
trials. Our approach is based on data augmentation, and appeals heavily to a
novel class of Polya-Gamma distributions. Our main contributions are to build
up the relevant distributional theory and to demonstrate three useful features
of this data-augmentation scheme.
Consider a case-control study in which the aim is to assess the effect of a
factor on disease occurrence. We suppose that this factor is dichotomous. Also
suppose that the data consists of two strata, each stratum summarized by a
two-by-two table. A commonly-proposed two-stage analysis of this type of data
is the following. We carry out a preliminary test of homogeneity of the
stratum-specific odds ratios. If the null hypothesis of homogeneity is accepted
then we find a confidence interval for the assumed common value (across strata)
of the odds ratio.
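The two-stage procedure described above can be sketched as follows. The specific ingredients are assumptions made for illustration: a Woolf-type z-test on the difference of the two log odds ratios for the homogeneity stage, and an inverse-variance pooled log odds ratio for the interval. The abstract does not say which homogeneity test or interval is used, and the function names are hypothetical.

```python
import math


def log_odds_ratio(table):
    """Return (log OR, Woolf variance) for a 2x2 table ((a, b), (c, d))."""
    (a, b), (c, d) = table
    return math.log(a * d / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def two_stage_ci(table1, table2, alpha=0.05, z_crit=1.96):
    """Stage 1: test homogeneity. Stage 2: CI for the common odds ratio.

    Returns None when homogeneity is rejected at level alpha.
    """
    l1, v1 = log_odds_ratio(table1)
    l2, v2 = log_odds_ratio(table2)
    z = (l1 - l2) / math.sqrt(v1 + v2)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    if p_value < alpha:
        return None
    w1, w2 = 1.0 / v1, 1.0 / v2  # inverse-variance weights
    pooled = (w1 * l1 + w2 * l2) / (w1 + w2)
    half_width = z_crit / math.sqrt(w1 + w2)
    return math.exp(pooled - half_width), math.exp(pooled + half_width)
```

The interval in the second stage is computed as if no preliminary test had taken place, so its nominal 95% level need not hold for the two-stage procedure as a whole.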
In sparse regression modeling with regularization such as the lasso, elastic
net and bridge regression, it is important to select appropriate values of
tuning parameters including regularization parameters. The choice of tuning
parameters can be viewed as a model selection and evaluation problem. The
degrees of freedom, which leads to Mallows' $C_p$ criterion, plays a key role
in the theory of model selection. In the present paper, we propose an efficient
algorithm which computes the degrees of freedom sequentially by extending the
generalized path seeking (GPS) algorithm.
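For reference, the covariance definition of the degrees of freedom and one common normalization of the resulting $C_p$ criterion are given below. The notation ($y$ for the response, $\hat{\mu}$ for the fitted values, $\sigma^{2}$ for the error variance) is assumed here and may differ from the paper's by constants or scaling.

```latex
\begin{equation*}
\mathrm{df}(\hat{\mu}) = \frac{1}{\sigma^{2}} \sum_{i=1}^{n}
  \operatorname{Cov}(\hat{\mu}_i, y_i),
\qquad
C_p = \| y - \hat{\mu} \|^{2} + 2 \sigma^{2}\, \mathrm{df}(\hat{\mu}).
\end{equation*}
```

For an ordinary least squares fit on $p$ linearly independent covariates, $\mathrm{df}(\hat{\mu}) = p$ and the criterion reduces to the residual sum of squares plus $2\sigma^{2}p$.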
We propose a class of scale mixture of uniform distributions to generate
shrinkage priors for the covariance matrix. This new class of priors enjoys a
number of advantages over the traditional scale mixture of normal priors,
including its simplicity in characterizing the prior density based on its
first-order derivative and computational efficiency based on a Gibbs sampler.
We first discuss the theory and computational details of this new approach for
the covariance matrix estimation.
We propose new affine invariant tests for multivariate normality, based on
independence characterizations of the sample moments of the normal
distribution. The test statistics are obtained using canonical correlations
between sets of sample moments, generalizing the Lin-Mudholkar test for
normality. The tests are compared to some popular tests based on Mardia's
skewness and kurtosis measures in an extensive simulation power study and are
found to offer higher power against many of the alternatives.
In randomized trials, researchers are often interested in mediation analysis
to understand how a treatment works, in particular how much of a treatment's
effect is mediated by an intermediate variable and how much the treatment
directly affects the outcome not through the mediator. The standard regression
approach to mediation analysis assumes sequential ignorability of the mediator,
that is, that the mediator is effectively randomly assigned given baseline
covariates and the randomized treatment.
This paper introduces a new approach to analysing spatial point data
clustered along or around a system of curves or `fibres'. Such data arise in
catalogues of galaxy locations, recorded locations of earthquakes, aerial
images of minefields, and pore patterns on fingerprints. Finding the underlying
curvilinear structure of these point-pattern data sets may not only facilitate
a better understanding of how they arise but also aid reconstruction of missing
data. We base the space of fibres on the set of integral lines of an
orientation field.
Many epidemic models approximate social contact behavior by assuming random
mixing within mixing groups (e.g., homes, schools, workplaces). The effect of
more realistic social network structure on epidemic parameter estimates is an
open area of exploration. We develop a statistical model to estimate the social
contact network within a high school using friendship network data and a
contact survey. Our model includes classroom structure and longer and more
frequent contacts to friends than non-friends, based on reports in the contact
survey.
A conditional independence graph is a concise representation of pairwise
conditional independence among many variables. We propose Graphical Random
Forests (GRaFo) for estimating pairwise conditional independence relationships
among mixed-type, i.e. continuous and discrete, variables. The number of edges
is a tuning parameter in any graphical model estimator and there is no obvious
number that constitutes a good choice. Stability Selection helps choose this
parameter with respect to a bound on the expected number of false positives
(error control).
A novel method is developed to jointly estimate regression curves, with
application to studying trait relationships in evolutionary biology. The adaptive
evolution model is built on a coupled system of Ornstein-Uhlenbeck processes.
Our method is then applied to a set of ecological data and it is compared with
the recent regression method established in [9].
A dynamic treatment regime effectively incorporates both accrued information
and long-term effects of treatment from specially designed clinical trials. As
these become more and more popular in conjunction with longitudinal data from
clinical studies, the development of statistical inference for optimal dynamic
treatment regimes is a high priority.
This paper investigates an open issue related to false discovery rate (FDR)
control of step-up-down (SUD) multiple testing procedures. It has been
established in earlier literature that for this type of procedure, under some
broad conditions, and in an asymptotic sense, the FDR is maximum when the
signal strength under the alternative is maximum. In other words, so-called
"Dirac uniform configurations" are asymptotically {\em least favorable} in this
setting.
A function based nonlinear least squares estimation (FNLSE) method is
proposed and investigated in parameter estimation of Jelinski-Moranda software
reliability model. FNLSE extends the potential fitting functions of traditional
least squares estimation (LSE), and takes the logarithm transformed nonlinear
least squares estimation (LogLSE) as a special case.
In this paper we introduce a statistical model based on a permanental process
for supervised classification problems. Unlike much research work in the
literature, we assume only exchangeability instead of independence on
observations. Regardless of the number of classes or the dimension of the
feature variables, the model may require only 2-3 parameters for fitting the
covariance structure within clusters. It works well even if each class occupies
non-convex, disjoint regions, or regions overlapped with other classes in the
feature space.