Polynomial Chaos Expansions represent a powerful tool to simulate stochastic
models of dynamical systems. Yet, deriving the expansion's coefficients for
complex systems might require a significant and non-trivial manipulation of the
model, or the computation of large numbers of simulation runs, rendering the
approach too time consuming and impracticable for applications with more than a
handful of random variables. We introduce a novel computationally tractable
technique for computing the coefficients of polynomial chaos expansions.
Many problems arising in applications result in the need to probe a
probability distribution for functions. Examples include Bayesian nonparametric
statistics and conditioned diffusion processes. Standard MCMC algorithms
typically become arbitrarily slow under the mesh refinement dictated by
nonparametric description of the unknown function. We describe an approach to
modifying a whole range of MCMC methods which ensures that their speed of
convergence is robust under mesh refinement.
In this paper, we discuss an extension of the Split Hamiltonian Monte Carlo
(Split HMC) method for Gaussian process model (GPM). This method is based on
splitting the Hamiltonian in a way that allows much of the movement around the
state space to be done at low computational cost. To this end, we approximate
the negative log density (i.e., the energy function) of the distribution of
interest by a quadratic function U0 for which Hamiltonian dynamics can be
solved analytically. The overall energy function U is then written as U0 + U1,
where U1 is the approximation error.
In this article we consider Bayesian parameter inference associated to
partially-observed stochastic processes that start from a set B0 and are
stopped or killed at the first hitting time of a known set A. Such processes
occur naturally within the context of a wide variety of applications. The
associated posterior distributions are highly complex and posterior parameter
inference requires the use of advanced Markov chain Monte Carlo (MCMC)
techniques. Our approach uses a recently introduced simulation methodology,
particle Markov chain Monte Carlo (PMCMC) (Andrieu et. al.
Regularization is widely used in statistics and machine learning to prevent
overfitting and gear solution towards prior information. In general, a
regularized estimation problem minimizes the sum of a loss function and a
penalty term. The penalty term is usually weighted by a tuning parameter and
encourages certain constraints on the parameters to be estimated. Particular
choices of constraints lead to the popular lasso, fused-lasso, and other
generalized $l_1$ penalized regression methods.
In recent years, a rich variety of regularization procedures have been
proposed for high dimensional regression problems. However, tuning parameter
choice and computational efficiency in ultra-high dimensional problems remain
vexing issues. The routine use of $\ell_1$ regularization is largely
attributable to the computational efficiency of the LARS algorithm, but similar
efficiency for better behaved penalties has remained elusive. In this article,
we propose a highly efficient path following procedure for combination of any
convex loss function and a broad class of penalties.
In this paper we describe the basic features of the Bergm package for the
open-source R software which provides a comprehensive framework for Bayesian
analysis for exponential random graph models: tools for parameter estimation,
model selection and goodness-of-fit diagnostics. We illustrate the capabilities
of this package describing the algorithms that drive the package through a
tutorial analysis of two well-known network datasets.
This paper builds on recent research that focuses on regression modeling of
continuous bounded data, such as proportions measured on a continuous scale.
Specifically, it deals with beta regression models with mixed effects from a
Bayesian approach.
Exponential random graph models are a class of widely used exponential family
models for social networks. The topological structure of an observed network is
modeled by the relative prevalence of a set of local sub-graph configurations
termed network statistics. One of the key tasks in the application of these
models is which network statistics to include in the model. This can be thought
of as statistical model selection problem.
We improve the Modified Winitzki's Approximation of the error function
$erf(x)\cong \sqrt{1-e^{-x^2\frac{\frac{4}{\pi}+0.147x^2}{1+0.147x^2}}}$ which
has error $|\varepsilon (x)| < 1.25 \cdot 10^{-4}$ $\forall x \ge 0$ till
reaching 4 decimals of precision with $|\varepsilon (x)| < 2.27 \cdot 10^{-5}$;
also reducing slightly the relative error. Old formula and ours are both
explicitly invertible, essentially solving a biquadratic equation, after
obvious substitutions.
This paper addresses the problem of summarizing the posterior distributions
that typically arise, in a Bayesian framework, when dealing with signal
decomposition problems with unknown number of components. Such posterior
distributions are defined over union of subspaces of differing dimensionality
and can be sampled from using modern Monte Carlo techniques, for instance the
increasingly popular RJ-MCMC method. No generic approach is available, however,
to summarize the resulting variable-dimensional samples and extract from them
component-specific parameters.
Reversible jump MCMC (RJ-MCMC) sampling techniques, which allow to jointly
tackle model selection and parameter estimation problems in a coherent Bayesian
framework, have become increasingly popular in the signal processing literature
since the seminal paper of Andrieu and Doucet (IEEE Trans. Signal Process.,
47(10), 1999). Crucial to the implementation of any RJ-MCMC sampler is the
computation of the so-called Metropolis-Hastings-Green (MHG) ratio, which
determines the acceptance probability for the proposed moves.
Sequential Monte Carlo (SMC) methods, also known as particle filters, are
simulation-based recursive algorithms for the approximation of the a posteriori
probability measures generated by state-space dynamical models. At any given
(discrete) time t, a particle filter produces a set of samples over the state
space of the system of interest. These samples are referred to as "particles"
and can be used to build a discrete approximation of the a posteriori
probability distribution of the state, conditional on a sequence of available
observations.
This paper introduces a new specialized algorithm for equilibrium Monte Carlo
sampling of binary-valued systems, which allows for large moves in the state
space. This is achieved by constructing self-avoiding walks (SAWs) in the state
space. As a consequence, many bits are flipped in a single MCMC step. We name
the algorithm SARDONICS, an acronym for Self-Avoiding Random Dynamics on
Integer Complex Systems. The algorithm has several free parameters, but we show
that Bayesian optimization can be used to automatically tune them.
Hamiltonian Monte Carlo (HMC) is a Markov chain Monte Carlo (MCMC) algorithm
that avoids the random walk behavior and sensitivity to correlated parameters
that plague many MCMC methods by taking a series of steps informed by
first-order gradient information. These features allow it to converge to
high-dimensional target distributions much more quickly than simpler methods
such as random walk Metropolis or Gibbs sampling. However, HMC's performance is
highly sensitive to two user-specified parameters: a step size {\epsilon} and a
desired number of steps L.
The accurate asymptotic evaluation of marginal likelihood integrals is a
fundamental problem in Bayesian statistics. Following the approach introduced
by Watanabe, we translate this into a problem of computational algebraic
geometry, namely, to determine the real log canonical threshold of a polynomial
ideal, and we present effective methods for solving this problem. Our results
are based on resolution of singularities, and they apply to all statistical
models for discrete data that admit a parametrization by real analytic
functions.
Bundle adjustment (BA) is the problem of refining a visual reconstruction to
produce better structure and viewing parameter estimates. This problem is often
formulated as a nonlinear least squares problem, where data arises from
interest point matching. Mismatched interest points cause serious problems in
this approach, as a single mismatch will affect the entire reconstruction. In
this paper, we propose a novel robust Student's t BA algorithm (RST-BA).
In this paper we introduce a new method for performing computational
inference on log-Gaussian Cox processes (LGCP). Contrary to current practice,
we do not approximate by a counting process on a partition of the domain, but
rather attack the point process likelihood directly. In order to do this, we
use the continuously specified Markovian random fields introduced by
\citet{Lindgren2011}. The inference is performed using the \texttt{R-INLA}
package of \citet{art451}, which allows us to perform fast approximate
inference on quite complicated models.
We extend Tropp's analysis of Orthogonal Matching Pursuit (OMP) using the
Exact Recovery Condition (ERC) to a first exact recovery analysis of Orthogonal
Least Squares (OLS). We show that when ERC is met, OLS is guaranteed to exactly
recover the unknown support. Moreover, we provide a closer look at the analysis
of both OMP and OLS when ERC is not fulfilled. We show that there exist
dictionaries for which some subsets are never recovered with OMP. This
phenomenon, which also appears with $\ell_1$ minimization, does not occur for
OLS.
This paper considers the topic of finding prior distributions when a major
component of the statistical model depends on a nonlinear function. Using
results on how to construct uniform distributions in general metric spaces, we
propose a prior distribution that is uniform in the space of functional shapes
of the underlying nonlinear function and then back-transform to obtain a prior
distribution for the original model parameters. The primary application
considered in this article is nonlinear regression, but the idea might be of
interest beyond this case.
Methods for choosing a fixed set of knot locations in additive spline models
are fairly well established in the statistical literature. While most of these
methods are in principle directly extendable to non-additive surface models,
they are likely to be less successful in that setting because of the curse of
dimensionality, especially when there are more than a couple of covariates.
Estimation of small failure probabilities is one of the most important and
challenging computational problems in reliability engineering. The failure
probability is usually given by an integral over a high-dimensional uncertain
parameter space that is difficult to evaluate numerically. This paper focuses
on enhancements to Subset Simulation (SS), proposed by Au and Beck, which
provides an efficient algorithm based on MCMC (Markov chain Monte Carlo)
simulation for computing small failure probabilities for general
high-dimensional reliability problems.
There are two main algorithms for fitting constrained marginal models to
discrete data available in the literature. We show that they are equivalent,
each being advantageous in different circumstances, and give some results on
their convergence properties. Extensions of the method are provided to allow
the inclusion of individual covariates, and for maximization under
$L_1$-penalties.
Recently, Andrieu, Doucet and Holenstein (2010) introduced a general
framework for using particle filters (PFs) to construct proposal kernels for
Markov chain Monte Carlo (MCMC) methods. This framework, termed Particle Markov
chain Monte Carlo (PMCMC), was shown to provide powerful methods for joint
Bayesian state and parameter inference in nonlinear/non-Gaussian state-space
models. However, the mixing of the resulting MCMC kernels can be quite
sensitive, both to the number of particles used in the underlying PF and to the
number of observations in the data.
Let $K$ be a closed convex polyhedron defined by a finite number of linear
inequalities. In this paper we refine the theory of abstract tubes (Naiman and
Wynn, 1997) associated with $K$ when $K$ is perturbed. In particular, we focus
on the perturbation that is lexicographic and in an outer direction. An
algorithm for constructing the abstract tube by means of linear programming and
its implementation are discussed.
This article presents an algorithm that generates an exact (conservative)
confidence interval of a specified length and coverage probability for the
power of a Monte Carlo test (such as a bootstrap or permutation test). It is
the first method that achieves this aim for almost any Monte Carlo test.
The problem of rank-one approximation of symmetric tensors is important in
independent components analysis, also known as blind source separation, as well
as polynomial optimization. The rank-one approximation problem reduces to
finding the principal Z eigenvector, and an effective algorithm was recently
discovered for computing symmetric tensor eigenvectors.
Using an asymmetric Laplace distribution, which provides a mechanism for
Bayesian inference of quantile regression models, we develop a fully Bayesian
approach to fitting single-index models in conditional quantile regression. In
this work, we use a Gaussian process prior for the unknown nonparametric link
function and a Laplace distribution on the index vector, with the latter
motivated by the recent popularity of the Bayesian lasso idea. We design a
Markov chain Monte Carlo algorithm for posterior inference.
We present an iterative sampling method which delivers upper and lower
bounding processes for the Brownian path. We develop such processes with
particular emphasis on being able to unbiasedly simulate them on a personal
computer. The dominating processes converge almost surely in the supremum and
$L_1$ norms.
Under multiplicative drift and other regularity conditions, it is established
that the asymptotic variance associated with a particle filter approximation of
the prediction filter is bounded uniformly in time, and the non-asymptotic,
relative variance associated with the particle approximation of the normalizing
constant is bounded linearly in time. The conditions are demonstrated to hold
for some hidden Markov models on non-compact state spaces.
Statistical analysis of max-stable processes used to model spatial extremes
has been limited by the difficulty in calculating the joint likelihood
function. This precludes all standard likelihood-based approaches, including
Bayesian approaches. In this paper we present a Bayesian approach through the
use of approximate Bayesian computing. This circumvents the need for a joint
likelihood function by instead relying on simulations from the (unavailable)
likelihood. This method is compared with an alternative approach based on the
composite likelihood.
We propose an L1-penalized algorithm for fitting high-dimensional generalized
linear mixed models. Generalized linear mixed models (GLMMs) can be viewed as
an extension of generalized linear models for clustered observations. This
Lasso-type approach for GLMMs should be mainly used as variable screening
method to reduce the number of variables below the sample size. We then suggest
a refitting by maximum likelihood based on the selected variables only. This is
an effective correction to overcome problems stemming from the variable
screening procedure which are more severe with GLMMs.
While statisticians are well-accustomed to performing exploratory analysis in
the modeling stage of an analysis, the notion of conducting preliminary
general-purpose exploratory analysis in the Monte Carlo stage (or more
generally, the model-fitting stage) of an analysis is an area which we feel
deserves much further attention. Towards this aim, this paper proposes a
general-purpose algorithm for automatic density exploration.
Extending recent work of Corrado, we derive an algorithm that computes
rigorous upper and lower bounds for rectangle scan probabilities for Markov
increments. We experimentally examine the closeness of the bounds computed by
the algorithm and we examine the range of tractable input variables.
We present a new approach to Bayesian inference that entirely avoids Markov
chain simulation, by constructing a map that pushes forward the prior measure
to the posterior measure. Existence and uniqueness of a suitable
measure-preserving map is established by formulating the problem in the context
of optimal transport theory. We discuss various means of explicitly
parameterizing the map and computing it efficiently through solution of an
optimization problem, exploiting gradient information from the forward model
when possible.
Sturmfels offered 100 Swiss Francs in 2005 to a conjecture, which deals with
a special case of the maximum likelihood estimation for a latent class model.
This paper confirms the conjecture positively.
Accuracy and Availability computations for a GNSS System - or combination of
Systems - through Service Volume Simulations take considerable time. Therefore,
the computation of the accuracy in 2D and 3D are often simplified by an
approximate solution. The drawback is that such simplifications can lead to
accuracy results that are too conservative (up to 25% in the 2D case and up to
43% in the 3D case, for a 95% confidence level), which in turn translates into
pessimistic System Availability.
This article establishes sufficient conditions for a linear-in-time bound on
the non-asymptotic variance of particle approximations of time-homogeneous
Feynman-Kac formulae. These formulae appear in a wide variety of applications
including option pricing in finance and risk sensitive control in engineering.
In direct Monte Carlo approximation of these formulae, the non-asymptotic
variance typically increases at an exponential rate in the time parameter.
In this paper, we develop a new method to sample from posterior distributions
in hierarchical models without using Markov chain Monte Carlo. This method is
generally applicable to high-dimensional models involving large data sets.
Illustrative analysis exemplifies the ease with which one could implement our
method, which results in independent samples from the posterior distributions
of interest.
High-dimensional data with hundreds of thousands of observations are becoming
commonplace in many disciplines. The analysis of such data poses many
computational challenges, especially when the observations are correlated over
time and/or across space. In this paper we propose flexible hierarchical
regression models for analyzing such data that accommodate serial and/or
spatial correlation. We address the computational challenges involved in
fitting these models by adopting an approximate inference framework.
We address the issue of knots selection for Gaussian predictive process
methodology. Predictive process approximation provides an effective solution to
the cubic order computational complexity of Gaussian process models. This
approximation crucially depends on a set of points, called knots, at which the
original process is retained, while the rest is approximated via a
deterministic extrapolation. Knots should be few in number to keep the
computational complexity low, but provide a good coverage of the process domain
to limit approximation error.
Many models of interest in the natural and social sciences have no
closed-form likelihood function, which means that they cannot be treated using
the usual techniques of statistical inference. In the case where such models
can be efficiently simulated, Bayesian inference is still possible thanks to
the Approximate Bayesian Computation (ABC) algorithm. Although many refinements
have since been suggested, the technique suffers from three major shortcomings.
First, it requires introducing a vector of "summary statistics", the choice of
which is arbitrary and may lead to strong biases.
The correlation of the result lists provided by search engines is fundamental
and it has deep and multidisciplinary ramifications. Here, we present automatic
and unsupervised methods to assess whether or not search engines provide
results that are comparable or correlated. We have two main contributions:
First, we provide evidence that for more than 80% of the input queries -
independently of their frequency - the two major search engines share only
three or fewer URLs in their search results, leading to an increasing
divergence.
Given a decision process based on the approximate probability density
function returned by a data assimilation algorithm, an interaction level
between the decision making level and the data assimilation level is designed
to incorporate the information held by the decision maker into the data
assimilation process. Here the information held by the decision maker is a loss
function at a decision time which maps the state space onto real numbers which
represent the threat associated with different possible outcomes or states.
The paper builds upon a recent approach to find the approximate bounds of a
real function using Polynomial Chaos expansions. Given a function of random
variables with compact support probability distributions, the intuition is to
quantify the uncertainty in the response using Polynomial Chaos expansion and
discard all the information provided about the randomness of the output and
extract only the bounds of its compact support.
For nearly any challenging scientific problem evaluation of the likelihood is
problematic if not impossible. Approximate Bayesian computation (ABC) allows us
to employ the whole Bayesian formalism to problems where we can use simulations
from a model, but cannot evaluate the likelihood directly.
computation (ABC). Here we discuss how to construct the perturbation kernels
that are required in ABC-SMC approaches, in order to construct a set of
distributions that start out from a suitably defined prior and converge towards
the unknown posterior. We derive optimality criteria for different kernels,
which are based on the Kullback-Leibler divergence between intermediate
distributions and the posterior distribution. We will show that for many
complicated posterior distributions locally adapted kernels tend to show the
best performance.
We show how the Hamiltonian Monte Carlo algorithm can sometimes be speeded up
by "splitting" the Hamiltonian in a way that allows much of the movement around
the state space to be done at low computational cost. One context where this is
possible is when the log density of the distribution of interest (the potential
energy function) can be written as the log of a Gaussian density, which is a
quadratic function, plus a slowly varying function. Hamiltonian dynamics for
quadratic energy functions can be analytically solved.
In this article we propose a novel MCMC method based on deterministic
transformations T : X x D --> X where X is the state-space and D is some set
which may or may not be a subset of X. We refer to our new methodology as
Transformation-based Markov chain Monte Carlo (TMCMC).
The Mat\'ern covariance function is a popular choice for modeling dependence
in spatial environmental data. Standard Mat\'ern covariance models are,
however, often computationally infeasible for large data sets. In this work,
recent results for Markov approximations of Gaussian Mat\'ern fields based on
Hilbert space approximations are extended using wavelet basis functions. These
Markov approximations are compared with two of the most popular methods for
efficient covariance approximations; covariance tapering and the process
convolution method.
The entropy is one of the most applicable uncertainty measures in many
statistical and en- gineering problems. In statistical literature, the entropy
is used in calculation of the Kullback- Leibler (KL) information which is a
powerful mean for performing goodness of fit tests. Ranked Set Sampling (RSS)
seems to provide improved estimators of many parameters of the popu- lation in
the huge studied problems in the literature. It is developed for situations
where the variable of interest is difficult or expensive to measure, but where
ranking in small sub-samples is easy.
The adaptive Metropolis (AM) algorithm of Haario, Saksman and Tamminen
[Bernoulli 7 (2001) 223-242] uses the estimated covariance of the target
distribution in the proposal distribution. This paper introduces a new robust
adaptive Metropolis algorithm estimating the shape of the target distribution
and simultaneously coercing the acceptance rate. The adaptation rule is
computationally simple adding no extra cost compared with the AM algorithm.
Reversible jump Markov chain Monte Carlo (RJMCMC) extends ordinary MCMC
methods for use in Bayesian multimodel inference. We show that RJMCMC can be
implemented as Gibbs sampling with alternating updates of a model indicator and
a vector-valued "palette" of parameters denoted $\bm \psi$. Like an artist uses
the palette to mix dabs of color for specific needs, we create model-specific
parameters from the set available in $\bm \psi$.
In order to compute the log-likelihood for high dimensional spatial Gaussian
models, it is necessary to compute the determinant of the large, sparse,
symmetric positive definite precision matrix, Q. Traditional methods for
evaluating the log-likelihood for very large models may fail due to the massive
memory requirements. We present a novel approach for evaluating such
likelihoods when the matrix-vector product, Qv, is fast to compute. In this
approach we utilise matrix functions, Krylov subspaces, and probing vectors to
construct an iterative method for computing the log-likelihood.
A new methodology for model determination in decomposable graphical Gaussian
models is developed. The Bayesian paradigm is used and, for each given graph, a
hyper inverse Wishart prior distribution on the covariance matrix is
considered. This prior distribution depends on hyper-parameters. It is
well-known that the models's posterior distribution is sensitive to the
specification of these hyper-parameters and no completely satisfactory method
is registered.
In multivariate nonparametric analysis, sparseness of the covariates also
called curse of dimensionality, forces one to use large smoothing parameters.
This leads to a biased smoother. Instead of focusing on optimally selecting the
smoothing parameter, we fix it to some reasonably large value to ensure an
over-smoothing of the data. The resulting base smoother has a small variance
but a substantial bias. In this paper, we propose an R package named ibr to
iteratively correct the initial bias of the (base) estimator by an estimate of
the bias obtained by smoothing the residuals.
We recast importance sampling in terms of minimizing a Monte Carlo estimate
of the Kullback-Leibler divergence over a well-chosen exponential family. We
further examine two alternative Kullback-Leibler approximations that do not
require the target distribution to be normalized. One of these makes the
variational problem equivalent to normalized importance sampling, while the
other leads to a new moment estimation technique that works remarkably well if
the target distribution is reasonably close to the approximating family even
though the proposal distribution is mismatched.
The expectation-maximization (EM) algorithm introduced by Dempster et al in
1977 is a very general method to solve maximum likelihood estimation problems.
In this informal report, we review the theory behind EM as well as a number of
EM variants, suggesting that beyond the current state of the art is an even
much wider territory still to be discovered.
Approximate Bayesian computation (ABC) is a class of algorithmic methods in
Bayesian inference using statistical summaries and computer simulations. ABC
has become popular in evolutionary genetics and in other branches of biology.
However model selection under ABC algorithms has been a subject of intense
debate during the recent years. Here we propose novel approaches to model
selection based on posterior predictive distributions and approximations of the
deviance.
This paper studies fundamental aspects of modelling data using multivariate
Watson distributions. Although these distributions are natural for modelling
axially symmetric data (i.e., unit vectors where $\pm \x$ are equivalent), for
high-dimensions using them can be difficult. Why so? Largely because for Watson
distributions even basic tasks such as maximum-likelihood are numerically
challenging. To tackle the numerical difficulties some approximations have been
derived---but these are either grossly inaccurate in high-dimensions
(\emph{Directional Statistics}, Mardia & Jupp.
Importance sampling is a common technique for Monte Carlo approximation,
including Monte Carlo approximation of p-values. Here it is shown that a simple
correction of the usual importance sampling p-values creates valid p-values,
meaning that a hypothesis test created by rejecting the null when the p-value
is <= alpha will also have a type I error rate <= alpha. This correction uses
the importance weight of the original observation, which gives valuable
diagnostic information under the null hypothesis.
We describe a dynamic programming algorithm for exact counting and exact
uniform sampling of matrices with specified row and column sums. The algorithm
runs in polynomial time when the column sums are bounded. Binary or
non-negative integer matrices are handled. The method is distinguished by
applicability to non-regular margins, tractability on large matrices, and the
capacity for exact sampling.
We present a computational approach for generating Markov bases for multi-way
contin- gency tables whose cells counts might be constrained by fixed marginals
and by lower and upper bounds. Our framework includes tables with structural
zeros as a particular case. In- stead of computing the entire Markov basis in
an initial step, our framework finds sets of local moves that connect each
table in the reference set with a set of neighbor tables. We construct a Markov
chain on the reference set of tables that requires only a set of local moves at
each iteration.
Recently popularized randomized methods for principal component analysis
(PCA) efficiently and reliably produce nearly optimal accuracy --- even on
parallel processors --- unlike the classical (deterministic) alternatives. We
adapt one of these randomized methods for use with data sets that are too large
to be stored in random-access memory (RAM). (The traditional terminology is
that our procedure works efficiently "out-of-core.") We illustrate the
performance of the algorithm via several numerical examples.
This paper addresses finite sample stability properties of sequential Monte
Carlo methods for approximating sequences of probability distributions. The
results presented herein are applicable in the scenario where the start and end
distributions in the sequence are fixed and the number of intermediate steps is
a parameter of the algorithm. Under assumptions which hold on non-compact
spaces, it is shown that the effect of the initial distribution decays
exponentially fast in the number of intermediate steps and the corresponding
stochastic error is stable in \mathbb{L}_{p} norm.
We investigate the stability of a Sequential Monte Carlo (SMC) method applied
to the problem of sampling from a target distribution on R^d for large d. It is
well known that, using a single importance sampling step, one produces an
approximation for the target distribution that deteriorates as the dimension
$d$ increases, unless the number of Monte Carlo samples $N$ increases at an
exponential rate in d.
Many least squares problems involve affine equality and inequality
constraints. Although there are variety of methods for solving such problems,
most statisticians find constrained estimation challenging. The current paper
proposes a new path following algorithm for quadratic programming based on
exact penalization. Similar penalties arise in $l_1$ regularization in model
selection. Classical penalty methods solve a sequence of unconstrained problems
that put greater and greater stress on meeting the constraints.
The Laplace approximation is an old, but frequently used method to
approximate integrals for Bayesian calculations. In this paper we develop an
extension of the Laplace approximation, by applying it iteratively to the
residual, i.e., the difference between the current approximation and the true
function. The final approximation is thus a linear combination of multivariate
normal densities, where the coefficients are chosen to achieve a good fit to
the target distribution.
An off-line handwritten alphabetical character recognition system using
multilayer feed forward neural network is described in the paper. A new method,
called, diagonal based feature extraction is introduced for extracting the
features of the handwritten alphabets. Fifty data sets, each containing 26
alphabets written by various people, are used for training the neural network
and 570 different handwritten alphabetical characters are used for testing.
In the series of our earlier papers on the subject, we proposed a novel
statistical hypothesis testing method for detection of objects in noisy images.
The method uses results from percolation theory and random graph theory. We
developed algorithms that allowed to detect objects of unknown shapes in the
presence of nonparametric noise of unknown level and of unknown distribution.
No boundary shape constraints were imposed on the objects, only a weak bulk
condition for the object's interior was required. Our algorithms have linear
complexity and exponential accuracy.
Global sensitivity analysis is often impracticable for complex and time
demanding numerical models, as it requires a large number of runs. The
reduced-basis approach provides a way to replace the original model by a much
faster to run code. In this paper, we are interested in the information loss
induced by the approximation on the estimation of sensitivity indices. We
present a method to provide a robust error assessment, hence enabling
significant time savings without sacrifice on precision and rigourousness.
We propose and develop a novel and effective perfect sampling methodology for
simulating from posteriors corresponding to mixtures with either known (fixed)
or unknown number of components. For the latter we consider the Dirichlet
process-based mixture model developed by these authors, and show that our ideas
are applicable to conjugate, and importantly, to non-conjugate cases.
Online (also called "recursive" or "adaptive") estimation of fixed model
parameters in hidden Markov models is a topic of much interest in times series
modelling. In this work, we propose an online parameter estimation algorithm
that combines two key ideas. The first one, which is deeply rooted in the
Expectation-Maximization (EM) methodology consists in reparameterizing the
problem using complete-data sufficient statistics. The second ingredient
consists in exploiting a purely recursive form of smoothing in HMMs based on an
auxiliary recursion.
Kernel density estimation (KDE) is a popular statistical technique for
estimating the underlying density distribution with minimal assumptions.
Although they can be shown to achieve asymptotic estimation optimality for any
input distribution, cross-validating for an optimal parameter requires
significant computation dominated by kernel summations. In this paper we
present an improvement to the dual-tree algorithm, the first practical kernel
summation algorithm for general dimension. Our extension is based on the
series-expansion for the Gaussian kernel used by fast Gauss transform.
We consider various versions of adaptive Gibbs and Metropolis- within-Gibbs
samplers, which update their selection probabilities (and per- haps also their
proposal distributions) on the y during a run, by learning as they go in an
attempt to optimise the algorithm.We present a cautionary example of how even a
simple-seeming adaptive Gibbs sampler may fail to converge.We then present
various positive results guaranteeing convergence of adaptive Gibbs samplers
under certain conditions.
The Nummellin's split chain construction allows to decompose a Markov chain
Monte Carlo (MCMC) trajectory into i.i.d. "excursions". RegenerativeMCMC
algorithms based on this technique use a random number of samples. They have
been proposed as a promising alternative to usual fixed length simulation [25,
33, 14]. In this note we derive nonasymptotic bounds on the mean square error
(MSE) of regenerative MCMC estimates via techniques of renewal theory and
sequential statistics. These results are applied to costruct confidence
intervals.
Approximate Bayesian computation (ABC), also known as likelihood-free
methods, have become a favourite tool for the analysis of complex stochastic
models, primarily in population genetics but also in financial analyses. We
advocated in Grelaud et al.
A goal of systems biology is to understand the dynamics of intracellular
systems. Stochastic chemical kinetic models are often utilized to accurately
capture the stochastic nature of these systems due to low numbers of molecules.
Collecting system data allows for estimation of stochastic chemical kinetic
rate parameters. We describe a well-known, but typically impractical data
augmentation Markov chain Monte Carlo algorithm for estimating these
parameters.
Clustering with fast algorithms large samples of high dimensional data is an
important challenge in computational statistics. Borrowing ideas from MacQueen
(1967) who introduced a sequential version of the $k$-means algorithm, we
propose in this paper a new class of recursive stochastic gradient algorithms
designed for the $k$-medians loss criterion. By their recursive nature, these
algorithms are very fast and are well adapted to deal with large samples of
data that are allowed to arrive sequentially.
We consider the generic problem of performing sequential Bayesian inference
in a state-space model with observation process $(y_{t})$, state process
$(x_{t})$ and fixed parameter $\theta$. An idealized approach would be to apply
the \emph{iterated batch importance sampling} (IBIS) algorithm of
\citet{Chopin:IBIS}. This is a sequential Monte Carlo algorithm \emph{in the
This paper describes a method for estimating the marginal likelihood or Bayes
factors of Bayesian models using non-parametric importance sampling ("arrogance
sampling"). This method can also be used to compute the normalizing constant of
probability distributions. Because the required inputs are samples from the
distribution to be normalized and the scaled density at those samples, this
method may be a convenient replacement for the harmonic mean estimator. The
method has been implemented in the open source R package margLikArrogance.
Also known as likelihood-free methods, approximate Bayesian computational
(ABC) methods have appeared in the past ten years as the most satisfactory
approach to untractable likelihood problems, first in genetics then in a
broader spectrum of applications. However, these methods suffer to some degree
from calibration difficulties that make them rather volatile in their
implementation and thus render them suspicious to the users of more traditional
Monte Carlo methods.
The Adaptive Multiple Importance Sampling (AMIS) algorithm is aimed at an
optimal recycling of past simulations in an iterated importance sampling
scheme. The difference with earlier adaptive importance sampling
implementations like Population Monte Carlo is that the importance weights of
all simulated values, past as well as present, are recomputed at each
iteration, following the technique of the deterministic multiple mixture
estimator of Owen and Zhou (2000).
Many applications in the field of statistics require Markov chain Monte Carlo
methods. Determining appropriate starting values and run lengths can be both
analytically and empirically challenging. A desire to overcome these problems
has led to the development of exact, or perfect, sampling algorithms which
convert a Markov chain into an algorithm that produces i.i.d.\ samples from the
stationary distribution. Unfortunately, very few of these algorithms have been
developed for the intractable distributions that arise in statistical
applications, which typically have uncountable support.
A general purpose variance reduction technique for Markov chain Monte Carlo
estimators based on the zero-variance principle introduced in the physics
literature by Assaraf and Caffarel (1999, 2003), is proposed. Conditions for
unbiasedness of the zero-variance estimator are derived. A central limit
theorem is also proved under regularity conditions. The potential of the new
idea is illustrated with real applications to Bayesian inference for probit,
logit and GARCH models.
The performance of principal component analysis (PCA) suffers badly in the
presence of outliers. This paper proposes two novel approaches for robust PCA
based on semidefinite programming. The first method, maximum mean absolute
deviation rounding (MDR), seeks directions of large spread in the data while
damping the effect of outliers. The second method produces a low-leverage
decomposition (LLD) of the data that attempts to form a low-rank model for the
data by separating out corrupted observations. This paper also presents
efficient computational methods for solving these SDPs.
L1 -penalized regression methods such as the Lasso (Tibshirani 1996) that
achieve both variable selection and shrinkage have been very popular. An
extension of this method is the Fused Lasso (Tibshirani and Wang 2007), which
allows for the incorporation of external information into the model. In this
article, we develop new and fast algorithms for solving the Fused Lasso which
are based on coordinate-wise optimization. This class of algorithms has
recently been applied very successfully to solve L1 -penalized problems very
quickly (Friedman et al. 2007).
Network structures are reconstructed from dynamical data by respectively
naive mean field (nMF) and Thouless-Anderson-Palmer (TAP) approximations. For
TAP approximation, we use two methods to reconstruct the network: a) iteration
method; b) casting the inference formula to a set of cubic equations and
solving it directly. We investigate inference of the asymmetric Sherrington-
Kirkpatrick (S-K) model using asynchronous update. The solutions of the sets
cubic equation depend of temperature T in the S-K model, and a critical
temperature Tc is found around 2.1.
This paper proposes approaches for the analysis of multiple changepoint
models when dependency in the data is modelled through a hierarchical Gaussian
Markov random field model. Integrated nested Laplace approximations are used to
approximate data quantities, and an approximate filtering recursions approach
is proposed for savings in compuational cost when detecting changepoints. All
of these methods are simulation free. Analysis of real data demonstrates the
usefulness of the approach in general.
The shrinking rank method is a variation of slice sampling that is efficient
at sampling from multivariate distributions with highly correlated parameters.
It requires that the gradient of the log-density be computable. At each
individual step, it approximates the current slice with a Gaussian occupying a
shrinking-dimension subspace. The dimension of the approximation is shrunk
orthogonally to the gradient at rejected proposals, since the gradients at
points outside the current slice tend to point towards the slice.
In this paper, we study the alternating direction method for finding the
Dantzig selectors, which are first introduced in [8]. In particular, at each
iteration we apply the nonmonotone gradient method proposed in [17] to
approximately solve one subproblem of this method. We compare our approach with
a first-order method proposed in [3]. The computational results show that our
approach usually outperforms that method in terms of CPU time while producing
solutions of comparable quality.
We introduce a Bayesian extension of the latent block model for model-based
block clustering of data matrices. Our approach considers a block model where
block parameters may be integrated out. The result is a posterior defined over
the number of clusters in rows and columns and cluster memberships. The number
of row and column clusters need not be known in advance as these are sampled
along with cluster memberhips using Markov chain Monte Carlo.
This paper presents a Markov chain Monte Carlo method to generate approximate
posterior samples in retrospective multiple changepoint problems where the
number of changes is not known in advance. The method uses conjugate models
whereby the marginal likelihood for the data between consecutive changepoints
is tractable. Inclusion of hyperpriors gives a near automatic algorithm
providing a robust alternative to popular filtering recursions approaches in
cases which may be sensitive to prior information. Three real examples are used
to demonstrate the proposed approach.
Switching state-space models (SSSM) are a very popular class of time series
models that have found many applications in statistics, econometrics and
advanced signal processing. Bayesian inference for these models typically
relies on Markov chain Monte Carlo (MCMC) techniques. However, even
sophisticated MCMC methods dedicated to SSSM can prove quite inefficient as
they update potentially strongly correlated discrete-valued latent variables
one-at-a-time (Carter and Kohn, 1996; Gerlach et al., 2000; Giordani and Kohn,
2008).
The EM algorithm is a widely used methodology for penalized likelihood
estimation. Provable monotonicity and convergence are the hallmarks of the EM
algorithm and these properties are well established for smooth likelihood and
smooth penalty functions. However, many relaxed versions of variable selection
penalties are not smooth. The goal of this paper is to introduce a new class of
Space Alternating Penalized Kullback Proximal extensions of the EM algorithm
for nonsmooth likelihood inference.
This is a collection of discussions of `Riemann manifold Langevin and
Hamiltonian Monte Carlo methods" by Girolami and Calderhead, to appear in the
Journal of the Royal Statistical Society, Series B.
We present a very fast algorithm for general matrix factorization of a data
matrix for use in the statistical analysis of high-dimensional data via latent
factors. Such data are prevalent across many application areas and generate an
ever-increasing demand for methods of dimension reduction in order to undertake
the statistical analysis of interest. Our algorithm uses a gradient-based
approach which can be used with an arbitrary loss function provided the latter
is differentiable.
This paper describes four methods for estimating autocorrelation time and
evaluates these methods with a test set of seven series. Fitting an
autoregressive process appears to be the most accurate method of the four. An R
package is provided for extending the comparison to more methods and test
series.
This technical report is the union of two contributions to the discussion of
the Read Paper "Riemann manifold Langevin and Hamiltonian Monte Carlo methods"
by B. Calderhead and M. Girolami, presented in front of the Royal Statistical
Society on October 13th 2010 and to appear in the Journal of the Royal
Statistical Society Series B. The first comment establishes a parallel and
possible interactions with Adaptive Monte Carlo methods.
In this paper, we show how a complete and exact Bayesian analysis of a
parametric mixture model is possible in some cases when components of the
mixture are taken from exponential families and when conjugate priors are used.
This restricted set-up allows us to show the relevance of the Bayesian approach
as well as to exhibit the limitations of a complete analysis, namely that it is
impossible to conduct this analysis when the sample size is too large, when the
data are not from an exponential family, or when priors that are more complex
than conjugate priors are used.
In Fern\'andez-Dur\'an (2004), a new family of circular distributions based
on nonnegative trigonometric sums (NNTS models) is developed. Because the
parameter space of this family is the surface of the hypersphere, an efficient
Newton-like algorithm on manifolds is generated in order to obtain the maximum
likelihood estimates of the parameters.
This paper considers the problem of using MCMC to fit sparse Bayesian models
based on normal scale-mixture priors. Examples of this framework include the
Bayesian LASSO and the horseshoe prior. We study the usefulness of parameter
expansion (PX) for improving convergence in such models, which is notoriously
slow when the global variance component is near zero. Our conclusion is that
parameter expansion does improve matters in LASSO-type models, but only
modestly.
We study quasi-Newton methods from the viewpoint of information geometry
induced associated with Bregman divergences. Fletcher has studied a variational
problem which derives the approximate Hessian update formula of the
quasi-Newton methods. We point out that the variational problem is identical to
optimization of the Kullback-Leibler divergence, which is a discrepancy measure
between two probability distributions. The Kullback-Leibler divergence for the
multinomial normal distribution corresponds to the objective function Fletcher
has considered.
We propose an extension of quasi-Newton methods, and investigate the
convergence and the robustness properties of the proposed update formulae for
the approximate Hessian matrix. Fletcher has studied a variational problem
which derives the approximate Hessian update formula of the quasi-Newton
methods. We point out that the variational problem is identical to optimization
of the Kullback-Leibler divergence, which is a discrepancy measure between two
probability distributions.
In this paper, we consider the implications of the fact that parallel
raw-power can be exploited by a generic Metropolis--Hastings algorithm if the
proposed values are independent. In particular, we present improvements to the
independent Metropolis--Hastings algorithm that significantly decrease the
variance of any estimator derived from the MCMC output, for a null computing
cost since those improvements are based on a fixed number of target density
evaluations.
The method of tempered transitions was proposed by Neal (1996) for tackling
the difficulties arising when using Markov chain Monte Carlo to sample from
multimodal distributions. In common with methods such as simulated tempering
and Metropolis-coupled MCMC, the key idea is to utilise a series of
successively easier to sample distributions to improve movement around the
state space. Tempered transitions does this by incorporating moves through
these less modal distributions into the MCMC proposals.
Stochastic differential equations (SDEs) are established tools to model
physical phenomena whose dynamics are affected by random noise. By estimating
parameters of an SDE intrinsic randomness of a system around its drift can be
identified and separated from the drift itself. When it is of interest to model
dynamics within a given population, i.e. to model simultaneously the
performance of several experiments or subjects, mixed-effects modelling allows
for the distinction of between and within experiment variability.
This paper deals with the problem of estimating the volume of the excursion
set of a function $f:\mathbb{R}^d \to \mathbb{R}$ above a given threshold,
under a probability measure on $\mathbb{R}^d$ that is assumed to be known. In
the industrial world, this corresponds to the problem of estimating a
probability of failure of a system. When only an expensive-to-simulate model of
the system is available, the budget for simulations is usually severely limited
and therefore classical Monte Carlo methods ought to be avoided.
In the following paper we consider a simulation technique for stochastic
trees. One of the most important areas in computational genetics is the
calculation and subsequent maximization of the likelihood function associated
to such models. This typically consists of using importance sampling (IS) and
sequential Monte Carlo (SMC) techniques. The approach proceeds by simulating
the tree, backward in time from observed data, to a most recent common ancestor
(MRCA). However, in many cases, the computational time and variance of
estimators are often too high to make standard approaches useful.
We explore the use of the optimal statistical interpolation (OSI) data
assimilation method for the statistical tracking of emerging epidemics and to
study the spatial dynamics of a disease. The epidemic models that we used for
this study are spatial variants of the common susceptible-infectious-removed
(S-I-R) compartmental model of epidemiology. The spatial S-I-R epidemic model
is illustrated by application to simulated spatial dynamic epidemic data from
the historic "Black Death" plague of 14th century Europe.
The problem of matching unlabelled point sets using Bayesian inference is
considered. Two recently proposed models for the likelihood are compared, based
on the Procrustes size-and-shape and the full configuration. Bayesian inference
is carried out for matching point sets using Markov chain Monte Carlo
simulation. An improvement to the existing Procrustes algorithm is proposed
which improves convergence rates, using occasional large jumps in the burn-in
period.
We propose an approximation of the asymptotic variance that removes a certain
discontinuity in the formula for the raw and the smoothed periodogram in case a
data taper is used. It is based on an approximation of the covariance of the
(tapered) periodogram at two arbitrary frequencies.
In this paper we study interior point (IP) methods for solving optimal design
problems. In particular, we propose a primal IP method for solving the problems
with general convex optimality criteria and establish its global convergence.
In addition, we reformulate the problems with A-, D- and E-criterion into
linear or log-determinant semidefinite programs (SDPs) and apply standard
primal-dual IP solvers such as SDPT3 [21,25] to solve the resulting SDPs. We
also compare the IP methods with the widely used multiplicative algorithm
introduced by Silvey et al. [18].
We consider optimal non-sequential designs for a large class of (linear and
nonlinear) regression models involving polynomials and rational functions with
heteroscedastic noise also given by a polynomial or rational weight function.
The proposed method generates a polynomial whose zeros are the support points
of the optimal approximate design, and generalizes a number of previously known
results of the same flavor.
The parameters of a discrete stationary Markov model are transition
probabilities between states. Traditionally, data consists in sequences of
observed states for a given number of individuals over the whole observation
period. In such a case, the estimation of transition probabilities is
straightforwardly made by counting one-step moves from a given state to
another. In many real-life problems, however, the inference is much more
difficult as state sequences are not fully observed, namely the state of each
individual is known only for some given values of the time variable $t$.
Estimating parameters of continuous-time linear birth-death-immigration
processes, observed discretely at unevenly spaced time points, is a recurring
theme in statistical analyses of population dynamics. Viewing this task as a
missing data problem, we develop two novel expectation-maximization (EM)
algorithms. When the rate of immigration is either zero or proportional to the
birth rate, we use Kendall's generating function method to reduce the E-step of
the EM algorithm, as well as calculation of the Fisher information, to one
dimensional integration.
We present a simulation methodology for Bayesian estimation of rate
parameters in Markov jump processes arising for example in stochastic kinetic
models. To handle the problem of missing components and measurement errors in
observed data, we embed the Markov jump process into the framework of a general
state space model. We do not use diffusion approximations. Markov chain Monte
Carlo and particle filter type algorithms are introduced, which allow sampling
from the posterior distribution of the rate parameters and the Markov jump
process also in data-poor scenarios.
In this article we propose multiplication based random walk Metropolis
Hastings (MH) algorithm on the real line. We call it the random dive MH (RDMH)
algorithm. This algorithm, even if simple to apply, was not studied earlier in
Markov chain Monte Carlo literature. The associated kernel is shown to have
standard properties like irreducibility, aperiodicity and Harris recurrence
under some mild assumptions. These ensure basic convergence (ergodicity) of the
kernel. Further the kernel is shown to be geometric ergodic for a large class
of target densities on $\mathbb{R}$.
An overview is provided for the problem of calculating Bayes factors (BF) for
model selection using methods based on variants of importance sampling and
variants of Laplace Approximation, specifically path sampling (PS). These
methods are compared on several simulated and two real life data set. It is
noted that there are unexpected instabilities in PS and we provide both a
theoretical explanation and suggestions for improvement. Estimates of the same
BF may differ widely across methods.