We present a new sequential Monte Carlo sampler for coalescent-based
Bayesian hierarchical clustering. Our model is appropriate for modeling
non-i.i.d. data and offers a substantial reduction in computational cost
compared with the original sampler, without resorting to approximations. We also
propose an approximation with quadratic complexity that, in practice, shows almost
no loss in performance compared with its exact counterpart.
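To make the setting concrete, the following is a minimal sketch of a generic SMC sampler over Kingman coalescent trees. It assumes one-dimensional data, unit-rate Brownian-motion diffusion along branches, and a proposal that draws the merge pair and waiting time from the coalescent prior. This is a simple prior-proposal baseline, not the cheaper sampler or the quadratic approximation described above, and the function `smc_coalescent` and its parameters are hypothetical.

```python
import numpy as np


def smc_coalescent(x, num_particles=100, seed=0):
    """Hypothetical SMC sketch over Kingman coalescent trees for 1-D data x.

    Merge pairs and waiting times are drawn from the coalescent prior, so each
    incremental weight is the local Brownian-motion likelihood of the merge.
    The prior on the root location is ignored (treated as flat).
    """
    rng = np.random.default_rng(seed)
    n = len(x)
    # Each particle is a list of active nodes (message mean, message variance, height).
    particles = [[(float(xi), 0.0, 0.0) for xi in x] for _ in range(num_particles)]
    heights = np.zeros(num_particles)
    log_evidence = 0.0
    for _ in range(n - 1):
        log_w = np.zeros(num_particles)
        for p in range(num_particles):
            nodes = particles[p]
            k = len(nodes)
            # Kingman prior: with k lineages the next merge occurs after Exp(rate k(k-1)/2).
            heights[p] += rng.exponential(2.0 / (k * (k - 1)))
            h = heights[p]
            i, j = rng.choice(k, size=2, replace=False)  # uniform pair, as in the prior
            (m1, v1, h1), (m2, v2, h2) = nodes[i], nodes[j]
            # Propagate each child's message up to the merge height by Brownian motion.
            s1, s2 = v1 + (h - h1), v2 + (h - h2)
            # Local likelihood that the two subtrees agree at the parent.
            var = s1 + s2
            log_w[p] = -0.5 * (np.log(2 * np.pi * var) + (m1 - m2) ** 2 / var)
            # Parent message: product of the two Gaussian messages.
            v = 1.0 / (1.0 / s1 + 1.0 / s2)
            m = v * (m1 / s1 + m2 / s2)
            particles[p] = [nd for q, nd in enumerate(nodes) if q not in (i, j)] + [(m, v, h)]
        # Accumulate the log marginal-likelihood estimate and normalize the weights.
        max_lw = log_w.max()
        w = np.exp(log_w - max_lw)
        log_evidence += max_lw + np.log(w.mean())
        w /= w.sum()
        # Multinomial resampling; weights are uniform again afterwards.
        idx = rng.choice(num_particles, size=num_particles, p=w)
        particles = [list(particles[a]) for a in idx]
        heights = heights[idx]
    return particles, log_evidence
```

Because pairs and times come from the prior, the weight update needs only the two merged messages, so each SMC step costs work proportional to the number of particles; better proposals trade extra work per step for lower weight variance.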
Unbiased, label-free proteomics is becoming a powerful technique for
measuring protein expression in almost any biological sample. After
preprocessing, the output of these measurements is a collection of features (tens
to hundreds of thousands) and their associated intensities for each sample. Subsets
of features within the data come from the same peptide, subsets of peptides come
from the same protein, and subsets of proteins belong to the same biological
pathways; therefore, this data can have a very complex and informative
correlational structure.
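The nested feature, peptide, and protein structure can be illustrated with a toy simulation. The generative model below (additive Gaussian noise at each level, two proteins with two peptides each and three features per peptide) is an illustrative assumption rather than a model of real proteomics data, and `simulate_nested_intensities` is a hypothetical name.

```python
import numpy as np


def simulate_nested_intensities(n_samples=2000, seed=0):
    """Toy nested model: feature = peptide level + noise, peptide = protein level + noise."""
    rng = np.random.default_rng(seed)
    columns, labels = [], []
    for protein in range(2):
        protein_level = rng.normal(0.0, 1.0, n_samples)  # variance 1
        for peptide in range(2):
            # Peptide-level deviation with variance 0.5.
            peptide_level = protein_level + rng.normal(0.0, np.sqrt(0.5), n_samples)
            for feature in range(3):
                # Feature-level measurement noise with variance 0.25.
                columns.append(peptide_level + rng.normal(0.0, 0.5, n_samples))
                labels.append((protein, peptide, feature))
    return np.column_stack(columns), labels


X, labels = simulate_nested_intensities()
corr = np.corrcoef(X, rowvar=False)  # 12 x 12 feature correlation matrix
```

Under these variances, features sharing a peptide have an expected correlation of 1.5 / 1.75 (about 0.86), features sharing only a protein about 1 / 1.75 (about 0.57), and features from different proteins are uncorrelated, giving a block structure that mirrors the hierarchy.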
We propose an active set selection framework for Gaussian process
classification for cases in which the dataset is large enough to make inference
on the full dataset prohibitive. Our scheme consists of a two-step alternating
procedure: active set update rules, and hyperparameter optimization based on
marginal likelihood maximization. The active set update rules rely on the ability
of the predictive distributions of a Gaussian process classifier to estimate the
relative contribution of a data point when it is either included in or removed
from the model.
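The alternating procedure can be sketched as follows, assuming a Laplace approximation for logistic GP classification, a squared-exponential kernel whose length-scale is chosen from a small grid by the approximate marginal likelihood, and, as a hypothetical proxy for a point's contribution, the predictive probability of its true label. The inclusion rule adds the inactive point that is predicted worst; the removal rule drops the active point that is predicted best when left out. All function names and scoring rules here are illustrative assumptions, not the framework's actual update rules.

```python
import numpy as np


def rbf(A, B, ell):
    """Squared-exponential kernel with unit signal variance and length-scale ell."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ell**2)


def laplace_fit(X, y, ell, iters=30):
    """Laplace approximation for logistic GP classification with labels in {-1, +1}."""
    n = len(X)
    K = rbf(X, X, ell) + 1e-8 * np.eye(n)
    t = (y + 1) / 2
    f = np.zeros(n)
    for _ in range(iters):  # Newton iterations towards the posterior mode
        pi = 1.0 / (1.0 + np.exp(-f))
        W = pi * (1 - pi)
        sW = np.sqrt(W)
        L = np.linalg.cholesky(np.eye(n) + sW[:, None] * K * sW[None, :])
        b = W * f + (t - pi)
        a = b - sW * np.linalg.solve(L.T, np.linalg.solve(L, sW * (K @ b)))
        f = K @ a
    pi = 1.0 / (1.0 + np.exp(-f))
    sW = np.sqrt(pi * (1 - pi))
    L = np.linalg.cholesky(np.eye(n) + sW[:, None] * K * sW[None, :])
    # Approximate log marginal likelihood, used for hyperparameter selection.
    log_ml = -0.5 * a @ f - np.logaddexp(0, -y * f).sum() - np.log(np.diag(L)).sum()
    return {"X": X, "ell": ell, "pi": pi, "t": t, "sW": sW, "L": L, "log_ml": log_ml}


def predict_proba(model, Xs):
    """Approximate P(y = +1 | x*) using the probit approximation to the logistic."""
    Ks = rbf(model["X"], Xs, model["ell"])
    mean = Ks.T @ (model["t"] - model["pi"])
    v = np.linalg.solve(model["L"], model["sW"][:, None] * Ks)
    var = np.maximum(1.0 - (v**2).sum(0), 0.0)  # k(x*, x*) = 1 for this kernel
    return 1.0 / (1.0 + np.exp(-mean / np.sqrt(1 + np.pi * var / 8)))


def active_set_gpc(X, y, m=20, rounds=10, grid=(0.3, 1.0, 3.0), seed=0):
    """Hypothetical alternating scheme: hyperparameter step, then active set update."""
    rng = np.random.default_rng(seed)
    active = [int(i) for i in rng.choice(len(X), size=m, replace=False)]
    for _ in range(rounds):
        # Step 1: pick the length-scale maximizing the Laplace marginal likelihood.
        model = max((laplace_fit(X[active], y[active], g) for g in grid),
                    key=lambda fit: fit["log_ml"])
        ell = model["ell"]
        # Step 2a: include the inactive point whose true label is predicted worst.
        inactive = np.setdiff1d(np.arange(len(X)), active)
        p = predict_proba(model, X[inactive])
        p_true = np.where(y[inactive] == 1, p, 1 - p)
        active.append(int(inactive[np.argmin(p_true)]))
        # Step 2b: remove the active point best predicted by a model without it.
        scores = []
        for k in range(len(active)):
            rest = active[:k] + active[k + 1:]
            pk = predict_proba(laplace_fit(X[rest], y[rest], ell), X[[active[k]]])[0]
            scores.append(pk if y[active[k]] == 1 else 1 - pk)
        active.pop(int(np.argmax(scores)))
    return laplace_fit(X[active], y[active], ell), active
```

Keeping the active set at a fixed size `m` bounds each fit at cubic cost in `m` rather than in the full dataset size; the leave-one-out removal step is the expensive part and would need a cheaper approximation at scale.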
In this paper, we consider sparse and identifiable linear latent variable
(factor) and linear Bayesian network models for parsimonious analysis of
multivariate data. We propose a computationally efficient method for joint
parameter and model inference and for model comparison. It consists of a fully
Bayesian hierarchy for sparse models using slab-and-spike priors (two-component
mixtures of a delta function and a continuous distribution), non-Gaussian latent
factors, and a stochastic search over the ordering of the variables.
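The two-component prior can be illustrated on a single coefficient. The sketch below assumes a Gaussian slab N(0, tau^2), a point mass at zero, and a scalar linear-Gaussian observation model y = w x + noise; conditional on the latent factors and the other loadings, a Gibbs update for one sparse loading would typically reduce to this computation. The function names and default hyperparameters are hypothetical, and the full hierarchy above (non-Gaussian factors, search over orderings) is not reproduced.

```python
import numpy as np


def sample_spike_slab(shape, pi, tau, rng):
    """Draw weights from (1 - pi) * delta_0 + pi * N(0, tau^2)."""
    included = rng.random(shape) < pi
    return np.where(included, rng.normal(0.0, tau, shape), 0.0)


def inclusion_probability(x, y, pi=0.5, tau=1.0, sigma=1.0):
    """Posterior P(w != 0 | x, y) for y = w * x + N(0, sigma^2) noise."""
    b = x @ y / sigma**2
    lam = 1.0 / tau**2 + x @ x / sigma**2  # posterior precision of w under the slab
    # log p(y | slab) - log p(y | spike), with w integrated out of the slab.
    log_ratio = -0.5 * np.log(tau**2 * lam) + 0.5 * b**2 / lam
    log_odds = np.log(pi) - np.log1p(-pi) + log_ratio
    return 1.0 / (1.0 + np.exp(-log_odds))
```

The delta component lets the posterior place exact zeros on loadings, which is what makes the resulting factor or network structure sparse rather than merely shrunk.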