We present a probabilistic method for linking multiple datafiles. This task
is not trivial in the absence of unique identifiers for the individuals
recorded. This is a common scenario when linking census data to coverage
measurement surveys for census coverage evaluation, and in general when
multiple record systems need to be integrated for subsequent analysis. Our
method generalizes the Fellegi-Sunter theory for linking records from two
datafiles and its modern implementations.
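
For orientation, the two-file Fellegi-Sunter approach that the method generalizes scores a candidate record pair by summing log-likelihood-ratio weights over the compared fields. The sketch below covers only that two-file baseline, not the multi-file method. The field names and the m- and u-probabilities are assumed values chosen for illustration.

```python
import math

# Hypothetical probabilities, assumed for illustration only:
# m = P(field agrees | pair is a true match)
# u = P(field agrees | pair is a non-match)
M_PROBS = {"surname": 0.95, "birth_year": 0.90, "zip": 0.85}
U_PROBS = {"surname": 0.01, "birth_year": 0.05, "zip": 0.10}


def match_weight(rec_a, rec_b):
    """Composite weight: sum of log2 likelihood ratios over the fields."""
    total = 0.0
    for field, m in M_PROBS.items():
        u = U_PROBS[field]
        if rec_a[field] == rec_b[field]:
            total += math.log2(m / u)              # agreement weight (positive)
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement weight (negative)
    return total


a = {"surname": "Lee", "birth_year": 1980, "zip": "15213"}
b = {"surname": "Lee", "birth_year": 1980, "zip": "15217"}
print(match_weight(a, b))
```

Pairs with a high weight would be declared links and pairs with a low weight non-links; the two thresholds are a separate design choice not shown here.
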
Traditional statistical methods for confidentiality protection of statistical
databases do not scale well to GWAS (genome-wide association studies)
databases, especially in terms of guarantees regarding protection from linkage
to external information. The more recent concept of differential privacy,
introduced by the cryptographic community, is an approach which provides a
rigorous definition of privacy with meaningful privacy guarantees in the
presence of arbitrary external information, although the guarantees come at a
serious price in terms of data utility.
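
The trade-off between privacy and utility can be seen in the Laplace mechanism, a standard way to achieve differential privacy for a counting query. It is shown here as a generic sketch and is not necessarily the procedure used in the paper. The function name and the example count are hypothetical.

```python
import random


def private_count(true_count, epsilon, rng):
    """Release a count with epsilon-differential privacy.

    Assumes the query has sensitivity 1, i.e. adding or removing one
    individual changes the count by at most 1.
    """
    scale = 1.0 / epsilon  # Laplace scale = sensitivity / epsilon
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_count + noise


rng = random.Random(0)
print(private_count(120, epsilon=1.0, rng=rng))  # mild noise
print(private_count(120, epsilon=0.1, rng=rng))  # stronger privacy, more noise
```

A smaller epsilon gives a stronger guarantee but a larger noise scale, which is the utility cost mentioned above.
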
Rejoinder of "Bayesian Models and Methods in Public Policy and Government
Settings" by S. E. Fienberg [arXiv:1108.2177]
Starting with the neo-Bayesian revival of the 1950s, many statisticians
argued that it was inappropriate to use Bayesian methods, and in particular
subjective Bayesian methods, in governmental and public policy settings because
of their reliance upon prior distributions. But the Bayesian framework often
provides the primary way to respond to questions raised in these settings, and
the number and diversity of Bayesian applications have grown dramatically in
recent years.
Discussion of "Network routing in a dynamic environment" by N.D. Singpurwalla
[arXiv:1107.4852]
We study maximum likelihood estimation in log-linear models under conditional
Poisson sampling schemes. We derive necessary and sufficient conditions for
existence of the maximum likelihood estimator (MLE) of the model parameters and
investigate estimability of the natural and mean-value parameters under a
non-existent MLE. Our conditions focus on the role of sampling zeros in the
observed table. We situate our results within the general framework of extended
exponential families and we rely in a fundamental way on key geometric
properties of log-linear models.
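
To illustrate why sampling zeros matter, consider a case much simpler than the paper's setting: the independence model for a two-way table under Poisson sampling. There the fitted means have the closed form (row total times column total) / n, and the MLE exists exactly when every row and column total is positive. A single zero cell is therefore harmless, while a zero margin is not. The function names are hypothetical.

```python
def independence_fit(table):
    """Fitted cell means under independence: row_total * col_total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    if n == 0:
        raise ValueError("table has no observations")
    return [[r * c / n for c in col_totals] for r in row_totals]


def mle_exists(table):
    """In this model the MLE exists iff all fitted means are strictly positive."""
    return all(m > 0 for row in independence_fit(table) for m in row)


print(mle_exists([[0, 3], [2, 5]]))  # zero cell, positive margins
print(mle_exists([[0, 0], [2, 5]]))  # first row total is zero
```
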
The deployment of improvised explosive devices (IEDs) along major roadways
has been a favoured strategy of insurgents in recent war zones, both for the
ability to cause damage to targets along roadways at minimal cost and as
a means of controlling the flow of traffic and causing additional expense to
opposing forces.
Introduction to papers on the modeling and analysis of network data
We present a new similarity measure tailored to posts in an online forum. Our
measure takes into account all the available information about user interest
and interaction: the content of posts, the threads in the forum, and the
authors of the posts. We use this post similarity to build a similarity between
users, based on principal coordinate analysis. This allows easy visualization
of the user activity as well. Similarity between users has numerous
applications, such as clustering or classification.
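
The principal coordinate analysis step can be sketched as classical multidimensional scaling of a user-by-user distance matrix. This is a generic sketch that assumes NumPy is available and that user similarities have already been converted to distances; how the paper performs that conversion is not stated here, and the function name `pcoa` is our own.

```python
import numpy as np


def pcoa(dist, n_components=2):
    """Classical principal coordinate analysis of a symmetric distance matrix."""
    d2 = np.asarray(dist, dtype=float) ** 2
    n = d2.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ d2 @ centering     # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(gram)      # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    top_vals = np.clip(eigvals[order], 0.0, None)  # clip negative eigenvalues to zero
    return eigvecs[:, order] * np.sqrt(top_vals)


# Three hypothetical users whose pairwise distances fit on a line.
distances = [[0, 3, 4], [3, 0, 1], [4, 1, 0]]
print(pcoa(distances, n_components=2))
```

Each row of the result gives low-dimensional coordinates for one user, which can be plotted directly or passed to a clustering routine.
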
The p_1 model is a directed random graph model used to describe dyadic
interactions in a social network in terms of effects due to differential
attraction (popularity) and expansiveness, as well as an additional effect due
to reciprocation. In this article we carry out an algebraic statistics analysis
of this model. We show that the p_1 model is a toric model specified by a
multi-homogeneous ideal. We conduct an extensive study of the Markov bases for
p_1 models that explicitly incorporate the constraint arising from
multi-homogeneity.
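
For readers unfamiliar with the p_1 model, the sketch below computes the probabilities of the four possible states of a single dyad. It assumes the usual Holland and Leinhardt parametrization with a constant reciprocity effect; the argument names (theta for density, alpha for expansiveness, beta for attraction, rho for reciprocation) are our own labels, and the dyad-specific normalizing term is handled by dividing by the sum of the weights.

```python
import math


def p1_dyad_probs(theta, alpha_i, beta_i, alpha_j, beta_j, rho):
    """Probabilities of the states (x_ij, x_ji) of one dyad under p_1."""
    log_weights = {
        (0, 0): 0.0,                          # no tie in either direction
        (1, 0): theta + alpha_i + beta_j,     # i sends to j only
        (0, 1): theta + alpha_j + beta_i,     # j sends to i only
        (1, 1): 2 * theta + alpha_i + beta_j + alpha_j + beta_i + rho,  # mutual tie
    }
    norm = sum(math.exp(w) for w in log_weights.values())
    return {state: math.exp(w) / norm for state, w in log_weights.items()}


print(p1_dyad_probs(theta=-1.0, alpha_i=0.5, beta_i=0.0,
                    alpha_j=-0.5, beta_j=0.2, rho=1.0))
```

Dyads are treated as independent in this model, so the likelihood of a whole network is the product of such dyad probabilities.
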