We propose a relaxed privacy definition called {\em random differential
privacy} (RDP). Differential privacy requires that adding any new observation
to a database has only a small effect on the output of the data-release
procedure. Random differential privacy requires that adding a {\em randomly
drawn new observation} to a database has only a small effect on the output. We
show that an analog of the composition property of differentially private
procedures holds for our new definition.
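To make the relaxation concrete, here is a minimal Monte Carlo sketch under assumptions of our own choosing. The data are i.i.d. $N(0,1)$, the released statistic is the sample mean perturbed by Laplace noise of scale $b$, and the neighboring database is formed by appending one fresh draw $X_{n+1}$. For the Laplace mechanism, the worst-case log-ratio of output probabilities between the two databases is exactly $|\bar X_{n+1} - \bar X_n|/b$, where $\bar X_m$ is the mean of the first $m$ observations. The $e^{\epsilon}$ bound therefore holds precisely when this quantity is at most $\epsilon$. The sketch estimates the probability, which we call $\gamma$, that the bound fails over the random data. The mean of unbounded data has infinite global sensitivity, so no finite $b$ makes this mechanism differentially private in the classical sense, yet $\gamma$ can be made small. The function names are hypothetical.

```python
import math

import numpy as np


def rdp_violation_rate(n, eps, b, n_trials=20_000, seed=0):
    """Monte Carlo estimate of the probability that adding one random draw
    breaks the e^eps bound for a Laplace(b)-perturbed sample mean."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n_trials, n))   # each row is a database X_1..X_n
    x_new = rng.standard_normal(n_trials)    # the randomly drawn extra observation
    mean_d = x.mean(axis=1)
    mean_d_plus = (x.sum(axis=1) + x_new) / (n + 1)
    # Worst-case privacy loss of the Laplace mechanism for this pair of databases.
    loss = np.abs(mean_d_plus - mean_d) / b
    return float(np.mean(loss > eps))


def rdp_violation_rate_exact(n, eps, b):
    """Closed form for the same setting: the change in the mean is
    N(0, 1 / (n (n + 1))), so P(|change| > eps * b) is an erfc."""
    return math.erfc(eps * b * math.sqrt(n * (n + 1) / 2))


print(rdp_violation_rate(100, 1.0, 0.02), rdp_violation_rate_exact(100, 1.0, 0.02))
```

The two printed numbers should agree to within Monte Carlo error. Increasing $b$ drives the estimated $\gamma$ toward zero.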
The growing availability of network data and the scientific interest in
distributed systems have led to the rapid development of statistical models of
network structure. Typically, however, these are models for the entire network,
while the data consist only of a sampled sub-network. Parameters for the whole
network, which are the quantities of interest, are estimated by applying the
model to the sub-network. This assumes that the model is consistent under sampling, or,
in terms of the theory of stochastic processes, that it defines a projective
family.
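Projectivity can be checked by brute force on tiny graphs. The sketch below uses a hypothetical example that does not come from the text: an undirected exponential random graph model with $P(g) \propto \exp(\theta\,\mathrm{edges}(g) + \tau\,\mathrm{triangles}(g))$. If this family were projective, the sub-network induced by nodes 0 and 1 of a 3-node graph would follow the 2-node model with the same parameters. That 2-node model reduces to a single edge present with probability $1/(1+e^{-\theta})$, because no triangle is possible on two nodes.

```python
import itertools
import math


def ergm_edge_marginal(theta, tau):
    """P(edge {0,1} present) under P(g) ∝ exp(theta*edges + tau*triangles)
    on 3 nodes, computed by enumerating all 8 graphs."""
    num = 0.0
    z = 0.0
    for e01, e02, e12 in itertools.product((0, 1), repeat=3):
        edges = e01 + e02 + e12
        triangles = e01 * e02 * e12          # 1 only for the complete graph
        w = math.exp(theta * edges + tau * triangles)
        z += w
        if e01:
            num += w
    return num / z


def two_node_edge_prob(theta):
    """Same model on 2 nodes: no triangle is possible, so one Bernoulli edge."""
    return 1.0 / (1.0 + math.exp(-theta))


print(ergm_edge_marginal(0.0, 0.0), two_node_edge_prob(0.0))
print(ergm_edge_marginal(0.0, 1.0), two_node_edge_prob(0.0))
```

With $\tau = 0$ the edges are independent and the two numbers agree. With $\theta = 0$ and $\tau = 1$, the marginal is $(3+e)/(7+e) \approx 0.588$ rather than $0.5$. In this toy case, the triangle term therefore breaks consistency under sampling.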
We study maximum likelihood estimation in log-linear models under conditional
Poisson sampling schemes. We derive necessary and sufficient conditions for
the existence of the maximum likelihood estimator (MLE) of the model parameters
and investigate the estimability of the natural and mean-value parameters when
the MLE does not exist. Our conditions focus on the role of sampling zeros in the
observed table. We situate our results within the general framework of extended
exponential families, and we rely in a fundamental way on key geometric
properties of log-linear models.
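Existence of the MLE can be tested numerically. The sketch below uses the classical criterion, due to Haberman, that under Poisson sampling the MLE exists if and only if some table with strictly positive entries has the same sufficient statistics as the observed one. We check this with a linear program that maximizes the smallest cell of such a table. This is our illustration and not necessarily the formulation used in the text. The example is the no-three-way-interaction model for a $2\times2\times2$ table with sampling zeros in cells $(1,1,1)$ and $(2,2,2)$. Every two-way margin of that table is positive, yet the MLE does not exist. The helper names are hypothetical.

```python
import itertools

import numpy as np
from scipy.optimize import linprog


def no_three_way_design():
    """8 x 12 design matrix: each column indicates which cells of a 2x2x2
    table (lexicographic order) sum into one two-way marginal cell."""
    rows = []
    for i, j, k in itertools.product(range(2), repeat=3):
        row = np.zeros(12)
        row[2 * i + j] = 1.0       # [12] margin
        row[4 + 2 * i + k] = 1.0   # [13] margin
        row[8 + 2 * j + k] = 1.0   # [23] margin
        rows.append(row)
    return np.array(rows)


def mle_exists(A, counts, tol=1e-9):
    """True iff (up to tol) some x > 0 satisfies A.T @ x == A.T @ counts."""
    counts = np.asarray(counts, dtype=float)
    n_cells, n_stats = A.shape
    # Variables: x (one per cell) and s, a lower bound on every x_i; maximize s.
    c = np.zeros(n_cells + 1)
    c[-1] = -1.0
    A_eq = np.hstack([A.T, np.zeros((n_stats, 1))])
    b_eq = A.T @ counts
    A_ub = np.hstack([-np.eye(n_cells), np.ones((n_cells, 1))])  # s - x_i <= 0
    b_ub = np.zeros(n_cells)
    bounds = [(0, None)] * n_cells + [(0, 1)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=bounds, method="highs")
    return res.status == 0 and -res.fun > tol


A = no_three_way_design()
two_zeros = [0, 5, 3, 4, 6, 2, 7, 0]   # zeros in cells (0,0,0) and (1,1,1)
one_zero = [0, 5, 3, 4, 6, 2, 7, 1]    # only cell (0,0,0) is zero
print(mle_exists(A, two_zeros), mle_exists(A, one_zero))  # False True
```

Here the tables sharing the observed margins form the line $n + c\,d$, where $d_{ijk} = (-1)^{i+j+k}$. With zeros at both $(0,0,0)$ and $(1,1,1)$, no choice of $c$ makes both cells positive.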
High-density clusters can be characterized by the connected components of a
level set $L(\lambda) = \{x:\ p(x)>\lambda\}$ of the underlying probability
density function $p$ generating the data, at some appropriate level
$\lambda\geq 0$. The complete hierarchical clustering can be characterized by a
cluster tree ${\cal T}= \bigcup_{\lambda} L(\lambda)$, understood as the
hierarchy of connected components of $L(\lambda)$ as $\lambda$ ranges over
$[0,\infty)$. In this paper, we study
the behavior of a density level set estimate $\widehat L(\lambda)$ and cluster
tree estimate $\widehat{\cal{T}}$ based on a kernel density estimator with
kernel bandwidth $h$.
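A one-dimensional sketch of the plug-in estimate follows. It evaluates a Gaussian kernel density estimator with bandwidth $h$ on a grid, thresholds it at $\lambda$, and reads off the connected components of $\widehat L(\lambda)$ as maximal runs of grid points above the threshold. Sweeping $\lambda$ traces the branches of $\widehat{\cal T}$. The Gaussian kernel, the grid discretization, and the simulated two-component mixture are our assumptions.

```python
import numpy as np


def gaussian_kde_1d(grid, data, h):
    """Gaussian kernel density estimate with bandwidth h, evaluated on grid."""
    z = (grid[:, None] - data[None, :]) / h
    return np.exp(-0.5 * z**2).sum(axis=1) / (len(data) * h * np.sqrt(2 * np.pi))


def level_set_intervals(grid, density, lam):
    """Connected components of {x : density(x) > lam}, as (left, right) grid points."""
    # Pad with False so every run of True values has a rising and a falling edge.
    above = np.concatenate(([False], density > lam, [False]))
    edges = np.flatnonzero(np.diff(above.astype(int)))
    starts, stops = edges[::2], edges[1::2] - 1
    return [(grid[s], grid[e]) for s, e in zip(starts, stops)]


rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-3, 0.5, 300), rng.normal(3, 0.5, 300)])
grid = np.linspace(-6, 6, 1201)
density = gaussian_kde_1d(grid, data, h=0.3)
for lam in (0.0, 0.05, 0.15):
    print(lam, len(level_set_intervals(grid, density, lam)))
```

At $\lambda = 0$ the estimated level set is a single interval, which corresponds to the root of the tree. At higher levels it splits into the two mixture modes. Changing `h` in this sketch shows how the bandwidth affects where the split occurs.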
We study generalized density-based clustering in which sharply defined
clusters, such as clusters on lower-dimensional manifolds, are allowed. We show
that accurate clustering is possible even in high dimensions. We propose two
data-based methods for choosing the bandwidth, and we study the stability
properties of density clusters. We show that a simple graph-based algorithm
successfully approximates the high-density clusters.
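The graph-based algorithm is not specified here. The sketch below uses one common construction, which is our assumption and not necessarily the algorithm analyzed. It keeps the sample points whose kernel density estimate exceeds $\lambda$, links kept points that lie within distance $\varepsilon$ of each other, and reports the connected components of the resulting graph. The parameter `eps` and the function name are hypothetical.

```python
import numpy as np
from scipy.sparse.csgraph import connected_components
from scipy.spatial.distance import cdist


def graph_density_clusters(X, h, lam, eps):
    """Keep points whose Gaussian KDE exceeds lam, link kept points that are
    within distance eps, and return (kept indices, #components, labels)."""
    n, d = X.shape
    sq_dists = cdist(X, X, "sqeuclidean")
    kde = np.exp(-sq_dists / (2 * h**2)).sum(axis=1) / (n * (2 * np.pi * h**2) ** (d / 2))
    keep = np.flatnonzero(kde > lam)
    # Dense 0/1 adjacency matrix of the eps-neighborhood graph on kept points.
    adjacency = (cdist(X[keep], X[keep]) <= eps).astype(float)
    n_clusters, labels = connected_components(adjacency, directed=False)
    return keep, n_clusters, labels


rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.3, (100, 2)), rng.normal(5.0, 0.3, (100, 2))])
keep, n_clusters, labels = graph_density_clusters(X, h=0.3, lam=0.1, eps=1.0)
print(len(keep), n_clusters)
```

The density threshold removes low-density points, and these are exactly the points that could otherwise chain separate clusters together. This is why the thresholding step comes before the graph is built.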
The $p_1$ model is a directed random graph model used to describe dyadic
interactions in a social network in terms of effects due to differential
attraction (popularity) and expansiveness, as well as an additional effect due
to reciprocation. In this article we carry out an algebraic statistics analysis
of this model. We show that the $p_1$ model is a toric model specified by a
multi-homogeneous ideal. We conduct an extensive study of the Markov bases for
$p_1$ models that explicitly incorporate the constraint arising from
multi-homogeneity.
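For reference, here is a sketch of the dyad probabilities in the Holland-Leinhardt parametrization. We assume a single reciprocity parameter $\rho$, which is one of several possible variants. In this parametrization, $\alpha_i$ is the expansiveness of node $i$, $\beta_j$ is the popularity of node $j$, and $\theta$ is a density effect. Each unnormalized probability is a monomial in the exponentiated parameters, which is what allows a toric description. The parameter names are ours.

```python
import math


def p1_dyad_probs(alpha_i, beta_i, alpha_j, beta_j, theta, rho):
    """Probabilities of the four states (i->j, j->i) of the dyad {i, j}
    in a p1 model with a constant reciprocity effect rho."""
    log_weights = {
        (0, 0): 0.0,                                    # no tie
        (1, 0): theta + alpha_i + beta_j,               # i -> j only
        (0, 1): theta + alpha_j + beta_i,               # j -> i only
        (1, 1): 2 * theta + alpha_i + beta_j + alpha_j + beta_i + rho,  # mutual tie
    }
    z = sum(math.exp(v) for v in log_weights.values())
    return {state: math.exp(v) / z for state, v in log_weights.items()}


print(p1_dyad_probs(0.2, -0.1, 0.5, 0.3, theta=-1.0, rho=1.5))
```

Setting $\rho = 0$ makes the two directed ties independent, so the probability of a mutual tie factorizes into the product of the two one-way tie probabilities. A positive $\rho$ raises the mutual-tie probability above that product.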
We consider estimating an unknown signal, both blocky and sparse, that is
corrupted by additive noise. We study three interrelated least-squares
procedures and their asymptotic properties. The first procedure is the fused
lasso, put forward by Friedman et al. [Ann. Appl. Statist. 1 (2007) 302--332],
which we modify into a different estimator, called the fused adaptive lasso,
with better properties.
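Here is a minimal sketch of the fused lasso signal approximator,
$\min_\beta \tfrac12\|y-\beta\|_2^2 + \lambda_1\sum_i|\beta_i| + \lambda_2\sum_i|\beta_{i+1}-\beta_i|$.
It first solves the pure fusion problem ($\lambda_1 = 0$) by projected gradient on its dual and then soft-thresholds the result at $\lambda_1$. This relies on the result, established in the Friedman et al. paper cited above, that soft-thresholding the fusion-only solution gives the full solution. The solver, step size, and iteration count are our choices. The adaptive (weighted) variant discussed in the text is not implemented.

```python
import numpy as np


def _dt(u):
    """Apply D^T, where D is the difference operator (D b)_i = b[i+1] - b[i]."""
    return np.concatenate(([0.0], u)) - np.concatenate((u, [0.0]))


def fusion_only(y, lam2, n_iter=5000):
    """Solve min_b 0.5*||y - b||^2 + lam2 * sum_i |b[i+1] - b[i]|
    by projected gradient on the dual variables u, with |u_i| <= lam2."""
    y = np.asarray(y, dtype=float)
    u = np.zeros(len(y) - 1)
    for _ in range(n_iter):
        b = y - _dt(u)                    # primal point for the current dual
        # Step 1/4 is safe because the largest eigenvalue of D D^T is below 4.
        u = np.clip(u + 0.25 * np.diff(b), -lam2, lam2)
    return y - _dt(u)


def fused_lasso(y, lam1, lam2, n_iter=5000):
    """Fusion-only solution followed by soft-thresholding at lam1."""
    b = fusion_only(y, lam2, n_iter)
    return np.sign(b) * np.maximum(np.abs(b) - lam1, 0.0)


y = np.array([0.0, 0.0, 0.0, 5.0, 5.0, 5.0])
print(fused_lasso(y, lam1=0.5, lam2=1.0))
```

On this noiseless step, the fusion penalty moves each block of three toward the other by $\lambda_2/3$, which gives levels $1/3$ and $14/3$. Soft-thresholding at $\lambda_1 = 0.5$ then sets the first block to zero and lowers the second to $25/6$.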