Community detection is a fundamental problem in network analysis, with
applications in many diverse areas. The stochastic block model is a common tool
for model-based community detection, and asymptotic tools for checking
consistency of community detection under the block model have recently been
developed. However, the block model is limited by its assumption that all nodes
within a community are stochastically equivalent, and provides a poor fit to
networks with hubs or highly varying node degrees within communities, which are
common in practice.
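To make the stochastic-equivalence assumption concrete: in a block model, the probability of an edge between two nodes depends only on their community labels, so every node in a community has the same expected degree. The sketch below computes those expected degrees; the function name `sbm_expected_degrees` and the probability matrix `P` are illustrative choices, not taken from the text.

```python
def sbm_expected_degrees(labels, P):
    """Expected degree of each node under a stochastic block model.

    labels: community index of each node.
    P: P[a][b] is the probability of an edge between a node in community a
       and a node in community b (symmetric matrix, no self-loops).
    """
    n = len(labels)
    degrees = []
    for i in range(n):
        # Edge probabilities to all other nodes depend only on the two labels.
        degrees.append(sum(P[labels[i]][labels[j]] for j in range(n) if j != i))
    return degrees


labels = [0, 0, 0, 1, 1, 1]
P = [[0.8, 0.1],
     [0.1, 0.6]]
print(sbm_expected_degrees(labels, P))
```

Every node in community 0 gets expected degree approximately 1.9 and every node in community 1 gets 1.5, so a hub whose degree far exceeds its community's common expected degree is poorly explained by the model.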
We propose a computationally intensive method, the random lasso method, for
variable selection in linear models. The method consists of two major steps. In
step 1, the lasso is applied to many bootstrap samples, each using a set of
randomly selected covariates; this step yields an importance measure for each
covariate. In step 2, a procedure similar to step 1 is applied, except that for
each bootstrap sample the subset of covariates is drawn with unequal selection
probabilities determined by the covariates' importance.
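The two steps can be sketched in plain Python as follows. This is a minimal sketch under several assumptions that are not taken from the text: the lasso is solved by coordinate descent with a fixed penalty `alpha` (rather than one tuned by cross-validation), a covariate's importance is the absolute value of its coefficient averaged over the step-1 fits, step-2 selection probabilities are proportional to importance, and the final estimate is the average of the step-2 coefficients. The names `lasso_cd`, `weighted_sample`, `random_lasso`, `q1`, `q2`, and `B` are hypothetical.

```python
import random


def lasso_cd(X, y, alpha, max_iter=200, tol=1e-8):
    """Lasso by cyclic coordinate descent on column-centered data.

    Minimizes (1/(2n)) * ||y - X b||^2 + alpha * ||b||_1; the intercept is
    absorbed by centering. X is a list of rows. Returns one coefficient per column.
    """
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    ybar = sum(y) / n
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]
    r = [yi - ybar for yi in y]  # residual for b = 0
    sq = [sum(row[j] ** 2 for row in Xc) / n for j in range(p)]
    b = [0.0] * p
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(p):
            if sq[j] == 0.0:
                continue  # column is constant in this sample; keep b[j] = 0
            rho = sum(Xc[i][j] * r[i] for i in range(n)) / n + sq[j] * b[j]
            # Soft-thresholding update for coordinate j.
            new = max(abs(rho) - alpha, 0.0) / sq[j]
            new = new if rho >= 0 else -new
            delta = new - b[j]
            if delta != 0.0:
                for i in range(n):
                    r[i] -= Xc[i][j] * delta
                b[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return b


def weighted_sample(rng, weights, k):
    """Draw k distinct indices; each draw is proportional to the remaining weights."""
    pool = list(range(len(weights)))
    chosen = []
    for _ in range(k):
        w = [weights[j] + 1e-12 for j in pool]  # small floor keeps every weight positive
        pick = rng.choices(pool, weights=w)[0]
        pool.remove(pick)
        chosen.append(pick)
    return chosen


def random_lasso(X, y, q1, q2, B=50, alpha=0.05, seed=0):
    """Two-step random lasso sketch; returns the averaged step-2 coefficients."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])

    def bootstrap_fit(cols):
        idx = [rng.randrange(n) for _ in range(n)]  # resample rows with replacement
        coef = lasso_cd([[X[i][j] for j in cols] for i in idx], [y[i] for i in idx], alpha)
        full = [0.0] * p  # covariates outside this subset get coefficient 0
        for j, c in zip(cols, coef):
            full[j] = c
        return full

    # Step 1: each bootstrap sample uses q1 covariates chosen uniformly at random.
    fits1 = [bootstrap_fit(rng.sample(range(p), q1)) for _ in range(B)]
    importance = [abs(sum(f[j] for f in fits1)) / B for j in range(p)]

    # Step 2: q2 covariates per sample, drawn with probability proportional to importance.
    fits2 = [bootstrap_fit(weighted_sample(rng, importance, q2)) for _ in range(B)]
    return [sum(f[j] for f in fits2) / B for j in range(p)]


# Simulated data: only the first two of six covariates affect the response.
data_rng = random.Random(1)
X = [[data_rng.gauss(0, 1) for _ in range(6)] for _ in range(60)]
y = [3 * row[0] - 2 * row[1] + data_rng.gauss(0, 0.5) for row in X]
print([round(c, 2) for c in random_lasso(X, y, q1=3, q2=3)])
```

Because step 2 favors covariates with large step-1 importance, the two active covariates are included in almost every step-2 sample, while the inactive ones are rarely drawn and their averaged coefficients stay near zero. How to turn the averaged coefficients into a final selection is left open here.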
Information flow analysis is a powerful technique for reasoning about the
sensitive information exposed by a program during its execution.
In many engineering and scientific applications, predictor variables are
grouped; for example, in biological applications, assayed genes or proteins can
be grouped by biological role or biological pathway. Common
statistical analysis methods such as ANOVA, factor analysis, and functional
modeling with basis sets also exhibit natural variable groupings.
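As an illustration of how ANOVA produces such groups: a categorical factor with k levels typically enters a linear model as k − 1 indicator columns (treatment coding against a baseline level is one common convention, assumed here), and because those columns jointly describe a single factor they form a natural group. The helper names `dummy_code` and `grouped_design` and the example factors are hypothetical.

```python
def dummy_code(values, levels):
    """Treatment coding: one 0/1 column per level except the first (baseline)."""
    return [[1 if v == lev else 0 for lev in levels[1:]] for v in values]


def grouped_design(factors):
    """Build a design matrix from categorical factors and record each factor's columns.

    factors maps a factor name to (observed values, list of levels).
    Returns (rows, groups), where groups[name] lists that factor's column indices.
    """
    n = len(next(iter(factors.values()))[0])
    rows = [[] for _ in range(n)]
    groups = {}
    for name, (values, levels) in factors.items():
        start = len(rows[0])
        for row, extra in zip(rows, dummy_code(values, levels)):
            row.extend(extra)
        groups[name] = list(range(start, start + len(levels) - 1))
    return rows, groups


rows, groups = grouped_design({
    "dose": (["low", "mid", "high", "mid"], ["low", "mid", "high"]),
    "site": (["A", "B", "B", "A"], ["A", "B"]),
})
print(groups)  # {'dose': [0, 1], 'site': [2]}
```

A group-aware selection method would then keep or drop columns 0 and 1 together, since dropping only one of them would no longer correspond to removing the `dose` factor.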
Network analysis, and in particular the discovery of communities within
networks, has been a focus of recent work in several fields, with applications
ranging from citation and friendship networks to food webs and gene regulatory
networks. Most existing community detection methods focus on
partitioning the entire network into communities, with the expectation of many
ties within communities and few ties between them. However, many networks contain
nodes that do not fit in with any of the communities, and forcing every node
into a community can distort results.
Typical protocols for peer-to-peer file sharing over the Internet divide
files to be shared into pieces. New peers strive to obtain a complete
collection of pieces from other peers and from a seed. In this paper we
identify a problem that can occur if the seeding rate is not large enough. The
problem is that, even if the statistics of the system are symmetric in the
pieces, there can be symmetry breaking, with one piece becoming very rare. If
peers depart after obtaining a complete collection, they tend to leave
before helping other peers receive the rare piece.
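A toy simulation can make the symmetry breaking visible. Everything about the model below is an assumption for illustration rather than the protocol analyzed in the paper: time is discrete, at most one peer arrives per step (with probability `arrival_prob`), the seed uploads a missing piece to a random peer with probability `seed_prob`, every peer contacts one uniformly random peer per step and downloads one random piece it lacks, and peers leave as soon as they are complete. The function `simulate_swarm` and its parameters are hypothetical.

```python
import random


def simulate_swarm(num_pieces, steps, arrival_prob, seed_prob, rng_seed=0):
    """Toy discrete-time swarm; returns (number of peers, holders of each piece)."""
    rng = random.Random(rng_seed)
    peers = []  # each peer is the set of piece indices it holds
    for _ in range(steps):
        # At most one new peer, holding no pieces, arrives per step.
        if rng.random() < arrival_prob:
            peers.append(set())
        # The seed uploads one piece, chosen uniformly among those a random peer lacks.
        if peers and rng.random() < seed_prob:
            target = rng.choice(peers)
            target.add(rng.choice([k for k in range(num_pieces) if k not in target]))
        # Each peer contacts one random peer and downloads one random useful piece.
        for peer in peers:
            other = rng.choice(peers)
            useful = sorted(other - peer)
            if useful:
                peer.add(rng.choice(useful))
        # Peers depart as soon as they hold the complete collection.
        peers = [p for p in peers if len(p) < num_pieces]
    counts = [sum(1 for p in peers if k in p) for k in range(num_pieces)]
    return len(peers), counts


n_peers, counts = simulate_swarm(num_pieces=4, steps=1000, arrival_prob=0.5, seed_prob=0.1)
print(n_peers, counts)
```

When `seed_prob` is well below `arrival_prob`, the returned counts may show one piece held by far fewer peers than the others, with the peer population growing over time: most peers end up waiting for the rare piece, and those who obtain it depart soon afterwards. This is a qualitative illustration only.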
Regression models to relate a scalar $Y$ to a functional predictor $X(t)$ are
becoming increasingly common. Work in this area has concentrated on estimating
a coefficient function, $\beta(t)$, with $Y$ related to $X(t)$ through
$\int\beta(t)X(t) dt$. Regions where $\beta(t)\ne0$ correspond to places where
there is a relationship between $X(t)$ and $Y$. Conversely, points where
$\beta(t)=0$ indicate no relationship.
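Numerically, $\int\beta(t)X(t)\,dt$ is usually approximated on a grid of $t$ values. The sketch below uses the trapezoidal rule (an assumption; any quadrature rule would do) and an illustrative $\beta(t)=\max(0, t-0.5)$ that is zero on $[0, 0.5]$. The helper name `integral_term` and the two predictor curves are hypothetical. Two curves that differ only where $\beta(t)=0$ give exactly the same value of the integral, which is why such regions carry no relationship with $Y$.

```python
def integral_term(beta, x, t_grid):
    """Trapezoidal-rule approximation of the integral of beta(t) * x(t) over t_grid."""
    total = 0.0
    for t0, t1 in zip(t_grid, t_grid[1:]):
        total += 0.5 * (t1 - t0) * (beta(t0) * x(t0) + beta(t1) * x(t1))
    return total


def beta(t):
    # Zero on [0, 0.5], increasing linearly afterwards.
    return max(0.0, t - 0.5)


def x1(t):
    return 1.0


def x2(t):
    # Differs from x1 only on [0, 0.4), where beta is zero.
    return 1.0 + (10.0 if t < 0.4 else 0.0)


t_grid = [k / 100 for k in range(101)]  # grid on [0, 1]
print(integral_term(beta, x1, t_grid))  # close to 0.125
print(integral_term(beta, x1, t_grid) == integral_term(beta, x2, t_grid))  # True
```

Changing $X(t)$ anywhere on $[0, 0.5]$ leaves the integral, and hence the modeled relationship with $Y$, unchanged.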