We design a data-dependent metric in $\mathbb R^d$ and use it to define the
$k$-nearest neighbors of a given point. Our metric is invariant under all
affine transformations. We show that, with this metric, the standard
$k$-nearest neighbor regression estimate is asymptotically consistent under the
usual conditions on $k$, and minimal requirements on the input data.
We study the connectivity properties of random Bluetooth graphs that model
certain "ad hoc" wireless networks. The graphs are obtained as "irrigation
subgraphs" of the well-known random geometric graph model. There are two
parameters that control the model: the radius $r$ that determines the "visible
neighbors" of each node and the number of edges $c$ that each node is allowed
to send to these. The randomness comes from the underlying distribution of data
points in space and from the choices of each vertex.
Consider a branching random walk on $\mathbb R$, with offspring distribution
$Z$ and non-negative displacement distribution $W$. We say that explosion
occurs if an infinite number of particles may be found within a finite distance
of the origin. In this paper, we investigate this phenomenon when the offspring
distribution $Z$ is heavy-tailed.
Given an arbitrary three-dimensional correlation matrix, we prove that there
exists a three-dimensional joint distribution for the random variable $(X,Y,Z)$
such that $X$,$Y$ and $Z$ are identically distributed with beta distribution
$\beta_{k,k}(dx)$ on $(0,1)$ if $k\geq 1/2$. This implies that any correlation
structure can be attained for three-dimensional copulas.
Let R^d -> A be a query problem over R^d for which there exists a data
structure S that can compute P(q) in O(log n) time for any query point q in
R^d. Let D be a probability measure over R^d representing a distribution of
queries. We describe a data structure called the odds-on tree, of size
O(n^\epsilon) that can be used as a filter that quickly computes P(q) for some
query values q in R^d and relies on S for the remaining queries.
Let $G$ be a (possibly disconnected) planar subdivision and let $D$ be a
probability measure over $\R^2$. The current paper shows how to preprocess
$(G,D)$ into an O(n) size data structure that can answer planar point location
queries over $G$. The expected query time of this data structure, for a query
point drawn according to $D$, is $O(H+1)$, where $H$ is a lower bound on the
expected query time of any linear decision tree for point location in $G$. This
extends the results of Collette et al (2008, 2009) from connected planar
subdivisions to disconnected planar subdivisions.
A memoryless routing algorithm is one in which the decision about the next
edge on the route to a vertex t for a packet currently located at vertex v is
made based only on the coordinates of v, t, and the neighbourhood, N(v), of v.
The current paper explores the limitations of such algorithms by showing that,
for any (randomized) memoryless routing algorithm A, there exists a convex
subdivision on which A takes Omega(n^2) expected time to route a message
between some pair of vertices.
We study a class of hypothesis testing problems in which, upon observing the
realization of an n-dimensional Gaussian vector, one has to decide whether the
vector was drawn from a standard normal distribution or, alternatively, whether
there is a subset of the components belonging to a certain given class of sets
whose elements have been "contaminated," that is, have a mean different from
zero. We establish some general conditions under which testing is possible and
others under which testing is hopeless with a small risk.