We study the question of how to shuffle $n$ cards when faced with an opponent
who knows the initial position of all the cards {\em and} can track every card
when permuted, {\em except} when one takes $K< n$ cards at a time and shuffles
them in a private buffer "behind your back," which we call {\em buffer
shuffling}. The problem arises naturally in the context of parallel mixnet
servers as well as other security applications. Our analysis is based on
related analyses of load-balancing processes.
We study oblivious storage (OS), a natural way to model privacy-preserving
data outsourcing where a client, Alice, stores sensitive data at an
honest-but-curious server, Bob. We show that Alice can hide both the content of
her data and the pattern in which she accesses her data, with high probability,
using a method that achieves O(1) amortized rounds of communication between her
and Bob for each data access.
We describe fully retroactive dynamic data structures for approximate range
reporting and approximate nearest neighbor reporting. We show how to maintain,
for any positive constant $d$, a set of $n$ points in $\R^d$ indexed by time
such that we can perform insertions or deletions at any point in the timeline
in $O(\log n)$ amortized time. We support, for any small constant $\epsilon>0$,
$(1+\epsilon)$-approximate range reporting queries at any point in the timeline
in $O(\log n + k)$ time, where $k$ is the output size.
Oblivious RAM simulation is a method for achieving confidentiality and
privacy in cloud computing environments. It involves obscuring the access
patterns to a remote storage so that the manager of that storage cannot infer
information about its contents. Existing solutions typically involve small
amortized overheads for achieving this goal, but nevertheless involve
potentially huge variations in access times, depending on when they occur. In
this paper, we show how to de-amortize oblivious RAM simulations, so that each
access takes a worst-case bounded amount of time.
A dictionary (or map) is a key-value store that requires all keys be unique,
and a multimap is a key-value store that allows for multiple values to be
associated with the same key. We design hashing-based indexing schemes for
dictionaries and multimaps that achieve worst-case optimal performance for
lookups and updates, with a small or negligible probability the data structure
will require a rehash operation, depending on whether we are working in the the
external-memory (I/O) model or one of the well-known versions of the Random
Access Machine (RAM) model.
In this paper, we study methods for improving the efficiency and privacy of
compressed DNA sequence comparison computations, under various querying
scenarios. For instance, one scenario involves a querier, Bob, who wants to
test if his DNA string, $Q$, is close to a DNA string, $Y$, owned by a data
owner, Alice, but Bob does not want to reveal $Q$ to Alice and Alice is willing
to reveal $Y$ to Bob \emph{only if} it is close to $Q$. We describe a
privacy-enhanced method for comparing two compressed DNA sequences, which can
be used to achieve the goals of such a scenario.
We present data-oblivious algorithms in the external-memory model for
compaction, selection, and sorting. Motivation for such problems comes from
clients who use outsourced data storage services and wish to mask their data
access patterns. We show that compaction and selection can be done
data-obliviously using $O(N/B)$ I/Os, and sorting can be done, with a high
probability of success, using $O((N/B)\log_{M/B} (N/B))$ I/Os.
We give efficient data-oblivious algorithms for several fundamental geometric
problems that are relevant to geographic information systems, including planar
convex hulls and all-nearest neighbors. Our methods are "data-oblivious" in
that they don't perform any data-dependent operations, with the exception of
operations performed inside low-level blackbox circuits having a constant
number of inputs and outputs.
We present techniques for maintaining subgraph frequencies in a dynamic
graph, using data structures that are parameterized in terms of h, the h-index
of the graph. Our methods extend previous results of Eppstein and Spiro for
maintaining statistics for undirected subgraphs of size three to directed
subgraphs and to subgraphs of size four. For the directed case, we provide a
data structure to maintain counts for all 3-vertex induced subgraphs in O(h)
amortized time per update. For the undirected case, we maintain the counts of
size-four subgraphs in O(h^2) amortized time per update.
We study the classic graph drawing problem of drawing a planar graph using
straight-line edges with a prescribed convex polygon as the outer face. Unlike
previous algorithms for this problem, which may produce drawings with
exponential area, our method produces drawings with polynomial area. In
addition, we allow for collinear points on the boundary, provided such vertices
do not create overlapping edges.
We present an efficient algorithm for performing cuckoo hashing in the
MapReduce parallel model of computation and we show how this result in turn
leads to improved methods for performing data-oblivious RAM simulations.
The round-trip distance function on a geographic network (such as a road
network, flight network, or utility distribution grid) defines the "distance"
from a single vertex to a pair of vertices as the minimum length tour visiting
all three vertices and ending at the starting vertex. Given a geographic
network and a subset of its vertices called "sites" (for example a road network
with a list of grocery stores), a two-site round-trip Voronoi diagram labels
each vertex in the network with the pair of sites that minimizes the round-trip
distance from that vertex.
In greedy geometric routing, messages are passed in a network embedded in a
metric space according to the greedy strategy of always forwarding messages to
nodes that are closer to the destination. We show that greedy geometric routing
schemes exist for the Euclidean metric in R^2, for 3-connected planar graphs,
with coordinates that can be represented succinctly, that is, with O(log n)
bits, where n is the number of vertices in the graph. Moreover, our embedding
strategy introduces a coordinate system for R^2 that supports distance
comparisons using our succinct coordinates.
We introduce the straggler identification problem, in which an algorithm must
determine the identities of the remaining members of a set after it has had a
large number of insertion and deletion operations performed on it, and now has
relatively few remaining members. The goal is to do this in o(n) space, where n
is the total number of identities. The straggler identification problem has
applications, for example, in determining the set of unacknowledged packets in
a high-bandwidth multicast data stream.
In this paper, we describe randomized Shellsort--a simple, randomized,
data-oblivious version of the Shellsort algorithm that always runs in O(n log
n) time and, as we show, succeeds in sorting any given input permutation with
very high probability. Thus, randomized Shellsort is simultaneously simple,
time-optimal, and data-oblivious. Taken together, these properties imply
applications in the design of new efficient privacy-preserving computations
based on the secure multi-party computation (SMC) paradigm.
Authenticated data structures provide cryptographic proofs that their answers
are as accurate as the author intended, even if the data structure is being
controlled by a remote untrusted host. We present efficient techniques for
authenticating data structures that represent graphs and collections of
geometric objects. We introduce the path hash accumulator, a new primitive
based on cryptographic hashing for efficiently authenticating various
properties of structured data represented as paths, including any decomposable
query over sequences of elements.