The epidemic spreading on arbitrary complex networks is studied in SIR
(Susceptible Infected Recovered) compartment model. We propose our
implementation of a Naive SIR algorithm for epidemic simulation spreading on
networks that uses data structures efficiently to reduce running time. The
Naive SIR algorithm models full epidemic dynamics and can be easily upgraded to
parallel version. We also propose novel algorithm for epidemic simulation
spreading on networks called the FastSIR algorithm that has better average case
running time than the Naive SIR algorithm.
In this paper we present the first algorithm in the streaming model to
characterize completely the biconnectivity properties of undirected networks:
articulation points, bridges, and connected and biconnected components. The
motivation of our work was the development of a real-time algorithm to monitor
the connectivity of the Autonomous Systems (AS) Network, but the solution
provided is general enough to be applied to any network.
Let G be a directed graph with n vertices and non-negative weights in its
directed edges, embedded on a surface of genus g, and let F be an arbitrary
face of G. We describe an algorithm to preprocess the graph in O(gn log n)
time, so that the shortest-path distance from any vertex on the boundary of F
to any other vertex in G can be retrieved in O(log n) time. Our result directly
generalizes the O(n log n)-time algorithm of Klein [SODA 2005] for
multiple-source shortest paths in planar graphs.
The graph partitioning problem is widely used and studied in many practical
and theoretical applications. The multilevel strategies represent today one of
the most effective and efficient generic frameworks for solving this problem on
large-scale graphs. Most of the attention in designing the multilevel
partitioning frameworks has been on the refinement phase. In this work we focus
on the coarsening phase, which is responsible for creating structurally similar
to the original but smaller graphs.
Given a graph G and integers b and w. The black-and-white coloring problem
asks if there exist disjoint sets of vertices B and W with |B|=b and |W|=w such
that no vertex in B is adjacent to any vertex in W. In this paper we show that
the problem is polynomial when restricted to permutation graphs.
Designing short DNA words is a problem of constructing a set (i.e., code) of
n DNA strings (i.e., words) with the minimum length such that the Hamming
distance between each pair of words is at least k and the n words satisfy a set
of additional constraints. This problem has applications in, e.g., DNA
self-assembly and DNA arrays. Previous works include those that extended
results from coding theory to obtain bounds on code and word sizes for
biologically motivated constraints and those that applied heuristic local
searches, genetic algorithms, and randomized algorithms.
This paper studies the problem of finding a (1+eps)-approximation to positive
semidefinite programs. These are semidefinite programs in which all matrices in
the constraints and objective are positive semidefinite and all scalars are
non-negative. Previous work (Jain and Yao, FOCS'11) gave an NC algorithm that
requires at least Omega((1/eps)^13\log^{13}n) iterations. The algorithm
performs at least Omega(n^\omega) work per iteration, where n is the dimension
of the matrices involved, since each iteration involves computing spectral
decomposition.
In this paper we define what we call an affinity system, which is a set of
individuals, each with a vector characterizing its preference for all other
individuals in the set. The preference of a member can be given either by a
ranking of all members or by a weighted vector that defines the degrees of its
affinity to others. Affinity systems are useful for modeling social systems as
well as general data sets, as social interactions are often determined by
affinities among the members.
In this paper, first we give a sequential linear-time algorithm for the
longest path problem in meshes. This algorithm can be considered as an
improvement of [13]. Then based on this sequential algorithm, we present a
constant-time parallel algorithm for the problem which can be run on every
parallel machine.
Given a set of points $P \subset \mathbb{R}^d$, the $k$-means clustering
problem is to find a set of $k$ {\em centers} $C = \{c_1,...,c_k\}, c_i \in
\mathbb{R}^d,$ such that the objective function $\sum_{x \in P} d(x,C)^2$,
where $d(x,C)$ denotes the distance between $x$ and the closest center in $C$,
is minimized. This is one of the most prominent objective functions that have
been studied with respect to clustering.
We consider the problem of computing the k-sparse approximation to the
discrete Fourier transform of an n-dimensional signal. We show:
* An O(k log n)-time algorithm for the case where the input signal has at
most k non-zero Fourier coefficients, and
* An O(k log n log(n/k))-time algorithm for general input signals.
In 2009 Grzegorz Czajkowski from Google's system infrastructure team has
published an article which didn't get much attention in the SEO community at
the time. It was titled "Large-scale graph computing at Google" and gave an
excellent insight into the future of Google's search. This article highlights
some of the little known facts which lead to transformation of Google's
algorithm in the last two years.
We apply our new hitting set enumeration algorithm to solve the sudoku
minimum number of clues problem, which is the following question: What is the
smallest number of clues (givens) that a sudoku puzzle may have? It was
conjectured that the answer is 17. We have performed an exhaustive search for a
16-clue sudoku puzzle, and we did not find one, thereby proving that the answer
is indeed 17. This article describes our method and the actual search.
In this paper we present an implicit dynamic dictionary with the working-set
property, supporting \op{insert}($e$) and \op{delete}($e$) in $O(\log n)$ time,
\op{predecessor}($e$) in $O(\log \ws{\pred(e)})$ time, \op{successor}($e$) in
$O(\log \ws{\succ(e)})$ time and \op{search}($e$) in $O(\log
\min(\ws{\pred(e)},\ws{e}, \ws{\succ(e)}))$ time, where $n$ is the number of
elements stored in the dictionary, $\ws{e}$ is the number of distinct elements
searched for since element $e$ was last searched for and $\pred(e)$ and
$\succ(e)$ are the predecessor and successor of $e$, respectively.
We study the problem of determining strongly connected components (SCCs) of
directed hypergraphs. The main contribution is an algorithm computing the
terminal strongly connected components (ie SCCs which do not reach any other
components than themselves). The time complexity of the algorithm is almost
linear, which is a significant improvement over the known methods which are
quadratic time. This also proves that the problems of testing strong
connectivity, and determining the existence of a sink, can be both solved in
almost linear time in directed hypergraphs.
We give a complete characterization of the two-state anti-ferromagnetic spin
systems which are of exponential correlation decay. We show that a system is of
correlation decay on all graphs with maximum degree \Delta\ if and only if the
system has a unique Gibbs measure on all infinite regular trees up to degree
\Delta, where \Delta\ can either be bounded or unbounded.
Motivated by the problems of computing sample covariance matrices, and of
transforming a collection of vectors to a basis where they are sparse, we
present a simple algorithm that computes an approximation of the product of two
n-by-n real matrices A and B. Let ||AB||_F denote the Frobenius norm of AB, and
b be a parameter determining the time/accuracy trade-off.
We propose a simple linear-time on-line algorithm for constructing a position
heap for a string [Ehrenfeucht et al, 2011]. Our definition of position heap
differs slightly from the one proposed in [Ehrenfeucht et al, 2011] in that it
considers the suffixes ordered from left to right. Our construction is based on
classic suffix pointers and resembles the Ukkonen's algorithm for suffix trees
[Ukkonen, 1995]. Using suffix pointers, the position heap can be extended into
the augmented position heap that allows for a linear-time string matching
algorithm [Ehrenfeucht et al, 2011].
Tries are popular data structures for storing a set of strings, where common
prefixes are represented by common root-to-node paths. Over fifty years of
usage have produced many variants and implementations to overcome some of their
limitations. We explore new succinct representations of path-decomposed tries
and experimentally evaluate the corresponding reduction in space usage and
memory latency, comparing with the state of the art.
We give algorithms with constant-factor performance guarantees for several
capacity and throughput problems in the SINR model. The algorithms are all
based on a novel LP formulation for capacity problems. First, we give a new
constant-factor approximation algorithm for selecting the maximum subset of
links that can be scheduled simultaneously, under any non-decreasing and
sublinear power assignment. For the case of uniform power, we extend this to
the case of variable QoS requirements and link-dependent noise terms.
We present new and improved data structures that answer exact node-to-node
distance queries in planar graphs. Such data structures are also known as
distance oracles. For any directed planar graph on n nodes with non-negative
lengths we obtain the following:
* Given a desired space allocation $S\in[n\lg\lg n,n^2]$, we show how to
construct in $\tilde O(S)$ time a data structure of size $O(S)$ that answers
distance queries in $\tilde O(n/\sqrt S)$ time per query.
In this article we discuss general strategies and computer algorithms to test
the connectivity of unstructured networks which consist of a number of segments
connected through randomly distributed nodes.
We present the design and analysis of a near linear-work parallel algorithm
for solving symmetric diagonally dominant (SDD) linear systems. On input of a
SDD $n$-by-$n$ matrix $A$ with $m$ non-zero entries and a vector $b$, our
algorithm computes a vector $\tilde{x}$ such that $\norm[A]{\tilde{x} - A^+b}
\leq \vareps \cdot \norm[A]{A^+b}$ in $O(m\log^{O(1)}{n}\log{\frac1\epsilon})$
work and $O(m^{1/3+\theta}\log \frac1\epsilon)$ depth for any fixed $\theta >
0$.
In this work we consider the {\em image matching} problem for two grayscale
$n \times n$ images, $M_1$ and $M_2$ (where pixel values range from 0 to 1).
Our goal is to find an affine transformation $T$ that maps pixels from $M_1$ to
pixels in $M_2$ so that the differences over pixels $p$ between $M_1(p)$ and
$M_2(T(p))$ is minimized.
Advances in DNA sequencing technology will soon result in databases of
thousands of genomes. Within a species, individuals' genomes are almost exact
copies of each other; e.g., any two human genomes are 99.9% the same. Relative
Lempel-Ziv (RLZ) compression takes advantage of this property: it stores the
first genome uncompressed or as an FM-index, then compresses the other genomes
with a variant of LZ77 that copies phrases only from the first genome.
One of the motivations for property testing of boolean functions is the idea
that testing can serve as a preprocessing step before learning. However, in
most machine learning applications, the ability to query functions at arbitrary
points in the input space is considered highly unrealistic. Instead, the
dominant query paradigm in applied machine learning, called active learning, is
one where the algorithm may ask for examples to be labeled, but only from among
those that exist in nature.
Balanced allocation of online balls-and-bins has long been an active area of
research for efficient load balancing and hashing applications. There exists a
large number of results in this domain with different settings, such as
parallel allocations [1], multi-dimensional allocations [5], weighted balls [4]
etc. For sequential multi-choice allocation where m balls are thrown into n
bins with each ball choosing d (constant) bins independently uniformly at
random, the maximum load of a bin is O(log log n)+m/n with high probability
[3]. This offers the current best known allocation scheme.
Makespan minimization on identical parallel machines is a classical
scheduling problem. We consider the online scenario where a sequence of $n$
jobs has to be scheduled non-preemptively on $m$ machines so as to minimize the
maximum completion time of any job. The best competitive ratio that can be
achieved by deterministic online algorithms is in the range $[1.88,1.9201]$.
Currently no randomized online algorithm with a smaller competitiveness is
known, for general $m$.
Allocation of balls into bins is a well studied abstraction for load
balancing problems.The literature hosts numerous results for sequential(single
dimensional) allocation case when m balls are thrown into n bins. In this paper
we study the symmetric multiple choice process for both unweighted and weighted
balls as well as for both multidimensional and scalar models.Additionally,we
present the results on bounds on gap for (1+beta) choice process with
multidimensional balls and bins.
Pairing heaps are shown to have constant amortized time Insert and Meld, thus
showing that pairing heaps have the same amortized runtimes as Fibonacci heaps
for all operations but Decrease-key.
The problem central to sparse recovery and compressive sensing is that of
stable sparse recovery: we want a distribution of matrices A in R^{m\times n}
such that, for any x \in R^n and with probability at least 2/3 over A, there is
an algorithm to recover x* from Ax with
We consider the problem of distinguishing between two arbitrary black-box
distributions defined over the domain [n], given access to $s$ samples from
both. It is known that in the worst case O(n^{2/3}) samples is both necessary
and sufficient, provided that the distributions have L1 difference of at least
{\epsilon}. However, it is also known that in many cases fewer samples suffice.
We identify a new parameter, that provides an upper bound on how many samples
needed, and present an efficient algorithm that requires the number of samples
independent of the domain size.
Recently, Mahoney and Orecchia demonstrated that popular diffusion-based
procedures to compute a quick \emph{approximation} to the first nontrivial
eigenvector of a data graph Laplacian \emph{exactly} solve certain regularized
Semi-Definite Programs (SDPs). In this paper, we extend that result by
providing a statistical interpretation of their approximation procedure.
We show that there exist linear-time algorithms that compute the strong
chromatic index and a maximum induced matching of tree-cographs when the
decomposition tree is a part of the input. We also show that there exists an
efficient algorithm for the strong chromatic index of permutation graphs.
We consider a nonlinear extension of the generalized network flow model, with
the flow leaving an arc being an increasing concave function of the flow
entering it, as proposed by Truemper and Shigeno. We give the first polynomial
time combinatorial algorithm for solving corresponding optimization problems,
finding a epsilon-approximate solution in O(m(m+\log n)\log(MUm/epsilon))
arithmetic operations and value oracle queries, where M and U are upper bounds
on simple parameters.
In this paper, we consider the \alg{Generalized Bin Covering} problem: We are
given $m$ bin types, where each bin of type $i$ has profit $p_i$ and demand
$d_i$. Furthermore, there are $n$ items, where item $j$ has size $s_j$. A bin
of type $i$ is covered if the set of items assigned to it has total size at
least the demand $d_i$. In that case, the profit of $p_i$ is earned and the
objective is to maximize the total profit. To the best of our knowledge, only
the cases $p_i = d_i = 1$ (\alg{Bin Covering}) and $p_i = d_i$
(\alg{Variable-Sized Bin Covering}) have been treated before.
Motivated by the imminent growth of massive, highly redundant genomic
databases, we study the problem of compressing a string database while
simultaneously supporting fast random access, substring extraction and pattern
matching to the underlying string(s). Bille et al. (2011) recently showed how,
given a straight-line program with $r$ rules for a string $s$ of length $n$, we
can build an $\Oh{r}$-word data structure that allows us to extract any
substring (s [i..j]) in $\Oh{\log n + j - i}$ time.
Modern graphics processors provide exceptional computa- tional power, but
only for certain computational models. While they have revolutionized
computation in many fields, compression has been largely unnaffected. This
paper aims to explain the current issues and possibili- ties in GPGPU
compression. This is done by a high level overview of the GPGPU computational
model in the context of compression algorithms; along with a more in-depth
analysis of how one would implement bzip2 on a GPGPU architecture.
Submodular function minimization is a key problem in a wide variety of
applications in machine learning, economics, game theory, computer vision and
many others. The general solver has a complexity of $O(n^6+n^5L)$ where $L$ is
the time required to evaluate the function and $n$ is the number of variables
\cite{orlin09}.
There are many existing well known cost models for the list accessing
problem. The standard cost model developed by Sleator and Tarjan is most widely
used. In this paper, we have made a comprehensive study of the existing cost
models and proposed a new cost model for the list accessing problem. In our
proposed cost model, for calculating the processing cost of request sequence
using a singly linked list, we consider the access cost, matching cost and
replacement cost. The cost of processing a request sequence is the sum of
access cost, matching cost and replacement cost.
In this paper we demonstrate that, ignoring computational constraints, it is
possible to privately release synthetic databases that are useful for large
classes of queries -- much larger in size than the database itself.
Specifically, we give a mechanism that privately releases synthetic data for a
class of queries over a discrete domain with error that grows as a function of
the size of the smallest net approximately representing the answers to that
class of queries.
List Accessing Problem is a well studied research problem in the context of
linear search. Input to the list accessing problem is an unsorted linear list
of distinct elements along with a sequence of requests, where each request is
an access operation on an element of the list. A list accessing algorithm
reorganizes the list while processing a request sequence on the list in order
to minimize the access cost.
In order to evaluate, compare, and tune graph algorithms, experiments on well
designed benchmark sets have to be performed. Together with the goal of
reproducibility of experimental results, this creates a demand for a public
archive to gather and store graph instances. Such an archive would ideally
allow annotation of instances or sets of graphs with additional information
like graph properties and references to the respective experiments and results.
Here we examine the requirements, and introduce a new community project with
the aim of producing an easily accessible library of graphs.
Covers being one of the most popular form of regularities in strings, have
drawn much attention over time. In this paper, we focus on the problem of
linear time inference of strings from cover arrays using the least sized
alphabet possible. We present an algorithm that can reconstruct a string $x$
over a two-letter alphabet whenever a valid cover array $C$ is given as an
input. This algorithm uses several interesting combinatorial properties of
cover arrays and an interesting relation between border array and cover array
to achieve this. Our algorithm runs in linear time.
We revisit various string indexing problems with range reporting features,
namely, position-restricted substring searching, indexing substrings with gaps,
and indexing substrings with intervals. We obtain the following main results.
{itemize} We give efficient reductions for each of the above problems to a new
problem, which we call \emph{substring range reporting}. Hence, we unify the
previous work by showing that we may restrict our attention to a single problem
rather than studying each of the above problems individually.
In this paper, we present a polynomial dynamic programming algorithm that
tests whether a $n$-vertex directed tree $T$ has an upward planar embedding
into a convex point-set $S$ of size $n$. Further, we extend our approach to the
class of outerplanar digraphs. This nontrivial and surprising result implies
that any given digraph can be efficiently tested for an upward planar embedding
into a given convex point set.
We introduce the problem of Weak Dominance Drawing for general directed
acyclic graphs and we show the connection with the linear extension diameter of
a partial order P. We present complexity results and bounds.
In this paper we present an algorithm, called conauto-2.0, that can
efficiently compute a set of generators of the automorphism group of a graph,
and test whether two graphs are isomorphic, finding an isomorphism if they are.
This algorithm uses the basic individualization/refinement technique, and is an
improved version of the algorithm conauto, which has been shown to be very fast
for random graphs and several families of hard graphs.
Given a set of wireless links, a fundamental problem is to find the largest
subset that can transmit simultaneously, within the SINR model of interference.
Significant progress on this problem has been made in recent years. In this
note, we study the problem in the setting where we are given a fixed set of
arbitrary powers each sender must use, and an arbitrary gain matrix defining
how signals fade. This variation of the problem appears immune to most
algorithmic approaches studied in the literature.
We consider the minimum spanning tree (MST) problem under the restriction
that for every vertex v, the edges of the tree that are adjacent to v satisfy a
given family of constraints. A famous example thereof is the classical
degree-constrained MST problem, where for every vertex v, a simple upper bound
on the degree is imposed. Iterative rounding/relaxation algorithms became the
tool of choice for degree-bounded network design problems.
A dictionary (or map) is a key-value store that requires all keys be unique,
and a multimap is a key-value store that allows for multiple values to be
associated with the same key. We design hashing-based indexing schemes for
dictionaries and multimaps that achieve worst-case optimal performance for
lookups and updates, with a small or negligible probability the data structure
will require a rehash operation, depending on whether we are working in the the
external-memory (I/O) model or one of the well-known versions of the Random
Access Machine (RAM) model.
Smoothed analysis of multiobjective 0-1 linear optimization has drawn
considerable attention recently. The number of Pareto-optimal solutions (i.e.,
solutions with the property that no other solution is at least as good in all
the coordinates and better in at least one) for multiobjective optimization
problems is the central object of study. In this paper, we prove several lower
bounds for the expected number of Pareto optima.
Until recently, techniques for obtaining lower bounds for kernelization were
one of the most sought after tools in the field of parameterized complexity.
Now, after a strong influx of techniques, we are in the fortunate situation of
having tools available that are even stronger than what has been required in
their applications so far. Based on a result of Fortnow and Santhanam (JCSS
2011), Bodlaender et al.
The Odd Cycle Transversal problem (OCT) asks whether a given graph can be
made bipartite (i.e., 2-colorable) by deleting at most l vertices. We study
structural parameterizations of OCT with respect to their polynomial
kernelizability, i.e., whether instances can be efficiently reduced to a size
polynomial in the chosen parameter. It is a major open problem in parameterized
complexity whether Odd Cycle Transversal admits a polynomial kernel when
parameterized by l.
Sundararajan and Chakraborty (2007) introduced a new version of Quick sort
removing the interchanges. Khreisat (2007) found this algorithm to be competing
well with some other versions of Quick sort. However, it uses an auxiliary
array thereby increasing the space complexity. Here, we provide a second
version of our new sort where we have removed the auxiliary array. This second
improved version of the algorithm, which we call K-sort, is found to sort
elements faster than Heap sort for an appreciably large array size (n <=
70,00,000) for uniform U[0, 1] inputs.
A basic fact in algebraic graph theory is that the number of connected
components in an undirected graph is equal to the multiplicity of the
eigenvalue 1 in the normalized adjacency matrix of the graph. In particular,
the graph is disconnected if and only if there are at least two eigenvalues
equal to 1.
Cheeger's inequality provides an "approximate" version of the latter fact,
and it states that a graph has a sparse cut (it is "almost disconnected") if
and only if there are at least two eigenvalues that are close to one.
We consider a basic problem in unsupervised learning: learning an unknown
\emph{Poisson Binomial Distribution} over $\{0,1,...,n\}$. A Poisson Binomial
Distribution (PBD) is a sum $X = X_1 + ... + X_n$ of $n$ independent Bernoulli
random variables which may have arbitrary expectations. We work in a framework
where the learner is given access to independent draws from the distribution
and must (with high probability) output a hypothesis distribution which has
total variation distance at most $\eps$ from the unknown target PBD.
A $k$-modal probability distribution over the domain $\{1,...,n\}$ is one
whose histogram has at most $k$ "peaks" and "valleys." Such distributions are
natural generalizations of monotone ($k=0$) and unimodal ($k=1$) probability
distributions, which have been intensively studied in probability theory and
statistics.
In the context of designing a scalable overlay network to support
decentralized topic-based pub/sub communication, the Minimum Topic-Connected
Overlay problem (Min-TCO in short) has been investigated: Given a set of t
topics and a collection of n users together with the lists of topics they are
interested in, the aim is to connect these users to a network by a minimum
number of edges such that every graph induced by users interested in a common
topic is connected. It is known that Min-TCO is NP-hard and approximable within
O(log t) in polynomial time.
Periodicity in words is one of the most fundamental areas of text algorithms
and combinatorics. Two classical and natural variations of periodicity are
seeds and covers (also called quasiperiods). Linear-time algorithms are known
for finding all the covers of a word, however in case of seeds, for the past 15
years only an $O(n\log{n})$ time algorithm was known (Iliopoulos, Moore and
Park, 1996). Finding an $o(n\log{n})$ time algorithm for the all-seeds problem
was mentioned as one of the most important open problems related to repetitions
in words in a survey by Smyth (2000).
In this paper we consider two above lower bound parameterizations of the Node
Multiway Cut problem - above the maximum separating cut and above a natural
LP-relaxation - and prove them to be fixed-parameter tractable. Our results
imply O*(4^k) algorithms for Vertex Cover above Maximum Matching and Almost
2-SAT as well as an O*(2^k) algorithm for Node Multiway Cut with a standard
parameterization by the solution size, improving previous bounds for these
problems.
We address the problem of finding the minimal number of block interchanges
(exchange of two intervals) required to transform a duplicated linear genome
into a tandem duplicated linear genome. We provide a formula for the distance
as well as a polynomial time algorithm for the sorting problem.
We study the Travelling Salesman Problem (TSP) on the metric completion of
cubic and subcubic graphs, which is known to be NP-hard. The problem is of
interest because of its relation to the famous 4/3 conjecture for metric TSP,
which says that the integrality gap, i.e., the worst case ratio between the
optimal values of the TSP and its linear programming relaxation (the subtour
elimination relaxation), is 4/3. We present the first algorithm for cubic
graphs with approximation ratio 4/3. The proof uses polyhedral techniques in a
surprising way, which is of independent interest.
We study an important practical aspect of the route planning problem in
real-world road networks -- maneuvers. Informally, maneuvers are walks that
might either decrease or increase the weight of a route by additional costs.
One can model a wide range of route restrictions or traffic regulations by
maneuvers. For instance turn-prohibitions, traffic light delays, round-abouts,
forbidden passages and so on. A new approach to tackle maneuvers during route
planning queries without prior adjustments of the underlying road network graph
is presented.
Partitioning oracles were introduced by Hassidim et al. (FOCS 2009) as a
generic tool for constant-time algorithms. For any epsilon > 0, a partitioning
oracle provides query access to a fixed partition of the input bounded-degree
minor-free graph, in which every component has size poly(1/epsilon), and the
number of edges removed is at most epsilon*n, where n is the number of vertices
in the graph.
The intersection graph of a collection of trapezoids with corner points lying
on two parallel lines is called a trapezoid graph. Using binary indexed tree
data structure, we improve algorithms for calculating the size and the number
of minimum vertex covers (or independent sets), as well as the total number of
vertex covers, and reduce the time complexity from $O (n^2)$ to $O (n \log n)$,
where $n$ is the number of trapezoids.
A simple algorithm with quasi-linear time complexity and linear space
complexity for the evaluation of the hypergeometric series with rational
coefficients is constructed. It is shown that this algorithm is suitable in
practical informatics for constructive analogues of often used constants of
analysis.
First, we study the Unconstrained Fault-Tolerant Resource Allocation (UFTRA)
problem (a.k.a. FTFA problem in \cite{shihongftfa}). In the problem, we are
given a set of sites equipped with an unconstrained number of facilities as
resources, and a set of clients with set $\mathcal{R}$ as corresponding
connection requirements, where every facility belonging to the same site has an
identical opening (operating) cost and every client-facility pair has a
connection cost. The objective is to allocate facilities from sites to satisfy
$\mathcal{R}$ at a minimum total cost.
The Multiple Hypothesis Tracking (MHT) algorithm is known to produce good
results in difficult multi-target tracking situations. However, its
implementation is not trivial, and is associated with a significant programming
effort, code size and long implementation time. We propose a library which
addresses these problems by providing a domain independent implementation of
the most complex MHT operations. We also address the problem of applying
clustering in domain independent manner.
A subset $T \subseteq V$ of terminals is $k$-connected to a root $s$ in a
directed/undirected graph $J$ if $J$ has $k$ internally-disjoint $vs$-paths for
every $v \in T$; $T$ is $k$-connected in $J$ if $T$ is $k$-connected to every
$s \in T$. We consider the {\sf Subset $k$-Connectivity Augmentation} problem:
given a graph $G=(V,E)$ with edge/node-costs, node subset $T \subseteq V$, and
a subgraph $J=(V,E_J)$ of $G$ such that $T$ is $k$-connected in $J$, find a
minimum-cost augmenting edge-set $F \subseteq E \setminus E_J$ such that $T$ is
$(k+1)$-connected in $J \cup F$.
I give an introduction to algorithmic uses of the principle of
inclusion-exclusion. The presentation is intended to be be concrete and
accessible, at the expense of generality and comprehensiveness.
We present two on-line algorithms for maintaining a topological order of a
directed $n$-vertex acyclic graph as arcs are added, and detecting a cycle when
one is created. Our first algorithm handles $m$ arc additions in $O(m^{3/2})$
time. For sparse graphs ($m/n = O(1)$), this bound improves the best previous
bound by a logarithmic factor, and is tight to within a constant factor among
algorithms satisfying a natural {\em locality} property. Our second algorithm
handles an arbitrary sequence of arc additions in $O(n^{5/2})$ time.
We present a (5/3 - epsilon)-approximation algorithm for some constant
epsilon>0 for the traveling salesman path problem under the unit-weight
graphical metric, and prove an upper bound on the integrality gap of the
path-variant Held-Karp relaxation both under this metric and the general
metric. Given a complete graph with the metric cost and two designated
endpoints in the graph, the traveling salesman path problem is to find a
minimum Hamiltonian path between these two endpoints. The best previously known
performance guarantee for this problem was 5/3 and was due to Hoogeveen.
We study algorithms for the Submodular Multiway Partition problem (SubMP). An
instance of SubMP consists of a finite ground set $V$, a subset of $k$ elements
$S = \{s_1,s_2,...,s_k\}$ called terminals, and a non-negative submodular set
function $f:2^V\rightarrow \mathbb{R}_+$ on $V$ provided as a value oracle. The
goal is to partition $V$ into $k$ sets $A_1,...,A_k$ such that for $1 \le i \le
k$, $s_i \in A_i$ and $\sum_{i=1}^k f(A_i)$ is minimized.
We study the Minimum Submodular-Cost Allocation problem (MSCA). In this
problem we are given a finite ground set $V$ and $k$ non-negative submodular
set functions $f_1 ,..., f_k$ on $V$. The objective is to partition $V$ into
$k$ (possibly empty) sets $A_1 ,..., A_k$ such that the sum $\sum_{i=1}^k
f_i(A_i)$ is minimized. Several well-studied problems such as the non-metric
facility location problem, multiway-cut in graphs and hypergraphs, and uniform
metric labeling and its generalizations can be shown to be special cases of
MSCA.
This paper studies the problem of testing if an input (Gamma,*), where Gamma
is a finite set of unknown size and * is a binary operation over Gamma given as
an oracle, is close to a specified class of groups. Friedl et al. [Efficient
testing of groups, STOC'05] have constructed an efficient tester using
poly(log|Gamma|) queries for the class of abelian groups.
We comment on two randomized algorithms for constructing low-rank matrix
decompositions. Both algorithms employ the Subsampled Randomized Hadamard
Transform [14]. The first algorithm appeared recently in [9]; here, we provide
a novel analysis that significantly improves the approximation bound obtained
in [9]. A preliminary version of the second algorithm appeared in [7]; here, we
present a mild modification of this algorithm that achieves the same
approximation bound but significantly improves the corresponding running time.
Motivated by applications to social network analysis (SNA), we study the
problem of finding the maximum number of disjoint uni-color paths in an
edge-colored graph. We show the NP-hardness and the approximability of the
problem, and both approximation and exact algorithms are proposed. Since short
paths are much more significant in SNA, we also study the length-bounded
version of the problem, in which the lengths of paths are required to be upper
bounded by a fixed integer $l$. It is shown that the problem can be solved in
polynomial time for $l=3$ and is NP-hard for $l\geq 4$.
We solve the dynamic Predecessor Problem with high probability (whp) in
constant time, using only $n^{1+\delta}$ bits of memory, for any constant
$\delta > 0$. The input keys are random wrt a wider class of the well studied
and practically important class of $(f_1, f_2)$-smooth distributions introduced
in \cite{and:mat}. It achieves O(1) whp amortized time. Its worst-case time is
$O(\sqrt{\frac{\log n}{\log \log n}})$. Also, we prove whp $O(\log \log \log
n)$ time using only $n^{1+ \frac{1}{\log \log n}}= n^{1+o(1)}$ bits. Finally,
we show whp $O(\log \log n)$ time using O(n) space.
The minimum-cost subset $k$-connected subgraph problem is a cornerstone
problem in the area of network design with vertex connectivity requirements. In
this problem, we are given a graph $G=(V,E)$ with costs on edges and a set of
terminals $T$. The goal is to find a minimum cost subgraph such that every pair
of terminals are connected by $k$ openly (vertex) disjoint paths. In this
paper, we present an approximation algorithm for the subset $k$-connected
subgraph problem which improves on the previous best approximation guarantee of
$O(k^2\log{k})$ by Nutov (FOCS 2009).
Courcelle's Theorem states that every problem definable in Monadic
Second-Order logic can be solved in linear time on structures of bounded
treewidth, for example, by constructing a tree automaton that recognizes or
rejects a tree decomposition of the structure. Existing, optimized software
like the MONA tool can be used to build the corresponding tree automata, which
for bounded treewidth are of constant size. Unfortunately, the constants
involved can become extremely large - every quantifier alternation requires a
power set construction for the automaton.
The notion of the cover is a generalization of a period of a string, and
there are linear time algorithms for finding the shortest cover. The seed is a
more complicated generalization of periodicity, it is a cover of a superstring
of a given string, and the shortest seed problem is of much higher algorithmic
difficulty. The problem is not well understood, no linear time algorithm is
known. In the paper we give linear time algorithms for some of its versions ---
computing shortest left-seed array, longest left-seed array and checking for
seeds of a given length.
We introduce a variant of modal logic, dubbed EXISTENTIAL COUNTING MODAL
LOGIC (ECML), which captures a vast majority of problems known to be tractable
in single exponential time when parameterized by treewidth. It appears that all
these results can be subsumed by the theorem that model checking of ECML admits
an algorithm with such complexity. We extend ECML by adding connectivity
requirements and, using the Cut&Count technique introduced by Cygan et al.
It is shown how to enhance any data structure in the pointer model to make it
confluently persistent, with efficient query and update times and limited space
overhead. Updates are performed in $O(\log n)$ amortized time, and following a
pointer takes $O(\log c \log n)$ time where $c$ is the in-degree of a node in
the data structure. In particular, this proves that confluent persistence can
be achieved at a logarithmic cost in the bounded in-degree model used widely in
previous work.
We give an efficient algorithm which can obtain a relative error
approximation to the spectral norm of a matrix, combining the power iteration
method with some techniques from matrix reconstruction which use random
sampling.
We study the fundamental problem of reliable interactive communication over a
noisy channel. In a breakthrough sequence of papers published in 1992 and 1993,
Schulman gave non-constructive proofs of the existence of general methods to
emulate any two-party interactive protocol such that: (1) the emulation
protocol takes a constant-factor longer than the original protocol, and (2) if
the emulation protocol is executed over a noisy channel, then the probability
that the emulation protocol fails is exponentially small in the total length of
the protocol.
A flaw in the greedy approximation algorithm proposed by Zhang et al. for
minimum connected set cover problem is corrected, and a stronger result on the
approximation ratio of the modified greedy algorithm is established. The
results are now consistent with the existing results on connected dominating
set problem which is a special case of the minimum connected set cover problem.
We introduce a new model for buffer management of network switches with
Quality of Service (QoS) requirements and give a tight analysis. Specifically,
each network packet is associated with a numerical value, called Class of
Service (CoS), which represents its priority. We are furthermore given a
network switch with $m$ queues that have individual capacities. Each queue
stores packets of a certain CoS, only. A stream of packets arrives over time at
the switch and an online algorithm has to decide on the admission and
transmission of packets.
An \emph{acyclic coloring} of a graph is a proper vertex coloring such that
the union of any two color classes induces a disjoint collection of trees. The
more restricted notion of \emph{star coloring} requires that the union of any
two color classes induces a disjoint collection of stars. We prove that every
acyclic coloring of a cograph is also a star coloring and give a linear-time
algorithm for finding an optimal acyclic and star coloring of a cograph. If the
graph is given in the form of a cotree, the algorithm runs in O(n) time.
We present data-oblivious algorithms in the external-memory model for
compaction, selection, and sorting. Motivation for such problems comes from
clients who use outsourced data storage services and wish to mask their data
access patterns. We show that compaction and selection can be done
data-obliviously using $O(N/B)$ I/Os, and sorting can be done, with a high
probability of success, using $O((N/B)\log_{M/B} (N/B))$ I/Os.
A classic versioned data structure in storage and computer science is the
copy-on-write (CoW) B-tree -- it underlies many of today's file systems and
databases, including WAFL, ZFS, Btrfs and more. Unfortunately, it doesn't
inherit the B-tree's optimality properties; it has poor space utilization,
cannot offer fast updates, and relies on random IO to scale. Yet, nothing
better has been developed since. We describe the `stratified B-tree', which
beats the CoW B-tree in every way.
Eigenvectors of data matrices play an important role in many computational
problems, ranging from signal processing to machine learning and control. For
instance, algorithms that compute positions of the nodes of a wireless network
on the basis of pairwise distance measurements require a few leading
eigenvectors of the distances matrix.
We present an incremental, scalable and efficient dimension reduction
technique for tensors that is based on sparse random linear coding. Data is
stored in a compactified representation with fixed size, which makes memory
requirements low and predictable. Component encoding and decoding are performed
on-line without computationally expensive re-analysis of the data set. The
range of tensor indices can be extended dynamically without modifying the
component representation.
In a recent work, Doerr and Fouz [\emph{Asymptotically Optimal Randomized
Rumor Spreading}, in ArXiv] present a new quasi-random PUSH algorithm for the
rumor spreading problem (also known as gossip spreading or message propagation
problem). Their \emph{hybrid protocol} outperforms all known PUSH protocols.
In this paper we are interested in indexing texts for susbstring matching
with one edit error. That is given a text T of n characters over an alphabet of
size sigma, we are asked to build a data structure that answers to the
following query: find all the occ substrings of the text which are at edit
distance at most 1 from a string p of length m. In this paper we show two new
results for this problem. The first result suitable for arbitrary alphabet size
sigma uses O(n(log^epsilon n+ log sigma)) words of space and answers to queries
in time O(m+occ).
We present an improved heuristic for estimating the star discrepancy of a
point set, building on the threshold accepting algorithm of Winker and Fang but
with several improvements, including a non-uniform sampling strategy which is
much more fitting for higher-dimensional inputs, and a rounding ("snapping")
step which rounds points to critical boxes where the higher discrepancy values
are guaranteed to be found.
A biclique is a set of vertices that induce a bipartite complete graph. A
graph G is biclique-Helly when its family of maximal bicliques satisfies the
Helly property. If every induced subgraph of G is also biclique-Helly, then G
is hereditary biclique-Helly. A graph is C_4-dominated when every cycle of
length 4 contains a vertex that is dominated by the vertex of the cycle that is
not adjacent to it.
We consider low-rank reconstruction of a matrix using its columns and we
present asymptotically optimal algorithms for both spectral norm and Frobenius
norm reconstruction. The main tools we introduce to obtain our r esults are:
(i) the use of fast approximate SVD-like decompositions for column
reconstruction, and (ii) two deter ministic algorithms for selecting rows from
matrices with orthonormal columns, building upon the sparse represen tation
theorem for decompositions of the identity that appeared in \cite{BSS09}.
We study a location-routing problem in the context of capacitated vehicle
routing. The input is a set of demand locations in a metric space and a fleet
of k vehicles each of capacity Q. The objective is to locate k depots, one for
each vehicle, and compute routes for the vehicles so that all demands are
satisfied and the total cost is minimized. Our main result is a constant-factor
approximation algorithm for this problem. To achieve this result, we reduce to
the k-median-forest problem, which generalizes both k-median and minimum
spanning tree, and which might be of independent interest.
In 1986 Victor Miller described an algorithm for computing the Weil pairing
in his unpublished manuscript. This algorithm has then become the core of all
pairing-based cryptosystems. Many improvements of the algorithm have been
presented. Most of them involve a choice of elliptic curves of a \emph{special}
forms to exploit a possible twist during Tate pairing computation. Other
improvements involve a reduction of the number of iterations in the Miller's
algorithm. For the generic case, Blake, Murty and Xu proposed three refinements
to Miller's algorithm over Weierstrass curves.
We implement a new algorithm for listing all maximal cliques in sparse graphs
due to Eppstein, L\"offler, and Strash (ISAAC 2010) and analyze its performance
on a large corpus of real-world graphs. Our analysis shows that this algorithm
is the first to offer a practical solution to listing all maximal cliques in
large sparse graphs. All other theoretically-fast algorithms for sparse graphs
have been shown to be significantly slower than the algorithm of Tomita et al.
(Theoretical Computer Science, 2006) in practice. However, the algorithm of
Tomita et al.
In a classical covering problem, we are given a set of requests that we need
to satisfy (fully or partially), by buying a subset of items at minimum cost.
For example, in the k-MST problem we want to find the cheapest tree spanning at
least k nodes of an edge-weighted graph. Here nodes and edges represent
requests and items, respectively.
Orthogonal Matching Pursuit (OMP) has long been considered a powerful
heuristic for attacking compressive sensing problems; however, its theoretical
development is, unfortunately, somewhat lacking. This paper presents an
improved Restricted Isometry Property (RIP) based performance guarantee for
T-sparse signal reconstruction that asymptotically approaches the conjectured
lower bound given in Davenport et al. We also further extend the
state-of-the-art by deriving reconstruction error bounds for the case of
general non-sparse signals subjected to measurement noise.
In the stochastic knapsack problem, we are given a knapsack of size B, and a
set of jobs whose sizes and rewards are drawn from a known probability
distribution. However, we know the actual size and reward only when the job
completes. How should we schedule jobs to maximize the expected total reward?
We know O(1)-approximations when we assume that (i) rewards and sizes are
independent random variables, and (ii) we cannot prematurely cancel jobs. What
can we say when either or both of these assumptions are changed?
We study algorithmic problems in multi-stage open shop processing systems
that are centered around reachability and deadlock detection questions. We
characterize safe and unsafe system states. We show that it is easy to
recognize system states that can be reached from the initial state (where the
system is empty), but that in general it is hard to decide whether one given
system state is reachable from another given system state.
Inferring cluster structure in microarray datasets is a fundamental task for
the -omic sciences. A fundamental question in Statistics, Data Analysis and
Classification, is the prediction of the number of clusters in a dataset,
usually established via internal validation measures. Despite the wealth of
internal measures available in the literature, new ones have been recently
proposed, some of them specifically for microarray data. In this dissertation,
a study of internal validation measures is given, paying particular attention
to the stability based ones.
This paper addresses combinatorial optimization scheme for solving the
multicriteria Steiner tree problem for communication network topology design
(e.g., wireless mesh network). The solving scheme is based on several models:
multicriteria ranking, clustering, minimum spanning tree, and minimum Steiner
tree problem. An illustrative numerical example corresponds to designing a
covering long-distance Wi-Fi network (static Ad-Hoc network). The set of
criteria (i.e., objective functions) involves the following: total cost, total
edge length, overall throughput (capacity), and estimate of QoS.
The problem of time-varying signal estimation or state estimation from
dynamic observation is encountered in many applications. Among well-developed
approaches the Kalman filter-type approaches are the most popular and have
played an important role of estimating dynamic signal from sequential
observations. However, when the number of observations at each time step is
highly limited, they usually take too many time steps to catch up with the true
value with satisfied accuracy, even fail to do that in many cases.
We give the first analysis of the computational complexity of {\it coalition
structure generation over graphs}. Given an undirected graph $G=(N,E)$ and a
valuation function $v:2^N\rightarrow\RR$ over the subsets of nodes, the problem
is to find a partition of $N$ into connected subsets, that maximises the sum of
the components' values.
The Parikh vector p(s) of a string s is defined as the vector of
multiplicities of the characters. Parikh vector q occurs in s if s has a
substring t with p(t)=q. We present two novel algorithms for searching for a
query q in a text s. One solves the decision problem over a binary text in
constant time, using a linear size index of the text. The second algorithm, for
a general finite alphabet, finds all occurrences of a given Parikh vector q and
has sub-linear expected time complexity; we present two variants, which both
use a linear size index of the text.
The paper addresses a new class of combinatorial problems which consist in
restructuring of solutions (as structures) in combinatorial optimization. Two
main features of the restructuring process are examined: (i) a cost of the
restructuring, (ii) a closeness to a goal solution. This problem corresponds to
redesign (improvement, upgrade) of modular systems or solutions. The
restructuring approach is described and illustrated for the following
combinatorial optimization problems: knapsack problem, multiple choice problem,
assignment problem, spanning tree problems.
A hitting set for a collection of sets is a set that has a non-empty
intersection with each set in the collection; the hitting set problem is to
find a hitting set of minimum cardinality. Motivated by instances of the
hitting set problem where the number of sets to be hit is large, we introduce
the notion of implicit hitting set problems. In an implicit hitting set problem
the collection of sets to be hit is typically too large to list explicitly;
instead, an oracle is provided which, given a set H, either determines that H
is a hitting set or returns a set that H does not hit.
A natural requirement of many distributed structures is fault-tolerance:
after some failures, whatever remains from the structure should still be
effective for whatever remains from the network. In this paper we examine
spanners of general graphs that are tolerant to vertex failures, and
significantly improve their dependence on the number of faults $r$, for all
stretch bounds.
The topic of this paper is to present a study of the connection errors in
networks of linear features that are mapped in Geographical Information Systems
(GIS), as well as algorithms that detect them and the notion of geometrical
reduction and its use in spatial data algorithms. In datasets of networks
usually there can be found errors, when a number of elements of the network are
not connected according to the specifications of the network. This can occur
due to errors in the digitization process of the network or because these
elements are connected in reality in a not legal way.
In this paper we consider methods for dynamically storing a set of different
objects ("modules") in a physical array. Each module requires one free
contiguous subinterval in order to be placed. Items are inserted or removed,
resulting in a fragmented layout that makes it harder to insert further
modules. It is possible to relocate modules, one at a time, to another free
subinterval that is contiguous and does not overlap with the current location
of the module. These constraints clearly distinguish our problem from classical
memory allocation.
A mode of a multiset $S$ is an element $a \in S$ of maximum multiplicity;
that is, $a$ occurs at least as frequently as any other element in $S$. Given a
list $A[1:n]$ of $n$ items, we consider the problem of constructing a data
structure that efficiently answers range mode queries on $A$. Each query
consists of an input pair of indices $(i, j)$ for which a mode of $A[i:j]$ must
be returned. We present an $O(n^{2-2\epsilon})$-space static data structure
that supports range mode queries in $O(n^\epsilon)$ time in the worst case, for
any fixed $\epsilon \in [0,1/2]$.
We introduce the first self-index based on the Lempel-Ziv 1977 compression
format (LZ77). It is particularly competitive for highly repetitive text
collections such as sequence databases of genomes of related species, software
repositories, versioned document collections, and temporal text databases. Such
collections are extremely compressible but classical self-indexes fail to
capture that source of compressibility.
We study the problem of estimating the average of a Lipschitz continuous
function $f$ defined over a metric space, by querying $f$ at only a single
point. More specifically, we explore the role of randomness in drawing this
sample. Our goal is to find a distribution minimizing the expected estimation
error against an adversarially chosen Lipschitz continuous function. Our work
falls into the broad class of estimating aggregate statistics of a function
from a small number of carefully chosen samples.
We study the problem of efficiently clustering protein sequences in a limited
information setting. We assume that we do not know the distances between the
sequences in advance, and must query them during the execution of the
algorithm. Our goal is to find an accurate clustering while keeping the number
of queries small. We model the problem as a point set S with an unknown metric
d on S, and assume that we have access to one versus all distance queries that
given a point s in S return the distances between s and all other points.