In this paper, we explore the problem of iterative approximate Byzantine
consensus in arbitrary directed graphs. In particular, we prove a necessary and
sufficient condition for the existence of iterative byzantine consensus
algorithms. Additionally, we use our sufficient condition to examine whether
such algorithms exist for some specific graphs.
In this paper, we study CPU utilization time patterns of several Map-Reduce
applications. After extracting running patterns of several applications, the
patterns with their statistical information are saved in a reference database
to be later used to tweak system parameters to efficiently execute unknown
applications in future. To achieve this goal, CPU utilization patterns of new
applications along with its statistical information are compared with the
already known ones in the reference database to find/predict their most
probable execution patterns.
Time plays a crucial role in the performance of computing systems. The
accurate modelling of logical devices, and of their physical implementations,
requires an appropriate representation of time and of all properties that
depend on this notion. The need for a proper model, particularly acute in the
design of clockless delay-insensitive (DI) circuits, leads one to reconsider
the classical descriptions of time and of the resulting order and causal
relations satisfied by logical operations.
Design and architecture of cloud storage system plays a vital role in cloud
computing infrastructure in order to improve the storage capacity as well as
cost effectiveness. Usually cloud storage system provides users to efficient
storage space with elasticity feature. One of the challenges of cloud storage
system is difficult to balance the providing huge elastic capacity of storage
and investment of expensive cost for it.
Cellular Automata(CA) is a discrete computing model which provides simple,
flexible and efficient platform for simulating complicated systems and
performing complex computation based on the neighborhoods information. CA
consists of two components 1) a set of cells and 2) a set of rules .
Programmable Cellular Automata(PCA) employs some control signals on a Cellular
Automata(CA) structure.
Unions of graph multiplier operators are an important class of linear
operators for processing signals defined on graphs. We present a novel method
to efficiently distribute the application of these operators to
high-dimensional signals. The proposed method features approximations of the
graph multipliers by shifted Chebyshev polynomials, whose recurrence relations
make them readily amenable to distributed computation.
Group management is a fundamental building block of today's Internet
applications. Mailing lists, chat systems, collaborative document edition but
also online social networks such as Facebook and Twitter use group management
systems. In many cases, group security is required in the sense that access to
data is restricted to group members only. Some applications also require
privacy by keeping group members anonymous and unlinkable. Group management
systems routinely rely on a central authority that manages and controls the
infrastructure and data of the system.
The ever increasing demands for using resource-constrained mobile devices for
running more resource intensive applications nowadays has initiated the
development of cyber foraging solutions that offload parts or whole
computational intensive tasks to more powerful surrogate stationary computers
and run them on behalf of mobile devices as required.
It is not an easy task to securely maintain all essential data where it has
the need in many applications for clients in cloud. To maintain our data in
cloud, it may not be fully trustworthy because client doesn't have copy of all
stored data. But any authors don't tell us data integrity through its user and
CSP level by comparison before and after the data update in cloud. So we have
to establish new proposed system for this using our data reading protocol
algorithm to check the integrity of data before and after the data insertion in
cloud.
It is very challenging part to keep safely all required data that are needed
in many applications for user in cloud. Storing our data in cloud may not be
fully trustworthy. Since client doesn't have copy of all stored data, he has to
depend on Cloud Service Provider.
High fidelity simulation of large-sized complex networks can be realized on a
distributed computing platform that leverages the combined resources of
multiple processors or machines. In a discrete event driven simulation, the
assignment of logical processes (LPs) to machines is a critical step that
affects the computational and communication burden on the machines, which in
turn affects the simulation execution time of the experiment. We study a
network partitioning game wherein each node (LP) acts as a selfish player.
Computation of optimal cycle mean in a directed weighted graph has many
applications in program analysis, performance verification in particular. In
this paper we propose a data-parallel algorithmic solution to the problem and
show how the computation of optimal cycle mean can be efficiently accelerated
by means of CUDA technology. We show how the problem of computation of optimal
cycle mean is decomposed into a sequence of data-parallel graph computation
primitives and show how these primitives can be implemented and optimized for
CUDA computation.
In our work we present two parallel algorithms and their lock-free
implementations using a popular GPU environment Nvidia CUDA. The first
algorithm is the push-relabel method for the flow problem in grid graphs. The
second is the cost scaling algorithm for the assignment problem in complete
bipartite graphs.
We present a parallel algorithm that computes the ask and bid prices of an
American option when proportional transaction costs apply to the trading of the
underlying asset. The algorithm computes the prices on recombining binomial
trees, and is designed for modern multi-core processors. Although parallel
option pricing has been well studied, none of the existing approaches takes
transaction costs into consideration. The algorithm that we propose partitions
a binomial tree into blocks.
This paper addresses the consensus problem in homonymous distributed systems
where processes are prone to crash failures and have no initial knowledge of
the system membership ("homonymous" means that several processes may have the
same identifier). New classes of failure detectors suited to these systems are
first defined. Among them, the classes H\Omega\ and H\Sigma\ are introduced
that are the homonymous counterparts of the classes \Omega\ and \Sigma,
respectively.
Cloud computing is a model for enabling convenient, on-demand network access
to a shared pool of configurable computing resources. To provide cloud
computing services economically, it is important to optimize resource
allocation under the assumption that the required resource can be taken from a
shared resource pool. In addition, to be able to provide processing ability and
storage capacity, it is necessary to allocate bandwidth to access them at the
same time.
This paper proposes a simple and scalable web-based model for grid resource
discovery for the Internet. The resource discovery model contains the metadata
and resource finder web services. The information of resource finder web
services is kept in the repositories that are distributed in the application
layer of Internet. The resource finder web services will be discovered by
sending queries to the repositories in a similar way as the DNS protocol. The
underlying technology for implementation of the two architectures of this model
is introduced.
This paper investigates the use of redundancy and self repairing against node
failures in distributed storage systems, using various strategies. In
replication method, access to one replication node is sufficient to reconstruct
a lost node, while in MDS erasure coded systems which are optimal in terms of
redundancy-reliability tradeoff, a single node failure is repaired after
recovering the entire stored data. Moreover, regenerating codes yield a
tradeoff curve between storage capacity and repair bandwidth. The current paper
aims at investigating a new storage code.
Exascale systems, expected to emerge by the end of the next decade, will
require the exploitation of billion-way parallelism at multiple hierarchical
levels in order to achieve the desired sustained performance. The task of
assessing future machine performance is approached by identifying the factors
which currently challenge the scalability of parallel applications.
The scalability and efficiency of graph applications are significantly
constrained by conventional systems and their supporting programming models.
Technology trends like multicore, manycore, and heterogeneous system
architectures are introducing further challenges and possibilities for emerging
application domains such as graph applications. This paper explores the space
of effective parallel execution of ephemeral graphs that are dynamically
generated using the Barnes-Hut algorithm to exemplify dynamic workloads.
Virtualization has become commonplace in modern data centers, often referred
as "computing clouds". The capability of virtual machine live migration brings
benefits such as improved performance, manageability and fault tolerance, while
allowing workload movement with a short service downtime. However, service
levels of applications are likely to be negatively affected during a live
migration. For this reason, a better understanding of its effects on system
performance is desirable.
Generalized sparse matrix-matrix multiplication (or SpGEMM) is a key
primitive for many high performance graph algorithms as well as for some linear
solvers, such as algebraic multigrid. Here we show that SpGEMM also yields
efficient algorithms for general sparse-matrix indexing in distributed memory,
provided that the underlying SpGEMM implementation is sufficiently flexible and
scalable.
The continuously increasing amount of digital data generated by today's
society asks for better storage solutions.
With recent developments in cloud computing, a paradigm shift from rather
static deployment of resources to more dynamic, on-demand practices means more
flexibility and better utilization of resources. This demands new ways to
efficiently configure networks.
Clustering problems have numerous applications and are becoming more
challenging as the size of the data increases. In this paper, we consider
designing clustering algorithms that can be used in MapReduce, the most popular
programming environment for processing large datasets. We focus on the
practical and popular clustering problems, $k$-center and $k$-median. We
develop fast clustering algorithms with constant factor approximation
guarantees.
Speedup measures how much faster we can solve the same problem using many
cores. If we can afford to keep the execution time fixed, then quality up
measures how much better the solution will be computed using many cores. In
this paper we describe our multithreaded implementation to track one solution
path defined by a polynomial homotopy. Limiting quality to accuracy and
confusing accuracy with precision, we strive to offset the cost of
multiprecision arithmetic running multithreaded code on many cores.
We consider the problems of deterministic broadcasting and gossiping in
completely unknown ad-hoc radio networks. We assume that nothing is known to
the nodes about the topology or even the size of the network, $n$, except that
$n > 1$. Protocols for vanilla model, when $n$ is known, may be run for
increasingly larger estimates $2^i$ on the size of the network, but one cannot
determine when such a protocol should terminate. Thus, to carry this design
paradigm, successful completion or in-completion of the process should be
detected, and this knowledge circulated in the network.
Healing algorithms play a crucial part in distributed P2P networks where
failures occur continuously and frequently. Several self-healing algorithms
have been suggested recently [IPDPS'08, PODC'08, PODC'09, PODC'11] in a line of
work that has yielded gradual improvements in the properties ensured on the
graph. This work motivates a strong general phenomenon of edge-preserving
healing that aims at obtaining self-healing algorithms with the constraint that
all original edges in the graph (not deleted by the adversary), be retained in
every intermediate graph.
We study the problem of cooperative localization of a large network of nodes
in integer-coordinated unit disk graphs, a simplified but useful version of
general random graph. Exploiting the property that the radius $r$ sets clear
cut on the connectivity of two nodes, we propose an essential philosophy that
"no connectivity is also useful information just like the information being
connected" in unit disk graphs.
This work considers the deployment of pseudo-random number generators (PRNGs)
on graphics processing units (GPUs), developing an approach based on the
xorgens generator to rapidly produce pseudo-random numbers of high statistical
quality. The chosen algorithm has configurable state size and period, making it
ideal for tuning to the GPU architecture. We present a comparison of both speed
and statistical quality with other common parallel, GPU-based PRNGs,
demonstrating favourable performance of the xorgens-based approach.
Wireless sensor networks (WSNs) are usually utilized to perform decision
fusion of event detection. Current decision fusion schemes are based on binary
valued decision and do not consider bursty contextcapture. However, bursty
context and multi-valued data are important characteristics of WSNs. One on
hand, the local decisions from sensors usually have bursty and contextual
characteristics. Fusion center must capture the bursty context information from
the sensors.
Sampling a large network with a given distribution has been identified as a
useful operation to build network overlays. For example, sampling nodes with
uniform probability is the cornerstone of epidemic information spreading, and
constructing small world network topologies can be done by sampling with a
probability that depends on the distance to a given node. In this paper we
describe distributed algorithms for sampling networks, so that a node is
selected with a probability that is a function of the distance of the node to a
special node, called the \emph{source}.
We compare the solvability of the Consensus and Broadcast problems in
synchronous communication networks in which the delivery of messages is not
reliable. The failure model is the mobile omission faults model. During each
round, some messages can be lost and the set of possible simultaneous losses is
the same for each round. We investigate these problems for the first time for
arbitrary sets of possible failures. Previously, these sets were defined by
bounding the numbers of failures.
Recent years have witnessed a slew of coding techniques custom designed for
networked storage systems. Network coding inspired regenerating codes are the
most prolifically studied among these new age storage centric codes. A lot of
effort has been invested in understanding the fundamental achievable trade-offs
of storage and bandwidth usage to maintain redundancy in presence of different
models of failures, showcasing the efficacy of regenerating codes with respect
to traditional erasure coding techniques.
At present there are a number of barriers to creating an energy efficient
workload scheduler for a Private Cloud based data center. Firstly, the
relationship between different workloads and power consumption must be
investigated. Secondly, current hardware-based solutions to providing energy
usage statistics are unsuitable in warehouse scale data centers where low cost
and scalability are desirable properties.
In this tutorial paper, we will firstly review some basic simulation concepts
and then introduce the parallel and distributed simulation techniques in view
of some new challenges of today and tomorrow. More in particular, in the last
years there has been a wide diffusion of many cores architectures and we can
expect this trend to continue. On the other hand, the success of cloud
computing is strongly promoting the everything as a service paradigm. Is
parallel and distributed simulation ready for these new challenges?
As a result of the phenomenal proliferation of modern mobile Internet-enabled
devices and the widespread utilization of wireless and cellular data networks,
mobile users are increasingly requiring services tailored to their current
context. High-level context information is typically obtained from context
services that aggregate raw context information sensed by various sensors and
mobile devices.
Unions of graph Fourier multipliers are an important class of linear
operators for processing signals defined on graphs. We present a novel method
to efficiently distribute the application of these operators to the
high-dimensional signals collected by sensor networks. The proposed method
features approximations of the graph Fourier multipliers by shifted Chebyshev
polynomials, whose recurrence relations make them readily amenable to
distributed computation.
InfiniBand is a switched fabric interconnect. The InfiniBand specification
does not define an API. However the OFED package, libibverbs, has become the
default API on Linux and Solaris systems. Sparse documentation exists for the
verbs API. The simplest InfiniBand program provided by OFED, ibv_rc_pingpong,
is about 800 lines long. The semantics of using the verbs API for this program
is not obvious to the first time reader. This paper will dissect the
ibv_rc_pingpong program in an attempt to make clear to users how to interact
with verbs.
Operators of networks covering large areas are confronted with demands from
some of their customers who are virtual service providers. These providers may
call for the connectivity service which fulfils the specificity of their
services, for instance a multicast transition with allocated bandwidth. On the
other hand, network operators want to make profit by trading the connectivity
service of requested quality to their customers and to limit their
infrastructure investments (or do not invest anything at all).
Exploiting the performance of today's microprocessors requires intimate
knowledge of the microarchitecture as well as an awareness of the ever-growing
complexity in thread and cache topology. LIKWID is a set of command line
utilities that addresses four key problems: Probing the thread and cache
topology of a shared-memory node, enforcing thread-core affinity on a program,
measuring performance counter metrics, and microbenchmarking for reliable upper
performance bounds.
Data-intensive, graph-based computations are pervasive in several scientific
applications, and are known to to be quite challenging to implement on
distributed memory systems. In this work, we explore the design space of
parallel algorithms for Breadth-First Search (BFS), a key subroutine in several
graph algorithms.
Cloud computing is rapidly emerging as a new paradigm for delivering IT
services as utlity-oriented services on subscription-basis. The rapid
development of applications and their deployment in Cloud computing
environments in efficient manner is a complex task.
A self-stabilizing protocol has the capacity to recover a legitimate behavior
whatever is its initial state. The majority of works in self-stabilization
assume a shared memory model or a communication using reliable and FIFO
channels. In this article, we interest in self-stabilizing systems using
bounded but non reliable and non FIFO channels. We propose a stabilizing
communication protocol with optimal fault resilience. In more details, this
protocol simulates a reliable and FIFO channel and ensures a minimal number of
looses, duplications, creations, and re-ordering of messages.
Many-core architectures of the future are likely to have distributed memory
organizations and need fine grained concurrency management to be used
effectively. The Self-adaptive Virtual Processor (SVP) is an abstract
concurrent programming model which can provide this, but the model and its
current implementations assume a single address space shared memory.
We present and compare various approaches to a classical selection problem on
Graphics Processing Units (GPUs). The selection problem consists in selecting
the $k$-th smallest element from an array of size $n$, called $k$-th order
statistic. We focus on calculating the median of a sample, the $n/2$-th order
statistic. We introduce a new method based on minimization of a convex
function, and show its numerical superiority when calculating the order
statistics of very large arrays on GPUs.
We consider the problem of maximizing the throughput of Byzantine consensus,
when communication links have finite capacity. Byzantine consensus is a
classical problem in distributed computing. In existing literature, the
communication links are implicitly assumed to have infinite capacity. The
problem changes significantly when the capacity of links is finite. We define
the throughput and capacity of consensus, and identify upper bound of
achievable consensus throughput.
This paper is concerned with the problem of implementing an unbounded
timestamp object from multi-writer atomic registers, in an asynchronous
distributed system of n processors with distinct identifiers where timestamps
are taken from an arbitrary universe. Ellen, Fatourou and Ruppert (2008) showed
that sqrt{n}/2-O(1) registers are required for any obstruction-free
implementation of long-lived timestamp systems from atomic registers (meaning
processors can repeatedly get timestamps). We improve this existing lower bound
in two ways.
Load balancing across a networked environment is a monotonous job. Moreover,
if the job to be distributed is a constraint satisfying one, the distribution
of load demands core intelligence. This paper proposes parallel processing
through Global Evaluation Function by means of randomly initialized agents for
solving Constraint Satisfaction Problems.
A computationally intensive large job, granulized to concurrent pieces and
operating in a dynamic environment should reduce the total processing time.
However, distributing jobs across a networked environment is a tedious and
difficult task. Job distribution in a Local Area Network based on Triangular
Dynamic Architecture (TDA) is a mechanism that establishes a dynamic
environment for job distribution, load balancing and distributed processing
with minimum interaction from the user.
This paper analyzes the cache miss cost of algorithms when scheduled using
randomized work stealing (RWS) in a parallel environment, taking into account
the effects of false sharing.
First, prior analyses (due to Acar et al.) are extended to incorporate false
sharing. However, to control the possible delays due to false sharing, some
restrictions on the algorithms seem necessary. Accordingly, the class of
Hierarchical Tree algorithms is introduced and their performance analyzed.
Self-stabilization is a versatile approach to fault-tolerance since it
permits a distributed system to recover from any transient fault that
arbitrarily corrupts the contents of all memories in the system. Byzantine
tolerance is an attractive feature of distributed systems that permits to cope
with arbitrary malicious behaviors. We consider the well known problem of
constructing a maximum metric tree in this context. Combining these two
properties leads to some impossibility results.
The crux of software transactional memory (STM) is to combine an easy-to-use
programming interface with an efficient utilization of the concurrent computing
abilities provided by modern machines. But does this combination come with an
inherent cost? We evaluate the cost of concurrency by measuring the amount of
expensive synchronization that must be employed in an STM implementation that
ensures positive concurrency, i.e., allows for concurrent transaction
processing in some executions. We consider two popular progress conditions:
progressiveness and permissiveness.
This paper discusses our experience in building SPIRE, an autonomic system
for service provision. The architecture consists of a set of hosted Web
Services subject to QoS constraints, and a certain number of servers used to
run session-based traffic. Customers pay for having their jobs run, but require
in turn certain quality guarantees: there are different SLAs specifying charges
for running jobs and penalties for failing to meet promised performance
metrics. The system is driven by an utility function, aiming at optimizing the
average earned revenue per unit time.
Cloud providers, like Amazon, offer their data centers' computational and
storage capacities for lease to paying customers. High electricity consumption,
associated with running a data center, not only reflects on its carbon
footprint, but also increases the costs of running the data center itself. This
paper addresses the problem of maximizing the revenues of Cloud providers by
trimming down their electricity costs. As a solution allocation policies which
are based on the dynamic powering servers on and off are introduced and
evaluated.
We consider the problem of performing a random walk in a distributed network.
Given bandwidth constraints, the goal of the problem is to minimize the number
of rounds required to obtain a random walk sample. Das Sarma et al. [PODC'10]
show that a random walk of length $\ell$ on a network of diameter $D$ can be
performed in $\tilde O(\sqrt{\ell D}+D)$ time. A major question left open is
whether there exists a faster algorithm, especially whether the multiplication
of $\sqrt{\ell}$ and $\sqrt{D}$ is necessary.
Application-layer multicast implements the multicast functionality at the
application layer. The main goal of application-layer multicast is to construct
and maintain efficient distribution structures between endhosts. In this paper
we focus on the implementation of an application-layer multicast network using
PlanetLab. We observe that the total time required to measure network latency
over TCP is influenced dramatically by the TCP connection time.
A primary motivation for our research in digital ecosystems is the desire to
exploit the self-organising properties of biological ecosystems. Ecosystems are
thought to be robust, scalable architectures that can automatically solve
complex, dynamic problems.
Ant Colony Optimisation (ACO) is an effective population-based meta-heuristic
for the solution of a wide variety of problems. As a population-based
algorithm, its computation is intrinsically massively parallel, and it is
there- fore theoretically well-suited for implementation on Graphics Processing
Units (GPUs). The ACO algorithm comprises two main stages: Tour construction
and Pheromone update. The former has been previously implemented on the GPU,
using a task-based parallelism approach. However, up until now, the latter has
always been implemented on the CPU.
Operational semantics has established itself as a flexible but rigorous means
to describe the meaning of programming languages. Oftentimes, it is felt
necessary to keep a semantics small, for example to facilitate its use for
model checking by avoiding state space explosion. However, omitting many
details in a semantics typically makes results valid for a limited core
language only, leaving a wide gap towards any real implementation.
The general description of infrastructure and content of SciShop.ru computer
simulation center is given. This resource is a new form of knowledge generation
and remote education using modern Cloud Computing technologies.
We report on the performance of our cold-dark matter cosmological N-body
simulation which was carried out concurrently using supercomputers across the
globe. We ran simulations on 60 to 750 cores distributed over a variety of
supercomputers in Amsterdam (the Netherlands, Europe), in Tokyo (Japan, Asia),
Edinburgh (UK, Europe) and Espoo (Finland, Europe). Regardless the network
latency of 0.32 seconds and the communication over 30.000 km of optical network
cable we are able to achieve about 87% of the performance compared to an equal
number of cores on a single supercomputer.
The current trend of multicore architectures on shared memory systems
underscores the need of parallelism. While there are some programming model to
express parallelism, thread programming model has become a standard to support
these system such as OpenMP, and POSIX threads. MPI (Message Passing Interface)
which remains the dominant model used in high-performance computing today faces
this challenge.
In this article we present a new format for storing sparse matrices. The
format is designed to perform well mainly on the GPU devices. We present its
implementation in CUDA. The performance has been tested on 1,600 different
types of matrices and we compare our format with the Hybrid format. We give
detailed comparison of both formats and show their strong and weak parts.
Studies on the large scale peer-to-peer (P2P) network like Gnutella have
shown the presence of large number of free riders. Moreover, the open and
decentralized nature of P2P network is exploited by malicious users who
distribute unauthentic or harmful contents. Despite the existence of a number
of trust management schemes in the literature for combating against free riding
and distribution of malicious files, these mechanisms are not scalable due to
their high computational, communication and storage overhead.
We model both concurrent programs and the possible executions from one state
to another in a concurrent program using simplices. The latter are calculated
using necklaces of simplices in the former. For these models, the appropriate
setting is the not the traditional approach to simplicial sets, but a more
recent one due to Joyal.
Aggregation is an important building block of modern distributed
applications, allowing the determination of meaningful properties (e.g. network
size, total storage capacity, average load, majorities, etc.) that are used to
direct the execution of the system. However, the majority of the existing
aggregation algorithms exhibit relevant dependability issues, when prospecting
their use in real application environments. In this paper, we reveal some
dependability issues of aggregation algorithms based on iterative averaging
techniques, giving some directions to solve them.
This paper introduces a new theory of multiparty session types based on
symmetric sum types, by which we can type non-deterministic orchestration
choice behaviours. While the original branching type in session types can
represent a choice made by a single participant and accepted by others
determining how the session proceeds, the symmetric sum type represents a
choice made by agreement among all the participants of a session.
We present and analyze a simple and general scheme to build a churn
(fault)-tolerant structured Peer-to-Peer (P2P) network. Our scheme shows how to
``convert" a static network into a dynamic distributed hash table(DHT)-based
P2P network such that all the good properties of the static network are
guaranteed with high probability (w.h.p). Applying our scheme to a
cube-connected cycles network, for example, yields a $O(\log N)$ degree
connected network, in which every search succeeds in $O(\log N)$ hops w.h.p.,
using $O(\log N)$ messages, where $N$ is the expected stable network size.
The classic renaming protocol of Moir and Anderson (1995) uses a network of
Theta(n^2) splitters to assign unique names to n processes with unbounded
initial names. We show how to reduce this bound to Theta(n^{3/2}) splitters.
Internet of Things (IoT) and B3G/4G communication are promoting the pervasive
mobile services with its advanced features. However, security problems are also
baffled the development. This paper proposes a trust model to protect the
user's security. The billing or trust operator works as an agent to provide a
trust authentication for all the service providers. The services are classified
by sensitive value calculation. With the value, the user's trustiness for
corresponding service can be obtained.
Most traditional alarm systems cannot address security threats in a
satisfactory manner. To alleviate this problem, we developed a high-confidence
cyber-physical alarm system (CPAS), a new kind of alarm systems. This system
establishes the connection of the Internet (i.e. TCP/IP) through GPRS/CDMA/3G.
It achieves mutual communication control among terminal equipments, human
machine interfaces and users by using the existing mobile communication
network. The CPAS will enable the transformation in alarm mode from traditional
one-way alarm to two-way alarm.
It is an increasingly important issue to reduce the energy consumption of
computing systems. In this paper, we consider partition based energy-aware
scheduling of periodic real-time tasks on multicore processors. The scheduling
exploits dynamic voltage scaling (DVS) and core sleep scheduling to reduce both
dynamic and leakage energy consumption. If the overhead of core state switching
is non-negligible, however, the performance of this scheduling strategy in
terms of energy efficiency might degrade.
We study the {\em verification} problem in distributed networks, stated as
follows. Let $H$ be a subgraph of a network $G$ where each vertex of $G$ knows
which edges incident on it are in $H$. We would like to verify whether $H$ has
some properties, e.g., if it is a tree or if it is connected. We would like to
perform this verification in a decentralized fashion via a distributed
algorithm. The time complexity of verification is measured as the number of
rounds of distributed communication.
This paper describes and analyzes a hierarchical gossip algorithm for solving
the distributed average consensus problem in wireless sensor networks. The
network is recursively partitioned into subnetworks. Initially, nodes at the
finest scale gossip to compute local averages. Then, using geographic routing
to enable gossip between nodes that are not directly connected, these local
averages are progressively fused up the hierarchy until the global average is
computed.
The Extended Burrows Wheeler transform (EBWT) helps to find the distance
between two sequences. Implementation of an existing algorithm takes
considerable amount of time for small size sequences. In this paper, we give a
parallel implementation of this algorithm using NVIDIA Compute Unified Device
Architecture (CUDA). We have obtained, on an average, a 2X improvement in the
performance.
In this paper, we continue our development of algorithms used for topological
network discovery. We present native P system versions of two fundamental
problems in graph theory: finding the maximum number of edge- and node-disjoint
paths between a source node and target node. We start from the standard
depth-first-search maximum flow algorithms, but our approach is totally
distributed, when initially no structural information is available and each P
system cell has to even learn its immediate neighbors.
Nowadays, several industrial applications are being ported to parallel
architectures. In fact, these platforms allow acquire more performance for
system modelling and simulation. In the electric machines area, there are many
problems which need speed-up on their solution. This paper examines the
parallelism of sparse matrix solver on the graphics processors. More
specifically, we implement the conjugate gradient technique with input matrix
stored in CSR, and Symmetric CSR and CSC formats.
Cloud-computing shares a common pool of resources across customers at a scale
that is orders of magnitude larger than traditional multi-user systems.
Constituent physical compute servers are allocated multiple "virtual machines"
(VM) to serve simultaneously. Each VM user should ideally be unaffected by
others' demand. Naturally, this environment produces new challenges for the
service providers in meeting customer expectations while extracting an
efficient utilization from server resources. We study a new cloud service
metric that measures prolonged latency or delay suffered by customers.
Cloud infrastructures enable the efficient parallel execution of
data-intensive tasks such as entity resolution on large datasets. We
investigate challenges and possible solutions of using the MapReduce
programming model for parallel entity resolution. In particular, we propose and
evaluate two MapReduce-based implementations for Sorted Neighborhood blocking
that either use multiple MapReduce jobs or apply a tailored data replication.
In Distributed Interactive Applications (DIA) such as multiplayer games,
where many participants are involved in a same game session and communicate
through a network, they may have an inconsistent view of the virtual world
because of the communication delays across the network. This issue becomes even
more challenging when communicating through a cellular network while executing
the DIA client on a mobile terminal. Consistency maintenance algorithms may be
used to obtain a uniform view of the virtual world.
Group communication is becoming a more and more popular infrastructure for
efficient distributed applications. It consists in representing locally a group
of remote objects as a single object accessed in a single step; communications
are then broadcasted to all members. This paper provides models for automatic
verification of group-based applications, typically for detecting deadlocks or
checking message ordering. We show how to encode group communication, together
with different forms of synchronisation for group results.
We study the {edge-coloring} problem in the message-passing model of
distributed computing. This is one of the most fundamental and well-studied
problems in this area. Currently, the best-known deterministic algorithms for
(2Delta -1)-edge-coloring requires O(Delta) + log-star n time \cite{PR01},
where Delta is the maximum degree of the input graph.
The stochastic simulation of biological systems is an increasingly popular
technique in bioinformatics. It often is an enlightening technique, which may
however result in being computational expensive. We discuss the main
opportunities to speed it up on multi-core platforms, which pose new challenges
for parallelisation techniques.
Electing leader is a vital issue not only in distributed computing but also
in communication network [1, 2, 3, 4, 5], centralized mutual exclusion
algorithm [6, 7], centralized control IPC, etc. A leader is required to make
synchronization between different processes. And different election algorithms
are used to elect a coordinator among the available processes in the system
such a way that there will be only one coordinator at any time. Bully election
algorithm is one of the classical and well-known approaches in coordinator
election process.
In the coming decade, astronomical surveys of the sky will generate tens of
terabytes of images and detect hundreds of millions of sources every night. The
study of these sources will involve computation challenges such as anomaly
detection and classification, and moving object tracking. Since such studies
benefit from the highest quality data, methods such as image coaddition
(stacking) will be a critical preprocessing step prior to scientific
investigation.
While sharing resources the efficiency is substantially degraded as a result
of the scarceness of availability of the requested resources in a multiclient
support manner. These resources are often aggravated by many factors like the
temporal constraints for availability or node flooding by the requested
replicated file chunks. Thus replicated file chunks should be efficiently
disseminated in order to enable resource availability on-demand by the mobile
users.
We present a new overlay, called the {\em Deterministic Decentralized tree}
($D^2$-tree).
Mobile applications are becoming increasingly ubiquitous and provide ever
richer functionality on mobile devices. At the same time, such devices often
enjoy strong connectivity with more powerful machines ranging from laptops and
desktops to commercial clouds. This paper presents the design and
implementation of CloneCloud, a system that automatically transforms mobile
applications to benefit from the cloud.
A novel memory with limited processing power and internal connectivity at
each element is proposed. This memory carries out parallel processing within
itself. Many common algorithms using this memory are discussed. For an array of
N items, it reduces the total instruction cycle count of universal operations
such as insertion and match finding to ~ 1, local operations such as filtering
and pattern recognition to ~ local operation size, and global operations such
as sum and sorting to ~ sqrt(N) instruction cycles.
Component frameworks are complex systems that rely on many layers of
abstraction to function properly. One essential requirement is a consistent
means of describing each individual component and how it relates to both other
components and the whole framework. As component frameworks are designed to be
flexible by nature, the description method should be simultaneously powerful,
lead to efficient code, and be easy to use, so that new users can quickly adapt
their own code to work with the framework.
We consider transactional memory contention management in the context of
balanced workloads, where if a transaction is writing, the number of write
operations it performs is a constant fraction of its total reads and writes. We
explore the theoretical performance boundaries of contention management in
balanced workloads from the worst-case perspective by presenting and analyzing
two new contention management algorithms. The first algorithm Clairvoyant is
O(\surd s)-competitive, where s is the number of shared resources. This
algorithm depends on explicitly knowing the conflict graph.
Managing cloud services is a fundamental challenge in todays virtualized
environments. These challenges equally face both providers and consumers of
cloud services. The issue becomes even more challenging in virtualized
environments that support mobile clouds. Cloud computing platforms such as
Amazon EC2 provide customers with flexible, on demand resources at low cost.
However, they fail to provide seamless infrastructure management and monitoring
capabilities that many customers may need.
In this report, building on the deterministic multi-valued one-to-many
Byzantine agreement (broadcast) algorithm in our recent technical report [2],
we introduce a deterministic multi-valued all-to-all Byzantine agreement
algorithm (consensus), with linear complexity per bit agreed upon. The
discussion in this note is not self-contained, and relies heavily on the
material in [2] - please refer to [2] for the necessary background.
We present a new record on computing specific bits of Pi, the mathematical
constant, and discuss performing such computations on Apache Hadoop clusters.
The specific bits represented in hexadecimal are 0E6C1294 AED40403 F56D2D76
4026265B CA98511D 0FCFFAA1 0F4D28B1 BB5392B8. These 256 bits end at the
2,000,000,000,000,252nd bit position, which doubles the previous known record.
The position of the first bit is 1,999,999,999,999,997 and the value of the two
quadrillionth bit is 0. The computation is carried out by a MapReduce program
called DistBbp.
We study the problem of clock synchronization in highly dynamic networks,
where communication links can appear or disappear at any time. The nodes in the
network are equipped with hardware clocks, but the rate of the hardware clocks
can vary arbitrarily within specific bounds, and the estimates that nodes can
obtain about the clock values of other nodes are inherently inaccurate.
We present MPWide, a platform independent communication library for
performing message passing between computers. Our library allows coupling of
several local MPI applications through a long distance network and is
specifically optimized for such communications. The implementation is
deliberately kept light-weight, platform independent and the library can be
installed and used without administrative privileges.
Many components of the IS are constructed as modular units which do not need
to communicate with each other such that the number of components increases but
the size remains constant. However, a sub-modular IS architecture in which
lymph node number and size both increase sublinearly with body size is shown to
efficiently balance the requirements of communication and migration, consistent
with experimental data. We hypothesize that the IS architecture optimizes the
tradeoff between local search for pathogens and global response using
antibodies.
Cloud computing promises a radical shift in the provisioning of computing
resource within the enterprise. This paper describes the challenges that
decision makers face when assessing the feasibility of the adoption of cloud
computing in their organisations, and describes our Cloud Adoption Toolkit,
which has been developed to support this process. The toolkit provides a
framework to support decision makers in identifying their concerns, and
matching these concerns to appropriate tools/techniques that can be used to
address them.
Erasure codes provide a storage efficient alternative to replication based
redundancy in (networked) storage systems. They however entail high
communication overhead for maintenance, when some of the encoded fragments are
lost and need to be replenished. Such overheads arise from the fundamental need
to recreate (or keep separately) first a copy of the whole object before any
individual encoded fragment can be generated and replenished. There has been
recently intense interest to explore alternatives, most prominent ones being
regenerating codes (RGC) and hierarchical codes (HC).
This paper considers parallel Gr\"obner bases algorithms on distributed
memory parallel computers with multi-core compute nodes. We summarize three
different Gr\"obner bases implementations: shared memory parallel, pure
distributed memory parallel and distributed memory combined with shared memory
parallelism. The last algorithm, called distributed hybrid, uses only one
control communication channel between the master node and the worker nodes and
keeps polynomials in shared memory on a node.
Wireless sensor-actor networks are a recent development of wireless networks
where both ordinary sensor nodes and more sophisticated and powerful nodes,
called actors, are present. In this paper we formalize a recently introduced
algorithm that recovers failed actor communication links via the existing
sensor infrastructure. We prove via refinement that the recovery is terminating
in a finite number of steps and is distributed, thus self-performed by the
actors.
Implementing a component-based system in a distributed way so that it ensures
some global constraints is a challenging problem. We consider here abstract
specifications consisting of a composition of components and a controller given
in the form of a set of interactions and a priority order amongst them. In the
context of distributed systems, such a controller must be executed in a
distributed fashion while still respecting the global constraints imposed by
interactions and priorities.
Developing large-scale distributed applications can be a daunting task.
object-based environments have attempted to alleviate problems by providing
distributed objects that look like local objects. We advocate that this
approach has actually only made matters worse, as the developer needs to be
aware of many intricate internal details in order to adequately handle partial
failures. The result is an increase of application complexity. We present an
alternative in which distribution transparency is lessened in favor of clearer
semantics.
As energy proportional computing has extended the success of DVFS (Dynamic
voltage and frequency scaling) to the entire system, DVFS control algorithms
will play a key role in reducing server clusters' power consumption. The focus
of this paper is to provide accurate cluster-level DVFS control for power
saving in a server cluster. To achieve this goal, we propose a request tracing
approach that online classifies the major causal path patterns and monitors
their performance data as a guide for accurate DVFS control.
Randomized algorithm that achieves multi-valued Byzantine agreement with high
probability, and achieves optimal complexity.
Consider an asynchronous network in a shared-memory environment consisting of
n nodes. Assume that up to f of the nodes might be Byzantine (n > 12f), where
the adversary is full-information and dynamic (sometimes called adaptive). In
addition, the non-Byzantine nodes may undergo transient failures. Nodes advance
in atomic steps, which consist of reading all registers, performing some
calculation and writing to all registers.
Storage optimization in distributed environments is a major concern when
talking about reliability in this kind of schemes. Although replication is the
most used option, erasure coding is a more optimized one.
However, erasure coding uses a lot of bandwidth to replace one node. In a
dynamic scheme, where nodes enter and leave the system frequently, bandwidth
use could be an important drawback.
The focus of this work is on the analysis of transmit beamforming schemes
with a low-rate feedback link in wireless sensor/relay networks, where nodes in
the network need to implement beamforming in a distributed manner.
Specifically, the problem of distributed phase alignment is considered, where
neither the transmitters nor the receiver has perfect channel state
information, but there is a low-rate feedback link from the receiver to the
transmitters. In this setting, a framework is proposed for systematically
analyzing the performance of distributed beamforming schemes.
Developing data mining algorithms that are suitable for cloud computing
platforms is currently an active area of research, as is developing cloud
computing platforms appropriate for data mining. Currently, the most common
benchmark for cloud computing is the Terasort (and related) benchmarks.
Although the Terasort Benchmark is quite useful, it was not designed for data
mining per se.
This paper presents a simple local medium access control protocol, called
\textsc{Jade}, for multi-hop wireless networks with a single channel that is
provably robust against adaptive adversarial jamming. The wireless network is
modeled as a unit disk graph on a set of nodes distributed arbitrarily in the
plane. In addition to these nodes, there are adversarial jammers that know the
protocol and its entire history and that are allowed to jam the wireless
channel at any node for an arbitrary $(1-\epsilon)$-fraction of the time steps,
where $0<\epsilon<1$ is an arbitrary constant.
In this paper we consider a synchronous message passing system in which in
every round an external adversary is able to send each processor up to k
messages with falsified sender identities and arbitrary content. It is formally
shown that this impersonation model is slightly stronger than the asynchronous
message passing model with crash failures. In particular, we prove that
(k+1)-set agreement can be solved in this model, while k-set agreement is
impossible, for any k>=1.
Gradecast is a simple three-round algorithm presented by Feldman and Micali.
The current work presents a very simple algorithm that utilized Gradecast to
achieve Byzantine agreement. Two small variations of the presented algorithm
lead to improved algorithms for solving the Approximate agreement problem and
the Multi-consensus problem.
An optimal approximate agreement algorithm was presented by Fekete, which
supports up to 1/4 n Byzantine nodes and has message complexity of O(n^k),
where n is the number of nodes and k is the number of rounds.
Application-layer multicast implements the multicast functionality at the
application layer. The main goal of application-layer multicast is to construct
and maintain efficient distribution structures between end-hosts. In this paper
we focus on the implementation of an application-layer multicast distribution
algorithm. We observe that the total time required to measure network latency
over TCP is influenced dramatically by the TCP connection time.
As known, physical circuits, e.g. integrated circuits or power system, work
in a distributed manner, but these circuits could not be easily simulated in a
distributed way. This is mainly because that the dynamical system of physical
circuits is nonlinear and the linearized system of physical circuits is
non-symmetrical. This paper proposes a simple and natural strategy to simulate
the physical circuit in parallel, by mimicking the internal wires or
interconnects inside the circuits by distributed numerical algorithm.
This document describes the Gloss infrastructure supporting implementation of
location-aware services. The document is in two parts. The first part describes
software architecture for the smart space. As described in D8, a local
architecture provides a framework for constructing Gloss applications, termed
assemblies, that run on individual physical nodes, whereas a global
architecture defines an overlay network for linking individual assemblies. The
second part outlines the hardware installation for local sensing.
In this paper we describe an architecture which: Permits the deployment and
execution of components in appropriate geographical locations. Provides
security mechanisms that prevent misuse of the architecture. Supports a
programming model that is familiar to application programmers. Permits
installed components to share data. Permits the deployed components to
communicate via communication channels. Provides evolution mechanisms
permitting the dynamic rearrangement of inter-connection topologies the
components that they connect.
This paper introduces the \emph{RoboCast} communication abstraction. The
RoboCast allows a swarm of non oblivious, anonymous robots that are only
endowed with visibility sensors but do not share a common coordinate system, to
asynchronously exchange information. We propose a generic framework that covers
a large class of asynchronous broadcast-based algorithms and show how our
framework can be used to implement fundamental building blocks in robot
networks such as gathering or stigmergy.
Random ring-based overlay networks have been used to study the small world
phenomenon and model fault-tolerant peer-to-peer systems (Kleinberg, 2006). It
has been shown that when each of $n$ nodes has $\ell = O(\log n)$ links,
assigning contacts according to an inverse power-law distance distribution
allows greedy routing to perform in $O(\log^2 n / \ell)$ steps (Aspnes et al.
2002). In this paper, we generalize this result by showing the same upper bound
holds when nodes are assigned a random number of links according to an
arbitrary distribution with mean $\ell$.