Component frameworks are complex systems that rely on many layers of
abstraction to function properly. One essential requirement is a consistent
means of describing each individual component and how it relates to both other
components and the whole framework. As component frameworks are designed to be
flexible by nature, the description method should be simultaneously powerful,
lead to efficient code, and be easy to use, so that new users can quickly adapt
their own code to work with the framework.
We consider transactional memory contention management in the context of
balanced workloads, where if a transaction is writing, the number of write
operations it performs is a constant fraction of its total reads and writes. We
explore the theoretical performance boundaries of contention management in
balanced workloads from the worst-case perspective by presenting and analyzing
two new contention management algorithms. The first algorithm Clairvoyant is
O(\surd s)-competitive, where s is the number of shared resources. This
algorithm depends on explicitly knowing the conflict graph.
Managing cloud services is a fundamental challenge in todays virtualized
environments. These challenges equally face both providers and consumers of
cloud services. The issue becomes even more challenging in virtualized
environments that support mobile clouds. Cloud computing platforms such as
Amazon EC2 provide customers with flexible, on demand resources at low cost.
However, they fail to provide seamless infrastructure management and monitoring
capabilities that many customers may need.
In this report, building on the deterministic multi-valued one-to-many
Byzantine agreement (broadcast) algorithm in our recent technical report [2],
we introduce a deterministic multi-valued all-to-all Byzantine agreement
algorithm (consensus), with linear complexity per bit agreed upon. The
discussion in this note is not self-contained, and relies heavily on the
material in [2] - please refer to [2] for the necessary background.
We present a new record on computing specific bits of Pi, the mathematical
constant, and discuss performing such computations on Apache Hadoop clusters.
The specific bits represented in hexadecimal are 0E6C1294 AED40403 F56D2D76
4026265B CA98511D 0FCFFAA1 0F4D28B1 BB5392B8. These 256 bits end at the
2,000,000,000,000,252nd bit position, which doubles the previous known record.
The position of the first bit is 1,999,999,999,999,997 and the value of the two
quadrillionth bit is 0. The computation is carried out by a MapReduce program
called DistBbp.
We study the problem of clock synchronization in highly dynamic networks,
where communication links can appear or disappear at any time. The nodes in the
network are equipped with hardware clocks, but the rate of the hardware clocks
can vary arbitrarily within specific bounds, and the estimates that nodes can
obtain about the clock values of other nodes are inherently inaccurate.
We present MPWide, a platform independent communication library for
performing message passing between computers. Our library allows coupling of
several local MPI applications through a long distance network and is
specifically optimized for such communications. The implementation is
deliberately kept light-weight, platform independent and the library can be
installed and used without administrative privileges.
Many components of the IS are constructed as modular units which do not need
to communicate with each other such that the number of components increases but
the size remains constant. However, a sub-modular IS architecture in which
lymph node number and size both increase sublinearly with body size is shown to
efficiently balance the requirements of communication and migration, consistent
with experimental data. We hypothesize that the IS architecture optimizes the
tradeoff between local search for pathogens and global response using
antibodies.
Cloud computing promises a radical shift in the provisioning of computing
resource within the enterprise. This paper describes the challenges that
decision makers face when assessing the feasibility of the adoption of cloud
computing in their organisations, and describes our Cloud Adoption Toolkit,
which has been developed to support this process. The toolkit provides a
framework to support decision makers in identifying their concerns, and
matching these concerns to appropriate tools/techniques that can be used to
address them.
Erasure codes provide a storage efficient alternative to replication based
redundancy in (networked) storage systems. They however entail high
communication overhead for maintenance, when some of the encoded fragments are
lost and need to be replenished. Such overheads arise from the fundamental need
to recreate (or keep separately) first a copy of the whole object before any
individual encoded fragment can be generated and replenished. There has been
recently intense interest to explore alternatives, most prominent ones being
regenerating codes (RGC) and hierarchical codes (HC).
This paper considers parallel Gr\"obner bases algorithms on distributed
memory parallel computers with multi-core compute nodes. We summarize three
different Gr\"obner bases implementations: shared memory parallel, pure
distributed memory parallel and distributed memory combined with shared memory
parallelism. The last algorithm, called distributed hybrid, uses only one
control communication channel between the master node and the worker nodes and
keeps polynomials in shared memory on a node.
Wireless sensor-actor networks are a recent development of wireless networks
where both ordinary sensor nodes and more sophisticated and powerful nodes,
called actors, are present. In this paper we formalize a recently introduced
algorithm that recovers failed actor communication links via the existing
sensor infrastructure. We prove via refinement that the recovery is terminating
in a finite number of steps and is distributed, thus self-performed by the
actors.
Implementing a component-based system in a distributed way so that it ensures
some global constraints is a challenging problem. We consider here abstract
specifications consisting of a composition of components and a controller given
in the form of a set of interactions and a priority order amongst them. In the
context of distributed systems, such a controller must be executed in a
distributed fashion while still respecting the global constraints imposed by
interactions and priorities.
Developing large-scale distributed applications can be a daunting task.
object-based environments have attempted to alleviate problems by providing
distributed objects that look like local objects. We advocate that this
approach has actually only made matters worse, as the developer needs to be
aware of many intricate internal details in order to adequately handle partial
failures. The result is an increase of application complexity. We present an
alternative in which distribution transparency is lessened in favor of clearer
semantics.
As energy proportional computing has extended the success of DVFS (Dynamic
voltage and frequency scaling) to the entire system, DVFS control algorithms
will play a key role in reducing server clusters' power consumption. The focus
of this paper is to provide accurate cluster-level DVFS control for power
saving in a server cluster. To achieve this goal, we propose a request tracing
approach that online classifies the major causal path patterns and monitors
their performance data as a guide for accurate DVFS control.
Randomized algorithm that achieves multi-valued Byzantine agreement with high
probability, and achieves optimal complexity.
Consider an asynchronous network in a shared-memory environment consisting of
n nodes. Assume that up to f of the nodes might be Byzantine (n > 12f), where
the adversary is full-information and dynamic (sometimes called adaptive). In
addition, the non-Byzantine nodes may undergo transient failures. Nodes advance
in atomic steps, which consist of reading all registers, performing some
calculation and writing to all registers.
Storage optimization in distributed environments is a major concern when
talking about reliability in this kind of schemes. Although replication is the
most used option, erasure coding is a more optimized one.
However, erasure coding uses a lot of bandwidth to replace one node. In a
dynamic scheme, where nodes enter and leave the system frequently, bandwidth
use could be an important drawback.
The focus of this work is on the analysis of transmit beamforming schemes
with a low-rate feedback link in wireless sensor/relay networks, where nodes in
the network need to implement beamforming in a distributed manner.
Specifically, the problem of distributed phase alignment is considered, where
neither the transmitters nor the receiver has perfect channel state
information, but there is a low-rate feedback link from the receiver to the
transmitters. In this setting, a framework is proposed for systematically
analyzing the performance of distributed beamforming schemes.
Developing data mining algorithms that are suitable for cloud computing
platforms is currently an active area of research, as is developing cloud
computing platforms appropriate for data mining. Currently, the most common
benchmark for cloud computing is the Terasort (and related) benchmarks.
Although the Terasort Benchmark is quite useful, it was not designed for data
mining per se.
This paper presents a simple local medium access control protocol, called
\textsc{Jade}, for multi-hop wireless networks with a single channel that is
provably robust against adaptive adversarial jamming. The wireless network is
modeled as a unit disk graph on a set of nodes distributed arbitrarily in the
plane. In addition to these nodes, there are adversarial jammers that know the
protocol and its entire history and that are allowed to jam the wireless
channel at any node for an arbitrary $(1-\epsilon)$-fraction of the time steps,
where $0<\epsilon<1$ is an arbitrary constant.
In this paper we consider a synchronous message passing system in which in
every round an external adversary is able to send each processor up to k
messages with falsified sender identities and arbitrary content. It is formally
shown that this impersonation model is slightly stronger than the asynchronous
message passing model with crash failures. In particular, we prove that
(k+1)-set agreement can be solved in this model, while k-set agreement is
impossible, for any k>=1.
Gradecast is a simple three-round algorithm presented by Feldman and Micali.
The current work presents a very simple algorithm that utilized Gradecast to
achieve Byzantine agreement. Two small variations of the presented algorithm
lead to improved algorithms for solving the Approximate agreement problem and
the Multi-consensus problem.
An optimal approximate agreement algorithm was presented by Fekete, which
supports up to 1/4 n Byzantine nodes and has message complexity of O(n^k),
where n is the number of nodes and k is the number of rounds.
Application-layer multicast implements the multicast functionality at the
application layer. The main goal of application-layer multicast is to construct
and maintain efficient distribution structures between end-hosts. In this paper
we focus on the implementation of an application-layer multicast distribution
algorithm. We observe that the total time required to measure network latency
over TCP is influenced dramatically by the TCP connection time.
As known, physical circuits, e.g. integrated circuits or power system, work
in a distributed manner, but these circuits could not be easily simulated in a
distributed way. This is mainly because that the dynamical system of physical
circuits is nonlinear and the linearized system of physical circuits is
non-symmetrical. This paper proposes a simple and natural strategy to simulate
the physical circuit in parallel, by mimicking the internal wires or
interconnects inside the circuits by distributed numerical algorithm.
This document describes the Gloss infrastructure supporting implementation of
location-aware services. The document is in two parts. The first part describes
software architecture for the smart space. As described in D8, a local
architecture provides a framework for constructing Gloss applications, termed
assemblies, that run on individual physical nodes, whereas a global
architecture defines an overlay network for linking individual assemblies. The
second part outlines the hardware installation for local sensing.
In this paper we describe an architecture which: Permits the deployment and
execution of components in appropriate geographical locations. Provides
security mechanisms that prevent misuse of the architecture. Supports a
programming model that is familiar to application programmers. Permits
installed components to share data. Permits the deployed components to
communicate via communication channels. Provides evolution mechanisms
permitting the dynamic rearrangement of inter-connection topologies the
components that they connect.
This paper introduces the \emph{RoboCast} communication abstraction. The
RoboCast allows a swarm of non oblivious, anonymous robots that are only
endowed with visibility sensors but do not share a common coordinate system, to
asynchronously exchange information. We propose a generic framework that covers
a large class of asynchronous broadcast-based algorithms and show how our
framework can be used to implement fundamental building blocks in robot
networks such as gathering or stigmergy.
Random ring-based overlay networks have been used to study the small world
phenomenon and model fault-tolerant peer-to-peer systems (Kleinberg, 2006). It
has been shown that when each of $n$ nodes has $\ell = O(\log n)$ links,
assigning contacts according to an inverse power-law distance distribution
allows greedy routing to perform in $O(\log^2 n / \ell)$ steps (Aspnes et al.
2002). In this paper, we generalize this result by showing the same upper bound
holds when nodes are assigned a random number of links according to an
arbitrary distribution with mean $\ell$.
A GLObal Smart Space (GLOSS) provides support for interaction amongst people,
artefacts and places while taking account of both context and movement on a
global scale. Crucial to the definition of a GLOSS is the provision of a set of
location-aware services that detect, convey, store and exploit location
information. We use one of these services, hearsay, to illustrate the
implementation dimensions of a GLOSS. The focus of the paper is on both local
and global software architecture to support the implementation of such
services.
Pervasive services may be defined as services that are available "to any
client (anytime, anywhere)". Here we focus on the software and network
infrastructure required to support pervasive contextual services operating over
a wide area. One of the key requirements is a matching service capable of
as-similating and filtering information from various sources and determining
matches relevant to those services.
We propose a middleware framework for deployment and subsequent autonomic
management of component-based distributed applications. An initial deployment
goal is specified using a declarative constraint language, expressing
constraints over aspects such as component-host mappings and component
interconnection topology. A constraint solver is used to find a configuration
that satisfies the goal, and the configuration is deployed automatically.
We propose a framework for the deployment and subsequent autonomic management
of component-based distributed applications. An initial deployment goal is
specified using a declarative constraint language, expressing constraints over
aspects such as component-host mappings and component interconnection topology.
A constraint solver is used to find a configuration that satisfies the goal,
and the configuration is deployed automatically. The deployed application is
instrumented to allow subsequent autonomic management.
Distributed systems with different levels of dependence to central services
have been designed and used during recent years. Pure peer-to-peer systems
among distributed systems have no dependence on a central resource. DHT is one
of the main techniques behind these systems resulting into failure tolerant
systems which are also able to isolate continuous changes to the network to a
small section of it and therefore making it possible to scale up such networks
to millions of nodes. This survey takes a look at P2P in general and DHT
algorithms and implementations in more detail.
Concurrent garbage collectors are notoriously difficult to implement
correctly. Previous approaches to the issue of producing correct collectors
have mainly been based on posit-and-prove verification or on the application of
domain-specific templates and transformations. We show how to derive the upper
reaches of a family of concurrent garbage collectors by refinement from a
formal specification, emphasizing the application of domain-independent design
theories and transformations.
The persistent programming systems of the 1980s offered a programming model
that integrated computation and long-term storage. In these systems, reliable
applications could be engineered without requiring the programmer to write
translation code to manage the transfer of data to and from non-volatile
storage. More importantly, it simplified the programmer's conceptual model of
an application, and avoided the many coherency problems that result from
multiple cached copies of the same information.
We present our approach for deploying and managing distributed
component-based applications. A Desired State Description (DSD), written in a
high-level declarative language, specifies requirements for a distributed
application. Our infrastructure accepts a DSD as input, and from it
automatically configures and deploys the distributed application. Subsequent
violations of the original requirements are detected and, where possible,
automatically rectified by reconfiguration and redeployment of the necessary
application components.
In this paper, we present the first snap-stabilizing message forwarding
protocol that uses a number of buffers per node being inde- pendent of any
global parameter, that is 4 buffers per link. The protocol works on a linear
chain of nodes, that is possibly an overlay on a large- scale and dynamic
system, e.g., Peer-to-Peer systems, Grids. . . Provided that the topology
remains a linear chain and that nodes join and leave "neatly", the protocol
tolerates topology changes. We expect that this protocol will be the base to
get similar results on more general topologies.
We describe an approach to modelling a Byzantine tolerant distributed
algorithm as a family of related finite state machines, generated from a single
meta-model. Various artefacts are generated from each state machine, including
diagrams and source-level protocol implementations. The approach allows a state
machine formulation to be applied to problems for which it would not otherwise
be suitable, increasing confidence in correctness.
We present a novel self-stabilizing algorithm for minimum spanning tree (MST)
construction. The space complexity of our solution is $O(\log^2n)$ bits and it
converges in $O(n^2)$ rounds. Thus, this algorithm improves the convergence
time of all previously known self-stabilizing asynchronous MST algorithms by a
multiplicative factor $\Theta(n)$, to the price of increasing the best known
space complexity by a factor $O(\log n)$.
Bandwidth-starved multicore chips have become ubiquitous. It is well known
that the performance of stencil codes can be improved by temporal blocking,
lessening the pressure on the memory interface. We introduce a new pipelined
approach that makes explicit use of shared caches in multicore environments and
minimizes synchronization and boundary overhead. Benchmark results are
presented for three current x86-based microprocessors, showing clearly that our
optimization works best on designs with high-speed shared caches and low memory
bandwidth per core.
This volume contains the papers presented at the 1st International Workshop
on "Decentralized Coordination of Distributed Processes", DCDP 2010, held in
Amsterdam, The Netherlands on June 10th, 2010 in conjunction with the 5th
International Federated Conferences on Distributed Computing Techniques,
DisCoTec 2010. The central theme of the workshop is the decentralized
coordination of distributed processes. Decentralized: there is no single
authority in the network that everything is vulnerable to.
In this paper, a method for efficient scheduling to obtain optimum job
throughput in a distributed campus grid environment is presented; Traditional
job schedulers determine job scheduling using user and job resource attributes.
User attributes are related to current usage, historical usage, user priority
and project access. Job resource attributes mainly comprise of soft
requirements (compilers, libraries) and hard requirements like memory, storage
and interconnect. A job scheduler dispatches jobs to a resource if a job's hard
and soft requirements are met by a resource.
Event ordering in distributed system (DS) is disputable and proactive subject
in DS particularly with the emergence of multimedia synchronization. According
to the literature, different type of event ordering is used for different DS
mode such as asynchronous or synchronous. Recently, there are several novel
implementation of these types introduced to fulfill the demand for establishing
a certain order according to a specific criterion in DS with lighter
complexity.
Cloud Data Servers is the novel approach for providing secure service to
e-business .Millions of users are surfing the Cloud for various purposes,
therefore they need highly safe and persistent services. Usually hackers target
particular Operating Systems or a Particular Controller. Inspiteof several
ongoing researches Conventional Web Servers and its Intrusion Detection System
might not be able to detect such attacks. So we implement a Cloud Data Server
with Session Controller Architecture using Redundancy and Disconnected Data
Access Mechanism.
Coordinated checkpointing is an effective fault tolerant technique in
distributed system as it avoids the domino effect and require minimum storage
requirement. Most of the earlier coordinated checkpoint algorithms block their
computation during checkpointing and forces minimum-process or non-blocking but
forces all nodes to takes checkpoint even though many of them may not be
necessary or non-blocking minimum-process but takes useless checkpoints or
reduced useless checkpoint but has higher synchronization message overhead or
has high checkpoint request propagation time.
A simple method for improving cache efficiency of serial and parallel
explicit finite procedure with application to casting solidification simulation
over three-dimensional complex geometries is presented. The method is based on
division of the global data to smaller blocks and treating each block
independently from others at each time step. A novel parallel finite element
algorithm for non-overlapped element-base decomposed domain is presented for
implementation of serial and parallel version of the presented method. Effect
of mesh reordering on the efficiency is also investigated.
In a heterogeneous, dynamic environment, like Grid, post-mortem analysis is
of no use and data needs to be collected and analysed in real time. Novel
techniques are also required for dynamically tuning the application's
performance and resource brokering in order to maintain the desired QoS. The
objective of this paper is to propose an integrated framework for performance
analysis and tuning of the application, and rescheduling the application, if
necessary, to some other resources in order to adapt to the changing resource
usage scenario in a dynamic environment.
This paper presents the overall design of a multi-agent framework for tuning
the performance of an application executing in a distributed environment. The
multi-agent framework provides services like resource brokering, analyzing
performance monitoring data, local tuning and also rescheduling in case of any
performance problem on a specific resource provider. The paper also briefly
describes the implementation of some part of the framework. In particular, job
migration on the basis of performance monitoring data is particularly
highlighted in this paper.
Cloud computing refers a paradigm shift to overall IT solutions while raising
the accessibility, scalability and effectiveness through its enabling
technologies. However, migrated cloud platforms and services cost benefits as
well as performances are neither clear nor summarized. Globalization and the
recessionary economic times have not only raised the bar of a better IT
delivery models but also have given access to technology enabled services via
internet.
Efficiency and simplicity of random algorithms have made them a lucrative
alternative for solving complex problems in the domain of communication
networks. This paper presents a random algorithm for handling the routing
problem in Mobile Ad hoc Networks [MANETS].The performance of most existing
routing protocols for MANETS degrades in terms of packet delay and congestion
caused as the number of mobile nodes increases beyond a certain level or their
speed passes a certain level.
In mobile database environments, multiple users may access similar data items
irrespective of their physical location leading to concurrent access anomalies.
As disconnections and mobility are the common characteristics in mobile
environment, performing concurrent access to a particular data item leads to
inconsistency. Most of the approaches use locking mechanisms to achieve
concurrency control.
The condition of $t$-resilience stipulates that an $n$-process program is
only obliged to make progress when at least $n-t$ processes are correct. Put
another way, the \emph{live sets}, the collection of process sets such that
progress is required if all the processes in one of these sets are correct, are
all sets with at least $n-t$ processes. In this paper we study what happens
when the live sets are any arbitrary collection of sets $\L$.
Self-stabilization is an versatile approach to fault-tolerance since it
permits a distributed system to recover from any transient fault that
arbitrarily corrupts the contents of all memories in the system. Byzantine
tolerance is an attractive feature of distributed system that permits to cope
with arbitrary malicious behaviors. We consider the well known problem of
constructing a maximum metric tree in this context.
Computational Grid is enormous environments with heterogeneous resources and
stable infrastructures among other Internet-based computing systems. However,
the managing of resources in such systems has its special problems. Scheduler
systems need to get last information about participant nodes from information
centers for the purpose of firmly job scheduling. In this paper, we focus on
online updating resource information centers with processed and provided data
based on the assumed hierarchical model.
A self-stabilizing protocol provides by definition a tolerance to transient
failures. Recently, a new class of self-stabilizing protocols appears. These
protocols provides also a tolerance to a given number of permanent failures. In
this article, we are interested in self-stabilizing protocols that deal with
Byzantines failures. We prove that, for some problems which not allow strict
stabilization (see [Nesterenko,Arora,2002]), there exist solutions that
tolerates Byzantine faults if we define a new criteria of tolerance.
Tree-based protocols are ubiquitous in distributed systems. They are
flexible, they perform generally well, and, in static conditions, their
analysis is mostly simple. Under churn, however, node joins and failures can
have complex global effects on the tree overlays, making analysis surprisingly
subtle. To our knowledge, few prior analytic results for performance estimation
of tree based protocols under churn are currently known. We study a simple
Bellman-Ford-like protocol which performs network size estimation over a
tree-shaped overlay.
Parallel Global Optimization Algorithms (PGOA) provide an efficient way of
dealing with hard optimization problems. One method of parallelization of GOAs
that is frequently applied and commonly found in the contemporary literature is
the so-called Island Model (IM). In this paper we analyze the impact of the
migration topology on the performance of a PGOA which uses the Island Model. In
particular we consider parallel Differential Evolution and Simulated Annealing
with Adaptive Neighborhood and draw first conclusions that emerge from the
conducted experiments.
Exploiting the performance of today's processors requires, apart from an
intimate knowledge of the microarchitecture, taking into account the influence
of an ever-growing complexity in thread and cache topology. LIKWID is a
collection of small command line applications that support inexperienced as
well as seasoned programmers in developing and running software in an efficient
way. The development of LIKWID is targeted on providing access to
performance-oriented tooling in a transparent and easy manner.
A software platform for global optimisation, called PaGMO, has been developed
within the Advanced Concepts Team (ACT) at the European Space Agency, and was
recently released as an open-source project.
We propose different implementations of the sparse matrix--dense vector
multiplication (\spmv{}) for finite fields and rings $\Zb/m\Zb$. We take
advantage of graphic card processors (GPU) and multi-core architectures. Our
aim is to improve the speed of \spmv{} in the \linbox library, and henceforth
the speed of its black box algorithms. Besides, we use this and a new
parallelization of the sigma-basis algorithm in a parallel block Wiedemann rank
implementation over finite fields.
We propose a new theoretical model for passively mobile Wireless Sensor
Networks. We call it the PALOMA model, standing for PAssively mobile
LOgarithmic space MAchines. The main modification w.r.t. the Population
Protocol model is that agents now, instead of being automata, are Turing
Machines whose memory is logarithmic in the population size n. Note that the
new model is still easily implementable with current technology. We focus on
complete communication graphs.
This paper focuses on data structures for multi-core reachability, which is a
key component in model checking algorithms and other verification methods. A
cornerstone of an efficient solution is the storage of visited states. In
related work, static partitioning of the state space was combined with
thread-local storage and resulted in reasonable speedups, but left open whether
improvements are possible.
A wide variety of models for concurrent programs has been proposed during the
past decades, each one focusing on various aspects of computations: trace
equivalence, causality between events, conflicts and schedules due to resource
accesses, etc. More recently, models with a geometrical flavor have been
introduced, based on the notion of cubical set. These models are very rich and
expressive since they can represent commutation between any bunch of events,
thus generalizing the principle of true concurrency.
P2P overlays provide a framework for building distributed applications
consisting of few to many resources with features including self-configuration,
scalability, and resilience to node failures. Such systems have been
successfully adopted in large-scale services for content delivery networks,
file sharing, and data storage. In small-scale systems, they can be useful to
address privacy concerns and for network applications that lack dedicated
servers. The bootstrap problem, finding an existing peer in the overlay,
remains a challenge to enabling these services for small-scale P2P systems.
The Low Latency Fault Tolerance (LLFT) system provides fault tolerance for
distributed applications, using the leader-follower replication technique. The
LLFT system provides application-transparent replication, with strong replica
consistency, for applications that involve multiple interacting processes or
threads.
After decades of engineering development and infrastructural investment,
Internet connections have become commodity product in many countries, and
Internet scale "cloud computing" has started to compete with traditional
software business through its technological advantages and economy of scale.
Cloud computing is a promising enabling technology of Internet ware Cloud
Computing is termed as the next big thing in the modern corporate world.
In this paper, we intend to answer one key question to the success of cloud
computing: in cloud, do many task computing (MTC) or high throughput computing
(HTC) service providers, which offer the corresponding computing service to end
users, benefit from the economies of scale? Our research contributions are
three-fold: first, we propose an innovative usage model, called dynamic service
provision (DSP) model, for MTC or HTC service providers.
The basic idea behind Cloud computing is that resource providers offer
elastic resources to end users. In this paper, we intend to answer one key
question to the success of Cloud computing: in Cloud, can small or medium-scale
scientific computing communities benefit from the economies of scale?
We describe an approach to parallel graph partitioning that scales to
hundreds of processors and produces a high solution quality. For example, for
many instances from Walshaw's benchmark collection we improve the best known
partitioning. We use the well known framework of multi-level graph
partitioning.
Modern large-scale date centres, such as those used for cloud computing
service provision, are becoming ever-larger as the operators of those data
centres seek to maximise the benefits from economies of scale. With these
increases in size comes a growth in system complexity, which is usually
problematic. There is an increased desire for automated "self-star"
configuration, management, and failure-recovery of the data-centre
infrastructure, but many traditional techniques scale much worse than linearly
as the number of nodes to be managed increases.
Much of the current focus in high-performance computing is on
multi-threading, multi-computing, and graphics processing unit (GPU) computing.
However, vectorization and non-parallel optimization techniques, which can
often be employed additionally, are less frequently discussed. In this paper,
we present an analysis of several optimizations done on both central processing
unit (CPU) and GPU implementations of a particular computationally intensive
Metropolis Monte Carlo algorithm.
This paper presents two conceptually simple methods for parallelizing a
Parallel Tempering Monte Carlo simulation in a distributed volunteer computing
context, where computers belonging to the general public are used. The first
method uses conventional multi-threading. The second method uses CUDA, a
graphics card computing system. Parallel Tempering is described, and challenges
such as parallel random number generation and mapping of Monte Carlo chains to
different threads are explained.
Power line communication continues to draw increasing interest by promising a
wide range of applications including cost-free last-mile communication
solution. However, signal transmitted through the power lines deteriorates
badly due to the presence of severe inter-symbol interference (ISI) and harsh
random pulse noise. This work proposes a new precoded turbo equalization scheme
specifically designed for the PLC channels.
This paper defines a class of labeled stratified order structures that
characterizes exactly the notion of combined traces (i.e., comtraces) proposed
by Janicki and Koutny in 1995. Our main technical contributions are the
representation theorems showing that comtrace quotient monoid, combined
dependency graph (Kleijn and Koutny 2008) and our labeled stratified order
structure characterization are three different and yet equivalent ways to
represent comtraces.
To save cost, recently more and more users choose to provision virtual
machine resources in cluster systems, especially in data centres. Maintaining a
consistent member view is the foundation of reliable cluster managements, and
it also raises several challenge issues for large scale cluster systems
deployed with virtual machines (which we call virtualized clusters). In this
paper, we introduce our experiences in design and implementation of scalable
member view management on large-scale virtual clusters.
The study of high-dimensional differential equations is challenging and
difficult due to the analytical and computational intractability. Here, we
significantly improve the speed of waveform relaxation (WR), a method to
simulate high-dimensional differential-algebraic equations. This new method
termed adaptive waveform relaxation (AWR) is tested on a communication network
example. Further we analyze different heuristics for computing graph partitions
tailored to adaptive waveform relaxation.
Gossip algorithms are attractive for in-network processing in sensor networks
because they do not require any specialized routing, there is no bottleneck or
single point of failure, and they are robust to unreliable wireless network
conditions. Recently, there has been a surge of activity in the computer
science, control, signal processing, and information theory communities,
developing faster and more robust gossip algorithms and deriving theoretical
performance guarantees. This article presents an overview of recent work in the
area.
Minimizing waiting time for tasks waiting in the queue for execution is one
of the important scheduling cri-teria which took a wide area in scheduling
preemptive tasks. In this paper we present Changeable Time Quan-tum (CTQ)
approach combined with the round-robin algorithm, we try to adjust the time
quantum according to the burst times of the tasks in the ready queue.
Cloud computing providers have setup several data centers at different
geographical locations over the Internet in order to optimally serve needs of
their customers around the world. However, existing systems do not support
mechanisms and policies for dynamically coordinating load distribution among
different Cloud-based data centers in order to determine optimal location for
hosting application services to achieve reasonable QoS levels.
Computing as you know it is about to change, your applications and documents
are going to move from the desktop into the cloud. I'm talking about cloud
computing, where applications and files are hosted on a "cloud" consisting of
thousands of computers and servers, all linked together and accessible via the
Internet. With cloud computing, everything you do is now web based instead of
being desktop based. You can access all your programs and documents from any
computer that's connected to the Internet. How will cloud computing change the
way you work?
Grid computing is the next logical step to distributed computing. Main
objective of grid computing is an innovative approach to share resources such
as CPU usage; memory sharing and software sharing. Data Grids provide
transparent access to semantically related data resources in a heterogeneous
system. The system incorporates both data mining and grid computing techniques
where Grid application reduces the time for sending results to several clients
at the same time and Data mining application on computational grids gives fast
and sophisticated results to users.
Cloud computing promises a radical shift in the provisioning of computing
resource within enterprise.
One of the biggest huddles faced by researchers studying algorithms for
massive graphs is the lack of large input graphs that are essential for the
development and test of the graph algorithms. This paper proposes two efficient
and highly scalable parallel graph generation algorithms that can produce
massive realistic graphs to address this issue. The algorithms, designed to
achieve high degree of parallelism by minimizing inter-processor
communications, are two of the fastest graph generators which are capable of
generating scale-free graphs with billions of vertices and edges.
Intensive experiences show and confirm that grid environments can be
considered as the most promising way to solve several kinds of problems
relating either to cooperative work especially where involved collaborators are
dispersed geographically or to some very greedy applications which require
enough power of computing or/and storage. Such environments can be classified
into two categories; first, dedicated grids where the federated computers are
solely devoted to a specific work through its end.
The commonly used asynchronous bounded delay (ABD) network models assume a
fixed bound on message delay. We propose a probabilistic network model, called
asynchronous bounded expected delay (ABE) model. Instead of a strict bound, the
ABE model requires only a bound on the expected message delay. While the
conditions of ABD networks restrict the set of possible executions, in ABE
networks all asynchronous executions are possible, but executions with
extremely long delays are less probable. In contrast to ABD networks, ABE
networks cannot be synchronised efficiently.
We consider the problem of developing an efficient multi-threaded
implementation of the matrix-vector multiplication algorithm for sparse
matrices with structural symmetry. Matrices are stored using the compressed
sparse row-column format (CSRC), designed for profiting from the symmetric
non-zero pattern observed in global finite element matrices. Unlike classical
compressed storage formats, performing the sparse matrix-vector product using
the CSRC requires thread-safe access to the destination vector. To avoid race
conditions, we have implemented two partitioning strategies.
We consider asynchronous message-passing systems in which some links are
timely and processes may crash. Each run defines a timeliness graph among
correct processes: (p; q) is an edge of the timeliness graph if the link from p
to q is timely (that is, there is bound on communication delays from p to q).
The main goal of this paper is to approximate this timeliness graph by graphs
having some properties (such as being trees, rings,...).
As more and more service providers choose Cloud platforms, a resource
provider needs to provision runtime environments (REs) for heterogeneous
workloads in different scenarios. Previous work fails to resolve this issue in
several ways: (1) it fails to pay attention to diverse RE requirements, and
does not enable creating coordinated REs on demand; (2) few work investigates
coordinated resource provisioning for heterogeneous workloads.
Previous work shows request tracing systems help understand and debug the
performance problems of multi-tier services. However, for large-scale data
centers, more than hundreds of thousands of service instances provide online
service at the same time. Previous work such as white-box or black box tracing
systems will produce large amount of log data, which would be correlated into
large quantities of causal paths for performance debugging. In this paper, we
propose an innovative algorithm to eliminate valueless logs of multitiers
services.
As more and more multi-tier services are developed from commercial components
or heterogeneous middleware without the source code available, both developers
and administrators need a precise request tracing tool to help understand and
debug performance problems of large concurrent services of black boxes.
Previous work fails to resolve this issue in several ways: they either accept
the imprecision of probabilistic correlation methods, or rely on knowledge of
protocols to isolate requests in pursuit of tracing accuracy.
FastFlow is a programming environment specifically targeting cache-coherent
shared-memory multi-cores. FastFlow is implemented as a stack of C++ template
libraries built on top of lock-free (fence-free) synchronization mechanisms. In
this paper we present a further evolution of FastFlow enabling programmers to
offload part of their workload on a dynamically created software accelerator
running on unused CPUs. The offloaded function can be easily derived from
pre-existing sequential code.
We consider how underused computing resources within an enterprise may be
harnessed to improve utilization and create an elastic computing
infrastructure. Most current cloud provision involves a data center model, in
which clusters of machines are dedicated to running cloud infrastructure
software. We propose an additional model, the ad hoc cloud, in which
infrastructure software is distributed over resources harvested from machines
already in existence within an enterprise.
We describe an algorithm for Byzantine agreement that is scalable in the
sense that each processor sends only $\tilde{O}(\sqrt{n})$ bits, where $n$ is
the total number of processors. Our algorithm succeeds with high probability
against an \emph{adaptive adversary}, which can take over processors at any
time during the protocol, up to the point of taking over arbitrarily close to a
1/3 fraction. We assume synchronous communication but a \emph{rushing}
adversary.
In this paper, we explore the limits of graphics processors (GPUs) for
general purpose parallel computing by studying problems that require highly
irregular data access patterns: parallel graph algorithms for list ranking and
connected components. Such graph problems represent a worst case scenario for
coalescing parallel memory accesses on GPUs which is critical for good GPU
performance. Our experimental study indicates that PRAM algorithms are a good
starting point for developing efficient parallel GPU methods but require
non-trivial modifications to ensure good GPU performance.
We present and evaluate GPU Bucket Sort, a parallel deterministic sample sort
algorithm for many-core GPUs. Our method is considerably faster than Thrust
Merge (Satish et.al., Proc. IPDPS 2009), the best comparison-based sorting
algorithm for GPUs, and it is as fast as the new randomized sample sort for
GPUs by Leischner et.al. (to appear in Proc. IPDPS 2010). Our deterministic
sample sort has the advantage that bucket sizes are guaranteed and therefore
its running time does not have the input data dependent fluctuations that can
occur for randomized sample sort.
For a large organization, different departments often maintain dedicated
cluster systems for different workloads, for example parallel batch jobs or Web
services. In this paper, we design and implement an innovative cloud computing
system software, Phoenix Cloud, to consolidate heterogeneous workloads of the
same organization on cloud computing platforms. For Phoenix Cloud, we propose
cooperative resource provision and management polices for the affiliated
departments of a large organization to share cluster systems.
Automatic performance debugging of parallel applications usually involves two
steps: automatic detection of performance bottlenecks and uncovering their root
causes for performance optimization. Previous work fails to resolve this
challenging issue in several ways: first, several previous efforts automate
analysis processes, but present the results in a confined way that only
identifies performance problems with apriori knowledge; second, several tools
take exploratory or confirmatory data analysis to automatically discover
relevant performance data relationships.
Outlier detection in data streams has gained wide importance presently due to
the increasing cases of fraud in various applications of data streams. The
techniques for outlier detection have been divided into either statistics
based, distance based, density based or deviation based. Till now, most of the
work in the field of fraud detection was distance based but it is incompetent
from computational point of view. In this paper we introduced a new clustering
based approach, which divides the stream in chunks and clusters each chunk
using kmedian into variable number of clusters.
We consider greedy contention managers for transactional memory for M x N
execution windows of transactions with M threads and N transactions per thread.
Assuming that each transaction conflicts with at most C other transactions
inside the window, a trivial greedy contention manager can schedule them within
CN time. In this paper, we show that there are much better schedules.
We consider a Mobile Ad-hoc NETwork (MANET) formed by n agents that move at
speed V according to the Manhattan Random-Way Point model over a square region
of side length L. The resulting stationary (agent) spatial probability
distribution is far to be uniform: the average density over the "central zone"
is asymptotically higher than that over the "suburb". Agents exchange data iff
they are at distance at most R within each other.
This paper considers distributed coding for multi-source single-sink data
collection wireless networks. A unified framework for network coding and
channel coding, termed "generalized adaptive network coded cooperation"
(GANCC), is proposed. Key ingredients of GANCC include: matching code graphs
with the dynamic network graphs on-the-fly, and integrating channel coding with
network coding through circulant low-density parity-check codes. Several code
constructing methods and several families of sparse-graph codes are proposed,
and information theoretical analysis is performed.
This paper considers N mobile nodes that move together in the vicinity of
each other, whose initial poses as well as subsequent movements must be
accurately tracked in real time with the assist of M(>=3) reference nodes. By
engaging the neighboring mobile nodes in a simple but effective cooperation,
and by exploiting both the time-of-arrival (TOA) information (between mobile
nodes and reference nodes) and the received-signal-strength (RSS) information
(between mobile nodes), an effective new localization strategy, termed
cooperative TOA and RSS (COTAR), is developed.
This case study illustrates the potential benefits and risks associated with
the migration of an IT system in the oil & gas industry from an in-house data
center to Amazon EC2 from a broad variety of stakeholder perspectives across
the enterprise, thus transcending the typical, yet narrow, financial and
technical analysis offered by providers. Our results show that the system
infrastructure in the case study would have cost 37% less over 5 years on EC2,
and using cloud computing could have potentially eliminated 21% of the support
calls for this system.
In this paper, we demonstrate, both theoretically and by numerical examples,
that adding a local prediction component to the update rule can significantly
improve the convergence rate of distributed averaging algorithms. We focus on
the case where the local predictor is a linear combination of the node's two
previous values (i.e., two memory taps), and our update rule computes a
combination of the predictor and the usual weighted linear combination of
values received from neighbouring nodes.
The upcoming many-core architectures require software developers to exploit
concurrency to utilize available computational power. Today's high-level
language virtual machines (VMs), which are a cornerstone of software
development, do not provide sufficient abstraction for concurrency concepts. We
analyze concrete and abstract concurrency models and identify the challenges
they impose for VMs. To provide sufficient concurrency support in VMs, we
propose to integrate concurrency operations into VM instruction sets.
The purpose of this paper is to show how existing scientific software can be
parallelized using a separate thin layer of Python code where all parallel
communication is implemented. We provide specific examples on such layers of
code, and these examples may act as templates for parallelizing a wide set of
serial scientific codes. The use of Python for parallelization is motivated by
the fact that the language is well suited for reusing existing serial codes
programmed in other languages.
In this paper we present the Chelonia storage cloud middleware. It was
designed to fill the requirements gap between those of large, sophisticated
scientific collaborations which have adopted the grid paradigm for their
distributed storage needs, and of corporate business communities which are
gravitating towards the cloud paradigm. The similarities to and differences
between Chelonia and several well-known grid- and cloud-based storage solutions
are commented.
A local algorithm is a distributed algorithm that completes after a constant
number of synchronous communication rounds. We present local approximation
algorithms for the minimum dominating set problem and the maximum matching
problem in 2-coloured and weakly 2-coloured graphs. In a weakly 2-coloured
graph, both problems admit a local algorithm with the approximation factor
$(\Delta+1)/2$, where $\Delta$ is the maximum degree of the graph. We also give
a matching lower bound proving that there is no local algorithm with a better
approximation factor for either of these problems.
We present a new implementation of the Floyd-Warshall All-Pairs Shortest
Paths algorithm on CUDA. Our algorithm runs approximately 5 times faster than
the previously best reported algorithm. In order to achieve this speedup, we
applied a new technique to reduce usage of on-chip shared memory and allow the
CUDA scheduler to more effectively hide instruction latency.
This paper deals with generating of an optimized route for multiple Vehicle
routing Problems (mVRP). We used a methodology of clustering the given cities
depending upon the number of vehicles and each cluster is allotted to a
vehicle. k- Means clustering algorithm has been used for easy clustering of the
cities. In this way the mVRP has been converted into VRP which is simple in
computation compared to mVRP. After clustering, an optimized route is generated
for each vehicle in its allotted cluster.
In this paper, the severity prediction of drought through the implementation
of modern sensor networks is discussed. We describe how to design a drought
prediction system using wireless sensor networks. This paper will describe a
terrestrial interconnected wireless sensor network paradigm for the prediction
of severity of drought over a vast area of 10,000 sq km. The communication
architecture for sensor network is outlined and the protocols developed for
each layer is explored. The data integration model and sensor data analysis at
the central computer is explained.
With a goal of supporting the timely and cost-effective analysis of Terabyte
datasets on commodity components, we present and evaluate StoreTorrent, a
simple distributed filesystem with integrated fault tolerance for efficient
handling of small data records. Our contributions include an application-OS
pipelining technique and metadata structure to increase small write and read
performance by a factor of 1-10, and the use of peer-to-peer communication of
replica-location indexes to avoid transferring data during parallel analysis
even in a degraded state.
Contrary to the sequential world, the processes involved in a distributed
system do not necessarily know when a computation is globally finished. This
paper investigates the problem of the detection of the termination of local
computations. We define four types of termination detection: no detection,
detection of the local termination, detection by a distributed observer,
detection of the global termination.
This thesis introduces PEMS2, an improvement to PEMS (Parallel External
Memory System). PEMS executes Bulk-Synchronous Parallel (BSP) algorithms in an
External Memory (EM) context, enabling computation with very large data sets
which exceed the size of main memory. Many parallel algorithms have been
designed and implemented for Bulk-Synchronous Parallel models of computation.
Such algorithms generally assume that the entire data set is stored in main
memory at once.
Cloud computing represents a shift away from computing as a product that is
purchased, to computing as a service that is delivered to consumers over the
internet from large-scale data centers - or "clouds". This paper discusses some
of the research challenges for cloud computing from an enterprise or
organizational perspective, and puts them in context by reviewing the existing
body of literature in cloud computing.
Cloud computing is the latest effort in delivering computing resources as a
service. It represents a shift away from computing as a product that is
purchased, to computing as a service that is delivered to consumers over the
internet from large-scale data centres - or "clouds". Whilst cloud computing is
gaining growing popularity in the IT industry, academia appeared to be lagging
behind the rapid developments in this field.
We propose using trace-based assessment of the performance of distributed
file systems (DFS) under transactional IO load. The assessment includes
simulations and experiments using the IO traces. Our experiments suggest that
DFS, and specifically XtreemFS have a good potential to support transactional
IO load in distributed environments: they demonstrate good performance, high
availability and scalability, while at the same time opening the way to TCO
reduction.
In present study, in order to improve the performance and reduce the amount
of power which is dissipated in heterogeneous multicore processors, the ability
of detecting the program execution phases is investigated. The programs
execution intervals have been classified in different phases based on their
throughput and the utilization of the cores. The results of implementing the
phase detection technique are investigated on a single core processor and also
on a multicore processor.