As energy-proportional computing has extended the success of dynamic voltage
and frequency scaling (DVFS) to the entire system, DVFS control algorithms
will play a key role in reducing server clusters' power consumption. The focus
of this paper is to provide accurate cluster-level DVFS control for power
saving in a server cluster. To achieve this goal, we propose a request-tracing
approach that classifies the major causal path patterns online and monitors
their performance data to guide accurate DVFS control.
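
The idea of using per-pattern performance data to steer DVFS can be illustrated with a minimal sketch. It is not the paper's algorithm: the frequency levels, the trace format (a list of visited components plus a latency), the `min_share` cutoff for a "major" pattern, and the latency-target rule are all assumptions made here for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical CPU frequency levels in GHz (assumed, not from the paper).
FREQ_LEVELS = [1.2, 1.6, 2.0, 2.4]


def classify_paths(traces):
    """Group traced requests by causal path pattern.

    Each trace is (components, latency_ms); the pattern is simply the
    ordered tuple of components the request visited.
    """
    patterns = defaultdict(list)
    for components, latency_ms in traces:
        patterns[tuple(components)].append(latency_ms)
    return patterns


def major_patterns(patterns, min_share=0.2):
    """Return the mean latency of each pattern whose share of all
    requests is at least min_share."""
    total = sum(len(latencies) for latencies in patterns.values())
    return {
        pattern: mean(latencies)
        for pattern, latencies in patterns.items()
        if len(latencies) / total >= min_share
    }


def next_level(level, major, target_ms, slack=0.8):
    """Pick the next index into FREQ_LEVELS.

    Step up if the slowest major pattern misses the latency target,
    step down if it is comfortably below the target, otherwise hold.
    """
    if not major:
        return level
    worst = max(major.values())
    if worst > target_ms:
        return min(level + 1, len(FREQ_LEVELS) - 1)
    if worst < slack * target_ms:
        return max(level - 1, 0)
    return level


traces = [
    (["web", "app", "db"], 120),
    (["web", "app", "db"], 140),
    (["web", "cache"], 20),
    (["web", "cache"], 30),
    (["web", "app"], 50),
]
major = major_patterns(classify_paths(traces), min_share=0.3)
level = next_level(1, major, target_ms=100)
```

In this sketch only the frequent patterns influence the decision, so a rare slow request does not force the cluster to a higher frequency.
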
In this paper, we intend to answer one key question for the success of cloud
computing: in the cloud, do many-task computing (MTC) or high-throughput
computing (HTC) service providers, which offer the corresponding computing
services to end users, benefit from economies of scale? Our research
contributions are three-fold: first, we propose an innovative usage model,
called the dynamic service provision (DSP) model, for MTC or HTC service providers.
The basic idea behind Cloud computing is that resource providers offer
elastic resources to end users. In this paper, we intend to answer one key
question for the success of Cloud computing: in the Cloud, can small- or
medium-scale scientific computing communities benefit from economies of scale?
To save costs, more and more users now choose to provision virtual machine
resources in cluster systems, especially in data centres. Maintaining a
consistent member view is the foundation of reliable cluster management, and
it also raises several challenging issues for large-scale cluster systems
deployed with virtual machines (which we call virtualized clusters). In this
paper, we describe our experience in designing and implementing scalable
member view management on large-scale virtualized clusters.
As more and more service providers choose Cloud platforms, a resource
provider needs to provision runtime environments (REs) for heterogeneous
workloads in different scenarios. Previous work fails to resolve this issue in
several ways: (1) it pays little attention to diverse RE requirements and
does not enable creating coordinated REs on demand; (2) little work
investigates coordinated resource provisioning for heterogeneous workloads.
Previous work shows that request tracing systems help understand and debug the
performance problems of multi-tier services. However, in large-scale data
centers, hundreds of thousands of service instances provide online services
at the same time. Existing white-box or black-box tracing systems produce
large amounts of log data, which would be correlated into large numbers of
causal paths for performance debugging. In this paper, we propose an
innovative algorithm to eliminate valueless logs of multi-tier services.
As more and more multi-tier services are developed from commercial components
or heterogeneous middleware without the source code available, both developers
and administrators need a precise request tracing tool to help understand and
debug performance problems of highly concurrent services built from black boxes.
Previous work fails to resolve this issue in several ways: it either accepts
the imprecision of probabilistic correlation methods, or relies on knowledge of
protocols to isolate requests in pursuit of tracing accuracy.
For a large organization, different departments often maintain dedicated
cluster systems for different workloads, for example parallel batch jobs or Web
services. In this paper, we design and implement an innovative cloud computing
system software, Phoenix Cloud, to consolidate the heterogeneous workloads of
the same organization on cloud computing platforms. For Phoenix Cloud, we
propose cooperative resource provisioning and management policies that let the
affiliated departments of a large organization share cluster systems.
Automatic performance debugging of parallel applications usually involves two
steps: automatic detection of performance bottlenecks and uncovering their root
causes for performance optimization. Previous work fails to resolve this
challenging issue in several ways: first, several previous efforts automate
analysis processes, but present the results in a confined way that only
identifies performance problems with a priori knowledge; second, several tools
use exploratory or confirmatory data analysis to automatically discover
relevant performance data relationships.