The electrical power consumed by typical magnetic hard disk drives (HDD) not
only increases linearly with the number of spindles but, more significantly,
increases as steep power laws of speed (RPM) and diameter. Since the
theoretical basis for this relationship is neither well-known nor readily
accessible in the literature, we show how these exponents arise from
aerodynamic disk drag and discuss their import for green storage capacity
planning.
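
As a rough illustration of why such power laws matter for capacity planning, the sketch below evaluates a scaling of the form P ∝ n · RPM^a · D^b relative to a reference drive. The exponents a and b are left as explicit parameters because the text above does not state their values; the function name, parameter names, and reference drive (7200 RPM, 3.5 inch) are assumptions made only for this example.

```python
def relative_power(spindles, rpm, diameter, rpm_exp, diam_exp,
                   ref_spindles=1, ref_rpm=7200.0, ref_diameter=3.5):
    """Power relative to a reference drive, assuming P ~ n * RPM^a * D^b.

    rpm_exp (a) and diam_exp (b) are supplied by the caller; they are
    illustrative inputs, not values taken from the text.
    """
    return ((spindles / ref_spindles)
            * (rpm / ref_rpm) ** rpm_exp
            * (diameter / ref_diameter) ** diam_exp)


if __name__ == "__main__":
    # Hypothetical exponents, chosen only to show the shape of the scaling.
    a, b = 3.0, 5.0
    print(relative_power(1, 15000, 2.5, a, b))  # faster but smaller platter
    print(relative_power(2, 7200, 3.5, a, b))   # twice the spindles: linear
```

Under any exponents well above one, a modest change in speed or platter diameter dominates the linear effect of adding spindles, which is the point made in the abstract above.
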
Statistical model checking avoids the exponential growth of states associated
with probabilistic model checking by estimating properties from multiple
executions of a system and by giving results within confidence bounds. Rare
properties are often very important but pose a particular challenge for
simulation-based approaches, hence a key objective under these circumstances is
to reduce the number and length of simulations necessary to produce a given
level of confidence.
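
To make the idea of "results within confidence bounds" concrete, the following sketch uses the standard Chernoff-Hoeffding (Okamoto) bound, under which N >= ln(2/delta) / (2 * epsilon^2) independent runs suffice for the estimate to lie within epsilon of the true probability with confidence 1 - delta. This is a textbook Monte Carlo scheme, not necessarily the one used in the work described above; `simulate` is a hypothetical stand-in for one execution of the system that reports whether the property held.

```python
import math
import random


def required_runs(epsilon, delta):
    """Number of runs so that P(|estimate - p| >= epsilon) <= delta."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))


def estimate_probability(simulate, epsilon, delta, rng):
    """Estimate the probability that `simulate(rng)` returns True."""
    n = required_runs(epsilon, delta)
    hits = sum(1 for _ in range(n) if simulate(rng))
    return hits / n, n


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical system: the property holds in about 30% of executions.
    p_hat, n = estimate_probability(lambda r: r.random() < 0.3, 0.05, 0.01, rng)
    print(n, p_hat)
```

Because the run count grows as 1/epsilon^2, a rare property (true probability far below any practical epsilon) needs very many simulations under this plain scheme, which is why reducing the number and length of simulations is the stated objective.
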
Performance curves of queueing systems can be analyzed by separating them
into three regions: the flat region, the knee region, and the exponential
region. Practical considerations usually locate the knee region between 70% and
90% of the theoretical maximum utilization. However, there is no clear agreement
about where the boundaries between the regions lie or where exactly the
utilization knee is located. An open debate about knees in performance curves
was undertaken at least 20 years ago.
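
The shape of such a curve can be sketched with the simplest queueing model. The example below assumes an M/M/1 queue, for which the mean response time is R = S / (1 - rho); the choice of model is an assumption made for illustration, since the text above does not name a specific queue.

```python
def mm1_response_time(service_time, utilization):
    """Mean response time of an M/M/1 queue: R = S / (1 - rho)."""
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return service_time / (1.0 - utilization)


if __name__ == "__main__":
    # Response time in units of the service time at several utilizations.
    for rho in (0.1, 0.5, 0.7, 0.8, 0.9, 0.95):
        print(f"rho={rho:.2f}  R/S={mm1_response_time(1.0, rho):.1f}")
```

In this model the curve is nearly flat at low utilization and rises steeply as utilization approaches one, but the formula itself singles out no particular point as the knee, which is consistent with the lack of agreement noted above.
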
Volume reconstruction by backprojection is the computational bottleneck in
many interventional clinical computed tomography (CT) applications. Today,
vendors in this field are replacing special-purpose hardware accelerators with
standard hardware such as multicore chips and GPGPUs. This paper presents low-level
optimizations for the backprojection algorithm, guided by a thorough
performance analysis on four generations of Intel multicore processors
(Harpertown, Westmere, Nehalem EX, and Sandy Bridge).
A service provisioning system is examined, where a number of servers are used
to offer different types of services to paying customers. A customer is charged
for the execution of a stream of jobs; the number of jobs in the stream and the
rate of their submission are specified. On the other hand, the provider promises
a certain quality of service (QoS), measured by the average waiting time of the
jobs in the stream. A penalty is paid if the agreed QoS requirement is not met.
The objective is to maximize the total average revenue per unit time.
A server farm is examined, where a number of servers are used to offer a
service to impatient customers. Every completed request generates a certain
amount of profit, running servers consume electricity for power and cooling,
while waiting customers might leave the system before receiving service if they
experience excessive delays. A dynamic allocation policy aiming at satisfying
the conflicting goals of maximizing the quality of users' experience while
minimizing the cost for the provider is introduced and evaluated.
We investigate the scheduling of a common resource between several concurrent
users when the feasible transmission rate of each user varies randomly over
time. Time is slotted and users arrive and depart upon service completion. This
may model, for example, the flow-level behavior of end-users in a narrowband HDR
wireless channel (CDMA 1xEV-DO). As performance criteria we consider the
stability of the system and the mean delay experienced by the users.
We present a novel modulation level classification (MLC) method based on
probability distribution distance functions. The proposed method uses modified
Kuiper and Kolmogorov-Smirnov (KS) distances to achieve low computational
complexity and outperforms the state-of-the-art methods based on cumulants and
goodness-of-fit (GoF) tests. We derive the theoretical performance of the
proposed MLC method and verify it via simulations. The best classification
accuracy under AWGN with SNR mismatch and phase jitter is achieved with the
proposed MLC method using Kuiper distances.
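
For reference, the two distances can be computed from an empirical sample and a reference CDF as sketched below. These are the textbook Kolmogorov-Smirnov and Kuiper statistics, not the modified versions mentioned above, whose details are not given here; the function names and the uniform reference CDF are assumptions for the example.

```python
def ks_and_kuiper(samples, cdf):
    """Return (KS distance, Kuiper distance) between a sample and a CDF.

    D+ = max_i (i/n - F(x_i)),  D- = max_i (F(x_i) - (i-1)/n),
    KS = max(D+, D-),           Kuiper = D+ + D-.
    """
    xs = sorted(samples)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one sample")
    d_plus = max((i + 1) / n - cdf(x) for i, x in enumerate(xs))
    d_minus = max(cdf(x) - i / n for i, x in enumerate(xs))
    return max(d_plus, d_minus), d_plus + d_minus


def uniform_cdf(x):
    """CDF of the uniform distribution on [0, 1]."""
    return min(1.0, max(0.0, x))


if __name__ == "__main__":
    print(ks_and_kuiper([0.25, 0.5, 0.75], uniform_cdf))
```

A classifier built on this idea would compute the distance from the received-signal statistic to the reference CDF of each candidate modulation and pick the smallest; both statistics need only a sort and one pass over the data, which is in line with the low complexity claimed above.
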
Typical constraints on embedded systems include code size limits, upper
bounds on energy consumption and hard or soft deadlines. To meet these
requirements, it may be necessary to improve the software by applying various
kinds of transformations like compiler optimizations, specific mapping of code
and data in the available memories, code compression, etc. However, a
transformation that aims at improving the software with respect to a given
criterion might engender side effects on other criteria and these effects must
be carefully analyzed.
We present an analytical framework which enables performance evaluation of
different multi-channel multi-stage spectrum sensing protocols for
Opportunistic Spectrum Access networks. Analyzed performance metrics include
the average secondary user throughput and the average collision probability
between the primary and secondary users. The analysis framework takes into
account buffering of incoming secondary user traffic, parallel and single
channel access, as well as prolonged channel observation periods at the first
and last stage of sensing.
We present Mantis, a new framework that automatically predicts program
performance with high accuracy. Mantis integrates techniques from programming
language and machine learning for performance modeling, and is a radical
departure from traditional approaches. Mantis extracts program features, which
are information about program execution runs, through program instrumentation.
It uses machine learning techniques to select features relevant to performance
and creates prediction models as a function of the selected features.
In recent years, reversible logic has become an increasingly prominent
technology, with applications in low-power CMOS, quantum computing,
nanotechnology, and optical computing. Reversibility plays an important role
when energy-efficient computations are considered. In this paper, three designs
(Design I, Design II, and Design III) of a reversible eight-bit parallel binary
adder/subtractor are proposed. In all three design approaches, the full adder
and subtractor are realized in a single unit, as compared to only a full
subtractor in the existing design.
The demand for Internet services that require frequent updates through small
messages, also known as microblogging, has grown tremendously in the past few
years. Although the use of such applications by domestic users is usually free,
their access from mobile devices is subject to fees and consumes energy from
limited batteries. If a user activates his mobile device and is in range of a
service provider, a content update is received at the expense of monetary and
energy costs. Thus, users face a tradeoff between such costs and the aging of
their messages.
We present a model of performance bound calculus on feedforward networks
where data packets are routed under the wormhole routing discipline. We are
interested in determining maximum end-to-end delays and backlogs of messages or
packets going from a source node to a destination node, through a given virtual
path in the network. Our objective here is to give a network calculus approach
for calculating the performance bounds. First we propose a new concept of
curves that we call packet curves.
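
As background for the kind of bound being sought, the sketch below evaluates the classical single-node network calculus bounds for a token-bucket arrival curve alpha(t) = b + r*t and a rate-latency service curve beta(t) = R * max(0, t - T), valid when r <= R: the delay is at most T + b/R and the backlog is at most b + r*T. It does not model wormhole routing or the packet curves introduced above; the parameter names are assumptions for the example.

```python
def delay_bound(burst, rate, service_rate, latency):
    """Worst-case delay for token-bucket arrivals at a rate-latency server."""
    if rate > service_rate:
        raise ValueError("arrival rate exceeds service rate: no finite bound")
    return latency + burst / service_rate


def backlog_bound(burst, rate, service_rate, latency):
    """Worst-case backlog for the same arrival and service curves."""
    if rate > service_rate:
        raise ValueError("arrival rate exceeds service rate: no finite bound")
    return burst + rate * latency


if __name__ == "__main__":
    # Hypothetical flow: burst 10, rate 2; server: rate 5, latency 1.
    print(delay_bound(10, 2, 5, 1), backlog_bound(10, 2, 5, 1))
```

The delay bound is the largest horizontal distance between the two curves and the backlog bound is the largest vertical distance, which is the general definition that end-to-end analyses build on.
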
We present a new tool, GPA, that can generate key performance measures for
very large systems. Based on solving systems of ordinary differential equations
(ODEs), this method of performance analysis is far more scalable than
stochastic simulation. The GPA tool is the first to produce higher moment
analysis from differential equation approximation, which is essential, in many
cases, to obtain an accurate performance prediction. We identify so-called
switch points as the source of error in the ODE approximation.
To analyze complex and heterogeneous real-time embedded systems, recent works
have proposed interface techniques between real-time calculus (RTC) and timed
automata (TA), in order to take advantage of the strengths of each technique
for analyzing various components. But the time to analyze a state-based
component modeled by TA may be prohibitively high, due to the state space
explosion problem. In this paper, we propose a framework of granularity-based
interfacing to speed up the analysis of a TA modeled component.
In this paper, we consider a two-hop relay-assisted cognitive downlink OFDMA
system (referred to as the secondary system) dynamically accessing a spectrum licensed to
a primary network, thereby improving the efficiency of spectrum usage. A
cluster-based relay-assisted architecture is proposed for the secondary system,
where relay stations are employed for minimizing the interference to the users
in the primary network and achieving fairness for cell-edge users.
We present magneto-hydrodynamic simulation results for heterogeneous systems.
Heterogeneous architectures combine many-core units offering high floating-point
performance with the conventional server nodes that host them. Examples include
Graphics Processing Units (GPUs) and Cell. They offer potentially large gains in
performance at modest power and monetary cost. We implemented a magneto-hydrodynamic (MHD)
simulation code on a variety of heterogeneous and multi-core architectures ---
multi-core x86, Cell, Nvidia and ATI GPU --- in different languages, FORTRAN,
C, Cell, CUDA and OpenCL.
Stencil computations consume a major part of runtime in many scientific
simulation codes. As prototypes for this class of algorithms we consider the
iterative Jacobi and Gauss-Seidel smoothers and aim at highly efficient
parallel implementations for cache-based multicore architectures. Temporal
cache blocking is a known advanced optimization technique, which can reduce the
pressure on the memory bus significantly. We apply and refine this optimization
for a recently presented temporal blocking strategy designed to explicitly
utilize multicore characteristics.
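
For readers unfamiliar with the kernel in question, a plain Jacobi sweep with a five-point stencil on a 2D grid can be sketched as follows. This is the unoptimized baseline without any cache blocking, written for clarity rather than speed; the grid layout and function names are assumptions for the example.

```python
def jacobi_sweep(grid):
    """One Jacobi sweep: each interior point becomes the mean of its four
    neighbours, read from the old grid. Boundary values are kept fixed."""
    rows, cols = len(grid), len(grid[0])
    new = [row[:] for row in grid]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            new[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j]
                                + grid[i][j - 1] + grid[i][j + 1])
    return new


def jacobi(grid, sweeps):
    """Apply several sweeps; every sweep streams the whole grid again."""
    for _ in range(sweeps):
        grid = jacobi_sweep(grid)
    return grid


if __name__ == "__main__":
    g = [[1.0] * 5 for _ in range(5)]
    for i in range(1, 4):
        for j in range(1, 4):
            g[i][j] = 0.0
    print(jacobi(g, 10)[2][2])
```

Each sweep reads and writes every grid point once while doing only a few floating-point operations per point, so the loop is limited by memory bandwidth. Temporal blocking attacks this by performing several sweeps on a cache-resident block before moving on. A Gauss-Seidel sweep would differ by updating the grid in place, so that already updated neighbours are used.
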
We consider a wireless network where each flow (instead of each link) runs
its own CSMA (Carrier Sense Multiple Access) algorithm. Specifically, each flow
attempts to access the radio channel after some random time and transmits a
packet if the channel is sensed idle. We prove that, unlike the standard CSMA
algorithm, this simple distributed access scheme is optimal in the sense that
the network is stable for all traffic intensities in the capacity region of the
network.
The cache replacement algorithm plays an important role in the overall
performance of a proxy server system. In this paper we propose a VoD cache
memory replacement algorithm for a multimedia server system. We propose a
rank-based cache replacement policy to manage the cache space in each
individual proxy server cache.
Parallel computing plays a major role in almost all fields, from research to
practical problem solving. Much research continues to focus on the area of
parallel processing. Its use now extends to end-user applications, driven by
the development of GPUs and multi-core processors.
Typical protocols for peer-to-peer file sharing over the Internet divide
files to be shared into pieces. New peers strive to obtain a complete
collection of pieces from other peers and from a seed. In this paper we
identify a problem that can occur if the seeding rate is not large enough. The
problem is that, even if the statistics of the system are symmetric in the
pieces, there can be symmetry breaking, with one piece becoming very rare. If
peers depart after obtaining a complete collection, they can tend to leave
before helping other peers receive the rare piece.
This paper is devoted to the theoretical analysis of a problem derived from
interaction between two Iplanet products: Web Proxy Server and the Directory
Server. In particular, a probabilistic and stochastic-approximation model is
proposed to minimize the occurrence of LDAP connection failures in Iplanet Web
Proxy 3.6 Server. The proposed model serves not only to provide a
parameterization of the aforementioned phenomena, but also to provide
meaningful insights illustrating and supporting these theoretical results.
Multiprocessor task scheduling is an important and computationally difficult
problem. This paper presents a comparative study of a genetic algorithm and a
list scheduling algorithm. Both algorithms are naturally parallelizable but have
heavy data dependencies. Based on experimental results, this paper presents a
detailed analysis of the scalability, advantages and disadvantages of each
algorithm. Multiprocessors have emerged as a powerful computing means for
running real-time applications, especially where a uniprocessor system would
not be sufficient to execute all the tasks.
The complexity of multimedia applications, in terms of computational intensity
and heterogeneity of the processed data, has led designers to implement them
on multiprocessor systems-on-chip. The complexity of these systems on the one
hand and the expectations of consumers on the other complicate the
designers' job of conceiving and delivering robust and successful systems within
the shortest deadlines. They have to explore the different solutions in the
design space and estimate their performance in order to identify the solution
that respects their design constraints.
Real-time systems are systems in which the computer is committed to responding
to external stimuli in a timely manner. Real-time applications have to
function correctly even in the presence of faults. Fault tolerance can be
achieved through hardware, software, or time redundancy. Safety-critical
applications have strict time and cost constraints, which means that not only
must faults be tolerated but the constraints must also be satisfied. Deadline
scheduling means that the task with the earliest required response time is
processed first.
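
The deadline scheduling rule stated above can be sketched with a priority queue keyed on the deadline. The example assumes that all tasks are ready at the same time and run without preemption, and it ignores fault tolerance entirely; the task names and the tie-breaking rule (submission order) are assumptions for illustration.

```python
import heapq


def edf_order(tasks):
    """Return task names in earliest-deadline-first order.

    tasks: iterable of (name, deadline) pairs, all ready at time zero.
    Ties are broken by submission order (an arbitrary choice made here).
    """
    heap = [(deadline, idx, name) for idx, (name, deadline) in enumerate(tasks)]
    heapq.heapify(heap)
    order = []
    while heap:
        _, _, name = heapq.heappop(heap)
        order.append(name)
    return order


if __name__ == "__main__":
    print(edf_order([("log", 30), ("brake", 10), ("display", 20)]))
```

In a system that uses time redundancy, the time needed to re-execute a failed task would also have to fit before its deadline, which is where fault tolerance and the timing constraints interact.
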
Stochastic network calculus requires special care in the search for proper
stochastic traffic arrival models and stochastic service models. A tradeoff must
be considered between the feasibility of analyzing performance bounds,
the usefulness of performance bounds, and the ease of their numerical
calculation. In theory, transforms between different traffic arrival models and
transforms between different service models are possible. Nevertheless, the
impact of such model transforms on performance bounds has not been thoroughly
investigated.
Recently, hybrid architectures using accelerators like GPGPUs or the Cell
processor have gained much interest in the HPC community. The RapidMind
Multi-Core Development Platform is a programming environment that allows
generating code which is able to seamlessly run on hardware accelerators like
GPUs or the Cell processor and multicore CPUs both from AMD and Intel.
Scheduling policies for real-time systems exhibit threshold behavior that is
related to the utilization of the task set they schedule, and in some cases
this threshold is sharp. For the rate monotonic scheduling policy, we show that
periodic workload with utilization less than a threshold $U_{RM}^{*}$ can be
scheduled almost surely and that all workload with utilization greater than
$U_{RM}^{*}$ is almost surely not schedulable.
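
For comparison, the classical Liu and Layland utilization bound for rate monotonic scheduling, n(2^(1/n) - 1), gives a sufficient but not necessary test. The sketch below implements that classical test only; it is not the threshold $U_{RM}^{*}$ discussed above, whose value is not given here. Tasks are represented as hypothetical (execution time, period) pairs.

```python
def liu_layland_bound(n):
    """Classical sufficient utilization bound for n periodic tasks under RM."""
    if n < 1:
        raise ValueError("need at least one task")
    return n * (2.0 ** (1.0 / n) - 1.0)


def utilization(tasks):
    """Total utilization of tasks given as (execution_time, period) pairs."""
    return sum(c / t for c, t in tasks)


def passes_sufficient_test(tasks):
    """True means schedulable under RM; False means the test is inconclusive."""
    return utilization(tasks) <= liu_layland_bound(len(tasks))


if __name__ == "__main__":
    print(liu_layland_bound(2))
    print(passes_sufficient_test([(1, 4), (1, 5)]))
    print(passes_sufficient_test([(2, 4), (2, 5)]))
```

The bound decreases toward ln 2 (about 0.693) as the number of tasks grows. Task sets above it may still be schedulable, so it is a worst-case guarantee rather than a sharp threshold of the kind described above.
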
WiMAX (Worldwide Interoperability for Microwave Access) is a promising
technology that can offer high-speed voice, video, and data services up to the
customer end. The aim of this paper is to evaluate the performance of a WiMAX
system under different combinations of digital modulation (BPSK, QPSK, 4-QAM,
and 16-QAM) and different communication channels: AWGN and fading channels
(Rayleigh and Rician). The WiMAX system incorporates a Reed-Solomon (RS)
encoder together with a convolutional encoder with rate-1/2 and rate-2/3 codes
in FEC channel coding.
Does the advent of flash devices constitute a radical change for secondary
storage? How should database systems adapt to this new form of secondary
storage? Before we can answer these questions, we need to fully understand the
performance characteristics of flash devices. More specifically, we want to
establish what kind of IOs should be favored (or avoided) when designing
algorithms and architectures for flash-based systems. In this paper, we focus
on flash IO patterns that capture the relevant distribution of IOs in time and
space, and our goal is to quantify their performance.
In IEEE 802.11, load balancing algorithms (LBA) consider only the associated
stations to balance the load of the available access points (APs). However,
even when the APs are balanced, a poor situation arises if an AP has a lower
signal-to-noise ratio (SNR) than its neighboring APs. Balancing the load and
associating a mobile station with an access point without regard to the SNR of
the AP may therefore lead to unforeseen QoS in terms of the bit rate, the
end-to-end delay, and the packet loss.