This paper presents an efficient architecture for various image filtering
algorithms and tumor characterization using Xilinx System Generator (XSG). This
architecture offers an alternative through a graphical user interface that
combines MATLAB, Simulink and XSG and explores important aspects of
hardware implementation. The performance of this architecture, implemented on a
SPARTAN-3E Starter kit (XC3S500E-FG320), exceeds that of architectures with
similar or greater resources. The proposed architecture reduces resource usage
on the target device by 50%.
Polar codes are becoming one of the most favorable capacity-achieving
error correction codes because of their low encoding and decoding
complexity. However, due to the large code length required by practical
applications, the few existing successive cancellation (SC) decoder
implementations still suffer from both high hardware cost and
long decoding latency. This paper presents several novel approaches to designing
low-latency decoders for polar codes based on look-ahead techniques.
Polar codes have become one of the most favorable capacity-achieving error
correction codes (ECC), helped by their simple encoding method. However, among
the very few prior successive cancellation (SC) polar decoder designs, the
required long code length makes the decoding latency high. In this paper, the
conventional decoding algorithm is transformed with look-ahead techniques. This
reduces the decoding latency by 50%. With pipelining and parallel processing
schemes, a parallel SC polar decoder is proposed.
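
The look-ahead idea can be illustrated on the smallest SC decoding step, a length-2 polar code. This is a minimal sketch under stated assumptions, not the decoder proposed in the paper: the min-sum form of `f`, the LLR sign convention, and all function names are illustrative choices, and both bits are treated as information bits. Conventionally `g` must wait for the first decoded bit; the look-ahead variant computes both candidate `g` results alongside `f` and selects one afterwards.

```python
import math


def f(a, b):
    """Min-sum check-node update combining two channel LLRs."""
    return math.copysign(1.0, a) * math.copysign(1.0, b) * min(abs(a), abs(b))


def g(a, b, u):
    """Variable-node update; depends on the previously decoded bit u."""
    return b + (1 - 2 * u) * a


def hard(llr):
    """Hard decision (assumed convention): non-negative LLR -> 0, negative -> 1."""
    return 0 if llr >= 0 else 1


def decode_conventional(a, b):
    """Two dependent steps: f first, then g once u1 is known."""
    u1 = hard(f(a, b))
    u2 = hard(g(a, b, u1))
    return u1, u2


def decode_lookahead(a, b):
    """Evaluate f and both g candidates in the same step, then select."""
    llr_f = f(a, b)
    g_if_0, g_if_1 = b + a, b - a  # speculative g outputs for u1 = 0 and u1 = 1
    u1 = hard(llr_f)
    u2 = hard(g_if_1 if u1 else g_if_0)
    return u1, u2


print(decode_conventional(1.5, -0.5), decode_lookahead(1.5, -0.5))
```

Both functions return the same bits; the difference is that the look-ahead version removes the dependency of the `g` computation on `u1`, at the cost of computing two candidates instead of one.
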
This paper presents approaches to developing efficient networks for non-binary
quasi-cyclic LDPC (QC-LDPC) decoders. By exploiting the intrinsic shifting and
symmetry properties of the check matrices, significant reduction of memory size
and routing complexity can be achieved. Two different efficient network
architectures for Class-I and Class-II non-binary QC-LDPC decoders have been
proposed, respectively.
Semi-parallel, or folded, VLSI architectures are used whenever hardware
resources need to be saved at design time. Most recent applications based on
Projective Geometry (PG) balanced bipartite graphs also fall into
this category. In this paper, we provide a high-level, top-down design
methodology to design optimal semi-parallel architectures for applications
whose Data Flow Graph (DFG) is based on a PG bipartite graph. Such applications
have been found, e.g., in error-control coding and matrix computations.
Quantum computers require quantum arithmetic. The sophisticated design of a
reversible arithmetic logic unit (reversible ALU) for quantum arithmetic is
investigated in this letter. We provide an explicit construction of a
reversible ALU that performs basic arithmetic operations. By providing the
corresponding control unit, the proposed reversible ALU can combine
classical arithmetic and logic operations in a reversible integrated system.
This letter provides concrete evidence for the feasibility of
realizing a reversible Programmable Logic Device (RPLD) using a reversible
ALU.
Memory trace analysis is an important technology for architecture research,
system software (e.g., OS, compiler) optimization, and application performance
improvements. Hardware snooping is an effective and efficient approach to
monitoring and collecting memory traces. Compared with software-based approaches,
memory traces collected by hardware-based approaches usually lack
semantic information, such as process/function/loop identifiers, virtual
addresses and I/O accesses.
This paper presents Multi-Amdahl, a resource allocation analytical tool for
heterogeneous systems. Our model includes multiple program execution segments,
where each one is accelerated by a specific hardware unit. The acceleration
speedup of the specific hardware unit is a function of a limited resource, such
as the unit area, power, or energy. Using the Lagrange theorem, we derive the
optimal resource distribution among all the specific units. We then illustrate
this general Multi-Amdahl technique using several examples of area and power
allocation among several cores and accelerators.
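
As a hedged illustration of this kind of allocation, consider segments with execution-time fractions t_i, each accelerated by a unit of area a_i, under a total area budget A. The speedup law f_i(a_i) = sqrt(a_i) used below is an assumption made for this sketch and is not taken from the paper. Minimizing T = sum(t_i / sqrt(a_i)) subject to sum(a_i) = A with a Lagrange multiplier gives t_i / (2 * a_i^(3/2)) equal for all i, so a_i is proportional to t_i^(2/3). The function names are hypothetical.

```python
def total_time(t, a):
    """Execution time when segment i runs sqrt(a[i]) times faster (assumed law)."""
    return sum(ti / ai ** 0.5 for ti, ai in zip(t, a))


def optimal_areas(t, total_area):
    """Closed-form optimum under the sqrt speedup assumption: a_i ~ t_i^(2/3)."""
    weights = [ti ** (2.0 / 3.0) for ti in t]
    scale = total_area / sum(weights)
    return [w * scale for w in weights]


t = [0.5, 0.3, 0.2]          # hypothetical time fractions of three segments
A = 100.0                    # hypothetical total area budget
best = optimal_areas(t, A)
equal = [A / len(t)] * len(t)

print("optimal split:", [round(x, 2) for x in best])
print("time with optimal split:", total_time(t, best))
print("time with equal split:  ", total_time(t, equal))
```

With a different speedup law the closed form changes, but the same stationarity condition (equal marginal benefit per unit of resource across units) would still be the starting point.
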
In this work, novel results concerning Network-on-Chip-based turbo decoder
architectures are presented. Building on previous publications, this work
concentrates first on improving the throughput by exploiting adaptive-bandwidth
reduction techniques. In the best case, these techniques yield an improvement of
more than 60 Mb/s. Moreover, it is known that double-binary turbo decoders
require more area than binary ones. This characteristic has the negative
effect of increasing the data width of the network nodes.
A high-speed Full-Adder (FA) module is a critical element in the design of
high-performance arithmetic circuits. In this paper, we propose a new high-speed
multiple-valued logic FA module. The proposed FA is built from 14
transistors and 3 capacitors, using carbon nanotube field-effect transistor
(CNFET) technology. Furthermore, the proposed design has been examined at
different voltages (0.65 V and 0.9 V). The observed results reveal improvements
in power consumption and power-delay product (PDP) compared to existing FA
counterparts.
We propose a new method for defragmenting the module layout of a
reconfigurable device, enabled by a novel approach for dealing with
communication needs between relocated modules and with inhomogeneities found in
commonly used FPGAs. Our method is based on dynamic relocation of module
positions during runtime, with only very little reconfiguration overhead; the
objective is to maximize the length of contiguous free space that is available
for new modules.
This article presents the architecture of a lossless wavelet filter bank in
reprogrammable logic. It is based on second-generation wavelets with a
reduced number of operations. A new basic structure for a parallel
architecture and modules for the forward and backward integer discrete wavelet
transform are proposed.
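
Second-generation wavelets are usually built with lifting steps, which map integers to integers and are exactly invertible. The sketch below shows this with the integer Haar (S) transform; the choice of wavelet is an assumption for illustration, since the abstract does not say which filter the architecture implements, and the function names are hypothetical. The input length is assumed to be even.

```python
def forward_lifting(x):
    """One level of the integer Haar (S) transform via predict/update lifting."""
    even, odd = x[0::2], x[1::2]
    d = [o - e for e, o in zip(even, odd)]          # predict: detail coefficients
    s = [e + (di >> 1) for e, di in zip(even, d)]   # update: floor-average approximation
    return s, d


def inverse_lifting(s, d):
    """Undo the lifting steps in reverse order; reconstruction is exact."""
    even = [si - (di >> 1) for si, di in zip(s, d)]
    odd = [di + e for di, e in zip(d, even)]
    x = [0] * (2 * len(s))
    x[0::2], x[1::2] = even, odd
    return x


signal = [12, 15, 200, 7, 0, 255, 31, 30]
approx, detail = forward_lifting(signal)
print(approx, detail)
print(inverse_lifting(approx, detail) == signal)
```

Each step uses only additions, subtractions and a shift, which is one reason lifting tends to suit hardware implementations.
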
In this paper, a new solution is proposed for testing simple two-stage
electronic circuits. It minimizes the number of tests that must be performed to
determine the genuineness of the circuit. The main idea behind the present
research work is to identify the maximum number of indistinguishable faults
present in the given circuit and minimize the number of test cases based on the
number of faults that have been detected. A heuristic approach is used for the
test minimization part, which identifies the essential tests from the overall
set of test cases.
Considerable research has taken place in recent times in the area of
parameterization of software defined radio (SDR) architecture. Parameterization
decreases the size of the software to be downloaded and also limits the
hardware reconfiguration time. The present paper is based on the design and
development of a programmable baseband modulator that performs QPSK
modulation as well as its three other commonly used variants, to
satisfy the requirements of several established 2G and 3G wireless communication
standards.
Segmented displays play a vital role in displaying numerals. Today, however,
matrix displays are also used for this purpose, because numerals have
many curved edges, which matrix displays render better. But since matrix
displays are costly, complex to implement and need more memory, segmented
displays are generally used to display numerals.
Segmented displays are widely used for efficient display of alphanumeric
characters. English numerals are displayed by 7-segment and 16-segment displays.
The segment size is uniform in these two display architectures. Display
architectures using 8, 10, 11 and 18 segments have been proposed for Bengali
numerals 0...9, yet no display architecture has been designed using segments of
uniform size and uniform power consumption. In this paper, we propose a
uniform 10-segment architecture for Bengali numerals. This segment architecture
uses segments of uniform size, and no bent segment is used.
Optimization techniques for decreasing the time and area of adder circuits
have been extensively studied for years, mostly in the binary logic system. In
this paper, we provide the equations required to design a full adder in the
quaternary logic system. We develop the equations for a single-stage parallel
adder which works as a carry look-ahead adder. We also provide the design of a
logarithmic-stage parallel adder which can compute the carries within log2(n)
time delay for n qudits.
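
The logarithmic carry computation can be sketched with the standard generate/propagate formulation carried over to base 4. This is an illustrative sketch and not the paper's equations: a digit pair generates a carry when a + b >= 4 and propagates one when a + b == 3, and a Kogge-Stone style prefix network (an assumed choice) combines these signals in ceil(log2(n)) stages. The function name is hypothetical, digits are little-endian, and at least one digit is assumed.

```python
def quaternary_add(a, b):
    """Add two equal-length little-endian base-4 digit lists with prefix carries.

    Returns (sum digits, carry out, number of prefix stages used).
    """
    n = len(a)
    G = [int(x + y >= 4) for x, y in zip(a, b)]   # digit generates a carry
    P = [int(x + y == 3) for x, y in zip(a, b)]   # digit propagates an incoming carry
    stages, d = 0, 1
    while d < n:
        # Combine each position with the group ending d positions below it.
        G = [(G[i] | (P[i] & G[i - d])) if i >= d else G[i] for i in range(n)]
        P = [(P[i] & P[i - d]) if i >= d else P[i] for i in range(n)]
        d *= 2
        stages += 1
    carries = [0] + G[:-1]                        # carry into each digit (carry-in = 0)
    digits = [(x + y + c) % 4 for x, y, c in zip(a, b, carries)]
    return digits, G[-1], stages


digits, carry_out, stages = quaternary_add([3, 3, 0, 1], [1, 0, 0, 2])
print(digits, carry_out, stages)
```

In hardware, each loop iteration corresponds to one level of combining cells evaluated in parallel, so the carry delay grows with the number of stages rather than with n.
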
We present a mixed analog-digital spectrum sensing method that is especially
suited to the typical wideband setting of cognitive radio (CR). The advantages
of our system with respect to current architectures are threefold. First, our
analog front-end is fixed and does not involve scanning hardware. Second, both
the analog-to-digital conversion (ADC) and the digital signal processing (DSP)
rates are substantially below Nyquist.
In this paper, we introduce the notions of UselessGate and
ReverseOperation. We also give an algorithm to implement a sorting
network for reversible logic synthesis based on swapping bit strings. The
network is constructed in terms of n*n Toffoli gates read from left to right,
and it is shown that there will be no more gates than the number of swappings
the algorithm requires. The gate complexity of the network is O(n^2). The number
of gates in the network can be further reduced by the template reduction
technique and by removing UselessGate instances from the network.
In this paper, we have introduced an algorithm to implement a sorting network
for reversible logic synthesis based on swapping bit strings. The algorithm
first constructs a network in terms of n*n Toffoli gates read from left to
right. The number of gates in the circuit produced by our algorithm is then
reduced by template matching and removing useless gates from the network. We
have also compared the efficiency of the proposed method with the existing
ones.
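
To make the notion of a Toffoli network read from left to right concrete, the sketch below simulates a cascade of multiple-control Toffoli gates on bit strings and checks that the result is reversible. It is only an illustration: the gate list is hypothetical, and neither the synthesis algorithm nor the template matching step from the paper is reproduced here.

```python
def toffoli(state, controls, target):
    """Flip bit `target` of integer `state` when every control bit is 1.

    An empty control tuple gives a NOT gate; one control gives a CNOT.
    """
    if all((state >> c) & 1 for c in controls):
        state ^= 1 << target
    return state


def run(gates, state):
    """Apply the gates in order, i.e. read the network from left to right."""
    for controls, target in gates:
        state = toffoli(state, controls, target)
    return state


# Hypothetical 3-bit network: (control bits, target bit) per gate.
gates = [((0, 1), 2), ((0,), 1), ((), 0)]
table = [run(gates, s) for s in range(8)]

print(table)                                   # output for each input 0..7
print(sorted(table) == list(range(8)))         # a permutation, hence reversible
```

Because every Toffoli gate is its own inverse, applying the same gates in reverse order undoes the network.
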
Today every circuit has to face the power consumption issue, both in portable
devices aiming at long battery life and in high-end circuits that must avoid
cooling packages and reliability issues that are too complex. It is generally accepted
that during logic synthesis power tracks well with area. This means that a
larger design will generally consume more power. The multiplier is an important
kernel of digital signal processors. Because of the circuit complexity, the
power consumption and area are the two important design considerations of the
multiplier.
This paper examines the problem of introducing advanced forms of
fault-tolerance via reconfiguration into safety-critical avionic systems. This
is required to enable increased availability after fault occurrence in
distributed integrated avionic systems (compared to static federated systems).
The approach taken is to identify a migration path from current architectures
to those that incorporate re-configuration to a lesser or greater degree.
For high throughput applications, turbo-like iterative decoders are
implemented with parallel architectures. However, to be efficient, parallel
architectures must avoid access collisions, i.e., concurrent read/write
accesses should not target the same memory block. This consideration applies to
the two main classes of turbo-like codes which are Low Density Parity Check
(LDPC) and Turbo-Codes. In this paper we propose a methodology which finds a
collision-free mapping of the variables in the memory banks and which optimizes
the resulting interleaving architecture.
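
The collision constraint can be stated as a small check. This is a minimal sketch under stated assumptions, not the proposed methodology: the access schedule (the access order is split into equal windows that are processed concurrently, one variable per window per step), the example interleaver, and the brute-force search are all illustrative, and the function names are hypothetical.

```python
from itertools import product


def is_collision_free(bank, order, parallelism):
    """True if the concurrent accesses of every step target distinct banks.

    bank[i] is the memory bank of variable i; order lists variables in access order.
    """
    window = len(order) // parallelism
    for t in range(window):
        hit = {bank[order[t + k * window]] for k in range(parallelism)}
        if len(hit) < parallelism:
            return False
    return True


def find_mapping(orders, n_vars, parallelism):
    """Exhaustive search for a mapping valid in all orders (tiny sizes only)."""
    for candidate in product(range(parallelism), repeat=n_vars):
        if all(is_collision_free(candidate, o, parallelism) for o in orders):
            return list(candidate)
    return None


natural = list(range(8))
interleaved = [0, 2, 4, 6, 1, 3, 5, 7]           # hypothetical interleaver
naive = [i // 4 for i in range(8)]               # first half in bank 0, second in bank 1

print(is_collision_free(naive, natural, 2))      # fine in natural order
print(is_collision_free(naive, interleaved, 2))  # collides in interleaved order
print(find_mapping([natural, interleaved], 8, 2))
```

The exhaustive search is exponential in the number of variables and serves only to show that a mapping valid for both access orders exists in this example.
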
DDR SDRAM is similar in function to regular SDRAM but doubles the
bandwidth of the memory by transferring data on both edges of the clock.
DDR SDRAM is commonly used in embedded applications such as networking,
image or video processing, laptops, etc. Nowadays many applications need more
and more cheap and fast memory. The field of signal processing, in particular,
requires a significant amount of memory. The most used type of dynamic memory for
that purpose is DDR SDRAM.
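
The bandwidth doubling follows directly from the number of transfers per clock cycle. The short calculation below is a sketch with assumed example figures (a 200 MHz clock and a 64-bit data bus, neither taken from the text); the function name is hypothetical.

```python
def peak_bandwidth_bytes_per_s(clock_hz, bus_width_bits, transfers_per_cycle):
    """Theoretical peak bandwidth: transfers per second times bytes per transfer."""
    return clock_hz * transfers_per_cycle * bus_width_bits // 8


clock_hz = 200_000_000   # assumed memory clock
bus_width_bits = 64      # assumed data bus width

sdr = peak_bandwidth_bytes_per_s(clock_hz, bus_width_bits, 1)  # one edge per cycle
ddr = peak_bandwidth_bytes_per_s(clock_hz, bus_width_bits, 2)  # both edges per cycle

print(f"SDR peak: {sdr / 1e6:.0f} MB/s")
print(f"DDR peak: {ddr / 1e6:.0f} MB/s")
```

These are theoretical peaks; sustained bandwidth is lower once refresh, row activation and bus turnaround are taken into account.
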
The aim of this paper is to present an adaptable Fat Tree NoC architecture
for Field Programmable Gate Array (FPGA) designed for image analysis
applications. Traditional NoCs (Networks on Chip) are not optimal for dataflow
applications with large amounts of data. Conversely, point-to-point
communications are designed from the algorithm requirements, but they are
expensive in terms of resources and wiring. We propose a dedicated communication
architecture for image analysis algorithms.
This chapter describes the main architectures proposed in the literature to
implement the channel decoders required by the WiMax standard, namely
convolutional codes, turbo codes (both block and convolutional) and LDPC. Then
it shows a complete design of a convolutional turbo code encoder/decoder system
for WiMax.
Every year, the computing resources available on dynamically partially
reconfigurable devices increase enormously. In the near future, we expect many
applications to run on a single reconfigurable device. In this paper, we
present a concept for multitasking on dynamically partially reconfigurable
systems called virtual area management. We explain its advantages, show its
challenges, and discuss possible solutions.
The main goal of this research is to develop the concepts of a revolutionary
processor system called Functional Processor System. The fairly novel work
carried out in this proposal concentrates on decoding function pipelines and
distributing them among FPUs as part of the scheduling approach. As
functional programs are super-level programs that entail requirements only at
the functional level, decoding functions and distributing them among the
heterogeneous functional processor units is a challenge.
In this paper, we propose an Intelligent Management System that is capable of
managing automobile functions using rigorous real-time principles and a
multicore processor in order to achieve higher efficiency and safety for the
vehicle. The paper shows how various automobile functionalities can be made
fine-grained and treated to fit real-time concepts. It also shows how modern
multicore processors can be put to good use in organizing vast amounts of
correlated functions to be executed in real time with excellent time
commitments.
The paper describes a new computer architecture, the main features of
which have been claimed in the Russian Federation patent 2312388 and in the US
patent application 11/991331. This architecture is intended to effectively
support General Purpose Parallel Computing (GPPC), the essence of which
is extremely frequent switching of threads between states of activity and
states of algorithmic latency, as considered in the paper.
To cope with soft errors and make full use of the multi-core system, this
paper presents an efficient fault-tolerant hardware and software co-designed
architecture for multi-core systems. With a modest number of test
patterns, it uses less than 33% of the hardware resources of
traditional hardware redundancy (TMR) and takes less than 50% of the time
of traditional software redundancy (time redundancy). Therefore,
it is a good choice of fault-tolerant architecture for future
highly reliable multi-core systems.
Multiple-input multiple-output (MIMO) wireless transmission imposes huge
challenges on the design of efficient hardware architectures for iterative
receivers. A major challenge is soft-input soft-output (SISO) MIMO demapping,
often approached by sphere decoding (SD). In this paper, we introduce what is,
to the best of our knowledge, the first VLSI architecture for SISO SD applying a
single tree-search approach. Compared with a soft-output-only base architecture
similar to the one proposed by Studer et al.
Graphics processing units (GPUs) are gaining widespread use in computational
chemistry and other scientific simulation contexts because of their huge
performance advantages relative to conventional CPUs. However, the reliability
of GPUs in error-intolerant applications is largely unproven. In particular, a
lack of error checking and correcting (ECC) capability in the memory subsystems
of graphics cards has been cited as a hindrance to the acceptance of GPUs as
high-performance coprocessors, but the impact of this design has not been
previously quantified.
This work proposes a general framework for the design and simulation of
network-on-chip-based turbo decoder architectures. Several parameters in the
design space are investigated, namely the network topology, the parallelism
degree, the rate at which messages are sent by processing nodes over the
network and the routing strategy.
The growing amount of XML encoded data exchanged over the Internet increases
the importance of XML based publish-subscribe (pub-sub) and content based
routing systems. The input in such systems typically consists of a stream of
XML documents and a set of user subscriptions expressed as XML queries. The
pub-sub system then filters the published documents and passes them to the
subscribers. Pub-sub systems are characterized by very high input rates;
therefore, the processing time is critical.
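
The filtering step can be sketched in software as follows. This is a minimal illustration, not the system described in the abstract: the subscriber names, queries and documents are hypothetical, and the queries use only the limited XPath subset supported by Python's `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET


def matching_subscribers(document, subscriptions):
    """Return the sorted names of subscribers whose query matches the document."""
    root = ET.fromstring(document)
    return sorted(
        name for name, path in subscriptions.items()
        if root.find(path) is not None   # compare with None: childless elements are falsy
    )


# Hypothetical subscriptions expressed as ElementTree-compatible XPath queries.
subscriptions = {
    "alice": ".//item[@category='fpga']",
    "bob": ".//item[price]",
    "carol": ".//item[@category='gpu']",
}

doc_a = "<catalog><item category='fpga'><name>board</name></item></catalog>"
doc_b = "<catalog><item category='gpu'><price>10</price></item></catalog>"

print(matching_subscribers(doc_a, subscriptions))
print(matching_subscribers(doc_b, subscriptions))
```

This version evaluates every query against every document, so its cost grows with the number of subscriptions, which is the bottleneck that high input rates make critical.
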
At present, the most widely used and developed mechanism is hardware
virtualization, which provides a common platform to run multiple operating
systems and applications in independent partitions. More precisely, although
the term hardware virtualization is emphasized, it is all about resource
virtualization. In this paper, the aim is to identify the advantages and
limitations of current virtualization techniques, analyze their cost and
performance, and indicate which forthcoming hardware virtualization
techniques will be able to provide efficient solutions for multiprocessor
operating systems.