Driving support systems, such as car navigation systems, are becoming common
and support the driver in several ways. A non-intrusive method of detecting
fatigue and drowsiness, based on eye-blink count and eye-directed instruction
control, helps the driver avoid collisions caused by drowsy driving. Eye
detection and tracking under varying conditions such as illumination,
background, face alignment and facial expression make the problem
complex. A neural network based algorithm is proposed in this paper to detect
the eyes efficiently.
Image fusion produces a single fused image from a set of input images. A new
method for image fusion is proposed based on Weighted Average Merging Method
(WAMM) in the NonSubsampled Contourlet Transform (NSCT) domain. A performance
analysis of various statistical fusion rules is also carried out in both the
NSCT and wavelet domains. The analysis covers medical images, remote sensing
images and multi-focus images.
With the introduction of spectral-domain optical coherence tomography (OCT),
much larger image datasets are routinely acquired compared to what was possible
using the previous generation of time-domain OCT. Thus, the need for 3-D
segmentation methods for processing such data is becoming increasingly
important.
We present a novel approach to background subtraction that is based on the
local shape of small image regions. In our approach, an image region centered
on a pixel is modeled using the local self-similarity descriptor. We aim at
obtaining a reliable change detection based on local shape change in an image
when foreground objects are moving.
Connected operators are filtering tools that act by merging elementary
regions of an image. A popular strategy is based on tree-based image
representations: for example, one can compute an attribute on each node of the
tree and keep only the nodes for which the attribute is sufficiently strong.
This operation can be seen as a thresholding of the tree, seen as a graph whose
nodes are weighted by the attribute. Rather than being satisfied with a mere
thresholding, we propose to expand on this idea, and to apply connected filters
on this latter graph.
In this paper we address the problem of tracking non-rigid objects whose
local appearance and motion change as a function of time. This class of
objects includes dynamic textures such as steam, fire, smoke, water, etc., as
well as articulated objects such as humans performing various actions. We model
the temporal evolution of the object's appearance/motion using a Linear
Dynamical System (LDS). We learn such models from sample videos and use them as
dynamic templates for tracking objects in novel videos.
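As a rough illustration of what an LDS describes, the sketch below rolls out a
noise-free system x_{t+1} = A x_t, y_t = C x_t in plain Python. The function
name simulate_lds, the rotation matrix A and the observation matrix C are
assumptions chosen only for illustration; the models in the paper are learned
from sample videos and are not reproduced here.

```python
import math


def simulate_lds(A, C, x0, steps):
    """Roll out x_{t+1} = A x_t, y_t = C x_t (no noise) and return the outputs y_t."""
    def matvec(M, v):
        return [sum(m * vi for m, vi in zip(row, v)) for row in M]

    x, ys = list(x0), []
    for _ in range(steps):
        ys.append(matvec(C, x))  # observe the current hidden state
        x = matvec(A, x)         # advance the hidden state by one step
    return ys


# Hypothetical example: a 2D state rotating by 90 degrees per step,
# observed through its first coordinate only.
theta = math.pi / 2
A = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta), math.cos(theta)]]
C = [[1.0, 0.0]]
ys = simulate_lds(A, C, [1.0, 0.0], 5)
print([round(y[0], 6) for y in ys])
```

The observed sequence is periodic because A is a rotation; a learned A would
instead encode the temporal evolution of the appearance of the tracked object.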
With the advancement of communication and security technologies, the
robustness of embedded biometric systems has become crucial. This paper
presents the realization of such technologies, which demand reliable and
error-free biometric identity verification systems. High dimensional patterns
are not permitted due to eigen-decomposition in high dimensional feature space
and degeneration of scattering matrices for small sample sizes. Generalization,
dimensionality reduction and margin maximization are controlled by
minimizing weight vectors.
A modular method was previously suggested to recover a band-limited signal from
the sample-and-hold and linearly interpolated (or, in general, an
nth-order-hold) version of the regular samples. In this paper, a novel approach
for compensating the distortion of any interpolation, based on the modular
method, is proposed. In this method the performance of the modular method is
optimized by adding only some simply calculated coefficients. This approach
causes drastic improvement in terms of signal-to-noise ratios with fewer
modules compared to the classical modular method.
We present an algebraic approach to the watershed adapted to edge or node
weighted graphs. Starting with the flooding adjunction, we introduce the
flooding graphs, for which node and edge weights may be deduced one from the
other. Each node-weighted or edge-weighted graph may be transformed into a
flooding graph, showing that there is no superiority in using one or the other,
both being equivalent. We then introduce pruning operators that extract subgraphs of
increasing steepness. For an increasing steepness, the number of never
ascending paths becomes smaller and smaller.
In this paper we present a novel slanted-plane MRF model which reasons
jointly about occlusion boundaries as well as depth. We formulate the problem
as one of inference in a hybrid MRF composed of both continuous (i.e.,
slanted 3D planes) and discrete (i.e., occlusion boundaries) random variables.
This allows us to define potentials encoding the ownership of the pixels that
compose the boundary between segments, as well as potentials encoding which
junctions are physically possible.
This paper presents a novel Coprime Blurred Pair (CBP) model for visual
data-hiding for security in camera surveillance. While most previous approaches
have focused on completely encrypting the video stream, we introduce a spatial
encryption scheme by blurring the image/video contents to create a CBP. Our
goal is to obscure detail in public video streams by blurring while allowing
behavior to be recognized and to quickly deblur the stream so that details are
available if behavior is recognized as suspicious. We create a CBP by blurring
the same latent image with two unknown kernels.
Texture classification is a problem that has received much attention from
computer scientists since the late 1990s. If texture classification
is done correctly and accurately, it can be used in many applications such as
pattern recognition, object tracking, and shape recognition. Many methods have
been proposed to solve this problem. Nearly all of these methods try
to extract and define features that separate different texture labels
well.
Marker-based motion capture (MoCap) systems can be composed of several dozen
cameras with the purpose of reconstructing the trajectories of hundreds of
targets. With a large number of cameras it becomes interesting to determine the
optimal reconstruction strategy. To this end, it is of fundamental importance
to understand the information provided by different camera measurements and how
they are combined, i.e. how the reconstruction error changes by considering
different cameras. In this work, first, an approximation of the reconstruction
error variance is derived.
We describe a method for fast approximation of sparse coding. The input space
is subdivided by a binary decision tree, and we simultaneously learn a
dictionary and assignment of allowed dictionary elements for each leaf of the
tree. We store a lookup table with the assignments and the pseudoinverses for
each node, allowing for very fast inference. We give an algorithm for learning
the tree, the dictionary and the dictionary element assignment, and in the
process of describing this algorithm, we discuss the more general problem of
learning the groups in group structured sparse modelling.
As the usage of 3D models increases, so does the importance of developing
accurate 3D shape retrieval algorithms. A common approach is to calculate a
shape descriptor for each object, which can then be compared to determine two
objects' similarity. However, these descriptors are often evaluated
independently and on different datasets, making them difficult to compare.
Using the SHREC 2011 Shape Retrieval Contest of Non-rigid 3D Watertight Meshes
dataset, we systematically evaluate a collection of local shape descriptors.
We propose the design of an original scalable image coder/decoder that is
inspired by the mammalian retina. Our coder accounts for the time-dependent
and also nondeterministic behavior of the actual retina. The present work
brings two main contributions: As a first step, (i) we design a deterministic
image coder mimicking most of the retinal processing stages and then (ii) we
introduce a retinal noise in the coding process, that we model here as a dither
signal, to gain interesting perceptual features.
Scene parsing, or semantic segmentation, consists in labeling each pixel in
an image with the category of the object it belongs to. It is a challenging
task that involves the simultaneous detection, segmentation and recognition of
all the objects in the image.
Determining the optimal number of clusters in a dataset is a challenging task.
Though some methods are available, there is no algorithm that produces a unique
clustering solution. The paper proposes Automatic Merging for Single Optimal
Solution (AMSOS), which aims to generate unique and nearly optimal clusters for
the given datasets automatically. AMSOS iteratively merges the closest
clusters, validating each merge with a cluster validity measure, to find a
single, nearly optimal set of clusters for the given data set.
Selection of initial seeds greatly affects the quality of the clusters in
k-means type algorithms. Most seed selection methods yield different
results in different independent runs. We propose a single, optimal, outlier
insensitive seed selection algorithm for k-means type algorithms as an extension
to k-means++. The experimental results on synthetic, real and microarray
data sets demonstrate the effectiveness of the new algorithm in producing the
clustering results.
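For context, the sketch below shows the standard k-means++ seeding that the
abstract refers to as its starting point, not the proposed extension. The
function name kmeanspp_seeds and the toy points are assumptions for
illustration. Because the seeds are drawn at random, independent runs may
return different seeds, which is the run-to-run variability described above.

```python
import random


def kmeanspp_seeds(points, k, rng):
    """Standard k-means++ seeding: each new seed is drawn with probability
    proportional to its squared distance from the nearest seed chosen so far.
    Assumes `points` contains at least k distinct points."""
    def d2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    seeds = [rng.choice(points)]
    while len(seeds) < k:
        # Points already chosen get weight 0 and cannot be drawn again.
        weights = [min(d2(p, s) for s in seeds) for p in points]
        seeds.append(rng.choices(points, weights=weights, k=1)[0])
    return seeds


# Hypothetical toy data: two well-separated groups in the plane.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
print(kmeanspp_seeds(points, 2, random.Random(0)))
```

Passing an explicit random.Random instance keeps a single run reproducible
while leaving the algorithm itself randomized.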
The appearance of microcalcifications in mammograms is one of the early signs
of breast cancer. So, early detection of microcalcification clusters (MCCs) in
mammograms can be helpful for cancer diagnosis and better treatment of breast
cancer. In this paper a computer method has been proposed to support
radiologists in detecting MCCs in digital mammography. First, in order to
facilitate and improve the detection step, mammogram images have been enhanced
with wavelet transformation and morphology operation. Then for segmentation of
suspicious MCCs, two methods have been investigated.
A framework for adaptive and non-adaptive statistical compressive sensing is
developed, where a statistical model replaces the standard sparsity model of
classical compressive sensing. We propose within this framework optimal
task-specific sensing protocols specifically and jointly designed for
classification and reconstruction. A two-step adaptive sensing paradigm is
developed, where online sensing is applied to detect the signal class in the
first step, followed by a reconstruction step adapted to the detected class and
the observed samples.
Compressive sensing (CS) is a new approach for the acquisition and recovery
of sparse signals and images that enables sampling rates significantly below
the classical Nyquist rate. Despite significant progress in the theory and
methods of CS, little headway has been made in compressive video acquisition
and recovery. Video CS is complicated by the ephemeral nature of dynamic
events, which makes direct extensions of standard CS imaging architectures and
signal models difficult.
In this study we investigate a fast image filtering algorithm based on
the Introsort algorithm and fast noise reduction of infrared images. The main
feature of the proposed approach is that no prior knowledge of the noise is
required. It is developed based on the Stefan-Boltzmann law and the Fourier law.
We also investigate a fast noise reduction approach that has the advantage of a
lower computational load. In addition, it can retain edges, details and text
information even if the size of the window increases.
Recent results in Compressive Sensing have shown that, under certain
conditions, the solution to an underdetermined system of linear equations with
sparsity-based regularization can be accurately recovered by solving convex
relaxations of the original problem. In this work, we present a novel
primal-dual analysis on a class of sparsity minimization problems.
We present an algorithm using transformation groups and their irreducible
representations to generate an orthogonal basis for a signal in the vector
space of the signal. It is shown that multiresolution analysis can be done with
amplitudes using a transformation group. G-lets is thus not a single transform,
but a group of linear transformations related by group theory. The algorithm
also specifies that a multiresolution and multiscale analysis for each
resolution is possible in terms of frequencies.
We study the task of cleaning scanned text documents that are strongly
corrupted by dirt such as manual line strokes, spilled ink etc. We aim at
autonomously removing dirt from a single letter-size page based only on the
information the page contains. Our approach, therefore, has to learn character
representations without supervision and requires a mechanism to distinguish
learned representations from irregular patterns.
This paper presents a novel reaction-diffusion (RD) method for implicit
active contours, which is completely free of the costly re-initialization
procedure in level set evolution (LSE). A diffusion term is introduced into
LSE, resulting in a RD-LSE equation, to which a piecewise constant solution can
be derived. In order to have a stable numerical solution of the RD based LSE,
we propose a two-step splitting method (TSSM) to iteratively solve the RD-LSE
equation: first iterating the LSE equation, and then solving the diffusion
equation.
Crucial information barely visible to the human eye is often embedded in a
series of low-resolution images taken of the same scene. Super-resolution
enables the extraction of this information by reconstructing a single image, at
a higher resolution than is present in any of the individual images. This is
particularly useful in forensic imaging, where the extraction of minute details
in an image can help to solve a crime.
A scattering transform defines a signal representation which is invariant to
translations and Lipschitz continuous relative to deformations. It is
implemented with a non-linear convolution network that iterates over wavelet
and modulus operators. Lipschitz continuity locally linearizes deformations.
Complex classes of signals and textures can be modeled with low-dimensional
affine spaces, computed with a PCA in the scattering domain. Classification is
performed with a penalized model selection.
This paper addresses the problem of distributed coding of images whose
correlation is driven by the motion of objects or positioning of the vision
sensors. It concentrates on the problem where images are encoded with
compressed linear measurements. We propose a geometry-based correlation model
in order to describe the common information in pairs of images. We assume that
the constitutive components of natural images can be captured by visual
features that undergo local transformations (e.g., translation) in different
images.
Biometric technologies are the foundation of personal identification systems.
They provide identification based on a unique feature possessed by the
individual. This paper provides a walkthrough of image acquisition,
segmentation, normalization, feature extraction and matching based on human
iris imaging. A Canny edge detection scheme and a circular Hough transform are
used to detect the iris boundaries in the digital image of the eye. The
extracted iris region is normalized using an image registration technique.
An image articulation manifold (IAM) is the collection of images formed when
an object is articulated in front of a camera. IAMs arise in a variety of image
processing and computer vision applications, where they provide a natural
low-dimensional embedding of the collection of high-dimensional images.
3D motion tracking is a critical task in many computer vision applications.
Existing 3D motion tracking techniques require either a great amount of
knowledge on the target object or specific hardware. These requirements
discourage the wide spread of commercial applications based on 3D motion
tracking. 3D motion tracking systems that require no knowledge on the target
object and run on a single low-budget camera require estimations of the object
projection features (namely, area and position).
Texture is an important spatial feature which plays a vital role in content
based image retrieval. The enormous growth of the internet and the wide use of
digital data have increased the need for both efficient image database creation
and retrieval procedure. This paper describes a new approach for texture
classification by combining statistical texture features of Local Binary
Pattern and Texture spectrum.
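To make the Local Binary Pattern part concrete, the sketch below computes the
basic 8-neighbour LBP code of a 3x3 patch and a 256-bin histogram over an
image. The function names, the clockwise bit order and the use of ">=" as the
comparison are assumptions for illustration; the Texture spectrum features and
the way the paper combines the two are not shown.

```python
def lbp_code(patch):
    """8-bit Local Binary Pattern code of a 3x3 patch given as a list of 3 rows."""
    center = patch[1][1]
    # Neighbours visited clockwise from the top-left corner (a fixed but
    # arbitrary convention; other orderings only permute the bits).
    offsets = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for bit, (r, c) in enumerate(offsets):
        if patch[r][c] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(image):
    """256-bin histogram of LBP codes over all interior pixels of a 2D list."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            patch = [row[c - 1:c + 2] for row in image[r - 1:r + 2]]
            hist[lbp_code(patch)] += 1
    return hist
```

The histogram is the statistical texture feature; two textures can then be
compared by any histogram distance.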
The recent technological progress in acquisition, modeling and processing of
3D data has led to the proliferation of large 3D object databases.
Consequently, techniques for content-based 3D retrieval have become
necessary. In this paper, we introduce a new method for 3D object recognition
and retrieval using a set of binary images called characteristic level
images (CLI). We propose a 3D indexing and search approach based on the
similarity between characteristic level images, using Hu moments for indexing.
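As a small illustration of why Hu moments suit indexing binary images, the
sketch below computes the first Hu invariant, eta20 + eta02, from normalised
central moments and checks that it does not change when a shape is translated.
The function name hu_phi1 and the toy images are assumptions; the remaining six
invariants and the paper's similarity measure are not shown.

```python
def hu_phi1(image):
    """First Hu invariant (eta20 + eta02) of a 2D list of non-negative pixel values."""
    m00 = m10 = m01 = 0.0
    for y, row in enumerate(image):
        for x, v in enumerate(row):
            m00 += v
            m10 += x * v
            m01 += y * v
    if m00 == 0:
        raise ValueError("image has no foreground mass")
    xbar, ybar = m10 / m00, m01 / m00  # centroid
    mu20 = mu02 = 0.0
    for y, row in enumerate(image):
        for x, v in enumerate(row):
            mu20 += (x - xbar) ** 2 * v
            mu02 += (y - ybar) ** 2 * v
    # Second-order central moments are normalised by m00 ** 2.
    return (mu20 + mu02) / m00 ** 2


# Hypothetical binary images: the same 2x3 block at two different positions.
a = [[1 if 1 <= y <= 2 and 1 <= x <= 3 else 0 for x in range(8)] for y in range(8)]
b = [[1 if 4 <= y <= 5 and 3 <= x <= 5 else 0 for x in range(8)] for y in range(8)]
print(hu_phi1(a), hu_phi1(b))
```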
Extending the Liouville-Caputo definition of a fractional derivative to a
nonlocal covariant generalization of arbitrary bound operators acting on
multidimensional Riemannian spaces, an appropriate approach for the 3D shape
recovery of aperture afflicted 2D slide sequences is proposed. We demonstrate
that the step from a local to a nonlocal algorithm yields an order of magnitude
in accuracy, and that using the specific fractional approach yields an
additional factor of 2 in accuracy of the derived results.
This report concerns the use of techniques for sparse signal representation
and sparse error correction for automatic face recognition. Much of the recent
interest in these techniques comes from the paper "Robust Face Recognition via
Sparse Representation" by Wright et al. (2009), which showed how, under certain
technical conditions, one could cast the face recognition problem as one of
seeking a sparse representation of a given input face image in terms of a
"dictionary" of training images and images of individual pixels.
Spectral unmixing is an important tool in hyperspectral data analysis for
estimating endmembers and abundance fractions in a mixed pixel. This paper
examines the applicability of a recently developed algorithm called graph
regularized nonnegative matrix factorization (GNMF) for this aim. The proposed
approach exploits the intrinsic geometrical structure of the data besides
considering positivity and full additivity constraints. Simulated data based on
the measured spectral signatures is used for evaluating the proposed
algorithm.
Recognition systems are commonly designed to authenticate users at the access
control levels of a system. A number of voice recognition methods have been
developed using a pitch estimation process; these are very vulnerable in low
Signal to Noise Ratio (SNR) environments and thus fail to provide
the desired level of accuracy and robustness. Also, most text independent
speaker recognition programs are incapable of coping with unauthorized attempts
to gain access by tampering with the samples or reference database.
Models including two $L^1$-norm terms have been widely used in image
restoration. In this paper we first propose the alternating direction method of
multipliers (ADMM) to solve this class of models. Based on ADMM, we then
propose the proximal point method (PPM), which is more efficient than ADMM.
Following the operator theory, we also give the convergence analysis of the
proposed methods. Furthermore, we use the proposed methods to solve a class of
hybrid models combining the ROF model with the LLT model.
Construction of a scale space with a convolution filter has been studied
extensively in the past. It has been proven that the only convolution kernel
that satisfies the scale space requirements is a Gaussian type. In this paper,
we consider a matrix of convolution filters introduced in [1] as a building
kernel for a scale space, and show that we can construct a non-Gaussian scale
space with a $2\times 2$ matrix of filters. The paper derives sufficient
conditions for the matrix of filters to be a scale space kernel, and
presents some numerical demonstrations.
We revisit the additive model learning literature and adapt a penalized
spline formulation due to Eilers and Marx, to train additive classifiers
efficiently. We also propose two new embeddings based on two classes of
orthogonal bases with orthogonal derivatives, which can also be used to
efficiently learn additive classifiers. This paper follows the popular theme in
the current literature where kernel SVMs are learned much more efficiently
using an approximate embedding and a linear machine.
A fundamental operation in many vision tasks, including motion understanding,
stereopsis, visual odometry, or invariant recognition, is establishing
correspondences between images or between images and data from other
modalities. We present an analysis of the role that multiplicative interactions
play in learning such correspondences, and we show how learning and inferring
relationships between images can be viewed as detecting rotations in the
eigenspaces shared among a set of orthogonal matrices.
In this paper, we address the problem of discriminative dictionary learning
(DDL), where sparse linear representation and classification are combined in a
probabilistic framework. As such, a single discriminative dictionary and linear
binary classifiers are learned jointly. By encoding sparse representation and
discriminative classification models in a MAP setting, we propose a general
optimization framework that allows for a data-driven tradeoff between faithful
representation and accurate classification.
Most image labeling problems such as segmentation and image reconstruction
are fundamentally ill-posed and suffer from ambiguities and noise. Higher order
image priors encode high level structural dependencies between pixels and are
key to overcoming these problems. However, these priors in general lead to
computationally intractable models. This paper addresses the problem of
discovering compact representations of higher order priors which allow
efficient inference.
Scene understanding remains a significant challenge in the computer vision
community. The visual psychophysics literature has demonstrated the importance
of interdependence among parts of the scene. Yet, the majority of methods in
computer vision remain local. Pictorial structures have arisen as a fundamental
parts-based model for some vision problems, such as articulated object
detection. However, the form of classical pictorial structures limits their
applicability for global problems, such as semantic pixel labeling.
Object parsing and segmentation from point clouds are challenging tasks
because the relevant data is available only as thin structures along object
boundaries or other object features and is corrupted by large amounts of noise.
One way to handle this kind of data is by employing shape models that can
accurately follow the object boundaries.
In this paper, we propose a novel lower dimensional representation of a shape
sequence. The proposed dimension reduction is invertible and computationally
more efficient in comparison to other related works. Theoretically, the
differential geometry tools such as moving frame and parallel transportation
are successfully adapted into the dimension reduction problem of high
dimensional curves.
We introduce a novel tracking technique which uses dynamic confidence-based
fusion of two different information sources for robust and efficient tracking
of visual objects. Mean-shift tracking is a popular and well known method used
in object tracking problems. Originally, the algorithm uses a similarity
measure which is optimized by shifting a search area to the center of a
generated weight image to track objects. Recent improvements on the original
mean-shift algorithm involve using a classifier that differentiates the object
from its surroundings.
It was recently demonstrated in [4][arxiv:1105.4204] that the non-linear
bilateral filter \cite{Tomasi} can be efficiently implemented using an O(1) or
constant-time algorithm. At the heart of this algorithm was the idea of
approximating the Gaussian range kernel of the bilateral filter using
trigonometric functions. In this letter, we explain how the idea in [4] can be
extended to a few other linear and non-linear filters [18,21,2]. While some of
these filters have received a lot of attention in recent years, they are known
to be computationally intensive.
We study linear models under heavy-tailed priors from a probabilistic
viewpoint. Instead of computing a single sparse most probable (MAP) solution as
in standard compressed sensing, the focus in the Bayesian framework shifts
towards capturing the full posterior distribution on the latent variables,
which allows quantifying the estimation uncertainty and learning model
parameters using maximum likelihood. The exact posterior distribution under the
sparse linear model is intractable and we concentrate on a number of
alternative variational Bayesian techniques to approximate it.
The IHS sharpening technique is one of the most commonly used techniques for
sharpening. Different transformations have been developed to transfer a color
image from the RGB space to the IHS space. The literature shows that
various scientists have proposed alternative IHS transformations; many papers
report good results whereas others report poor ones, and some do not state
which formula of the IHS transformation was used. In addition,
many papers give different formulas for the IHS transformation
matrix.
This paper shows that the k-means quantization of a signal can be interpreted
both as a crisp indicator function and as a fuzzy membership assignment
describing fuzzy clusters and fuzzy boundaries. Combined crisp and fuzzy
indicator functions are defined here as natural generalizations of the ordinary
crisp and fuzzy indicator functions, respectively. An application to iris
segmentation is presented together with a demo program.
A new approach in iris recognition based on Circular Fuzzy Iris Segmentation
(CFIS) and Gabor Analytic Iris Texture Binary Encoder (GAITBE) is proposed and
tested here. CFIS procedure is designed to guarantee that similar iris segments
will be obtained for similar eye images, despite the fact that the degree of
occlusion may vary from one image to another. Its result is a circular iris
ring (concentric with the pupil) which approximates the actual iris. GAITBE
proves better encoding of statistical independence between the iris codes
extracted from different irides using Hilbert Transform.
We analyze and improve low rank representation (LRR), the state-of-the-art
algorithm for subspace segmentation of data. We prove that for the noiseless
case, the optimization model of LRR has a unique solution, which is the shape
interaction matrix (SIM) of the data matrix. So in essence LRR is equivalent to
factorization methods. We also prove that the minimum value of the optimization
model of LRR is equal to the rank of the data matrix. For the noisy case, we
show that LRR can be approximated as a factorization method that combines noise
removal by column sparse robust PCA.
This paper presents multi-font/multi-size Kannada numerals and vowels
recognition based on spatial features. Directional spatial features, viz. stroke
density, stroke length and the number of strokes in an image, are employed as
potential features to characterize the printed Kannada numerals and vowels.
Based on these features, 1100 numerals and 1400 vowels are classified with
multi-class Support Vector Machines (SVM). The proposed system achieves
recognition accuracies of 98.45% and 90.64% for numerals and vowels, respectively.
We propose a traffic congestion estimation system based on an unsupervised
on-line learning algorithm. The system does not rely on background extraction
or motion detection. It extracts local features inside detection regions of
variable size which are drawn on lanes in advance. The extracted features are
then clustered into two classes using K-means and Gaussian Mixture Models (GMM).
A Bayes classifier is used to detect vehicles according to the previous cluster
information, which is kept up to date by an on-line EM algorithm while the
system is running.
Using a toy vehicle as a moving object, an automatic road lighting system
(ARLS) model is constructed. A video camera running at 25 fps is used to capture
the motion of the toy vehicle as it moves in the test segment of the road.
Captured images are then processed to calculate the speed of the toy vehicle.
The speed, together with the position of the toy vehicle, is then used to switch
the lighting system on and off along the path travelled by the toy vehicle.
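The speed calculation can be sketched as simple arithmetic on the object
position in two frames. Only the 25 fps frame rate comes from the text; the
function name, the use of image centroids and the metres-per-pixel scale are
assumptions for illustration, and the actual image processing is not shown.

```python
import math


def speed_from_centroids(p0, p1, frames_apart, fps=25.0, metres_per_pixel=0.01):
    """Estimate speed (m/s) from two (x, y) pixel centroids taken
    `frames_apart` frames apart. The pixel-to-metre scale is a hypothetical
    calibration value."""
    if frames_apart <= 0:
        raise ValueError("frames_apart must be positive")
    dist_px = math.hypot(p1[0] - p0[0], p1[1] - p0[1])
    dt = frames_apart / fps  # elapsed time in seconds
    return dist_px * metres_per_pixel / dt


# Hypothetical example: the centroid moves 5 pixels between consecutive frames.
print(speed_from_centroids((0, 0), (3, 4), frames_apart=1))
```

Averaging over more than one frame gap reduces the effect of centroid jitter at
the cost of a slower response to speed changes.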
An algorithm for pose and motion estimation using corresponding features in
images and a digital terrain map is proposed. Using a Digital Terrain (or
Digital Elevation) Map (DTM/DEM) as a global reference enables recovering the
absolute position and orientation of the camera. In order to do this, the DTM
is used to formulate a constraint between corresponding features in two
consecutive frames. The utilization of data is shown to improve the robustness
and accuracy of the inertial navigation algorithm.
In this paper, we present a technique by which high-intensity feature vectors
extracted from the Gabor wavelet transformation of frontal face images are
combined with Independent Component Analysis (ICA) for enhanced face
recognition. Firstly, the high-intensity feature vectors are automatically
extracted using the local characteristics of each individual face from the
Gabor transformed images. Then ICA is applied on these locally extracted
high-intensity feature vectors of the facial images to obtain the independent
high intensity feature (IHIF) vectors.
This paper demonstrates two different fusion techniques at two different
levels of a human face recognition process. The first one is called data fusion
at lower level and the second one is the decision fusion towards the end of the
recognition process. First, data fusion is applied to visual and
corresponding thermal images to generate a fused image. Data fusion is
implemented in the wavelet domain after decomposing the images through
Daubechies wavelet coefficients (db2). During data fusion, the maxima of the
approximation coefficients and of the three detail coefficients are merged.
This paper presents a comparative study of two different methods, which are
based on fusion and polar transformation of visual and thermal images. Here,
investigation is done to handle the challenges of face recognition, which
include pose variations, changes in facial expression, partial occlusions,
variations in illumination, rotation through different angles, change in scale
etc. To overcome these obstacles we have implemented and thoroughly examined
two different fusion techniques through rigorous experimentation.
This paper introduces a new family of iris encoders which use 2-dimensional
Haar Wavelet Transform for noise attenuation, and Hilbert Transform to encode
the iris texture. In order to prove the usefulness of the newly proposed iris
encoding approach, the recognition results obtained by using these new encoders
are compared to those obtained using the classical Log-Gabor iris encoder.
Twelve tests involving single/multi-enrollment, conducted on the Bath Iris Image
Database are presented here.
It is well-known that spatial averaging can be realized (in space or
frequency domain) using algorithms whose complexity does not depend on the size
or shape of the filter. These fast algorithms are generally referred to as
constant-time or O(1) algorithms in the image processing literature. Along with
the spatial filter, the edge-preserving bilateral filter [bilateralFilter]
involves an additional range kernel. This is used to restrict the averaging to
those neighborhood pixels whose intensities are similar or close to that of the
pixel of interest.
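To make the constant-time claim for spatial averaging concrete, here is a minimal 1D sketch using prefix sums: after one linear pass, each output sample costs two lookups regardless of the window radius. The function name `box_filter_1d` and the choice to shrink the window at the borders are assumptions made for illustration.

```python
def box_filter_1d(signal, radius):
    """Moving average with O(1) work per sample, independent of radius."""
    n = len(signal)
    # prefix[i] holds the sum of signal[0:i].
    prefix = [0.0]
    for x in signal:
        prefix.append(prefix[-1] + x)
    out = []
    for i in range(n):
        lo = max(0, i - radius)          # window is clipped at the borders
        hi = min(n, i + radius + 1)
        out.append((prefix[hi] - prefix[lo]) / (hi - lo))
    return out
```

The range kernel of the bilateral filter breaks this trick because the weights depend on the center pixel's intensity, which is why constant-time bilateral filtering needs additional approximation.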
The cohomology and cohomology ring of three-dimensional (3D) objects are
topological invariants that characterize holes and their relations. The
cohomology ring has traditionally been computed on simplicial complexes.
Nevertheless, cubical complexes deal directly with the voxels in 3D images, so
no additional triangulation is necessary, facilitating efficient algorithms for the
computation of topological invariants in the image context. In this paper, we
present formulas to directly compute the cohomology ring of 3D cubical
complexes without making use of any additional triangulation.
Identity verification is an increasingly important process in our daily
lives, and biometric recognition is a natural solution to the authentication
problem.
One of the most important research directions in the field of biometrics is
the characterization of novel biometric traits that can be used in conjunction
with other traits, to limit their shortcomings or to enhance their performance.
Structural pattern recognition describes and classifies data based on the
relationships of features and parts. Topological invariants, like the Euler
number, characterize the structure of objects of any dimension. Cohomology can
provide more refined algebraic invariants to a topological space than does
homology. It assigns `quantities' to the chains used in homology to
characterize holes of any dimension. Graph pyramids can be used to describe
subdivisions of the same object at multiple levels of detail.
In this paper we investigate a technique for extracting vocal-source-based
features from the LP residual of the speech signal for automatic speaker
identification. The autocorrelation at some specific lag is computed for the
residual signal to derive these features. Compared to traditional features
such as MFCC and PLPCC, which represent vocal tract information, these
features represent complementary vocal cord information. Our experiment in
fusing these two sources of information to represent speaker characteristics
yields better speaker identification accuracy.
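The lag-specific autocorrelation mentioned above can be sketched as follows. This assumes the LP residual is already available as a list of samples and normalizes by the signal energy; both the normalization and the name `autocorr_at_lag` are illustrative assumptions, not details given in the abstract.

```python
def autocorr_at_lag(residual, lag):
    """Energy-normalized autocorrelation of a residual signal at one lag."""
    n = len(residual)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(residual)")
    energy = sum(x * x for x in residual)
    if energy == 0:
        return 0.0  # silent frame: define the feature as zero
    # Sum of products of samples that are `lag` positions apart.
    return sum(residual[t] * residual[t + lag] for t in range(n - lag)) / energy
```

A feature vector could then be formed by evaluating this at a handful of lags per frame.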
Statistical dependencies among wavelet coefficients are commonly represented
by graphical models such as hidden Markov trees (HMTs). However, in linear
inverse problems such as deconvolution, tomography, and compressed sensing, the
presence of a sensing or observation matrix produces a linear mixing of the
simple Markovian dependency structure. This leads to reconstruction problems
that are non-convex optimizations. Past work has dealt with this issue by
resorting to greedy or suboptimal iterative reconstruction methods.
The main goal of the GEOMIR2K9 project is to create a software program that
is able to find similar scenic images clustered by geographical location and
sorted by similarity based only on their visual content. The user should be
able to input a query image; based on this query image, the program should
find relevant visual content and present it to the user in a meaningful way.
Technically the goal for the GEOMIR2K9 project is twofold.
Template matching is one of the most prevalent pattern recognition methods
worldwide. It has found uses in most visual concept detection fields. In this
work, we investigate methods for improving template matching by adjusting the
weights of different regions of the template. We compare several weight maps
and test the methods using the FERET face test set in the context of human eye
detection.
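A minimal sketch of region-weighted template matching follows. It assumes a weighted sum of squared differences as the matching score and an exhaustive search over positions; the abstract does not state which score or weight maps the authors use, and the names `weighted_ssd` and `best_match` are hypothetical.

```python
def weighted_ssd(image, template, weights, top, left):
    """Weighted sum of squared differences at one template position."""
    total = 0.0
    for r, (t_row, w_row) in enumerate(zip(template, weights)):
        for c, (t, w) in enumerate(zip(t_row, w_row)):
            diff = image[top + r][left + c] - t
            total += w * diff * diff
    return total


def best_match(image, template, weights):
    """Return the (row, col) of the top-left corner with the lowest score."""
    th, tw = len(template), len(template[0])
    positions = [(r, c)
                 for r in range(len(image) - th + 1)
                 for c in range(len(image[0]) - tw + 1)]
    return min(positions,
               key=lambda p: weighted_ssd(image, template, weights, p[0], p[1]))
```

Setting a weight to zero makes the corresponding template pixel irrelevant to the score, which is the mechanism by which a weight map can emphasize informative regions of the template.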
The design of a fuzzy rule-based classifier is proposed. The performance of
the classifier for multispectral satellite image classification is improved
using the Dempster-Shafer theory of evidence, which exploits information from
the neighboring pixels. The classifiers are tested rigorously with two known
images, and their performance is found to be better than the results available
in the literature. We also demonstrate the improvement in performance when
using D-S theory along with fuzzy rule-based classifiers over the basic fuzzy
rule-based classifiers for all the test cases.
A new method is proposed for obtaining the geometric information of image
features. Using a Gaussian as an input signal, a theoretically optimal
solution for calculating a feature's affine shape is proposed. Because it is
based on the analytic result of a feature model, the method differs from
conventional iterative approaches. From the model, a feature's parameters such
as position, orientation, background luminance, contrast, area and aspect
ratio can be extracted. Tested with synthesized and benchmark data, the method
matches or outperforms existing approaches in terms of accuracy, speed and
stability.
The fundamental matrix and trifocal tensor are convenient algebraic
representations of the epipolar geometry of two and three view configurations,
respectively. The estimation of these entities is central to most
reconstruction algorithms, and a solid understanding of their properties and
constraints is therefore very important. The fundamental matrix has 1 internal
constraint which is well understood, whereas the trifocal tensor has 8
independent algebraic constraints.
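The constraint counts above follow from a standard degrees-of-freedom argument, written out here as a worked count. The figure of 18 degrees of freedom for three uncalibrated views (3 x 11 camera parameters minus a 15-parameter projective ambiguity) is the usual textbook value and is not stated in the abstract itself.

```latex
\begin{align*}
\text{Fundamental matrix } F \in \mathbb{R}^{3 \times 3}:\quad
  & 9 \text{ entries} - 1 \text{ (overall scale)} - 1 \ (\det F = 0)
    = 7 \text{ degrees of freedom}, \\
\text{Trifocal tensor } T \in \mathbb{R}^{3 \times 3 \times 3}:\quad
  & 27 \text{ entries} - 1 \text{ (overall scale)} - 18 \text{ degrees of freedom}
    = 8 \text{ independent constraints}.
\end{align*}
```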
Discontinuity preserving smoothing is a fundamentally important procedure
that is useful in a wide variety of image processing contexts. It is directly
useful for noise reduction, and frequently used as an intermediate step in
higher level algorithms. For example, it can be particularly useful in edge
detection and segmentation. Three well known algorithms for discontinuity
preserving smoothing are nonlinear anisotropic diffusion, bilateral filtering,
and mean shift filtering.
Dual energy computerized tomography has gained great interest because of its
ability to characterize the chemical composition of a material rather than
simply providing relative attenuation images as in conventional tomography.
The most common primary brain tumors are gliomas, evolving from the cerebral
supportive cells. For clinical follow-up, the evaluation of the preoperative
tumor volume is essential. Volumetric assessment of tumor volume with manual
segmentation of its outlines is a time-consuming process that can be overcome
with the help of computerized segmentation methods. In this contribution, two
methods for World Health Organization (WHO) grade IV glioma segmentation in the
human brain are compared using magnetic resonance imaging (MRI) patient data
from the clinical routine.
This short article presents a class of projection-based solution algorithms
to the problem considered in the pioneering work on compressed sensing -
perfect reconstruction of a phantom image from 22 radial lines in the frequency
domain. Under the framework of projection-based image reconstruction, we will
show experimentally that several old and new tools of nonlinear filtering
(including Perona-Malik diffusion, nonlinear diffusion, Translation-Invariant
thresholding and SA-DCT thresholding) all lead to perfect reconstruction of the
phantom image.
Diffusion Tensor Imaging (DTI) provides the possibility of estimating the
location and course of eloquent structures in the human brain. Knowledge about
this is of high importance for preoperative planning of neurosurgical
interventions and for intraoperative guidance by neuronavigation in order to
minimize postoperative neurological deficits. Therefore, the segmentation of
these structures as closed, three-dimensional objects is necessary.
A novel framework of compressed sensing, namely statistical compressed
sensing (SCS), that aims at efficiently sampling a collection of signals that
follow a statistical distribution, and achieving accurate reconstruction on
average, is introduced.
A geometric model of sparse signal representations is introduced for classes
of signals. It is computed by optimizing co-occurrence groups with a maximum
likelihood estimate calculated with a Bernoulli mixture model. Applications to
face image compression and MNIST digit classification illustrate the
applicability of this model.
Finding a match between partially available deformable shapes is a
challenging problem with numerous applications. The problem is usually
approached by computing local descriptors on a pair of shapes and then
establishing a point-wise correspondence between the two. In this paper, we
introduce an alternative correspondence-less approach to matching fragments to
an entire shape undergoing a non-rigid deformation. We use diffusion geometric
descriptors and optimize over the integration domains on which the integral
descriptors of the two parts match.
In this paper, we explore the use of the diffusion geometry framework for the
fusion of geometric and photometric information in local and global shape
descriptors. Our construction is based on the definition of a diffusion process
on the shape manifold embedded into a high-dimensional space where the
embedding coordinates represent the photometric information. Experimental
results show that such data fusion is useful in coping with different
challenges of shape analysis where pure geometric and pure photometric methods
fail.
The efficient repair of cellular DNA is essential for the maintenance and
inheritance of genomic information. In order to cope with the high frequency of
spontaneous and induced DNA damage, a multitude of repair mechanisms have
evolved. These are enabled by a wide range of protein factors specifically
recognizing different types of lesions and finally restoring the normal DNA
sequence. This work focuses on the repair factor XPC (xeroderma pigmentosum
complementation group C), which identifies bulky DNA lesions and initiates
their removal via the nucleotide excision repair pathway.
The past decade has seen the growing popularity of Bag of Features (BoF)
approaches to many computer vision tasks, including image classification, video
search, robot localization, and texture recognition. Part of the appeal is
simplicity. BoF methods are based on orderless collections of quantized local
image descriptors; they discard spatial information and are therefore
conceptually and computationally simpler than many alternative methods.
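The orderless representation described above can be sketched in a few lines: each local descriptor is assigned to its nearest codeword and only the counts are kept. The codebook is assumed to be given (for example from clustering), and `bof_histogram` is a hypothetical name.

```python
def bof_histogram(descriptors, codebook):
    """Count how many descriptors fall nearest to each codeword."""
    hist = [0] * len(codebook)
    for d in descriptors:
        # Index of the codeword with the smallest squared Euclidean distance.
        nearest = min(
            range(len(codebook)),
            key=lambda k: sum((a - b) ** 2 for a, b in zip(d, codebook[k])),
        )
        hist[nearest] += 1
    return hist
```

Because only counts survive, shuffling the descriptors leaves the histogram unchanged, which is exactly the loss of spatial information the abstract refers to.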
Kernel-based machine learning algorithms are based on mapping data from the
original input feature space to a kernel feature space of higher dimensionality
to solve a linear problem in that space. Over the last decade, kernel based
classification and regression approaches such as support vector machines have
widely been used in remote sensing as well as in various civil engineering
applications.
English Character Recognition (CR) has been extensively studied in the last
half century and has progressed to a level sufficient to produce
technology-driven applications. The same is not true for Indian languages,
which are complicated in terms of structure and computations. Rapidly growing
computational power may enable the implementation of Indic CR methodologies.
Digital document processing is gaining popularity for application to office and
library automation, bank and postal services, publishing houses and
communication technology.
Various car plate recognition applications have been developed using a range
of methods and techniques by researchers all over the world. Each application
is suitable only for a specific country, because it relies on the standard
specification endorsed by that country's transport department. The Road
Transport Department of Malaysia has also endorsed a specification for car
plates that includes the font and size of characters that must be followed by
car owners. However, there are cases where this specification is not followed.
This chapter presents a framework for detecting fake regions using various
methods, including watermarking techniques and blind approaches. In particular,
we describe the current categories of blind approaches, which can be divided
into five: pixel-based techniques, format-based techniques, camera-based
techniques, physically-based techniques and geometric-based techniques. We then
take a second look at the geometric-based techniques and further categorize
them in detail. In the following section, the state-of-the-art methods involved
in the geometric technique are elaborated.
We present a method for segmenting an arbitrary number of moving objects in
image sequences using the geometry of 6 points in 2D to infer motion
consistency. The method has been evaluated on the Hopkins 155 database and
surpasses current state-of-the-art methods such as SSC, both in terms of
overall performance on two and three motions and in terms of maximum
errors. The method works by finding initial clusters in the spatial domain, and
then classifying each remaining point as belonging to the cluster that
minimizes a motion consistency score.
Rank-based analysis is a basic approach for many real world applications.
Recently, with the development of compressive sensing, an interesting problem
was proposed: to recover a low-rank matrix from sparse noise. In this paper, we
address this problem and propose a low-rank matrix recovery algorithm
based on sparsity tracking. The core of the proposed Sparsity Tracking
Recovery (STR) is a heuristic kernel, which is introduced to penalize the noise
distribution. With the heuristic method, the sparse entries in the noise matrix
can be accurately tracked and discouraged to be zero.
We propose a method for learning sparse representations of depth (disparity)
maps, which is able to cope with noise and unreliable depth measurements. The
proposed algorithm relaxes the usual assumption of the stationary noise model
in sparse coding and enables learning from data corrupted with spatially
varying noise or uncertainty. Different noise statistics at each pixel location
are inferred from the data, and the learning rule is adapted with respect to
the noise level.
The goal of this paper is the development of a novel approach to the problem
of noise removal, based on the theory of Reproducing Kernel Hilbert Spaces
(RKHS). The problem is cast as an optimization task in an RKHS, by taking
advantage of the celebrated semiparametric Representer Theorem. Examples verify
that in the presence of Gaussian noise the proposed method performs relatively
well compared to wavelet-based techniques and outperforms them significantly in
the presence of impulse or mixed noise.
In this paper we propose a new wavelet transform applicable to functions
defined on graphs, high dimensional data and networks. The proposed method
generalizes the Haar-like transform proposed in \cite{gavish2010mwot}, and it
is similarly defined via a hierarchical tree, which is assumed to capture the
geometry and structure of the input data. It is applied to the data using a
multiscale filtering and decimation scheme, which can employ different wavelet
filters. We propose a tree construction method which results in efficient
representation of the input function in the transform domain.
In this paper a fuzzy clustering model for fuzzy data with outliers is
proposed. The model is based on Wasserstein distance between interval valued
data which is generalized to fuzzy data. In addition, Keller's approach is used
to identify outliers and reduce their influences. We have also defined a
transformation to change our distance to the Euclidean distance. With the help
of this approach, the problem of fuzzy clustering of fuzzy data is reduced to
fuzzy clustering of crisp data.
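To illustrate how a Wasserstein distance between intervals can be turned into a Euclidean one, the sketch below assumes the 2-Wasserstein distance between uniform distributions on the two intervals, for which the squared distance is the squared midpoint difference plus one third of the squared half-width difference. The abstract does not give the authors' exact distance or transformation, so `wasserstein_sq` and `embed` are illustrative assumptions only.

```python
import math


def wasserstein_sq(i1, i2):
    """Squared 2-Wasserstein distance between uniform laws on two intervals."""
    (a1, b1), (a2, b2) = i1, i2
    d_mid = (a1 + b1) / 2.0 - (a2 + b2) / 2.0    # midpoint difference
    d_rad = (b1 - a1) / 2.0 - (b2 - a2) / 2.0    # half-width difference
    return d_mid * d_mid + d_rad * d_rad / 3.0


def embed(interval):
    """Map an interval to a crisp 2D point whose Euclidean geometry matches."""
    a, b = interval
    return ((a + b) / 2.0, (b - a) / (2.0 * math.sqrt(3.0)))
```

Under these assumptions, the squared Euclidean distance between `embed(i1)` and `embed(i2)` equals `wasserstein_sq(i1, i2)`, so any crisp clustering routine can be run on the embedded points.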
The Peirce quincuncial projection is a mapping of the surface of a sphere to
a square. It is a conformal mapping except for four points on the equator.
These points of non-conformality cause significant artifacts in photographic
applications. In this paper, we propose an algorithm and a user-interface to
mitigate these artifacts. We then promote the Peirce quincuncial projection as
a viable alternative to the stereographic projection in photographic
applications.
Classification is one of the most important tasks of machine learning.
Although the most well studied model is the two-class problem, in many
scenarios there is the opportunity to label critical items for manual revision,
instead of trying to automatically classify every item. In this paper we adapt
a paradigm initially proposed for the classification of ordinal data to address
the classification problem with reject option.
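For readers unfamiliar with the reject option, the following sketch shows the simplest form of the idea: a confidence-threshold rule on the estimated posterior of a two-class problem. This is a generic baseline, not the ordinal-data paradigm that the paper adapts, and the name `classify_with_reject` and the default threshold are assumptions.

```python
def classify_with_reject(p_positive, threshold=0.8):
    """Return 1 or 0 when confident, or None to defer to manual revision.

    p_positive is the estimated probability of the positive class.
    """
    if not 0.5 <= threshold <= 1.0:
        raise ValueError("threshold must lie in [0.5, 1.0]")
    if p_positive >= threshold:
        return 1
    if 1.0 - p_positive >= threshold:
        return 0
    return None  # not confident enough either way: reject
```

Raising the threshold widens the reject region, trading more manual work for fewer automatic errors.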
Contour tracking in adverse environments is a challenging problem due to
cluttered background, illumination variation, occlusion, and noise, among
others. This paper presents a robust contour tracking method by contributing to
some of the key issues involved, including (a) a region functional formulation
and its optimization; (b) design of a robust and effective feature; and (c)
development of an integrated tracking algorithm.
The problem of identifying the 3D pose of a known object from a given 2D
image has important applications in Computer Vision ranging from robotic vision
to image analysis. Our proposed method of registering a 3D model of a known
object on a given 2D photo of the object has numerous advantages over existing
methods: it requires neither prior training nor learning, nor knowledge of
the camera parameters, nor explicit point correspondences or matching features
between image and model.
Calibration in a multi-camera network has been widely studied for many years,
starting from the early days of photogrammetry. Many authors have presented
calibration algorithms with their relative advantages
and disadvantages. In a stereovision system, multiple view reconstruction is a
challenging task. However, the total computational procedure in detail has not
been presented before.
Color quantization is an important operation with numerous applications in
graphics and image processing. Most quantization methods are essentially based
on data clustering algorithms. However, despite its popularity as a general
purpose clustering algorithm, k-means has not received much respect in the
color quantization literature because of its high computational requirements
and sensitivity to initialization. In this paper, a fast color quantization
method based on k-means is presented.
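As background for the k-means approach, here is a minimal sketch of color quantization with plain Lloyd iterations. It is the textbook algorithm, not the accelerated variant the paper presents, and the deterministic initialization (first `k` distinct colors) and the names `kmeans_palette` and `quantize` are assumptions chosen for reproducibility.

```python
def _nearest(pixel, centers):
    """Index of the center closest to pixel in squared RGB distance."""
    return min(range(len(centers)),
               key=lambda j: sum((p - c) ** 2 for p, c in zip(pixel, centers[j])))


def kmeans_palette(pixels, k, iterations=10):
    """Return up to k palette colors for a list of (r, g, b) tuples."""
    seeds = []
    for p in pixels:                      # first k distinct colors as seeds
        if p not in seeds:
            seeds.append(p)
        if len(seeds) == k:
            break
    centers = [tuple(float(v) for v in s) for s in seeds]
    for _ in range(iterations):
        sums = [[0.0, 0.0, 0.0] for _ in centers]
        counts = [0] * len(centers)
        for p in pixels:                  # assignment step
            j = _nearest(p, centers)
            counts[j] += 1
            for ch in range(3):
                sums[j][ch] += p[ch]
        centers = [                       # update step; keep empty clusters fixed
            tuple(s / counts[j] for s in sums[j]) if counts[j] else centers[j]
            for j in range(len(centers))
        ]
    return centers


def quantize(pixels, palette):
    """Replace each pixel by its nearest palette color."""
    return [palette[_nearest(p, palette)] for p in pixels]
```

Each iteration visits every pixel once per center, which is the computational cost the abstract alludes to, and the outcome depends on the seeds, which is the initialization sensitivity.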
A new framework of compressive sensing (CS), namely statistical compressive
sensing (SCS), that aims at efficiently sampling a collection of signals that
follow a statistical distribution and achieving accurate reconstruction on
average, is introduced.
Adaptive sparse coding methods learn a possibly overcomplete set of basis
functions, such that natural image patches can be reconstructed by linearly
combining a small subset of these bases. The applicability of these methods to
visual object recognition tasks has been limited because of the prohibitive
cost of the optimization algorithms required to compute the sparse
representation. In this work we propose a simple and efficient algorithm to
learn basis functions.
In this paper we present a simple and fast geometric method for modeling data
by a union of affine sets. The method begins by forming a collection of local
best fit affine subspaces. The correct sizes of the local neighborhoods are
determined automatically by the Jones' $\beta_2$ numbers; we prove under
certain geometric conditions that good local neighborhoods exist and are found
by our method. The collection is further processed by a greedy selection
procedure or a spectral method to generate the final model.
An inverse iterative algorithm for microwave imaging based on a moment method
solution is presented here. The iterative scheme has been developed on a
constrained optimization technique and is certain to converge. Different mesh
sizes for the model have been used to avoid the inverse crime. The
synthetic data at the receivers are contaminated with different percentages of
noise. The ill-posedness of the problem is handled by the Levenberg-Marquardt
method. The algorithm is applied to synthetic data, and the reconstructed image
is then further enhanced through an image enhancement technique.
We present an exact method of greatly speeding up belief propagation (BP) for
a wide variety of potential functions in pairwise MRFs and other graphical
models. Specifically, our technique applies whenever the pairwise potentials
have been {\em truncated} to a constant value for most pairs of states, as is
commonly done in MRF models with robust potentials (such as stereo) that impose
an upper bound on the penalty assigned to discontinuities; for each of the $M$
possible states in one node, only a smaller number $m$ of compatible states in
a neighboring node are assigned milder penalties.
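One way to see why truncation helps is the min-sum message update sketched below: since all but a few state pairs share the same truncated cost, the message for each receiving state needs only the global minimum plus a scan over its compatible states. This is an illustrative reading of the idea in min-sum form, not necessarily the authors' algorithm; it assumes every compatible cost is at most the truncation value, and the names `message_truncated` and `message_bruteforce` are hypothetical.

```python
def message_truncated(h, compat, trunc):
    """Min-sum message exploiting truncated pairwise potentials.

    h[i]      : accumulated cost of state i at the sending node (M states)
    compat[j] : dict {i: cost} of the few sender states compatible with j,
                each cost assumed <= trunc
    trunc     : constant cost assigned to every other pair of states
    """
    base = min(h) + trunc                 # best any truncated pair can do
    msg = []
    for pairs in compat:
        best = base
        for i, cost in pairs.items():     # only the m compatible states
            best = min(best, h[i] + cost)
        msg.append(best)
    return msg


def message_bruteforce(h, compat, trunc):
    """Reference O(M^2) update used to check the fast version."""
    return [min(h[i] + pairs.get(i, trunc) for i in range(len(h)))
            for pairs in compat]
```

The fast version does roughly M times m work instead of M squared, and it returns exactly the same message as the brute-force update under the stated assumption.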
There is an abundant literature on face detection due to its important role
in many vision applications. Since Viola and Jones proposed the first real-time
AdaBoost based face detector, Haar-like features have been adopted as the
method of choice for frontal face detection. In this work, we show that simple
features other than Haar-like features can also be applied for training an
effective face detector.
This paper introduces a novel method for detecting human faces and their
orientation using wavelets, principal component analysis (PCA) and radial
basis networks. The input image is analyzed by a two-dimensional wavelet and a
two-dimensional stationary wavelet. The common goals are image
clearance and simplification, which are parts of de-noising or compression. We
applied an effective procedure to reduce the dimension of the input vectors
using PCA.
This paper proposes a framework for modeling instantaneous changes in natural
scenes in real time using a Lagrangian Particle Framework and a fluid-particle
grid approach. This research can be divided into 3 distinct sections: the first
one discusses a multi-camera rig that can measure ego-motion accurately up to
88%, how this device becomes the backbone of our framework, and some
improvements devised to optimize a known framework for depth maps and 3D
structure estimation from a single still image called make3d.
Many algorithms for approximate nearest neighbor search in high-dimensional
spaces partition the data into clusters. At query time, in order to avoid
exhaustive search, an index selects the few (or a single) clusters nearest to
the query point. Clusters are often produced by the well-known $k$-means
approach since it has several desirable properties. On the downside, it tends
to produce clusters having quite different cardinalities. Imbalanced clusters
negatively impact both the variance and the expectation of query response
times.
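The effect of imbalance on query cost can be illustrated with a small calculation. The sketch assumes a query lands in a cluster with probability proportional to that cluster's size and then scans every point in it; this cost model and the name `scan_cost_stats` are assumptions, not taken from the paper.

```python
def scan_cost_stats(sizes):
    """Mean and variance of the number of points scanned per query.

    Assumes a query selects cluster i with probability sizes[i] / total
    and then compares against all sizes[i] points in that cluster.
    """
    total = sum(sizes)
    mean = sum(n * n for n in sizes) / total          # E[cost]
    second = sum(n ** 3 for n in sizes) / total       # E[cost^2]
    return mean, second - mean * mean
```

Under this model, four balanced clusters of 25 points give a mean of 25 with zero variance, while sizes of 70, 10, 10 and 10 roughly double the mean and introduce substantial variance for the same dataset.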
This paper deals with an improvement of vertex based nonlinear diffusion for
mesh denoising. This method directly filters the position of the vertices using
Laplace, reduced centered Gaussian and Rayleigh probability density functions
as diffusivities. The use of these PDFs improves the performance of a
vertex-based diffusion method, which is adapted to the underlying mesh
structure. We also compare the proposed method to other mesh denoising methods
such as Laplacian flow, mean, median, min and the adaptive MMSE filtering. To
evaluate these methods of filtering, we use two error metrics.
Real-time object detection is one of the core problems in computer vision.
The cascade boosting framework proposed by Viola and Jones has become the
standard for this problem. In this framework, the learning goal for each node
is asymmetric, which is required to achieve a high detection rate and a
moderate false positive rate. We develop new boosting algorithms to address
this asymmetric learning problem. We show that our methods explicitly optimize
asymmetric loss objectives in a totally corrective fashion.
Image hashing is the process of associating a short vector of bits to an
image. The resulting summaries are useful in many applications including image
indexing, image authentication and pattern recognition. These hashes need to be
invariant under transformations of the image that result in similar visual
content, but should drastically differ for conceptually distinct contents. This
paper proposes an image hashing method that is invariant under rotation,
scaling and translation of the image.
This paper addresses the problem of infants' cry fundamental frequency
estimation. The fundamental frequency is estimated using a modified simple
inverse filtering tracking (SIFT) algorithm. The performance of the modified
SIFT is studied using a real database of infants' cry.
In this paper, Deterministic Cellular Automata (DCA) based video shot
classification and retrieval is proposed. The deterministic 2D Cellular
automata model captures the human facial expressions, both spontaneous and
posed. The determinism stems from the fact that the facial muscle actions are
standardized by the encodings of Facial Action Coding System (FACS) and Action
Units (AUs). Based on these encodings, we generate the set of evolutionary
update rules of the DCA for each facial expression.
Background: Dermoscopy is one of the major imaging modalities used in the
diagnosis of melanoma and other pigmented skin lesions. Due to the difficulty
and subjectivity of human interpretation, automated analysis of dermoscopy
images has become an important research area. Border detection is often the
first step in this analysis. Methods: In this article, we present an
approximate lesion localization method that serves as a preprocessing step for
detecting borders in dermoscopy images. In this method, first the black frame
around the image is removed using an iterative algorithm.
Human ovarian reserve is defined by the population of nongrowing follicles
(NGFs) in the ovary. Direct estimation of ovarian reserve involves the
identification of NGFs in prepared ovarian tissue. Previous studies involving
human tissue have used hematoxylin and eosin (HE) stain, with NGF populations
estimated by human examination either of tissue under a microscope, or of
images taken of this tissue. In this study we replaced HE with proliferating
cell nuclear antigen (PCNA), and automated the identification and enumeration
of NGFs that appear in the resulting microscopic images.
Cascade classifiers are widely used in real-time object detection. Different
from conventional classifiers that are designed for a low overall
classification error rate, a classifier in each node of the cascade is required
to achieve an extremely high detection rate and moderate false positive rate.
Although there are a few reported methods addressing this requirement in the
context of object detection, there is no principled feature selection method
that explicitly takes into account this asymmetric node learning objective. We
provide such an algorithm here.
The importance of manifolds and Riemannian geometry in mathematics is
spreading to applied fields in which the need to model non-linear structure has
spurred widespread interest in geometry. The transfer of interest has created
demand for methods for computing classical constructs of geometry on manifolds
occurring in practical applications. This paper develops initial value problems
for the computation of the differential of the exponential map and Jacobi
fields on parametrically and implicitly represented manifolds.