Determining the optimal number of clusters in a dataset is a challenging task.
Though some methods are available, there is no algorithm that produces a unique
clustering solution. The paper proposes Automatic Merging for Single Optimal
Solution (AMSOS), which aims to generate unique and nearly optimal clusters for
the given datasets automatically. AMSOS iteratively merges the closest
clusters, validating each merge with a cluster validity measure, to find a
single, nearly optimal set of clusters for the given data set.
Selection of initial seeds greatly affects the quality of the clusters in
k-means type algorithms. Most seed selection methods produce different
results in different independent runs. We propose a single, optimal, outlier
insensitive seed selection algorithm for k-means type algorithms as an extension
to k-means++. The experimental results on synthetic, real and microarray
data sets demonstrate the effectiveness of the new algorithm in producing the
clustering results.
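As a point of reference for the seeding discussion above, the following is a minimal sketch of the standard, randomized k-means++ seeding step, not the deterministic, outlier-insensitive extension the abstract proposes. The function name `kmeans_pp_seeds` and the synthetic blobs are assumptions made for illustration.

```python
import numpy as np

def kmeans_pp_seeds(points, k, rng):
    """Standard k-means++ seeding (illustrative, not the proposed extension).

    Each new seed is drawn with probability proportional to its squared
    distance from the nearest seed chosen so far.
    """
    n = points.shape[0]
    first = rng.integers(n)
    seeds = [points[first]]
    # Squared distance from every point to its nearest seed so far.
    d2 = np.sum((points - points[first]) ** 2, axis=1)
    for _ in range(1, k):
        probs = d2 / d2.sum()
        idx = rng.choice(n, p=probs)
        seeds.append(points[idx])
        d2 = np.minimum(d2, np.sum((points - points[idx]) ** 2, axis=1))
    return np.array(seeds)

# Hypothetical data: three tight 2-D blobs.
rng = np.random.default_rng(0)
blobs = np.vstack([rng.normal(c, 0.1, size=(50, 2))
                   for c in ((0, 0), (5, 5), (0, 5))])
seeds = kmeans_pp_seeds(blobs, 3, rng)
```

Because the draw is random, a different generator state can give different seeds, which is the run-to-run variability the abstract sets out to remove.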
The appearance of microcalcifications in mammograms is one of the early signs
of breast cancer. So, early detection of microcalcification clusters (MCCs) in
mammograms can be helpful for cancer diagnosis and better treatment of breast
cancer. In this paper a computer method is proposed to support
radiologists in detecting MCCs in digital mammography. First, in order to
facilitate and improve the detection step, mammogram images have been enhanced
with wavelet transformation and morphology operation. Then for segmentation of
suspicious MCCs, two methods have been investigated.
A framework for adaptive and non-adaptive statistical compressive sensing is
developed, where a statistical model replaces the standard sparsity model of
classical compressive sensing. We propose within this framework optimal
task-specific sensing protocols specifically and jointly designed for
classification and reconstruction. A two-step adaptive sensing paradigm is
developed, where online sensing is applied to detect the signal class in the
first step, followed by a reconstruction step adapted to the detected class and
the observed samples.
Compressive sensing (CS) is a new approach for the acquisition and recovery
of sparse signals and images that enables sampling rates significantly below
the classical Nyquist rate. Despite significant progress in the theory and
methods of CS, little headway has been made in compressive video acquisition
and recovery. Video CS is complicated by the ephemeral nature of dynamic
events, which makes direct extensions of standard CS imaging architectures and
signal models difficult.
In this study we investigate a fast image filtering algorithm based on the
Introsort algorithm and fast noise reduction of infrared images. The main
feature of the proposed approach is that no prior knowledge of the noise is
required. It is developed based on the Stefan-Boltzmann law and the Fourier law.
We also investigate a fast noise reduction approach that has the advantage of a
lower computational load. In addition, it can retain edges, details and text
information even if the size of the window increases.
Recent results in Compressive Sensing have shown that, under certain
conditions, the solution to an underdetermined system of linear equations with
sparsity-based regularization can be accurately recovered by solving convex
relaxations of the original problem. In this work, we present a novel
primal-dual analysis on a class of sparsity minimization problems.
We present an algorithm using transformation groups and their irreducible
representations to generate an orthogonal basis for a signal in the vector
space of the signal. It is shown that multiresolution analysis can be done with
amplitudes using a transformation group. G-lets is thus not a single transform,
but a group of linear transformations related by group theory. The algorithm
also specifies that a multiresolution and multiscale analysis for each
resolution is possible in terms of frequencies.
We study the task of cleaning scanned text documents that are strongly
corrupted by dirt such as manual line strokes, spilled ink etc. We aim at
autonomously removing dirt from a single letter-size page based only on the
information the page contains. Our approach, therefore, has to learn character
representations without supervision and requires a mechanism to distinguish
learned representations from irregular patterns.
This paper presents a novel reaction-diffusion (RD) method for implicit
active contours, which is completely free of the costly re-initialization
procedure in level set evolution (LSE). A diffusion term is introduced into
LSE, resulting in an RD-LSE equation, for which a piecewise constant solution
can be derived. In order to have a stable numerical solution of the RD based LSE,
we propose a two-step splitting method (TSSM) to iteratively solve the RD-LSE
equation: first iterating the LSE equation, and then solving the diffusion
equation.
Crucial information barely visible to the human eye is often embedded in a
series of low-resolution images taken of the same scene. Super-resolution
enables the extraction of this information by reconstructing a single image at
a higher resolution than is present in any of the individual images. This is
particularly useful in forensic imaging, where the extraction of minute details
in an image can help to solve a crime.
A scattering transform defines a signal representation which is invariant to
translations and Lipschitz continuous relative to deformations. It is
implemented with a non-linear convolution network that iterates over wavelet
and modulus operators. Lipschitz continuity locally linearizes deformations.
Complex classes of signals and textures can be modeled with low-dimensional
affine spaces, computed with a PCA in the scattering domain. Classification is
performed with a penalized model selection.
This paper addresses the problem of distributed coding of images whose
correlation is driven by the motion of objects or positioning of the vision
sensors. It concentrates on the problem where images are encoded with
compressed linear measurements. We propose a geometry-based correlation model
in order to describe the common information in pairs of images. We assume that
the constitutive components of natural images can be captured by visual
features that undergo local transformations (e.g., translation) in different
images.
Biometric technologies are the foundation of personal identification systems.
They provide identification based on a unique feature possessed by the
individual. This paper provides a walkthrough of image acquisition,
segmentation, normalization, feature extraction and matching based on human
iris imaging. A Canny edge detection scheme and a circular Hough transform are
used to detect the iris boundaries in the digital image of the eye. The
extracted iris region is normalized using an image registration technique.
An image articulation manifold (IAM) is the collection of images formed when
an object is articulated in front of a camera. IAMs arise in a variety of image
processing and computer vision applications, where they provide a natural
low-dimensional embedding of the collection of high-dimensional images.
3D motion tracking is a critical task in many computer vision applications.
Existing 3D motion tracking techniques require either a great amount of
knowledge of the target object or specific hardware. These requirements
discourage the widespread adoption of commercial applications based on 3D
motion tracking. 3D motion tracking systems that require no knowledge of the
target object and run on a single low-budget camera require estimates of the
object projection features (namely, area and position).
Texture is an important spatial feature which plays a vital role in content
based image retrieval. The enormous growth of the internet and the wide use of
digital data have increased the need for both efficient image database creation
and retrieval procedure. This paper describes a new approach for texture
classification by combining statistical texture features of Local Binary
Pattern and Texture spectrum.
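To make the Local Binary Pattern feature mentioned above concrete, here is a minimal sketch of the basic 8-neighbour LBP code and its normalized histogram. It does not reproduce the paper's combination with the Texture spectrum; the function names and the neighbour ordering are assumptions chosen for illustration.

```python
import numpy as np

def lbp_codes(img):
    """Basic 8-neighbour LBP codes for the interior pixels of a 2-D image."""
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    center = img[1:-1, 1:-1]
    # Neighbour offsets, clockwise from the top-left pixel (assumed ordering).
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(center.shape, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        # Set this bit where the neighbour is at least as bright as the centre.
        codes += (neighbour >= center).astype(np.uint8) * np.uint8(1 << bit)
    return codes

def lbp_histogram(img):
    """Normalized 256-bin histogram of LBP codes, usable as a texture feature."""
    hist = np.bincount(lbp_codes(img).ravel(), minlength=256).astype(float)
    return hist / hist.sum()

flat = np.full((5, 5), 7)       # constant patch: every neighbour ties the centre
peak = np.zeros((3, 3))
peak[1, 1] = 1                  # isolated bright pixel: no neighbour reaches it
flat_codes = lbp_codes(flat)
peak_codes = lbp_codes(peak)
```

A constant patch sets all eight bits and an isolated bright pixel sets none, which gives a quick sanity check on the bit convention.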
Recent technological progress in the acquisition, modeling and processing of
3D data has led to the proliferation of a large number of 3D object databases.
Consequently, techniques for content-based 3D retrieval have become
necessary. In this paper, we introduce a new method for 3D object recognition
and retrieval that uses a set of binary images called characteristic level
images (CLI). We propose a 3D indexing and search approach based on the
similarity between characteristic level images, using Hu moments for indexing.
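Hu moments are attractive for indexing binary images because they do not change under translation or rotation. The sketch below computes only the first two Hu invariants from normalized central moments and checks them on a shifted and a rotated copy of a hypothetical binary shape. How the paper applies them to characteristic level images is not reproduced here.

```python
import numpy as np

def hu_first_two(img):
    """First two Hu moment invariants of a 2-D (binary or grey) image."""
    img = np.asarray(img, dtype=float)
    ys, xs = np.mgrid[:img.shape[0], :img.shape[1]]
    m00 = img.sum()
    xc = (xs * img).sum() / m00
    yc = (ys * img).sum() / m00

    def eta(p, q):
        # Normalized central moment of order (p, q).
        mu = (((xs - xc) ** p) * ((ys - yc) ** q) * img).sum()
        return mu / m00 ** (1 + (p + q) / 2)

    phi1 = eta(2, 0) + eta(0, 2)
    phi2 = (eta(2, 0) - eta(0, 2)) ** 2 + 4 * eta(1, 1) ** 2
    return phi1, phi2

# Hypothetical L-shaped binary image, plus translated and rotated copies.
shape = np.zeros((20, 20))
shape[3:9, 4:6] = 1
shape[7:9, 4:10] = 1
shifted = np.roll(shape, (5, 6), axis=(0, 1))   # no wrap-around for this shape
rotated = np.rot90(shape)

reference = hu_first_two(shape)
```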
By extending the Liouville-Caputo definition of a fractional derivative to a
nonlocal covariant generalization of arbitrary bound operators acting on
multidimensional Riemannian spaces, an appropriate approach for the 3D shape
recovery of aperture afflicted 2D slide sequences is proposed. We demonstrate
that the step from a local to a nonlocal algorithm yields an order of magnitude
gain in accuracy and that using the specific fractional approach yields an
additional factor of 2 in accuracy of the derived results.
This report concerns the use of techniques for sparse signal representation
and sparse error correction for automatic face recognition. Much of the recent
interest in these techniques comes from the paper "Robust Face Recognition via
Sparse Representation" by Wright et al. (2009), which showed how, under certain
technical conditions, one could cast the face recognition problem as one of
seeking a sparse representation of a given input face image in terms of a
"dictionary" of training images and images of individual pixels.
Spectral unmixing is an important tool in hyperspectral data analysis for
estimating endmembers and abundance fractions in a mixed pixel. This paper
examines the applicability of a recently developed algorithm called graph
regularized nonnegative matrix factorization (GNMF) for this aim. The proposed
approach exploits the intrinsic geometrical structure of the data in addition
to the positivity and full additivity constraints. Simulated data based on
measured spectral signatures are used to evaluate the proposed
algorithm.
Recognition systems are commonly designed to authenticate users at the access
control levels of a system. A number of voice recognition methods have been
developed using a pitch estimation process, which is very vulnerable in low
signal-to-noise ratio (SNR) environments; thus, these programs fail to provide
the desired level of accuracy and robustness. Also, most text independent
speaker recognition programs are incapable of coping with unauthorized attempts
to gain access by tampering with the samples or reference database.
Models including two $L^1$-norm terms have been widely used in image
restoration. In this paper we first propose the alternating direction method of
multipliers (ADMM) to solve this class of models. Based on ADMM, we then
propose the proximal point method (PPM), which is more efficient than ADMM.
Following the operator theory, we also give the convergence analysis of the
proposed methods. Furthermore, we use the proposed methods to solve a class of
hybrid models combining the ROF model with the LLT model.
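The ADMM iteration pattern can be sketched on a deliberately small toy problem. The example below is an assumption for illustration: it has a single $L^1$ term and a quadratic data term, not the two-$L^1$-term restoration models or the ROF/LLT hybrid the abstract addresses. It solves min 0.5*||x - b||^2 + lam*||z||_1 subject to x = z, whose exact solution is the soft-thresholded vector.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t*||.||_1 (element-wise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def admm_l1_denoise(b, lam, rho=1.0, iters=200):
    """Scaled-form ADMM for min 0.5*||x-b||^2 + lam*||z||_1  s.t.  x = z.

    Toy problem chosen for illustration; rho and iters are assumed defaults.
    """
    z = np.zeros_like(b)
    u = np.zeros_like(b)                         # scaled dual variable
    for _ in range(iters):
        x = (b + rho * (z - u)) / (1.0 + rho)    # quadratic subproblem
        z = soft_threshold(x + u, lam / rho)     # L1 proximal step
        u = u + x - z                            # dual update
    return z

b = np.array([3.0, -0.5, 1.2, -4.0])
x_hat = admm_l1_denoise(b, lam=1.0)
closed_form = soft_threshold(b, 1.0)
```

For this separable problem the iterates should approach the closed-form shrinkage result, which makes the splitting easy to check before moving to models where no closed form exists.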
We revisit the additive model learning literature and adapt a penalized
spline formulation due to Eilers and Marx to train additive classifiers
efficiently. We also propose two new embeddings based on two classes of
orthogonal bases with orthogonal derivatives, which can also be used to
efficiently learn additive classifiers. This paper follows the popular theme in
the current literature where kernel SVMs are learned much more efficiently
using an approximate embedding and a linear machine.
Construction of a scale space with a convolution filter has been studied
extensively in the past. It has been proven that the only convolution kernel
that satisfies the scale space requirements is a Gaussian type. In this paper,
we consider a matrix of convolution filters introduced in [1] as a building
kernel for a scale space, and show that we can construct a non-Gaussian scale
space with a $2\times 2$ matrix of filters. The paper derives sufficient
conditions for the matrix of filters to be a scale space kernel, and
presents some numerical demonstrations.
A fundamental operation in many vision tasks, including motion understanding,
stereopsis, visual odometry, or invariant recognition, is establishing
correspondences between images or between images and data from other
modalities. We present an analysis of the role that multiplicative interactions
play in learning such correspondences, and we show how learning and inferring
relationships between images can be viewed as detecting rotations in the
eigenspaces shared among a set of orthogonal matrices.
In this paper, we address the problem of discriminative dictionary learning
(DDL), where sparse linear representation and classification are combined in a
probabilistic framework. As such, a single discriminative dictionary and linear
binary classifiers are learned jointly. By encoding sparse representation and
discriminative classification models in a MAP setting, we propose a general
optimization framework that allows for a data-driven tradeoff between faithful
representation and accurate classification.
Most image labeling problems such as segmentation and image reconstruction
are fundamentally ill-posed and suffer from ambiguities and noise. Higher order
image priors encode high level structural dependencies between pixels and are
key to overcoming these problems. However, these priors in general lead to
computationally intractable models. This paper addresses the problem of
discovering compact representations of higher order priors which allow
efficient inference.
Scene understanding remains a significant challenge in the computer vision
community. The visual psychophysics literature has demonstrated the importance
of interdependence among parts of the scene. Yet, the majority of methods in
computer vision remain local. Pictorial structures have arisen as a fundamental
parts-based model for some vision problems, such as articulated object
detection. However, the form of classical pictorial structures limits their
applicability for global problems, such as semantic pixel labeling.
Object parsing and segmentation from point clouds are challenging tasks
because the relevant data is available only as thin structures along object
boundaries or other object features and is corrupted by large amounts of noise.
One way to handle this kind of data is by employing shape models that can
accurately follow the object boundaries.
In this paper, we propose a novel lower dimensional representation of a shape
sequence. The proposed dimension reduction is invertible and computationally
more efficient in comparison to other related works. Theoretically,
differential geometry tools such as moving frames and parallel transport
are successfully adapted to the dimension reduction problem of high
dimensional curves.
We introduce a novel tracking technique which uses dynamic confidence-based
fusion of two different information sources for robust and efficient tracking
of visual objects. Mean-shift tracking is a popular and well known method used
in object tracking problems. Originally, the algorithm uses a similarity
measure which is optimized by shifting a search area to the center of a
generated weight image to track objects. Recent improvements on the original
mean-shift algorithm involve using a classifier that differentiates the object
from its surroundings.
It was recently demonstrated in [4][arxiv:1105.4204] that the non-linear
bilateral filter \cite{Tomasi} can be efficiently implemented using an O(1) or
constant-time algorithm. At the heart of this algorithm was the idea of
approximating the Gaussian range kernel of the bilateral filter using
trigonometric functions. In this letter, we explain how the idea in [4] can be
extended to a few other linear and non-linear filters [18,21,2]. While some of
these filters have received a lot of attention in recent years, they are known
to be computationally intensive.
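One trigonometric family that can approximate a Gaussian range kernel is the raised cosine, since cos(s / (sigma * sqrt(N)))^N tends to exp(-s^2 / (2 sigma^2)) as N grows. The sketch below checks this limit numerically. Whether this is the exact form used in [4] is an assumption, and `sigma`, the sample grid and the orders are illustrative values.

```python
import numpy as np

def raised_cosine_kernel(s, sigma, order):
    """Raised-cosine approximation to exp(-s^2 / (2 sigma^2)).

    Valid while |s| / (sigma * sqrt(order)) <= pi / 2, so the cosine is >= 0.
    """
    return np.cos(s / (sigma * np.sqrt(order))) ** order

sigma = 30.0
s = np.linspace(-3 * sigma, 3 * sigma, 601)   # intensity differences
gauss = np.exp(-s ** 2 / (2 * sigma ** 2))

err_low = np.max(np.abs(gauss - raised_cosine_kernel(s, sigma, order=50)))
err_high = np.max(np.abs(gauss - raised_cosine_kernel(s, sigma, order=200)))
```

A power of a cosine expands into a finite sum of cosines, and each cosine of an intensity difference separates into products of functions of the two intensities. This is what lets a range-weighted average be assembled from ordinary spatial filters.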
We study linear models under heavy-tailed priors from a probabilistic
viewpoint. Instead of computing a single sparse most probable (MAP) solution as
in standard compressed sensing, the focus in the Bayesian framework shifts
towards capturing the full posterior distribution on the latent variables,
which allows quantifying the estimation uncertainty and learning model
parameters using maximum likelihood. The exact posterior distribution under the
sparse linear model is intractable and we concentrate on a number of
alternative variational Bayesian techniques to approximate it.
The IHS sharpening technique is one of the most commonly used techniques for
sharpening. Different transformations have been developed to transfer a color
image from the RGB space to the IHS space. The literature shows that various
researchers have proposed alternative IHS transformations; many papers report
good results whereas others report poor ones, and it is often not stated which
formula of the IHS transformation was used. In addition, many papers give
different formulas for the transformation matrix under the name IHS
transformation.
A new approach in iris recognition based on Circular Fuzzy Iris Segmentation
(CFIS) and Gabor Analytic Iris Texture Binary Encoder (GAITBE) is proposed and
tested here. CFIS procedure is designed to guarantee that similar iris segments
will be obtained for similar eye images, despite the fact that the degree of
occlusion may vary from one image to another. Its result is a circular iris
ring (concentric with the pupil) which approximates the actual iris. Using the
Hilbert Transform, GAITBE provides a better encoding of the statistical
independence between the iris codes extracted from different irides.
This paper shows that the k-means quantization of a signal can be interpreted
both as a crisp indicator function and as a fuzzy membership assignment
describing fuzzy clusters and fuzzy boundaries. Combined crisp and fuzzy
indicator functions are defined here as natural generalizations of the ordinary
crisp and fuzzy indicator functions, respectively. An application to iris
segmentation is presented together with a demo program.
We analyze and improve low rank representation (LRR), the state-of-the-art
algorithm for subspace segmentation of data. We prove that for the noiseless
case, the optimization model of LRR has a unique solution, which is the shape
interaction matrix (SIM) of the data matrix. So in essence LRR is equivalent to
factorization methods. We also prove that the minimum value of the optimization
model of LRR is equal to the rank of the data matrix. For the noisy case, we
show that LRR can be approximated as a factorization method that combines noise
removal by column sparse robust PCA.
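The noiseless claims can be checked numerically. This is a sketch under the assumption that the noiseless LRR model is min ||Z||_* subject to X = XZ, with the data matrix itself as dictionary. Given the skinny SVD X = U S V^T, the shape interaction matrix is V V^T. The example verifies that it is feasible and that its nuclear norm equals rank(X); the helper name and the rank tolerance are hypothetical.

```python
import numpy as np

def shape_interaction_matrix(X, tol=1e-10):
    """Return V V^T from the skinny SVD of X, plus the numerical rank."""
    _, svals, vt = np.linalg.svd(X, full_matrices=False)
    r = int(np.sum(svals > tol * svals[0]))   # numerical rank (assumed tolerance)
    v = vt[:r].T
    return v @ v.T, r

# Hypothetical rank-2 data matrix: 8 samples (columns) in a 5-D ambient space.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2)) @ rng.normal(size=(2, 8))

Z, r = shape_interaction_matrix(X)
nuclear_norm = np.linalg.svd(Z, compute_uv=False).sum()
```

Since V V^T is an orthogonal projector of rank r, its singular values are r ones followed by zeros, which is why the nuclear norm matches the rank.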
This paper presents multi-font/multi-size Kannada numerals and vowels
recognition based on spatial features. Directional spatial features, viz.
stroke density, stroke length and the number of strokes in an image, are
employed as potential features to characterize the printed Kannada numerals and
vowels. Based on these features, 1100 numerals and 1400 vowels are classified
with multi-class Support Vector Machines (SVM). The proposed system achieves
recognition accuracies of 98.45% and 90.64% for numerals and vowels,
respectively.
We propose a traffic congestion estimation system based on an unsupervised
on-line learning algorithm. The system does not rely on background extraction
or motion detection. It extracts local features inside detection regions of
variable size which are drawn on lanes in advance. The extracted features are
then clustered into two classes using K-means and Gaussian Mixture Models (GMM).
A Bayes classifier is used to detect vehicles according to the previous cluster
information, which is kept up to date by an on-line EM algorithm whenever the
system is running.
Using a toy vehicle as a moving object, an automatic road lighting system
(ARLS) model is constructed. A video camera running at 25 fps is used to capture
the motion of the toy vehicle as it moves in the test segment of the road.
Captured images are then processed to calculate the speed of the toy vehicle.
The speed, together with the position of the toy vehicle, is then used to switch
the lighting system on and off along the path travelled by the toy vehicle.
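The speed calculation reduces to simple arithmetic once the vehicle position is known in each frame. The sketch below assumes per-frame centroid positions in pixels and a hypothetical calibration constant `metres_per_pixel`; only the 25 fps frame rate comes from the text. How the centroids are extracted from the images is not covered.

```python
import math

def speed_from_track(positions_px, fps=25.0, metres_per_pixel=0.002):
    """Average speed (m/s) from per-frame centroid positions.

    positions_px: list of (x, y) pixel coordinates, one per consecutive frame.
    metres_per_pixel is a hypothetical calibration value, not from the source.
    """
    path_px = sum(
        math.hypot(x1 - x0, y1 - y0)
        for (x0, y0), (x1, y1) in zip(positions_px, positions_px[1:])
    )
    elapsed_s = (len(positions_px) - 1) / fps
    return path_px * metres_per_pixel / elapsed_s

# Hypothetical track: the centroid moves 10 pixels to the right in each frame.
track = [(10 * i, 0) for i in range(5)]
speed = speed_from_track(track)
```

With these assumed values the vehicle covers 40 pixels (0.08 m) in 4 frame intervals (0.16 s), that is, 0.5 m/s.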
An algorithm for pose and motion estimation using corresponding features in
images and a digital terrain map is proposed. Using a Digital Terrain (or
Digital Elevation) Map (DTM/DEM) as a global reference enables recovering the
absolute position and orientation of the camera. In order to do this, the DTM
is used to formulate a constraint between corresponding features in two
consecutive frames. The utilization of data is shown to improve the robustness
and accuracy of the inertial navigation algorithm.
In this paper, we present a technique by which high-intensity feature vectors
extracted from the Gabor wavelet transformation of frontal face images are
combined with Independent Component Analysis (ICA) for enhanced face
recognition. Firstly, the high-intensity feature vectors are automatically
extracted using the local characteristics of each individual face from the
Gabor transformed images. Then ICA is applied on these locally extracted
high-intensity feature vectors of the facial images to obtain the independent
high intensity feature (IHIF) vectors.
This paper demonstrates two different fusion techniques at two different
levels of a human face recognition process. The first one is called data fusion
at lower level and the second one is the decision fusion towards the end of the
recognition process. At first, data fusion is applied to visual and
corresponding thermal images to generate a fused image. Data fusion is
implemented in the wavelet domain after decomposing the images with the
Daubechies wavelet (db2). During data fusion, the maxima of the approximation
coefficients and of the three sets of detail coefficients are merged together.
This paper presents a comparative study of two different methods, which are
based on fusion and polar transformation of visual and thermal images. Here,
investigation is done to handle the challenges of face recognition, which
include pose variations, changes in facial expression, partial occlusions,
variations in illumination, rotation through different angles, change in scale
etc. To overcome these obstacles we have implemented and thoroughly examined
two different fusion techniques through rigorous experimentation.
This paper introduces a new family of iris encoders which use 2-dimensional
Haar Wavelet Transform for noise attenuation, and Hilbert Transform to encode
the iris texture. In order to prove the usefulness of the newly proposed iris
encoding approach, the recognition results obtained by using these new encoders
are compared to those obtained using the classical Log-Gabor iris encoder.
Twelve tests involving single/multienrollment and conducted on the Bath Iris
Image Database are presented here.
The cohomology and cohomology ring of three-dimensional (3D) objects are
topological invariants that characterize holes and their relations. The
cohomology ring has traditionally been computed on simplicial complexes.
Nevertheless, cubical complexes deal directly with the voxels in 3D images, so
no additional triangulation is necessary, which facilitates efficient
algorithms for the computation of topological invariants in the image context.
In this paper, we
present formulas to directly compute the cohomology ring of 3D cubical
complexes without making use of any additional triangulation.
It is well-known that spatial averaging can be realized (in space or
frequency domain) using algorithms whose complexity does not depend on the size
or shape of the filter. These fast algorithms are generally referred to as
constant-time or O(1) algorithms in the image processing literature. Along with
the spatial filter, the edge-preserving bilateral filter [bilateralFilter]
involves an additional range kernel. This is used to restrict the averaging to
those neighborhood pixels whose intensities are similar or close to that of the
pixel of interest.
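The constant-time property of spatial averaging can be illustrated in one dimension with prefix sums: after a single cumulative sum, each window average costs two lookups regardless of the window radius. This is a minimal sketch, with windows truncated at the borders as an assumed boundary rule. It covers only the spatial part and not the range kernel of the bilateral filter.

```python
import numpy as np

def box_filter_1d(signal, radius):
    """Moving average whose per-sample cost does not depend on the radius."""
    signal = np.asarray(signal, dtype=float)
    n = signal.size
    prefix = np.concatenate(([0.0], np.cumsum(signal)))   # prefix[i] = sum of first i samples
    idx = np.arange(n)
    lo = np.clip(idx - radius, 0, n)          # window start (inclusive)
    hi = np.clip(idx + radius + 1, 0, n)      # window end (exclusive)
    return (prefix[hi] - prefix[lo]) / (hi - lo)

signal = np.arange(10.0) ** 2
fast = box_filter_1d(signal, radius=2)
# Direct computation for comparison; its cost grows with the radius.
slow = np.array([signal[max(0, i - 2):i + 3].mean() for i in range(signal.size)])
```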
Identity verification is an increasingly important process in our daily
lives, and biometric recognition is a natural solution to the authentication
problem.
One of the most important research directions in the field of biometrics is
the characterization of novel biometric traits that can be used in conjunction
with other traits, to limit their shortcomings or to enhance their performance.
Structural pattern recognition describes and classifies data based on the
relationships of features and parts. Topological invariants, like the Euler
number, characterize the structure of objects of any dimension. Cohomology can
provide more refined algebraic invariants to a topological space than does
homology. It assigns 'quantities' to the chains used in homology to
characterize holes of any dimension. Graph pyramids can be used to describe
subdivisions of the same object at multiple levels of detail.
In this paper we investigate a technique to extract vocal source based
features from the LP residual of the speech signal for automatic speaker
identification. Autocorrelation with some specific lag is computed for the
residual signal to derive these features. Compared to traditional features like
MFCC, PLPCC which represent vocal tract information, these features represent
complementary vocal cord information. Our experiment in fusing these two
sources of information in representing speaker characteristics yields better
speaker identification accuracy.
Statistical dependencies among wavelet coefficients are commonly represented
by graphical models such as hidden Markov trees (HMTs). However, in linear
inverse problems such as deconvolution, tomography, and compressed sensing, the
presence of a sensing or observation matrix produces a linear mixing of the
simple Markovian dependency structure. This leads to reconstruction problems
that are non-convex optimizations. Past work has dealt with this issue by
resorting to greedy or suboptimal iterative reconstruction methods.
The main goal of the GEOMIR2K9 project is to create a software program that
is able to find similar scenic images clustered by geographical location and
sorted by similarity based only on their visual content. The user should be
able to input a query image; based on this query image, the program should
find relevant visual content and present it to the user in a meaningful way.
Technically the goal for the GEOMIR2K9 project is twofold.
Template matching is one of the most prevalent pattern recognition methods
worldwide. It has found uses in most visual concept detection fields. In this
work, we investigate methods for improving template matching by adjusting the
weights of different regions of the template. We compare several weight maps
and test the methods using the FERET face test set in the context of human eye
detection.
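Region weighting in template matching can be sketched as a weighted sum of squared differences, where a weight map decides how much each template pixel contributes to the match score. The Gaussian-shaped weight map and the random test image below are assumptions for illustration; the weight maps compared in the paper and the FERET data are not reproduced.

```python
import numpy as np

def weighted_ssd_match(image, template, weights):
    """Return the top-left position with the lowest weighted SSD, and its score."""
    image = np.asarray(image, dtype=float)
    template = np.asarray(template, dtype=float)
    th, tw = template.shape
    best_score, best_pos = np.inf, None
    for y in range(image.shape[0] - th + 1):
        for x in range(image.shape[1] - tw + 1):
            diff = image[y:y + th, x:x + tw] - template
            score = np.sum(weights * diff ** 2)   # per-pixel weighted error
            if score < best_score:
                best_score, best_pos = score, (y, x)
    return best_pos, best_score

rng = np.random.default_rng(1)
image = rng.random((20, 20))
template = image[6:11, 9:14].copy()     # patch cut from a known location

# Hypothetical weight map that emphasises the centre of the template.
yy, xx = np.mgrid[-2:3, -2:3]
weights = np.exp(-(yy ** 2 + xx ** 2) / 4.0)

position, score = weighted_ssd_match(image, template, weights)
```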
Design of a fuzzy rule based classifier is proposed. The performance of the
classifier for multispectral satellite image classification is improved using
the Dempster-Shafer theory of evidence, which exploits information from the
neighboring pixels. The classifiers are tested rigorously with two known images
and their performance is found to be better than the results available in the
literature. We also demonstrate the improvement of performance while using D-S
theory along with fuzzy rule based classifiers over the basic fuzzy rule based
classifiers for all the test cases.
A new method is proposed to obtain the geometric information of image features.
Using a Gaussian as the input signal, a theoretically optimal solution for
calculating a feature's affine shape is proposed. Being based on the analytic
result of a feature model, the method differs from conventional iterative
approaches. From the model, feature parameters such as position, orientation,
background luminance, contrast, area and aspect ratio can be extracted. Tested
with synthesized and benchmark data, the method matches or outperforms existing
approaches in terms of accuracy, speed and stability.
The fundamental matrix and trifocal tensor are convenient algebraic
representations of the epipolar geometry of two and three view configurations,
respectively. The estimation of these entities is central to most
reconstruction algorithms, and a solid understanding of their properties and
constraints is therefore very important. The fundamental matrix has 1 internal
constraint which is well understood, whereas the trifocal tensor has 8
independent algebraic constraints.
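The single internal constraint on the fundamental matrix is that it is singular: det F = 0, that is, rank 2. A common way to impose it on an estimate is to zero the smallest singular value, as sketched below on a random 3x3 matrix standing in for a hypothetical noisy estimate. The trifocal tensor constraints are not treated here.

```python
import numpy as np

def enforce_rank_two(F):
    """Closest rank-2 matrix to F in Frobenius norm (zero the smallest singular value)."""
    u, s, vt = np.linalg.svd(F)
    s[2] = 0.0
    return u @ np.diag(s) @ vt

# Hypothetical noisy estimate: a generic 3x3 matrix, which has full rank.
rng = np.random.default_rng(2)
F_noisy = rng.normal(size=(3, 3))
F_fixed = enforce_rank_two(F_noisy)
```

After this projection every epipolar line passes through a common epipole, which a full-rank estimate cannot guarantee.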
Discontinuity preserving smoothing is a fundamentally important procedure
that is useful in a wide variety of image processing contexts. It is directly
useful for noise reduction, and frequently used as an intermediate step in
higher level algorithms. For example, it can be particularly useful in edge
detection and segmentation. Three well known algorithms for discontinuity
preserving smoothing are nonlinear anisotropic diffusion, bilateral filtering,
and mean shift filtering.
Dual energy computerized tomography has gained great interest because of its
ability to characterize the chemical composition of a material rather than
simply providing relative attenuation images as in conventional tomography.
The most common primary brain tumors are gliomas, evolving from the cerebral
supportive cells. For clinical follow-up, the evaluation of the preoperative
tumor volume is essential. Volumetric assessment of tumor volume with manual
segmentation of its outlines is a time-consuming process that can be overcome
with the help of computerized segmentation methods. In this contribution, two
methods for World Health Organization (WHO) grade IV glioma segmentation in the
human brain are compared using magnetic resonance imaging (MRI) patient data
from the clinical routine.
This short article presents a class of projection-based solution algorithms
to the problem considered in the pioneering work on compressed sensing -
perfect reconstruction of a phantom image from 22 radial lines in the frequency
domain. Under the framework of projection-based image reconstruction, we will
show experimentally that several old and new tools of nonlinear filtering
(including Perona-Malik diffusion, nonlinear diffusion, Translation-Invariant
thresholding and SA-DCT thresholding) all lead to perfect reconstruction of the
phantom image.
Diffusion Tensor Imaging (DTI) provides the possibility of estimating the
location and course of eloquent structures in the human brain. Knowledge about
this is of high importance for preoperative planning of neurosurgical
interventions and for intraoperative guidance by neuronavigation in order to
minimize postoperative neurological deficits. Therefore, the segmentation of
these structures as closed, three-dimensional objects is necessary.
A geometric model of sparse signal representations is introduced for classes
of signals. It is computed by optimizing co-occurrence groups with a maximum
likelihood estimate calculated with a Bernoulli mixture model. Applications to
face image compression and MNIST digit classification illustrate the
applicability of this model.
A novel framework of compressed sensing, namely statistical compressed
sensing (SCS), that aims at efficiently sampling a collection of signals that
follow a statistical distribution, and achieving accurate reconstruction on
average, is introduced.
Finding a match between partially available deformable shapes is a
challenging problem with numerous applications. The problem is usually
approached by computing local descriptors on a pair of shapes and then
establishing a point-wise correspondence between the two. In this paper, we
introduce an alternative correspondence-less approach to matching fragments to
an entire shape undergoing a non-rigid deformation. We use diffusion geometric
descriptors and optimize over the integration domains on which the integral
descriptors of the two parts match.
In this paper, we explore the use of the diffusion geometry framework for the
fusion of geometric and photometric information in local and global shape
descriptors. Our construction is based on the definition of a diffusion process
on the shape manifold embedded into a high-dimensional space where the
embedding coordinates represent the photometric information. Experimental
results show that such data fusion is useful in coping with different
challenges of shape analysis where pure geometric and pure photometric methods
fail.
The efficient repair of cellular DNA is essential for the maintenance and
inheritance of genomic information. In order to cope with the high frequency of
spontaneous and induced DNA damage, a multitude of repair mechanisms have
evolved. These are enabled by a wide range of protein factors specifically
recognizing different types of lesions and finally restoring the normal DNA
sequence. This work focuses on the repair factor XPC (xeroderma pigmentosum
complementation group C), which identifies bulky DNA lesions and initiates
their removal via the nucleotide excision repair pathway.
The past decade has seen the growing popularity of Bag of Features (BoF)
approaches to many computer vision tasks, including image classification, video
search, robot localization, and texture recognition. Part of the appeal is
simplicity. BoF methods are based on orderless collections of quantized local
image descriptors; they discard spatial information and are therefore
conceptually and computationally simpler than many alternative methods.
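
The orderless representation described above can be sketched as follows. This
is a minimal illustration, not the method of any particular BoF paper: the
codebook, the descriptors and the use of squared Euclidean distance for
quantization are all assumptions made for the example.

```python
from collections import Counter

def quantize(descriptor, codebook):
    """Return the index of the nearest codeword (squared Euclidean distance)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda k: sq_dist(descriptor, codebook[k]))

def bof_histogram(descriptors, codebook):
    """Count codeword occurrences; descriptor order and position are discarded."""
    counts = Counter(quantize(d, codebook) for d in descriptors)
    return [counts.get(k, 0) for k in range(len(codebook))]

# Hypothetical 2-D codebook and local descriptors, for illustration only.
codebook = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
descriptors = [(0.1, 0.0), (0.9, 1.1), (0.2, 0.1), (0.1, 0.9)]
histogram = bof_histogram(descriptors, codebook)
```

Because only counts are kept, reordering the descriptors leaves the histogram
unchanged, which is the sense in which spatial information is discarded.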
Kernel-based machine learning algorithms are based on mapping data from the
original input feature space to a kernel feature space of higher dimensionality
to solve a linear problem in that space. Over the last decade, kernel based
classification and regression approaches such as support vector machines have
widely been used in remote sensing as well as in various civil engineering
applications.
English Character Recognition (CR) has been extensively studied in the last
half century and progressed to a level, sufficient to produce technology driven
applications. But the same is not the case for Indian languages, which are
complicated in terms of structure and computations. Rapidly growing
computational power may enable the implementation of Indic CR methodologies.
Digital document processing is gaining popularity for application to office and
library automation, bank and postal services, publishing houses and
communication technology.
Various applications of car plate recognition systems have been developed
using various kinds of methods and techniques by researchers all over the
world. Each application developed was suitable only for a specific country,
owing to the standard specification endorsed by that country's transport
department. The Road Transport Department of Malaysia has also
endorsed a specification for car plates that includes the font and size of
characters that must be followed by car owners. However, there are cases where
this specification is not followed.
This chapter presents a framework for detecting fake regions by using various
methods, including watermarking techniques and blind approaches. In particular,
we describe the current categories of blind approaches, which can be divided into
five: pixel-based techniques, format-based techniques, camera-based techniques,
physically-based techniques and geometric-based techniques. Then we take a
second look at the geometric-based techniques and further categorize them in
detail. In the following section, the state-of-the-art methods involved in the
geometric technique are elaborated.
We present a method for segmenting an arbitrary number of moving objects in
image sequences using the geometry of 6 points in 2D to infer motion
consistency. The method has been evaluated on the Hopkins 155 database and
surpasses current state-of-the-art methods such as SSC, both in terms of
overall performance on two and three motions but also in terms of maximum
errors. The method works by finding initial clusters in the spatial domain, and
then classifying each remaining point as belonging to the cluster that
minimizes a motion consistency score.
Rank-based analysis is a basic approach for many real world applications.
Recently, with the developments of compressive sensing, an interesting problem
was proposed to recover a low-rank matrix from sparse noise. In this paper, we
will address this problem and propose a low-rank matrix recovery algorithm
based on sparsity tracking. The core of the proposed Sparsity Tracking
Recovery (STR) is a heuristic kernel, which is introduced to penalize the noise
distribution. With the heuristic method, the sparse entries in the noise matrix
can be accurately tracked and discouraged to be zero.
We propose a method for learning sparse representations of depth (disparity)
maps, which is able to cope with noise and unreliable depth measurements. The
proposed algorithm relaxes the usual assumption of the stationary noise model
in sparse coding and enables learning from data corrupted with spatially
varying noise or uncertainty. Different noise statistics at each pixel location
are inferred from the data, and the learning rule is adapted with respect to
the noise level.
The goal of this paper is the development of a novel approach for the problem
of Noise Removal, based on the theory of Reproducing Kernel Hilbert Spaces
(RKHS). The problem is cast as an optimization task in an RKHS, by taking
advantage of the celebrated semiparametric Representer Theorem. Examples verify
that in the presence of Gaussian noise the proposed method performs relatively
well compared to wavelet-based techniques and outperforms them significantly in
the presence of impulse or mixed noise.
In this paper we propose a new wavelet transform applicable to functions
defined on graphs, high dimensional data and networks. The proposed method
generalizes the Haar-like transform proposed in \cite{gavish2010mwot}, and it
is similarly defined via a hierarchical tree, which is assumed to capture the
geometry and structure of the input data. It is applied to the data using a
multiscale filtering and decimation scheme, which can employ different wavelet
filters. We propose a tree construction method which results in efficient
representation of the input function in the transform domain.
In this paper a fuzzy clustering model for fuzzy data with outliers is
proposed. The model is based on Wasserstein distance between interval valued
data which is generalized to fuzzy data. In addition, Keller's approach is used
to identify outliers and reduce their influences. We have also defined a
transformation to change our distance to the Euclidean distance. With the help
of this approach, the problem of fuzzy clustering of fuzzy data is reduced to
fuzzy clustering of crisp data.
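
The reduction to Euclidean distance can be illustrated for the interval-valued
case. The sketch below assumes each interval is treated as a uniform
distribution, for which the squared 2-Wasserstein distance is the squared
difference of midpoints plus one third of the squared difference of radii. The
paper's exact definition, its generalization to fuzzy data and Keller's outlier
handling are not reproduced here, and the function names are hypothetical.

```python
import math

def wasserstein_sq(interval_a, interval_b):
    """Squared 2-Wasserstein distance between uniform distributions on two intervals."""
    (a1, b1), (a2, b2) = interval_a, interval_b
    m1, r1 = (a1 + b1) / 2, (b1 - a1) / 2   # midpoint and radius of the first interval
    m2, r2 = (a2 + b2) / 2, (b2 - a2) / 2   # midpoint and radius of the second interval
    return (m1 - m2) ** 2 + (r1 - r2) ** 2 / 3

def to_euclidean(interval):
    """Map [a, b] to a crisp 2-D point (midpoint, radius / sqrt(3))."""
    a, b = interval
    return ((a + b) / 2, (b - a) / (2 * math.sqrt(3)))

def euclidean_sq(p, q):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(p, q))

x, y = (0.0, 2.0), (3.0, 7.0)
d_w = wasserstein_sq(x, y)
d_e = euclidean_sq(to_euclidean(x), to_euclidean(y))
```

Under this assumption `d_w` and `d_e` agree, so any clustering routine written
for crisp points can be run on the transformed intervals.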
The Peirce quincuncial projection is a mapping of the surface of a sphere to
a square. It is a conformal mapping except for four points on the equator.
These points of non-conformality cause significant artifacts in photographic
applications. In this paper, we propose an algorithm and a user-interface to
mitigate these artifacts. We then promote the Peirce quincuncial projection as
a viable alternative to the stereographic projection in photographic
applications.
Classification is one of the most important tasks of machine learning.
Although the most well studied model is the two-class problem, in many
scenarios there is the opportunity to label critical items for manual revision,
instead of trying to automatically classify every item. In this paper we adapt
a paradigm initially proposed for the classification of ordinal data to address
the classification problem with reject option.
Contour tracking in adverse environments is a challenging problem due to
cluttered background, illumination variation, occlusion, and noise, among
others. This paper presents a robust contour tracking method by contributing to
some of the key issues involved, including (a) a region functional formulation
and its optimization; (b) design of a robust and effective feature; and (c)
development of an integrated tracking algorithm.
The problem of identifying the 3D pose of a known object from a given 2D
image has important applications in Computer Vision ranging from robotic vision
to image analysis. Our proposed method of registering a 3D model of a known
object on a given 2D photo of the object has numerous advantages over existing
methods: it requires no prior training or learning, no knowledge of
the camera parameters, and no explicit point correspondences or matching features
between image and model.
Calibration in a multi-camera network has been widely studied for
many years, starting from the early days of photogrammetry. Many authors
have presented several calibration algorithms with their relative advantages
and disadvantages. In a stereovision system, multiple view reconstruction is a
challenging task. However, the total computational procedure in detail has not
been presented before.
Color quantization is an important operation with numerous applications in
graphics and image processing. Most quantization methods are essentially based
on data clustering algorithms. However, despite its popularity as a general
purpose clustering algorithm, k-means has not received much respect in the
color quantization literature because of its high computational requirements
and sensitivity to initialization. In this paper, a fast color quantization
method based on k-means is presented.
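
For reference, conventional k-means applied to color quantization can be
sketched as below. This is plain Lloyd's iteration, not the fast method the
abstract refers to; the seeding rule (evenly spaced pixels), the iteration
count and the function name are assumptions made for illustration, and the
input is assumed to contain at least k pixels.

```python
def kmeans_quantize(pixels, k, iterations=10):
    """Plain Lloyd's k-means on RGB tuples; returns (palette, labels)."""
    # Assumption: seed the palette with k evenly spaced pixels from the input.
    step = max(1, len(pixels) // k)
    palette = [tuple(float(ch) for ch in pixels[i * step]) for i in range(k)]
    labels = [0] * len(pixels)
    for _ in range(iterations):
        # Assignment step: nearest palette color by squared Euclidean distance.
        for i, p in enumerate(pixels):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, palette[c])),
            )
        # Update step: each palette color becomes the mean of its assigned pixels.
        for c in range(k):
            members = [pixels[i] for i in range(len(pixels)) if labels[i] == c]
            if members:  # an empty cluster keeps its previous color
                palette[c] = tuple(sum(ch) / len(members) for ch in zip(*members))
    return palette, labels

# Four hypothetical pixels: two reddish, two bluish.
pixels = [(250, 0, 0), (255, 10, 5), (0, 0, 250), (5, 5, 255)]
palette, labels = kmeans_quantize(pixels, k=2)
```

Every pixel is compared with every palette color in every iteration, which is
the computational cost the abstract mentions, and the result depends on the
seeding.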
A new framework of compressive sensing (CS), namely statistical compressive
sensing (SCS), that aims at efficiently sampling a collection of signals that
follow a statistical distribution and achieving accurate reconstruction on
average, is introduced.
Adaptive sparse coding methods learn a possibly overcomplete set of basis
functions, such that natural image patches can be reconstructed by linearly
combining a small subset of these bases. The applicability of these methods to
visual object recognition tasks has been limited because of the prohibitive
cost of the optimization algorithms required to compute the sparse
representation. In this work we propose a simple and efficient algorithm to
learn basis functions.
In this paper we present a simple and fast geometric method for modeling data
by a union of affine sets. The method begins by forming a collection of local
best fit affine subspaces. The correct sizes of the local neighborhoods are
determined automatically by Jones' $\beta_2$ numbers; we prove under
certain geometric conditions that good local neighborhoods exist and are found
by our method. The collection is further processed by a greedy selection
procedure or a spectral method to generate the final model.
An inverse iterative algorithm for microwave imaging based on a moment method
solution is presented here. The iterative scheme has been developed using a
constrained optimization technique and is certain to converge. Different mesh
sizes for the model have been used here to overcome the inverse crime. The
synthetic data at the receivers are contaminated with different percentages of
noise. The ill-posedness of the problem is addressed by the Levenberg-Marquardt
method. The algorithm is applied to synthetic data and the reconstructed image
is then further enhanced through an image enhancement technique.
We present an exact method of greatly speeding up belief propagation (BP) for
a wide variety of potential functions in pairwise MRFs and other graphical
models. Specifically, our technique applies whenever the pairwise potentials
have been {\em truncated} to a constant value for most pairs of states, as is
commonly done in MRF models with robust potentials (such as stereo) that impose
an upper bound on the penalty assigned to discontinuities; for each of the $M$
possible states in one node, only a smaller number $m$ of compatible states in
a neighboring node are assigned milder penalties.
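
Why truncation helps can be seen in a small sketch. It assumes the min-sum
form of BP and that every non-truncated penalty is at most the truncation
constant; it illustrates the general idea rather than the authors' algorithm,
and all names in it are hypothetical.

```python
def truncated_message(h, compatible, truncation):
    """Min-sum message to a neighboring node under a truncated pairwise potential.

    h[i]          -- cost of state i at the sending node (unary plus incoming messages)
    compatible[j] -- dict {i: penalty} of sender states with penalty <= truncation
                     for receiver state j
    truncation    -- constant penalty applied to every other pair of states
    """
    base = min(h) + truncation        # one O(M) pass covers all truncated pairs
    message = []
    for pairs in compatible:          # O(m) work per receiving state
        best = base
        for i, penalty in pairs.items():
            best = min(best, h[i] + penalty)
        message.append(best)
    return message

def brute_force_message(h, compatible, truncation):
    """Reference O(M^2) computation, used only to check the fast version."""
    return [
        min(h[i] + pairs.get(i, truncation) for i in range(len(h)))
        for pairs in compatible
    ]

# Hypothetical truncated-linear potential min(|i - j|, 2) over M = 5 states.
h = [4.0, 1.0, 3.0, 0.5, 2.0]
T = 2.0
compatible = [
    {i: float(abs(i - j)) for i in range(5) if abs(i - j) < T} for j in range(5)
]
fast = truncated_message(h, compatible, T)
```

The fast version touches each receiving state's $m$ compatible states only, so
a message costs on the order of $Mm$ operations instead of $M^2$.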
There is an abundant literature on face detection due to its important role
in many vision applications. Since Viola and Jones proposed the first real-time
AdaBoost based face detector, Haar-like features have been adopted as the
method of choice for frontal face detection. In this work, we show that simple
features other than Haar-like features can also be applied for training an
effective face detector.
This paper introduces a novel method for human face detection with its
orientation by using wavelets, principal component analysis (PCA) and radial
basis networks. The input image is analyzed by a two-dimensional wavelet and a
two-dimensional stationary wavelet. The common goals are image
clearance and simplification, which are parts of de-noising or compression. We
applied an effective procedure to reduce the dimension of the input vectors
using PCA.
This paper proposes a framework for modeling instantaneous changes in natural
scenes in real time using a Lagrangian Particle Framework and a fluid-particle
grid approach. This research can be divided into 3 distinct sections: the first
one discusses a multi-camera rig that can measure ego-motion accurately up to
88%, how this device becomes the backbone of our framework, and some
improvements devised to optimize a known framework for depth maps and 3D
structure estimation from a single still image called make3d.
Many algorithms for approximate nearest neighbor search in high-dimensional
spaces partition the data into clusters. At query time, in order to avoid
exhaustive search, an index selects the few (or a single) clusters nearest to
the query point. Clusters are often produced by the well-known $k$-means
approach since it has several desirable properties. On the downside, it tends
to produce clusters having quite different cardinalities. Imbalanced clusters
negatively impact both the variance and the expectation of query response
times.
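
A short worked example shows the effect of imbalance on the expected query
cost. It assumes that a single cluster is scanned per query, that a query
selects cluster i with probability proportional to its size, and that scanning
costs one distance computation per stored point; these modeling choices are
assumptions for illustration, not taken from the abstract.

```python
def expected_scan_cost(cluster_sizes):
    """Expected number of points scanned when one cluster is searched per query.

    Assumes cluster i is selected with probability n_i / N and costs n_i
    distance computations, giving sum(n_i^2) / N.
    """
    total = sum(cluster_sizes)
    return sum(n * n for n in cluster_sizes) / total

# Two hypothetical partitions of the same 1000 points into 4 clusters.
balanced = expected_scan_cost([250, 250, 250, 250])
imbalanced = expected_scan_cost([700, 100, 100, 100])
```

Under these assumptions the imbalanced partition roughly doubles the expected
cost compared with the balanced one, even though both index the same data.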
This paper deals with an improvement of vertex based nonlinear diffusion for
mesh denoising. This method directly filters the position of the vertices using
Laplace, reduced centered Gaussian and Rayleigh probability density functions
as diffusivities. The use of these PDFs, which are adapted to the underlying
mesh structure, improves the performance of a vertex-based diffusion
method. We also compare the proposed method to other mesh denoising methods
such as Laplacian flow, mean, median, min and the adaptive MMSE filtering. To
evaluate these methods of filtering, we use two error metrics.
Real-time object detection is one of the core problems in computer vision.
The cascade boosting framework proposed by Viola and Jones has become the
standard for this problem. In this framework, the learning goal for each node
is asymmetric, which is required to achieve a high detection rate and a
moderate false positive rate. We develop new boosting algorithms to address
this asymmetric learning problem. We show that our methods explicitly optimize
asymmetric loss objectives in a totally corrective fashion.
Image hashing is the process of associating a short vector of bits to an
image. The resulting summaries are useful in many applications including image
indexing, image authentication and pattern recognition. These hashes need to be
invariant under transformations of the image that result in similar visual
content, but should drastically differ for conceptually distinct contents. This
paper proposes an image hashing method that is invariant under rotation,
scaling and translation of the image.
This paper addresses the problem of infants' cry fundamental frequency
estimation. The fundamental frequency is estimated using a modified simple
inverse filtering tracking (SIFT) algorithm. The performance of the modified
SIFT is studied using a real database of infants' cry.
In this paper, Deterministic Cellular Automata (DCA) based video shot
classification and retrieval is proposed. The deterministic 2D Cellular
automata model captures the human facial expressions, both spontaneous and
posed. The determinism stems from the fact that the facial muscle actions are
standardized by the encodings of Facial Action Coding System (FACS) and Action
Units (AUs). Based on these encodings, we generate the set of evolutionary
update rules of the DCA for each facial expression.
Background: Dermoscopy is one of the major imaging modalities used in the
diagnosis of melanoma and other pigmented skin lesions. Due to the difficulty
and subjectivity of human interpretation, automated analysis of dermoscopy
images has become an important research area. Border detection is often the
first step in this analysis. Methods: In this article, we present an
approximate lesion localization method that serves as a preprocessing step for
detecting borders in dermoscopy images. In this method, first the black frame
around the image is removed using an iterative algorithm.
Human ovarian reserve is defined by the population of nongrowing follicles
(NGFs) in the ovary. Direct estimation of ovarian reserve involves the
identification of NGFs in prepared ovarian tissue. Previous studies involving
human tissue have used hematoxylin and eosin (HE) stain, with NGF populations
estimated by human examination either of tissue under a microscope, or of
images taken of this tissue. In this study we replaced HE with proliferating
cell nuclear antigen (PCNA), and automated the identification and enumeration
of NGFs that appear in the resulting microscopic images.
Cascade classifiers are widely used in real-time object detection. Different
from conventional classifiers that are designed for a low overall
classification error rate, a classifier in each node of the cascade is required
to achieve an extremely high detection rate and moderate false positive rate.
Although there are a few reported methods addressing this requirement in the
context of object detection, there is no principled feature selection method
that explicitly takes into account this asymmetric node learning objective. We
provide such an algorithm here.
The importance of manifolds and Riemannian geometry in mathematics is
spreading to applied fields in which the need to model non-linear structure has
spurred wide-spread interest in geometry. The transfer of interest has created
demand for methods for computing classical constructs of geometry on manifolds
occurring in practical applications. This paper develops initial value problems
for the computation of the differential of the exponential map and Jacobi
fields on parametrically and implicitly represented manifolds.
Physiological and behavioral traits are employed to develop biometric
authentication systems. The proposed work deals with the authentication of iris
and signature based on minimum variance criteria. The iris patterns are
preprocessed based on area of the connected components. The segmented image
used for authentication consists of the region with large variations in the
gray level values. The image region is split into quadtree components. The
components with minimum variance are determined from the training samples. Hu
moments are applied on the components.
In this paper we present an efficient computer aided mass classification
method in digitized mammograms using Artificial Neural Network (ANN), which
performs benign-malignant classification on region of interest (ROI) that
contains mass. One of the major mammographic characteristics for mass
classification is texture. ANN exploits this important factor to classify the
mass into benign or malignant. The statistical textural features used in
characterizing the masses are mean, standard deviation, entropy, skewness,
kurtosis and uniformity.
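
The six features listed above can be computed from the gray-level histogram of
an ROI. The sketch below uses common first-order definitions (standardized
third and fourth moments, base-2 entropy, sum of squared probabilities for
uniformity); the abstract does not state which definitions the authors use, so
these are assumptions, as is the function name.

```python
import math
from collections import Counter

def texture_features(gray_levels):
    """First-order statistical texture features from a flat list of ROI gray levels."""
    n = len(gray_levels)
    probs = {z: c / n for z, c in Counter(gray_levels).items()}  # normalized histogram
    mean = sum(z * p for z, p in probs.items())
    std = math.sqrt(sum((z - mean) ** 2 * p for z, p in probs.items()))
    # Standardized third and fourth central moments; set to 0 for a constant ROI.
    skew = sum((z - mean) ** 3 * p for z, p in probs.items()) / std ** 3 if std else 0.0
    kurt = sum((z - mean) ** 4 * p for z, p in probs.items()) / std ** 4 if std else 0.0
    entropy = -sum(p * math.log2(p) for p in probs.values())
    uniformity = sum(p * p for p in probs.values())
    return {
        "mean": mean, "std": std, "entropy": entropy,
        "skewness": skew, "kurtosis": kurt, "uniformity": uniformity,
    }

# Hypothetical 2x2 ROI flattened to a list of gray levels.
features = texture_features([10, 10, 20, 20])
```

The resulting dictionary could serve as the input vector of a classifier such
as the ANN mentioned above.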
In this paper orthogonal multifilters for astronomical image processing are
presented. We obtained new orthogonal multifilters based on the orthogonal
wavelet of Haar and Daubechies. Recently, multiwavelets have been introduced as
a more powerful multiscale analysis tool. It adds several degrees of freedom in
multifilter design and makes it possible to have several useful properties such
as symmetry, orthogonality, short support, and a higher number of vanishing
moments simultaneously. Multifilter decomposition of scanned photographic
plates with astronomical images is made.
VERSA provides a general-purpose framework for defining and recognizing
events in live or recorded surveillance video streams. The approach to event
recognition in VERSA is to use a declarative logic language to define the
spatial and temporal relationships that characterize a given event or activity.
Doing so requires the definition of certain fundamental spatial and temporal
relationships and a high-level syntax for specifying frame templates and query
parameters.
l1-minimization refers to finding the minimum l1-norm solution to an
underdetermined linear system b=Ax. It has recently received much attention,
mainly motivated by the new compressive sensing theory that shows that under
quite general conditions the minimum l1-norm solution is also the sparsest
solution to the system of linear equations. Although the underlying problem is
a linear program, conventional algorithms such as interior-point methods suffer
from poor scalability for large-scale real world problems.
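
For reference, the standard reformulation of l1-minimization as a linear
program is written out below. The auxiliary vectors $t$, $u$ and $v$ are
notation introduced here for illustration and do not appear in the abstract.

```latex
\min_{x} \|x\|_1 \ \text{ s.t. } \ Ax = b
\quad\Longleftrightarrow\quad
\min_{x,\,t} \ \mathbf{1}^{\top} t \ \text{ s.t. } \ -t \le x \le t,\ \ Ax = b
\quad\Longleftrightarrow\quad
\min_{u,\,v \ge 0} \ \mathbf{1}^{\top}(u + v) \ \text{ s.t. } \ A(u - v) = b,
\qquad x = u - v .
```

Both forms double the number of variables, which contributes to the cost of
general-purpose LP solvers on large problems.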
We propose a new approach for constructing a 3D representation from a 2D
wireframe drawing. A drawing is simply a parallel projection of a 3D object
onto a 2D surface; humans are able to recreate mental 3D models from 2D
representations very easily, yet the process is very difficult to emulate
computationally. We hypothesize that our ability to perform this construction
relies on the angles in the 2D scene, among other geometric properties. Being
able to reproduce this reconstruction process automatically would allow for
efficient and robust 3D sketch interfaces.
A lot of image registration techniques have been developed with great
significance for data analysis in medicine, astrophotography, satellite imaging
and a few other areas. This work proposes a method for medical image registration
using the Fast Walsh Hadamard transform. This algorithm registers images of the
same or different modalities. Each image bit is expanded in terms of Fast
Walsh Hadamard basis functions. Each basis function captures a
particular aspect of local structure, e.g., horizontal edge, corner, etc.
Nonlinear bilateral filters (BF) deliver a fine blend of computational
simplicity and blur-free denoising. However, little is known about their
nature, noise-suppressing properties, and optimal choices of filter parameters.
Our study is meant to fill this gap by explaining the underlying mechanism of
bilateral filtering and providing the methodology for optimal filter selection.
Practical application to CT image denoising is discussed to illustrate our
results.
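
The mechanism of bilateral filtering can be sketched in one dimension. The
example assumes Gaussian spatial and range kernels, which is the most common
choice but is not stated in the abstract; the parameter names `sigma_s` and
`sigma_r` and the test signal are hypothetical.

```python
import math

def bilateral_filter_1d(signal, radius, sigma_s, sigma_r):
    """Bilateral filter with Gaussian spatial and range kernels on a 1-D signal."""
    out = []
    for i, center in enumerate(signal):
        num = den = 0.0
        for j in range(max(0, i - radius), min(len(signal), i + radius + 1)):
            w_s = math.exp(-((i - j) ** 2) / (2 * sigma_s ** 2))               # closeness
            w_r = math.exp(-((signal[j] - center) ** 2) / (2 * sigma_r ** 2))  # similarity
            num += w_s * w_r * signal[j]
            den += w_s * w_r
        out.append(num / den)  # den > 0 because the j == i term has weight 1
    return out

# A noisy step edge: samples near 0 followed by samples near 10.
noisy_step = [0.0, 0.1, -0.1, 0.05, 10.0, 10.1, 9.9, 10.05]
smoothed = bilateral_filter_1d(noisy_step, radius=2, sigma_s=1.0, sigma_r=0.5)
```

Samples across the edge differ by far more than `sigma_r`, so their range
weight is negligible and the step is not blurred, while the small fluctuations
on either side are averaged.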
In this work we investigate a novel approach to handle the challenges of face
recognition, which include rotation, scale, occlusion, illumination etc. Here,
we have used thermal face images as those are able to minimize the effect of
illumination changes and occlusion due to moustache, beards, adornments etc.
The proposed approach registers the training and testing thermal face images in
polar coordinates, which can handle the complications introduced by scaling
and rotation. Line features are extracted from thermal polar images and feature
vectors are constructed using these lines.
In this paper we present a simple novel approach to tackle the challenges of
scaling and rotation of face images in face recognition. The proposed approach
registers the training and testing visual face images by log-polar
transformation, which is capable of handling the complications introduced by scaling
and rotation. Log-polar images are projected into eigenspace and finally
classified using an improved multi-layer perceptron. In the experiments we have
used ORL face database and Object Tracking and Classification Beyond Visible
Spectrum (OTCBVS) database for visual face images.
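
The reason a log-polar mapping copes with scaling and rotation can be checked
on a single point: scaling about the center adds a constant to the log-radius,
and rotation adds a constant to the angle. The sketch below is a coordinate
illustration only, with hypothetical function names; it does not resample an
image and is not the registration procedure of the paper.

```python
import math

def to_log_polar(x, y, cx=0.0, cy=0.0):
    """Map Cartesian (x, y) to (log-radius, angle) about the center (cx, cy).

    The center itself has zero radius and is therefore excluded.
    """
    dx, dy = x - cx, y - cy
    return math.log(math.hypot(dx, dy)), math.atan2(dy, dx)

def scale_rotate(x, y, scale, angle):
    """Scale and rotate a point about the origin."""
    c, s = math.cos(angle), math.sin(angle)
    return scale * (c * x - s * y), scale * (s * x + c * y)

rho0, theta0 = to_log_polar(3.0, 4.0)
rho1, theta1 = to_log_polar(*scale_rotate(3.0, 4.0, scale=2.0, angle=0.3))
```

Here `rho1 - rho0` equals `log(2)` and `theta1 - theta0` equals `0.3`, so both
transformations appear as translations in the log-polar plane.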
This paper presents a concept of image pixel fusion of visual and thermal
faces, which can significantly improve the overall performance of a face
recognition system. Several factors affect face recognition performance
including pose variations, facial expression changes, occlusions, and most
importantly illumination changes. So, image pixel fusion of thermal and visual
images is a solution to overcome the drawbacks present in the individual
thermal and visual face images. Fused images are projected into eigenspace and
finally classified using a multi-layer perceptron.
Here an efficient fusion technique for automatic face recognition has been
presented. Fusion of visual and thermal images has been done to take
advantage of thermal images as well as visual images. By employing fusion a
new image can be obtained, which provides the most detailed, reliable, and
discriminating information. In this method fused images are generated using
visual and thermal face images in the first step. In the second step, fused
images are projected into eigenspace and finally classified using a radial
basis function neural network.
In this paper we present a technique for fusion of optical and thermal face
images based on an image pixel fusion approach. Among the several factors that
affect face recognition performance in the case of visual images, illumination
changes are a significant factor that needs to be addressed. Thermal images are
better at handling illumination conditions but not very consistent in capturing
texture details of the faces.
Artificial neural networks have already shown their success in face
recognition and similar complex pattern recognition tasks. However, a major
disadvantage of the technique is that it is extremely slow during training for
larger classes and hence not suitable for real-time complex problems such as
pattern recognition. This is an attempt to develop a parallel framework for the
training algorithm of a perceptron. In this paper, two general architectures
for a Multilayer Perceptron (MLP) have been demonstrated.
In this paper we present a comparative study on fusion of visual and thermal
images using different wavelet transformations. Here, coefficients of discrete
wavelet transforms from both visual and thermal images are computed separately
and combined. Next, the inverse discrete wavelet transformation is taken in order
to obtain the fused face image. Both Haar and Daubechies (db2) wavelet transforms
have been used to compare recognition results. For the experiments, the IRIS
Thermal/Visual Face Database was used.
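
The decompose, combine and invert pipeline can be sketched with a single-level
1-D Haar transform. The abstract does not say how the coefficients are
combined, so the rule used here (average the approximation coefficients, keep
the detail coefficient of larger magnitude) is an assumption, as are the
function names and the toy signals standing in for image rows.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_forward(signal):
    """Single-level 1-D Haar transform of an even-length sequence."""
    approx = [(signal[i] + signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    return approx, detail

def haar_inverse(approx, detail):
    """Invert haar_forward, interleaving the reconstructed sample pairs."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / SQRT2, (a - d) / SQRT2])
    return out

def fuse(visual, thermal):
    """Fuse two equal-length signals in the Haar domain (assumed combination rule)."""
    av, dv = haar_forward(visual)
    at, dt = haar_forward(thermal)
    approx = [(x + y) / 2 for x, y in zip(av, at)]                   # average
    detail = [x if abs(x) >= abs(y) else y for x, y in zip(dv, dt)]  # max magnitude
    return haar_inverse(approx, detail)

# Hypothetical rows from a visual and a thermal image.
visual = [1.0, 3.0, 5.0, 5.0]
thermal = [2.0, 2.0, 8.0, 4.0]
fused = fuse(visual, thermal)
```

For 2-D images the same transform would be applied along rows and then columns,
and a Daubechies filter would replace the Haar pair.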
In this paper fusion of visual and thermal images in wavelet transformed
domain has been presented. Here, Daubechies wavelet transform (D2)
coefficients from visual images and corresponding coefficients computed in the same
manner from thermal images are combined to get fused coefficients. Fusion of
coefficients is done after decomposition up to the fifth level (Level 5).
Inverse Daubechies wavelet transform of those coefficients gives us fused face
images.
This paper investigates the multiresolution level-1 and level-2 Quotient
based Fusion of thermal and visual images. In the proposed system, method-1,
namely "Decompose then Quotient Fuse Level-1", and method-2, namely
"Decompose-Reconstruct then Quotient Fuse Level-2", both work on wavelet
transformations of the visual and thermal face images. The wavelet transform is
well suited to managing different image resolutions and allows image
decomposition into different kinds of coefficients, while preserving the image
information without any loss.