This paper studies parallelization schemes for stochastic Vector Quantization
algorithms in order to obtain time speed-ups using distributed resources. We
show that the most intuitive parallelization scheme does not lead to better
performances than the sequential algorithm. Another distributed scheme is
therefore introduced which obtains the expected speed-ups. Then, it is improved
to fit implementation on distributed architectures where communications are
slow and inter-machines synchronization too costly.
We present a novel clustering approach for moving object trajectories that
are constrained by an underlying road network. The approach builds a similarity
graph based on these trajectories then uses modularity-optimization hiearchical
graph clustering to regroup trajectories with similar profiles. Our
experimental study shows the superiority of the proposed approach over classic
hierarchical clustering and gives a brief insight to visualization of the
clustering results.
We introduce in this paper a new way of optimizing the natural extension of
the quantization error using in k-means clustering to dissimilarity data. The
proposed method is based on hierarchical clustering analysis combined with
multi-level heuristic refinement. The method is computationally efficient and
achieves better quantization errors than the
In some real world applications, such as spectrometry, functional models
achieve better predictive performances if they work on the derivatives of order
m of their inputs rather than on the original functions. As a consequence, the
use of derivatives is a common practice in Functional Data Analysis, despite a
lack of theoretical guarantees on the asymptotically achievable performances of
a derivative based model.
This paper proposes an organized generalization of Newman and Girvan's
modularity measure for graph clustering. Optimized via a deterministic
annealing scheme, this measure produces topologically ordered graph clusterings
that lead to faithful and readable graph representations based on clustering
induced graphs. Topographic graph clustering provides an alternative to more
classical solutions in which a standard graph clustering method is applied to
build a simpler graph that is then represented with a graph layout algorithm.
We propose in this paper an exploratory analysis algorithm for functional
data. The method partitions a set of functions into $K$ clusters and represents
each cluster by a simple prototype (e.g., piecewise constant). The total number
of segments in the prototypes, $P$, is chosen by the user and optimally
distributed among the clusters via two dynamic programming algorithms. The
practical relevance of the method is shown on two real world datasets.
Median clustering extends popular neural data analysis methods such as the
self-organizing map or neural gas to general data structures given by a
dissimilarity matrix only. This offers flexible and robust global data
inspection methods which are particularly suited for a variety of data as
occurs in biomedical domains. In this chapter, we give an overview about median
clustering and its properties and extensions, with a particular focus on
efficient implementations adapted to large scale data analysis.
The selection of features that are relevant for a prediction or
classification problem is an important problem in many domains involving
high-dimensional data. Selecting features helps fighting the curse of
dimensionality, improving the performances of prediction or classification
methods, and interpreting the application. In a nonlinear context, the mutual
information is widely used as relevance criterion for features and sets of
features.