This paper studies parallelization schemes for stochastic Vector Quantization
algorithms, with the aim of reducing execution time by using distributed
resources. We show that the most intuitive parallelization scheme does not
perform better than the sequential algorithm. We therefore introduce another
distributed scheme, which obtains the expected speed-ups. This scheme is then
adapted to distributed architectures where communications are slow and
inter-machine synchronization is too costly.

Motivated by the problem of efficiently executing clustering algorithms on
very large data sets, we consider a model for large-scale distributed
clustering methods. To this end, we briefly recall some standard results on
the quantization problem and on the almost sure convergence of the Competitive
Learning Vector Quantization (CLVQ) procedure. We also discuss a general model
for linear distributed asynchronous algorithms that is well suited to several
parallel computing architectures.
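
For readers unfamiliar with CLVQ, the following minimal Python sketch
illustrates the sequential procedure: each incoming sample moves its nearest
prototype a small step toward itself. The function name `clvq`, the `step`
parameter, the squared Euclidean distance, and the step-size sequence
1/(t + 2) are assumptions made for illustration only. The sketch is not the
distributed model discussed above.

```python
def clvq(samples, prototypes, step=lambda t: 1.0 / (t + 2)):
    """Run one pass of a CLVQ-style update over `samples`.

    `prototypes` is a list of initial prototype vectors (lists of floats).
    `step` maps the iteration index t to a step size in (0, 1); the default
    is a hypothetical choice, not one taken from the text.
    """
    w = [list(p) for p in prototypes]  # copy so the input is not mutated
    for t, z in enumerate(samples):
        # Competitive phase: index of the prototype closest to the sample.
        k = min(
            range(len(w)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(w[i], z)),
        )
        # Learning phase: move only the winning prototype toward the sample.
        eps = step(t)
        w[k] = [a - eps * (a - b) for a, b in zip(w[k], z)]
    return w


if __name__ == "__main__":
    initial = [[0.0], [10.0]]
    data = [[1.0], [9.0], [1.0]]
    print(clvq(data, initial))
```

Only the winning prototype is updated at each step, which is what makes the
procedure "competitive".
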

Motivated by a broad range of potential applications, we address the problem
of quantile prediction for real-valued time series. We present a sequential
quantile forecasting model based on the combination of a set of elementary
nearest neighbor-type predictors called "experts" and show its consistency
under minimal conditions. Our approach builds on the methodology developed in
recent years for the prediction of individual sequences and exploits the
characterization of the quantile as a minimizer of the so-called pinball loss
function.
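
To make the role of the pinball loss concrete, the sketch below, in Python,
evaluates the loss and checks on a small sample that the value minimizing the
average loss is an empirical quantile. The names `pinball_loss` and
`empirical_quantile` are hypothetical, and the brute-force search over sample
values is only an illustration; it is not the expert-based forecasting model
described above.

```python
def pinball_loss(y, q, tau):
    """Pinball (quantile) loss of predicting q when y is observed.

    Under-prediction (y >= q) is weighted by tau and over-prediction
    (y < q) by 1 - tau, with 0 < tau < 1.
    """
    u = y - q
    return tau * u if u >= 0 else (tau - 1.0) * u


def empirical_quantile(ys, tau):
    """Return the sample value minimizing the total pinball loss over ys."""
    return min(
        sorted(ys),
        key=lambda q: sum(pinball_loss(y, q, tau) for y in ys),
    )


if __name__ == "__main__":
    observations = [1.0, 2.0, 3.0, 4.0, 5.0]
    print(empirical_quantile(observations, 0.5))
    print(empirical_quantile(observations, 0.9))
```

With tau = 0.5 the loss is symmetric and the minimizer is a median; larger
values of tau penalize under-prediction more heavily and push the minimizer
toward the upper tail.
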