In the present work we select a collection of statistical and
mathematical tools useful for the exploration of multivariate data and
present them in a form meant to be particularly accessible to a
classically trained mathematician. We give self-contained and streamlined
introductions to principal component analysis, multidimensional scaling and
statistical hypothesis testing. Within the presented mathematical framework we
then propose a general exploratory methodology for the investigation of
real-world, high-dimensional datasets that builds on statistical and
knowledge-supported visualizations. We exemplify the proposed methodology by
applying it to several different genome-wide DNA microarray datasets. The exploratory
methodology should be seen as an embryo that can be expanded and developed in
many directions. As an example, we point out some recent promising advances in
the theory of random matrices that, if further developed, could potentially
provide practically useful and theoretically well-founded estimates of
information content in dimension-reducing visualizations. We hope that the
present work can serve as an introduction to, and help to stimulate more
research within, the interesting and rapidly expanding field of data
exploration.