We introduce a method for analyzing high-dimensional data. Our approach is
inspired by Morse theory and uses the nudged elastic band method from
computational chemistry. As output, we produce an increasing sequence of cell
complexes modeling the dense regions of the data. We test the method on several
data sets and obtain small cell complexes revealing informative topological
structure.
Topological methods such as persistent homology are powerful tools for
analyzing high-dimensional data sets, but these methods rely almost exclusively
on thresholding techniques. However, in noisy data sets thresholding does
not always allow for the recovery of topological information. We present a
computationally efficient algorithm that enables topological data analysis on
noisy high-dimensional point cloud data sets. In many cases, the algorithm
returns data that has so few outliers that there is no need to threshold the
data before performing topological analysis.
In this paper we examine the use of topological methods for multivariate
statistics. Using persistent homology from computational algebraic topology, we
construct estimators of persistent homology from a random sample. This
estimation procedure can then be evaluated using the bottleneck distance
between the estimated persistent homology and the true persistent homology. The
connection to statistics is that, when the estimation is viewed as a
nonparametric regression problem, the bottleneck distance is bounded by the
sup-norm loss.
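The bound in the last sentence can be written out as follows. This is a sketch
in notation of our own choosing, not necessarily that of the paper: we assume
$f$ is the true function, $\hat f_n$ is an estimator built from a sample of
size $n$, $D(\cdot)$ is the persistence diagram of a function's sublevel (or
superlevel) set filtration, and $d_B$ is the bottleneck distance. Under the
usual tameness assumptions of the stability theorem for persistence diagrams,

```latex
\[
  d_B\bigl(D(\hat f_n),\, D(f)\bigr)
  \;\le\;
  \lVert \hat f_n - f \rVert_\infty
  \;=\;
  \sup_{x} \bigl| \hat f_n(x) - f(x) \bigr| .
\]
```

Any estimator with small sup-norm loss therefore yields a persistence diagram
that is close to the true one in bottleneck distance.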