Vocal tract resonance characteristics in acoustic speech signals are
classically tracked using frame-by-frame point estimates of formant frequencies
followed by candidate selection and smoothing via dynamic programming methods
that minimize ad hoc cost functions. The goal of the current work is to provide
both point estimates and associated uncertainties of center frequencies and
bandwidths in a statistically principled state-space framework.
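
As a minimal sketch of how a state-space model yields both a point estimate and an uncertainty, the following scalar Kalman filter tracks a single formant center frequency from noisy per-frame measurements. It assumes a random-walk state model and Gaussian measurement noise with hypothetical variances `q` and `r`. It is a generic illustration, not the model developed in this work, and it ignores bandwidths and the coupling between multiple formants.

```python
from dataclasses import dataclass


@dataclass
class RandomWalkKalman:
    """Scalar Kalman filter for one formant frequency (hypothetical setup).

    State:       x_k = x_{k-1} + w_k,  w_k ~ N(0, q)
    Observation: y_k = x_k + v_k,      v_k ~ N(0, r)
    """

    q: float  # assumed process-noise variance (Hz^2)
    r: float  # assumed measurement-noise variance (Hz^2)

    def run(self, measurements, mean0, var0):
        """Return per-frame posterior means and variances."""
        means, variances = [], []
        mean, var = mean0, var0
        for y in measurements:
            # Predict: a random walk keeps the mean and inflates the variance.
            var = var + self.q
            # Update: weight the measurement by the Kalman gain.
            gain = var / (var + self.r)
            mean = mean + gain * (y - mean)
            var = (1.0 - gain) * var
            means.append(mean)
            variances.append(var)
        return means, variances
```

In use, `measurements` would come from some per-frame formant picker (hypothetical here), and `mean ± 1.96 * sqrt(variance)` gives an approximate 95% band for each frame under the Gaussian assumptions.
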

This article develops a general detection theory for speech analysis based on
time-varying autoregressive models, which themselves generalize the classical
linear predictive speech analysis framework. This theory leads to a
computationally efficient decision-theoretic procedure that may be applied to
detect the presence of vocal tract variation in speech waveform data.
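
To make the detection idea concrete, the sketch below applies a generalized likelihood ratio test (an assumed stand-in, not the procedure derived in the article) to a first-order autoregressive model whose coefficient is either constant (no variation) or varies linearly in normalized time. Both models are fit by least squares; with Gaussian noise, the statistic n log(RSS0/RSS1) is approximately chi-square with one degree of freedom under the null, so a threshold of 3.841 gives a nominal 5% false-alarm rate. Model order 1, the linear basis, and all helper names are hypothetical simplifications; speech models typically use higher orders.

```python
import math
import random

# 95% quantile of the chi-square distribution with 1 degree of freedom.
CHI2_1DOF_95 = 3.841


def _ar1_rss(y, with_trend):
    """Residual sum of squares of a least-squares fit of
    y[t] = (a0 + a1 * u_t) * y[t-1] + e[t],  u_t = t / (n - 1).

    With with_trend=False, a1 is fixed at 0 (constant coefficient).
    """
    n = len(y)
    s11 = s12 = s22 = b1 = b2 = 0.0
    for t in range(1, n):
        u = t / (n - 1)
        x1, x2 = y[t - 1], u * y[t - 1]
        s11 += x1 * x1
        s12 += x1 * x2
        s22 += x2 * x2
        b1 += x1 * y[t]
        b2 += x2 * y[t]
    if with_trend:
        # Solve the 2x2 normal equations by Cramer's rule.
        det = s11 * s22 - s12 * s12
        a0 = (b1 * s22 - s12 * b2) / det
        a1 = (s11 * b2 - s12 * b1) / det
    else:
        a0, a1 = b1 / s11, 0.0
    rss = 0.0
    for t in range(1, n):
        e = y[t] - (a0 + a1 * t / (n - 1)) * y[t - 1]
        rss += e * e
    return rss


def glrt_statistic(y):
    """n * log(RSS0 / RSS1); nonnegative up to rounding (nested models)."""
    rss0 = _ar1_rss(y, with_trend=False)
    rss1 = _ar1_rss(y, with_trend=True)
    return (len(y) - 1) * math.log(rss0 / rss1)


def detect_variation(y, threshold=CHI2_1DOF_95):
    """Declare time variation when the statistic exceeds the threshold."""
    return glrt_statistic(y) > threshold


def simulate_tvar1(n, coeff, seed=0):
    """Simulate y[t] = coeff(t / (n - 1)) * y[t-1] + N(0, 1) noise."""
    rng = random.Random(seed)
    y = [rng.gauss(0.0, 1.0)]
    for t in range(1, n):
        y.append(coeff(t / (n - 1)) * y[-1] + rng.gauss(0.0, 1.0))
    return y
```

For example, `detect_variation(simulate_tvar1(2000, lambda u: -0.9 + 1.8 * u))` should flag a coefficient that sweeps from -0.9 to 0.9. The constant-coefficient null is also why the test is cheap: each hypothesis needs only one pass of sums and a closed-form solve.
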

In this article we introduce a broad family of adaptive, linear
time-frequency representations termed superposition frames, and show that they
admit desirable fast overlap-add reconstruction properties akin to standard
short-time Fourier techniques. This approach stands in contrast to many
adaptive time-frequency representations in the extant literature, which, while
more flexible than standard fixed-resolution approaches, typically fail to
provide efficient reconstruction and often lack the regular structure necessary
for precise frame-theoretic analysis.
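
The fast overlap-add reconstruction mentioned above is the same property enjoyed by the standard short-time Fourier transform, sketched below in plain Python. This illustrates only the fixed-resolution STFT case, not superposition frames themselves. With a sine (square-root Hann) window used for both analysis and synthesis at 50% overlap, the squared window sums to one, so windowed inverse transforms added back together reproduce the signal wherever two frames overlap. The frame length 16 and hop 8 are arbitrary choices, and the direct O(n^2) DFT stands in for an FFT.

```python
import cmath
import math


def sine_window(n):
    """Sine window; w[k]^2 + w[k + n/2]^2 == 1, so 50% overlap sums to one."""
    return [math.sin(math.pi * (k + 0.5) / n) for k in range(n)]


def dft(x):
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * math.pi * m * k / n) for k in range(n))
            for m in range(n)]


def idft(spec):
    n = len(spec)
    return [sum(spec[m] * cmath.exp(2j * math.pi * m * k / n)
                for m in range(n)).real / n
            for k in range(n)]


def stft(x, n, hop):
    """Analysis: window each frame, then transform it."""
    w = sine_window(n)
    return [dft([w[k] * x[start + k] for k in range(n)])
            for start in range(0, len(x) - n + 1, hop)]


def overlap_add(frames, n, hop, length):
    """Synthesis: inverse-transform, window again, and add into place."""
    w = sine_window(n)
    y = [0.0] * length
    for i, spec in enumerate(frames):
        seg = idft(spec)
        for k in range(n):
            y[i * hop + k] += w[k] * seg[k]
    return y
```

Only the first and last `hop` samples, which are covered by a single frame, are not reconstructed exactly; a window and hop whose squared sum is not constant would instead leave a periodic amplitude ripple across the interior.
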

Material indentation studies, in which a probe is brought into controlled
physical contact with an experimental sample, have long been a primary means by
which scientists characterize the mechanical properties of materials. More
recently, the advent of atomic force microscopy, which operates on the same
fundamental principle, has in turn revolutionized the nanoscale analysis of
soft biomaterials such as cells and tissues.