Peter Clifford

  1. A statistical analysis of probabilistic counting algorithms.

    Authors: Peter Clifford, Ioana A. Cosma
    Subjects: Computation
    Abstract

    We consider the problem of cardinality estimation in data stream
    applications, focusing on two techniques that use pseudo-random variates to
    form low-dimensional data sketches. We apply conventional statistical methods
    to compare algorithms based on storing either selected order statistics or
    random projections. We derive estimators of the cardinality in both cases and
    show that the maximal-term estimator is recursively computable and has
    exponentially decreasing error bounds.

  2. A simple sketching algorithm for entropy estimation.

    Authors: Peter Clifford, Ioana Ada Cosma
    Subjects: Computation
    Abstract

    We consider the problem of approximating the empirical Shannon entropy of a
    high-frequency data stream when space limitations make exact computation
    infeasible. It is known that \alpha-dependent quantities such as the Renyi and
    Tsallis entropies can be estimated efficiently and unbiasedly from
    low-dimensional \alpha-stable data sketches. An approximation to the Shannon
    entropy can be obtained from either of these quantities by taking \alpha
    sufficiently close to 1. However, practical guidelines for the choice of
    $\alpha$ are lacking. We avoid this problem by going directly to the limit.

RSS-материал