Ping Li

  1. Improving Compressed Counting.

    Authors: Ping Li
    Subjects: Data Structures and Algorithms
    Abstract

    Compressed Counting (CC) [22] was recently proposed for estimating the ath
    frequency moments of data streams, where 0 < a <= 2. CC can be used for
    estimating Shannon entropy, which can be approximated by certain functions of
    the ath frequency moments as a -> 1. Monitoring Shannon entropy for anomaly
    detection (e.g., DDoS attacks) in large networks is an important task. This
    paper presents a new algorithm for improving CC. The improvement is most
    substantial when a -> 1--. For example, when a = 0:99, the new algorithm
    reduces the estimation variance roughly by 100-fold.

  2. Approximating Higher-Order Distances Using Random Projections.

    Authors: Ping Li, Yiyuan She, Michael W. Mahoney
    Subjects: Artificial Intelligence
    Abstract

    We provide a simple method and relevant theoretical analysis for efficiently
    estimating higher-order lp distances. While the analysis mainly focuses on l4,
    our methodology extends naturally to p = 6,8,10..., (i.e., when p is even).
    Distance-based methods are popular in machine learning. In large-scale
    applications, storing, computing, and retrieving the distances can be both
    space and time prohibitive. Efficient algorithms exist for estimating lp
    distances if 0 < p <= 2. The task for p > 2 is known to be difficult. Our work
    partially fills this gap.

  3. Robust LogitBoost and Adaptive Base Class (ABC) LogitBoost.

    Authors: Ping Li
    Subjects: Artificial Intelligence
    Abstract

    Logitboost is an influential boosting algorithm for classification. In this
    paper, we develop robust logitboost to provide an explicit formulation of
    tree-split criterion for building weak learners (regression trees) for
    logitboost. This formulation leads to a numerically stable implementation of
    logitboost. We then propose abc-logitboost for multi-class classification, by
    combining robust logitboost with the prior work of abc-boost. Previously,
    abc-boost was implemented as abc-mart using the mart algorithm.

  4. User-level sentiment analysis incorporating social networks.

    Authors: Ping Li, Lillian Lee, Chenhao Tan, Jie Tang, Long Jiang, Ming Zhou
    Subjects: Computation and Language (Computational Linguistics and Natural Language and Speech Processing)
    Abstract

    We show that information about social relationships can be used to improve
    user-level sentiment analysis. The main motivation behind our approach is that
    users that are somehow "connected" may be more likely to hold similar opinions;
    therefore, relationship information can complement what we can extract about a
    user's viewpoints from their utterances.

  5. Hashing Algorithms for Large-Scale Learning.

    Authors: Ping Li, Arnd Christian Konig, Joshua Moore, Anshumali Shrivastava
    Subjects: Machine Learning
    Abstract

    In this paper, we first demonstrate that b-bit minwise hashing, whose
    estimators are positive definite kernels, can be naturally integrated with
    learning algorithms such as SVM and logistic regression. We adopt a simple
    scheme to transform the nonlinear (resemblance) kernel into linear (inner
    product) kernel; and hence large-scale problems can be solved extremely
    efficiently. Our method provides a simple effective solution to large-scale
    learning in massive and extremely high-dimensional datasets, especially when
    data do not fit in memory.

  6. b-Bit Minwise Hashing for Large-Scale Linear SVM.

    Authors: Ping Li, Joshua Moore, Christian Konig
    Subjects: Learning
    Abstract

    In this paper, we propose to (seamlessly) integrate b-bit minwise hashing
    with linear SVM to substantially improve the training (and testing) efficiency
    using much smaller memory, with essentially no loss of accuracy. Theoretically,
    we prove that the resemblance matrix, the minwise hashing matrix, and the b-bit
    minwise hashing matrix are all positive definite matrices (kernels).
    Interestingly, our proof for the positive definiteness of the b-bit minwise
    hashing kernel naturally suggests a simple strategy to integrate b-bit hashing
    with linear SVM.

  7. Fast ABC-Boost for Multi-Class Classification.

    Authors: Ping Li
    Subjects: Learning
    Abstract

    Abc-boost is a new line of boosting algorithms for multi-class
    classification, by utilizing the commonly used sum-to-zero constraint. To
    implement abc-boost, a base class must be identified at each boosting step.
    Prior studies used a very expensive procedure based on exhaustive search for
    determining the base class at each boosting step. Good testing performances of
    abc-boost (implemented as abc-mart and abc-logitboost) on a variety of datasets
    were reported.

  8. On Practical Algorithms for Entropy Estimation and the Improved Sample Complexity of Compressed Counting.

    Authors: Ping Li
    Subjects: Data Structures and Algorithms
    Abstract

    Estimating the p-th frequency moment of data stream is a very heavily studied
    problem. The problem is actually trivial when p = 1, assuming the strict
    Turnstile model. The sample complexity of our proposed algorithm is essentially
    O(1) near p=1. This is a very large improvement over the previously believed
    O(1/eps^2) bound. The proposed algorithm makes the long-standing problem of
    entropy estimation an easy task, as verified by the experiments included in the
    appendix.

  9. Chern numbers and the indices of some elliptic differential operators.

    Authors: Ping Li
    Subjects: Differential Geometry
    Abstract

    Libgober and Wood proved that the Chern number $c_{1}c_{n-1}$ of a
    $n$-dimensional compact complex manifold can be determined by its Hirzebruch
    $\chi_{y}$-genus. Inspired by the idea of their proof, we show that, for
    compact, spin, almost-complex manifolds, more Chern numbers can be determined
    by the indices of some twisted Dirac and signature operators. As a byproduct,
    we get a divisibility result of certain characteristic number for such
    manifolds. Using our method, we give a direct proof of Libgober-Wood's result,
    which was originally proved by induction.

  10. An Empirical Evaluation of Four Algorithms for Multi-Class Classification: Mart, ABC-Mart, Robust LogitBoost, and ABC-LogitBoost.

    Authors: Ping Li
    Subjects: Learning
    Abstract

    This empirical study is mainly devoted to comparing four tree-based boosting
    algorithms: mart, abc-mart, robust logitboost, and abc-logitboost, for
    multi-class classification on a variety of publicly available datasets. Some of
    those datasets have been thoroughly tested in prior studies using a broad range
    of classification algorithms including SVM, neural nets, and deep learning.

    In terms of the empirical classification errors, our experiment results
    demonstrate:

  11. b-Bit Minwise Hashing.

    Authors: Ping Li, Arnd Christian Konig
    Subjects: Data Structures and Algorithms
    Abstract

    This paper establishes the theoretical framework of b-bit minwise hashing.
    The original minwise hashing method has become a standard technique for
    estimating set similarity (e.g., resemblance) with applications in information
    retrieval, data management, social networks and computational advertising.

  12. On the Sample Complexity of Compressed Counting.

    Authors: Ping Li
    Subjects: Data Structures and Algorithms
    Abstract

    Compressed Counting (CC), based on maximally skewed stable random
    projections, was recently proposed for estimating the p-th frequency moments of
    data streams. The case p->1 is extremely useful for estimating Shannon entropy
    of data streams. In this study, we provide a very simple algorithm based on the
    sample minimum estimator and prove a much improved sample complexity bound,
    compared to prior results.

  13. ABC-LogitBoost for Multi-class Classification.

    Authors: Ping Li
    Subjects: Learning
    Abstract

    We develop abc-logitboost, based on the prior work on abc-boost and robust
    logitboost. Our extensive experiments on a variety of datasets demonstrate the
    considerable improvement of abc-logitboost over logitboost and abc-mart.

RSS-материал