Shlomo Geva

  1. K-tree: Large Scale Document Clustering.

    Authors: Christopher M. De Vries, Shlomo Geva
    Subjects: Information Retrieval
    Abstract

    We introduce K-tree in an information retrieval context. It is an efficient
    approximation of the k-means clustering algorithm. Unlike k-means it forms a
    hierarchy of clusters. It has been extended to address issues with sparse
    representations. We compare performance and quality to CLUTO using document
    collections. K-tree has a low time complexity, which makes it suitable for
    large document collections. Its tree structure allows for efficient disk-based
    implementations when space requirements exceed main memory.

  2. Document Clustering with K-tree.

    Authors: Christopher M. De Vries, Shlomo Geva
    Subjects: Information Retrieval
    Abstract

    This paper describes the approach taken to the XML Mining track at INEX 2008
    by a group at the Queensland University of Technology. We introduce the K-tree
    clustering algorithm in an Information Retrieval context by adapting it for
    document clustering. Many large-scale problems exist in document clustering.
    K-tree scales well with large inputs due to its low complexity. It offers
    promising results both in terms of efficiency and quality. Document
    classification was completed using Support Vector Machines.

  3. Random Indexing K-tree.

    Authors: Christopher M. De Vries, Lance De Vine, Shlomo Geva
    Subjects: Information Retrieval
    Abstract

    Random Indexing (RI) K-tree is the combination of two algorithms for
    clustering. Many large-scale problems exist in document clustering. RI K-tree
    scales well with large inputs due to its low complexity. It also exhibits
    features that are useful for managing a changing collection. Furthermore, it
    solves previous issues with sparse document vectors when using K-tree. The
    algorithms and data structures are defined, explained and motivated. Specific
    modifications to K-tree are made for use with RI. Experiments have been
    executed to measure quality.
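
The abstract above does not spell out how Random Indexing builds its vectors, so the following is only a minimal sketch of the general technique, not the authors' implementation. It assumes the common formulation in which each term gets a sparse ternary index vector and a document vector is the sum of the index vectors of its terms. The names `index_vector` and `document_vector` and the parameters `dim` and `nonzero` are hypothetical choices for illustration.

```python
import random
import zlib


def index_vector(term, dim=64, nonzero=8):
    """Return a sparse ternary index vector for a term.

    Assumption: the vector is derived deterministically from the term by
    seeding a private RNG with a CRC32 checksum, so no lookup table is needed.
    Half of the nonzero entries are +1 and half are -1.
    """
    rng = random.Random(zlib.crc32(term.encode("utf-8")))
    positions = rng.sample(range(dim), nonzero)
    vec = [0] * dim
    for i, pos in enumerate(positions):
        vec[pos] = 1 if i % 2 == 0 else -1
    return vec


def document_vector(tokens, dim=64, nonzero=8):
    """Sum the index vectors of all tokens into one dense document vector."""
    doc = [0] * dim
    for term in tokens:
        for i, value in enumerate(index_vector(term, dim, nonzero)):
            doc[i] += value
    return doc


if __name__ == "__main__":
    vec = document_vector("k tree document clustering".split())
    print(len(vec))  # fixed dimensionality, independent of vocabulary size
```

Because the output has a fixed dimensionality regardless of vocabulary size, new documents and new terms can be added without recomputing existing vectors, which is one reason such a representation could suit a changing collection.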
