Ryan McDonald

  1. A Universal Part-of-Speech Tagset.

    Authors: Slav Petrov, Dipanjan Das, Ryan McDonald
    Subjects: Computation and Language (Computational Linguistics and Natural Language and Speech Processing)
    Abstract

    To facilitate future research in unsupervised induction of syntactic
    structure and to standardize best practices, we propose a tagset that consists
    of twelve universal part-of-speech categories. In addition to the tagset, we
    develop a mapping from 25 different treebank tagsets to this universal set. As
    a result, when combined with the original treebank data, this universal tagset
    and mapping produce a dataset consisting of common parts-of-speech for 22
    different languages.
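
The mapping idea in the abstract can be pictured with a minimal sketch, assuming the source tags are Penn Treebank tags. The twelve category names below are those of the universal tagset; the dictionary `PTB_TO_UNIVERSAL` is only a small illustrative subset rather than the full published mapping, and the helper `to_universal` and its fallback to `X` for unlisted tags are hypothetical choices made for this example.

```python
# The twelve universal part-of-speech categories.
UNIVERSAL_TAGS = {
    "NOUN", "VERB", "ADJ", "ADV", "PRON", "DET",
    "ADP", "NUM", "CONJ", "PRT", ".", "X",
}

# Illustrative subset of a Penn Treebank -> universal mapping.
# Assumption: this is not the complete mapping distributed with the paper.
PTB_TO_UNIVERSAL = {
    "NN": "NOUN", "NNS": "NOUN", "NNP": "NOUN",
    "VB": "VERB", "VBD": "VERB", "VBP": "VERB", "VBZ": "VERB",
    "JJ": "ADJ",
    "RB": "ADV",
    "PRP": "PRON",
    "DT": "DET",
    "IN": "ADP",
    "CD": "NUM",
    "CC": "CONJ",
    "RP": "PRT",
    ".": ".",
}


def to_universal(tagged_sentence, mapping, fallback="X"):
    """Replace each treebank-specific tag with its universal category.

    Tags missing from `mapping` fall back to `fallback` (a choice made
    for this sketch only).
    """
    return [(word, mapping.get(tag, fallback)) for word, tag in tagged_sentence]


sentence = [("The", "DT"), ("dogs", "NNS"), ("bark", "VBP"),
            ("loudly", "RB"), (".", ".")]
universal = to_universal(sentence, PTB_TO_UNIVERSAL)
print(universal)
```

Applying one such table per treebank is what lets data from different languages share a common set of part-of-speech labels.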
