The goal of this chapter is to explore how the research and industrial
communities that commonly use spoken corpora can be provided with a stable
portfolio of well-documented, standardised formats that allow a high re-use
rate of annotated spoken resources and, as a consequence, better
interoperability across the tools used to produce or exploit such resources.
Standards tend to generate mixed feelings among scientists. They are often
seen as failing to reflect the state of the art in a given domain and as a
hindrance to scientific creativity. Still, scientists are in principle best
placed to bring their expertise to standards development, and they are more
neutral on issues that are typically tied to competing industrial interests.
After two decades of repository development, some conclusions may be drawn as
to which type of repository and what kind of service best support digital
scholarly communication, and thus the production of new knowledge. Four types
of publication repository may be distinguished, namely the subject-based
repository, research repository, national repository system and institutional
repository. Two important shifts in the role of repositories may be noted. With
regard to content, a well-defined and high-quality corpus is essential.
In this chapter we present the main issues in representing machine-readable
dictionaries in XML, and in particular according to the Text Encoding
Initiative (TEI) guidelines.
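As a minimal sketch of what a TEI-style dictionary entry looks like to a program, the following Python fragment parses one entry with the standard library. The element names (entry, form, orth, gramGrp, pos, sense, def) come from the TEI dictionary module; the sample entry, its xml:id and the helper read_entry are assumptions made for illustration and are not taken from the chapter.

```python
import xml.etree.ElementTree as ET

# TEI namespace, needed to address elements in the parsed tree.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical sample entry, written for this illustration only.
SAMPLE_ENTRY = """
<entry xmlns="http://www.tei-c.org/ns/1.0" xml:id="corpus.n">
  <form type="lemma"><orth>corpus</orth></form>
  <gramGrp><pos>noun</pos></gramGrp>
  <sense n="1"><def>a collection of written or spoken texts</def></sense>
  <sense n="2"><def>the main body of a structure</def></sense>
</entry>
"""


def read_entry(xml_text):
    """Extract the lemma, part of speech and definitions from one entry."""
    entry = ET.fromstring(xml_text)
    return {
        "lemma": entry.findtext("tei:form/tei:orth", namespaces=TEI_NS),
        "pos": entry.findtext("tei:gramGrp/tei:pos", namespaces=TEI_NS),
        "senses": [d.text for d in entry.findall("tei:sense/tei:def", TEI_NS)],
    }


if __name__ == "__main__":
    print(read_entry(SAMPLE_ENTRY))
```

Real dictionaries are far less regular than this entry (nested senses, cross-references, etymologies), which is where most of the representation issues arise.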
A survey of dictionary models and formats is presented, together with an
overview of the corresponding recent standardisation activities.
The goal of this paper is two-fold: to present an abstract data model for
linguistic annotations and its implementation using XML, RDF and related
standards; and to outline the work of a newly formed committee of the
International Organization for Standardization (ISO), ISO/TC 37/SC 4 Language
Resource Management, which will use this work as its starting point.
This paper provides an overview (in French) of the European PEER project,
focusing on its origins, its current objectives and its technical deployment.
Multimodal interfaces, combining the use of speech, graphics, gestures, and
facial expressions in input and output, promise to provide new possibilities to
deal with information in more effective and efficient ways, supporting for
instance:
- the understanding of possibly imprecise, partial or ambiguous multimodal
  input;
- the generation of coordinated, cohesive, and coherent multimodal
  presentations;
- the management of multimodal interaction (e.g., task completion, adapting
  the interface, error prevention) by representing and exploiting models of
  the user, the domain, the task, the intera
This paper presents an abstract data model for linguistic annotations and its
implementation using XML, RDF and related standards, and outlines the work of
a newly formed committee of the International Organization for Standardization
(ISO), ISO/TC 37/SC 4 Language Resource Management, which will use this work as
its starting point. The primary motive for presenting the latter is to invite
members of the research community to contribute to the work of the committee.
It is widely recognized that the proliferation of annotation schemes runs
counter to the need to re-use language resources, and that standards for
linguistic annotation are becoming increasingly necessary. To answer this need,
we have developed a framework comprising an abstract model for a variety of
different annotation types (e.g., morpho-syntactic tagging, syntactic
annotation, co-reference annotation), which can be instantiated in
different ways depending on the annotator's approach and goals.
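The idea of one abstract model instantiated in different ways can be sketched in Python as follows. This is an illustration under stated assumptions, not the framework itself: the Annotation class, its fields, the XML element names (annotations, ann, f) and the triple layout are all hypothetical.

```python
from dataclasses import dataclass, field
import xml.etree.ElementTree as ET

TEXT = "She sings"


@dataclass
class Annotation:
    """Hypothetical abstract annotation: a typed span carrying features."""
    ann_id: str
    start: int
    end: int
    ann_type: str
    features: dict = field(default_factory=dict)


def to_xml(annotations):
    """Instantiate the abstract model as a stand-off XML document."""
    root = ET.Element("annotations")
    for a in annotations:
        node = ET.SubElement(
            root, "ann", id=a.ann_id, type=a.ann_type,
            start=str(a.start), end=str(a.end),
        )
        for name, value in sorted(a.features.items()):
            ET.SubElement(node, "f", name=name).text = value
    return ET.tostring(root, encoding="unicode")


def to_triples(annotations):
    """Instantiate the same model as subject-predicate-object triples."""
    triples = []
    for a in annotations:
        triples.append((a.ann_id, "type", a.ann_type))
        triples.append((a.ann_id, "start", str(a.start)))
        triples.append((a.ann_id, "end", str(a.end)))
        for name, value in sorted(a.features.items()):
            triples.append((a.ann_id, name, value))
    return triples


ANNOTATIONS = [
    Annotation("a1", 0, 3, "morphosyntax", {"pos": "PRON"}),
    Annotation("a2", 4, 9, "morphosyntax", {"pos": "VERB"}),
]

if __name__ == "__main__":
    print(to_xml(ANNOTATIONS))
    print(to_triples(ANNOTATIONS))
```

Both serialisations are generated from the same in-memory objects, so a tool that understands the abstract model can read either one without knowing which concrete format the annotator chose.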
We describe an encoding scheme for discourse structure and reference, based
on the TEI Guidelines and the recommendations of the Corpus Encoding
Specification (CES). A central feature of the scheme is a CES-based data
architecture enabling the encoding of and access to multiple views of a
marked-up document. We describe a tool architecture that supports the encoding
scheme, and then show how we have used the encoding scheme and the tools to
perform a discourse analytic task in support of a model of global discourse
cohesion called Veins Theory (Cristea & Ide, 1998).
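The notion of multiple views over a single marked-up document can be illustrated with a small Python sketch. Everything here is an assumption for illustration: the sentence identifiers, the two layers, their labels and the helper build_view are hypothetical and do not reproduce the CES data architecture or the authors' tools.

```python
# Hypothetical base document: sentence identifiers mapped to text.
BASE = {
    "s1": "John bought a car.",
    "s2": "It was red.",
}

# Two independent annotation layers; each points into BASE by identifier
# and never modifies the base text.
DISCOURSE = [
    {"id": "d1", "target": "s1", "label": "nucleus"},
    {"id": "d2", "target": "s2", "label": "satellite"},
]
REFERENCE = [
    {"id": "r1", "target": "s2", "label": "anaphor", "antecedent": "s1"},
]


def build_view(base, layer):
    """Return (sentence id, text, labels) rows for one annotation layer."""
    labels = {}
    for ann in layer:
        labels.setdefault(ann["target"], []).append(ann["label"])
    return [(sid, text, labels.get(sid, [])) for sid, text in base.items()]


if __name__ == "__main__":
    for row in build_view(BASE, DISCOURSE):
        print(row)
    for row in build_view(BASE, REFERENCE):
        print(row)
```

Because each layer is stored separately from the base text, a discourse view and a reference view can be produced, edited or discarded without touching each other.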
Providing on-line services on the Internet will require the definition of
flexible interfaces that are capable of adapting to the user's characteristics.
This is all the more important in the context of medical applications like home
monitoring, where no two patients have the same medical profile. Still, the
problem is not limited to defining generic interfaces, as has been made
possible by UIML; it also involves defining the underlying information
structures from which these may be generated.
Following the principles of Cognitive Grammar, we concentrate on a model for
reference resolution that attempts to overcome the difficulties of previous
approaches, based on the fundamental assumption that all reference (independent
of the type of the referring expression) is accomplished via access to and
restructuring of domains of reference rather than by direct linkage to the
entities themselves.
This paper presents a general XML-based distributed software architecture
aimed at accessing and sharing resources in an open client/server
environment. The paper is organized as follows: First, we introduce the idea
of a "General Distributed Software Architecture". Second, we describe the
general framework in which this architecture is used. Third, we describe the
process of information exchange and introduce some technical issues involved
in the implementation of the proposed architecture.
This paper presents PATATRAS (PATent and Article Tracking, Retrieval and
AnalysiS), a system developed for the IP track of CLEF 2009. Our approach
has three main characteristics: 1. The use of multiple retrieval models
(KL, Okapi) and term index definitions (lemma, phrase, concept) for the three
languages considered in the present track (English, French, German) producing
ten different sets of ranked results. 2. The merging of the different results
based on multiple regression models using an additional validation set created
from the patent collection. 3.