Project D1 (Lüdeling, Stede)

Linguistic Database for Information Structure: Annotation and Retrieval


The goals of Project D1 are:
1. Creating linguistic corpora with information-structural annotations, evaluating them, and making them available to researchers.
2. Further developing the software infrastructure for corpus search and data sustainability.
3. Advising empirically oriented projects within Collaborative Research Centre 632 on Information Structure and integrating their data.
In this phase, the project will concentrate on implementing several remaining search and visualization features, on processing larger (semi-)automatically annotated data sets, and on creating automatic annotation tools relevant to studies of information structure. These tools should enable both the acquisition of new data and the further annotation of existing data from other projects.

The annotated data made available through our database will be analyzed quantitatively with regard to the interaction between the underlying features that are responsible for distinguishing language-specific information-structural categories. The goal is to describe empirically the correlations between annotation levels, which may be based on gradient or discrete categories.

To complement the direct annotation of information structure (IS) categories practiced in phase 2, we intend to make use of more superficial features that influence the assignment of IS categories. We will concentrate on categories that can be operationalized reliably, such as definiteness, discourse-newness, coreference, animacy, and topological fields (for German). As part of this effort, we will build a robust parser for topological fields in different varieties of German, as well as part-of-speech taggers for various languages. These tools should substantially improve the state of the data available to several research projects within the Centre.

