D2 (Typology of Information Structure)

This database was elicited with the Questionnaire for Information Structure. It contains data for investigating information structure from a typological perspective.


  • modality: The speech data consist of spoken-language material (monologues, dialogues, question-answer pairs, etc.) that directly addresses specific topics in information structure.
  • formats: The database is a collection of speech data (WAV sound files), the corresponding annotations (XML annotation files), and related metadata (PDF files) from a range of languages, collected from 2003 to 2007.
  • languages: Egyptian Arabic, Mandarin Chinese, Dutch, American English, French, Quebecois, German, Konkani, Georgian, Greek, Hungarian, Mawng, Niue, Northern Sotho, Prinmi, Teribe, Yucatec Maya
  • subcorpora / versions:
    • 100 sentences: In addition, data were elicited for a study called "100 sentences", which contains detailed annotations of a range of information structure categories. The data are transcribed and manually annotated at the following levels: English translation, phonetic transcription (SAMPA), stress, accent, syntactic phrases, phonetic tones, intonational tones, morphological analysis, glosses, parts of speech, grammatical function, semantic roles, information status, topic, and focus. They are available in the PAULA format. "100 sentences" was conducted for Georgian, German, Prinmi, and Teribe.