Corpora & empirical data

Access to corpora after project completion:

The annotated corpus on multiple fronting in German (project A6) is hosted at IDS Mannheim and is not freely available due to copyright restrictions. Corpus queries are possible via a web interface. For more information see: Korpus zur mehrfachen Vorfeldbesetzung and CLARIN IDS-Mannheim.

The corpora of the B projects are freely accessible via the server of the linguistic database ANNIS3 developed in the SFB: ANNIS 3. These include:

  • corpora of Gur and Kwa languages (project B1: Aja, Fon, Foodo, Yom), of Chadic languages (projects A5 and B2: Hausa, Bura, Guruntum, Tangale, Marghi), and of Wolof (project B7)
  • corpora of Old High German and Old Saxon (project B4)

In addition, some of the corpora are available at the CLARIN centre HZSK (The Hamburg Center for Language Corpora): here

Automatically searchable corpus data compiled by project B6 can be found here: KiezDeutsch-Korpus. They include:

  • Recordings of spontaneous speech in informal peer group conversations (including transcription, multi-level annotation, metadata, and, where necessary, translation)
  • Spontaneous texts with data on attitudes and language ideologies in the public debate (online comments, emails), with metadata
  • Spontaneous written language productions in public space from multilingual urban settings, with metadata and tags

The Potsdam Commentary Corpus (from project D1) is freely available here. It is additionally available via the CLARIN centre Tübingen here.

The multimedia teaching material created by project T1 for kindergartens, primary and secondary schools, and teacher training can be found here.

Some data were published as volumes in Interdisciplinary Studies in Information Structure (ISIS):

  • Linguistic Fieldnotes I: Information Structure in different African Languages: Data from Akan (Project D2) and Ngizim (Project A5)
  • Linguistic Fieldnotes II: Information Structure in different variants of written German: written German (Project B6)
  • Linguistic Fieldnotes III: Information Structure in Gur and Kwa Languages: Data from Buli, Kɔnni, Baatɔnum, Yom, Aja, Anii and Foodo (Projects B1, B7, D2)

Annotated information-structural data from 17 typologically unrelated languages, collected in project D2, are available on this page via the following link: