Sustainability of Linguistic Data (Co-Project)

The Collaborative Research Centers (CRCs; German: Sonderforschungsbereiche, SFBs) 441, 538 and 632 create electronic collections of linguistic data that are used to investigate linguistic questions. These empirical resources are of great benefit to linguistic and philological research beyond the respective CRCs. The aim of project C2, which is affiliated with CRC 441, is to create the prerequisites for keeping these data generally available in the long term, even after the CRCs have concluded.

The data available in the three participating CRCs are highly heterogeneous. The resources within each individual CRC are already very diverse, and this diversity becomes even more pronounced when the data collections of all three CRCs are considered together. Overall, the resources cover a wide range of central data types and typical data (written and spoken language; synchronic and diachronic data; hierarchical and timeline-based annotations on different levels; lexical resources and other secondary data, etc.).

Making the data sustainably available raises fundamental challenges that are typical of the sustainability of linguistic data collections in general. Project C2 aims to develop generic solutions that can be transferred to other linguistic data collections. A general infrastructural framework is to be developed that is open to linguistic resources and compatible with other sustainability initiatives.
