D4 (Methods for Interactive Linguistic Corpus Analysis of Information Structure)

contact: Jonas Kuhn, Gerlof Bouma

Large Parallel Corpus of Cleft Constructions

modality:

Written, partly translated. Parallel, sentence-aligned.

formats:

German-Dutch accessible through CQP interface at link.

Data and queries accessible in Prolog format from link.

source data:

Retokenization of Europarl v3. Cleft(-like) constructions automatically identified as described in Bouma, Gerlof, Lilja Øvrelid & Jonas Kuhn. 2010. Towards a Large Parallel Corpus of Cleft Constructions. Proceedings of LREC 2010.

languages:

Dutch (nl), German (de), English (en), Swedish (sv).

subcorpora / versions:

Four language strata of up to 1.5M sentences each, spanning 11 years of European Parliament minutes (1996-2006).