D4 (Methods for interactive linguistic corpus analysis of information structure)
Large Parallel Corpus of Cleft Constructions
- modality: Written, partly translated. Parallel, sentence-aligned.
- formats: German-Dutch accessible through a CQP interface. Data and queries accessible in Prolog format.
- source data: Retokenization of Europarl v3. Cleft(-like) constructions automatically identified as described in Bouma, Gerlof, Lilja Øvrelid & Jonas Kuhn. 2010. Towards a Large Parallel Corpus of Cleft Constructions. Proceedings of LREC 2010.
- languages: Dutch (nl), German (de), English (en), Swedish (sv).
- subcorpora / versions:
- Four language strata: up to 1.5M sentences in each, divided over 11 years of European Parliament minutes (1996-2006).