Working with Heterogeneous Annotations

In the last 15 years, the heterogeneity of linguistic annotations has been identified as a key problem in NLP and corpus linguistics. Data produced by different tools for automatic or manual annotation comes in different, often conceptually incompatible formats. Even if these formats can be aligned, the different annotation schemes applied by different tools limit the interoperability and reusability of NLP tools and linguistic data collections.

I've worked on both problems in two projects funded by the German Research Foundation (DFG):

Discourse and Generation

My interest in heterogeneous annotations grew out of my interest in discourse phenomena, e.g., anaphora, discourse structure, and information structure. Studying these phenomena empirically and modelling them computationally requires combining multiple, heterogeneous annotations.

I am particularly interested in potential overlaps between functional/cognitive linguistics and artificial intelligence/computational linguistics, whose synergies have, in my view, not yet been explored in depth and therefore represent a potential source of novel approaches to discourse and NLP.