Jörg Tiedemann


ParCor 1.0 is a parallel corpus annotated for pronoun coreference, a reduced form of coreference in which pronouns serve as the referring expressions. It consists of parallel English-German documents from two text genres: TED Talks (transcribed planned speech) and EU Bookshop publications (written text). Every document in the corpus has been manually annotated with the type and location of each pronoun and, where relevant, its antecedent.


A tool for converting PDF files to XML


A collection of tools for processing parallel corpora


Tools for converting and aligning (translated) movie subtitles

Blacklist Classifier

A simple and fast classifier for language discrimination between closely related languages based on word blacklists
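
The general idea of a word blacklist can be illustrated with a minimal sketch. It is not the tool's actual implementation: the function names, the `min_count` threshold, the whitespace tokenization, and the toy data are all assumptions made for illustration. A blacklist for one language holds words that were seen only in the other language, so each blacklisted word found in a text counts as evidence against that language.

```python
from collections import Counter


def build_blacklists(docs_a, docs_b, min_count=2):
    """Build one blacklist per language from two lists of training texts.

    A word is blacklisted for language A if it occurs at least `min_count`
    times in the B texts and never in the A texts (and vice versa).
    Hypothetical helper; the threshold is an illustrative assumption.
    """
    counts_a = Counter(w for doc in docs_a for w in doc.lower().split())
    counts_b = Counter(w for doc in docs_b for w in doc.lower().split())
    blacklist_a = {w for w, c in counts_b.items()
                   if c >= min_count and w not in counts_a}
    blacklist_b = {w for w, c in counts_a.items()
                   if c >= min_count and w not in counts_b}
    return blacklist_a, blacklist_b


def classify(text, blacklist_a, blacklist_b, labels=("A", "B")):
    """Return the label with fewer blacklist hits, or None on a tie."""
    tokens = text.lower().split()
    hits_a = sum(t in blacklist_a for t in tokens)  # evidence against A
    hits_b = sum(t in blacklist_b for t in tokens)  # evidence against B
    if hits_a == hits_b:
        return None
    return labels[0] if hits_a < hits_b else labels[1]


# Toy training data: the two "languages" differ in a single word.
docs_a = ["tko je to", "tko zna"]
docs_b = ["ko je to", "ko zna"]
bl_a, bl_b = build_blacklists(docs_a, docs_b)

print(classify("tko je tamo", bl_a, bl_b))
print(classify("ko je tamo", bl_a, bl_b))
```

Set membership tests keep classification linear in the length of the text, which is one plausible reason such an approach can be fast. Returning `None` on a tie leaves the decision to the caller when no discriminating word is found.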


A package for word and tree alignment