From annotation to learners’ corpora

Neven Jovanović, Department of Classical Philology, Faculty of Humanities and Social Sciences, University of Zagreb

Baška, Croatia, June 2017


A presentation for the Linguistic Annotation and Philology Workshop, July 6-7, 2017, Leipzig.


It is often said that 10,000 hours of practice are needed to achieve mastery in a field. How can this be achieved for historical languages, where contact with teachers is necessarily limited? Computer-generated (and computer-assessed) exercises offer one possible means of support: they help students learn, recognize, and produce words, phrases, parts of sentences, or even whole sentences, practicing briefly but often, even at times when they would otherwise be idling (while commuting, etc.). Such exercises are part of standard learning environments, for example Moodle, which also support reporting on user activity well. Their exercise modules, however, seem to expect activities to be created primarily "by hand", that is, put together by teachers.

Treebanks, vocabulary lists, and similar collections of linguistic annotations make it possible to create a large number of exercises from authentic (not made-up) language automatically: the necessary linguistic material is retrieved from the collections and then transformed into the format required for import into the learning environment (for example, the Moodle XML question format). The teacher's task is then simply to select a set of questions for an activity. Such re-use of linguistic annotations will be illustrated with existing Greek and Latin treebanks (PROIEL, Perseus DL, Late Latin Charters Treebank) and word frequency lists (Dickinson College Core Vocabulary). It will also be shown that, by serving as a source for exercises, collections of linguistic annotations easily and naturally connect research and teaching.
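The retrieve-and-transform step can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used for the presentation: it assumes a PROIEL-like input layout (`<sentence>` elements containing `<token>` elements with `form`, `lemma` and `part-of-speech` attributes), uses a single sample sentence whose annotations were written by hand for this example, and the function names `lemma_questions` and `to_moodle_xml` are hypothetical. For every verb it produces a Moodle XML short-answer question asking for the dictionary form of the highlighted word.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical sample in a PROIEL-like layout; the lemma and part-of-speech
# values were annotated by hand for illustration, not taken from a treebank.
SAMPLE_TREEBANK = """
<proiel>
  <source id="sample">
    <div>
      <sentence id="1">
        <token id="1" form="Gallia" lemma="Gallia" part-of-speech="Ne"/>
        <token id="2" form="est" lemma="sum" part-of-speech="V-"/>
        <token id="3" form="omnis" lemma="omnis" part-of-speech="A-"/>
        <token id="4" form="divisa" lemma="divido" part-of-speech="V-"/>
        <token id="5" form="in" lemma="in" part-of-speech="R-"/>
        <token id="6" form="partes" lemma="pars" part-of-speech="Nb"/>
        <token id="7" form="tres" lemma="tres" part-of-speech="Ma"/>
      </sentence>
    </div>
  </source>
</proiel>
"""


def lemma_questions(treebank_xml, pos_prefix="V"):
    """Yield (html_sentence, form, lemma) for every token whose
    part-of-speech tag starts with pos_prefix (assumed: "V" = verb)."""
    root = ET.fromstring(treebank_xml.strip())
    for sentence in root.iter("sentence"):
        # Skip tokens without a surface form (e.g. empty nodes).
        tokens = [t for t in sentence.findall("token") if t.get("form")]
        for target in tokens:
            if not target.get("part-of-speech", "").startswith(pos_prefix):
                continue
            # Bold the target word; forms are HTML-escaped. Punctuation and
            # spacing attributes are ignored in this simplified sketch.
            words = [
                "<b>%s</b>" % escape(t.get("form")) if t is target
                else escape(t.get("form"))
                for t in tokens
            ]
            yield " ".join(words), target.get("form"), target.get("lemma")


def to_moodle_xml(items):
    """Build a Moodle XML <quiz> with one short-answer question per item."""
    quiz = ET.Element("quiz")
    for n, (html_sentence, form, lemma) in enumerate(items, start=1):
        q = ET.SubElement(quiz, "question", type="shortanswer")
        name = ET.SubElement(ET.SubElement(q, "name"), "text")
        name.text = "Lemma %d: %s" % (n, form)
        qtext = ET.SubElement(q, "questiontext", format="html")
        # ElementTree escapes the HTML for XML; Moodle reads it back as HTML.
        ET.SubElement(qtext, "text").text = (
            "<p>%s</p><p>Give the dictionary form of the word in bold.</p>"
            % html_sentence
        )
        ET.SubElement(q, "usecase").text = "0"  # case-insensitive matching
        answer = ET.SubElement(q, "answer", fraction="100")
        ET.SubElement(answer, "text").text = lemma
    return ET.tostring(quiz, encoding="unicode")


if __name__ == "__main__":
    print(to_moodle_xml(lemma_questions(SAMPLE_TREEBANK)))
```

On the sample sentence this yields two questions (for *est* and *divisa*). The resulting file would be imported into a Moodle question bank, after which the teacher only chooses which questions go into an activity; other exercise types (e.g. vocabulary questions built from a frequency list) would swap in a different retrieval function while reusing the same XML writer.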


The slides for the presentation are made with the reveal.js HTML presentation framework.