EXMARaLDA (Dulko) is a set of tools for the EXMARaLDA Partitur-Editor with transformation scenarios (technically, XSLT 2.0 stylesheets) for the annotation of learner data in learner corpora. It supports tokenisation, part-of-speech tagging, lemmatisation, sentence-span computation, editing target hypotheses, detection of differences between target hypotheses and the learner text, error analysis, and metadata management (Hirschmann and Nolda 2019; Nolda 2019). It has been developed for the Dulko learner-corpus project at the University of Szeged.

This repository provides the sources of EXMARaLDA (Dulko) and a ZIP archive exmaralda-dulko-<VERSION>.zip, which contains, in particular, an executable for Microsoft Windows (exmaralda-dulko.exe) and start-up scripts for Linux and for MacOS (the latter named exmaralda-dulko.command).

Installation instructions

  1. Unless already installed, install a recent Java runtime environment (JRE) or Java development kit (JDK), e.g. Oracle Java on Microsoft Windows and MacOS, or OpenJDK on Linux. Note that at least version 8 is required.

  2. Unless already installed, install TreeTagger into some directory <DIR1> (e.g. into C:\Program Files\TreeTagger on Microsoft Windows, into /opt/treetagger on Linux, or into a directory called tree-tagger on the MacOS Desktop).

    Also download the German parameter file for TreeTagger and uncompress this gzipped file (on Microsoft Windows, 7-Zip can be used for uncompressing gzipped files). Create the directory <DIR1>/lib, copy the uncompressed parameter file there, and rename it to german-utf8.par.

  3. Unless already installed, install the release version of EXMARaLDA (1.6) into some directory <DIR2> (e.g. into C:\Program Files\EXMARaLDA on Microsoft Windows or into /opt/exmaralda on Linux; on MacOS, there should be an application called PartiturEditor in the Applications folder).

  4. Install EXMARaLDA (Dulko) into some directory <DIR3> (e.g. into C:\Program Files\EXMARaLDA (Dulko) on Microsoft Windows, into /opt/exmaralda-dulko on Linux, or onto the Desktop on MacOS).

    On Microsoft Windows, you can use the setup program exmaralda-dulko-<VERSION>-setup.exe for this task; it is included in the ZIP archive. Please note that on this system, <DIR3> must be a sibling directory of <DIR2>, which is the setup program’s default.

    On Linux, extract the ZIP archive to <DIR3> and adapt the paths in the start-up script, if needed.

    On MacOS, simply extract the ZIP archive onto the Desktop. There should be a directory called exmaralda-dulko-<VERSION>.
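
The parameter-file part of step 2 can also be scripted. The following is only a minimal sketch: the function name install_parameter_file and its arguments are hypothetical, and the path of the downloaded gzipped file has to be supplied by the caller.

```python
import gzip
import shutil
from pathlib import Path


def install_parameter_file(gz_path, treetagger_dir):
    """Uncompress a gzipped parameter file to <DIR1>/lib/german-utf8.par.

    gz_path: the downloaded gzipped parameter file (supplied by the caller).
    treetagger_dir: the TreeTagger installation directory, i.e. <DIR1>.
    """
    lib_dir = Path(treetagger_dir) / "lib"
    lib_dir.mkdir(parents=True, exist_ok=True)  # create <DIR1>/lib if missing
    target = lib_dir / "german-utf8.par"
    # Stream the uncompressed bytes straight into the renamed target file.
    with gzip.open(gz_path, "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target
```

Writing into a system directory such as C:\Program Files or /opt may require administrator rights.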

Configuration instructions

  1. Set the environment variable TREETAGGER_HOME to <DIR1>.

    On Microsoft Windows, search for SystemPropertiesAdvanced and create a system environment variable with the name TREETAGGER_HOME and <DIR1> as its value.

    On Linux, add export TREETAGGER_HOME=<DIR1> to /etc/profile or $HOME/.profile.

    On MacOS, the environment variable TREETAGGER_HOME is set by the start-up script exmaralda-dulko.command to $HOME/Desktop/tree-tagger by default. This is the location of the directory tree-tagger on the MacOS desktop. If you installed TreeTagger into a different directory <DIR1>, you have to adapt the value of the environment variable in the start-up script accordingly.

  2. Run EXMARaLDA (Dulko).

    On Microsoft Windows, click on the EXMARaLDA (Dulko) icon on the desktop or run the application from the EXMARaLDA submenu in the start menu.

    On Linux, run the start-up script in <DIR3>.

    On MacOS, run the start-up script exmaralda-dulko.command in <DIR3>.

  3. Open the annotation panel (‘View’ > ‘Annotation panel’) and open the file <DIR3>/annotation-panel.xml.

  4. Optionally, open the preferences (‘Edit’ > ‘Preferences’), switch to the ‘Stylesheets’ tab, and set the ‘Transcription to format table’ stylesheet to <DIR3>/format-table.xsl.
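
Before starting the application, it may be worth checking that TREETAGGER_HOME points to a directory containing lib/german-utf8.par, as set up in the installation steps. The check below is an illustrative sketch and not part of EXMARaLDA (Dulko); the function name check_treetagger_home is hypothetical.

```python
import os
from pathlib import Path


def check_treetagger_home(env=None):
    """Return a list of problems; an empty list means both checks passed."""
    env = os.environ if env is None else env
    home = env.get("TREETAGGER_HOME")
    if not home:
        return ["TREETAGGER_HOME is not set"]
    problems = []
    home_dir = Path(home)
    if not home_dir.is_dir():
        problems.append(f"{home_dir} is not a directory")
    par_file = home_dir / "lib" / "german-utf8.par"
    if not par_file.is_file():
        problems.append(f"{par_file} is missing")
    return problems


if __name__ == "__main__":
    for problem in check_treetagger_home():
        print(problem)
```

On MacOS, the variable is set by the start-up script itself, so this check only reflects the environment of the shell in which it is run.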

Usage instructions

  1. Open <DIR3>/dulko.template.exb in EXMARaLDA (Dulko) (‘File’ > ‘Open’) and save it under a new name (‘File’ > ‘Save as’). [1]

  2. Open the metainformation dialog (‘Transcription’ > ‘Metainformation’) and edit general metadata.

  3. Open the speakertable (‘Transcription’ > ‘Speakertable’) and edit the speaker metadata.

  4. In the main window, write or paste the learner text into one or several cells of the first tier. You can also first work on a proper part of the learner text (e.g. the first sentence) and add further parts later on.

  5. Apply the transformation scenario ‘Dulko: tok-Spur (Lernertext)’ (‘Transcription’ > ‘Transformation’), which tokenises the learner text and normalises punctuation marks.

  6. If you want to annotate editorial changes by the learner, apply the transformation scenario ‘Dulko: orig-Spur (Lernertext)’, which adds a tier for the original, unchanged learner text. When editing this tier, you can use the symbols , |, -, and _ for marking paragraph breaks, line breaks, hyphenations, and omissions, respectively. [2]

  7. Apply the transformation scenario ‘Dulko: S-, pos- und lemma-Spuren (Lernertext)’ for part-of-speech tagging, lemmatisation, and sentence-span identification of the learner text. [3]

  8. If you have added a tier for the original learner text in step 6, apply the transformation scenario ‘Dulko: edit-Spur (Lernertext)’, which detects editorial changes.

  9. Apply the transformation scenario ‘Dulko: trans-Spur (Lernertext)’ in case the learner text is a translation. Write or paste the text translated by the learner into the cells of the new tier.

  10. Apply the transformation scenario ‘Dulko: ZH- und Fehler-Spuren (1. Zielhypothese)’, which adds tiers for a target hypothesis and for error analysis. Edit the target hypothesis, and tag errors by means of the annotation panel.

  11. Apply the transformation scenario ‘Dulko: ZHS-, ZHpos- und ZHlemma-Spuren (1. Zielhypothese)’ for part-of-speech tagging, lemmatisation, and sentence-span identification of the target hypothesis.

  12. Finally, apply the transformation scenario ‘Dulko: ZHDiff-Spur (1. Zielhypothese)’, which detects differences between the target hypothesis and the learner text. [4]

In order to annotate further target hypotheses, apply the transformation scenarios for ‘2. Zielhypothese’, ‘3. Zielhypothese’, or ‘weitere Zielhypothese’. These transformation scenarios do not operate on the learner text but on the preceding target hypothesis.

Note that you can re-apply any of the above transformation scenarios in case you want to update the corresponding tiers, e.g. in order to revise the annotations or annotate further parts of the learner text. [5]

If required, additional timeline items can be inserted by clicking on the next timeline item and choosing ‘Timeline’ > ‘Insert timeline item’. The transformation scenario ‘Dulko: Zeitachse’, in turn, removes unused timeline items.
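
To make the notion of an unused timeline item concrete, the sketch below lists the timeline items of an EXB document that no event refers to. It assumes that "unused" means "not referenced as the start or end of any event", which may differ from the stylesheet's actual criterion. The sample document is a hypothetical, heavily reduced EXB file.

```python
import xml.etree.ElementTree as ET

# Hypothetical, reduced EXB document: four timeline items, two events.
SAMPLE_EXB = """<basic-transcription>
  <basic-body>
    <common-timeline>
      <tli id="T0"/><tli id="T1"/><tli id="T2"/><tli id="T3"/>
    </common-timeline>
    <tier id="TIE0" category="tok" type="t">
      <event start="T0" end="T1">Hallo</event>
      <event start="T1" end="T3">Welt</event>
    </tier>
  </basic-body>
</basic-transcription>"""


def unused_timeline_items(root):
    """Return the ids of timeline items that no event starts or ends at."""
    referenced = set()
    for event in root.iter("event"):
        referenced.add(event.get("start"))
        referenced.add(event.get("end"))
    return [tli.get("id") for tli in root.iter("tli")
            if tli.get("id") not in referenced]


root = ET.fromstring(SAMPLE_EXB)
print(unused_timeline_items(root))  # prints ['T2']
```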

Apply the transformation scenario ‘Dulko: HTML-Version’ for exporting the table sentence by sentence into an HTML file, which can be viewed and printed with a web browser.

Run ‘Transcription’ > ‘Export segmented transcription’ for exporting the table to an EXS file, which can be used in COMA and EXAKT. [6]

Apply the transformation scenarios ‘Dulko: ANNIS-kompatible Version’ and ‘Dulko: Pepper-kompatible Metadaten-Liste’ before exporting the final EXMARaLDA file to ANNIS via Pepper. The former transformation scenario deletes redundant annotations and adds namespace prefixes like ZH1 and ZH2 to the target-hypothesis and error tiers; those namespace prefixes are needed for properly ordering the tiers in ANNIS. The latter transformation scenario outputs an attribute-value list with corpus-level metadata for Pepper (cf. Pepper’s customisation property pepper.before.readMeta).

Andreas Nolda

  1. Alternatively, you may start from a blank table (‘File’ > ‘New’). Metadata can be imported from <DIR3>/dulko.template.exb by applying the transformation scenario ‘Dulko: Metadaten’. 

  2. In order to mark a hyphenation in the learner text, the corresponding word on the tier for the original learner text has to be split into three events consisting of the first part of the word, the symbol -, and the second part of the word, respectively. 

  3. The stylesheet for sentence-span tiers (Satzspannen) automatically identifies sentence spans ending in a punctuation character that TreeTagger tags as $. or ending in an abbreviation followed by a capitalised version of a non-noun. Sentence spans with different endings have to be tagged manually by splitting the corresponding sentence-span event inserted by the stylesheet; the sentence-span names can then be regenerated with the transformation scenario ‘Dulko: Satzspannen’. 

  4. The stylesheet for difference tiers (Differenz-Spuren) tries hard to detect movement source and target pairs, which are tagged with MOV[EMENT]S[OURCE] and MOV[EMENT]T[ARGET], respectively. If unsure, it tags potential movement sources and targets with the tags MOVS/DEL and MOVT/INS, which have to be manually disambiguated (e.g. by means of the annotation panel). 

  5. The only exception is the transformation scenario ‘Dulko: ZH- und Fehler-Spuren (weitere Zielhypothese)’, which always creates new tiers. 

  6. In EXMARaLDA (Dulko), this menu entry runs the XSLT stylesheet exb2exs.xsl on the current EXB file.
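
As an illustration of the first rule in note 3, the following sketch groups tagged tokens into sentence spans that end at a token tagged $.. It is not the stylesheet's implementation: the abbreviation rule is deliberately left out, and the function name sentence_spans and the (token, tag) input format are assumptions made for this example.

```python
def sentence_spans(tagged_tokens):
    """Group (token, pos) pairs into spans that end at a token tagged '$.'."""
    spans, current = [], []
    for token, pos in tagged_tokens:
        current.append(token)
        if pos == "$.":  # sentence-final punctuation closes the span
            spans.append(current)
            current = []
    if current:  # trailing tokens without sentence-final punctuation
        spans.append(current)
    return spans


tagged = [("Ich", "PPER"), ("gehe", "VVFIN"), (".", "$."),
          ("Du", "PPER"), ("auch", "ADV"), ("?", "$.")]
print(sentence_spans(tagged))
```

Spans that this simple rule gets wrong correspond to the cases that, according to note 3, have to be split manually.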