

feat-morph (LAW)

feat-morph is a tool for manual morphological annotation of corpora.

[Screenshot: law.png]

  1. Main Window (Word List) - for navigating through the words of the document (browsing, filtering, searching, sorting, etc.)

  2. Da Panel - for displaying and disambiguating the morphological information (lemmas and tags) of a word. The panel consists of two parts: a filter box (2a) and a list of items (2b). The list shows all lemma-tag items associated with the current word(s), i.e., the word(s) selected in the main window. The filter box restricts the list to a particular group of items, e.g., items with a particular lemma, part of speech (POS), or gender.

  3. Context View - displays the text of the document with the current word(s) highlighted.
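The filter box in the Da Panel can be thought of as a predicate applied to the item list. The Java sketch that follows illustrates this idea only; `LemmaTagItem` and `ItemFilter` are hypothetical classes, not taken from the feat-morph code base. It also assumes positional tags (as in the Prague tagset), where the part of speech is the first character and gender the third; this page does not say which tagset the tool uses.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical model of one lemma-tag item in the Da Panel item list (2b).
final class LemmaTagItem {
    final String lemma;
    final String tag;

    LemmaTagItem(String lemma, String tag) {
        this.lemma = lemma;
        this.tag = tag;
    }

    @Override
    public String toString() {
        return lemma + " " + tag;
    }
}

// Hypothetical filter-box logic (2a): each filter is a predicate over items.
final class ItemFilter {
    private ItemFilter() {}

    // Keep items with exactly the given lemma.
    static Predicate<LemmaTagItem> byLemma(String lemma) {
        return item -> item.lemma.equals(lemma);
    }

    // ASSUMPTION: positional tags, where the character at a fixed 0-based index
    // encodes one category (e.g. POS at index 0, gender at index 2 in
    // Prague-style tags such as "NNFS4-----A----").
    static Predicate<LemmaTagItem> byTagPosition(int index, char value) {
        return item -> item.tag.length() > index && item.tag.charAt(index) == value;
    }

    // Return the items that pass the filter, preserving their order.
    static List<LemmaTagItem> apply(List<LemmaTagItem> items, Predicate<LemmaTagItem> filter) {
        return items.stream().filter(filter).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Illustrative ambiguity: Czech "ženu" is either the noun "žena" (woman,
        // accusative) or the verb "hnát" (to chase, 1st person singular).
        List<LemmaTagItem> items = Arrays.asList(
                new LemmaTagItem("žena", "NNFS4-----A----"),
                new LemmaTagItem("hnát", "VB-S---1P-AA---"));
        System.out.println(apply(items, byLemma("žena")));       // [žena NNFS4-----A----]
        System.out.println(apply(items, byTagPosition(0, 'V'))); // [hnát VB-S---1P-AA---]
        System.out.println(apply(items, byTagPosition(2, 'F'))); // [žena NNFS4-----A----]
    }
}
```

Modelling each filter as a `Predicate` lets several restrictions be combined with `and`/`or` without special-casing each category.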

Note: Our plan is to eventually incorporate this tool into feat as a plugin for morphological annotation.

How to

Versions

Development notes

License

The code is published under the MIT License (see the license text).

See this page for a synopsis. In short, you can do whatever you want as long as you include the original copyright notice. Also, we are not responsible for anything.

We use several libraries with their own licenses:

Installation

  1. Make sure you have Java Runtime Environment (JRE) version 8 (aka 1.8) installed. You can use this online test to check which version you have.
  2. Download the latest feat version from the Downloads section of this site.
  3. Unpack the file into a directory of your choice.
    • On MS Windows: run feat_vert/feat.bat (to create a desktop shortcut, right-click the file and choose Send to > Desktop).
    • On Linux: run feat_vert/bin/feat_vert.
  4. After installation, run the update (Help > Check for Updates). Note: The updates are unsigned; you can ignore the related warnings.

Input format

Important: The file extension must be "vert" and the encoding is hardcoded to UTF-8. (Yes, this should be user-configurable; see org.purl.jh.law.data.io.VertReader.processLines.)
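A minimal Java sketch of a pre-flight check for these two requirements, e.g. before opening a file in the tool. `VertInputCheck` is hypothetical (it is not the tool's `VertReader`), and it assumes the extension check is case-sensitive, which this page does not specify.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical pre-flight check mirroring the documented input requirements;
// not part of feat-morph itself.
final class VertInputCheck {
    private VertInputCheck() {}

    // ASSUMPTION: the extension check is case-sensitive ("x.vert", not "x.VERT").
    static boolean hasVertExtension(Path file) {
        Path name = file.getFileName();
        return name != null && name.toString().endsWith(".vert");
    }

    // Decodes the file strictly as UTF-8: Files.readAllLines reports invalid byte
    // sequences by throwing java.nio.charset.MalformedInputException.
    static List<String> readVertLines(Path file) throws IOException {
        if (!hasVertExtension(file)) {
            throw new IllegalArgumentException("Expected a .vert file: " + file);
        }
        return Files.readAllLines(file, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        System.out.println(hasVertExtension(Paths.get("corpus.txt"))); // false

        // Write a tiny UTF-8 file with the required extension, then read it back.
        Path tmp = Files.createTempFile("sample", ".vert");
        Files.write(tmp, Arrays.asList("<s>", "ženu\tžena NNFS4-----A----", "</s>"),
                StandardCharsets.UTF_8);
        System.out.println(readVertLines(tmp).size()); // 3
        Files.delete(tmp);
    }
}
```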

The native format of this tool is the so-called vertikala (i.e., a vertical format), an SGML-based format.

Each token is on a separate line, followed by its lemmas, and each lemma is followed by its tags. Each lemma is preceded by a tab and each tag by a space. Each token line must be within a sentence (see below).
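To make the token-line layout concrete, here is a Java sketch that splits one line into the token, its lemmas, and their tags. `TokenLine` is a hypothetical class, not the tool's reader, and the Czech tags in the example are illustrative Prague-style positional tags; this page does not say which tagset the tool expects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical parser for a single token line; not the tool's VertReader.
final class TokenLine {
    final String form;                           // the token itself
    final Map<String, List<String>> tagsByLemma; // lemma -> its tags, in file order

    private TokenLine(String form, Map<String, List<String>> tagsByLemma) {
        this.form = form;
        this.tagsByLemma = Collections.unmodifiableMap(tagsByLemma);
    }

    // Layout assumed from the description: the token, then for each lemma a tab,
    // the lemma, and its tags, each tag preceded by a single space:
    //   token<TAB>lemma1 tag1 tag2<TAB>lemma2 tag3
    static TokenLine parse(String line) {
        String[] fields = line.split("\t");
        Map<String, List<String>> tagsByLemma = new LinkedHashMap<>();
        for (int i = 1; i < fields.length; i++) {
            String[] parts = fields[i].split(" ");
            if (parts[0].isEmpty()) {
                continue; // skip empty fields, e.g. from two consecutive tabs
            }
            List<String> tags = tagsByLemma.computeIfAbsent(parts[0], k -> new ArrayList<>());
            tags.addAll(Arrays.asList(parts).subList(1, parts.length));
        }
        return new TokenLine(fields[0], tagsByLemma);
    }

    public static void main(String[] args) {
        // "ženu": noun lemma "žena" or verb lemma "hnát" (illustrative tags).
        TokenLine t = parse("ženu\tžena NNFS4-----A----\thnát VB-S---1P-AA---");
        System.out.println(t.form);        // ženu
        System.out.println(t.tagsByLemma); // {žena=[NNFS4-----A----], hnát=[VB-S---1P-AA---]}
    }
}
```

A `LinkedHashMap` keeps the lemmas in the order they appear in the file, which is convenient when the items are listed for disambiguation.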

Apart from token lines, the file can contain SGML tags. As in XML, each opening tag should be paired with a closing tag. Unlike in XML, tags do not have to be nested, so the sequence <a> <b> </a> </b> is fine. Currently, the following tags have a special meaning:

  • <p> - paragraph
  • <s> - sentence (each sentence must be within a paragraph)
  • ignoring tags (whatever is within these tags is not offered for disambiguation):
    • <h> - meta information
    • <str> - page info
    • <e> - foreign text
    • <o> - corrected text
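The Java sketch that follows shows one way to decide which token lines end up being offered for disambiguation. Because tags do not have to be nested, it counts the currently open ignoring tags instead of keeping a stack. `IgnoreFilter` is hypothetical, and it assumes each tag sits alone on its own line; the tool's actual reader may handle tags differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of which token lines are offered for disambiguation.
final class IgnoreFilter {
    private IgnoreFilter() {}

    // Tags whose content is not offered for disambiguation.
    static final Set<String> IGNORING = new HashSet<>(Arrays.asList("h", "str", "e", "o"));

    // ASSUMPTION: a tag sits alone on its line, e.g. "<s>", "</s>" or "<s id=1>".
    // Returns the tag name, or null for a token line.
    static String tagName(String line) {
        String t = line.trim();
        if (!t.startsWith("<") || !t.endsWith(">")) {
            return null;
        }
        String inner = t.substring(t.startsWith("</") ? 2 : 1, t.length() - 1).trim();
        int space = inner.indexOf(' ');
        return space < 0 ? inner : inner.substring(0, space);
    }

    static List<String> offeredTokenLines(List<String> lines) {
        // Tags need not nest (<e> <h> </e> </h> is legal), so count how many
        // ignoring tags are currently open instead of keeping a stack.
        int openIgnoring = 0;
        List<String> offered = new ArrayList<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue;
            }
            String name = tagName(line);
            if (name == null) {
                if (openIgnoring == 0) {
                    offered.add(line); // token outside all ignoring tags
                }
            } else if (IGNORING.contains(name)) {
                boolean closing = line.trim().startsWith("</");
                openIgnoring = closing ? Math.max(0, openIgnoring - 1) : openIgnoring + 1;
            }
            // <p>, <s> and any other tags do not change what is offered.
        }
        return offered;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "<p>", "<s>",
                "a\ta X",
                "<e>", "b\tb X", "<h>", "</e>", "c\tc X", "</h>",
                "d\td X",
                "</s>", "</p>");
        System.out.println(offeredTokenLines(lines).size()); // 2 (a and d)
    }
}
```

In the example, token c is still hidden after `</e>` because `<h>` remains open, which is exactly the non-nested case the counter handles.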

If a token line is not within a sentence, or a sentence is not within a paragraph, the missing sentence or paragraph is inserted automatically.
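One possible way to insert missing paragraphs and sentences is sketched in Java here: token lines outside any sentence get an inserted `<s>`, sentences outside any paragraph get an inserted `<p>`, and inserted tags are closed at the next explicit `<p>`/`<s>` tag or at the end of the input. This strategy is an assumption for illustration; `StructureRepair` is hypothetical, and the tool's actual behavior (e.g., where it closes inserted elements) may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of inserting missing <p>/<s> tags; the tool's actual
// strategy is not documented on this page.
final class StructureRepair {
    private StructureRepair() {}

    // Name of a trimmed tag line ("<s>", "</p>", "<s id=1>"), or null for a token line.
    static String tagName(String t) {
        if (!t.startsWith("<") || !t.endsWith(">")) {
            return null;
        }
        String inner = t.substring(t.startsWith("</") ? 2 : 1, t.length() - 1).trim();
        int space = inner.indexOf(' ');
        return space < 0 ? inner : inner.substring(0, space);
    }

    static List<String> repair(List<String> lines) {
        List<String> out = new ArrayList<>();
        int openP = 0, openS = 0;             // explicit <p>/<s> currently open
        boolean autoP = false, autoS = false; // inserted <p>/<s> currently open
        for (String line : lines) {
            String t = line.trim();
            if (t.isEmpty()) {
                out.add(line); // keep blank lines untouched
                continue;
            }
            String name = tagName(t);
            if (name == null) {
                // Token line: make sure it is inside a sentence inside a paragraph.
                if (openS == 0 && !autoS) {
                    if (openP == 0 && !autoP) {
                        out.add("<p>");
                        autoP = true;
                    }
                    out.add("<s>");
                    autoS = true;
                }
            } else if (name.equals("p") || name.equals("s")) {
                boolean closing = t.startsWith("</");
                // Any explicit <p>/<s> tag ends an inserted sentence ...
                if (autoS) {
                    out.add("</s>");
                    autoS = false;
                }
                // ... and an explicit <p> or </p> also ends an inserted paragraph.
                if (name.equals("p") && autoP) {
                    out.add("</p>");
                    autoP = false;
                }
                // An explicit sentence outside any paragraph gets one inserted.
                if (name.equals("s") && !closing && openP == 0 && !autoP) {
                    out.add("<p>");
                    autoP = true;
                }
                int delta = closing ? -1 : 1;
                if (name.equals("p")) {
                    openP = Math.max(0, openP + delta);
                } else {
                    openS = Math.max(0, openS + delta);
                }
            }
            out.add(line);
        }
        if (autoS) out.add("</s>");
        if (autoP) out.add("</p>");
        return out;
    }

    public static void main(String[] args) {
        List<String> in = Arrays.asList(
                "a\ta X", "b\tb X",                // tokens without <p>/<s>
                "<p>", "<s>", "c\tc X", "</s>",
                "d\td X",                          // token in a <p> but without <s>
                "</p>");
        repair(in).forEach(System.out::println);
    }
}
```

With this strategy, consecutive tokens that lack a sentence share a single inserted `<s>`.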

TODO add a sample

Support

Development of this application has been supported by:

The first version was based on feat, a tool for layered error annotation of learner corpora.
