feat-morph is a tool for manual morphological annotation of corpora.
Main Window (Word List) - for navigating through words of the document (browsing, filtering, searching, sorting, etc.)
Da Panel - for displaying and disambiguating morphological information (lemmas, tags) of a word. The panel consists of two windows - a filter box (2a) and a list of items (2b). The list of items displays all the lemma-tag items associated with the current word(s) (selected in the main window). The filter box makes it possible to restrict the items to a particular group, e.g., items with a particular lemma, POS, or gender.
Context View - displays the text of the document with current word(s) highlighted.
Note: Our plan is to eventually incorporate this tool into feat as a plugin for morphological annotation.
See this page for a synopsis. In short: You can do whatever you want as long as you include the original copyright. Also, we are not responsible for anything.
We use several libraries with their own licenses:
- commons-io - Apache 2 license
- glazedlists - LGPL License and MPL License
- guava - Apache 2 license
- jdom - Apache style license
- Make sure you have Java Runtime Environment Version 8 (aka 1.8) installed. You can use this online test to check which version you have.
- Download the latest feat version from the Downloads section of this website.
- Unpack the file to a directory of your choice.
- If using MS Windows: Run feat_vert/feat.bat (you can right-click the file and send it as a shortcut to the desktop).
- If using Linux: Run feat_vert/bin/feat_vert
- Run update (Help > Check for Updates) after installation. Note: The updates are unsigned; you can ignore the warnings about that.
Important: The file's extension has to be "vert" and the encoding is hardcoded to be UTF-8. (Yes, it should be user-configurable; see org.purl.jh.law.data.io.VertReader.processLines.)
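The two constraints above can be checked before a file is opened in the tool. The following is a minimal sketch, not the tool's own VertReader: the class name VertFileCheck and its methods are hypothetical, and it assumes the extension check is case-sensitive, which the text above does not specify.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// Illustrative helper only; this is NOT the tool's VertReader.
class VertFileCheck {

    // Assumption: the suffix must be exactly ".vert" (case-sensitive).
    static boolean hasVertExtension(String fileName) {
        return fileName.endsWith(".vert");
    }

    // Reads all lines as UTF-8. Bytes that are not valid UTF-8 make
    // Files.readAllLines throw an IOException (MalformedInputException).
    static List<String> readVertLines(Path file) throws IOException {
        if (!hasVertExtension(file.getFileName().toString())) {
            throw new IllegalArgumentException("Expected a .vert file: " + file);
        }
        return Files.readAllLines(file, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("Usage: java VertFileCheck <file.vert>");
            return;
        }
        List<String> lines = readVertLines(Paths.get(args[0]));
        System.out.println(lines.size() + " lines read as UTF-8");
    }
}
```

If the read fails with a MalformedInputException, the file is probably stored in another encoding and needs to be converted to UTF-8 first.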
The native format of this tool is the so-called vertikala, an SGML format.
Each token is on a separate line, followed by its lemmas, and each lemma is followed by its tags. Each lemma is preceded by a tab and each tag is preceded by a space. Each token line must be within a sentence (see below).
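The token line layout described above can be read with two splits: on tabs to separate the token from its lemma groups, then on spaces to separate each lemma from its tags. This is a minimal sketch under that reading of the format. The class VertTokenLine and the sample line with its tags are invented for illustration and are not taken from the tool or from a real corpus.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of one token line: token<TAB>lemma tag tag<TAB>lemma tag ...
class VertTokenLine {

    // One lemma together with the tags that follow it.
    static final class Reading {
        final String lemma;
        final List<String> tags;

        Reading(String lemma, List<String> tags) {
            this.lemma = lemma;
            this.tags = tags;
        }
    }

    final String token;
    final List<Reading> readings;

    VertTokenLine(String token, List<Reading> readings) {
        this.token = token;
        this.readings = readings;
    }

    static VertTokenLine parse(String line) {
        // The first tab-separated field is the token; every later field is a lemma group.
        String[] fields = line.split("\t");
        List<Reading> readings = new ArrayList<>();
        for (int i = 1; i < fields.length; i++) {
            // Within a group, the lemma comes first and each tag follows a space.
            String[] parts = fields[i].split(" ");
            List<String> tags = new ArrayList<>(Arrays.asList(parts).subList(1, parts.length));
            readings.add(new Reading(parts[0], tags));
        }
        return new VertTokenLine(fields[0], readings);
    }

    public static void main(String[] args) {
        // Invented sample line; the tags are placeholders, not the tool's tagset.
        VertTokenLine parsed = parse("saw\tsee VBD\tsaw NN VB");
        for (Reading r : parsed.readings) {
            System.out.println(parsed.token + " -> " + r.lemma + " " + r.tags);
        }
    }
}
```

The sketch assumes that lemmas and tags contain neither tabs nor spaces; the tool's actual reader may be more lenient or stricter.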
Besides token lines, the file can contain SGML tags. As in XML, each opening tag should be paired with a closing tag. Unlike in XML, tags do not have to be nested, so a sequence <a> <b> </a> </b> is ok. Currently we give a special meaning to the following tags:
- <p> - paragraph
- <s> - sentence (each sentence must be within a paragraph)
Ignoring tags (whatever is within these tags is not offered for disambiguation):
- <h> - meta information
- <str> - page info
- <e> - foreign text
- <o> - corrected text
If a paragraph or sentence is missing, it is automatically inserted.
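The pairing rule for tags (every opening tag has a closing tag, but proper nesting is not required) can be checked with a counter per tag name instead of a stack. The sketch below is an illustration only, not the tool's validation code. It assumes that each tag stands alone on its own line and that any line not shaped like a tag is a token line; the class name VertTagBalance is hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical checker: tags must be paired, but may overlap (<a> <b> </a> </b>).
class VertTagBalance {

    // Assumption: a tag line looks like <name ...> or </name>.
    private static final Pattern TAG =
            Pattern.compile("<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>");

    static boolean tagsArePaired(List<String> lines) {
        // Number of currently open tags, tracked per tag name.
        Map<String, Integer> open = new HashMap<>();
        for (String line : lines) {
            Matcher m = TAG.matcher(line.trim());
            if (!m.matches()) {
                continue; // treated as a token line
            }
            String name = m.group(2);
            int count = open.getOrDefault(name, 0);
            if (m.group(1).isEmpty()) {
                open.put(name, count + 1);      // opening tag
            } else if (count == 0) {
                return false;                   // closing tag with nothing to close
            } else {
                open.put(name, count - 1);      // closing tag
            }
        }
        // Every opened tag must have been closed by the end of the file.
        for (int count : open.values()) {
            if (count != 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> overlapping = Arrays.asList("<a>", "<b>", "</a>", "</b>");
        List<String> unclosed = Arrays.asList("<p>", "<s>", "word", "</s>");
        System.out.println(tagsArePaired(overlapping));
        System.out.println(tagsArePaired(unclosed));
    }
}
```

A per-name counter is used because a stack would reject the overlapping sequence that the format explicitly allows. The sketch does not model the automatic insertion of missing paragraphs or sentences.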
TODO add a sample
Development of this application has been supported by:
The first version was based on feat, a tool for layered error annotation of learner corpora.