feat-morph (LAW)

feat-morph is a tool for manual morphological annotation of corpora.

Main Window (Word List) - for navigating through the words of the document (browsing, filtering, searching, sorting, etc.)
Da Panel - for displaying and disambiguating the morphological information (lemmas, tags) of a word. The panel consists of two windows: a filter box (2a) and a list of items (2b). The list displays all the lemma-tag items associated with the current word(s) selected in the main window. The filter box restricts the list to a particular group of items, e.g., items with a particular lemma, POS or gender.
Context View - displays the text of the document with the current word(s) highlighted.

Note: Our plan is to eventually incorporate this tool into feat as a plugin for morphological annotation.

License

The code is published under the MIT License.

See this page for a synopsis. In short: you can do whatever you want as long as you include the original copyright notice. Also, we are not responsible for anything.

We use several libraries with their own licenses:

commons-io - Apache 2 license
glazedlists - LGPL License and MPL License
guava - Apache 2 license
jdom - Apache style license

Installation

Make sure you have Java Runtime Environment version 8 (aka 1.8) installed. You can use this online test to check which version you have.
Download the latest feat version from the Downloads section of this website.
Unpack the file to a directory of your choice.
- If using MS Windows: Run feat_vert/feat.bat (you can right-click the file and send it as a shortcut to the desktop).
- If using Linux: Run feat_vert/bin/feat_vert
Run an update (Help > Check for Updates) after installation. Note: The updates are unsigned; you can ignore the warnings about this.

Input format

Important: The file's extension has to be "vert" and the encoding is hardcoded to be UTF-8. (Yes, it should be user-configurable; see org.purl.jh.law.data.io.VertReader.processLines).
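Because the extension and the encoding are fixed, it can help to check a file before opening it in the tool. The following is a minimal, hypothetical Java 8 sketch: the class VertInputCheck and its methods are illustrative only and are not part of feat-morph. It relies on the fact that java.nio.file.Files.readAllLines rejects byte sequences that are not valid UTF-8.

```java
import java.io.IOException;
import java.nio.charset.MalformedInputException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper, not part of feat-morph: checks a file against the
// reader's fixed requirements (".vert" extension, UTF-8 encoding).
final class VertInputCheck {

    private VertInputCheck() {}

    /** Returns true if the file name ends with ".vert". */
    static boolean hasVertExtension(Path file) {
        Path name = file.getFileName();
        return name != null && name.toString().endsWith(".vert");
    }

    /**
     * Reads all lines as UTF-8. Files.readAllLines throws
     * MalformedInputException on byte sequences that are not valid UTF-8,
     * so a file in a legacy 8-bit encoding (e.g. windows-1250) usually fails here.
     */
    static List<String> readUtf8Lines(Path file) throws IOException {
        return Files.readAllLines(file, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java VertInputCheck FILE");
            return;
        }
        Path file = Paths.get(args[0]);
        if (!hasVertExtension(file)) {
            System.err.println("extension is not .vert: " + file);
        }
        try {
            System.out.println(readUtf8Lines(file).size() + " lines read as UTF-8");
        } catch (MalformedInputException e) {
            System.err.println("not valid UTF-8: " + file);
        }
    }
}
```

Note that a file containing only ASCII characters passes this check in any encoding, which is harmless since ASCII is a subset of UTF-8.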

The native format of this tool is the so-called vertikala, an SGML format.

Each token is on a separate line, followed by its lemmas, each lemma followed by its tags. Each lemma is preceded by a tab, each tag by a space. Every token line must be within a sentence (see below).
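To make the line layout concrete, here is a minimal Java sketch that parses one token line of the form `form<TAB>lemma1 tag1 tag2<TAB>lemma2 tag3`. It assumes that tabs separate the fields and that lemmas and tags never contain whitespace. The names TokenLineParser, Token and LemmaTags are hypothetical and do not correspond to feat-morph's own data classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of the token-line layout; not feat-morph's reader.
final class TokenLineParser {

    private TokenLineParser() {}

    /** One lemma together with the tags listed after it. */
    static final class LemmaTags {
        final String lemma;
        final List<String> tags;

        LemmaTags(String lemma, List<String> tags) {
            this.lemma = lemma;
            this.tags = Collections.unmodifiableList(tags);
        }
    }

    /** A token (word form) with its candidate lemmas. */
    static final class Token {
        final String form;
        final List<LemmaTags> lemmas;

        Token(String form, List<LemmaTags> lemmas) {
            this.form = form;
            this.lemmas = Collections.unmodifiableList(lemmas);
        }
    }

    /**
     * Splits the line on tabs: the first field is the token, every further
     * field is a lemma followed by its space-separated tags.
     */
    static Token parse(String line) {
        String[] fields = line.split("\t", -1);
        List<LemmaTags> lemmas = new ArrayList<>();
        for (int i = 1; i < fields.length; i++) {
            String[] parts = fields[i].trim().split(" +");
            if (parts[0].isEmpty()) {
                continue; // skip empty fields, e.g. from a trailing tab
            }
            List<String> tags = new ArrayList<>(Arrays.asList(parts).subList(1, parts.length));
            lemmas.add(new LemmaTags(parts[0], tags));
        }
        return new Token(fields[0], lemmas);
    }
}
```

A line without any tab yields a token with an empty lemma list, and a lemma without tags yields an empty tag list.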

Except for token lines, the file can contain SGML tags. As in XML, each opening tag should be paired with a closing tag. Unlike in XML, tags do not have to be properly nested, so overlapping elements are allowed. Currently, we give a special meaning to the following tags:

<p> - paragraph
<s> - sentence (each sentence must be within a paragraph)
ignoring tags (whatever is within these tags is not offered for disambiguation)
<h> - meta information
<str> - page info
<e> - foreign text
<o> - corrected text
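Because tags need not be nested, a stack of open elements does not fit well here; one simple approach is to keep, for each ignoring tag, a count of its currently open elements. The Java sketch below illustrates this with a hypothetical IgnoringTagFilter class. It assumes that a tag occupies a whole line, that tag names are case-insensitive, and that the ignoring tags are exactly h, str, e and o; it is not the logic of VertReader.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: decides which lines are offered for disambiguation.
final class IgnoringTagFilter {

    /** Tags whose content is not offered for disambiguation. */
    static final Set<String> IGNORING =
            Collections.unmodifiableSet(new HashSet<>(Arrays.asList("h", "str", "e", "o")));

    // A whole line consisting of "<name ...>" or "</name>".
    private static final Pattern TAG = Pattern.compile("<(/?)([A-Za-z][\\w.-]*)[^>]*>");

    // Number of currently open elements per ignoring tag (overlap-safe).
    private final Map<String, Integer> open = new HashMap<>();

    /** Feeds one line; returns true if it is a token line to be offered. */
    boolean accept(String line) {
        String trimmed = line.trim();
        Matcher m = TAG.matcher(trimmed);
        if (m.matches()) {
            String name = m.group(2).toLowerCase(Locale.ROOT);
            if (IGNORING.contains(name)) {
                int count = open.getOrDefault(name, 0);
                boolean closing = !m.group(1).isEmpty();
                // Clamp at zero so a stray closing tag cannot cancel a later opening tag.
                open.put(name, closing ? Math.max(0, count - 1) : count + 1);
            }
            return false; // tag lines are never token lines
        }
        return !trimmed.isEmpty() && open.values().stream().allMatch(n -> n == 0);
    }
}
```

A token line is offered only while every counter is zero, so in an overlapping sequence such as <e> <o> ... </e> ... </o>, tokens stay excluded until the last of the two elements is closed.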

If a paragraph or sentence is missing, it is automatically inserted.

TODO add a sample

Support

Development of this application has been supported by:

Grant LM2011023 - Czech National Corpus, by the Czech Ministry of Education
Geneea

The first version was based on feat, a tool for layered error annotation of learner corpora.
