Overview

Japanese morphological analyzer

Set up

You will need FOMA to compile and run the finite state transducer implementation of our analyzer. With FOMA installed on your $PATH, you can compile the FST by running make.

Using the FST on nouns

Since MeCab already produces a morphological analysis for nouns, our analyzer relies on the segmented noun phrases and part-of-speech IDs returned by MeCab, in the following format:

N#御/30 守り/38$

See scripts/extract-nouns.py for details on how we extract these noun phrases from MeCab's sentence output.
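The input format shown above can be assembled from MeCab's segments in a few lines of Python. This is only a minimal sketch, not the code in scripts/extract-nouns.py; the function name `format_noun_phrase` and the list of (surface form, POS ID) tuples it takes are assumptions made for illustration.

```python
def format_noun_phrase(segments):
    """Join (surface, pos_id) pairs into the N#...$ string expected by the noun FST.

    `segments` is a hypothetical list of (surface form, MeCab POS ID) tuples.
    """
    # Each segment becomes "surface/pos_id"; segments are separated by one space.
    body = " ".join(f"{surface}/{pos_id}" for surface, pos_id in segments)
    # The phrase is wrapped in the "N#" prefix and "$" suffix from the example.
    return f"N#{body}$"


print(format_noun_phrase([("御", 30), ("守り", 38)]))
```

For the two segments in the example, this prints the same string as above, N#御/30 守り/38$.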

You can then pass the above string directly to the FST, for example:

$ echo N#御/30 守り/38$ | flookup -xi nouns.fst | flookup -xi nouns-output.fst | grep -v '^$'
守り+POLITE
$

The output contains at least one line of possible analyses, possibly followed by empty lines, which you can safely ignore.
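If you prefer to drive the noun pipeline from Python rather than the shell, it could be sketched as follows. The flookup flags and FST file names are taken from the command above; the function names `analyze_noun_phrase` and `extract_analyses` are hypothetical, and the sketch assumes flookup is on your $PATH and that the compiled FSTs are in the current directory.

```python
import subprocess


def extract_analyses(flookup_output):
    """Return the non-empty lines of flookup output, one analysis per entry.

    Similar to `grep -v '^$'`, except that whitespace-only lines are dropped too.
    """
    return [line.strip() for line in flookup_output.splitlines() if line.strip()]


def analyze_noun_phrase(phrase):
    """Run a phrase through nouns.fst and then nouns-output.fst (needs flookup)."""
    first = subprocess.run(
        ["flookup", "-xi", "nouns.fst"],
        input=phrase + "\n", capture_output=True, encoding="utf-8", check=True,
    )
    # Feed the first transducer's output into the second, as the shell pipe does.
    second = subprocess.run(
        ["flookup", "-xi", "nouns-output.fst"],
        input=first.stdout, capture_output=True, encoding="utf-8", check=True,
    )
    return extract_analyses(second.stdout)


# The filtering step can be tried without FOMA installed:
print(extract_analyses("守り+POLITE\n\n"))
```

Calling `analyze_noun_phrase("N#御/30 守り/38$")` would then return the analyses as a list of strings with the empty lines already removed.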

Using the FST on verbs and adjectives

For tokens that MeCab has recognized as verbs or adjectives (see scripts/extract-terms.py for how we extract these terms from MeCab output), we can pass them directly through our FST:

$ echo 言っている | flookup verbs.fst
言っている 言う+V+TE+PROG
$

Note that verbs.fst works for both verbs and adjectives.
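Downstream code will usually want the base form and the tags separately. The sketch below assumes, based only on the examples in this README, that an analysis is a base form followed by "+"-separated tags and that the base form itself contains no "+"; the function name `split_analysis` is hypothetical.

```python
def split_analysis(analysis):
    """Split an analysis such as '言う+V+TE+PROG' into (base form, list of tags)."""
    # Assumption: the first "+"-separated field is the base form, the rest are tags.
    base, *tags = analysis.split("+")
    return base, tags


print(split_analysis("言う+V+TE+PROG"))
print(split_analysis("守り+POLITE"))
```

The same function handles the noun analysis from the previous section, since it uses the same "+" convention in the example output.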

Implementation details

The implementation details can be found in report/report.pdf.

Cite

If you use this software, please cite:

@article{sim2014morphological,
  author = {Yanchuan Sim},
  title = {A Morphological Analyzer for Japanese Nouns, Verbs and Adjectives},
  journal = {ArXiv e-prints},
  eprint = {1410.0291},
  primaryClass = "cs.CL",
  volume = {abs/1410.0291},
  year = 2014,
  month = oct,
  url = {http://arxiv.org/abs/1410.0291},
  abstract = {We present an open source morphological analyzer for Japanese nouns, verbs and adjectives. The system builds upon the morphological analyzing capabilities of MeCab to incorporate finer details of classification such as politeness, tense, mood and voice attributes. We implemented our analyzer in the form of a finite state transducer using the open source finite state compiler FOMA toolkit. The source code and tool is available at https://bitbucket.org/skylander/yc-nlplab/},
  code = {https://bitbucket.org/skylander/yc-nlplab/},
}