1. Michael Granger
  2. Linguistics


Linguistics / README

= Linguistics

== Authors

* Michael Granger <ged@FaerieMUD.org>
* Martin Chase <stillflame@FaerieMUD.org>

== Requirements

* Ruby >= 1.8.6

== Optional

* Ruby-WordNet (>= 0.0.5) - adds integration for the Ruby binding for the
  WordNetŽ lexical refrence system.

  URL: http://deveiate.org/projects/Ruby-WordNet

* LinkParser (>= 1.0.5)

  URL: http://deveiate.org/projects/Ruby-LinkParser

== General Information

Linguistics is a framework for building linguistic utilities for Ruby objects
in any language. It includes a generic language-independant front end, a
module for mapping language codes into language names, and a module which
contains various English-language utilities.

=== Method Interface

The Linguistics module comes with a language-independant mechanism for
extending core Ruby classes with linguistic methods.

It consists of three parts: a core linguistics module which contains the
class-extension framework for languages, a generic inflector class that serves
as a delegator for linguistic methods on Ruby objects, and one or more
language-specific modules which contain the actual linguistic functions.

The module works by adding a single instance method for each language named
after the language's two-letter code (or three-letter code, if no two-letter
code is defined by ISO639) to various Ruby classes. This allows many
language-specific methods to be added to objects without cluttering up the
interface or risking collision between them, albeit at the cost of three or four
more characters per method invocation.

If you don't like extending core Ruby classes, the language modules should
also allow you to use them as a function library as well.

For example, the English-language module contains a #plural function which can
be accessed via a method on a core class:

  Linguistics::use( :en )
  # => "geese"
or via the Linguistics::EN::plural function directly:

  include Linguistics::EN
  plural( "goose" )
  # => "geese"

The class-extension mechanism actually uses the functional interface behind
the scenes.

A new feature with the 0.02 release: You can now omit the language-code method
for unambiguous methods by calling Linguistics::use with the +:installProxy+
configuration key, with the language code of the language module whose methods
you wish to be available. For example, instead of having to call:


from the example above, you can now do this:

  Lingusitics::use( :en, :installProxy => :en )
  # => "geese"

More about how this works in the documentation for Linguistics::use.

==== Adding Language Modules

To add a new language to the framework, create a file named the same as the
ISO639 2- or 3-letter language code for the language you're adding. It must be
placed under lib/linguistics/ to be recognized by the linguistics module, but
you can also just require it yourself prior to calling Linguistics::use().
This file should define a module under Linguistics that is an all-caps version
of the code used in the filename. Any methods you wish to be exposed to users
should be declared as module functions (ie., using Module#module_function).

You may also wish to add your module to the list of default languages by
adding the appropriate symbol to the Linguistics::DefaultLanguages array.

For example, to create a Portuguese-language module, create a file called
'lib/linguistics/pt.rb' which contains the following:

  module Linguistics
    module PT
      Linguistics::DefaultLanguages << :pt

	  <language methods here>

See the English language module (lib/linguistics/en.rb) for an example.

=== English Language Module

See the README.english file for a synopsis.

The English-language module currently contains linguistic functions ported
from a few excellent Perl modules:


See the lib/linguistics/en.rb file for specific attributions.

New with version 0.02: integration with the Ruby WordNetŽ and LinkParser
modules (which must be installed separately).

== To Do
* I am planning on improving the results from the infinitive functions, which
  currently return useful results only part of the time. Investigations into
  additional stemming functions and some other strategies are ongoing.

* Martin Chase <stillflame at FaerieMUD dot org> is working on an integration
  module for his excellent work on a Ruby interface to the CMU Link Grammar
  (an english-sentence parser). This will make writing fairly accurate natural
  language parsers in Ruby much easier.

* Suggestions (and patches) for any of these items or additional features are

== Legal

This module is Open Source Software which is Copyright (c) 2003 by The
FaerieMUD Consortium. All rights reserved.

You may use, modify, and/or redistribute this software under the terms of the
Perl Artistic License, a copy of which should have been included in this
distribution (See the file Artistic). If it was not, a copy of it may be
obtained from http://language.perl.com/misc/Artistic.html or