= English Ruby Linguistics Module - Synopsis

This is an overview of the English-language functions provided by the Ruby
Linguistics module as of version 0.02:


== Pluralization

  require 'linguistics'
  Linguistics::use( :en ) # extends Array, String, and Numeric

  "box".en.plural
  # => "boxes"

  "mouse".en.plural
  # => "mice"

  "ruby".en.plural
  # => "rubies"


== Indefinite Articles

  "book".en.a
  # => "a book"

  "article".en.a
  # => "an article"
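
As a rough illustration of the choice being made in the examples above, here is
a minimal sketch that looks only at the first letter of the word. The method
name indefinite_article_sketch is hypothetical and is not part of the
Linguistics API, and the rule is deliberately naive: it does not handle words
such as "hour" or "university", where pronunciation rather than spelling
decides the article.

```ruby
# Hypothetical helper for illustration only; not the Linguistics implementation.
# Picks "an" when the word starts with a vowel letter, otherwise "a".
def indefinite_article_sketch(word)
  article = word =~ /\A[aeiou]/i ? "an" : "a"
  "#{article} #{word}"
end

puts indefinite_article_sketch("book")     # prints "a book"
puts indefinite_article_sketch("article")  # prints "an article"
```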


== Present Participles

  "runs".en.present_participle
  # => "running"

  "eats".en.present_participle
  # => "eating"

  "spies".en.present_participle
  # => "spying"


== Ordinal Numbers

  5.en.ordinal
  # => "5th"

  2004.en.ordinal
  # => "2004th"
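
The usual English rule behind these suffixes can be sketched in a few lines.
This is an assumption about the general rule, not a copy of the module's code,
and ordinal_sketch is a hypothetical name: numbers ending in 11, 12, or 13 take
"th", and otherwise the last digit selects "st", "nd", "rd", or "th".

```ruby
# Hypothetical helper for illustration only; not part of the Linguistics API.
def ordinal_sketch(number)
  abs = number.abs
  suffix =
    if (11..13).cover?(abs % 100)
      "th"  # 11th, 12th, 13th, 111th, 112th, ...
    else
      # The last digit decides; anything other than 1, 2, 3 falls back to "th".
      { 1 => "st", 2 => "nd", 3 => "rd" }.fetch(abs % 10, "th")
    end
  "#{number}#{suffix}"
end

puts ordinal_sketch(5)     # prints "5th"
puts ordinal_sketch(2004)  # prints "2004th"
puts ordinal_sketch(22)    # prints "22nd"
puts ordinal_sketch(112)   # prints "112th"
```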


== Numbers to Words

  5.en.numwords
  # => "five"

  2004.en.numwords
  # => "two thousand and four"

  2385762345876.en.numwords
  # => "two trillion, three hundred and eighty-five billion,
  seven hundred and sixty-two million, three hundred and
  forty-five thousand, eight hundred and seventy-six"
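
The long example above is easier to follow once the number is split into groups
of three digits, each of which is spelled out and followed by its scale word.
The sketch below shows only that grouping step. The name thousands_groups is
hypothetical, the scale list stops at "trillion" purely for brevity, and none of
this is taken from the module's source.

```ruby
# Hypothetical helper for illustration only; not the Linguistics implementation.
# Splits a positive integer into [value, scale_name] pairs, largest group first.
def thousands_groups(number)
  scale_names = ["", "thousand", "million", "billion", "trillion"]
  groups = []
  while number > 0
    number, remainder = number.divmod(1000)
    groups << remainder
  end
  # groups holds the lowest three digits first, so pair with names, then reverse.
  groups.each_with_index.map { |value, index| [value, scale_names[index]] }.reverse
end

p thousands_groups(2385762345876)
# prints [[2, "trillion"], [385, "billion"], [762, "million"], [345, "thousand"], [876, ""]]
```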


== Quantification

  "cow".en.quantify( 5 )
  # => "several cows"

  "cow".en.quantify( 1005 )
  # => "thousands of cows"

  "cow".en.quantify( 20_432_123_000_000 )
  # => "tens of trillions of cows"


== Conjunctions

  animals = %w{dog cow ox chicken goose goat cow dog rooster llama 
  pig goat dog cat cat dog cow goat goose goose ox alpaca}
  puts "The farm has: " + animals.en.conjunction

  # => The farm has: four dogs, three cows, three geese, three goats,
  two oxen, two cats, a chicken, a rooster, a llama, a pig, 
  and an alpaca

Note that 'goose' and 'ox' are both correctly pluralized, and the correct
indefinite article 'an' has been used for 'alpaca'.
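
The counting and joining visible in that output can be approximated in plain
Ruby. The sketch below is an assumption about the general approach rather than
the module's actual code, and conjunction_sketch is a hypothetical name. It
tallies duplicates, lists the most frequent items first (ties keep their order
of first appearance, which is consistent with the example output), and joins
the phrases with commas and a final "and". Pluralization, number words, and
articles are left out to keep it short.

```ruby
# Hypothetical helper for illustration only; not the Linguistics implementation.
def conjunction_sketch(words)
  counts = Hash.new(0)
  words.each { |word| counts[word] += 1 }

  # Most frequent first; break ties by order of first appearance.
  order = words.uniq
  phrases = counts.sort_by { |word, count| [-count, order.index(word)] }
                  .map { |word, count| "#{count} x #{word}" }

  case phrases.length
  when 0 then ""
  when 1 then phrases.first
  when 2 then phrases.join(" and ")
  else "#{phrases[0..-2].join(', ')}, and #{phrases.last}"
  end
end

puts conjunction_sketch(%w{dog cow ox dog cow dog})
# prints "3 x dog, 2 x cow, and 1 x ox"
```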

You can also use the generalization function of the #quantify method to give
general descriptions of object lists instead of literal counts:

  allobjs = []
  ObjectSpace::each_object {|obj| allobjs << obj.class.name}

  puts "The current Ruby objectspace contains: " +
    allobjs.en.conjunction( :generalize => true )

which will print something like:

  The current Ruby objectspace contains: thousands of Strings,
  thousands of Arrays, hundreds of Hashes, hundreds of
  Classes, many Regexps, a number of Ranges, a number of
  Modules, several Floats, several Procs, several MatchDatas,
  several Objects, several IOS, several Files, a Binding, a
  NoMemoryError, a SystemStackError, a fatal, a ThreadGroup,
  and a Thread


== Infinitives

New in version 0.02:

  "leaving".en.infinitive
  # => "leave"

  "left".en.infinitive
  # => "leave"

  "leaving".en.infinitive.suffix
  # => "ing"


== WordNet Integration

Also new in version 0.02, if you have the Ruby-WordNet module installed, you can
look up WordNet synsets using the Linguistics interface:

  # Test to be sure the WordNet module loaded okay.
  Linguistics::EN.has_wordnet?
  # => true

  # Fetch the default synset for the word "balance"
  "balance".en.synset
  # => #<WordNet::Synset:0x40376844 balance (noun): "a state of equilibrium"
   (derivations: 3, antonyms: 1, hypernyms: 1, hyponyms: 3)>

  # Fetch the synset for the first verb sense of "balance"
  "balance".en.synset( :verb )
  # => #<WordNet::Synset:0x4033f448 balance, equilibrate, equilibrize, equilibrise
  (verb): "bring into balance or equilibrium; "She has to balance work and her
  domestic duties"; "balance the two weights"" (derivations: 7, antonyms: 1,
  verbGroups: 2, hypernyms: 1, hyponyms: 5)>

  # Fetch the second noun sense
  "balance".en.synset( 2, :noun )
  # => #<WordNet::Synset:0x404ebb24 balance (noun): "a scale for weighing; depends
  on pull of gravity" (hypernyms: 1, hyponyms: 5)>

  # Fetch the second noun sense's hypernyms (more-general words, like a superclass)
  "balance".en.synset( 2, :noun ).hypernyms
  # => [#<WordNet::Synset:0x404e5620 scale, weighing machine (noun): "a measuring
  instrument for weighing; shows amount of mass" (derivations: 2, hypernyms: 1,
  hyponyms: 2)>]

  # A simpler way of doing the same thing:
  "balance".en.hypernyms( 2, :noun )
  # => [#<WordNet::Synset:0x404e5620 scale, weighing machine (noun): "a measuring
  instrument for weighing; shows amount of mass" (derivations: 2, hypernyms: 1,
  hyponyms: 2)>]

  # Fetch the first hypernym's hypernyms
  "balance".en.synset( 2, :noun ).hypernyms.first.hypernyms
  # => [#<WordNet::Synset:0x404c60b8 measuring instrument, measuring system,
  measuring device (noun): "instrument that shows the extent or amount or quantity
  or degree of something" (hypernyms: 1, hyponyms: 83)>]

  # Find the synset to which both the second noun sense of "balance" and the
  # default sense of "shovel" belong.
  ("balance".en.synset( 2, :noun ) | "shovel".en.synset)
  # => #<WordNet::Synset:0x40473da4 instrumentality, instrumentation (noun): "an
  artifact (or system of artifacts) that is instrumental in accomplishing some
  end" (derivations: 1, hypernyms: 1, hyponyms: 13)>

  # Fetch just the words for the other kinds of "instruments"
  "instrument".en.hyponyms.collect {|synset| synset.words}.flatten
  # => ["analyzer", "analyser", "cautery", "cauterant", "drafting instrument",
  "extractor", "instrument of execution", "instrument of punishment", "measuring
  instrument", "measuring system", "measuring device", "medical instrument",
  "navigational instrument", "optical instrument", "plotter", "scientific
  instrument", "sonograph", "surveying instrument", "surveyor's instrument",
  "tracer", "weapon", "arm", "weapon system", "whip"]

Many more WordNet methods are supported, too many to list here. See the
documentation for the complete list.


== LinkParser Integration

Another new feature in version 0.02 is integration with the Ruby version of the
CMU Link Grammar Parser by Martin Chase. If you have the LinkParser module
installed, you can create linkages from English sentences that let you query for
parts of speech:

  # Test to see whether or not the link parser is loaded.
  Linguistics::EN.has_link_parser?
  # => true

  # Diagram the first linkage for a test sentence
  puts "he is a big dog".en.sentence.linkages.first.to_s
        +---O*---+
        | +--Ds--+
     +Ss+ |  +-A-+
     |  | |  |   |
    he is a big dog

  # Find the verb in the sentence
  "he is a big dog".en.sentence.verb.to_s      
  # => "is"

  # Combined infinitive + LinkParser: Find the infinitive form of the verb of the
  # given sentence.
  "he is a big dog".en.sentence.verb.infinitive
  # => "be"

  # Find the direct object of the sentence
  "he is a big dog".en.sentence.object.to_s
  # => "dog"

  # Look at the raw LinkParser::Word for the direct object of the sentence.
  "he is a big dog".en.sentence.object     
  # => #<LinkParser::Word:0x403da0a0 @definition=[[{@A-}, Ds-, {@M+}, J-], [{@A-},
  Ds-, {@M+}, Os-], [{@A-}, Ds-, {@M+}, Ss+, {@CO-}, {C-}], [{@A-}, Ds-, {@M+},
  Ss+, R-], [{@A-}, Ds-, {@M+}, SIs-], [{@A-}, Ds-, {R+}, {Bs+}, J-], [{@A-}, Ds-,
  {R+}, {Bs+}, Os-], [{@A-}, Ds-, {R+}, {Bs+}, Ss+, {@CO-}, {C-}], [{@A-}, Ds-,
  {R+}, {Bs+}, Ss+, R-], [{@A-}, Ds-, {R+}, {Bs+}, SIs-]], @right=[], @suffix="",
  @left=[#<LinkParser::Connection:0x403da028 @rword=#<LinkParser::Word:0x403da0a0
  ...>, @lword=#<LinkParser::Word:0x403da0b4 @definition=[[Ss-, O+, {@MV+}], [Ss-,
  B-, {@MV+}], [Ss-, P+], [Ss-, AF-], [RS-, Bs-, O+, {@MV+}], [RS-, Bs-, B-,
  {@MV+}], [RS-, Bs-, P+], [RS-, Bs-, AF-], [{Q-}, SIs+, O+, {@MV+}], [{Q-}, SIs+,
  B-, {@MV+}], [{Q-}, SIs+, P+], [{Q-}, SIs+, AF-]],
  @right=[#<LinkParser::Connection:0x403da028 ...>], @suffix="", @left=[],
  @name="is", @position=1>, @subName="*", @name="O", @length=3>], @name="dog",
  @position=4>

  # Combine WordNet + LinkParser to find the definition of the direct object of
  # the sentence
  "he is a big dog".en.sentence.object.gloss
  # => "a member of the genus Canis (probably descended from the common wolf) that
  has been domesticated by man since prehistoric times; occurs in many breeds;
  \"the dog barked all night\""