Commits

Michael Granger committed b6c59a7

Documentation update, refactored Lexicon#initialize.

- Added/updated docs for WordNet::Lexicon and WordNet::Synset
- Refactored Lexicon#initialize into multiple methods to
reduce method complexity.

  • Parent commits 10a3c01

Files changed (9)

File History.rdoc

-== v1.0.0 [2012-01-30] Michael Granger <ged@FaerieMUD.org>
+== v1.0.0 [2012-08-22] Michael Granger <ged@FaerieMUD.org>
 
 Converted to use Sequel and wnsql.
 
 
 == Description
 
-This library is a Ruby interface to WordNet®. WordNet® is an online
-lexical reference system whose design is inspired by current
-psycholinguistic theories of human lexical memory. English nouns, verbs,
-adjectives and adverbs are organized into synonym sets, each
+This library is a Ruby interface to WordNet®[http://wordnet.princeton.edu/].
+WordNet® is an online lexical reference system whose design is inspired
+by current psycholinguistic theories of human lexical memory. English
+nouns, verbs, adjectives and adverbs are organized into synonym sets, each
 representing one underlying lexical concept. Different relations link
 the synonym sets.
 
-It uses WordNet-SQL, which is a conversion of the lexicon flatfiles into
-a relational database format. You can either install the 'wordnet-
-defaultdb' gem, which packages up the SQLite3 version of WordNet-SQL, or
-install your own and point the lexicon at it by passing a Sequel URL to
-the constructor.
+This library uses WordNet-SQL[http://wnsql.sourceforge.net/], which is a
+conversion of the lexicon flatfiles into a relational database format. You
+can either install the 'wordnet-defaultdb' gem, which packages up the
+SQLite3 version of WordNet-SQL, or install your own and point the lexicon
+at it by passing 
+{Sequel connection parameters}[http://sequel.rubyforge.org/rdoc/files/doc/opening_databases_rdoc.html]
+to the constructor.
 
+=== Usage
+
+There are three major parts to this library:
+
+[WordNet::Lexicon]   the interface to the dictionary, used to connect to the
+                     database and look up Words and Synsets.
+[WordNet::Word]      the English word entries in the Lexicon that are mapped
+                     to Synsets via one or more Senses.
+[WordNet::Synset]    the main artifact of WordNet: a "synonym set". These
+                     are connected to one or more Words through a Sense,
+                     and are connected to each other via SemanticLinks.
+
+The other object classes exist mostly as a way of representing relationships
+between the main three:
+
+[WordNet::Sense]         represents a link between one or more Words and
+                         one or more Synsets for one meaning of the word.
+[WordNet::SemanticLink]  represents a link between Synsets
+[WordNet::LexicalLink]   represents a link between Words in Synsets
+[WordNet::Morph]         an interface to a lookup table of irregular word
+                         forms mapped to their base form (lemma)
+
+The last class (WordNet::Model) is the abstract superclass for all the others,
+and inherits most of its functionality from Sequel::Model, the ORM layer
+of the Sequel toolkit. It's mostly just a container for the database
+connection, with some convenience methods to allow the database connection
+to be deferred until runtime instead of when the library loads.
+
+The library also comes with the beginnings of support for the SUMO-WordNet
+mapping:
+
+[WordNet::SumoTerm]      {Suggested Upper Merged Ontology}[http://www.ontologyportal.org/]
+                         terms, with associations back to related Synsets.
+
+This is only supported by a subset of the WordNetSQL databases, and there
+is a fair amount of work left to be done before it's really functional. Drop
+me a note if you're interested in working on this.
 
 
 == Requirements
 
-* Ruby >= 1.9.2
-* Sequel >= 3.29.0
+* Ruby >= 1.9.3
+* Sequel >= 3.38.0
 
 
 == Authors

File lib/wordnet.rb

 	REVISION = %q$Revision: $
 
 	# Abort if not >=1.9.3
-	abort "This version of WordNet requires Ruby 1.9.2 or greater." unless
-		RUBY_VERSION >= '1.9.2'
+	abort "This version of WordNet requires Ruby 1.9.3 or greater." unless
+		RUBY_VERSION >= '1.9.3'
 
 
 	### Lexicon exception - something has gone wrong in the internals of the
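
The guard in this hunk compares RUBY_VERSION as a plain string. That works for the versions involved here, but string comparison is character-by-character rather than numeric. A minimal sketch of a segment-aware check using Gem::Version from RubyGems follows. The `ruby_at_least?` helper is hypothetical and is not part of the library.

```ruby
require 'rubygems'

# Hypothetical helper (not part of WordNet): true if +current+ is at least
# +minimum+, comparing version segments numerically instead of as strings.
def ruby_at_least?( minimum, current=RUBY_VERSION )
	Gem::Version.new( current ) >= Gem::Version.new( minimum )
end

# Plain strings compare character by character, so a two-digit segment sorts low:
'1.10.0' >= '1.9.3'                    # => false
ruby_at_least?( '1.9.3', '1.10.0' )    # => true
ruby_at_least?( '1.9.3', '1.9.2' )     # => false
```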

File lib/wordnet/lexicallink.rb

 
 	set_primary_key [:word1id, :synset1id, :word2id, :synset2id, :linkid]
 
+	##
+	# The WordNet::Sense the link is pointing *from*.
 	many_to_one :origin,
 		:class       => :"WordNet::Sense",
 		:key         => :synset1id,
 		:primary_key => :synsetid
 
+	##
+	# The WordNet::Synset the link is pointing *to*.
 	one_to_many :target,
 		:class       => :"WordNet::Synset",
 		:key         => :synsetid,

File lib/wordnet/lexicon.rb

 require 'wordnet/word'
 
 
-# WordNet lexicon class - abstracts access to the WordNet lexical
-# database, and provides factory methods for looking up words and synsets.
+# WordNet lexicon class - provides access to the WordNet lexical
+# database, and provides factory methods for looking up words[rdoc-ref:WordNet::Word]
+# and synsets[rdoc-ref:WordNet::Synset].
+#
+# == Creating a Lexicon
+#
+# To create a Lexicon, either point it at a database using {Sequel database connection
+# criteria}[http://sequel.rubyforge.org/rdoc/files/doc/opening_databases_rdoc.html]:
+#
+#     lex = WordNet::Lexicon.new( 'postgres://localhost/wordnet30' )
+#     # => #<WordNet::Lexicon:0x7fd192a76668 postgres://localhost/wordnet30>
+#
+#     # Another way of doing the same thing:
+#     lex = WordNet::Lexicon.new( adapter: 'postgres', database: 'wordnet30', host: 'localhost' )
+#     # => #<WordNet::Lexicon:0x7fd192d374b0 postgres>
+#
+# Alternatively, if you have the 'wordnet-defaultdb' gem (which includes an
+# embedded copy of the SQLite WordNet-SQL database) installed, just call ::new
+# without any arguments:
+#
+#     lex = WordNet::Lexicon.new
+#     # => #<WordNet::Lexicon:0x7fdbfac1a358 sqlite:[...]/gems/wordnet-defaultdb-1.0.1
+#     #     /data/wordnet-defaultdb/wordnet30.sqlite>
+#
+# == Looking Up Synsets
+#
+# Once you have a Lexicon created, the main lookup method for Synsets is
+# #[], which will return the first of any Synsets that are found:
+#
+#    synset = lex[ :language ]
+#    # => #<WordNet::Synset:0x7fdbfaa987a0 {105650820} 'language, speech' (noun):
+#    #      [noun.cognition] the mental faculty or power of vocal communication>
+#
+# If you want to look up *all* matching Synsets, use the #lookup_synsets
+# method:
+#
+#    synsets = lex.lookup_synsets( :language )
+#    # => [#<WordNet::Synset:0x7fdbfaac46c0 {105650820} 'language, speech' (noun):
+#    #       [noun.cognition] the mental faculty or power of vocal
+#    #       communication>,
+#    #     #<WordNet::Synset:0x7fdbfaac45a8 {105808557} 'language, linguistic process'
+#    #       (noun): [noun.cognition] the cognitive processes involved
+#    #       in producing and understanding linguistic communication>,
+#    #     #<WordNet::Synset:0x7fdbfaac4490 {106282651} 'language, linguistic
+#    #       communication' (noun): [noun.communication] a systematic means of
+#    #       communicating by the use of sounds or conventional symbols>,
+#    #     #<WordNet::Synset:0x7fdbfaac4378 {106304059} 'language, nomenclature,
+#    #       terminology' (noun): [noun.communication] a system of words used to
+#    #       name things in a particular discipline>,
+#    #     #<WordNet::Synset:0x7fdbfaac4260 {107051975} 'language, lyric, words'
+#    #       (noun): [noun.communication] the text of a popular song or musical-comedy
+#    #       number>,
+#    #     #<WordNet::Synset:0x7fdbfaac4120 {107109196} 'language, oral communication,
+#    #       speech, speech communication, spoken communication, spoken language,
+#    #       voice communication' (noun): [noun.communication] (language)
+#    #       communication by word of mouth>]
+#
+# Sometimes, the first Synset isn't necessarily what you want; you want to look up
+# a particular one. Both #[] and #lookup_synsets also provide several
+# ways of filtering or selecting synsets.
+#
+# The first is the ability to select one based on its offset (its 1-based position in the results):
+#
+#    lex[ :language, 2 ]
+#    # => #<WordNet::Synset:0x7ffa78e74d78 {105808557} 'language, linguistic
+#    #       process' (noun): [noun.cognition] the cognitive processes involved in
+#    #       producing and understanding linguistic communication>
+#
+# You can also select one with a particular word in its definition:
+#
+#    lex[ :language, 'sounds' ]
+#    # => #<WordNet::Synset:0x7ffa78ee01b8 {106282651} 'linguistic communication,
+#    #       language' (noun): [noun.communication] a systematic means of
+#    #       communicating by the use of sounds or conventional symbols>
+#
+# If you're using a database that supports using regular expressions (e.g.,
+# PostgreSQL), you can use that to select one with a matching definition:
+#
+#    lex[ :language, %r:name.*discipline: ]
+#    # => #<WordNet::Synset:0x7ffa78f235a8 {106304059} 'language, nomenclature,
+#    #       terminology' (noun): [noun.communication] a system of words used
+#    #       to name things in a particular discipline>
+#
+# You can also select certain parts of speech:
+#
+#    lex[ :right, :noun ]
+#    # => #<WordNet::Synset:0x7ffa78f30b68 {100351000} 'right' (noun):
+#    #       [noun.act] a turn toward the side of the body that is on the south
+#    #       when the person is facing east>
+#    lex[ :right, :verb ]
+#    # => #<WordNet::Synset:0x7ffa78f09590 {200199659} 'correct, right, rectify'
+#    #       (verb): [verb.change] make right or correct>
+#    lex[ :right, :adjective ]
+#    # => #<WordNet::Synset:0x7ffa78ea8060 {300631391} 'correct, right'
+#    #       (adjective): [adj.all] free from error; especially conforming to
+#    #       fact or truth>
+#    lex[ :right, :adverb ]
+#    # => #<WordNet::Synset:0x7ffa78e5b2d8 {400032299} 'powerful, mightily,
+#    #       mighty, right' (adverb): [adv.all] (Southern regional intensive)
+#    #       very; to a great degree>
+#
+# or by lexical domain, which is a more-specific part of speech (see
+# <tt>WordNet::Synset.lexdomains.keys</tt> for the list of valid ones):
+#
+#    lex.lookup_synsets( :right, 'verb.social' )
+#    # => [#<WordNet::Synset:0x7ffa78d817e0 {202519991} 'redress, compensate,
+#    #       correct, right' (verb): [verb.social] make reparations or amends
+#    #       for>]
+#
 class WordNet::Lexicon
 	extend Loggability
 	include WordNet::Constants
 	# Loggability API -- log to the WordNet module's logger
 	log_to :wordnet
 
-	# class LogTracer
-	# 	def method_missing( sym, msg, &block )
-	# 		if msg =~ /does not exist/
-	# 			$stderr.puts ">>> DOES NOT EXIST TRACE"
-	# 			$stderr.puts( caller(1).grep(/wordnet/i) )
-	# 		end
-	# 	end
-	# end
-
 
 	# Add the logger device to the default options after it's been loaded
 	WordNet::DEFAULT_DB_OPTIONS.merge!( :logger => [Loggability[WordNet]] )
-	# WordNet::DEFAULT_DB_OPTIONS.merge!( :logger => [LogTracer.new] )
 
 
 	### Get the Sequel URI of the default database, if it's installed.
 	### Create a new WordNet::Lexicon object that will use the database connection specified by
 	### the given +dbconfig+.
 	def initialize( *args )
-		uri = if args.empty?
-				WordNet::Lexicon.default_db_uri or
-					raise WordNet::LexiconError,
-						"No default WordNetSQL database! You can install it via the " +
-						"wordnet-defaultdb gem, or download a version yourself from " +
-						"http://sourceforge.net/projects/wnsql/"
+		if args.empty?
+			self.initialize_with_defaultdb( args.shift )
+		elsif args.first.is_a?( String )
+			self.initialize_with_uri( *args )
+		else
+			self.initialize_with_opthash( args.shift )
+		end
 
-			elsif args.first.is_a?( String )
-				args.shift
-			else
-				nil
-			end
+		@db.sql_log_level = :debug
+		WordNet::Model.db = @db
+	end
 
-		options = WordNet::DEFAULT_DB_OPTIONS.merge( args.shift || {} )
+
+	### Connect to the default WordNet DB (installed via the wordnet-defaultdb gem)
+	### using an optional options hash.
+	def initialize_with_defaultdb( options )
+		uri = WordNet::Lexicon.default_db_uri or raise WordNet::LexiconError,
+			"No default WordNetSQL database! You can install it via the " +
+			"wordnet-defaultdb gem, or download a version yourself from " +
+			"http://sourceforge.net/projects/wnsql/"
+		@db = self.connect( uri, options )
+	end
+
+
+	### Connect to the WordNet DB using a URI and an optional options hash.
+	def initialize_with_uri( uri, options=nil )
+		@db = self.connect( uri, options )
+	end
+
+
+	### Connect to the WordNet DB using a connection options hash.
+	def initialize_with_opthash( options )
+		@db = self.connect( nil, options )
+	end
+
+
+	### Connect to the WordNet DB and return a Sequel::Database object.
+	def connect( uri, options )
+		options = WordNet::DEFAULT_DB_OPTIONS.merge( options || {} )
 
 		if uri
 			self.log.debug "Connecting using uri + options style: uri = %s, options = %p" %
 				[ uri, options ]
-			@db = Sequel.connect( uri, options )
+			return Sequel.connect( uri, options )
 		else
 			self.log.debug "Connecting using hash style connect: options = %p" % [ options ]
-			@db = Sequel.connect( options )
+			return Sequel.connect( options )
 		end
-
-		@uri = @db.uri
-		self.log.debug "  setting model db to: %s" % [ @uri ]
-
-		@db.sql_log_level = :debug
-		WordNet::Model.db = @db
 	end
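
The refactoring above splits the constructor into one small method per calling style. A minimal sketch of that dispatch follows, using a toy class that records the arguments it would hand to the connection call instead of opening a database. `SketchLexicon`, `DEFAULT_URI`, and `DEFAULT_OPTIONS` are hypothetical stand-ins, and the sketch assumes the options argument of the URI branch is optional so that a bare URI string is accepted.

```ruby
# Toy stand-in for the dispatch in Lexicon#initialize. It only records what
# would be passed to the connection call; no database is involved.
class SketchLexicon
	DEFAULT_URI     = 'sqlite:/hypothetical/path/wordnet30.sqlite'  # assumption
	DEFAULT_OPTIONS = { :sql_log_level => :debug }                  # assumption

	attr_reader :connect_args

	# Pick an initializer based on the shape of the arguments.
	def initialize( *args )
		if args.empty?
			initialize_with_defaultdb( nil )
		elsif args.first.is_a?( String )
			initialize_with_uri( *args )
		else
			initialize_with_opthash( args.shift )
		end
	end

	def initialize_with_defaultdb( options )
		@connect_args = connect( DEFAULT_URI, options )
	end

	# +options+ defaults to nil so that a URI on its own is enough.
	def initialize_with_uri( uri, options=nil )
		@connect_args = connect( uri, options )
	end

	def initialize_with_opthash( options )
		@connect_args = connect( nil, options )
	end

	# Merge in the defaults and return the argument list for the connect call.
	def connect( uri, options )
		options = DEFAULT_OPTIONS.merge( options || {} )
		return uri ? [ uri, options ] : [ options ]
	end
end

SketchLexicon.new.connect_args.first
# => "sqlite:/hypothetical/path/wordnet30.sqlite"
SketchLexicon.new( 'postgres://localhost/wordnet30' ).connect_args.first
# => "postgres://localhost/wordnet30"
SketchLexicon.new( adapter: 'postgres' ).connect_args.length
# => 1
```

Keeping the merge of default options in a single `connect` method means each branch only has to decide which URI, if any, to pass along.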
 
 

File lib/wordnet/morph.rb

 class WordNet::Morph < WordNet::Model( :morphs )
 	include WordNet::Constants
 
+	#                 Table "public.morphs"
+	#  Column  |         Type          |     Modifiers
+	# ---------+-----------------------+--------------------
+	#  morphid | integer               | not null default 0
+	#  morph   | character varying(70) | not null
+	# Indexes:
+	#     "pk_morphs" PRIMARY KEY, btree (morphid)
+	#     "unq_morphs_morph" UNIQUE, btree (morph)
+	# Referenced by:
+	#     TABLE "morphmaps" CONSTRAINT "fk_morphmaps_morphid" FOREIGN KEY (morphid) REFERENCES morphs(morphid)
+	#
+
 	set_primary_key :morphid
 
-	many_to_one :word,
+	#                 Table "public.morphmaps"
+	#  Column  |     Type     |           Modifiers
+	# ---------+--------------+-------------------------------
+	#  wordid  | integer      | not null default 0
+	#  pos     | character(1) | not null default NULL::bpchar
+	#  morphid | integer      | not null default 0
+	# Indexes:
+	#     "pk_morphmaps" PRIMARY KEY, btree (morphid, pos, wordid)
+	#     "k_morphmaps_morphid" btree (morphid)
+	#     "k_morphmaps_wordid" btree (wordid)
+	# Foreign-key constraints:
+	#     "fk_morphmaps_morphid" FOREIGN KEY (morphid) REFERENCES morphs(morphid)
+	#     "fk_morphmaps_wordid" FOREIGN KEY (wordid) REFERENCES words(wordid)
+	many_to_many :words,
 		:join_table => :morphmaps,
-		:left_key => :wordid,
-		:right_key => :morphid
+		:right_key  => :wordid,
+		:left_key   => :morphid
 
 
 	### Return the stringified word; alias for #lemma.
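
The key swap in this hunk matters because, in a Sequel many_to_many association, :left_key names the join-table column that points at the current model and :right_key names the column that points at the associated model. The sketch below resolves such an association by hand over in-memory rows to show the effect of swapping them. The sample rows and the `associated_rows` helper are made up for illustration and do not come from the WordNet-SQL data.

```ruby
# Made-up sample rows standing in for the morphs, words and morphmaps tables.
MORPHS    = [ { :morphid => 1, :morph => 'geese' } ]
WORDS     = [ { :wordid => 10, :lemma => 'goose' }, { :wordid => 11, :lemma => 'gander' } ]
MORPHMAPS = [ { :morphid => 1, :wordid => 10, :pos => 'n' } ]

# Hypothetical helper: resolve a many-to-many association by hand. +left_key+
# is matched against the current row, +right_key+ against the associated rows.
def associated_rows( row, join_rows, targets, left_key, right_key )
	ids = join_rows.select {|j| j[left_key] == row[left_key] }.map {|j| j[right_key] }
	return targets.select {|t| ids.include?(t[right_key]) }
end

# Keys in the corrected order: the morph finds its word.
associated_rows( MORPHS.first, MORPHMAPS, WORDS, :morphid, :wordid ).map {|w| w[:lemma] }
# => ["goose"]

# Keys swapped: the morph row has no :wordid, so nothing matches.
associated_rows( MORPHS.first, MORPHMAPS, WORDS, :wordid, :morphid )
# => []
```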

File lib/wordnet/sumoterm.rb

 require 'wordnet/constants'
 
 
-# SUMO terms
+# Experimental support for the WordNet mapping for the {Suggested Upper Merged 
+# Ontology}[http://www.ontologyportal.org/] (SUMO).
+# This is still a work in progress, and isn't supported by all of the WordNet-SQL
+# databases.
 class WordNet::SumoTerm < WordNet::Model( :sumoterms )
 	include WordNet::Constants
 
+	#                       Table "public.sumoterms"
+	#         Column         |          Type          |     Modifiers
+	# -----------------------+------------------------+--------------------
+	#  sumoid                | integer                | not null default 0
+	#  sumoterm              | character varying(128) | not null
+	#  ischildofattribute    | boolean                |
+	#  ischildoffunction     | boolean                |
+	#  ischildofpredicate    | boolean                |
+	#  ischildofrelation     | boolean                |
+	#  iscomparisonop        | boolean                |
+	#  isfunction            | boolean                |
+	#  isinstance            | boolean                |
+	#  islogical             | boolean                |
+	#  ismath                | boolean                |
+	#  isquantifier          | boolean                |
+	#  isrelationop          | boolean                |
+	#  issubclass            | boolean                |
+	#  issubclassofattribute | boolean                |
+	#  issubclassoffunction  | boolean                |
+	#  issubclassofpredicate | boolean                |
+	#  issubclassofrelation  | boolean                |
+	#  issubrelation         | boolean                |
+	#  issuperclass          | boolean                |
+	#  issuperrelation       | boolean                |
+	# Indexes:
+	#     "pk_sumoterms" PRIMARY KEY, btree (sumoid)
+	#     "unq_sumoterms_sumoterm" UNIQUE, btree (sumoterm)
+	# Referenced by:
+	#     TABLE "sumomaps" CONSTRAINT "fk_sumomaps_sumoid" FOREIGN KEY (sumoid) REFERENCES sumoterms(sumoid)
+	#     TABLE "sumoparsemaps" CONSTRAINT "fk_sumoparsemaps_sumoid" FOREIGN KEY (sumoid) REFERENCES sumoterms(sumoid)
 	set_primary_key :sumoid
 
+
+	#
+	# Associations
+	#
+
 	# SUMO Term -> [ SUMO Map ] -> [ Synset ]
+
+	#             Table "public.sumomaps"
+	#   Column   |     Type     |     Modifiers
+	# -----------+--------------+--------------------
+	#  synsetid  | integer      | not null default 0
+	#  sumoid    | integer      | not null default 0
+	#  sumownrel | character(1) | not null
+	# Indexes:
+	#     "pk_sumomaps" PRIMARY KEY, btree (synsetid)
+	#     "k_sumomaps_sumoid" btree (sumoid)
+	#     "k_sumomaps_sumownrel" btree (sumownrel)
+	# Foreign-key constraints:
+	#     "fk_sumomaps_sumoid" FOREIGN KEY (sumoid) REFERENCES sumoterms(sumoid)
+	#     "fk_sumomaps_synsetid" FOREIGN KEY (synsetid) REFERENCES synsets(synsetid)
+
+	##
+	# WordNet::Synsets that are related to this term
 	many_to_many :synsets,
 		:join_table => :sumomaps,
-		:left_key => :sumoid,
-		:right_key => :synsetid
+		:left_key   => :sumoid,
+		:right_key  => :synsetid
 
 end # class WordNet::SumoTerm
 

File lib/wordnet/synset.rb

 # WordNet lexical database. A synonym set is a set of words that are
 # interchangeable in some context.
 #
-#   ss = WordNet::Synset[ 106286395 ]
-#   # => #<WordNet::Synset @values={:synsetid=>106286395, :pos=>"n",
-#       :lexdomainid=>10,
-#       :definition=>"a unit of language that native speakers can identify"}>
+# You can either fetch the synset from a connected Lexicon:
 #
-#   ss.words.map( &:lemma )
-#   # => ["word"]
+#    lexicon = WordNet::Lexicon.new( 'postgres://localhost/wordnet30' )
+#    ss = lexicon[ :first, 'time' ]
+#    # => #<WordNet::Synset:0x7ffbf2643bb0 {115265518} 'commencement, first,
+#    #       get-go, offset, outset, start, starting time, beginning, kickoff,
+#    #       showtime' (noun): [noun.time] the time at which something is
+#    #       supposed to begin>
 #
-#   ss.hypernyms
-#   # => [#<WordNet::Synset @values={:synsetid=>106284225, :pos=>"n",
-#       :lexdomainid=>10,
-#       :definition=>"one of the natural units into which [...]"}>]
+# or if you've already created a Lexicon, use its connection indirectly to
+# look up a Synset by its ID:
 #
-#   ss.hyponyms
-#   # => [#<WordNet::Synset @values={:synsetid=>106287620, :pos=>"n",
-#       :lexdomainid=>10,
-#       :definition=>"a word or phrase spelled by rearranging [...]"}>,
-#     #<WordNet::Synset @values={:synsetid=>106287859, :pos=>"n",
-#       :lexdomainid=>10,
-#       :definition=>"a word (such as a pronoun) used to avoid [...]"}>,
-#     #<WordNet::Synset @values={:synsetid=>106288024, :pos=>"n",
-#       :lexdomainid=>10,
-#       :definition=>"a word that expresses a meaning opposed [...]"}>,
-#     ...
-#    ]
+#    ss = WordNet::Synset[ 115265518 ]
+#    # => #<WordNet::Synset:0x7ffbf257e928 {115265518} 'commencement, first,
+#    #       get-go, offset, outset, start, starting time, beginning, kickoff,
+#    #       showtime' (noun): [noun.time] the time at which something is
+#    #       supposed to begin>
+#
+# You can fetch a list of the lemmas (base forms) of the words included in the
+# synset:
+#
+#    ss.words.map( &:lemma )
+#    # => ["commencement", "first", "get-go", "offset", "outset", "start",
+#    #     "starting time", "beginning", "kickoff", "showtime"]
+#
+# But the primary reason for a synset is its lexical and semantic links to
+# other words and synsets. For instance, its *hypernym* is the equivalent
+# of its superclass: it's the class of things of which the receiving
+# synset is a member.
+#
+#    ss.hypernyms
+#    # => [#<WordNet::Synset:0x7ffbf25c76c8 {115180528} 'point, point in
+#    #        time' (noun): [noun.time] an instant of time>]
+#
+# The synset's *hyponyms*, on the other hand, are kind of like its
+# subclasses:
+#
+#    ss.hyponyms
+#    # => [#<WordNet::Synset:0x7ffbf25d83b0 {115142167} 'birth' (noun):
+#    #       [noun.time] the time when something begins (especially life)>,
+#    #     #<WordNet::Synset:0x7ffbf25d8298 {115268993} 'threshold' (noun):
+#    #       [noun.time] the starting point for a new state or experience>,
+#    #     #<WordNet::Synset:0x7ffbf25d8180 {115143012} 'incipiency,
+#    #       incipience' (noun): [noun.time] beginning to exist or to be
+#    #       apparent>,
+#    #     #<WordNet::Synset:0x7ffbf25d8068 {115266164} 'starting point,
+#    #       terminus a quo' (noun): [noun.time] earliest limiting point>]
 #
 class WordNet::Synset < WordNet::Model( :synsets )
 	include WordNet::Constants
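
The superclass/subclass analogy in the Synset documentation above can be sketched with a toy class that links concepts upward (hypernyms, more general) and downward (hyponyms, more specific). `SketchSynset` and `add_hypernym` are hypothetical and do not touch the database; the labels are borrowed from the example output in the documentation.

```ruby
# Toy stand-in for a synset: a label plus links up (hypernyms) and down (hyponyms).
class SketchSynset
	attr_reader :label, :hypernyms, :hyponyms

	def initialize( label )
		@label     = label
		@hypernyms = []
		@hyponyms  = []
	end

	# Record that +other+ is a more-general concept than the receiver, and
	# register the receiver as one of its more-specific concepts.
	def add_hypernym( other )
		@hypernyms << other
		other.hyponyms << self
		return self
	end
end

point = SketchSynset.new( 'point in time' )
start = SketchSynset.new( 'start' ).add_hypernym( point )
birth = SketchSynset.new( 'birth' ).add_hypernym( start )

start.hypernyms.map( &:label )   # => ["point in time"]
start.hyponyms.map( &:label )    # => ["birth"]
```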

File lib/wordnet/word.rb

 
 	set_primary_key :wordid
 
+	#
+	# Associations
+	#
+
 	##
 	# The WordNet::Sense objects that relate the word with its Synsets
 	one_to_many :senses,
 		:right_key => :morphid
 
 
+	#
+	# Dataset methods
+	#
+
+	##
+	# Return a dataset for words matching the given +lemma+.
+	def_dataset_method( :by_lemma ) {|lemma| filter( lemma: lemma ) }
+
+
+	#
+	# Other methods
+	#
+
 	### Return the stringified word; alias for #lemma.
 	def to_s
 		return self.lemma