
BibSonomy GoogleDocs Add-on / Indexing

Indexing-Package

The Indexing package contains all classes and functions for creating and maintaining an index and for retrieving documents from it.


TreeIndex

Indexing is currently done by a tree-structured index class named TreeIndex. To store documents in the TreeIndex, call the createIndex method. It first saves the Interhashes of all documents received from the BibSonomyAPI into an array. It then parses the title of each document: for every character of every word in the title, the algorithm looks for a TreeIndexNode for that character in the index. If there is no TreeIndexNode for the character, the index creates a new one and puts it into the index HashMap with that character as the key. The next character is then parsed inside that TreeIndexNode using the same algorithm, so each word becomes a chain of nodes, one per character.
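The insertion step can be sketched in Python as follows. This is a minimal sketch, not the add-on's actual code: it assumes documents arrive as (interhash, title) pairs, that titles are lower-cased and split on whitespace, and that each node stores interhashes. The names create_index, children and documents are hypothetical stand-ins for createIndex and the node fields described above.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TreeIndexNode:
    # Child nodes keyed by the next character of a word (hypothetical field name).
    children: dict[str, TreeIndexNode] = field(default_factory=dict)
    # Interhashes of documents whose title contains a word ending at this node.
    documents: list[str] = field(default_factory=list)


class TreeIndex:
    def __init__(self) -> None:
        self.interhashes: list[str] = []
        # Top level of the tree: first character of a word -> node.
        self.index: dict[str, TreeIndexNode] = {}

    def create_index(self, posts: list[tuple[str, str]]) -> None:
        """Index (interhash, title) pairs, one tree node per character (sketch)."""
        # Step 1: remember the interhashes of all given documents.
        self.interhashes = [interhash for interhash, _ in posts]
        # Step 2: walk every character of every word of every title.
        for interhash, title in posts:
            for word in title.lower().split():
                level = self.index
                node = None
                for char in word:
                    # Reuse the node for this character or create a new one.
                    if char not in level:
                        level[char] = TreeIndexNode()
                    node = level[char]
                    # The next character is looked up inside this node.
                    level = node.children
                # The last node of the chain records the document.
                if node is not None and interhash not in node.documents:
                    node.documents.append(interhash)
```

After indexing "Info Systems", the word "info" ends at the node reached via i, n, f, o, so only that node lists the document; "information" passes through the same nodes but ends deeper in the tree.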

Words that should not be indexed can be put into a stopword list; they are then excluded from the index.
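A minimal sketch of the stopword filter, assuming titles are lower-cased and split on whitespace before insertion. The word list and the function name words_to_index are hypothetical; the add-on's actual stopword list is not documented on this page.

```python
from __future__ import annotations

# Hypothetical stopword list for illustration only.
STOPWORDS = frozenset({"a", "an", "and", "for", "in", "of", "on", "the", "to"})


def words_to_index(title: str, stopwords: frozenset[str] = STOPWORDS) -> list[str]:
    """Return the lower-cased title words that should be inserted into the index."""
    # Splitting on whitespace is an assumption; punctuation is not stripped here.
    return [word for word in title.lower().split() if word not in stopwords]
```

Only the returned words would then be walked character by character into the tree.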

The TreeIndex has its own search algorithm. To find documents in the index, call the retrieveMatchingDocuments method with an array of search terms. An additional option controls whether documents whose words only partially match a search term are returned as well. For example, with this option set to true, a search for 'info' also returns documents containing 'information'.
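The search can be sketched as follows. This is a hedged illustration, assuming the partial option is a boolean argument and that results for several search terms are combined as a union (the page does not say how multiple terms are combined). The add helper, the field names and the use of sets instead of lists are hypothetical; retrieve_matching_documents only mirrors the name of retrieveMatchingDocuments. In a character tree, a partial match is naturally a prefix match: documents whose words continue below the node reached by the search term.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TreeIndexNode:
    children: dict[str, TreeIndexNode] = field(default_factory=dict)
    documents: set[str] = field(default_factory=set)        # words ending at this node
    child_documents: set[str] = field(default_factory=set)  # words continuing below it


class TreeIndex:
    def __init__(self) -> None:
        self.index: dict[str, TreeIndexNode] = {}

    def add(self, interhash: str, title: str) -> None:
        """Hypothetical insertion helper used to build a small test index."""
        for word in title.lower().split():
            level = self.index
            path: list[TreeIndexNode] = []
            for char in word:
                node = level.setdefault(char, TreeIndexNode())
                path.append(node)
                level = node.children
            # Every node before the last one matches this word only partially.
            for node in path[:-1]:
                node.child_documents.add(interhash)
            path[-1].documents.add(interhash)

    def _find(self, term: str) -> TreeIndexNode | None:
        """Follow the chain of nodes for the term's characters, if it exists."""
        level, node = self.index, None
        for char in term.lower():
            node = level.get(char)
            if node is None:
                return None
            level = node.children
        return node

    def retrieve_matching_documents(self, terms: list[str], partial: bool = False) -> set[str]:
        result: set[str] = set()
        for term in terms:
            node = self._find(term)
            if node is None:
                continue
            result |= node.documents
            if partial:
                # Also include documents whose words merely start with the term.
                result |= node.child_documents
        return result
```

With "Information Retrieval" and "Info Systems" indexed, searching for "info" returns only the second document by default and both documents when partial is true.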

There is currently no way to remove documents from the index or to add documents after the index has been created, because neither is needed at the moment.

TreeIndexNode

TreeIndexNode is the node class of the TreeIndex. Each node contains the documents whose title words match the chain of TreeIndexNodes (one per character) leading to it. A node always keeps a reference to its root TreeIndex, which it uses to determine whether words that only partially match the search term should be returned as well.

Each node has two lists of documents: the documents list contains all documents that match the chain of TreeIndexNodes completely, and the childDocuments list contains all documents that match it only partially.
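The node structure described in this section can be sketched as follows. This is a minimal sketch, assuming the partial-match setting lives on the TreeIndex as a hypothetical match_partially field that every node reads through its root reference; the names add_word, matching_documents and child_documents are illustrative, not taken from the add-on.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TreeIndexNode:
    root: TreeIndex  # back-reference to the owning index
    children: dict[str, TreeIndexNode] = field(default_factory=dict)
    documents: list[str] = field(default_factory=list)        # chain matches the word completely
    child_documents: list[str] = field(default_factory=list)  # chain matches only a prefix of the word

    def matching_documents(self) -> list[str]:
        # The node asks its root index whether partial matches are wanted.
        if self.root.match_partially:
            return self.documents + self.child_documents
        return list(self.documents)


class TreeIndex:
    def __init__(self, match_partially: bool = False) -> None:
        # Hypothetical flag consulted by every node via its root reference.
        self.match_partially = match_partially
        self.index: dict[str, TreeIndexNode] = {}

    def add_word(self, word: str, interhash: str) -> None:
        level = self.index
        for position, char in enumerate(word):
            if char not in level:
                level[char] = TreeIndexNode(root=self)  # every node shares the same root
            node = level[char]
            if position < len(word) - 1:
                node.child_documents.append(interhash)  # the word continues below this node
            else:
                node.documents.append(interhash)        # the word ends at this node
            level = node.children
```

Storing the flag once on the root means it can be changed for the whole index without touching any node.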
