                               _       _           _
                     _ __ ___ (_)_ __ (_)_ __   __| | _____  __
                    | '_ ` _ \| | '_ \| | '_ \ / _` |/ _ \ \/ /
                    | | | | | | | | | | | | | | (_| |  __/>  <
                    |_| |_| |_|_|_| |_|_|_| |_|\__,_|\___/_/\_\

                     a self-contained embeddable search engine

. OVERVIEW .

  Minindex is a library implementing a small search engine. The rationale is
that many applications would benefit from a search facility without having to
rely on heavyweight solutions such as Lucene.

  Minindex has no runtime dependency and is distributed under the MIT Licence.


. STATUS .

  Minindex is a very young library. It currently offers the following features:
   * indexes can be saved on disk
   * indexes can be partitioned into separate subsets (for instance, you can
     index mail subjects and mail bodies as different subsets). This allows
     queries on different subsets (e.g. 'which mails have a title containing
     "car" and a body containing "red"')
   * a minimal query language supporting subsets
   * a Lua bridge (used in development for rapid prototyping)

and limitations:
   * no update of previously indexed documents (you have to re-index
     the whole document set at the moment)


. INSTALLATION .

  Minindex is currently distributed on Bitbucket. The simplest way to obtain the
library is to clone the Mercurial repository:

--- SHELL ---
   $ hg clone https://bitbucket.org/jonathanderque/minindex
-------------

  Minindex requires CMake (http://cmake.org) to compile. Lua is also needed if
you want to build the optional Lua bridge.

  To compile minindex from source, proceed as follows:

--- SHELL ---
   $ mkdir build
   $ cd build
   $ cmake ..
     [...]
   $ make
     [...]
   $ make tests   # optional
-------------

. USAGE .

  There are two sides to minindex: adding documents to the index, and querying
the index for search results. Both require a minindex handle, created as
follows:

--- C ---
  #include <minindex.h>

  minindex midx = minindex_load("test_index/", MININDEX_CREATE | MININDEX_CLEAR);
---------

  The first argument of minindex_load specifies the location of the index on
disk. The second argument sets options for the loading process:
MININDEX_CREATE opens a blank index if the location does not exist, and
MININDEX_CLEAR makes sure we start with a blank index even if one already
exists at the specified location.

  When the index handle is no longer used, you can release the resources taken
by the handle by calling minindex_destroy():

--- C ---
  minindex_destroy(midx);
---------


Indexing documents:

  Adding documents to the index is a two-step process. You first build a
document structure and fill it with content, then add this document to the
index. A document consists of a list of tokenized words. Each word can
optionally be tagged with an index subset name. The whole process looks like
this:

--- C ---
  minindex_document doc = minindex_document_create();
  assert(doc != NULL);

  minindex_document_add_word(doc, "title", "moby");
  minindex_document_add_word(doc, "title", "dick");
  minindex_document_add_word(doc, "text", "call");
  minindex_document_add_word(doc, "text", "me");
  minindex_document_add_word(doc, "text", "ishmael");
  /* ... */

  minindex_add_document(midx, doc);
  minindex_document_destroy(doc);
---------
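
  The README does not say how words should be tokenized before they are added
to a document. The sketch below is one possible approach, not part of minindex:
tokenize() and word_callback are hypothetical names, and the rule used here
(lowercase runs of ASCII letters and digits) is an assumption.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper, not part of minindex. Words longer than
 * MAX_WORD - 1 characters are truncated. */
#define MAX_WORD 64

typedef void (*word_callback)(const char *word, void *user_data);

/* Splits `text` into lowercase alphanumeric words and passes each one to
 * `emit` (which may be NULL). Returns the number of words found. */
static size_t tokenize(const char *text, word_callback emit, void *user_data)
{
  char word[MAX_WORD];
  size_t len = 0;
  size_t count = 0;

  assert(text != NULL);

  for (;; ++text) {
    unsigned char c = (unsigned char)*text;

    if (isalnum(c)) {
      if (len < MAX_WORD - 1)
        word[len++] = (char)tolower(c);
    } else if (len > 0) {
      /* A non-alphanumeric character (or the end of the string) ends a word. */
      word[len] = '\0';
      if (emit != NULL)
        emit(word, user_data);
      ++count;
      len = 0;
    }
    if (c == '\0')
      break;
  }
  return count;
}
```

  A callback passed as `emit` could forward each word to
minindex_document_add_word(doc, "text", word), with the document handle carried
in user_data.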


  Minindex buffers documents to minimize disk access. You can force the index
to be written to disk at any point by calling minindex_write:

--- C ---
  minindex_write(midx);
---------


Searching documents:

  The function minindex_search allows you to query the index. The query
specifies the words as well as the index subsets to search. You should also
specify the maximum number of search results you want to obtain. The returned
list of results only contains document ids and associated scores. If you want
more information about the result documents (such as user data), you should use
the minindex_get_document() function:

--- C ---
   char *search_query = "title:moby title:dick";
   minindex_search_result_t search_results;
   int i;

   search_results = minindex_search(midx, search_query, 50);
   assert(search_results.status == MININDEX_OK);

   for (i = 0; i < search_results.hit_count; ++i) {
     /* fetch the full document for this hit (e.g. to read its user data) */
     json_document search_result =
       minindex_get_document(midx, search_results.hits[i].doc_id);

     printf("%i. (score: %i)\n", i, search_results.hits[i].score);
     json_document_destroy(search_result);
   }
   minindex_search_cleanup(search_results);
---------
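
  The query string used above is a space-separated list of "subset:word"
terms. As a minimal sketch, such a string could be assembled from a list of
words as shown below. build_query() is a hypothetical helper, not part of
minindex, and it only reproduces the syntax seen in the example above; any
other query syntax minindex may support is not covered.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of minindex: writes "subset:word" terms
 * separated by single spaces into `buf`. Returns 0 on success, or -1 if
 * `buf` is too small (its contents are then unspecified). */
static int build_query(char *buf, size_t size, const char *subset,
                       const char *const *words, size_t word_count)
{
  size_t used = 0;

  assert(buf != NULL && size > 0 && subset != NULL);
  buf[0] = '\0';

  for (size_t i = 0; i < word_count; ++i) {
    /* Every term after the first is preceded by a single space. */
    int n = snprintf(buf + used, size - used, "%s%s:%s",
                     i == 0 ? "" : " ", subset, words[i]);
    if (n < 0 || (size_t)n >= size - used)
      return -1;
    used += (size_t)n;
  }
  return 0;
}
```

  The resulting buffer can then be passed as the second argument of
minindex_search.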



. LUA INTERFACE .

  The Lua bridge is very close to the C API. The following snippet performs
the same operations as the C code above:

--- LUA ---
  midx = minindex.load("test_index/", minindex.create_mode)
  midx_doc = minindex_document.create()
  midx_doc:add_word("title","moby")
  midx_doc:add_word("title","dick")
  midx_doc:add_word("text","call")
  midx_doc:add_word("text","me")
  midx_doc:add_word("text","ishmael")
  midx:add_document(midx_doc)
  midx_doc:destroy()
  midx:write()
  query = "title:moby title:dick"
  search_results = midx:search(query, 50)
  for i,v in ipairs(search_results) do
    local doc = midx:get_document(v.doc_id);
    print(i, v.score);
    doc:destroy()
  end
  midx:destroy()
-----------