Pull requests

#3 Declined
Repository
ont ont
Branch
default
Repository
codernity codernity
Branch
default

Multiple keys for index.

Author
  1. ont
Reviewers
Description

This is an implementation of multiple keys per record in an index. I guess it can be useful for building full-text search indexes or similar. Yes, key collisions are not good and they slow down insertion. But queries that return many entries are also useless for text searching. So maybe it is possible to create such indexes with a reasonable number of hash collisions (for example, by indexing only words longer than 4 characters, etc.).
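As a rough illustration of the word-length filter mentioned above, here is a minimal sketch of how one text record could be turned into several index keys. The helper name `word_keys` and its `min_len` parameter are hypothetical and are not part of CodernityDB or of this pull request.

```python
import re


def word_keys(text, min_len=5):
    """Yield each distinct word of at least min_len characters, in order.

    Hypothetical helper: min_len=5 corresponds to "longer than 4 chars".
    """
    seen = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        if len(word) >= min_len and word not in seen:
            seen.add(word)
            yield word


keys = list(word_keys("The quick brown fox jumps over the quick river"))
# keys == ['quick', 'brown', 'jumps', 'river']
```

Short words such as "the" or "over" are dropped, so fewer records end up sharing each key.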

Example code:

from CodernityDB.database_thread_safe import ThreadSafeDatabase
from CodernityDB.hash_index import HashIndex


class DigitsIndex( HashIndex ):
    def __init__( self, *args, **kwargs ):
        kwargs[ 'key_format' ] = 'i'
        kwargs[ 'hash_lim' ] = 100
        HashIndex.__init__( self, *args, **kwargs )

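    def make_key_value( self, data ):
        # Yield one (key, value) pair per two-digit window of 'num'.
        val = data.get( 'num' )
        # A plain return ends the generator (raising StopIteration here breaks under PEP 479);
        # negative numbers are skipped because floor division would never reach 0.
        if type( val ) is not int or val < 0: return

        while val:
            yield val % 100, None
            val //= 10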

    def make_key( self, key ):
        return key


def main():
    db = ThreadSafeDatabase( '/tmp/tut1' )

    if db.exists():
        db.open()
        #db.reindex()
    else:
        db.create()

        digs_ind = DigitsIndex( db.path, 'digit' )
        db.add_index( digs_ind )

        for x in xrange( 1300 ):
            db.insert( dict(num=x) )
            if x % 100 == 0:
                print 'already inserted %s numbers...' % x


    for curr in db.get_many( 'digit', 10, limit = -1, with_doc = True ):
        print curr


if __name__ == '__main__':
    main()
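
To make the key generation in the example easier to follow, here is a standalone sketch of the same loop outside of any index class. The function name `digit_keys` is an assumption made for illustration; it only mirrors the arithmetic of `make_key_value` above and does not touch CodernityDB.

```python
def digit_keys(num):
    """Return the keys the example loop would yield for one number."""
    keys = []
    if not isinstance(num, int) or num < 0:
        return keys
    while num:
        keys.append(num % 100)  # last two digits of the remaining prefix
        num //= 10              # drop the last digit
    return keys


# 1234 -> windows "34", "23", "12" and the leading digit "1"
assert digit_keys(1234) == [34, 23, 12, 1]
# 105 contains the adjacent digits "10", so a lookup for key 10 would match it
assert 10 in digit_keys(105)
# 120 does not contain "10" as adjacent digits
assert 10 not in digit_keys(120)
```

Under this scheme a query for key 10 matches every number whose decimal form contains the adjacent digits "10". A number with repeated digits, such as 1111, yields the same key more than once.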

Pros: easy searching for records.

Cons: slower db.reindex() and record insertion (due to the increased number of hash collisions).
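
The cost can be estimated for the example data without running the database. The sketch below only counts how many index entries the 1300 numbers would produce and how many distinct keys they fall into. It assumes one entry per yielded key, and the helper `digit_keys` is a hypothetical stand-in for the example's key loop.

```python
from collections import Counter


def digit_keys(num):
    """Yield the same keys as the example's loop for a non-negative int."""
    while num:
        yield num % 100
        num //= 10


entries_per_key = Counter()
for num in range(1300):
    entries_per_key.update(digit_keys(num))

total_entries = sum(entries_per_key.values())  # 4089 entries for 1300 records
distinct_keys = len(entries_per_key)           # 100 keys (0..99)
average = float(total_entries) / distinct_keys # roughly 41 entries per key
```

So each record contributes about three entries on average, and every key is shared by dozens of entries, which is where the extra work on insert and reindex comes from.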

Comments (2)

  1. codernity repo owner

    Hey,

    Thank you for this pull request, but I have to decline it because it would slow down non-multiple indexes. Since this is the second pull request for this feature, and the second one we can't accept, we will push our own MultipleIndex solution right now. That solution uses "index power" to achieve exactly that without overhead on non-multiple indexes. Expect that commit in an hour or so.

    Thank you very much for your approach to solving it. But you forgot about delete / update operations.

  2. ont author

    My bad... :) Exactly, I forgot about delete/update! Thank you for explaining my mistakes. Your commit should be useful!
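
The delete/update point raised in the comments can be illustrated with a toy in-memory index. This is only a sketch under stated assumptions: the class `MultiKeyIndex`, its methods and the helper `digit_keys` are hypothetical, and none of this is the MultipleIndex solution mentioned above or any CodernityDB API.

```python
from collections import defaultdict


def digit_keys(record):
    """Yield two-digit window keys for record['num'] (hypothetical helper)."""
    val = record.get('num')
    if not isinstance(val, int) or val < 0:
        return
    while val:
        yield val % 100
        val //= 10


class MultiKeyIndex(object):
    """Toy index that stores several keys per record."""

    def __init__(self, make_keys):
        self.make_keys = make_keys      # callable: record -> iterable of keys
        self.by_key = defaultdict(set)  # key -> ids of records holding that key
        self.keys_of = {}               # record id -> keys currently indexed

    def insert(self, rec_id, record):
        keys = set(self.make_keys(record))
        self.keys_of[rec_id] = keys
        for key in keys:
            self.by_key[key].add(rec_id)

    def delete(self, rec_id):
        # Every key of the record has to be removed, not just one of them.
        for key in self.keys_of.pop(rec_id, ()):
            self.by_key[key].discard(rec_id)
            if not self.by_key[key]:
                del self.by_key[key]

    def update(self, rec_id, record):
        # Simplest correct approach: drop all old keys, then index the new ones.
        self.delete(rec_id)
        self.insert(rec_id, record)

    def get_many(self, key):
        return sorted(self.by_key.get(key, ()))


idx = MultiKeyIndex(digit_keys)
idx.insert('a', {'num': 1234})   # keys 34, 23, 12, 1
idx.insert('b', {'num': 512})    # keys 12, 51, 5
before = idx.get_many(12)        # both records contain "12"
idx.update('a', {'num': 77})     # 'a' must disappear from keys 34, 23, 12, 1
after = idx.get_many(12)
```

Without the cleanup in `delete` and `update`, record 'a' would still be returned for key 12 after its value changed to 77.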