Issue #8 resolved

Custom storage is not used

ont NA created an issue

Consider this example code:

from CodernityDB.database_thread_safe import ThreadSafeDatabase
from CodernityDB.hash_index import UniqueHashIndex
from CodernityDB.storage import Storage

import zlib
import marshal

class ZipStorage( Storage ):
    def __init__( self, db_path, name ):
        super( ZipStorage, self ).__init__( db_path, name )

    def data_to( self, data ):
        for k, v in data.iteritems():
            if k[ :2 ] == 'z_':  ## we want to zip this field
                data[ k ] = zlib.compress( v )

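        print '>>> we are used!!!'  ## debug marker: only printed if this storage is really used
        exit( 0 )                   ## stop right away, so the return below is never reached
        return marshal.dumps( data )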

    def data_from( self, data ):
        data = marshal.loads( data )

        for k, v in data.iteritems():
            if k[ :2 ] == 'z_':  ## saved data is in compressed format
                data[ k ] = zlib.decompress( v )

        return data


class ZipUniqueHashIndex( UniqueHashIndex ):
    custom_header = """import zlib"""
    def __init__(self, *args, **kwargs):
        print '===>', args, kwargs
        UniqueHashIndex.__init__( self, *args, **kwargs )


def main():
    db = ThreadSafeDatabase( '/tmp/tut_zip' )

    if db.exists():
        db.open()
        #db.reindex()
    else:
        id_ind = ZipUniqueHashIndex( db.path, 'id', storage_class = 'ZipStorage' )
        print '-->', id_ind.storage
        db.set_indexes([ id_ind ])

        db.create()
        print '-->', id_ind.storage

        for x in xrange( 1000 ):  ## insert 1000 * 100000 bytes = ~100 MB of data
            db.insert( dict( num=x, z_ttt = 'a' * 100000 ) )

if __name__ == '__main__':
    main()

With this code I want to use a custom storage for the main (id) index. I used the same approach as in the Salsa20Storage example ( http://labs.codernity.com/codernitydb/examples.html ), but it seems that example is broken too.

Expected output:

....
>>> we are used!!!

But I receive this output:

===> ('/tmp/tut_zip', 'id') {'storage_class': 'ZipStorage'}
--> None
===> ('/tmp/tut_zip', 'id') {}
--> None

The size of the database directory also indicates that the custom storage is not used.
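
For reference, the per-field transformation ZipStorage is meant to apply can be exercised without CodernityDB. This is a Python 3 adaptation, not the reporter's code: values are bytes, items() replaces iteritems(), and the hypothetical helpers zip_fields/unzip_fields build a new dict instead of mutating the document passed to data_to.

```python
import marshal
import zlib


def zip_fields(doc):
    """Serialize doc, compressing every value whose key starts with 'z_'."""
    out = {}
    for k, v in doc.items():
        # Only 'z_'-prefixed fields are compressed; their values must be bytes.
        out[k] = zlib.compress(v) if k.startswith('z_') else v
    return marshal.dumps(out)


def unzip_fields(raw):
    """Inverse of zip_fields: unmarshal, then decompress the 'z_' fields."""
    doc = marshal.loads(raw)
    for k, v in list(doc.items()):
        if k.startswith('z_'):
            doc[k] = zlib.decompress(v)
    return doc


doc = {'num': 1, 'z_ttt': b'a' * 100000}
raw = zip_fields(doc)
restored = unzip_fields(raw)
```

The caller's dict is left untouched, so the same document can be serialized again safely.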

Comments (5)

  1. codernity repo owner

    Hey,

    It seems that this example in the documentation is outdated. Here is how it should look:

    #!/usr/bin/env python
    
    
    from CodernityDB.hash_index import UniqueHashIndex
    from CodernityDB.storage import Storage
    from CodernityDB.database import Database
    from hashlib import sha256
    
    import salsa20
    import marshal
    import os
    
    
    class Salsa20Storage(Storage):
    
        def __init__(self, db_path, name, enc_key):
            super(Salsa20Storage, self).__init__(db_path, name)
            self.enc_key = enc_key
    
        def data_from(self, data):
            iv = data[:8]
            sal = salsa20.Salsa20(self.enc_key, iv, 20)
            s_data = sal.decrypt(data[8:])
            m_data = marshal.loads(s_data)
            return m_data
    
        def data_to(self, data):
            iv = os.urandom(8)
            m_data = marshal.dumps(data)
            sal = salsa20.Salsa20(self.enc_key, iv, 20)
            s_data = sal.encrypt(m_data)
            return iv + s_data
    
    
    class EncUniqueHashIndex(UniqueHashIndex):
    
        __enc_key = 'a' * 32
    
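        # The index code is saved by the database and loaded again later, so it
        # must import the storage class itself (assumes this file is saved as
        # demo_secure_storage.py).
        custom_header = """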
    from demo_secure_storage import Salsa20Storage
    from hashlib import sha256"""
    
        def __init__(self, *args, **kwargs):
            super(EncUniqueHashIndex, self).__init__(*args, **kwargs)
    
        @property
        def enc_key(self):
            return self.__enc_key
    
        @enc_key.setter
        def enc_key(self, value):
            if len(value) != 32:
                self.__enc_key = sha256(value).digest()
            else:
                self.__enc_key = value
            self.storage.enc_key = self.__enc_key
    
        def _open_storage(self):
            if not self.storage:
                self.storage = Salsa20Storage(
                    self.db_path, self.name, self.enc_key)
                self.storage.open()
    
        def _create_storage(self):
            if not self.storage:
                self.storage = Salsa20Storage(
                    self.db_path, self.name, self.enc_key)
                self.storage.create()
    
    
    def main():
        db = Database('/tmp/demo_secure')
        key = 'abcdefgh'
        id_ind = EncUniqueHashIndex(db.path, 'id')
        db.set_indexes([id_ind])
        db.create()
        db.id_ind.enc_key = key
    
        for x in xrange(100):
            db.insert(dict(x=x, data='testing'))
    
        db.close()
        dbr = Database('/tmp/demo_secure')
        dbr.open()
        dbr.id_ind.enc_key = key
    
        for curr in dbr.all('id', limit=5):
            print curr
    
    
    if __name__ == "__main__":
        main()
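
    The record format used by Salsa20Storage above is an 8-byte random IV followed by the encrypted marshal payload, and data_from splits at byte 8 to recover the IV. The Python 3 sketch below shows only that framing. It does not use the salsa20 module: a SHA-256 counter-mode XOR keystream stands in for the cipher purely for illustration and must not be used for real encryption. normalize_key, toy_data_to and toy_data_from are hypothetical names; normalize_key follows the same rule as the enc_key setter.

```python
import hashlib
import marshal
import os


def normalize_key(value):
    """Same rule as the enc_key setter: keys that are not 32 bytes get hashed."""
    return value if len(value) == 32 else hashlib.sha256(value).digest()


def _keystream(key, iv, length):
    """Toy keystream (SHA-256 in counter mode) standing in for Salsa20. NOT secure."""
    out = b''
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + iv + counter.to_bytes(8, 'big')).digest()
        counter += 1
    return out[:length]


def _xor(data, stream):
    return bytes(a ^ b for a, b in zip(data, stream))


def toy_data_to(key, data):
    iv = os.urandom(8)                # fresh 8-byte IV for every record
    m_data = marshal.dumps(data)
    return iv + _xor(m_data, _keystream(key, iv, len(m_data)))


def toy_data_from(key, raw):
    iv, body = raw[:8], raw[8:]       # the IV is stored in front of the ciphertext
    return marshal.loads(_xor(body, _keystream(key, iv, len(body))))
```

    Because the IV travels with each record, data_from needs nothing but the key, which is why the index only has to hand enc_key to the storage.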
    

    As you can see, you can no longer pass a storage_class argument. We changed this because there was no way to initialize the storage class correctly (in this case Salsa20Storage, whose constructor needs an extra enc_key argument).
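
    To see why a class name alone cannot be enough, here is a self-contained mock of the index.py logic quoted in the next comment. The classes are hypothetical stand-ins, not CodernityDB's own: the storage class is looked up by name in globals() (which only sees names defined in the same module) and is always called as s(db_path, name), so a storage whose constructor also needs enc_key cannot be built that way. Overriding _create_storage, as the updated example does, sidesteps the lookup.

```python
class Storage(object):
    """Hypothetical stand-in for CodernityDB.storage.Storage."""
    def __init__(self, db_path, name):
        self.db_path, self.name = db_path, name
        self.created = False

    def create(self):
        self.created = True


class KeyedStorage(Storage):
    """Like Salsa20Storage: needs an extra constructor argument."""
    def __init__(self, db_path, name, enc_key):
        super(KeyedStorage, self).__init__(db_path, name)
        self.enc_key = enc_key


class NameBasedIndex(object):
    """Models the name-based lookup: storage chosen by a class-name string."""
    storage_class = 'Storage'

    def __init__(self, db_path, name, storage_class=None):
        self.db_path, self.name, self.storage = db_path, name, None
        if storage_class:
            self.storage_class = storage_class

    def _create_storage(self):
        s = globals()[self.storage_class]
        if not self.storage:
            # Only (db_path, name) are passed, so enc_key can never reach s().
            self.storage = s(self.db_path, self.name)
        self.storage.create()


class KeyedIndex(NameBasedIndex):
    """The approach from the updated example: override the hook itself."""
    enc_key = b'a' * 32

    def _create_storage(self):
        if not self.storage:
            self.storage = KeyedStorage(self.db_path, self.name, self.enc_key)
        self.storage.create()
```

    With the name-based path, NameBasedIndex(path, 'id', storage_class='KeyedStorage') raises TypeError when the storage is created, while KeyedIndex builds the keyed storage without any lookup.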

    So it's a documentation "bug", not a DB bug. Thank you for pointing it out! (The documentation has been updated.)

  2. ont NA reporter

    Hmm... Is this slightly "overloaded" API kept for backward-compatibility reasons? I do see this code in index.py:

        def _create_storage(self):
            s = globals()[self.storage_class]
            if not self.storage:
                self.storage = s(self.db_path, self.name)
            self.storage.create()
    

    In my opinion it would be simpler to create the storage object in __init__; the parent class could then call self.storage.create() directly, without the odd _open_storage and _create_storage wrappers.

    In that case the index code would look cleaner:

    class EncUniqueHashIndex(UniqueHashIndex):
    
        __enc_key = 'a' * 32
    
        custom_header = """
    from demo_secure_storage import Salsa20Storage
    from hashlib import sha256"""
    
        def __init__(self, *args, **kwargs):
            super(EncUniqueHashIndex, self).__init__(*args, **kwargs)
    
            ## now magic part: we create it in one place and one time!
            self.storage = Salsa20Storage(self.db_path, self.name, self.enc_key)
    
        @property
        def enc_key(self):
            return self.__enc_key
    
        @enc_key.setter
        def enc_key(self, value):
            if len(value) != 32:
                self.__enc_key = sha256(value).digest()
            else:
                self.__enc_key = value
            self.storage.enc_key = self.__enc_key
    
  3. ont NA reporter

    OK, I missed a major point: you can't create the storage immediately, am I right? Do you first need to set up the options and then create the storage object with those parameters? If so, what about a setup method that just creates the storage object and assigns it to self.storage?

    Then the self._create_storage() calls could be replaced with:

    if not self.storage:
        self._setup_storage()
    self.storage.create()
    

    The _open_storage and _create_storage methods with duplicated code look so unnatural... =(
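
    A minimal sketch of that proposal, using hypothetical stand-in classes (Storage, KeyedStorage, BaseIndex and EncIndex are not CodernityDB's API, and _setup_storage is only the name proposed in this comment): both storage methods share one hook, and a custom index overrides just the hook.

```python
class Storage(object):
    """Hypothetical stand-in for a CodernityDB storage."""
    def __init__(self, db_path, name):
        self.db_path, self.name = db_path, name
        self.state = None

    def create(self):
        self.state = 'created'

    def open(self):
        self.state = 'opened'


class KeyedStorage(Storage):
    """A storage that needs an extra argument, like Salsa20Storage."""
    def __init__(self, db_path, name, enc_key):
        super(KeyedStorage, self).__init__(db_path, name)
        self.enc_key = enc_key


class BaseIndex(object):
    """Hypothetical base index: one hook builds the storage, two callers use it."""
    def __init__(self, db_path, name):
        self.db_path, self.name, self.storage = db_path, name, None

    def _setup_storage(self):
        # Default storage; subclasses override only this method.
        self.storage = Storage(self.db_path, self.name)

    def _create_storage(self):
        if not self.storage:
            self._setup_storage()
        self.storage.create()

    def _open_storage(self):
        if not self.storage:
            self._setup_storage()
        self.storage.open()


class EncIndex(BaseIndex):
    enc_key = b'a' * 32

    def _setup_storage(self):
        self.storage = KeyedStorage(self.db_path, self.name, self.enc_key)
```

    The create/open logic lives in one place; the trade-off, raised in the reply below, is that subclasses of a custom index then inherit whichever _setup_storage their parent defined.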

  4. codernity repo owner

    Yes,

    You can't create the storage when you create the index object. That's because CodernityDB first stores the index code in indexes/....py and then loads the index object from it. The same happens in both embedded mode and remote mode. The reason is that you probably don't want to have your Index classes in scope every time you use the database, do you?
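
    A toy model of that mechanism, assuming a loader that re-creates the class from its saved source and calls it with only the path and the index name (load_index and INDEX_SOURCE are hypothetical; this is not CodernityDB's actual loader). Extra constructor keywords such as storage_class do not survive the round trip, which is consistent with the second '===>' line in the report, where the index is constructed again with empty kwargs.

```python
# Source text of an index class, as it might be saved under indexes/.
INDEX_SOURCE = '''
class ZipIndex(object):
    def __init__(self, db_path, name, **kwargs):
        self.db_path = db_path
        self.name = name
        self.kwargs = kwargs  # records what the constructor received
'''


def load_index(source, class_name, db_path, name):
    """Hypothetical loader: rebuild the class from text, then instantiate it."""
    namespace = {}
    exec(source, namespace)
    cls = namespace[class_name]
    # The loader only knows the path and the index name; keyword arguments
    # given to the original object were never stored anywhere.
    return cls(db_path, name)


ns = {}
exec(INDEX_SOURCE, ns)
original = ns['ZipIndex']('/tmp/tut_zip', 'id', storage_class='ZipStorage')
reloaded = load_index(INDEX_SOURCE, 'ZipIndex', '/tmp/tut_zip', 'id')
```

    Anything the storage needs therefore has to be reachable from the index code itself (class attributes, custom_header imports, overridden methods), not from constructor arguments.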

    That setup method seems OK, BUT it may not be usable with other storages. Still, it's probably better to provide a 'minimized' example instead of the duplicated one. We will look into that. But remember that CodernityDB is very flexible, so you can use it in many different ways.

    Please also remember that index code uses inheritance, so that "simplification" might cause consistency problems: you would have an inherited _setup_storage method alongside a "custom" one for your custom storage. I don't think it's a good idea because of that.

    BTW, you should not rely on the code in index.py; it's an interface-like class. We probably have to clean it up a bit more, or at least explain its purpose better.
