Remove DB open/close in data.calculate, allow for persistent (non-file-based) databases

Issue #115 new
Zachary Ulissi created an issue

The way Data() is currently written, the database handle is opened, then closed and discarded; new items are calculated; then the database is reopened, the items are added, and the handle is closed and discarded again. Because of this structure, it is impossible to use a persistent (in-memory) database without connecting to a separate database scheme such as an in-memory SQL database.

Every call to Data.calculate_items() opens and closes the database. During a minimization this happens on every force call, producing huge files and disk I/O bottlenecks for images that will probably never be used again (unlike a training set, where you probably want to keep the fingerprints and derivatives around so you can train again later).

As an example, consider a naive in-memory database, a simple dictionary:

class MemoryDatabase:

    def __init__(self, filename):
        """Create an empty in-memory store. filename is accepted only for
        compatibility with file-based databases; nothing is written to disk."""
        self._memdict = {}  # All items live in this dictionary.

    @classmethod
    def open(cls, filename, flag=None):
        """Only present for compatibility with shelve. flag is ignored; this
        format is always capable of both reading and writing."""
        return cls(filename=filename)

    def close(self):
        """Only present for compatibility with shelve."""
        return

    def keys(self):
        """Return a list of the keys held in memory."""
        return list(self._memdict.keys())

    def __len__(self):
        return len(self.keys())

    def __setitem__(self, key, value):
        self._memdict[key] = value

    def __getitem__(self, key):
        if key in self._memdict:
            return self._memdict[key]
        else:
            raise KeyError(str(key))

    def update(self, newitems):
        """Copy every key/value pair from the mapping newitems."""
        for key, value in newitems.items():
            self[key] = value

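This scheme won't work with Data() as currently written, because Data() expects that the database can be closed, discarded, and reopened without losing information. Here every call to open() builds a new, empty dictionary, so anything stored before close() is gone on the next open().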
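To make the failure concrete, here is a minimal sketch that uses a condensed copy of the class above and mimics the open / write / close / reopen cycle. The file name, key, and value are hypothetical and stand in for whatever Data() would actually store.

```python
class MemoryDatabase:
    """Condensed copy of the dictionary-backed class from this issue."""

    def __init__(self, filename):
        self._memdict = {}

    @classmethod
    def open(cls, filename, flag=None):
        return cls(filename=filename)

    def close(self):
        return

    def keys(self):
        return list(self._memdict.keys())

    def __setitem__(self, key, value):
        self._memdict[key] = value


# First cycle: open, store one item, close, and discard the handle.
db = MemoryDatabase.open('fingerprints')  # hypothetical file name
db['image-0'] = [0.1, 0.2]                # hypothetical key and value
db.close()
del db

# Second cycle: open() constructs a brand-new object with an empty dict.
db = MemoryDatabase.open('fingerprints')
lost = 'image-0' not in db.keys()
print(lost)  # True: the item stored in the first cycle is gone
```

Any in-memory backend would need either a handle that Data() keeps alive between calls or an open() that returns the same underlying storage each time.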
