Issue #44 resolved

Cross platform file caching

Anonymous created an issue

I'm looking for a cross-platform file caching backend, as dogpile.cache.backends.file requires fcntl, which is Unix-only. In my case I'll have a single thread doing read/write operations, so I don't mind not having a read/write lock.

  • The docs should state more explicitly that the file backend is not compatible with Windows
  • The file backend should be cross platform
  • The file backend should have an optional rw lock

Comments (16)

  1. Mike Bayer repo owner

    I'm confused. The file backend already uses a read/write lock; note:

    def acquire_read_lock(self, wait):
        return self._acquire(wait, os.O_RDONLY, fcntl.LOCK_SH)
    
    def acquire_write_lock(self, wait):
        return self._acquire(wait, os.O_WRONLY, fcntl.LOCK_EX)
    

    As far as making it cross-platform, I've no idea how a read/write file lock would be implemented on Windows (I know how to do it with threading.Condition, if you want an option that works within only one process).
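    The in-process alternative mentioned above can be sketched roughly as follows. This is a minimal illustration, not dogpile.cache's actual implementation; the class and attribute names are made up for the example. It coordinates threads within a single process only, so it is not a replacement for fcntl-style cross-process locking:

```python
import threading

class ReadWriteMutex(object):
    """Sketch of a single-process read/write lock built on
    threading.Condition: many readers may hold the lock at once,
    but a writer requires exclusive access."""

    def __init__(self):
        self.condition = threading.Condition(threading.Lock())
        self.readers = 0      # number of threads currently reading
        self.writer = False   # True while a writer holds the lock

    def acquire_read_lock(self):
        with self.condition:
            # readers wait only while a writer is active
            while self.writer:
                self.condition.wait()
            self.readers += 1

    def release_read_lock(self):
        with self.condition:
            self.readers -= 1
            if self.readers == 0:
                # last reader out; wake any waiting writer
                self.condition.notify_all()

    def acquire_write_lock(self):
        with self.condition:
            # a writer waits for all readers and any other writer
            while self.writer or self.readers:
                self.condition.wait()
            self.writer = True

    def release_write_lock(self):
        with self.condition:
            self.writer = False
            self.condition.notify_all()
```

    Because all coordination happens through one `threading.Condition`, this works on any platform, but two separate processes would each see their own independent mutex.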

  2. Mike Bayer repo owner

    Right, I was going to say earlier: I can't have dependencies on win32-type stuff directly in dogpile.cache, so I'm open to having the FileLock factory be pluggable, e.g. as an argument to DBMBackend. Feel free to send a pull request.

  3. Mike Bayer repo owner

    We already have lock mechanisms as pluggable. I'd rather not endorse a particular win32 system, as I don't have the resources to test or support it; portalocker hasn't had a commit in three years (does it work at scale? does it work on the latest Windows? etc. - I have no idea). I commonly create hooks so that features I can't support can be implemented and maintained by others on the outside.

  4. Dutchreaper123

    I'm running subliminal and made some changes to work around the problems on Windows. I do not know if it solves all the problems or if it even works in every case, but it fixed my errors :D

    1. Install https://github.com/WoLpH/portalocker (a Windows replacement for fcntl).
    2. Install pywin32 for the required libraries.
    3. Change dogpile.cache-0.5.1-py2.7.egg\dogpile\cache\backends\file.py to:

    """
    File Backends
    ------------------
    
    Provides backends that deal with local filesystem access.
    
    """
    
    from __future__ import with_statement
    from dogpile.cache.api import CacheBackend, NO_VALUE
    from contextlib import contextmanager
    from dogpile.cache import compat
    from dogpile.cache import util
    import os
    
    import portalocker
    # import fcntl  -- replaced by portalocker for Windows compatibility
    
    __all__ = 'DBMBackend', 'FileLock'
    
    class DBMBackend(CacheBackend):
        """A file-backend using a dbm file to store keys.
    
        Basic usage::
    
            from dogpile.cache import make_region
    
            region = make_region().configure(
                'dogpile.cache.dbm',
                expiration_time = 3600,
                arguments = {
                    "filename":"/path/to/cachefile.dbm"
                }
            )
    
        DBM access is provided using the Python ``anydbm`` module,
        which selects a platform-specific dbm module to use.
        This may be made to be more configurable in a future
        release.
    
        Note that different dbm modules have different behaviors.
        Some dbm implementations handle their own locking, while
        others don't.  The :class:`.DBMBackend` uses a read/write
        lockfile by default, which is compatible even with those
        DBM implementations for which this is unnecessary,
        though the behavior can be disabled.
    
        The DBM backend by default makes use of two lockfiles.
        One is in order to protect the DBM file itself from
        concurrent writes, the other is to coordinate
        value creation (i.e. the dogpile lock).  By default,
        these lockfiles use the ``flock()`` system call
        for locking; this is only available on Unix
        platforms.
    
        Currently, the dogpile lock is against the entire
        DBM file, not per key.   This means there can
        only be one "creator" job running at a time
        per dbm file.
    
        A future improvement might be to have the dogpile lock
        using a filename that's based on a modulus of the key.
        Locking on a filename that uniquely corresponds to the
        key is problematic, since it's not generally safe to
        delete lockfiles as the application runs, implying an
        unlimited number of key-based files would need to be
        created and never deleted.
    
        Parameters to the ``arguments`` dictionary are
        below.
    
        :param filename: path of the filename in which to
         create the DBM file.  Note that some dbm backends
         will change this name to have additional suffixes.
        :param rw_lockfile: the name of the file to use for
         read/write locking.  If omitted, a default name
         is used by appending the suffix ".rw.lock" to the
         DBM filename.  If False, then no lock is used.
        :param dogpile_lockfile: the name of the file to use
         for value creation, i.e. the dogpile lock.  If
         omitted, a default name is used by appending the
         suffix ".dogpile.lock" to the DBM filename. If
         False, then dogpile.cache uses the default dogpile
         lock, a plain thread-based mutex.
    
    
        """
        def __init__(self, arguments):
            self.filename = os.path.abspath(
                                os.path.normpath(arguments['filename'])
                            )
            dir_, filename = os.path.split(self.filename)
    
            self._rw_lock = self._init_lock(
                                    arguments.get('rw_lockfile'),
                                    ".rw.lock", dir_, filename)
            self._dogpile_lock = self._init_lock(
                                    arguments.get('dogpile_lockfile'),
                                    ".dogpile.lock",
                                    dir_, filename,
                                    util.KeyReentrantMutex.factory)
    
            # TODO: make this configurable
            if compat.py3k:
                import dbm
            else:
                import anydbm as dbm
            self.dbmmodule = dbm
            self._init_dbm_file()
    
        def _init_lock(self, argument, suffix, basedir, basefile, wrapper=None):
            if argument is None:
                lock = FileLock(os.path.join(basedir, basefile + suffix))
            elif argument is not False:
                lock = FileLock(
                            os.path.abspath(
                                os.path.normpath(argument)
                            ))
            else:
                return None
            if wrapper:
                lock = wrapper(lock)
            return lock
    
        def _init_dbm_file(self):
            exists = os.access(self.filename, os.F_OK)
            if not exists:
                for ext in ('db', 'dat', 'pag', 'dir'):
                    if os.access(self.filename + os.extsep + ext, os.F_OK):
                        exists = True
                        break
            if not exists:
                fh = self.dbmmodule.open(self.filename, 'c')
                fh.close()
    
        def get_mutex(self, key):
            # using one dogpile for the whole file.   Other ways
            # to do this might be using a set of files keyed to a
            # hash/modulus of the key.   the issue is it's never
            # really safe to delete a lockfile as this can
            # break other processes trying to get at the file
            # at the same time - so handling unlimited keys
            # can't imply unlimited filenames
            if self._dogpile_lock:
                return self._dogpile_lock(key)
            else:
                return None
    
        @contextmanager
        def _use_rw_lock(self, write):
            if self._rw_lock is None:
                yield
            elif write:
                with self._rw_lock.write():
                    yield
            else:
                with self._rw_lock.read():
                    yield
    
    @contextmanager
    def _dbm_file(self, write):
        with self._use_rw_lock(write):
            dbm = self.dbmmodule.open(self.filename,
                                "w" if write else "r")
            try:
                yield dbm
            finally:
                # close the dbm handle even if the caller raises,
                # so the file isn't left open under the lock
                dbm.close()
    
        def get(self, key):
            with self._dbm_file(False) as dbm:
                if hasattr(dbm, 'get'):
                    value = dbm.get(key, NO_VALUE)
                else:
                    # gdbm objects lack a .get method
                    try:
                        value = dbm[key]
                    except KeyError:
                        value = NO_VALUE
                if value is not NO_VALUE:
                    value = compat.pickle.loads(value)
                return value
    
        def get_multi(self, keys):
            return [self.get(key) for key in keys]
    
        def set(self, key, value):
            with self._dbm_file(True) as dbm:
                dbm[key] = compat.pickle.dumps(value)
    
        def set_multi(self, mapping):
            with self._dbm_file(True) as dbm:
                for key,value in mapping.items():
                    dbm[key] = compat.pickle.dumps(value)
    
        def delete(self, key):
            with self._dbm_file(True) as dbm:
                try:
                    del dbm[key]
                except KeyError:
                    pass
    
        def delete_multi(self, keys):
            with self._dbm_file(True) as dbm:
                for key in keys:
                    try:
                        del dbm[key]
                    except KeyError:
                        pass
    
    class FileLock(object):
    """Use lockfiles to coordinate read/write access to a file.

        This patched version uses
        `portalocker <https://github.com/WoLpH/portalocker>`_
        in place of ``fcntl.flock()``, so that it can also work
        on Windows (pywin32 required).

        """
    
        def __init__(self, filename):
            self._filedescriptor = compat.threading.local()
            self.filename = filename
    
        def acquire(self, wait=True):
            return self.acquire_write_lock(wait)
    
        def release(self):
            self.release_write_lock()
    
        @property
        def is_open(self):
            return hasattr(self._filedescriptor, 'fileno')
    
        @contextmanager
        def read(self):
            self.acquire_read_lock(True)
            try:
                yield
            finally:
                self.release_read_lock()
    
        @contextmanager
        def write(self):
            self.acquire_write_lock(True)
            try:
                yield
            finally:
                self.release_write_lock()
    
        def acquire_read_lock(self, wait):
            return self._acquire(wait, os.O_RDONLY, portalocker.LOCK_SH)
    
        def acquire_write_lock(self, wait):
            return self._acquire(wait, os.O_WRONLY, portalocker.LOCK_EX)
    
        def release_read_lock(self):
            self._release()
    
        def release_write_lock(self):
            self._release()
    
        def _acquire(self, wait, wrflag, lockflag):
            wrflag |= os.O_CREAT
            fileno = os.open(self.filename, wrflag)
            try:
                if not wait:
                    lockflag |= portalocker.LOCK_NB
                portalocker.lock(fileno, lockflag)
            except IOError:
                os.close(fileno)
                if not wait:
                    # this is typically
                    # "[Errno 35] Resource temporarily unavailable",
                    # because of LOCK_NB
                    return False
                else:
                    raise
            else:
                self._filedescriptor.fileno = fileno
                return True
    
        def _release(self):
            try:
                fileno = self._filedescriptor.fileno
            except AttributeError:
                return
            else:
                portalocker.unlock(fileno)
                os.close(fileno)
                del self._filedescriptor.fileno
    

    PS: This is my first time installing/editing Python and submitting a possible bugfix. Sorry for the inconvenience of not testing it and not providing a .patch.

  5. Mike Bayer repo owner

    Take a look at 34af3b758de91f6d3ffee5fa31 - you can take your FileLock approach, subclass AbstractFileLock with it, and drop the lock in. An example of usage is provided. I didn't actually test dropping in an alternate lock implementation, so if there are any problems with my checkin, reopen.
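    The subclassing pattern described above might look roughly like this. Since the exact contents of that checkin aren't shown here, the base class below is a stand-in sketch mirroring the `FileLock` methods from the code earlier in this thread (the real hook is presumably `dogpile.cache.backends.file.AbstractFileLock`); `MutexLock` and the `lock_factory` argument name are assumptions for illustration:

```python
import threading

class AbstractFileLock(object):
    """Stand-in sketch of the pluggable lock base class; the real one
    lives in dogpile.cache.backends.file per the checkin above."""

    def __init__(self, filename):
        self.filename = filename

    def acquire_read_lock(self, wait):
        raise NotImplementedError()

    def acquire_write_lock(self, wait):
        raise NotImplementedError()

    def release_read_lock(self):
        raise NotImplementedError()

    def release_write_lock(self):
        raise NotImplementedError()


class MutexLock(AbstractFileLock):
    """Cross-platform, single-process drop-in: one thread mutex serves
    both read and write locking, so no fcntl or win32 calls are needed.
    Readers block each other too, trading concurrency for portability."""

    def __init__(self, filename):
        super(MutexLock, self).__init__(filename)
        self.mutex = threading.Lock()

    def acquire_read_lock(self, wait):
        # threading.Lock.acquire(blocking) returns True on success
        return self.mutex.acquire(wait)

    def acquire_write_lock(self, wait):
        return self.mutex.acquire(wait)

    def release_read_lock(self):
        self.mutex.release()

    def release_write_lock(self):
        self.mutex.release()
```

    If the checkin exposes the factory as a backend argument (argument name assumed here), usage would be along the lines of `arguments={"filename": "/path/to/cachefile.dbm", "lock_factory": MutexLock}`.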

  6. Antoine Bertin

    I don't understand why it doesn't bother you to depend on fcntl but it does bother you to depend on some win32 stuff. I wish you could make the default implementation cross-platform.

    Anyway, thanks for implementing a clean workaround :) Do you plan on releasing soon?

  7. Mike Bayer repo owner

    If there were a Windows-compatible file locking system in the Python standard library, I'd use that, but there isn't, which IMO says a lot about the feasibility of locking files on Windows.

    portalocker is at version 0.3, says it's "beta", and hasn't had a commit in three years. It apparently has one ticket, just opened two days ago, that the dev has responded to, so that's good, but it seems to be a very basic issue. So I really have very little indication that portalocker is a widely used and accepted system that I can confidently import in my library knowing it's the best possible approach. On Windows, you probably get much better performance using threading conditions, which is why I also offer that as an option. Windows just doesn't have a single, widely accepted approach for this kind of thing, and I have no interest in endorsing one of them.

  8. Antoine Bertin

    I can't test, but the fact that there's still "import fcntl" means it will still throw "ImportError: No module named fcntl" under Windows. You should put this under "if os.name == 'posix'".
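    The guard suggested above could look something like this: pick a locking implementation at import time instead of unconditionally importing fcntl. The `lock_exclusive`/`unlock` helper names are made up for this example; the portalocker calls mirror the patched code earlier in the thread:

```python
import os

if os.name == 'posix':
    import fcntl

    def lock_exclusive(fileno):
        # Unix: flock()-based exclusive lock on an open file descriptor
        fcntl.flock(fileno, fcntl.LOCK_EX)

    def unlock(fileno):
        fcntl.flock(fileno, fcntl.LOCK_UN)
else:
    import portalocker  # Windows replacement; also needs pywin32

    def lock_exclusive(fileno):
        portalocker.lock(fileno, portalocker.LOCK_EX)

    def unlock(fileno):
        portalocker.unlock(fileno)
```

    This keeps Unix behavior unchanged while letting the module at least import on Windows once portalocker is installed.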
