Add a cachetools LRU + LFU in-memory backend

#32 Declined
Source repository: harlowja
Source branch: add_cachetools_backend
Destination repository: zzzeek
Destination branch: master
Author: Joshua Harlow
Reviewers: none
Description: No description

Comments (26)

  1. Mike Bayer (repo owner)

    OK, if you copy tests/cache/test_memory_backend.py to test_cachetools_backend, change the names inside, and run it... how do we do?
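
    A sketch of what that copied module might look like (the class names match the test run in the next comment; the backend entrypoint names are assumptions, and _GenericBackendTest is the shared fixture the existing memory backend tests subclass):

    # tests/cache/test_cachetools_backend.py (hypothetical)
    from ._fixtures import _GenericBackendTest


    class CachetoolsLRUBackendTest(_GenericBackendTest):
        backend = 'dogpile.cache.cachetools_lru'


    class CachetoolsLFUBackendTest(_GenericBackendTest):
        backend = 'dogpile.cache.cachetools_lfu'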

  2. Pior Bastida

    Tests are passing

    (dogpile.cache)pior@misaki:~/dev/externals/dogpile.cache ((de34577...))$ nosetests tests.cache.test_cachetools_backend
    nose.config: INFO: Set working dir to /home/pior/dev/externals/dogpile.cache/tests
    nose.config: INFO: Working directory /home/pior/dev/externals/dogpile.cache/tests is a package; adding to sys.path
    nose.config: INFO: Ignoring files matching ['^\\.', '^_', '^setup\\.py$']
    nose.plugins.cover: ERROR: Coverage not available: unable to import coverage module
    test_backend_delete (tests.cache.test_cachetools_backend.CachetoolsLFUBackendTest) ... ok
    test_backend_delete_nothing (tests.cache.test_cachetools_backend.CachetoolsLFUBackendTest) ... ok
    test_backend_get_nothing (tests.cache.test_cachetools_backend.CachetoolsLFUBackendTest) ... ok
    test_backend_set_get_value (tests.cache.test_cachetools_backend.CachetoolsLFUBackendTest) ... ok
    test_decorated_fn_functionality (tests.cache.test_cachetools_backend.CachetoolsLFUBackendTest) ... ok
    test_exploding_value_fn (tests.cache.test_cachetools_backend.CachetoolsLFUBackendTest) ... ok
    test_region_creator (tests.cache.test_cachetools_backend.CachetoolsLFUBackendTest) ... ok
    test_region_delete (tests.cache.test_cachetools_backend.CachetoolsLFUBackendTest) ... ok
    test_region_delete_multiple (tests.cache.test_cachetools_backend.CachetoolsLFUBackendTest) ... ok
    test_region_expire (tests.cache.test_cachetools_backend.CachetoolsLFUBackendTest) ... ok
    test_region_get_empty_multiple (tests.cache.test_cachetools_backend.CachetoolsLFUBackendTest) ... ok
    test_region_get_multiple_values (tests.cache.test_cachetools_backend.CachetoolsLFUBackendTest) ... ok
    test_region_get_nothing_multiple (tests.cache.test_cachetools_backend.CachetoolsLFUBackendTest) ... ok
    test_region_get_zero_multiple_values (tests.cache.test_cachetools_backend.CachetoolsLFUBackendTest) ... ok
    test_region_set_get_nothing (tests.cache.test_cachetools_backend.CachetoolsLFUBackendTest) ... ok
    test_region_set_get_value (tests.cache.test_cachetools_backend.CachetoolsLFUBackendTest) ... ok
    test_region_set_multiple_values (tests.cache.test_cachetools_backend.CachetoolsLFUBackendTest) ... ok
    test_region_set_zero_multiple_values (tests.cache.test_cachetools_backend.CachetoolsLFUBackendTest) ... ok
    test_region_set_zero_multiple_values_w_decorator (tests.cache.test_cachetools_backend.CachetoolsLFUBackendTest) ... ok
    test_threaded_dogpile (tests.cache.test_cachetools_backend.CachetoolsLFUBackendTest) ... ok
    test_threaded_get_multi (tests.cache.test_cachetools_backend.CachetoolsLFUBackendTest) ... ok
    test_backend_delete (tests.cache.test_cachetools_backend.CachetoolsLRUBackendTest) ... ok
    test_backend_delete_nothing (tests.cache.test_cachetools_backend.CachetoolsLRUBackendTest) ... ok
    test_backend_get_nothing (tests.cache.test_cachetools_backend.CachetoolsLRUBackendTest) ... ok
    test_backend_set_get_value (tests.cache.test_cachetools_backend.CachetoolsLRUBackendTest) ... ok
    test_decorated_fn_functionality (tests.cache.test_cachetools_backend.CachetoolsLRUBackendTest) ... ok
    test_exploding_value_fn (tests.cache.test_cachetools_backend.CachetoolsLRUBackendTest) ... ok
    test_region_creator (tests.cache.test_cachetools_backend.CachetoolsLRUBackendTest) ... ok
    test_region_delete (tests.cache.test_cachetools_backend.CachetoolsLRUBackendTest) ... ok
    test_region_delete_multiple (tests.cache.test_cachetools_backend.CachetoolsLRUBackendTest) ... ok
    test_region_expire (tests.cache.test_cachetools_backend.CachetoolsLRUBackendTest) ... ok
    test_region_get_empty_multiple (tests.cache.test_cachetools_backend.CachetoolsLRUBackendTest) ... ok
    test_region_get_multiple_values (tests.cache.test_cachetools_backend.CachetoolsLRUBackendTest) ... ok
    test_region_get_nothing_multiple (tests.cache.test_cachetools_backend.CachetoolsLRUBackendTest) ... ok
    test_region_get_zero_multiple_values (tests.cache.test_cachetools_backend.CachetoolsLRUBackendTest) ... ok
    test_region_set_get_nothing (tests.cache.test_cachetools_backend.CachetoolsLRUBackendTest) ... ok
    test_region_set_get_value (tests.cache.test_cachetools_backend.CachetoolsLRUBackendTest) ... ok
    test_region_set_multiple_values (tests.cache.test_cachetools_backend.CachetoolsLRUBackendTest) ... ok
    test_region_set_zero_multiple_values (tests.cache.test_cachetools_backend.CachetoolsLRUBackendTest) ... ok
    test_region_set_zero_multiple_values_w_decorator (tests.cache.test_cachetools_backend.CachetoolsLRUBackendTest) ... ok
    test_threaded_dogpile (tests.cache.test_cachetools_backend.CachetoolsLRUBackendTest) ... ok
    test_threaded_get_multi (tests.cache.test_cachetools_backend.CachetoolsLRUBackendTest) ... ok
    
    ----------------------------------------------------------------------
    Ran 42 tests in 19.837s
    
    OK
    
  3. Pior Bastida

    @harlowja Is there a reason to use cachetools.LRUCache rather than cachetools.TTLCache (which is LRU + expiration)?
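
    For context, a minimal illustration of the difference, using cachetools' documented constructors (the maxsize and ttl values here are arbitrary):

    import cachetools

    # LRUCache evicts solely on size; TTLCache additionally expires
    # entries once they are older than ttl seconds
    lru_cache = cachetools.LRUCache(maxsize=1024)
    ttl_cache = cachetools.TTLCache(maxsize=1024, ttl=300)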

      1. Joshua Harlow (author)

        Eck, I hope not; that seems useless IMHO. It has always seemed wasteful to pickle things into memory and then unpickle them back out of memory (which seems more like a validation that stored things can be pickled than a useful feature required for this to work; IMHO that should be some other validation plug-in, not built into this layer).

  4. Pior Bastida

    @harlowja, I have two internal projects in need of dogpile.cache + a memory / LRU / TTL cache. Do you mind if I submit a PR with your code, augmented with TTLCache and a pickle variant?

    1. Pior Bastida

      I mentioned the "projects in need" only to explain why I would "take over" the PR.

      My understanding is that this need (a full-featured memory backend) is quite common.

      Maybe the definition of the "full-featured memory backend" is what you dislike?

  5. Mike Bayer (repo owner)

    the issue I have is that I made the "add your own backend" thing extremely simple to use, so that I wouldn't be tasked with reviewing endless pull requests, bugs, features, API inconsistencies, testing overhead, etc. for an open-ended number of cache backends. I'd prefer it if systems like cachetools just supplied their own dogpile.cache entrypoints, in fact.
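
    A sketch of what supplying such an entrypoint would look like on the third-party side; the package, module, and class names here are hypothetical:

    # setup.py of a hypothetical third-party package
    from setuptools import setup

    setup(
        name='dogpile-cachetools',
        entry_points={
            'dogpile.cache': [
                'cachetools_lru = dogpile_cachetools.backend:LRUMemoryBackend',
            ],
        },
    )

    Or, equivalently, registered explicitly at import time:

    from dogpile.cache import register_backend

    # backend name for region.configure(), then module path and class name
    register_backend(
        'cachetools_lru', 'dogpile_cachetools.backend', 'LRUMemoryBackend')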

  6. Mike Bayer (repo owner)

    @harlowja's issue with that is that he'd then have to have dogpile.cache and cachetools in his requirements.txt. But that's not really any different from needing both sqlalchemy and mysql-python in your requirements.

  7. Pior Bastida

    So... I'm a bit lost here. Does this PR (or a similar one) have any chance of being merged into dogpile.cache or not?

  8. Mike Bayer (repo owner)

    If it's too much trouble as it is (there's still no "pickle" support, still no TTLCache, plus whatever options those need), and I always have to keep the options consistent, avoid bad choices that we'd regret (which are very expensive), and constantly pull these in, review, and release, then it makes me lean much more towards having someone else just deal with it.

  9. Mike Bayer (repo owner)

    Look at it this way: cachetools apparently has a lot of options, and it might be adding new options all the time. If that's the case, doesn't it seem more appropriate that the modifications to the dogpile API for it fall on that project? For backends like redis and memcached, things are simple; those don't change their APIs. But cachetools seems to have lots of switches, and I really don't want to maintain a copy of someone else's complicated and in-flux collection of switches.

    1. Pior Bastida

      I totally understand your position about managing a project where everyone will want their own small custom options. I won't argue about cachetools being the right tool for this "Full Featured Memory backend".

      However, I still think that dogpile.cache would be better with a real memory backend.

      IMHO, starting with a (good enough) per-process cache and switching to a centralized cache when needed is a common practice. Doing so by changing only the config and the requirements is great.

  10. Mike Bayer (repo owner)

    we have a super simple "memory" cache already present; it just doesn't manage size or anything like that. dogpile.cache does not want to be in the business of building out backends, it's just glue.

    the cachetools backends here seem to be nothing more than dictionary objects, and MemoryBackend already has a "cache_dict" argument. So you don't even need a backend for this to be usable. Just pass a cachetools.WhateverBackendYouWant() to MemoryBackend.

    What I don't want to get into is mirroring the config options of some third-party system, or doing things like setting defaults. There's a bunch of that in this PR right now. All the cachetools arguments should be straight passthroughs, without any notion of them here.
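
    For illustration, a minimal sketch of the cache_dict passthrough described above (the maxsize and ttl values are arbitrary):

    import cachetools
    from dogpile.cache import make_region

    # any cachetools mapping can serve as the backing dictionary of the
    # stock memory backend; no dedicated backend class is required
    region = make_region().configure(
        'dogpile.cache.memory',
        arguments={'cache_dict': cachetools.TTLCache(maxsize=1024, ttl=300)},
    )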

    1. Pior Bastida

      About you not wanting to mirror a third-party system's options: I think we didn't have the same idea of this new backend. I was imagining an opinionated FullFeatureMemoryBackend, where YOU decide which options are relevant, and where cachetools is an internal implementation detail (and could be something else).

      However, I had missed that we can just pass a cache instance to MemoryBackend, and that's clearly a good (and flexible) solution. Thank you for the discussion!

  11. Mike Bayer (repo owner)

    yeah, I think an opinionated backend is useful, but with dogpile.cache I set out with the very specific goal of not doing that here. We're an architectural pattern that makes use of cache backends; that's our business here. If we're both an architectural pattern and an opinionated thing on top of caching backends, then we're doing two things, and they're coupled too. I encourage the community to release all the opinionated backends they want (which of course is not as appealing, because you don't get the free publicity).

    Maybe I should even have started on day one with no actual backends present, and just released dogpile.cache.memcached and dogpile.cache.redis. That actually might be a good direction.

  12. Paul Brown

    Here's my first attempt at fixing the race condition issue:

    import pickle
    from threading import RLock
    
    from dogpile.cache.api import NO_VALUE
    from dogpile.cache.backends import memory
    
    _default_max_size = 1024
    
    
    class LRUMemoryBackend(memory.MemoryBackend):
        def __init__(self, arguments):
            self._imports()
            self.lock = RLock()
            max_size = arguments.pop('max_size', _default_max_size)
            get_size_of = arguments.pop('get_size_of', None)
            arguments['cache_dict'] = cachetools.LRUCache(max_size,  # noqa
                                                          getsizeof=get_size_of)
            super(LRUMemoryBackend, self).__init__(arguments)
    
        def _imports(self):
            # defer imports until backend is used
            global cachetools
            import cachetools  # noqa
    
        def get(self, key):
            # reads also update the LRU bookkeeping, so they need the lock too
            with self.lock:
                value = self._cache.get(key, NO_VALUE)
    
            if value is not NO_VALUE and self.pickle_values:
                value = pickle.loads(value)
            return value
    
        def get_multi(self, keys):
            with self.lock:
                ret = [self._cache.get(key, NO_VALUE) for key in keys]
    
            if self.pickle_values:
                ret = [
                    pickle.loads(value)
                    if value is not NO_VALUE else value
                    for value in ret
                ]
            return ret
    
        def set(self, key, value):
            if self.pickle_values:
                value = pickle.dumps(value, pickle.HIGHEST_PROTOCOL)
    
            # prevent race conditions when two threads need to evict the same key
            with self.lock:
                self._cache[key] = value
    
        def set_multi(self, mapping):
            pickle_values = self.pickle_values
            for key, value in mapping.items():
                if pickle_values:
                    value = pickle.dumps(value, pickle.HIGHEST_PROTOCOL)
    
                # the lock is taken per item, so other threads can
                # interleave during a large batch
                with self.lock:
                    self._cache[key] = value
    
        def delete(self, key):
            with self.lock:
                self._cache.pop(key, None)
    
        def delete_multi(self, keys):
            with self.lock:
                for key in keys:
                    self._cache.pop(key, None)
    

    The normal key-based locking isn't sufficient because setting a value can evict arbitrary other keys, so the entire self._cache needs to be locked before any value is set (that's when keys could be getting popped off of the LRU cache).

  13. Mike Bayer (repo owner)

    I don't have the resources to maintain this backend. Why can't this be a third-party plugin?

  14. Paul Brown

    Making it a third-party plugin sounds good to me. Since this PR was still open, I just wasn't sure whether there was still a chance of it being merged.

    The main counterargument to making it a third-party plugin is that it's harder to discover, but I think we can fix that by adding a note about it to the dogpile.cache README.

  15. Mike Bayer (repo owner)

    I'm fine with that.

    Closing this because it seems like a third-party plugin would better handle maintenance.

  16. Paul Brown

    @harlowja, do you want to be the one to create the repo for this? I'm happy to create it too, if not.