exposing a region's configuration/lock for custom functionality

Issue #125 new
created an issue

With some advanced Redis usage, I need to handle caching certain keys in ways that are incompatible with dogpile.cache as-is, and must use the dogpile.core lock directly. At the same time, I still want to leverage my dogpile region and its configuration.

By "incompatible", I mean various mechanisms that can't be handled with a custom backend or Proxy. In one example, a key corresponds to a Redis hash with messages for a user, and contains at least three values (count_unread, count_read, cached_ids):

messages|{user_id} count_read  = 10
messages|{user_id} count_unread = 5
messages|{user_id} cached_ids = (1,2,3,4,5,6,7,8,9,10)
messages|{user_id} msg-1 = {foo: bar}
messages|{user_id} msg-2 = {foo: bar}
messages|{user_id} msg-3 = {foo: bar}

Getting/updating the message counts requires writing/validating two fields at once (count_read, count_unread). Various functions need to happen within a Redis pipeline as well.
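As a rough illustration of that pipelined update (hypothetical helper names `message_hash_key` and `mark_read`, assuming the redis-py client API), both counters change inside one MULTI/EXEC transaction:

```python
# Hypothetical sketch of the two-field update described above, using
# redis-py. Both counters must change together, so the HINCRBY calls
# run inside a single transactional pipeline.

def message_hash_key(user_id):
    # key scheme from the example: one hash per user
    return "messages|%s" % user_id

def mark_read(client, user_id, n):
    # move `n` messages from unread to read in one atomic step
    key = message_hash_key(user_id)
    pipe = client.pipeline(transaction=True)
    pipe.hincrby(key, "count_unread", -n)
    pipe.hincrby(key, "count_read", n)
    return pipe.execute()
```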

To handle this sort of operation, I am essentially leveraging the code below, which is based on the get_or_create logic.

Would it make sense to integrate something like this into the CacheRegion itself?

    import logging
    log = logging.getLogger(__name__)

    from dogpile import Lock, NeedRegenerationException
    from dogpile.cache.api import NO_VALUE

    def raw_getcreate(region, key, f_get, f_gen, f_async=None):
        # `key` is needed to derive the lock's mutex

        def _get_value():
            if __debug__:
                log.debug("raw_getcreate._get_value")
            value = f_get()
            if value in (None, NO_VALUE):
                raise NeedRegenerationException()
            return value, -1

        def _gen_value():
            try:
                if __debug__:
                    log.debug("raw_getcreate._gen_value | first, try fetching")
                return _get_value()
            except NeedRegenerationException:
                if __debug__:
                    log.debug("raw_getcreate._gen_value | create")
                created_value = f_gen()
                return created_value, -1

        with Lock(
            region._mutex(key),  # mutex from the region's lock machinery
            _gen_value,  # creator
            _get_value,  # value_and_created_fn
            None,  # expiration_time, never expire
            f_async,  # async creator, unused
        ) as value:
            return value

    reg = CACHE_REGIONS['foo']
    key = "testkey-2"

    def f_get():
        value = reg.backend_actual.get(key)
        return value

    def f_gen():
        created_value = "NEWDATA"
        uploaded = reg.backend_actual.set(key, created_value)
        return created_value

    print(raw_getcreate(reg, key, f_get, f_gen))

Comments (10)

  1. Michael Bayer repo owner

    Are you still identifying all those things by one "key" at the top? If there are still "get" and "set" operations of some kind, why can't the custom backend look up all the keys, combine them together, call whatever, etc.? The CacheRegion is already oriented around having all the real options dependency-injected.

  2. jvanasco reporter

    This type of functionality doesn't really work well as a method of a custom region or backend once you scale the number of regions beyond a certain point. I'm not sure what that point is, but one deployment has 23 regions using 3 backends right now; implementing raw lock access for custom getcreates into custom regions would just be a nightmare to manage.

    In typical usage, I've got about 5 unique methods/keys which will share a single key for locking. In the example above, editing any hash field under messages|{user_id} will lock the key messages|{user_id}. [This is newly a hash; an earlier version used flat keys like messages|{user_id}|ids, but I needed to squeeze some more performance out of Redis.]

    This is definitely an advanced case and I don't need a solution for myself - I have a functional workaround.

    The getcreate logic in dogpile's region and lock is really brilliant, but it seems limited in use because it's not pluggable.
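The key mapping described in the comment above (several method-specific cache keys sharing a single locking key) might be sketched as follows; `lock_key_for` is a hypothetical name, and the earlier flat-key scheme messages|{user_id}|suffix is assumed:

```python
# Hypothetical sketch: reduce a method-specific cache key to the shared
# lock key, e.g. "messages|123|ids" and "messages|123|counts" both lock
# on "messages|123".

def lock_key_for(cache_key):
    # keep only the namespace and user-id segments
    return "|".join(cache_key.split("|")[:2])
```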

  3. Michael Bayer repo owner

    So the problem is the number of regions, and that's why you need to pass the functions in at get() time? Clarifying: the region is fully pluggable, just not on a per-get basis? Or is there a new kind of functionality that can't be done here, period, with a plugin? I don't understand what the number of regions has to do with anything.

  4. jvanasco reporter

    tl;dr: the barrier to simply using a custom backend is the need to use different keys for the Lock and the cached values.

    > or is there a new kind of functionality that can't be done here, period, with a plugin?

    Mostly this.

    > I don't understand what the number of regions has to do with anything.

    In a given region, a getcreate on 5 different backend keys [Key1, Key2...Key5] needs to create a lock on the same key, KeyA.

    Option 1: use 5 identically configured regions, which getcreate on KeyA to make the lock, and use 5 different custom backends to handle the get/set

    Option 2: use 5 custom regions, which getcreate on Key1-Key5 but share a lock on KeyA, and use 1 standard backend.

    Because the backends handle pluggability of the underlying datastore, it would make more sense to implement this with a custom CacheRegion.

  5. Michael Bayer repo owner

    I don't understand yet why the custom backend can't do all that work, delegating to a real backend to get to the cache itself. Your backend gets the key "messages|user_id" and then does what it needs to get all those things.

    Is it because the individual getcreate() has to call into additional independent getcreate() steps, e.g. one lock operation has to actually involve more dogpile locks? The region implementation can call back to the owning cache region itself if it wanted to, or into another one.

    Put another way, the API you ask for is:

        region.get(key, f_get, f_gen)

    and can't possibly work if it were:

        region = CacheRegion(f_get=f_get)
        region.get(key, f_gen=f_gen)

    right? That's essentially what this seems to ask.

  6. jvanasco reporter

    It's because multiple backend getcreates need to map onto a single shared key when locking:

    value = reg.getcreate("key1")
    value = reg.getcreate("key2")
    value = reg.getcreate("key3")

    On a cache miss, these all need to take the same getcreate lock, keyed on KeyA:

     - with Lock(key="key1", get("key1"), fgen("key1")) as value:
     + with Lock(key="KeyA", get("key1"), fgen("key1")) as value:
     - with Lock(key="key2", get("key2"), fgen("key2")) as value:
     + with Lock(key="KeyA", get("key2"), fgen("key2")) as value:
     - with Lock(key="key3", get("key3"), fgen("key3")) as value:
     + with Lock(key="KeyA", get("key3"), fgen("key3")) as value:

    Within CacheRegion.get_or_create, get_value, gen_value, and the lock's _mutex all use the same key.

    Does this make sense? It's not an issue with get or set, but with being able to use a specific string for the lock in getcreate.

  7. Michael Bayer repo owner

    I mean, basically the answer is no: I really don't want to blow up the existing get_or_create as far as its structure; it's complicated enough. To explode that structure even more would mean we really have to have the use case nailed down very clearly, because someone else might have another use case we aren't hitting.

  8. jvanasco reporter

    I'm willing to PR a custom lock_key override, or anything else, if it seems like a good idea for the general public.

    Maybe just leave this ticket open for a while and see if any other use-cases bubble up first?
