Option to fail quickly instead of blocking when initial generation is happening?

Issue #67 new
Marc Abramowitz
created an issue

The current behavior is that if the cache key is empty or invalidated, the first worker starts regenerating and the other workers block until it releases the dogpile lock. (For an expired key, the workers don't block; they return the expired value.)

I'm wondering if it makes sense to have an option to make the workers fail quickly instead of blocking. I hate to tie up Python workers that are just waiting for a lock.

I'm doing it in app code now: I do a get and check for NO_VALUE. If NO_VALUE comes back, I call set with a little "busy" error response, then call the original function and cache the value it returns. So the first worker goes through this path and sets the busy response, and other workers see that response until the regen is complete and the new value is cached.
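A minimal sketch of that app-level pattern, for illustration only. To keep it runnable without a cache backend, `DictRegion` and the local `NO_VALUE` sentinel are stand-ins for a dogpile.cache region (`region.get` / `region.set`) and `dogpile.cache.api.NO_VALUE`; `BUSY` and `get_or_busy` are hypothetical names, not part of dogpile.cache.

```python
NO_VALUE = object()  # stand-in for dogpile.cache.api.NO_VALUE
BUSY = {"status": 503, "body": "busy, try again shortly"}  # hypothetical placeholder


class DictRegion:
    """Tiny in-memory stand-in exposing get/set like a cache region."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key, NO_VALUE)

    def set(self, key, value):
        self._data[key] = value


def get_or_busy(region, key, creator):
    """Return the cached value, or regenerate it while others see BUSY."""
    value = region.get(key)
    if value is not NO_VALUE:
        # Either a real value or the BUSY placeholder left by another worker.
        return value
    # First worker here: publish the placeholder, then do the slow work.
    region.set(key, BUSY)
    value = creator()
    region.set(key, value)
    return value
```

Two caveats with this sketch: the get-then-set step is not atomic, so two workers arriving at the same moment can both run `creator`, and if `creator` raises, the placeholder stays in the cache until it is overwritten or expires.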

I'm wondering if a pattern like this could be put in dogpile.cache?

I'd be willing to take a stab at a PR. I just want to make sure it's something that you might be interested in before I start coding. Or maybe there is already an even easier pattern to do this?

Comments (6)

  1. Marc Abramowitz reporter

    I'm imagining maybe something where in addition to get_or_create there is get_or_set -- instead of taking a creator_func, it takes a default value to stick in the cache that expires immediately.

  2. Marc Abramowitz reporter

    Hmmm, maybe get_or_set is the wrong semantics, because we want a code path where the first worker calls the creator_func and subsequent workers return the default value (much like what happens with expired values).

    So now I'm thinking that there could be a default_value parameter to get_or_create.

  3. Michael Bayer repo owner

    so... you still want the error raised? If so, you can inject this just by using a custom lock with a zero-length timeout.

    default_value sounds a little more workable.

  4. Marc Abramowitz reporter

    Yeah I think default_value might be the best.

    Going to experiment a little with my app here. Things are a little complex here because we use an internal library on top of dogpile.cache (smlib.cache). That library is wrapping dogpile.cache and adding a couple of goodies (we may want to discuss with @s about which things in that library could go upstream into dogpile.cache, if you want them).

  5. Michael Bayer repo owner

    something like

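    import threading

    class GiveUpLock(object):
        def __init__(self):
            self.lock = threading.Lock()
        def acquire(self, wait=True):
            # never block on the underlying lock; "wait" only decides
            # whether a failed attempt is an error or just returns False
            if not self.lock.acquire(False):
                if not wait:
                    return False
                # add time.sleep() here, a loop, if you want a timeout of
                # .5 sec or something
                raise Exception("couldn't get lock")
            return True
        def release(self):
            self.lock.release()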
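The comment above mentions adding a loop or `time.sleep()` to get a short timeout. One way to sketch that, assuming Python 3, is to lean on the `timeout` argument of `threading.Lock.acquire` instead of a hand-written loop. `TimeoutLock` and `fetch` are hypothetical names used only for illustration, and how such a lock gets wired into a dogpile.cache region is not shown here and would need checking against the version in use.

```python
import threading


class TimeoutLock:
    """Lock whose acquire() raises instead of blocking past `timeout` seconds."""

    def __init__(self, timeout=0.5):
        self.lock = threading.Lock()
        self.timeout = timeout

    def acquire(self, wait=True):
        if not wait:
            # Non-blocking attempt: report failure by returning False.
            return self.lock.acquire(False)
        # Blocking attempt, but bounded by the timeout.
        if not self.lock.acquire(timeout=self.timeout):
            raise TimeoutError("couldn't get lock")
        return True

    def release(self):
        self.lock.release()


def fetch(lock, creator, busy_response):
    """Run creator under the lock, or return busy_response if it is taken."""
    try:
        lock.acquire()
    except TimeoutError:
        return busy_response
    try:
        return creator()
    finally:
        lock.release()
```

With this shape, a worker that cannot get the lock within the timeout hands back the busy response right away instead of sitting on the lock, which is roughly the fail-fast behavior the issue asks for.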