Issue #38 resolved

invalidate entire cache

jvanasco created an issue

There's no good way to invalidate an entire cache / region.

It would be nice if there were.

Comments (21)

  1. jvanasco reporter

    Yeah, I'm concerned with resetting the cache(s) of long-running process(es) without restarting the process(es).

    With a DBM-based cache, if I want to drop the cache it seems I can just delete the dir and then run a script to re-generate the cache files. That doesn't seem to cause too many errors. Not sure how to handle memcached, etc. Cycling the cache backend tends to cause errors.

    The best way to handle invalidating memcached without dogpile errors seems to be: site-stop, memcached off, memcached on, sleep(5), site-start.

  2. morgan_fainberg

    In theory, it would be possible to have the backend support a site-wide invalidate without too much extra code. Just have the current CacheRegion.invalidate() check whether the backend has a similar method and, if so, call that. Have the backend store (in its actual store) a special key that indicates that anything older than <timestamp> is invalid.

    I think that would be a reasonable feature add. It would add another lookup (perhaps something that could be done on an interval and is stored in a local var) to verify cache validity.

  3. Mike Bayer repo owner

    OK, seems like you're talking about two things. As far as "if the backend has a similar method", I guess you mean that if the backend is a dict, we want a dict.clear() type of thing. We have an existing convention for backend-specific features, which is that you call it from the backend directly:
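
    A minimal sketch of that convention; the backend attribute names below are illustrative and backend-specific, not a general API:

        from dogpile.cache import make_region

        region = make_region().configure("dogpile.cache.memory")

        # reach past the region API into the backend itself; the memory
        # backend happens to keep its data in a plain dict
        region.backend._cache.clear()

        # a memcached-style backend would instead expose its client, e.g.
        # region.backend.client.flush_all()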


    Now, if the feature is instead "have the backend store (in its actual store) a special key that indicates that anything older than <timestamp> is invalid", that's not specific to a backend; that could be done agnostically with CacheRegion. What I don't like about it is that it's slow: it adds an extra cache hit to all operations. If we make it something you can turn off, we're cluttering up CacheRegion with ever more conditionals to suit use cases that are extremely rare (I'd never need a feature like this). I'd like to explore first how CacheRegion could allow extensibility in ways like this without cluttering it up; then the "augment all cache operations with an explicit invalidation key check" part can be an extension feature in a separate module.

  4. morgan_fainberg

    I was actually thinking of the same mechanism as the current CacheRegion invalidate.

    With regards to something like dict.clear(), I think it is useful to pass that on as a utility for cache invalidation on that backend, but I see that as a one-off, not as a globally applicable mechanism (based upon how the backends work).

    But that being said, I agree, you don't want the overhead of having to do that lookup every time. The mechanism to load in that specific "invalidate" information would need to be smarter than "check if invalidate is set, load, then check cache". I'm not yet sure how I would approach this in a universally acceptable way.

    Allowing elegant extension use is never a bad idea (in my opinion).

  5. Mike Bayer repo owner

    You either have to check that invalidate key every time, or you can "box" it by having a function that looks at the current time and, on a per-region basis, only checks the "invalidate" key every N seconds (see the sketch below). So a very active cache region would not be doing this second hit more than every N seconds. A not very active region would be doing the hit for a majority of accesses, but it's not active, so it's not a big deal.

    It's definitely logic I'd want to have "somewhere else", and nicely tested in isolation against a mock backend.
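
    Something along those lines, kept outside of CacheRegion; everything here (the helper, the injected fetch_timestamp function, the interval) is hypothetical, not existing dogpile.cache API:

        import time

        CHECK_INTERVAL = 5.0   # seconds between backend checks

        class BoxedInvalidationCheck(object):
            """Consults the backend for a region-wide invalidation timestamp,
            but at most once every CHECK_INTERVAL seconds."""

            def __init__(self, region, fetch_timestamp):
                self.region = region
                self.fetch_timestamp = fetch_timestamp   # injected function
                self._last_checked = 0.0
                self._last_seen = 0.0

            def maybe_invalidate(self):
                now = time.time()
                if now - self._last_checked < CHECK_INTERVAL:
                    return
                self._last_checked = now
                ts = self.fetch_timestamp() or 0.0
                if ts > self._last_seen:
                    self._last_seen = ts
                    self.region.invalidate()

    Calling maybe_invalidate() at the top of each cached access keeps the extra backend hit to at most one per interval, and the class can be tested in isolation against a mock region and a mock fetch function.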

  6. Mike Bayer repo owner

    A simple hook for CacheRegion here would be that it consults some injected function in order to get at the "invalidation time" value.

  7. n01s3

    I'm not sure if this is the best place to ask, but I'm using async_runner to repopulate my cache (memory backend) in the background, and found that calling region.invalidate() forces the next call to do a synchronous/blocking repopulate. I've hunted around but can't find a good way to invalidate the whole region in a way that will continue to allow serving stale data while repopulating via async_runners. Is this possible with the current implementation?

  8. Mike Bayer repo owner

    That's a great point, as invalidate() was written to just force a regen immediately. I've broken it out into "hard" and "soft" options in 138d3d7fa9b9ff97a01b2b74c5cac48, where you can see that a "soft" invalidation does the invalidate by faking the creation time to be "now - expiration time", rather than raising a NeedRegen or returning a hard "0" value for creation time. I haven't tested this in an integration context (e.g. with multiple threads), so please let me know if this flag solves the issue for you.
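
    For reference, a rough usage sketch of the soft option, assuming it landed as a hard= flag on invalidate() and that the region was built with an async_creation_runner (the runner body is elided):

        from dogpile.cache import make_region

        def async_runner(cache, key, creator, mutex):
            # hand regeneration off to a thread or queue of your choosing,
            # releasing the mutex once the creator has run (elided here)
            ...

        region = make_region(
            async_creation_runner=async_runner,
        ).configure(
            "dogpile.cache.memory",
            expiration_time=300,
        )

        # "soft" invalidation: existing values are treated as expired, so the
        # next get_or_create() can keep serving the stale value while the
        # async runner regenerates it in the background
        region.invalidate(hard=False)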

  9. n01s3

    That worked beautifully and saves me a bunch of work. Thanks again for the quick fix.

    For anyone who later finds this, the use-case I'm using it for:

    1. cache a ton of occasionally changing game metadata from the DB in memory (per app process) so many operations require 0 DB queries.

    2. when someone updates the data via the admin tool, signal app processes to invalidate the cache region (currently done by polling a 'last_update' value in the DB, itself async and cached for N secs; later to be done via pub/sub)

    3. allow serving of stale content while querying the db in the background to refresh the cache, so no requests get hit w/the query lag.

  10. zoomorph

    When running multiple forked processes, you have to invalidate in every process, because invalidate() doesn't actually delete or invalidate the keys in the backend. Would it be possible to delete an entire region from the backend, and if so, could a flag or separate method be added to accomplish this?

  11. Mike Bayer repo owner

    zoomorph, it sounds like you're going back around to the beginning of the ticket here. Backends like memcached or redis don't have a keys() function that we could use to "delete the entire region"; hence we do it with invalidation timestamps instead. Those are currently local to the specific Python process that sets them up, but the notion here is: hey, let's get that invalidation time from the server instead. Great! But how do we do that without doubling our cache accesses, and without messing up the dogpile internals too much? One answer right now is that each app queries the datastore periodically, say with a background thread, for a single "invalidation" timestamp, and applies it as needed using region.invalidate(). So this can be rolled entirely on the outside, though that doesn't mean we can't add some helpers or at least examples in the recipes section that talk about this.
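
    A rough sketch of that background-thread recipe; fetch_invalidation_timestamp() is a placeholder for however the shared timestamp is stored (a DB column, a redis key, etc.):

        import threading
        import time

        def watch_invalidation(region, fetch_invalidation_timestamp, interval=10):
            """Poll a shared 'invalidated at' timestamp and apply it locally."""

            def loop():
                last_seen = 0.0
                while True:
                    ts = fetch_invalidation_timestamp() or 0.0
                    if ts > last_seen:
                        last_seen = ts
                        region.invalidate()
                    time.sleep(interval)

            t = threading.Thread(target=loop, daemon=True)
            t.start()
            return t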

  12. n01s3

    zoomorph The way I handle this is with redis pub/sub. Each process has a redis subscription on a "cache.purge" channel. To purge, publish "cache.purge <region name>"; each process listens for that message and calls region.invalidate() locally.
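
    A minimal sketch of that listener with redis-py; the channel name follows the description above, while the region name and connection details are illustrative:

        import threading

        import redis
        from dogpile.cache import make_region

        region_name = "my_region"                      # illustrative
        region = make_region().configure("dogpile.cache.memory")

        def listen_for_purges():
            client = redis.Redis()
            pubsub = client.pubsub()
            pubsub.subscribe("cache.purge")
            for message in pubsub.listen():
                if message["type"] != "message":
                    continue
                if message["data"].decode() == region_name:
                    # the purge is region-wide but local to this process
                    region.invalidate()

        threading.Thread(target=listen_for_purges, daemon=True).start()

        # elsewhere, to purge every process:
        #   redis.Redis().publish("cache.purge", region_name)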

  13. jvanasco reporter

    A while back I thought about handling this with a custom ProxyBackend:

    Create an 'invalidation' ProxyBackend; calls to 'get' first check for an invalidation timestamp. Then you override get to take this value into account.

    This value would probably only need to be checked periodically, and cached in memory.

    The tricky part, though, is that this proxy backend would have to hit a different region:

    - it should never expire (or at least expire 1.x times longer than the "invalidated" backend)
    - requests can't use this ProxyBackend, or a loop would form

    I think the logic would be something like:

        # APP
        value = Region1.get("Value1")

        # ProxyBackend
        _invalidated = Region2.get("Invalidated-Region1")
        if not _invalidated or not _invalidated.not_timely:
            return get(key)
        else:
            return NO_VALUE
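
    For anyone finding this later, a rough sketch of that shape using dogpile's ProxyBackend hook; the two-region wiring, the key names, the N-second caching of the timestamp, and the read of the "ct" creation-time metadata are assumptions about how it could be put together, not something dogpile provides out of the box:

        import time

        from dogpile.cache import make_region
        from dogpile.cache.api import NO_VALUE
        from dogpile.cache.proxy import ProxyBackend

        # Region2: holds only the invalidation timestamp.  It uses a plain
        # backend (no proxy), so no loop can form; in practice it would be a
        # shared backend rather than memory.
        invalidation_region = make_region().configure("dogpile.cache.memory")

        class InvalidationProxy(ProxyBackend):
            CHECK_INTERVAL = 5.0      # only hit Region2 every N seconds
            _checked_at = 0.0
            _invalidated_at = 0.0

            def _invalidation_time(self):
                now = time.time()
                if now - self._checked_at > self.CHECK_INTERVAL:
                    self._checked_at = now
                    ts = invalidation_region.get("Invalidated-Region1")
                    if ts is not NO_VALUE:
                        self._invalidated_at = ts
                return self._invalidated_at

            def get(self, key):
                value = self.proxied.get(key)
                if value is NO_VALUE:
                    return value
                # anything written before the invalidation time is treated as a miss
                if value.metadata["ct"] < self._invalidation_time():
                    return NO_VALUE
                return value

        # Region1: the real cache, wrapped with the proxy
        region1 = make_region().configure(
            "dogpile.cache.memory", wrap=[InvalidationProxy],
        )

        # to invalidate everything in Region1:
        #   invalidation_region.set("Invalidated-Region1", time.time())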

    I ended up not implementing this, because it was easier to construct the app not to have to deal with stuff like this.

    The only time /we/ would necessarily need to refresh an entire region or "unknown" keys occurs on an app deployment. In those cases, we plan for a downtime longer than a cache expiry. There's also a fallback in place: using key_mangler to version the key names.
