Is there any way to force a refresh

Issue #17 wontfix
created an issue

I would like to take full control of my redis cache backend, but I can't find the method in the documentation. I use a cron job to update my database, and after an update I want to mark the affected data in the redis cache as expired. If the data was created by dogpile.cache, how can I expire it in an elegant way?

All suggestions are welcome. Thanks

Comments (11)

  1. Michael Bayer repo owner

    we've got invalidate(), which will set an invalidation time for the entire region, if you're looking for the whole thing, and for individual keys you can just delete them with delete().

    It's true there's no method that sets just the invalidation time for a specific value while maintaining the value, because this is an unusual use case. Setting the invalidation but not deleting implies that you'd like clients to fetch the old value while one client gets the job of generating the new value. But in that case, it's more efficient for the function that's telling the cache to invalidate the key to instead just set() the new value at the same time, that way none of the clients need to regenerate anything.

  2. Michael Bayer repo owner

    continuing.... to truly set the invalidation time to expired while maintaining the value is particularly inefficient, in that you need to fetch the whole value from the cache and store the whole thing back in. dogpile stores the invalidation time and the cached value itself as a tuple under a single key. I suppose if we had the option to store the invalidation times separately, that could help, but then that would add significant latency to gets since a get would need to fetch two keys. this is kind of why it's awkward.

    let me know if this clears it up.

  3. sysout reporter

    Thanks mike! It helps me a lot. I am using cache_on_arguments(). I found I can use:

    my_function.invalidate(5, 6)  # to set it expired (which still keeps the data in the cache)

    my_function.set(3, 5, 6)  # to update the data in the cache

    If I really want to get some data out of the cache, I can go for CacheRegion.delete. However, I don't know how to generate the key that is used by the cache_on_arguments decorator. It's easy to add a method to the cache_on_arguments decorator myself, but you may consider adding one in the next release.

    Another thing is about the data stored in the cache. I found two problems: serialization overhead and expiration overhead.

    Serialization overhead: If my function returns the integer 1, the length of the data stored in redis is 87. If I serialize it with JSON, I may halve the memory usage and double the speed. Is there any way to choose my own serializer/deserializer? Memory is very expensive, so cutting the usage in half may cut the hosting cost in half.

    Expiration overhead: The expired data are not deleted from the backend. I don't have any experience with other backends, but if we let redis handle the expiration, it should be very efficient. The expired data will simply be deleted by redis.

    What I want to achieve: I want to cache all my data as JSON compatible objects or lists of JSON compatible objects. After I query the database with SQL, I simply convert the SQL result to JSON format and dump it to the cache. I am using the sqlalchemy SQL Expression Language. I found the ORM gives me little benefit but lots of limitations, so I decided to keep all data in JSON compatible formats.

  4. Michael Bayer repo owner

    well myfunc.invalidate() actually does a "delete" right now.

    the expiration we're doing here is not the same as the Redis expiration. Redis expiration will have it such that the value is gone after that time, so it's usually best to have the dogpile-level expiration be shorter than that of the redis expiration, because we're always trying to leave the "stale" values around so that clients don't all have to wait for the new one.
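The relationship between the two limits can be sketched with a toy cache. This is not dogpile.cache or Redis code: TwoLevelCache, soft_ttl and hard_ttl are hypothetical names, and the clock is injected so the example is deterministic.

```python
class TwoLevelCache:
    """Toy cache with a soft (dogpile-style) and a hard (Redis-style TTL) limit."""

    def __init__(self, soft_ttl, hard_ttl, clock):
        assert soft_ttl < hard_ttl, "the soft limit should be the shorter one"
        self.soft_ttl = soft_ttl
        self.hard_ttl = hard_ttl
        self.clock = clock
        self._data = {}  # key -> (value, created_at)

    def set(self, key, value):
        self._data[key] = (value, self.clock())

    def get(self, key):
        """Return (value, state) with state in {"fresh", "stale", "missing"}."""
        entry = self._data.get(key)
        if entry is None:
            return None, "missing"
        value, created_at = entry
        age = self.clock() - created_at
        if age >= self.hard_ttl:
            # Backend TTL reached: the value is gone for everyone.
            del self._data[key]
            return None, "missing"
        if age >= self.soft_ttl:
            # Soft limit reached: one caller should regenerate,
            # the rest can keep serving this stale value meanwhile.
            return value, "stale"
        return value, "fresh"


now = [0.0]
cache = TwoLevelCache(soft_ttl=60, hard_ttl=120, clock=lambda: now[0])
cache.set("front_page", "<html>v1</html>")

state_at_30 = cache.get("front_page")
now[0] = 90.0
state_at_90 = cache.get("front_page")
now[0] = 150.0
state_at_150 = cache.get("front_page")
```

In dogpile.cache itself, the soft limit should correspond to the region's expiration_time and the hard limit to the Redis backend's redis_expiration_time argument; check the backend documentation for your version before relying on that.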

    as far as serialization, "pickle" is explicit within the redis backend, so if you make yourself a subclass of RedisBackend that has a different get/set that will do it. But this should be configurable, I just made #18 for that.
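Before writing a custom backend, it may be worth measuring what a different serializer would save. The sketch below compares pickle and JSON on a small record. The record layout (payload plus a metadata dict with a creation time) is an assumption for illustration, not dogpile.cache's exact stored format, so the sizes will not match the 87 bytes reported above.

```python
import json
import pickle

# Assumed layout for illustration only: the payload plus a small metadata
# dict holding a creation timestamp and a version number.
payload = 1
record = (payload, {"ct": 1234567890.123, "v": 1})

pickled = pickle.dumps(record, pickle.HIGHEST_PROTOCOL)
as_json = json.dumps(record, separators=(",", ":")).encode("utf-8")

# Both encodings round-trip the data; JSON turns the tuple into a list.
from_pickle = pickle.loads(pickled)
from_json = json.loads(as_json.decode("utf-8"))

sizes = {"pickle": len(pickled), "json": len(as_json)}
print(sizes)
```

JSON only works when every cached value is JSON compatible, which matches the use case described in comment 3 but is not true of arbitrary Python objects.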

    not seeing an action item on this ticket here for the moment, can we close?

  5. sysout reporter

    I see. It's all for the "dogpile effect". If I simply delete the data, I won't be able to take advantage of what dogpile.cache does to prevent the "dogpile effect".

    Let me explain my understanding of dogpile.cache now:

    def get_user_info(userid):
        # This kind of data usually will not create a dogpile effect,
        # so if a user is updating their information, we have to call
        # get_user_info.invalidate(userid) at the end of the user info
        # update request. The next call to get_user_info() will
        # fetch the updated info from the database, while normally
        # no other thread will call this function during the fetch.
        return database(userid=userid)

    def get_front_page_content():
        # This kind of data may create a dogpile effect, so
        # if we want to update the front page, we should
        # call region.invalidate() at the end of the update
        # request. The next call to get_front_page_content()
        # will fetch the updated info from the database while
        # letting other threads use the stale content during
        # the fetch.
        return database_front_page()

    If I have many "dogpile effect" functions like get_front_page_content(), I should create a separate region for each of them, right?

    BTW, I'm not quite sure about "to truly set the invalidation time to expired while maintaining the value is particularly inefficient, in that you need to fetch the whole value from the cache and store the whole thing back in." Could you please explain it in more detail?

    Edit: functions like get_user_info had better not use dogpile.cache, since dogpile.cache introduces a lot of overhead and its benefit will never be used by such functions. These functions should interact with the cache backend directly. Am I right?

  6. Michael Bayer repo owner

    the region is meant to be kind of a homebase for a certain set of config, so it wasn't intended to be per-key or anything like that, but I suppose if you had a bunch of groups of keys that each expired per-group you could use it that way. the invalidate() method just sets a timestamp locally for that Region object, and causes anything accessed to be checked against it.
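A toy model of that behaviour, not dogpile.cache's implementation: MiniRegion is a hypothetical class, a plain dict stands in for the backend, and a counter stands in for the clock. It follows the description above in that invalidate() only records a timestamp on the region object and every read is checked against it.

```python
import itertools


class MiniRegion:
    """Toy region: values carry a creation time, the region an invalidation time."""

    def __init__(self, backend, clock):
        self.backend = backend        # shared dict standing in for Redis
        self.clock = clock
        self._invalidated_at = None   # local to this object, never stored in the backend

    def set(self, key, value):
        self.backend[key] = (value, self.clock())

    def invalidate(self):
        # Only records a timestamp; nothing in the backend is touched.
        self._invalidated_at = self.clock()

    def get(self, key, default=None):
        entry = self.backend.get(key)
        if entry is None:
            return default
        value, created_at = entry
        if self._invalidated_at is not None and created_at < self._invalidated_at:
            return default            # created before the invalidation: treat as a miss
        return value


ticks = itertools.count()
clock = lambda: next(ticks)

shared = {}
region_a = MiniRegion(shared, clock)
region_b = MiniRegion(shared, clock)  # e.g. another process on the same backend

region_a.set("k", "old")              # tick 0
region_a.invalidate()                 # tick 1
seen_by_a = region_a.get("k")
seen_by_b = region_b.get("k")
region_a.set("k", "new")              # tick 2
seen_after_reset = region_a.get("k")
```

Because the timestamp is local in this model, region_b still sees the old value, which is worth keeping in mind when a cron job and a web process each hold their own region object.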

    I can add a "set the invalidation time per key" feature, but this is what it needs to do:

    fetch (cached value, expiration time) from the cache == pulling cached value over the network

    create new tuple of (cached value, new expiration time)

    set (cached value, expiration time) into the cache == pushing the same cached value over the network

    is that clearer?
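The three steps above can be sketched with a stand-in backend that tallies how many bytes move in each direction. CountingBackend and expire_key are hypothetical names, and pickle is used only to get a byte count.

```python
import pickle


class CountingBackend:
    """Dict-backed stand-in for a remote cache that tallies bytes moved."""

    def __init__(self):
        self._store = {}
        self.bytes_fetched = 0
        self.bytes_sent = 0

    def get(self, key):
        blob = self._store[key]
        self.bytes_fetched += len(blob)
        return pickle.loads(blob)

    def set(self, key, record):
        blob = pickle.dumps(record)
        self.bytes_sent += len(blob)
        self._store[key] = blob


def expire_key(backend, key, new_time):
    value, _old_time = backend.get(key)  # step 1: pull the whole value
    record = (value, new_time)           # step 2: new tuple, same value
    backend.set(key, record)             # step 3: push the whole value back


backend = CountingBackend()
big_value = "x" * 100_000
backend.set("report", (big_value, 1000.0))
backend.bytes_sent = 0                   # ignore the initial store

expire_key(backend, "report", 0.0)
print(backend.bytes_fetched, backend.bytes_sent)
```

Changing one timestamp costs a full round trip of the value in both directions, which is the overhead being described.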

  7. Michael Bayer repo owner

    OK. I might want to add a "set the invalidate time for a specific key" feature at some point, though would be nice to figure out how it could work without a lot of overhead...

  8. David Kavanagh

    It seems like I need to use a custom key generator and track the keys (or re-generate) if I wish to be able to delete something previously cached with "cache_on_arguments". It would be fantastic if I could simply call region.delete() with the same args I used in the cache create method (decorated with cache_on_arguments). I also use namespaces, so that just adds one more thing going into the key.

  9. Michael Bayer repo owner

    @David Kavanagh take a look at "invalidating a group of related keys", which attempts to provide a recipe for this, or perhaps that's what you're referring to. I'm not sure if you're just asking that this tracking of keys be implicit within CacheRegion? I don't really know how to go about that, as the recipe here illustrates a tight coupling between the specific key generator and the functions that are being decorated. a more open ended system of any kind of key generator and a wide variety of heterogeneous functions would not lend itself very simply to this feature and would likely require complicated and error prone configurational steps. The advantage of a recipe is that it makes the mechanism clear and the behavior available, in the absence of a sophisticated system of performing this generically and automatically.
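The coupling between key generator and decorated function can be illustrated with a toy decorator. This is not the recipe from the dogpile.cache documentation and not its API: make_key, cached and the key format are all made up. The point is only that deleting by arguments works when the caching path and the delete path share one key builder.

```python
import functools


def make_key(namespace, fn, args):
    # Single place where keys are built; the format is hypothetical.
    return "%s:%s|%s" % (namespace, fn.__name__, " ".join(str(a) for a in args))


def cached(store, namespace):
    """Toy decorator: caches in `store` and exposes delete() using the same key."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args):
            key = make_key(namespace, fn, args)
            if key not in store:
                store[key] = fn(*args)
            return store[key]

        def delete(*args):
            # Rebuild the key from the call arguments instead of tracking it.
            store.pop(make_key(namespace, fn, args), None)

        wrapper.delete = delete
        return wrapper
    return decorator


store = {}
calls = []


@cached(store, namespace="users")
def get_user(user_id):
    calls.append(user_id)
    return {"id": user_id}


get_user(7)
get_user(7)                  # cache hit, no second computation
keys_before = sorted(store)
get_user.delete(7)           # same arguments as the cached call
keys_after = sorted(store)
get_user(7)                  # recomputed after the delete
```

For a single key, functions decorated with cache_on_arguments already expose invalidate() with the call arguments, which per comment 4 performs a delete.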
