Issue #62 new

idea for time issues/features

jvanasco
created an issue

I came up with a use-case and idea that might tie together a few existing tickets.

It also might be a terrible idea.

Existing Tickets -

https://bitbucket.org/zzzeek/dogpile.cache/issue/37/expose-cache-age
https://bitbucket.org/zzzeek/dogpile.cache/issue/45/update-expiration-time-on-get

Use Case -

Given:

A) We have some "write" operations that are more frequent on some objects than others.
B) Our objects can be expensive to generate.
C) We want to balance performance and clarity of code.

Two options come to mind:

1) use multiple cache keys: high-write area, low-write area
2) use a single object, update it

The first option can be less readable.

The second option has the caveat that writing will extend the cache expiry.

The general idea I have is this:

get_raw returns a CacheHit object that has attributes for payload and timestamp_expiry. It possibly also has an attribute for timestamp_last_update.

CacheHit (or dogpile) has a method for soft_update -- which will set a modified payload without updating the expiry. Alternatively, a new expiry time could be supplied as well.

This would allow people to keep the original expiry time ( let's say 10 minutes ) but have the ability to "update" the value of the payload within that time. The payload would still expire in 10 minutes ( unless explicitly extended ).
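A minimal sketch of what that proposed API might look like, using plain Python objects. Everything here is hypothetical -- CacheHit, soft_update, and the dict standing in for the backend are not existing dogpile.cache API:

```python
import time

class CacheHit:
    """Hypothetical return value of get_raw: payload plus expiry metadata."""
    def __init__(self, store, key, payload, timestamp_expiry):
        self._store = store                      # stand-in for the cache backend
        self.key = key
        self.payload = payload
        self.timestamp_expiry = timestamp_expiry
        self.timestamp_last_update = None

    def soft_update(self, new_payload, new_expiry=None):
        """Replace the payload WITHOUT touching the original expiry,
        unless a new expiry is explicitly supplied."""
        self.payload = new_payload
        self.timestamp_last_update = time.time()
        if new_expiry is not None:
            self.timestamp_expiry = new_expiry
        self._store[self.key] = self

store = {}
hit = CacheHit(store, "count_responses:1", 100, time.time() + 600)
original_expiry = hit.timestamp_expiry
hit.soft_update(101)                             # payload changes ...
assert hit.payload == 101
assert hit.timestamp_expiry == original_expiry   # ... the expiry does not
```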

In the use-cases of comments and surveys, this might allow a developer to increment the 'count' of respondents many times over the span of a minute... yet still require a sync to the backend datastore every 10 minutes.

Comments (5)

  1. Mike Bayer repo owner

    The region.backend.get and region.backend.set methods give you the CachedValue object. You can write a new one that keeps the original creation time.
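    One way to read that suggestion, modeled here entirely with stand-ins: a dict plays the role of region.backend, and a namedtuple mirrors the (payload, metadata) shape of dogpile's CachedValue, with "ct" as the creation timestamp. The real API may differ in detail:

```python
from collections import namedtuple
import time

# Stand-in mirroring dogpile.cache.api.CachedValue: (payload, metadata),
# where metadata carries "ct", the creation timestamp.
CachedValue = namedtuple("CachedValue", ["payload", "metadata"])

backend = {}  # stand-in for region.backend

def set_value(key, payload):
    backend[key] = CachedValue(payload, {"ct": time.time(), "v": 1})

def soft_set(key, new_payload):
    """Write a new CachedValue that keeps the original creation time,
    so the expiration clock is not reset."""
    old = backend[key]
    backend[key] = CachedValue(new_payload, old.metadata)

set_value("count_responses:1", 100)
ct_before = backend["count_responses:1"].metadata["ct"]
soft_set("count_responses:1", 101)
assert backend["count_responses:1"].payload == 101
assert backend["count_responses:1"].metadata["ct"] == ct_before
```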

  2. jvanasco reporter

    This is weird, I see 3 responses in my email, but your detailed example isn't on bitbucket...

    My idea is that the object would be split into 2 k/v payloads:

    • read-only ( core data )
    • high writes ( number of respondents )
    

    They could be in a single region or multiple regions. Either way, 2 keys would be needed.

    If the backend.get/set methods allow direct CachedValue access, writing this functionality could be entirely in 'userland' without library modification.

    A detailed use-case would be something like this...

    I'll use the term "UPDATE" to describe the functionality of preserving the original creation time:

    # cache region default is 5:00
    
    2014-05-20 12:00:00 - GET "count_responses:1" # fails
    2014-05-20 12:00:01 - SET "count_responses:1" = 100 # set to 100; cache is set to 5:00
    
    2014-05-20 12:00:30 - GET "count_responses:1" # returns 100
    2014-05-20 12:00:30 - UPDATE "count_responses:1" = 101 # increment by 1
    
    2014-05-20 12:00:31 - GET "count_responses:1" # returns 101
    2014-05-20 12:00:31 - UPDATE "count_responses:1" = 102 # increment by 1
    
    2014-05-20 12:04:32 - GET "count_responses:1" # returns 106
    2014-05-20 12:04:32 - UPDATE "count_responses:1" = 107 # increment by 1
    
    2014-05-20 12:05:30 - GET "count_responses:1" # fails (expired)
    2014-05-20 12:05:31 - SET "count_responses:1" = 110 # set to 110.  because of race conditions in a clustered environment, we only incremented to 107.  we probably incremented 10 times, but 3 of those used stale data.
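    That timeline can be replayed with a toy cache driven by an explicit clock. Everything here is a stand-in for illustration: SET resets the expiry window, while the hypothetical UPDATE preserves the stored expiry:

```python
# Toy cache: key -> (value, expires_at), driven by an explicit clock
# (seconds since 12:00:00).
cache = {}
TTL = 300  # region default of 5:00

def get(key, now):
    value, expires_at = cache.get(key, (None, 0))
    return value if now < expires_at else None

def set_(key, value, now):
    cache[key] = (value, now + TTL)   # SET resets the expiry window

def update(key, value, now):
    _, expires_at = cache[key]
    cache[key] = (value, expires_at)  # UPDATE keeps the original expiry

assert get("count_responses:1", 0) is None      # 12:00:00 - fails
set_("count_responses:1", 100, 1)               # 12:00:01 - expires at 12:05:01
assert get("count_responses:1", 30) == 100      # 12:00:30
update("count_responses:1", 101, 30)            # increment without extending
assert get("count_responses:1", 31) == 101      # 12:00:31
assert get("count_responses:1", 330) is None    # 12:05:30 - expired
```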
    

    I think that approach could handle the issues that sontek brought up in #37 and #45

    I got pinged with some bitbucket notices last week from those threads, and i foresee similar needs, so I've been thinking of ways to tackle the problem.

  3. Mike Bayer repo owner

    The detailed example is not here because I deleted it; it doesn't work :) You still need to be able to game the "created" timestamp in the cache.

  4. Mike Bayer repo owner

    OK der, I think what I had does work; it's very hard to get my head around these. If it is 12:30 and you updated the cache at 12:28, the "modulus" approach will have it such that the value is invalidated. If OTOH the cache was updated at 12:31 and it is now 12:32, then the value will not be invalidated until 12:40 - but that is fine, right?

    Here's what it was:

    import time

    def every_ten_minutes():
        # seconds elapsed since the most recent 10-minute boundary;
        # used as the expiration_time, so anything created before
        # that boundary is considered expired
        ten_minutes = 60 * 10

        return time.time() % ten_minutes

    @region.cache_on_arguments(expiration_time=every_ten_minutes)
    def my_expensive_thing(x, y):
        return expensive_lookup(x, y)
    

    the test I ran to show the motion was:

    >>> while True:
    ...     print datetime.datetime.today(), time.time() % 600
    ...     time.sleep(30)
    ... 
    2014-05-20 18:07:39.896828 459.896888018
    2014-05-20 18:08:09.897290 489.897324085
    2014-05-20 18:08:39.896959 519.896992922
    2014-05-20 18:09:09.896386 549.896426916
    2014-05-20 18:09:39.896064 579.896094084
    2014-05-20 18:10:09.896433 9.89649200439
    2014-05-20 18:10:39.896937 39.8969950676
    2014-05-20 18:11:09.897409 69.8974680901
    2014-05-20 18:11:39.897835 99.8978750706
    

    that is, at 18:10, nothing that is older than 18:10 can survive, even if the cache was just updated the previous minute.
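    The two scenarios in the comment above can be checked numerically. This sketch assumes dogpile's rule that a value expires when its age exceeds expiration_time, with expiration_time evaluated at read time as now % 600; is_expired is a helper written for this illustration, not library API:

```python
TEN_MINUTES = 600

def expiration_time(now):
    # the every_ten_minutes() idea: seconds elapsed since the
    # most recent 10-minute boundary
    return now % TEN_MINUTES

def is_expired(created, now):
    # a value expires when its age exceeds expiration_time --
    # equivalently, when it was created before the most recent
    # 10-minute boundary
    return now - created > expiration_time(now)

# times in seconds past 12:00
assert is_expired(created=28 * 60, now=30 * 60)           # 12:28 -> invalid at 12:30
assert not is_expired(created=31 * 60, now=32 * 60)       # 12:31 -> valid at 12:32
assert not is_expired(created=31 * 60, now=39 * 60 + 59)  # still valid at 12:39:59
assert is_expired(created=31 * 60, now=40 * 60)           # invalid at 12:40
```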

    When I first read this issue, this is what came to me in an insight, and then as I was typing it out I lost it :). But I think this works?
