idea for time issues/features

Issue #62 new
created an issue

I came up with a use-case and idea that might tie together a few existing tickets.

It also might be a terrible idea.

Existing Tickets -

Use Case -


A) We have some "write" operations that are more frequent on some objects than others.
B) Our objects can be expensive to generate.
C) We want to balance performance and clarity of code.

Two options come to mind:

1) use multiple cache keys: high-write area, low-write area
2) use a single object, update it

The first option can be less readable.

The second option has the caveat that every write extends the cache expiry.

The general idea I have is this:

get_raw returns a CacheHit object that has attributes for payload and timestamp_expiry. It possibly also has an attribute for timestamp_last_update.

CacheHit (or dogpile) has a soft_update method, which sets a modified payload without updating the expiry. Alternatively, a new expiry time could be set as well.

This would allow people to keep the original expiry time (let's say 10 minutes) but have the ability to "update" the value of the payload within that time. The payload would still expire in 10 minutes (unless explicitly extended).

In the use-cases of comments and surveys, this might allow a developer to increment the 'count' of respondents many times over the span of a minute... yet still require a sync to the backend datastore every 10 minutes.
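
A minimal sketch of the idea, to make the expiry behavior concrete. Everything here is hypothetical: CacheHit, SoftUpdateCache, get_raw and soft_update follow the names proposed above and are not part of dogpile's actual API. The `now` argument exists only so the example is deterministic.

```python
import time
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class CacheHit:
    """Hypothetical container: a payload plus its expiry bookkeeping."""
    payload: Any
    timestamp_expiry: float
    timestamp_last_update: float


class SoftUpdateCache:
    """Toy in-memory cache used for illustration only (not dogpile)."""

    def __init__(self, expiration_time: float) -> None:
        self.expiration_time = expiration_time
        self._store: Dict[str, CacheHit] = {}

    def set(self, key: str, payload: Any, now: Optional[float] = None) -> None:
        # A regular set starts a fresh expiry window.
        now = time.time() if now is None else now
        self._store[key] = CacheHit(payload, now + self.expiration_time, now)

    def get_raw(self, key: str, now: Optional[float] = None) -> Optional[CacheHit]:
        # Returns the CacheHit, or None if the key is missing or expired.
        now = time.time() if now is None else now
        hit = self._store.get(key)
        if hit is None or now >= hit.timestamp_expiry:
            return None
        return hit

    def soft_update(self, key: str, payload: Any, now: Optional[float] = None) -> bool:
        # Replaces the payload but leaves timestamp_expiry untouched.
        now = time.time() if now is None else now
        hit = self.get_raw(key, now)
        if hit is None:
            return False
        hit.payload = payload
        hit.timestamp_last_update = now
        return True


cache = SoftUpdateCache(expiration_time=600)           # 10 minutes
cache.set("survey:1", {"count": 100}, now=0)
updated = cache.soft_update("survey:1", {"count": 101}, now=300)
hit = cache.get_raw("survey:1", now=599)               # still inside the original window
miss = cache.get_raw("survey:1", now=600)              # original expiry still applies
```

The update at the 5-minute mark changes the payload, but the entry still expires 10 minutes after the original set.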

Comments (5)

  1. jvanasco reporter

    This is weird, I see 3 responses in my email, but your detailed example isn't on bitbucket...

    Using the previous example of a Survey, my idea is that the conceptual object would be split into 2 K/V payloads:

    • read-only ( The core survey data )
    • high writes ( The number of respondents )

    They could be in a single region or in multiple regions. Either way, 2 keys would be needed for splitting up the data.

    If the backend's get/set methods allow for direct CachedValue access, writing this functionality could be entirely in 'userland' without library modification.

    A detailed use-case would be something like this...

    I'll use the term "UPDATE" to describe the functionality of preserving the original creation time.

    # cache region default is 5:00
    2014-05-20 12:00:00 - GET "count_responses:1" # Fails
    2014-05-20 12:00:01 - SET "count_responses:1" = 100 # set to 100; cache is set to 5:00
    2014-05-20 12:00:30 - GET "count_responses:1" # returns 100
    2014-05-20 12:04:30 - UPDATE "count_responses:1" = 101 # increment by 1
    2014-05-20 12:04:31 - GET "count_responses:1" # returns 101
    2014-05-20 12:04:31 - UPDATE "count_responses:1" = 102 # increment by 1
    2014-05-20 12:04:32 - GET "count_responses:1" # returns 106 (other clients have updated it in the meantime)
    2014-05-20 12:04:32 - UPDATE "count_responses:1" = 107 # increment by 1
    2014-05-20 12:05:30 - GET "count_responses:1" # Fails
    2014-05-20 12:05:31 - SET "count_responses:1" = 110 # set to 110
    # Because of race conditions in a clustered environment, the cached value only reached 107,
    # even though we probably incremented 10 times. The reason is that many writes may have
    # operated on stale data (i.e., 3 clients each try to increment 105->106, instead of
    # 105->106, 106->107, 107->108).
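
    The lost-update effect in that last note can be reproduced in a few lines. This is only a deterministic simulation, with a plain dict standing in for the cache; the function names are made up for illustration.

```python
def increment_with_stale_reads(cache, key, clients):
    # Every client reads the value first, then each writes "what it saw + 1".
    # All clients saw the same value, so all but one increment is lost.
    snapshots = [cache[key] for _ in range(clients)]
    for seen in snapshots:
        cache[key] = seen + 1
    return cache[key]


def increment_serially(cache, key, clients):
    # Each client reads and writes before the next one starts.
    for _ in range(clients):
        cache[key] = cache[key] + 1
    return cache[key]


racy = {"count_responses:1": 105}
serial = {"count_responses:1": 105}

racy_result = increment_with_stale_reads(racy, "count_responses:1", 3)
serial_result = increment_serially(serial, "count_responses:1", 3)
print(racy_result, serial_result)  # 106 108
```

    Three increments on stale data leave the counter at 106, while the same three increments applied one after another reach 108. That drift is why the periodic sync back to the datastore still matters.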

    I think this approach could handle the issues that @sontek brought up in #37 and #45

    I got pinged with some bitbucket notices last week from those threads, and I foresee similar needs, so I've been thinking of ways to tackle the problem.

    [2015-11] Clarified the above example

  2. Michael Bayer repo owner

    OK, I think what I had does work; it's very hard to get my head around these. If it is 12:30, and you updated the cache at 12:28, the "modulus" approach will have it such that the value will be invalidated. If OTOH the cache was updated at 12:31, and it is now 12:32, then the value will not be invalidated until 12:40 - but that is fine, right?

    Here's what it was:

    import time

    def every_ten_minutes():
        # Seconds elapsed since the last ten-minute wall-clock boundary.
        ten_minutes = 60 * 10
        return time.time() % ten_minutes

    def my_expensive_thing(x, y):
        return expensive_lookup(x, y)

    the test I ran to show the motion was:

    >>> import time
    >>> from datetime import datetime
    >>> while True:
    ...     print(datetime.now(), time.time() % 600)
    ...     time.sleep(30)
    2014-05-20 18:07:39.896828 459.896888018
    2014-05-20 18:08:09.897290 489.897324085
    2014-05-20 18:08:39.896959 519.896992922
    2014-05-20 18:09:09.896386 549.896426916
    2014-05-20 18:09:39.896064 579.896094084
    2014-05-20 18:10:09.896433 9.89649200439
    2014-05-20 18:10:39.896937 39.8969950676
    2014-05-20 18:11:09.897409 69.8974680901
    2014-05-20 18:11:39.897835 99.8978750706

    that is, at 18:10, nothing that is older than 18:10 can survive, even if the cache was just updated the previous minute.
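
    To check the two cases above (12:28 vs. 12:31), here is a small sketch of the arithmetic. It assumes the cache treats a value as expired when its age is greater than the expiration time, and it uses seconds since midnight instead of time.time() so the result is reproducible; hms, window_start and is_invalidated are illustrative names only.

```python
TEN_MINUTES = 60 * 10


def hms(hours, minutes, seconds=0):
    # Seconds since midnight; stands in for time.time() in this sketch.
    return hours * 3600 + minutes * 60 + seconds


def window_start(now, period=TEN_MINUTES):
    # Most recent ten-minute boundary at or before `now`.
    return now - (now % period)


def is_invalidated(created_at, now, period=TEN_MINUTES):
    # Assumed expiry rule: age > expiration_time, where the
    # expiration time is the modulus value at lookup time.
    return (now - created_at) > (now % period)


case_1228 = is_invalidated(hms(12, 28), hms(12, 30))         # True
case_1231 = is_invalidated(hms(12, 31), hms(12, 32))         # False
still_ok = is_invalidated(hms(12, 31), hms(12, 39, 59))      # False
at_1240 = is_invalidated(hms(12, 31), hms(12, 40))           # True
```

    Put differently, a value survives only if it was created at or after window_start(now), which is why the effective lifetime varies between a few seconds and ten minutes depending on when the value was written.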

    When I first read this issue, this is what came to me in an insight, and then as I was typing it out I lost it :). But I think this works?
