add decorated_fn.refresh(*args) feature

Lx Yu created an issue

Does dogpile.cache have this feature yet:

One thread refreshes the cached value while the others return the old value directly?

That way I can update the cache in the background without slowing down the response time.
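A minimal sketch of the pattern being asked about, assuming a single key, an in-process threading.Lock, and a hypothetical StaleWhileRefresh class (this is not dogpile.cache's code). The first caller to find the value expired regenerates it, while callers arriving during the regeneration get the old value immediately instead of waiting.

```python
import threading

_MISSING = object()  # sentinel: nothing has been created yet


class StaleWhileRefresh:
    """Hypothetical single-key cache: while one caller regenerates an
    expired value, concurrent callers return the old value instead."""

    def __init__(self, creator):
        self.creator = creator
        self.value = _MISSING
        self.expired = True
        self._lock = threading.Lock()

    def set(self, value):
        self.value = value
        self.expired = False

    def expire(self):
        # Mark the value stale but keep it, so it can still be served.
        self.expired = True

    def get(self):
        if not self.expired:
            return self.value
        has_old = self.value is not _MISSING
        # Only wait for the lock when there is no old value to fall back on.
        if self._lock.acquire(blocking=not has_old):
            try:
                if self.expired:  # re-check: another caller may have refreshed
                    self.set(self.creator())
                return self.value
            finally:
                self._lock.release()
        # Another caller holds the lock and is regenerating: serve the old value.
        return self.value
```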

Comments (22)

  1. Lx Yu

    Didn't notice it already exists!

    It works great except for one thing: it seems I can't manually expire a value.

    Maybe we could change the invalidate behavior to expire rather than delete? Or maybe add an expire function?
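To make the delete-versus-expire distinction concrete, here is a toy store (a hypothetical ToyStore class, not dogpile.cache's backend API). delete() removes the entry, leaving readers nothing to fall back on, while expire() keeps the value but backdates its timestamp so readers see it as stale yet can still return it while a regeneration happens.

```python
import time


class ToyStore:
    """Hypothetical key -> (created_at, value) store contrasting deleting
    a value with merely expiring it."""

    def __init__(self, ttl):
        self.ttl = ttl
        self.data = {}

    def set(self, key, value):
        self.data[key] = (time.monotonic(), value)

    def delete(self, key):
        # Invalidate by deletion: nothing is left to serve.
        self.data.pop(key, None)

    def expire(self, key):
        # Invalidate by expiration: keep the value, but backdate it so the
        # next reader treats it as stale and triggers a regeneration.
        if key in self.data:
            _, value = self.data[key]
            self.data[key] = (float("-inf"), value)

    def lookup(self, key):
        """Return (value or None, is_stale)."""
        entry = self.data.get(key)
        if entry is None:
            return None, True
        created_at, value = entry
        return value, time.monotonic() - created_at > self.ttl
```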

  2. Lx Yu

    The problem I'm solving is this:

    I'm working on a client-server model based on Thrift. When certain events happen, I need to bypass the cache and get a real-time value, and once I have that real-time value, why not refresh the cache with it at the same time?

    So how about a more specific feature: get a non-cached value from a decorated function and refresh the cache with it?

    Maybe new_value = some_decorated_func(*args, refresh=True), or just new_value = some_decorated_func.refresh(*args)?
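A minimal sketch of the proposed decorated_fn.refresh(*args) shape, assuming a plain dict as the backend and a hypothetical cached decorator (not dogpile.cache's implementation): refresh() calls the underlying function unconditionally, writes the result back, and returns it.

```python
import functools


def cached(fn):
    """Hypothetical memoizing decorator with refresh() and invalidate().
    A dict stands in for a real cache backend."""
    store = {}

    @functools.wraps(fn)
    def wrapper(*args):
        if args not in store:
            store[args] = fn(*args)
        return store[args]

    def refresh(*args):
        # Bypass the cache: always call fn, then write the fresh value back.
        value = fn(*args)
        store[args] = value
        return value

    def invalidate(*args):
        store.pop(args, None)

    wrapper.refresh = refresh
    wrapper.invalidate = invalidate
    return wrapper


calls = []


@cached
def lookup(key):
    # Stand-in for an expensive call; returns how many times it has run.
    calls.append(key)
    return len(calls)


first = lookup("a")          # computed: 1
again = lookup("a")          # served from the cache: 1
fresh = lookup.refresh("a")  # recomputed and written back: 2
after = lookup("a")          # the cache now holds the refreshed value: 2
```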

  3. Lx Yu

    Wow, you're sooo fast!

    I was preparing to send a pull request today, and you beat me to it again. ;)

    And I'm glad to see the repo migrated to Git!

  4. Lx Yu

    Hi, I've observed that in some rare cases, refresh may not work as expected in a distributed environment.

    Take the following example: suppose we have two threads calling refresh.

    thread-a enters at 0s and, for some reason, returns its value at 3s.

    thread-b enters at 1s and returns its value at 2s.

    Now what if the underlying value changed at 0.5s? thread-b's result reflects the change but thread-a's does not, and since thread-a writes last (at 3s), the cache ends up holding the stale value.

    So maybe refresh requires a lock too?
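The interleaving above can be replayed deterministically. This sketch assumes (hypothetically) that each refresh reads the backing data at the moment it enters, and that writes land in completion order, i.e. plain last-writer-wins; no real threads are involved.

```python
def source_value(t):
    # Hypothetical backing data: changes from "old" to "new" at t = 0.5s.
    return "old" if t < 0.5 else "new"


# (name, enter_time, finish_time), taken from the example above
refreshes = [("thread-a", 0.0, 3.0), ("thread-b", 1.0, 2.0)]

cache = {}
# Apply writes in completion order: whoever finishes last wins.
for name, enter, finish in sorted(refreshes, key=lambda r: r[2]):
    cache["key"] = source_value(enter)  # value reflects when the read started

print(cache["key"])  # old
```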

  5. Lx Yu

    I think we should change the _value behavior here.

    Currently, _value records the time at which the value finished generating, using time.time().

    def _value(self, value):
        """Return a :class:`.CachedValue` given a value."""
        # time.time() is read when _value is called, i.e. after the creator has finished
        return CachedValue(value, {
            "ct": time.time(),
            "v": value_version,
        })

    The enter time may make more sense here. For example, if we enter at 0s and take 2s to generate the value, we would set ct to 0s rather than 2s.

    That way, we know which value is newer under concurrency.
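A minimal sketch of stamping the enter time instead of the finish time, using a hypothetical generate_stamped helper and Stamped tuple (not part of dogpile.cache) plus a fake clock so that the "takes 2s" example is deterministic.

```python
import time
from typing import Any, Callable, NamedTuple


class Stamped(NamedTuple):
    value: Any
    ct: float  # creation time, taken when generation *started*


def generate_stamped(creator: Callable[[], Any],
                     clock: Callable[[], float] = time.time) -> Stamped:
    enter = clock()    # read the clock before running the creator...
    value = creator()  # ...so a slow creator does not push ct forward
    return Stamped(value, enter)


# Fake clock: the creator "takes" 2s by advancing it.
now = [0.0]


def clock() -> float:
    return now[0]


def slow_creator() -> str:
    now[0] += 2.0
    return "value"


s = generate_stamped(slow_creator, clock)
print(s.ct, clock())  # 0.0 2.0
```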

  6. Lx Yu

    OK, since pull request 1 was declined, I'll handle the #2 part later. But since creation_time is only needed by refresh, I think we need a more detailed discussion of the refresh feature first.

    As for why refresh needs a mutex: as described in the comments above, refresh has a concurrency issue when the caller that entered first returns later, which causes the correct cached value to be replaced with a stale one. So it's not as simple as a set. What we should do here is check the creation_time: if the new value's creation_time is older than the cached one's, don't write it back to the cache. Since this check-and-write is not an atomic operation, we need a lock. And it's not "do a get_or_create with a guaranteed expiration"; it's a forced refresh.

    Also, refresh doesn't support class methods. I still can't find a solution; invalidate seems to have this issue as well.
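A minimal sketch of that check-and-write, assuming a hypothetical EnterTimeCache class and an in-process threading.Lock. A distributed deployment would need a lock shared through the backend, which this does not model. The enter_time parameter is a hypothetical hook that exists only so the example can be replayed deterministically.

```python
import threading
import time


class EnterTimeCache:
    """Hypothetical cache whose entries carry the time their creator
    started. refresh() always returns the creator's real value but only
    writes it back if nothing newer is already cached."""

    def __init__(self):
        self._data = {}  # key -> (enter_time, value)
        self._lock = threading.Lock()

    def refresh(self, key, creator, enter_time=None):
        if enter_time is None:
            enter_time = time.time()  # stamp before calling the creator
        value = creator()             # runs outside the lock
        with self._lock:
            # Compare and write must be atomic, hence the lock.
            current = self._data.get(key)
            if current is None or enter_time >= current[0]:
                self._data[key] = (enter_time, value)
        return value

    def get(self, key):
        entry = self._data.get(key)
        return None if entry is None else entry[1]


# Replay in completion order: thread-b (entered at 1s) finishes first,
# then thread-a (entered at 0s) finishes with the stale value.
cache = EnterTimeCache()
cache.refresh("key", lambda: "new", enter_time=1.0)
returned = cache.refresh("key", lambda: "old", enter_time=0.0)
print(returned, cache.get("key"))  # old new
```

thread-a's caller still receives its own freshly computed value. Only the write-back is skipped, because a value from a later-entering call is already cached.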

  7. Mike Bayer

    > About why mutex in refresh. Refer to comments above, the refresh may have concurrent issue when the first-enter returns later, which cause the correct cache be replaced with a wrong old one.

    I'd really like all concurrency issues addressed by dogpile.cache to go through get_or_create(). You can send an expiration time of "now" or "the past", and that would guarantee a regeneration. Anything in the "decorator" method is just convenience on top of this. As far as "force" goes, in your pull request it looks like you call the creator unconditionally, but then you don't actually do the "set" if someone else got to it already. So that just seems like a more broken form of what get_or_create already does (it guarantees that only one creator runs).
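A toy model of the get_or_create contract described here, written from scratch (a hypothetical ToyRegion class, not dogpile.cache's implementation). It is simplified so that callers waiting on the lock block rather than receiving the stale value. Passing expiration_time=0 makes any existing value count as expired, so the call regenerates, and the lock ensures only one creator runs at a time.

```python
import threading
import time


class ToyRegion:
    """Toy dogpile-style region: values are (created_at, value) pairs and
    a single lock serializes creators."""

    def __init__(self, expiration_time=60.0):
        self.expiration_time = expiration_time
        self._data = {}
        self._lock = threading.Lock()

    def _fresh(self, key, expiration_time):
        entry = self._data.get(key)
        return entry is not None and time.monotonic() - entry[0] < expiration_time

    def get_or_create(self, key, creator, expiration_time=None):
        if expiration_time is None:
            expiration_time = self.expiration_time
        if self._fresh(key, expiration_time):
            return self._data[key][1]
        with self._lock:
            # Re-check under the lock: another caller may have just regenerated.
            if self._fresh(key, expiration_time):
                return self._data[key][1]
            value = creator()
            self._data[key] = (time.monotonic(), value)
            return value


calls = []


def creator():
    calls.append(1)
    return len(calls)


region = ToyRegion()
region.get_or_create("k", creator)                     # creates: 1
region.get_or_create("k", creator)                     # cached: 1
region.get_or_create("k", creator, expiration_time=0)  # forced regeneration: 2
```

In this toy, the re-check under the lock never succeeds when expiration_time=0, so callers that pile up behind the lock each regenerate in turn, one at a time.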

    > And the refresh don't support class method, I still can't find a solution, invalidate seems to have this issue as well.

    I think you're talking about https://bitbucket.org/zzzeek/dogpile.cache/issue/24/cache-invalidation-for-class-or-instance, which is a separate issue.

  8. Lx Yu

    Do you mean triggering a refresh by calling get_or_create with expiration_time=0? OK, I'll try that then. I had dived too deep into the cache_on_arguments decorator and didn't think of this.

    If it works this way, creation_time is no longer needed. Yeah, seems great.

  9. Lx Yu

    I've tried that, but it doesn't behave the same.

    When refresh uses get_or_create, it behaves like dogpile and forbids parallel calls. If we have 10 concurrent refresh calls and the first one blocks or times out, all the calls after it time out as well.

    What I want to achieve is that refresh always returns the real, non-cached value and automatically updates the cache to the newest state.

    Since we may have many calls to refresh at the same time, this is not a dogpile-style refresh but a refresh that supports concurrent calls.

  10. Lx Yu

    > it looks like you call the creator unconditionally, but then you don't actually do the "set" if someone else got to it already.

    It's not "if someone else got to it already"; it compares the enter times of the refresh calls to solve this problem:

    Take the following example: suppose we have two threads calling refresh. thread-a enters at 0s and, for some reason, returns its value at 3s. thread-b enters at 1s and returns its value at 2s. The cache is then overwritten by thread-a at 3s. So what if the value changed at 0.5s? The cache ends up holding a stale value.

    So this kind of concurrency is a different approach from dogpile's.

  11. Mike Bayer

    This isn't a use case I care to support directly. If you have two or three threads all calling the same "creation" function at roughly the same time, you have no idea which one has the "fresher" value: concurrency is non-deterministic, the creator that started second could finish first, and so on, so there's no way to determine it. The point of the dogpile library is to prevent ever wastefully running the same creation function concurrently.

    If you are trying to get multiple threads to purposefully pile up on the same creation function with the same arguments all at the same time and then just pick a winner, you can roll that yourself on the outside.

  12. Mike Bayer

    Let me know if you work out some system for doing this that works, and whether there are any API features or adjustments needed in the dogpile system to help it along.

  13. Lx Yu

    Yeah, I have to say I have not worked out a really good solution for this. In my specific situation, the thread that starts later always gets the newest value, so I added an enter_time to the cached value. That way the result from the later-started thread always stays in the cache, no matter when it finishes.

    It has been working fine for months since, but the system itself is not perfect, and I don't know whether this situation also applies to other systems.

    And I agree with your point that "The point of the dogpile library is to prevent ever wastefully running the same creation function concurrently." I'll try to refine the system outside dogpile.cache, maybe later. If I come up with anything new, I'll open a pull request here. :)
