Dogpile.core assumes it's only ever used from a single process

Issue #1 resolved
Richard Barlow created an issue

I'm using dogpile.cache with a custom file-based backend that creates a file per cached object and uses a lock-file per cache object as the dogpile lock. The cache objects and lock files are on an NFS share (I'm using flufl.lock, so locking on NFS 'should' work) and the processes using/generating the cache objects are running on different machines on the network.

My issue is that, when starting with an empty cache, multiple processes will all generate the cache object once they successfully acquire the lock, rather than a single process generating it and the others subsequently using it.

From what I understand, line 159 in dogpile.py attempts to check for this situation, i.e. the lock has been acquired, but another thread/process may have just generated the value. However, this check only works for a single process with multiple threads, as it only inspects the current dogpile object's in-memory state rather than asking the backend whether the cache object now exists.

It seems that the check on line 159 should either call the backend's 'get' method, or a new backend 'check' method should be introduced. Calling 'get' for a simple 'does the cache object exist' check could be inefficient for large cache objects (as in my case), so a lighter-weight 'check' backend method would be preferable.
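
To give a concrete picture, the backend looks roughly like this (heavily simplified; the class and helper names here are illustrative rather than my real code):

    import os
    import pickle

    from dogpile.cache.api import CacheBackend, NO_VALUE
    from flufl.lock import Lock as FluflLock


    class FluflMutex(object):
        # Adapts a flufl.lock.Lock to the acquire()/release() pair that
        # dogpile expects for its per-key mutex.
        def __init__(self, filename):
            self._lock = FluflLock(filename)

        def acquire(self, wait=True):
            # For brevity this sketch always blocks; a real backend would
            # honour wait=False by attempting the lock with a short timeout.
            self._lock.lock()
            return True

        def release(self):
            self._lock.unlock()


    class PerFileBackend(CacheBackend):
        # One pickle file per cached object, one lock file per cached object,
        # all living on the NFS share.
        def __init__(self, arguments):
            self.directory = arguments["directory"]

        def _path(self, key, suffix=""):
            return os.path.join(self.directory, key + suffix)

        def get_mutex(self, key):
            return FluflMutex(self._path(key, ".lock"))

        def get(self, key):
            try:
                with open(self._path(key), "rb") as f:
                    return pickle.load(f)
            except IOError:
                return NO_VALUE

        def set(self, key, value):
            with open(self._path(key), "wb") as f:
                pickle.dump(value, f)

        def delete(self, key):
            try:
                os.remove(self._path(key))
            except OSError:
                pass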

Comments (14)

  1. Mike Bayer repo owner

    OK, to prove to me that I'm not insane: you meant the headline here to say "dogpile.core assumes it's only ever used from a single process", is that correct?

  2. Richard Barlow reporter

    Ah yes I did indeed, sorry. This whole thing is rather confusing as it is, without me making it worse by muddling up terms ;)

  3. Mike Bayer repo owner

    So anyway, yes, I can see that if you're using dogpile.core in its original "default" mode, you'd have this problem. But there is already a solution in place for it, which is to pass a value_and_created_fn to your dogpile.acquire() call, along with your distributed file lock.

    What value_and_created_fn does is provide dogpile.core, at lock acquisition time, with a function that returns not just the cached value you care about (or an arbitrary, symbolic value), but also the creation time of that value. That function should retrieve the creation time from whatever service is providing the value itself, so this is where you coordinate among multiple processes what the actual "creation" timestamp is.

    Your suggestion that dogpile.core should "call the backend's get() method" is exactly what value_and_created_fn does, and it's also how dogpile.core is used by its main consumer, dogpile.cache, so I think this would solve your issue.
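
    Something along these lines, roughly; the exact signatures here are from memory, so treat them as approximate, and shared_store / expensive_computation are stand-ins for whatever your file-based service provides:

        import time

        from dogpile.core import Dogpile, NeedRegenerationException

        def create_value():
            # generate the value and record its creation time in the
            # shared (file-based) store, not in process memory
            value = expensive_computation()
            shared_store.put(value, created=time.time())
            return value

        def value_and_created_fn():
            # return (value, creation time) from the shared store, or raise
            # NeedRegenerationException if nothing has been generated yet
            try:
                return shared_store.get_with_created()
            except KeyError:
                raise NeedRegenerationException()

        dogpile = Dogpile(3600)   # expiretime in seconds
        with dogpile.acquire(create_value,
                             value_and_created_fn=value_and_created_fn):
            # inside the block, a current value is known to exist in the store
            value, created = value_and_created_fn()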

  4. Mike Bayer repo owner

    This does have me thinking about the somewhat broken-up API of dogpile.core, though. It may be obvious that it was written in two parts: first the "simple thread" version, with the remote creation function and all the rest added on later. It might be worthwhile to move things around so that the "simple" form of the dogpile lock is actually a separate call on top of the more "fundamental" version, so that API-wise things make more sense.

  5. Richard Barlow reporter

    Thank you for the very quick response. I am using dogpile.cache directly, but I can't see how value_and_created_fn helps in the situation where there is no previous cache value, as there won't be a creation time. I've attached a test case to demonstrate my problem rather than trying to describe it in words. You'll need to make a /tmp/dptest dir.

    If you run this script twice, concurrently, I would expect the first to begin generating the data and the second to block until the first completes. Once the first completes, I would expect the second to simply use the value generated by the first; however, it appears to needlessly call the creation function a second time.
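
    The attachment isn't shown inline here, but a minimal script along these lines, using the stock dogpile.cache.dbm backend in place of my custom file backend, exercises the same scenario:

        import time

        from dogpile.cache import make_region

        region = make_region().configure(
            "dogpile.cache.dbm",
            expiration_time=3600,
            arguments={"filename": "/tmp/dptest/cache.dbm"},
        )

        def create_data():
            print("generating data (slow)...")
            time.sleep(10)      # stand-in for the expensive generation step
            return "the value"

        # Run the script in two shells at roughly the same time: the second
        # process should block while the first generates, then reuse the
        # first process's value instead of calling create_data() again.
        print(region.get_or_create("some_key", create_data))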

  6. Mike Bayer repo owner

    Oh wow, you're using dogpile.cache. Funny that you're reporting the issue on this end; I figured you were using the core API directly. I had no idea this was with full dogpile.cache usage, so let's start again then.

  7. Richard Barlow reporter

    Sorry, I'd tried to work out where the problem was and came to the conclusion that it was in dogpile.core, so I reported it here.

  8. Mike Bayer repo owner

    Anyway, this will take quite a while to repair, because I'd like to refactor the "lock" out of dogpile.core, get it re-tested, then implement a better form of "is_expired" and test a lot more. We'll see how today goes.

  9. Mike Bayer repo owner

    OK, I've done the rework of the API I mentioned before, which, by removing the use of any state from the new "lock" object, made it much easier to smoke out the fact that creationtime was memory-local. The changes are in 5db7cd3bb540b2bab80e8d2659f2556a3febeba8 and 623ce1ad86158121322aeeeab2dbcccbc423f8ef, but I still want to do some more API cleanup and change dogpile.core's documentation quite a bit. I also want to change dogpile.cache to use the newer API directly, but as of now these updates affect your test case in the way you're looking for.
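
    In rough terms, usage of the reworked lock looks something like the following sketch; shared_store, distributed_mutex and expensive_computation are placeholders for whatever the caller (e.g. dogpile.cache and its backend) provides:

        import time

        from dogpile.core import Lock, NeedRegenerationException

        def get_value():
            # Both the value and its creation time come from the shared store,
            # so every process agrees on whether a value exists and when it
            # was created.
            try:
                return shared_store.get_with_created()   # -> (value, created)
            except KeyError:
                raise NeedRegenerationException()

        def gen_value():
            # The creator also returns a (value, created) tuple and records
            # the creation time in the shared store.
            value = expensive_computation()
            created = time.time()
            shared_store.put(value, created)
            return value, created

        with Lock(distributed_mutex, gen_value, get_value, 3600):
            value, created = get_value()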

  10. Mike Bayer repo owner

    OK, I've got dogpile.core and dogpile.cache both ready for a 0.4.0 release with this fix. If you can confirm it all works on your end, this can all go out - thanks!

  11. Mike Bayer repo owner

    Okey doke, I want to push this out, and we can always do another release. Let me know if there are still issues on this ticket; closing for now.
