Cached values may get changed for Memory backend

Issue #130 new
created an issue

For example,

from dogpile.cache import make_region

region = make_region().configure('dogpile.cache.memory')

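# Cache foo()'s return value in the memory backend.
@region.cache_on_arguments()
def foo():
    return [1, 2, 3]

def bar():
    a = foo()
    a += [4]  # in-place extend on the object returned by the cache
    return a

print(bar())
print(bar())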

The first call to bar() returns [1, 2, 3, 4] as expected. However, the += appends [4] to the cached list held in the cache's dictionary, so the next call to bar() returns [1, 2, 3, 4, 4], which seems unexpected.

This happens whenever the cached value is a mutable object: the memory backend hands back a reference to the stored object itself, so the assignment in bar() binds a to the cached list rather than to a copy.

Although the user can work around this by cloning the value, I think the memory backend could return a clone of the cached value instead of the value itself, since returning the value directly lets the user modify what is cached. Also, from the perspective of abstraction, the user should not see a cached value change.

Comments (10)

  1. Yi reporter

    Using pickle would solve the problem, as it creates a new object to return to the user.

    The reasons I am raising this issue are that

    1. A cached value may get mutated by the user, which seems unexpected.
    2. For the example I posted, the program behaves differently with and without the cache, which also seems undesirable.
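
The pickle suggestion in the comment above can be illustrated with the standard library alone. This is a rough sketch under the assumption that the cache stores pickled bytes and unpickles on every read; `PickleDictCache` is a hypothetical class written for illustration, not dogpile's own backend:

```python
import pickle


class PickleDictCache:
    """Hypothetical dict-backed cache that stores pickled bytes."""

    def __init__(self):
        self._store = {}

    def set(self, key, value):
        # Serialize on the way in, so the cache holds bytes, not the object.
        self._store[key] = pickle.dumps(value)

    def get(self, key):
        # Deserialize on the way out, so every read builds a new object.
        return pickle.loads(self._store[key])


cache = PickleDictCache()
cache.set("foo", [1, 2, 3])

first = cache.get("foo")
first += [4]               # mutates only the caller's private copy
second = cache.get("foo")  # rebuilt from the stored bytes
```

The cost is a serialization round trip on each access, and the value must be picklable.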
  2. Michael Bayer repo owner

    Here's an object:

    class Foo(object):
        def __init__(self, x, y):
           self.x = x
           self.y = y

    The user passes this to dogpile.cache. How should dogpile know how to "copy" this? Do we require that __copy__ is present? Do we use deep copy or shallow copy? What about all the applications that don't need this possibly expensive copy operation? Shouldn't the cloning be something that is optional and fully customizable rather than implicit and assumed?
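
The deep-versus-shallow question raised above matters as soon as an attribute is itself mutable. A small sketch using the standard `copy` module and the Foo class from the comment (the values passed to Foo are made up for illustration):

```python
import copy


class Foo(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y


original = Foo(1, [2, 3])
shallow = copy.copy(original)   # new Foo, but y is the same list object
deep = copy.deepcopy(original)  # new Foo with its own copy of y

shallow.y.append(4)  # also visible through original.y
deep.y.append(5)     # visible only through deep.y
```

A shallow copy would therefore still let a caller change the cached object's nested state, while a deep copy avoids that at a higher cost.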

    As for proxies, those are very problematic for an unknown type of object, and I don't think they prevent mutation in any case: if someone calls "myobject.change_me(5)", the proxy passes that call down and the object has changed.
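
The point about proxies can be sketched as follows. `Counter` and `ForwardingProxy` are hypothetical classes, and the proxy shown is the simplest forwarding kind; it is an assumption that this is the sort of proxy under discussion:

```python
class Counter:
    def __init__(self):
        self.value = 0

    def change_me(self, n):
        self.value = n


class ForwardingProxy:
    """Hypothetical proxy that forwards attribute lookups to its target."""

    def __init__(self, target):
        self._target = target

    def __getattr__(self, name):
        # Called only for attributes not found on the proxy itself.
        return getattr(self._target, name)


cached_obj = Counter()
myobject = ForwardingProxy(cached_obj)
myobject.change_me(5)  # the bound method runs against the wrapped object
```

After the call, the wrapped object's state has changed even though the caller only ever held the proxy.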

  3. Yi reporter

    If avoiding implicit copying is a goal, then restricting what can be put into the memory backend is probably another approach. Disallowing mutable objects in memory backends would give restrictive but reasonable semantics. On a second look at MemoryPickleBackend, its documentation already notes that MemoryBackend behaves differently. I think this issue can be closed.

  4. Michael Bayer repo owner

    However, adding a "clone" argument to the backend, where the user can just provide a cloning function for the kinds of objects they want to store, would allow for a memory backend that's a lot faster than the pickle version. It can emit a warning if no clone function is given. Folks who don't want to clone can specify it as a no-op so at least they know what they're getting into.
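
The "clone" idea above might look roughly like the sketch below. `CloningDictCache` and its `clone` parameter are hypothetical and exist only to illustrate the proposal; nothing here is an existing dogpile.cache API, and cloning on both set and get is an assumption of this sketch:

```python
import copy
import warnings


class CloningDictCache:
    """Hypothetical dict-backed cache with a user-supplied clone function."""

    def __init__(self, clone=None):
        if clone is None:
            warnings.warn(
                "no clone function given; cached values are shared with callers"
            )
            clone = lambda value: value  # fall back to a no-op
        self._clone = clone
        self._store = {}

    def set(self, key, value):
        # Clone on the way in so later changes by the caller don't leak in.
        self._store[key] = self._clone(value)

    def get(self, key):
        # Clone on the way out so the caller never holds the stored object.
        return self._clone(self._store[key])


safe = CloningDictCache(clone=copy.deepcopy)
shared = CloningDictCache(clone=lambda value: value)  # explicit opt-out
```

Passing an explicit no-op silences the warning, which matches the intent that users who skip cloning do so knowingly.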

  5. jvanasco

    Maybe the docs on MemoryBackend can just have a "warning" that reflects what MemoryPickle says. Something like...

    Warning: MemoryBackend is offered for specific situations and test suites. This backend does not serialize cached values like most other backends, so changes to a stored/retrieved object may be reflected in the cached value as well. You may prefer the MemoryPickle backend, which places a copy of the object into the cache.
