Mike Bayer / dogpile.cache
Issue #33 resolved

About get_or_create_multi feature

Lx Yu
created an issue

I need a feature to get/create multi in one call.

Take this code for example

def get_order(order_id):
    return DBSession().query(Order).get(order_id)

def mget_order(order_ids):
    orders = DBSession().query(Order).filter(Order.id.in_(order_ids))
    return {order.id: order for order in orders}

The cache_on_arguments decorator works great for the first get_order function. But for the latter one, if I mget [1, 2, 3] and then [1, 2], they'll generate two separate cached values, and if a single element in the ids has expired, the whole cached value expires.

So I want to let the latter one make use of the first function's cache. Then mget [1, 2, 3, 4, 5] will directly use the cache generated in 'get', and if, say, [2, 3] have expired, we only need to regenerate [2, 3]. It'll be much faster and more efficient as the ids list grows larger.
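The behavior being requested can be sketched with a plain dict standing in for the cache backend; the names below are hypothetical illustrations, not dogpile.cache API. Only the missing or expired keys go back to the creator, in a single call:

```python
import time

# Toy per-key cache: key -> (value, created_at). Hypothetical names,
# not dogpile.cache API; this only illustrates the requested behavior.
_cache = {}
EXPIRATION = 60.0

def get_or_create_multi(keys, creator):
    """Return {key: value}, regenerating only missing/expired keys."""
    now = time.time()
    result = {}
    missing = []
    for key in keys:
        entry = _cache.get(key)
        if entry is not None and now - entry[1] < EXPIRATION:
            result[key] = entry[0]
        else:
            missing.append(key)
    if missing:
        # a single backend round trip for all missing keys,
        # e.g. one SELECT ... WHERE id IN (...) query
        for key, value in creator(missing).items():
            _cache[key] = (value, now)
            result[key] = value
    return result
```

With this shape, mget [1, 2, 3, 4, 5] reuses whatever the single-key calls have already cached, and only the expired subset hits the database.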

While going through the source code, I found that cache_on_arguments uses a Lock from dogpile.core with this code, which doesn't seem very suitable for the multi-key case in my situation.

with Lock(
        self._mutex(key),
        gen_value,
        get_value,
        expiration_time,
        async_creator) as value:
    return value

So do you have any suggestions on how to accomplish this goal?

Comments (14)

  1. Lx Yu reporter

    Also, it's a little confusing to me that some code paths check the value version while others do not:

    if value is NO_VALUE or \
        value.metadata['v'] != value_version or \
            (self._invalidated and
            value.metadata["ct"] < self._invalidated):

    This checks the version while get_multi does not.

  2. Mike Bayer repo owner

    It might be possible. As far as the lock goes, I think you'd use the async_creation_runner feature; this delivers the mutex held by Lock to the actual creator function. The expiration times and dogpile locks are per-key, so the function here would have a list of locks which it would need to release once it has retrieved all the values.
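    The async_creation_runner hook referred to here takes a cache backend, a key, a creator callable, and an already-acquired mutex; a minimal runner that creates the value in a background thread and then releases that mutex might look like this (the returned thread handle is only for illustration, the real hook doesn't return anything):

```python
import threading

def async_creation_runner(cache, somekey, creator, mutex):
    """Create a value in a background thread, then release the
    dogpile mutex that was handed to us already acquired."""
    def runner():
        try:
            value = creator()
            cache.set(somekey, value)
        finally:
            mutex.release()
    thread = threading.Thread(target=runner)
    thread.start()
    return thread  # returned only so callers can join it if they wish
```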

  3. Mike Bayer repo owner

    Oh, that. That's a separate issue; it's just a check in case we upgrade how dogpile.cache stores values in the cache, in which case we'd bump "value_version" to 2. So we'd want to add the value_version check everywhere it's missing now, sure, but that only matters if and when we bump "value_version". That logic has no meaning right now.

  4. Mike Bayer repo owner

    Try out that patch. I've only tested it in a non-concurrent case. I'd need to think pretty deeply about it to see if it actually maintains "dogpile" behavior; it might not, because it does the get_multi() outside of the mutex. The tricky thing here is that the mutexes have to remain per-key, because you want all kinds of combinations of get(x, y, z, ...) to coordinate on keys.

  5. Mike Bayer repo owner

    silly test:

    from dogpile.cache import make_region
    from itertools import count

    counter = count()
    region = make_region().configure("dogpile.cache.memory")

    def creator(keys):
        print("creator:", keys)
        return [k + next(counter) for k in keys]

    values = region.get_or_create_multi([2, 5, 6, 7, 8], creator)
    print(values)
    values = region.get_or_create_multi([2, 5, 6, 7, 8], creator)
    print(values)

  6. Lx Yu reporter

    Thanks, I'll test the performance tomorrow.

    IMHO, we use get_multi mainly for its performance (one request instead of n requests), so performance is the first thing to consider, even at the cost of slightly less accuracy.

  7. Mike Bayer repo owner

    I think it's a great feature, but with the mutexing etc. I'm worried about deadlocks. This stuff deadlocks easily when it's not correct.

  8. Mike Bayer repo owner

    In fact we might need to sort the keys to prevent this.

    Consider one call against (3, 4) and another against (4, 3). If get_multi calls go out against both simultaneously, you can deadlock: A holds 3 and waits for 4, while B holds 4 and waits for 3. I need to look more closely, maybe produce a test, to see if that happens here.
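    The usual fix is a global lock ordering: if every caller acquires its per-key mutexes in sorted key order, the (3, 4) / (4, 3) pair can no longer deadlock. A minimal sketch of the idea (not the dogpile internals):

```python
import threading
from collections import defaultdict

# one mutex per key, as in dogpile; defaultdict is a stand-in registry
_mutexes = defaultdict(threading.Lock)

def acquire_all(keys):
    """Acquire per-key locks in sorted order; return the order used."""
    acquired = []
    for key in sorted(keys):
        _mutexes[key].acquire()
        acquired.append(key)
    return acquired

def release_all(keys):
    for key in reversed(keys):
        _mutexes[key].release()
```

    Since both the (3, 4) and (4, 3) callers acquire 3 before 4, neither can ever hold the lock the other is waiting on.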

  9. Lx Yu reporter

    Just tested the code, modified a little; you can check it here: https://gist.github.com/lxyu/5712915


    1. The get_multi returns a dict, so I think get_or_create_multi should keep the same behavior.
    2. Add should_cache_fn support.
    3. Check keys_to_get before calling creator.
    4. Add key sorting.

    Another consideration is that the creator function differs from get_or_create's too, as it accepts the keys as args.

    So it becomes a little complex to add a namespace to the keys; see the following code for example.

    def get(order_id):
        def _creator():
            return DBSession().query(Order).get(order_id)
        _get_key = lambda x: "cache:order:{}".format(x)
        return region.get_or_create(_get_key(order_id), _creator)

    def mget(order_ids):
        def _creator(order_keys):
            order_ids = map(_extract_key, order_keys)
            orders = session.query(Order).filter(Order.id.in_(order_ids)).all()
            return {_get_key(order.id): order for order in orders}
        _get_key = lambda x: "cache:order:{}".format(x)
        _extract_key = lambda x: int(x[x.rfind(':') + 1:])
        keys = map(_get_key, order_ids)
        orders = region.get_or_create_multi(keys, _creator)
        return {i: orders[_get_key(i)] for i in order_ids}

    Any ideas on how to refine this situation? I'm also wondering whether this feature could be written as a multi decorator.
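    For what it's worth, the decorator idea can be sketched in plain Python against a dict cache; the name and shape below are hypothetical illustrations, not the implementation that eventually landed in dogpile.cache:

```python
from functools import wraps

def cache_multi_sketch(cache, namespace):
    """Hypothetical multi decorator: wraps a function that takes *ids
    and returns a list of values in the same order, caching each value
    under a namespaced per-id key so that single-id and multi-id
    callers can share cache entries."""
    def decorate(fn):
        @wraps(fn)
        def wrapper(*ids):
            key = lambda i: "{}:{}".format(namespace, i)
            missing = [i for i in ids if key(i) not in cache]
            if missing:
                # one generation call for all missing ids
                for i, value in zip(missing, fn(*missing)):
                    cache[key(i)] = value
            return [cache[key(i)] for i in ids]
        return wrapper
    return decorate
```

    The decorated function only ever sees the ids that actually need regenerating, which is the key-namespacing behavior the hand-written mget above tries to achieve.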

  10. Mike Bayer repo owner

    I've created the function decorator method, addressed the namespace issue, and produced some real tests in b323546dea06c5fab7c3896a60ccea0a10aafd57. The method still needed a lot of changes in order to function fully. This will now be 0.5.0. Let me know whether this implementation addresses your issues, and reopen if you have problems. Thanks.

  11. Lx Yu reporter

    Yes, this resolved my problems.

    I actually implemented a decorator of my own and was just thinking about a pull request, then found out that your implementation is better than mine. Hah!

    Thank you very much!
