Concurrent usage of dbm causes cache to go stale

Issue #104 new
David Gardner
created an issue

Ran into this issue in production where users were reporting items in the cache being an hour old for a cache region configured with a one minute expiration time. Problem first observed with version 0.5.7, and reproduced with version 0.6.1.

I was able to reproduce the issue in a simple test of a function that returns datetime.now() (dogpile-test.py), if I run about 10 concurrent instances of the script after about a minute or so they will start reporting stale data.

However I noticed as a work-around if I use the MutexLock class from the lock_factory documentation, I don't run into the problem: http://dogpilecache.readthedocs.io/en/latest/api.html?highlight=dogpile.cache.dbm#dogpile.cache.backends.file.DBMBackend.params.lock_factory

Comments (6)

  1. David Gardner reporter

    Looked into this a bit more, since my issue is with concurrent processes and not concurrent threads I think all that the MutexLock was providing for me was bypassing the default FileLock implementation.

    In my case anydbm is picking dbhash which has it's own locking. I was also able to avoid the problem by simply setting rw_lockfile=False

  2. Michael Bayer repo owner

    well, lockfiles can be weird. I can't reproduce any problem. I added this:

    i += 1
    if i % 10000 == 0:
        print("%s iterations.  current age is %s" % (i, now - dt))
    

    ran it in three windows, I see:

    0.6.2
    10000 iterations.  current age is 0:00:00.026704
    20000 iterations.  current age is 0:00:04.107627
    30000 iterations.  current age is 0:00:04.853445
    40000 iterations.  current age is 0:00:04.978281
    50000 iterations.  current age is 0:00:01.062407
    60000 iterations.  current age is 0:00:04.126943
    ...
    

    are you in one of the many danger areas for lockfiles? e.g. Windows, NFS shares, weird file systems, containers, etc. ?

  3. David Gardner reporter

    Its a typical Linux setup on an ext4 file system. I added your iteration counter, and ran my test launching 10 instances of the script at near the same time with:

    gnome-terminal -e ./dogpile-test.py --tab --tab --tab --tab --tab --tab --tab --tab --tab --tab
    

    Around iteration 50000 or so the script starts complaining about the age of the cache.

  4. Michael Bayer repo owner

    hmmm probably need to put more debugging into the lock code itself, set a timer when the file lock is acquired, somehow have it print out how long it's been held, then perhaps shutdown other processes and see if that one keeps holding opened. another way would be to use linux commands, there's a util lslocks that you can try which would show whos locking it.

    lockfiles are just like this, they have weird problems.

  5. Log in to comment