File size bloat when anydbm picks gdbm

Issue #90 new
David Gardner
created an issue

This might not be a dogpile cache bug, but I ran into this and thought you should be aware of it.

When using the dogpile.cache.dbm disk backend, if anydbm picks the gdbm implementation, the dbm disk file will continue to grow. The reason seems to be that gdbm requires reorganize() to be called periodically on the open database object, while none of the other dbm backends have this method.
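As a rough illustration of the reorganize() step, here is a minimal sketch that compacts a dbm file only when the opened handle exposes that method. It is written against Python 3, where anydbm became `dbm` and gdbm became `dbm.gnu`. The helper name `compact_if_supported` is hypothetical, and the sketch works on the file directly rather than through dogpile.cache. It assumes no other process has the file open for writing at the time.

```python
import dbm


def compact_if_supported(path):
    """Call reorganize() on the dbm file at `path` if its handle has one.

    Returns True if reorganize() was called, False if the backend that
    dbm selected for this file does not offer the method.
    """
    db = dbm.open(path, "w")  # open an existing file for read/write
    try:
        # Hypothetical duck-typing check: gdbm handles have reorganize(),
        # most other backends do not.
        reorganize = getattr(db, "reorganize", None)
        if reorganize is None:
            return False
        reorganize()
        return True
    finally:
        db.close()
```

A periodic job (cron, for example) could call `compact_if_supported("/path/to/cache.dbm")` during a quiet period. The path here is a placeholder.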

Comments (2)

  1. Michael Bayer repo owner

    There's not much that can be done on this end about that, and it seems this only applies to deletions, so it depends on what kind of caching you're doing.

  2. David Gardner reporter

    With regard to deletes, I think this includes overwriting an existing key.

    I'm attaching a simple repro script of the issue (which last weekend eventually filled up the disk on my dev server).

    When I run the script without any arguments, dogpile/anydbm will use the dbhash module, and the size of the dbm file will grow and shrink as the contents of the cache change, as expected.

    When run with 'g' as the first argument, the file will quickly grow to several megabytes.

    The background on this is that I was trying to use an older Python 2.6 build whose _bsddb.so was linked against libdb-4.3.so. It was running inside mod_wsgi+Apache, both of which were linked against libdb-4.7.so, and using dbhash was causing a segfault.

    With that said, I agree this really isn't a dogpile issue, but I wanted to document this in case anyone else runs into it.
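The attached repro script is not reproduced on this page. The following is a separate, minimal sketch of the same idea, assuming Python 3's `dbm` module rather than Python 2's anydbm: it overwrites a single key many times and reports the on-disk size for whichever backend `dbm` selects. The function names, round count, and value size are all hypothetical choices for illustration, and it does not go through dogpile.cache.

```python
import dbm
import glob
import os
import tempfile


def total_size(path):
    """Sum the sizes of every file the dbm backend created for `path`.

    Some backends append suffixes (for example .db, .dat, .dir), so match
    on the path prefix instead of assuming a single file.
    """
    return sum(os.path.getsize(p) for p in glob.glob(glob.escape(path) + "*"))


def overwrite_and_measure(rounds=200, value_size=4096):
    """Overwrite one key `rounds` times; return (backend name, bytes on disk)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "cache.dbm")
        db = dbm.open(path, "c")  # create the file with the default backend
        try:
            for i in range(rounds):
                # Vary the payload so each write replaces the previous value.
                db[b"key"] = bytes([i % 256]) * value_size
        finally:
            db.close()
        return dbm.whichdb(path), total_size(path)


if __name__ == "__main__":
    backend, size = overwrite_and_measure()
    print(backend, size)
```

Running this on machines where different backends are selected gives a quick way to compare how the file size responds to repeated overwrites of the same key.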
