Improve memory_pressure management

Issue #320 resolved
Philip Jenvey
created an issue

cffi's docs now recommend manually adding memory pressure (via __pypy__.add_memory_pressure) anywhere external allocations are tied into pypy's GC.

This is inconvenient w/ a lib like cryptography: it's often allocating via OpenSSL's own functions, all tied into ffi.gc. In some cases cryptography doesn't even know the underlying size of what it's allocated.

It doesn't add memory pressure currently, which can show up as apparent memory leaks under heavy usage (e.g. https://github.com/mozilla-services/autopush/issues/917)

Instead of forcing every lib to call a __pypy__ specific function, can cffi be improved?

Maybe ffi.gc can add some arbitrary amount of memory pressure itself? It could gain an optional size argument for callers that know the size of the allocation, and otherwise guess? Could there be some added pypy GC heuristics that improve such a "guess"?
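To make the "optional size argument" idea concrete, here is a minimal sketch of a wrapper that a library could use today. The helper name gc_with_pressure and its size parameter are hypothetical and not part of cffi; the only real calls assumed are ffi.gc(cdata, destructor) and __pypy__.add_memory_pressure(size). On interpreters without __pypy__ it falls back to a no-op.

```python
try:
    # PyPy-only module; not available on CPython.
    from __pypy__ import add_memory_pressure
except ImportError:
    def add_memory_pressure(size):
        """No-op fallback for interpreters without __pypy__."""
        pass


def gc_with_pressure(ffi, cdata, destructor, size=None):
    """Hypothetical helper: ffi.gc() plus an optional memory-pressure hint.

    `size` is the caller's estimate, in bytes, of the external memory
    kept alive by `cdata`. When it is omitted, behaviour is exactly
    that of plain ffi.gc(): no pressure is reported and nothing is guessed.
    """
    wrapped = ffi.gc(cdata, destructor)
    if size:
        add_memory_pressure(size)
    return wrapped
```

A caller that knows the allocation size would write something like gc_with_pressure(ffi, ptr, lib.some_free, size=4096), where ptr and lib.some_free are placeholders for whatever the library allocates and frees.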

Comments (8)

  1. Nathaniel Smith

    Two somewhat out-there ideas:

    • On Linux, if the pypy interpreter exports symbols called malloc/calloc/realloc, then any C extensions that try to allocate memory the usual way will end up calling into pypy's code instead, at which point you can do accounting and then call into the real libc functions. (This doesn't work on Windows or MacOS, because of ELF/PE/MachO differences, but long-running services are always on Linux anyway..... right?)

    • Instead of keeping a single counter of recently-allocated-memory and having cffi.gc increment it by some arbitrary amount, keep a separate counter of "how many cffi.gc objects are alive" (or created since the last gc or whatever). Then... do something clever with this? At the very least having a separate tunable threshold for these calls versus memory-in-general could potentially be useful, but you can also imagine something like, at GC time ask the OS how much address space is used, and use this to estimate how big each ffi.gc object is for this particular program and auto-tune the threshold?
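The second idea above (a separate counter with its own tunable threshold) can be approximated at user level with a rough sketch like the following. This is not how pypy's GC works internally; the class FinalizerCounter, its default threshold, and the choice to call gc.collect() are all assumptions made for illustration.

```python
import gc


class FinalizerCounter(object):
    """Hypothetical counter of ffi.gc-style objects created since the
    last forced collection, with its own tunable threshold."""

    def __init__(self, threshold=10000, collect=gc.collect):
        self.threshold = threshold
        self.collect = collect
        self.created = 0

    def note_created(self):
        """Record one new object; force a collection at the threshold.

        Returns True if a collection was triggered by this call.
        """
        self.created += 1
        if self.created >= self.threshold:
            self.created = 0
            self.collect()
            return True
        return False
```

A library would call note_created() right after each ffi.gc() call. The auto-tuning part of the idea (estimating a per-object size from OS-reported memory use) is deliberately left out of this sketch.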

  2. Armin Rigo

    @Philip Jenvey The most important "blocker" for ffi.gc(..., size) is checking on a real-life application that it really works as intended. So can you confirm that it really does work, in cryptography, to add __pypy__.add_memory_pressure(size) after each call to ffi.gc()? If it does, then indeed adding an optional extra argument to ffi.gc() looks like a minimal but good idea.

    What to do if the extra argument is not provided? That's very unclear. Although we could guess the size using non-portable methods, I'm not sure it's a good idea. To start with, we have no clue if ffi.gc() is really linked to just free() or to some more complex cleanup function (e.g. accounting for dependencies, like the memory block containing pointers to more memory blocks) or even to some cleanup logic that is mostly unrelated to memory. So I think that the behavior of ffi.gc() with no extra argument can't change.

    @Nathaniel Smith Using global measures (like the RSS or the total allocated memory from all malloc() calls) is delicate. IMHO it is the wrong approach for PyPy, although I may be wrong about it. It's likely to give good and easily tuned results on small examples, and then fail in more complex real-life programs. For example, I'd be unhappy if PyPy started to gobble an extra 20GB of RAM on its own after I explicitly call malloc(20GB). Not to mention, it's the kind of approach that can go very wrong if two independent components in the same process use it---each component thinks it's ok to use N bytes of memory because the other component also uses N bytes of memory, for any N. This aspect of PyPy I'd like to keep a bit different from a typical JVM, which assumes it is the center of the universe (er, process) and mostly controls everything about it.
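The "two independent components" concern can be illustrated with a toy model. It assumes, purely for illustration, that each component sizes its own heap as factor times the memory used by the other one; the function name simulate and the numbers are made up and do not describe any real GC.

```python
def simulate(rounds, factor=1.5, start=1.0):
    """Toy model: two components each size their heap from the other's usage.

    Each round, component A allows itself factor * (B's usage), then
    component B allows itself factor * (A's usage). Returns the list of
    (a, b) usages after each round, in arbitrary units.
    """
    a = b = start
    history = []
    for _ in range(rounds):
        a = factor * b  # A justifies its size by looking at B
        b = factor * a  # B justifies its size by looking at A
        history.append((a, b))
    return history
```

With factor=1.0 every starting size N is stable, so both components are "satisfied" at any N; with any factor above 1.0 the two sizes ratchet each other upwards without bound.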

  3. Alex Gaynor

    After spending some more time with it, I think there's a far more general problem than just external allocations:

    import random
    import sys
    
    class A(object):
        def __del__(self):
            pass
    
    
    class B(object):
        pass
    
    cls = {"A": A, "B": B}[sys.argv[1]]
    
    while True:
        # Gibberish to make sure the JIT doesn't optimize the allocation away
        [cls()] * random.randint(1, 1)
    

    Compare the memory with A vs. B. For me, B sits at like 20MB, and A just keeps growing (at least 4GB).

    I think we have a general issue where if a high enough percentage of your memory has __del__, the GC can't keep up.

  4. Armin Rigo

    @Philip Jenvey Still waiting for feedback. Can you or someone else try, on cryptography or some other real-life library, adding __pypy__.add_memory_pressure(size) after each call to ffi.gc() in order to check that the idea works? And then report the results here. (I'm not suggesting changing the libraries, only checking whether that gives the right result.)
