Pickle (or otherwise serialize) the compiled regex

Issue #195 resolved
Former user created an issue

We have many regular expressions (thousands) that our application compiles at startup. This takes about 5 seconds, which is too long for us. Is it possible to store the compiled regular expressions in a file and load them quickly when needed?

Comments (7)

  1. Matthew Barnett repo owner

    The best bet would be to store the intermediate code that's passed to the engine, and also any named sets that are used. It's something I'd need to experiment with.

  2. animalize

    This idea is great. However, the code holds more memory; it would be better to generate the pickled_data only when needed.

    Edit: Should this be written into the documentation, so that more people will know about it?

  3. Matthew Barnett repo owner

    Objects like the named lists and group names are already part of the pattern object, and the pickled data simply holds another reference to them, so they don't occupy any more memory.

    The only object that could occupy significant memory is the list of codes, but you need that in order to create a new pattern object; previously it was just discarded. It's not possible to re-create it from the pattern object itself.

    I might be able to reduce the memory a little, but you're going to have to accept that if you want pattern objects to be picklable, it's going to use more memory.
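The workflow asked about in this issue (compile once, store to a file, load at start-up) could look roughly like the following. This is a minimal sketch, not the module's documented recipe: the helper names `save_patterns` and `load_patterns`, the file name, and the sample patterns are all hypothetical. It falls back to the standard-library `re` module if `regex` is not installed; note that `re` pickles only the pattern source and flags and recompiles on load, so the fallback shows the mechanics but saves no compile time.

```python
import pickle
import tempfile
from pathlib import Path

try:
    import regex as re_mod  # the third-party module this issue is about
except ImportError:
    import re as re_mod  # stdlib fallback: unpickling recompiles from source


def save_patterns(sources, path):
    """Compile each pattern source and pickle the compiled objects to `path`."""
    compiled = [re_mod.compile(source) for source in sources]
    with open(path, "wb") as fh:
        pickle.dump(compiled, fh, protocol=pickle.HIGHEST_PROTOCOL)
    return compiled


def load_patterns(path):
    """Load previously pickled pattern objects from `path`."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


# Hypothetical sample patterns; a real application would have thousands.
sources = [r"\d{4}-\d{2}-\d{2}", r"(?P<key>\w+)=(?P<value>\w+)"]
cache_file = Path(tempfile.mkdtemp()) / "patterns.pickle"

save_patterns(sources, cache_file)
loaded = load_patterns(cache_file)
```

Whether loading is actually faster than compiling depends on the patterns, so it is worth timing both paths. It is also safest to treat the file as a cache that can be rebuilt from the pattern sources, for example after upgrading the module.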

  4. animalize

    I compiled a huge pattern with 800,000 branches and then observed the process memory. It looks good:

    • 339.8 MB regex 2016.03.31, without pickle ability
    • 412.0 MB regex 2016.04.01, with pickle ability
    • 348.6 MB regex 2016.04.02, improved memory use

    Tested on 64-bit Python, Windows 10.
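A comparison like the one above can be approximated from inside Python. The sketch below is an assumption-laden illustration, not the method used for the figures in this thread: it uses `tracemalloc`, which only sees allocations made through Python's allocators, so its numbers will differ from the process memory reported by the operating system. The helper name `compile_memory` and the 2,000-branch pattern (far smaller than the 800,000 branches tested above) are hypothetical.

```python
import tracemalloc

try:
    import regex as re_mod  # the third-party module this issue is about
except ImportError:
    import re as re_mod  # stdlib fallback so the sketch still runs


def compile_memory(pattern_source):
    """Compile a pattern and report (pattern, retained bytes, peak bytes)."""
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    compiled = re_mod.compile(pattern_source)
    after, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return compiled, after - before, peak - before


# Hypothetical many-branch pattern: branch0|branch1|...|branch1999
source = "|".join(f"branch{i}" for i in range(2000))
compiled, retained, peak = compile_memory(source)
print(f"retained: {retained} bytes, peak during compile: {peak} bytes")
```

Running the same script under two installed versions of the module gives a rough like-for-like comparison of how much memory a compiled pattern keeps alive.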
