Please don't pack internal data for pickle

Issue #207 resolved
animalize created an issue

(This issue is related to Hg issue #195.)

Why do I suggest not packing the internal data?

  1. Packing slows down compilation to some degree, while memory is cheap in most environments.
  2. If a user doesn't have huge patterns, the extra memory usage is insignificant.
  3. If a user has huge patterns, they will probably benefit from fast loading via pickle; memory usage is not the main concern.
  4. It doesn't break compatibility with existing pickled data.

I compiled 156 common patterns and dumped them to a file; the file is only 46 KB, so I guess memory usage is not a big problem for most users.
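The measurement above can be reproduced along these lines. This is a minimal sketch, not the reporter's actual script: the helper name `pickled_size` and the sample patterns are made up for illustration, and the stdlib `re` fallback is only there so the snippet runs when `regex` is not installed (`re` pickles just the pattern source and flags, so its numbers say nothing about `regex`).

```python
import pickle

try:
    import regex as re_mod  # the third-party module this issue is about
except ImportError:
    import re as re_mod     # stdlib fallback; pickles only source + flags


def pickled_size(sources):
    """Compile each pattern source and return the size in bytes of the
    pickle holding all the compiled patterns (hypothetical helper)."""
    compiled = [re_mod.compile(source) for source in sources]
    return len(pickle.dumps(compiled, protocol=pickle.HIGHEST_PROTOCOL))


if __name__ == "__main__":
    # Illustrative patterns only; the 156 patterns in the report are not listed.
    sample = [r"\d{4}-\d{2}-\d{2}", r"[A-Za-z_]\w*", r"\s+"]
    print(pickled_size(sample), "bytes")
```

Writing the same bytes to a file with `pickle.dump` gives the on-disk size that the report refers to.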

BTW, please add this feature to the documentation so that more people will know about it.

FYI, I compiled a huge pattern with 800,000 branches and then observed the process memory:
339.8 MB: regex 2016.03.31, without support for pickling compiled patterns.
412.0 MB: regex 2016.04.01, with pickling support, internal data not packed.
348.6 MB: regex 2016.04.02, with pickling support, internal data packed.
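
A rough way to run a comparison of this kind is sketched below. It makes several assumptions: the pattern is a made-up alternation of literal words, `peak_bytes_compiling` is a hypothetical helper, and `tracemalloc` only sees allocations made through Python's allocators, so the result will not match the process memory reported by the operating system as in the figures above. The stdlib `re` fallback is only there so the snippet runs without `regex` installed.

```python
import tracemalloc

try:
    import regex as re_mod  # the third-party module this issue is about
except ImportError:
    import re as re_mod     # stdlib fallback so the sketch still runs


def peak_bytes_compiling(n_branches):
    """Compile an alternation with n_branches literal branches and return
    the peak number of bytes traced by tracemalloc during compilation."""
    # Build the source first so that only compilation is measured.
    source = "|".join(f"word{i}" for i in range(n_branches))
    tracemalloc.start()
    try:
        re_mod.compile(source)
        _current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


if __name__ == "__main__":
    # Far fewer branches than the 800,000 in the report, to keep this quick.
    print(peak_bytes_compiling(1000), "bytes at peak")
```

Running the same script under each regex release would give numbers that are comparable with each other, though not with the MB figures quoted above.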

Comments (6)

  1. Matthew Barnett repo owner

    I doubt that packing the data has much effect on the speed (it's a very simple form of compression), and by reducing the size it's reducing the load time from disk, so I wouldn't be surprised if there were a net speed-up!

  2. animalize reporter

    You forgot that the pickled data is unpacked; in the current code, packing applies only to the internal data, so it won't reduce the size of the file on disk.

    I have seen many recent commits slow down compilation, so I think this trade-off is reasonable. If someone really has a huge pattern with 800,000 branches, an extra 100 MB of memory is worth it. Of course, it's your decision.

    Fixed support for pickling (Hg issue 195)

    This is unclear... How about: Support for pickling compiled patterns.

  3. Matthew Barnett repo owner

    You were correct about the pickled data being unpacked. :-( Fixed in regex 2016.05.15.

  4. Matthew Barnett repo owner

    There are no guarantees that pickled regexes will be compatible between different versions of the module.
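
Given that caveat, one defensive approach is to store the module version and the pattern sources next to the pickled patterns, and recompile whenever the version differs or unpickling fails. The sketch below is an illustration only: `dump_patterns` and `load_patterns` are hypothetical helpers, not part of the regex module, and the stdlib `re` fallback is only there so the snippet runs without `regex` installed.

```python
import pickle

try:
    import regex as re_mod  # the third-party module this issue is about
except ImportError:
    import re as re_mod     # stdlib fallback so the sketch still runs

# Version string of whichever module was imported (None if it has none).
MODULE_VERSION = getattr(re_mod, "__version__", None)


def dump_patterns(sources):
    """Return bytes holding the module version, the pattern sources,
    and the pickled compiled patterns."""
    compiled = [re_mod.compile(source) for source in sources]
    payload = {
        "version": MODULE_VERSION,
        "sources": list(sources),
        "compiled": pickle.dumps(compiled),
    }
    return pickle.dumps(payload)


def load_patterns(blob):
    """Load compiled patterns, recompiling from source if the stored
    version differs or the inner pickle cannot be loaded."""
    payload = pickle.loads(blob)
    if payload["version"] == MODULE_VERSION:
        try:
            return pickle.loads(payload["compiled"])
        except Exception:
            pass  # incompatible pickle: fall through and recompile
    return [re_mod.compile(source) for source in payload["sources"]]
```

The cost of this design is a slower first load after an upgrade, since every pattern is recompiled once; the cache can then be rewritten with `dump_patterns`.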
