mzML XML "huge text node" error

Issue #23 (resolved)

jacobrosenstein created an issue

I am reading some large mzML files, and I am frequently seeing this error:

XMLSyntaxError: xmlSAX2Characters: huge text node

As a temporary workaround, I went through mzml.py and xml.py and added the parameter huge_tree=True to every etree.iterparse() call. That seems to have fixed it for now.
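A minimal sketch that reproduces the error and the workaround with lxml directly. It assumes lxml is installed and that libxml2 caps a single text node at about 10 MB unless huge_tree is set. The longest_text helper and the synthetic document are hypothetical; only etree.iterparse and its huge_tree keyword come from lxml.

```python
import io

from lxml import etree


def longest_text(data: bytes, huge_tree: bool = False) -> int:
    """Stream-parse `data` and return the length of its longest text node."""
    longest = 0
    for _, elem in etree.iterparse(io.BytesIO(data), events=("end",), huge_tree=huge_tree):
        if elem.text:
            longest = max(longest, len(elem.text))
        # Free the element's content once it has been inspected.
        elem.clear()
    return longest


# Synthetic document with one text node above the default limit, similar to
# a large base64-encoded <binary> array in an mzML file (hypothetical data).
DOC = b"<mzML><binary>" + b"A" * 11_000_000 + b"</binary></mzML>"

if __name__ == "__main__":
    try:
        longest_text(DOC)
    except etree.XMLSyntaxError as err:
        # The exact message depends on the libxml2 version.
        print("default limits:", err)
    print("huge_tree=True:", longest_text(DOC, huge_tree=True))
```

Note that huge_tree=True switches off libxml2's protective limits, so it is only appropriate for files you trust.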

Comments (9)

  1. Lev Levitsky repo owner

    Thanks for reporting!

    huge_tree was enabled in some calls and omitted in others, which doesn't make much sense. As I see it, we can either enable it everywhere or make it configurable through a keyword argument. The former is easier for the user, but huge_tree is off by default in lxml for security reasons (it lifts libxml2's limits on tree depth and text node size), so maybe we should respect that and go with the latter option.
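    The configurable option could look roughly like the sketch below. XMLReader is a hypothetical class, not pyteomics code: it takes huge_tree=False by default and routes every iterparse call through one helper, so the flag cannot be enabled in one call and forgotten in another.

```python
import io

from lxml import etree


class XMLReader:
    """Hypothetical reader with an opt-in huge_tree keyword argument."""

    def __init__(self, source, huge_tree=False):
        # Off by default: libxml2's size and depth limits stay in force
        # unless the caller explicitly opts out for trusted input.
        self.source = source
        self.huge_tree = huge_tree

    def _iterparse(self, **kwargs):
        # Single choke point for parsing, so every pass sees the same setting.
        if hasattr(self.source, "seek"):
            self.source.seek(0)  # allow repeated passes over file objects
        kwargs.setdefault("huge_tree", self.huge_tree)
        return etree.iterparse(self.source, **kwargs)

    def iter_attributes(self, localname):
        """Yield the attributes of every element with the given local name."""
        for _, elem in self._iterparse(events=("end",)):
            if etree.QName(elem).localname == localname:
                yield dict(elem.attrib)
                elem.clear()


src = io.BytesIO(b'<mzML xmlns="http://psi.hupo.org/ms/mzml">'
                 b'<spectrum id="a"/><spectrum id="b"/></mzML>')
safe = XMLReader(src)                     # default: limits enforced
trusted = XMLReader(src, huge_tree=True)  # explicit opt-in for large files
```

The trade-off is one extra argument for users with very large files, in exchange for keeping lxml's safer default for everyone else.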

  2. Lev Levitsky repo owner

    This should be working in fc2a31a.

    I didn't touch the iterparse call in mzml.py because it only parses the index, and it is never called when you use mzml.read() anyway (mzml.read() instantiates the MzML class, not PreIndexedMzML).
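    For context, an indexed mzML file ends with an <indexList> of <offset idRef="..."> elements holding byte positions, followed by <indexListOffset>. Below is a minimal sketch of parsing only that index, assuming the whole file is available in memory as bytes. read_offset_index is hypothetical and not how pyteomics implements PreIndexedMzML; it only illustrates why an index-only parse never meets the huge text nodes that live in the spectrum data.

```python
import io
import re

from lxml import etree

_OFFSET_TAG = re.compile(rb"<indexListOffset>\s*(\d+)\s*</indexListOffset>")


def read_offset_index(data: bytes) -> dict:
    """Return {index name: {idRef: byte offset}} from an indexed mzML file.

    Only the <indexList> fragment is parsed; spectra and their (possibly
    huge) binary arrays are skipped entirely.
    """
    pos = data.rfind(b"<indexListOffset>")
    match = _OFFSET_TAG.match(data, pos) if pos != -1 else None
    if match is None:
        return {}  # not an indexed file: a full parse would be needed instead
    start = int(match.group(1))
    end = data.index(b"</indexList>", start) + len(b"</indexList>")
    index = {}
    for _, elem in etree.iterparse(io.BytesIO(data[start:end]), events=("end",)):
        if etree.QName(elem).localname == "offset":
            name = elem.getparent().get("name")
            index.setdefault(name, {})[elem.get("idRef")] = int(elem.text)
    return index
```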

  3. Lev Levitsky repo owner

    PyPI is updated whenever a version is tagged here on Bitbucket, which usually happens once every few months.

    I typically push an update once I have done everything I wanted in the short term. This time things have piled up a bit: there is one more change I want to implement before the next update. There have been a number of bug fixes, though, so it's probably worth releasing sooner rather than later.
