mzML XML "huge text node" error

Issue #23 resolved
created an issue

I am reading some large mzML files, and I am frequently seeing this error:

XMLSyntaxError: xmlSAX2Characters: huge text node

As a temporary workaround, I have added the parameter huge_tree=True to every etree.iterparse() call. This seems to have fixed it for now.
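For readers hitting the same error, the workaround can be sketched roughly as follows. This is a minimal illustration, not the library's code: the tiny `SAMPLE` document and the `spectrum_ids` helper are made up for the example, and a real mzML file uses namespaced tags, so the `tag` filter would need the namespace prefix.

```python
import io
from lxml import etree

# Hypothetical stand-in document; a real mzML file would be opened from disk
# and its elements would carry the mzML namespace.
SAMPLE = (
    b"<run>"
    b"<spectrum id='s1'><binary>AAAA</binary></spectrum>"
    b"<spectrum id='s2'><binary>BBBB</binary></spectrum>"
    b"</run>"
)

def spectrum_ids(source):
    """Collect the id of every <spectrum> element (illustrative helper)."""
    ids = []
    # huge_tree=True lifts libxml2's built-in size limits, including the cap
    # on text node length that triggers "huge text node".
    for _event, elem in etree.iterparse(
        source, events=("end",), tag="spectrum", huge_tree=True
    ):
        ids.append(elem.get("id"))
        elem.clear()  # release the element's children once it is processed
    return ids

print(spectrum_ids(io.BytesIO(SAMPLE)))
```

Note that huge_tree=True switches off a safety limit, so it is best reserved for files from a trusted source.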

Comments (9)

  1. Lev Levitsky repo owner

    Thanks for reporting!

    huge_tree was enabled in some calls and omitted in others, which doesn't make much sense. I see two options: enable it everywhere, or make it configurable through keyword arguments. The former should be easier for the user, but lxml leaves huge_tree off by default for security reasons, so maybe we should respect that and go for the latter option.
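The second option, a configurable keyword argument, might look something like the sketch below. It is only an assumption about the general shape: `iter_element_ids` is a hypothetical helper, not a function from this repository, and the actual fix may be structured differently.

```python
import io
from lxml import etree

def iter_element_ids(source, tag, huge_tree=False):
    """Yield the id attribute of each matching element.

    Hypothetical helper: huge_tree defaults to False, matching lxml's own
    default, so the caller has to opt in to lifting the parser limits.
    """
    for _event, elem in etree.iterparse(
        source, events=("end",), tag=tag, huge_tree=huge_tree
    ):
        yield elem.get("id")
        elem.clear()  # free the subtree before moving to the next element

# Made-up two-element document standing in for a real file.
doc = b"<run><spectrum id='a'/><spectrum id='b'/></run>"

# Default behaviour keeps lxml's limits in place.
default_ids = list(iter_element_ids(io.BytesIO(doc), "spectrum"))

# A caller who trusts the file can opt in explicitly.
opt_in_ids = list(iter_element_ids(io.BytesIO(doc), "spectrum", huge_tree=True))
```

Keeping the default at False preserves the parser's protection for untrusted input, while users with large trusted files pay only one extra keyword.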

  2. Lev Levitsky repo owner

    This should be working in fc2a31a.

    I didn't touch the iterparse call in because that one only parses the indexes and does not get called if you are using anyway ( instantiates the MzML class and not PreIndexedMzML).

  3. Lev Levitsky repo owner

    PyPI gets updated when a version is tagged here on Bitbucket. That usually happens once every several months.

    I typically push an update when I have done everything I wanted to in the short term. This time things have been piling up a bit; there is one more change I want to implement in the near future before I push an update. There have been a number of bugfixes, though, so it's probably worth pushing sooner rather than later.
