Error when retrieving large amounts (>128MB) of metadata

Issue #445 resolved
Scott Wells repo owner created an issue

A user reported an issue with metadata retrieval when trying to pull down the entire org contents from a reasonably large org:

Caused by: javax.xml.bind.UnmarshalException
 - with linked exception:
[javax.xml.stream.XMLStreamException: Text size limit (134217728) exceeded]
    at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.handleStreamException(UnmarshallerImpl.java:470)
    at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal0(UnmarshallerImpl.java:402)
    at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal(UnmarshallerImpl.java:379)
    at org.apache.cxf.jaxb.JAXBEncoderDecoder.doUnmarshal(JAXBEncoderDecoder.java:858)
    at org.apache.cxf.jaxb.JAXBEncoderDecoder.access$100(JAXBEncoderDecoder.java:102)
    at org.apache.cxf.jaxb.JAXBEncoderDecoder$2.run(JAXBEncoderDecoder.java:897)
    ...
Caused by: javax.xml.stream.XMLStreamException: Text size limit (134217728) exceeded
    at com.ctc.wstx.sr.StreamScanner.constructLimitViolation(StreamScanner.java:2469)
    at com.ctc.wstx.sr.StreamScanner.verifyLimit(StreamScanner.java:2462)
    at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1123)
    ... 43 more

This is due to a limit in CXF's use of StAX for XML parsing: by default, the text content of a single element may not exceed 128MB (the 134217728 shown in the exception). It's possible to raise this limit on a per-request basis, so I'm planning to raise it to 512MB initially. If it needs to be adjusted further, I'll consider making this a per-connection configurable parameter.
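As a rough illustration of what a per-request override could look like, the sketch below builds a request-context map carrying a raised text limit. It assumes the relevant setting is CXF's `org.apache.cxf.stax.maxTextLength` property and that it is supplied through the request context. A plain `Map` stands in for that context so the example runs without CXF on the classpath. The class and method names are hypothetical, and this is not necessarily how the fix was implemented.

```java
import java.util.HashMap;
import java.util.Map;

class StaxLimitSketch {
    // Assumption: this is the CXF StAX property that governs the text size limit.
    static final String MAX_TEXT_LENGTH = "org.apache.cxf.stax.maxTextLength";

    static final long MB = 1024L * 1024L;

    // Returns a copy of the given request context with the text limit set to limitMb megabytes.
    // Assumption: the property accepts a numeric value; a real client may need a different type.
    static Map<String, Object> withTextLimit(Map<String, Object> requestContext, long limitMb) {
        Map<String, Object> updated = new HashMap<>(requestContext);
        updated.put(MAX_TEXT_LENGTH, limitMb * MB);
        return updated;
    }

    public static void main(String[] args) {
        // Stand-in for the request context of a single SOAP call.
        Map<String, Object> ctx = withTextLimit(new HashMap<>(), 512);
        System.out.println(MAX_TEXT_LENGTH + " = " + ctx.get(MAX_TEXT_LENGTH));
    }
}
```

In a real client the same key and value would be placed in the request context of the SOAP port before the retrieve call is made, which keeps the higher limit scoped to that request rather than to every call.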

Comments (7)

  1. Xander Victory

    @Scott Wells I’ve managed to hit this with IC2 2.0.8.2 (unsure how to measure the size of the download)

    Hmm seems it was probably due to including Documents in the retrieve (org has lots of large images)

  2. Scott Wells reporter

    Hmmmm...I wonder how large that response is. According to Salesforce's documentation, even 512MB should be sufficient to handle a base-64-encoded maximum payload:

    https://developer.salesforce.com/docs/atlas.en-us.salesforce_app_limits_cheatsheet.meta/salesforce_app_limits_cheatsheet/salesforce_app_limits_platform_metadata.htm

    I'm actually surprised that you're not hitting the 10K file limit before exceeding 512MB. Does this include large attachments, static resources, etc.? Also, is a partitioned retrieval a viable workaround for you? Basically split the retrieval into two (or more) requests?

    I can certainly bump the CXF payload size limit up higher, but before I do so I'd like to understand this better, since IC should already be providing limits that are higher than those documented by Salesforce for this operation.
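The partitioned-retrieval workaround suggested above can be sketched as splitting the list of metadata types into batches and issuing one retrieve per batch. This is only a minimal illustration: the class name, the batch size, and the idea of partitioning by type name are assumptions, and the actual retrieve call is left out.

```java
import java.util.ArrayList;
import java.util.List;

class PartitionedRetrieveSketch {
    // Splits the metadata type names into consecutive batches of at most batchSize entries.
    static List<List<String>> partition(List<String> types, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<String>> batches = new ArrayList<>();
        for (int i = 0; i < types.size(); i += batchSize) {
            int end = Math.min(i + batchSize, types.size());
            batches.add(new ArrayList<>(types.subList(i, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> types = List.of("ApexClass", "ApexPage", "StaticResource", "Document");
        for (List<String> batch : partition(types, 2)) {
            // Hypothetical: each batch would become its own retrieve request.
            System.out.println("retrieve " + batch);
        }
    }
}
```

Putting the heaviest types, such as documents or static resources, in their own batch keeps each individual response smaller, at the cost of more round trips.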

  3. Xander Victory

    Very large Document object bodies. In this case I just unticked documents in the retrieval as I was just making a backup of a sandbox’s metadata before refresh.

    The error popup did strike me as a little odd - there was basically just the Exception’s message and no real indication of which component it came from.

  4. Scott Wells reporter

    Yeah, unfortunately the error reported by the underlying SOAP/XML library isn't specific. It just says that the payload was larger than it can handle, and it doesn't know specifically why the payload is that large. In fact, the payload is largely opaque to it because it's a base-64-encoded byte stream.

    I'm glad to hear that you were able to retrieve the main metadata by being a bit more selective.
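To put the limits discussed in this thread in perspective, the sketch below estimates how many raw bytes fit under a given text limit once the payload is base-64-encoded. It assumes padded base64 with no line breaks, so every 3 raw bytes become 4 characters, and it assumes the limit is counted in characters. The class and method names are hypothetical.

```java
class Base64BudgetSketch {
    // Length in characters of the padded base64 encoding of rawBytes bytes.
    static long encodedLength(long rawBytes) {
        return ((rawBytes + 2) / 3) * 4;
    }

    // Largest raw byte count whose padded base64 encoding fits in maxChars characters.
    static long maxRawBytes(long maxChars) {
        return (maxChars / 4) * 3;
    }

    public static void main(String[] args) {
        long defaultLimit = 134217728L; // 128MB, the limit reported in the exception
        long raisedLimit = 536870912L;  // 512MB, the raised limit mentioned in the issue

        System.out.println(maxRawBytes(defaultLimit)); // prints 100663296 (96MB of raw data)
        System.out.println(maxRawBytes(raisedLimit));  // prints 402653184 (384MB of raw data)
    }
}
```

Under these assumptions, a 128MB text limit corresponds to roughly 96MB of zipped metadata, and a 512MB limit to roughly 384MB.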
