openxml (docx) crashing with javax.xml.stream.XMLStreamException: Maximum attribute size limit (2097152) exceeded
I’ll try to get a sample file to reproduce the problem. We did find an attribute with an obscenely long value. Strange that the file works with M38/M39. Did we change our XML processor version in the openxml filter? If so, the newer version may have stricter error checking for these pathological cases.
For now here is the stack trace:
Caused by: javax.xml.stream.XMLStreamException: Maximum attribute size limit (2097152) exceeded
at com.ctc.wstx.sr.StreamScanner.constructLimitViolation(StreamScanner.java:2483) ~[woodstox-core-6.1.1.jar:6.1.1]
at com.ctc.wstx.sr.StreamScanner.verifyLimit(StreamScanner.java:2476) ~[woodstox-core-6.1.1.jar:6.1.1]
at com.ctc.wstx.sr.BasicStreamReader._checkAttributeLimit(BasicStreamReader.java:2053) ~[woodstox-core-6.1.1.jar:6.1.1]
at com.ctc.wstx.sr.BasicStreamReader.parseAttrValue(BasicStreamReader.java:2038) ~[woodstox-core-6.1.1.jar:6.1.1]
at com.ctc.wstx.sr.BasicStreamReader.handleNsAttrs(BasicStreamReader.java:3144) ~[woodstox-core-6.1.1.jar:6.1.1]
at com.ctc.wstx.sr.BasicStreamReader.handleStartElem(BasicStreamReader.java:3042) ~[woodstox-core-6.1.1.jar:6.1.1]
at com.ctc.wstx.sr.BasicStreamReader.nextFromTree(BasicStreamReader.java:2920) ~[woodstox-core-6.1.1.jar:6.1.1]
at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1122) ~[woodstox-core-6.1.1.jar:6.1.1]
at com.ctc.wstx.evt.WstxEventReader.nextEvent(WstxEventReader.java:283) ~[woodstox-core-6.1.1.jar:6.1.1]
at net.sf.okapi.filters.openxml.PrioritisedXMLEventReader.nextEvent(PrioritisedXMLEventReader.java:57) ~[okapi-filter-openxml-1.40.0.jar:na]
at net.sf.okapi.filters.openxml.SkippableElements$Default.skip(SkippableElements.java:131) ~[okapi-filter-openxml-1.40.0.jar:na]
at net.sf.okapi.filters.openxml.SkippableElements$Inline.skip(SkippableElements.java:168) ~[okapi-filter-openxml-1.40.0.jar:na]
at net.sf.okapi.filters.openxml.RunSkippableElements.skip(RunSkippableElements.java:76) ~[okapi-filter-openxml-1.40.0.jar:na]
at net.sf.okapi.filters.openxml.RunParser.parseSkippableElements(RunParser.java:416) ~[okapi-filter-openxml-1.40.0.jar:na]
at net.sf.okapi.filters.openxml.RunParser.startRunParsing(RunParser.java:196) ~[okapi-filter-openxml-1.40.0.jar:na]
at net.sf.okapi.filters.openxml.RunParser.parse(RunParser.java:165) ~[okapi-filter-openxml-1.40.0.jar:na]
at net.sf.okapi.filters.openxml.BlockParser.processRun(BlockParser.java:297) ~[okapi-filter-openxml-1.40.0.jar:na]
at net.sf.okapi.filters.openxml.BlockParser.parse(BlockParser.java:230) ~[okapi-filter-openxml-1.40.0.jar:na]
at net.sf.okapi.filters.openxml.StyledTextPart.process(StyledTextPart.java:241) ~[okapi-filter-openxml-1.40.0.jar:na]
at net.sf.okapi.filters.openxml.StyledTextPart.open(StyledTextPart.java:207) ~[okapi-filter-openxml-1.40.0.jar:na]
at net.sf.okapi.filters.openxml.StyledTextPart.open(StyledTextPart.java:129) ~[okapi-filter-openxml-1.40.0.jar:na]
at net.sf.okapi.filters.openxml.OpenXMLFilter.nextInDocument(OpenXMLFilter.java:446) ~[okapi-filter-openxml-1.40.0.jar:na]
at net.sf.okapi.filters.openxml.OpenXMLFilter.next(OpenXMLFilter.java:256) ~[okapi-filter-openxml-1.40.0.jar:na]
... 49 common frames omitted
Comments (18)
-
reporter -
I could have sworn we’d fixed this before, but I think it was in the IDML filter – see commit b15d9327e.
I don’t remember intentionally changing the XML parser, but we’ve had problems before where the one we’re using changes underneath us because of SPI discovery/classpath issues. That might have happened here.
-
reporter @Chase Tingley I saw that the Woodstox XML processor version was bumped a while back. Is it possible Woodstox added this check? I’ll tell the team about the classpath issues and investigate on our side.
-
@Jim Hargrave (OLD), there is a net.sf.okapi.filters.openxml.OpenXMLFilter#MAX_ATTRIBUTE_SIZE constant (2 * 1024 * 1024), which affects the maximum allowed attribute size, as far as I can see:

    if (inputFactory.isPropertySupported(WstxInputProperties.P_MAX_ATTRIBUTE_SIZE)) {
        inputFactory.setProperty(WstxInputProperties.P_MAX_ATTRIBUTE_SIZE, MAX_ATTRIBUTE_SIZE);
    }

I think the best solution would be to reflect the IDMLFilter behaviour - when this value comes from filter parameters as @Chase Tingley mentioned before.
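A minimal sketch of the "set the limit only if the parser supports it" pattern discussed above, written against the plain JDK StAX API so it runs without Woodstox on the classpath. The class name and the `maxAttributeSize` variable are hypothetical, and the property string is an assumption about the value of `WstxInputProperties.P_MAX_ATTRIBUTE_SIZE`; in real code the constant itself should be used.

```java
import javax.xml.stream.XMLInputFactory;

class AttributeLimitConfig {
    // Assumption: string value behind WstxInputProperties.P_MAX_ATTRIBUTE_SIZE.
    static final String WSTX_MAX_ATTRIBUTE_SIZE = "com.ctc.wstx.maxAttributeSize";

    /** Applies the property only when the factory recognises it; returns whether it was applied. */
    public static boolean setPropertyIfSupported(XMLInputFactory factory, String name, Object value) {
        if (!factory.isPropertySupported(name)) {
            return false; // e.g. a different StAX implementation was picked up via SPI
        }
        factory.setProperty(name, value);
        return true;
    }

    public static void main(String[] args) {
        XMLInputFactory factory = XMLInputFactory.newFactory();
        // Hypothetical: in the filter this value would come from the filter parameters.
        int maxAttributeSize = 4 * 1024 * 1024;
        boolean applied = setPropertyIfSupported(factory, WSTX_MAX_ATTRIBUTE_SIZE, maxAttributeSize);
        // Printing the factory class helps spot an unexpected parser on the classpath.
        System.out.println(factory.getClass().getName() + ", limit applied: " + applied);
    }
}
```

Returning a boolean rather than silently skipping makes it easier to notice when a different parser has been discovered and the limit is not being applied at all.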
-
reporter @Chase Tingley @Denis Konovalyenko would everyone be ok if we simply increased the limit? Would this work (3 * 1024 * 1024)?
-
reporter Good catch - let’s go with that value
-
A related pull request #447 was opened.
-
- changed status to resolved
The pull request #447 was merged.
-
-
Seems we have already hit the new limit which was 4x the old. I think we should leave this open ended (not check for size) and let memory dictate what is possible. This does seem to be a bug in OpenXml/Word as this has to cause a problem with other tools at some point. I’d bet there is a bug logged with Microsoft on this already.
-
@Jim Hargrave Woodstox sets a default value (which is smaller than what we currently set). I can’t find documentation that indicates if setting it to 0 disables the check completely. (Disabling the check also makes me uneasy, safety-wise.) I think we should just expose the value through the filter config like we do for IDML.
-
a filter config works for me.
-
@devesh kumar, I agree with @Jim Hargrave: adjusting the maxAttributeSize filter parameter to a value that is reasonable for you (more than 4194304 at the moment) should work.
Thanks all for your responses on this.
@Denis Konovalyenko, since I am using Okapi, the class where maxAttributeSize was set to 4 MiB (in PR #447) comes as a decompiled class for me and is read-only. Also, the size needed varies from file to file, so increasing it beyond 4194304 may only work for now.
-
I forgot that Denis had already added the parameter for this in #974, there’s just no UI for it.
Devesh, you don’t need to decompile anything. You just need to make a custom filter configuration for the OpenXML filter in Rainbow, then open the .fprm file in a text editor and change the value of the maxAttributeSize attribute.
We are not going to remove the restriction entirely - doing so allows for server code running this filter to be DOSed.
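As a rough illustration of the .fprm edit described above, here is a small sketch that rewrites the `maxAttributeSize.i` line in the text of a filter configuration. It assumes the .fprm content is a plain list of `key=value` lines with a `#v1` header; the class name, the sample content, and `someOtherOption.b` are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FprmEditor {
    // Matches a whole line such as "maxAttributeSize.i=4194304".
    private static final Pattern MAX_ATTR_LINE = Pattern.compile("(?m)^maxAttributeSize\\.i=.*$");

    /** Returns the .fprm text with maxAttributeSize.i set to newSize (appended if absent). */
    public static String withMaxAttributeSize(String fprm, int newSize) {
        if (newSize <= 0) {
            // Keep a finite, positive limit rather than trying to disable the check.
            throw new IllegalArgumentException("maxAttributeSize must be positive");
        }
        String line = "maxAttributeSize.i=" + newSize;
        Matcher m = MAX_ATTR_LINE.matcher(fprm);
        if (m.find()) {
            return m.replaceFirst(line);
        }
        return fprm.endsWith("\n") ? fprm + line + "\n" : fprm + "\n" + line + "\n";
    }

    public static void main(String[] args) {
        String fprm = "#v1\nmaxAttributeSize.i=4194304\nsomeOtherOption.b=true\n";
        System.out.print(withMaxAttributeSize(fprm, 8 * 1024 * 1024));
    }
}
```

Editing the file by hand in a text editor achieves the same thing; the sketch is only useful when the configuration has to be generated or patched by a build step.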
-
@Chase Tingley is it something like this that you want me to write?
I found it in the okapiFilterFactory.java class:
```
private static RegexFilter getSRTFilter() {
    RegexFilter filter = new RegexFilter();
    try {
        net.sf.okapi.filters.regex.Parameters params =
                (net.sf.okapi.filters.regex.Parameters) filter.getParameters();
        String config = IOUtils.toString(
                OkapiFilterFactory.class.getResourceAsStream(OKAPI_CUSTOM_CONFIGS_PATH + "okf_regex@srt.fprm"),
                "UTF-8");
        params.fromString(config);
    } catch (IOException e) {
        System.err.println("Strings custom configuration could not be loaded");
    }
    return filter;
}
```
This is the class (okapiFilterFactory.java) where I can see and edit variables like
public static final String XML_CONFIG_FILENAME = "okf_xmlstream-custom.fprm";
and these *.fprm files are present in /resources/okapi/configurations
-
Hi, I am having some difficulty with the maxAttrubuteSize config. I found that maxAttrubuteSize is read, but the value is not passed to P_MAX_ATTRIBUTE_SIZE by the code below in OpenXMLFilter.java. So during extraction, the exception “Caused by: javax.xml.stream.XMLStreamException: Maximum attribute size limit (2097152) exceeded” still occurs.
Can you help identify whether anything is wrong? Thank you so much!

    setPropertyIfSupported(inputFactory, WstxInputProperties.P_MAX_ATTRIBUTE_SIZE, conditionalParameters.getMaxAttributeSize());

I’ve done the following:
1. Added an .fprm file, okf_openxml@maxAttrSize.fprm, containing maxAttrubuteSize.i=33333333
2. Created a CustomOpenXMLFilterConfiguration.java:

    public class CustomOpenXMLFilterConfiguration {
        public static final String CUSTOM_OKAPI_FILTER_ID = "okf_openxml@maxAttrSize";
        private static final String OKAPI_OPENXML_FILTER_CLASS = "net.sf.okapi.filters.openxml.OpenXMLFilter";
        private static final String CONFIG_FILE_LOCATION = "/resources/okf_openxml@maxAttrSize.fprm";
        private static final String OPENXML_EXTENTIONS =
                ".docx;.docm;.dotx;.dotm;.pptx;.pptm;.ppsx;.ppsm;.potx;.potm;"
                + ".xlsx;.xlsm;.xltx;.xltm;.vsdx;.vsdm;";

        public static net.sf.okapi.common.filters.FilterConfiguration provideCustomOpenXMLFilterConfiguration() {
            return new FilterConfiguration(
                    CUSTOM_OKAPI_FILTER_ID,
                    MimeTypeMapper.XML_MIME_TYPE,
                    OKAPI_OPENXML_FILTER_CLASS,
                    "OPENXML (Customize MaxAttributeSize)",
                    "Customize MaxAttributeSize",
                    CONFIG_FILE_LOCATION,
                    OPENXML_EXTENTIONS);
        }
    }

3. In FilterConfiguration.java, added the customized config to FILTER_CONFIGURATION_MAPPER and added the new filter ID to EXTENSIONS_MAP:

    FILTER_CONFIGURATION_MAPPER.addConfiguration(
            CustomOpenXMLFilterConfiguration.provideCustomOpenXMLFilterConfiguration());

    public static final ImmutableMap<FileContentType, String> EXTENSIONS_MAP =
            new ImmutableMap.Builder<FileContentType, String>()
                    .put(FileContentType.HTML, CustomHTMLFilterConfiguration.CUSTOM_OKAPI_FILTER_ID)
                    .put(FileContentType.XLIFF, "okf_xliff")
                    .put(FileContentType.MOSES_TEXT, "okf_mosestext")
                    .put(FileContentType.DOCX, CustomOpenXMLFilterConfiguration.CUSTOM_OKAPI_FILTER_ID)
                    .put(FileContentType.XLSX, CustomOpenXMLFilterConfiguration.CUSTOM_OKAPI_FILTER_ID)
                    .put(FileContentType.PPTX, CustomOpenXMLFilterConfiguration.CUSTOM_OKAPI_FILTER_ID)
                    .put(FileContentType.TMX, "okf_tmx")
                    .build();

Thanks in advance for the effort!!
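One thing that may be worth ruling out in a setup like the one above, offered as an assumption rather than a confirmed diagnosis: if the filter looks the parameter up by the exact name `maxAttributeSize`, a key spelled differently in the .fprm file would simply not be found and the default would stay in effect. The sketch below does a quick sanity check on the .fprm text; the class and method names are hypothetical and the parsing is a simplification of the real format.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.OptionalInt;

class FprmKeyCheck {
    // Assumption: the parameter is stored under exactly this key.
    static final String EXPECTED_KEY = "maxAttributeSize.i";

    /** Parses simple key=value lines, skipping '#' header/comment lines. */
    public static Map<String, String> parse(String fprm) {
        Map<String, String> entries = new LinkedHashMap<>();
        for (String line : fprm.split("\\R")) {
            int eq = line.indexOf('=');
            if (line.startsWith("#") || eq < 0) {
                continue;
            }
            entries.put(line.substring(0, eq).trim(), line.substring(eq + 1).trim());
        }
        return entries;
    }

    /** Returns the configured limit, or empty if the expected key is missing. */
    public static OptionalInt maxAttributeSize(String fprm) {
        String value = parse(fprm).get(EXPECTED_KEY);
        return value == null ? OptionalInt.empty() : OptionalInt.of(Integer.parseInt(value));
    }

    public static void main(String[] args) {
        System.out.println(maxAttributeSize("#v1\nmaxAttrubuteSize.i=33333333\n").isPresent());
        System.out.println(maxAttributeSize("#v1\nmaxAttributeSize.i=33333333\n").isPresent());
    }
}
```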
In "word/document.xml", there is a <v:group> element with an “o:gfxdata” attribute that is super long: 3576823 characters without unescaping the XML escape sequences within it, and 3400187 with unescaping them.
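For reference, both figures are above the old limit of 2 * 1024 * 1024 = 2097152 and below 4 * 1024 * 1024 = 4194304. A measurement like this can be sketched with the JDK's built-in StAX API; the class name, method name, and threshold are hypothetical, and the parser doing the scan must itself accept attributes that large. The length reported is that of the value as returned by the parser, i.e. after escape sequences are resolved.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class LongAttributeFinder {
    /** Lists "element@attribute: length" for every attribute value longer than threshold. */
    public static List<String> findLongAttributes(String xml, int threshold) {
        XMLInputFactory factory = XMLInputFactory.newFactory();
        // Defensive defaults: no DTD processing, no external entities.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
        List<String> hits = new ArrayList<>();
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                if (reader.next() != XMLStreamConstants.START_ELEMENT) {
                    continue;
                }
                for (int i = 0; i < reader.getAttributeCount(); i++) {
                    int length = reader.getAttributeValue(i).length();
                    if (length > threshold) {
                        hits.add(reader.getLocalName() + "@" + reader.getAttributeLocalName(i) + ": " + length);
                    }
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return hits;
    }

    public static void main(String[] args) {
        String xml = "<a x=\"123456789\" y=\"1\"><b z=\"12\"/></a>";
        System.out.println(findLongAttributes(xml, 5)); // prints [a@x: 9]
    }
}
```

For a real document the XML would be read from the `word/document.xml` entry of the .docx archive and the threshold set to the limit under investigation.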