- edited description
NullPointerException when extracting xlsx with a custom object in a hidden sheet
Hi,
We experienced a NullPointerException - OkapiBadFilterInputException: Invalid <col> element
when extracting an xlsx file. This xlsx file has a hidden worksheet containing a custom object (please find in attachment).
Step to reproduce
Use the attachment, or:
- Create a new blank excel file (say, file1), create a blank sheet2;
- Open attached xlsx (say, file2) with Excel, right click sheet1 - unhide - unhide sheet2;
- On sheet2 of file2, there is a dropdown list. Copy it and paste into sheet2 of file1;
- Right click sheet2 in file1, hide.
Run file1 with Okapi OpenXML filter, it will run into the error.
I don’t know how to make problematic dropdown list from scratch, I copied it from other source.
Full stack trace:
[java] Caused by: net.sf.okapi.common.exceptions.OkapiBadFilterInputException: Invalid <col> element
[java] at net.sf.okapi.filters.openxml.WorksheetFragments$Default.columnNames(WorksheetFragments.java:156)
[java] at net.sf.okapi.filters.openxml.WorksheetFragments$Default.readWith(WorksheetFragments.java:126)
[java] at net.sf.okapi.filters.openxml.Worksheet$Default.readWith(Worksheet.java:73)
[java] at net.sf.okapi.filters.openxml.ExcelDocument.nextPart(ExcelDocument.java:456)
[java] at net.sf.okapi.filters.openxml.ExcelDocument.nextPart(ExcelDocument.java:477)
[java] at net.sf.okapi.filters.openxml.Document$General.nextPart(Document.java:272)
[java] at net.sf.okapi.filters.openxml.OpenXMLFilter.nextInDocument(OpenXMLFilter.java:441)
[java] at net.sf.okapi.filters.openxml.OpenXMLFilter.next(OpenXMLFilter.java:252)
[java] at net.sf.okapi.filters.openxml.OpenXMLFilter.next(OpenXMLFilter.java:261)
[java] at net.sf.okapi.steps.common.RawDocumentToFilterEventsStep.handleEvent(RawDocumentToFilterEventsStep.java:166)
[java] at net.sf.okapi.common.pipeline.Pipeline.execute(Pipeline.java:117)
[java] at net.sf.okapi.common.pipeline.Pipeline.process(Pipeline.java:227)
[java] at net.sf.okapi.common.pipeline.Pipeline.process(Pipeline.java:199)
[java] at net.sf.okapi.common.pipelinedriver.PipelineDriver.processBatch(PipelineDriver.java:182)
Investigation
Unzip the xlsx, and go to the xl/worksheet/sheet2.xml
I see the dropdown list is actually defined with following content:
<mc:AlternateContent xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006">
<mc:Choice Requires="x14">
<controls>
<mc:AlternateContent xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006">
<mc:Choice Requires="x14">
<control shapeId="2049" r:id="rId3" name="Drop Down 1">
<controlPr locked="0" defaultSize="0" autoLine="0" autoPict="0">
<anchor moveWithCells="1">
<from>
<xdr:col>1</xdr:col>
<xdr:colOff>495300</xdr:colOff>
<xdr:row>5</xdr:row>
<xdr:rowOff>12700</xdr:rowOff>
</from>
<to>
<xdr:col>2</xdr:col>
<xdr:colOff>762000</xdr:colOff>
<xdr:row>6</xdr:row>
<xdr:rowOff>50800</xdr:rowOff>
</to>
</anchor>
</controlPr>
</control>
</mc:Choice>
</mc:AlternateContent>
</controls>
</mc:Choice>
</mc:AlternateContent>
You can see this <xdr:col>1</xdr:col>
doesn’t have any attributes.
And the xdr
namespace is defined at very beginning:
<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main"
...
xmlns:xdr="http://schemas.openxmlformats.org/drawingml/2006/spreadsheetDrawing"
From the source code it throws from WorksheetFragments.java#lines-146:158:
private List<String> columnNames(final StartElement startElement) {
try {
List<String> names = new ArrayList<>();
int min = Integer.parseUnsignedInt(startElement.getAttributeByName(MIN).getValue());
int max = Integer.parseUnsignedInt(startElement.getAttributeByName(MAX).getValue());
for (int i = min; i <= max; i++) {
names.add(indexToColumnName(i));
}
return names;
} catch (NumberFormatException | NullPointerException e) {
throw new OkapiBadFilterInputException("Invalid <col> element", e);
}
}
It is trying to get attribute of min
and max
which is not available from an <xdr:col>
. And from its caller:
} else if (COL.equals(se.getName().getLocalPart())) {
if (hidden(se)) {
// Column info blocks span one or more columns, which are referred to
// via 1-indexed min/max values.
this.hiddenColumns.addAll(columnNames(se));
}
It looks to me trying to collect hidden column names. But it only checks the local name to be col
and ignores the namespace.
By an experiment of hiding some columns, I see the a regular hidden column is like this
<col min="65" max="65" width="1.6796875" style="3" customWidth="1"/>
which is under primary namespace xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main"
Conclusion
My inference: the code counting hidden columns didn’t check the namespace, so it accidentally consider a column under spreadsheetDrawing
namespace as a regular worksheet column and trying to get min
attribute which does not exist.
Suggestion
Add a check to verify the name space is "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
, and also apply this validation to the hidden row counting.
Comments (6)
-
reporter -
- changed status to open
-
- changed milestone to 1.46.0
-
Good analysis, thank you Yuxiang. FYI @Denis Konovalyenko
-
@Yuxiang Tang thanks for the catch! A solution is available in the scope of pull request #792.
-
reporter @Denis Konovalenko Thanks for the fix. Looking forward to leverage it in 1.46.
- Log in to comment