NullPointerException when extracting xlsx with a custom object in a hidden sheet

Issue #1325 open
Yuxiang Tang created an issue

Hi,

We experienced a NullPointerException - OkapiBadFilterInputException: Invalid <col> element when extracting an xlsx file. This xlsx file has a hidden worksheet containing a custom object (please find in attachment).

Step to reproduce

Use the attachment, or:

  1. Create a new blank excel file (say, file1), create a blank sheet2;
  2. Open attached xlsx (say, file2) with Excel, right click sheet1 - unhide - unhide sheet2;
  3. On sheet2 of file2, there is a dropdown list. Copy it and paste into sheet2 of file1;
  4. Right click sheet2 in file1, hide.

Run file1 with Okapi OpenXML filter, it will run into the error.

I don’t know how to make problematic dropdown list from scratch, I copied it from other source.

Full stack trace:

 [java] Caused by: net.sf.okapi.common.exceptions.OkapiBadFilterInputException: Invalid <col> element
 [java]     at net.sf.okapi.filters.openxml.WorksheetFragments$Default.columnNames(WorksheetFragments.java:156)
 [java]     at net.sf.okapi.filters.openxml.WorksheetFragments$Default.readWith(WorksheetFragments.java:126)
 [java]     at net.sf.okapi.filters.openxml.Worksheet$Default.readWith(Worksheet.java:73)
 [java]     at net.sf.okapi.filters.openxml.ExcelDocument.nextPart(ExcelDocument.java:456)
 [java]     at net.sf.okapi.filters.openxml.ExcelDocument.nextPart(ExcelDocument.java:477)
 [java]     at net.sf.okapi.filters.openxml.Document$General.nextPart(Document.java:272)
 [java]     at net.sf.okapi.filters.openxml.OpenXMLFilter.nextInDocument(OpenXMLFilter.java:441)
 [java]     at net.sf.okapi.filters.openxml.OpenXMLFilter.next(OpenXMLFilter.java:252)
 [java]     at net.sf.okapi.filters.openxml.OpenXMLFilter.next(OpenXMLFilter.java:261)
 [java]     at net.sf.okapi.steps.common.RawDocumentToFilterEventsStep.handleEvent(RawDocumentToFilterEventsStep.java:166)
 [java]     at net.sf.okapi.common.pipeline.Pipeline.execute(Pipeline.java:117)
 [java]     at net.sf.okapi.common.pipeline.Pipeline.process(Pipeline.java:227)
 [java]     at net.sf.okapi.common.pipeline.Pipeline.process(Pipeline.java:199)
 [java]     at net.sf.okapi.common.pipelinedriver.PipelineDriver.processBatch(PipelineDriver.java:182)

Investigation

Unzip the xlsx, and go to the xl/worksheet/sheet2.xml I see the dropdown list is actually defined with following content:

<mc:AlternateContent xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006">
        <mc:Choice Requires="x14">
            <controls>
                <mc:AlternateContent xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006">
                    <mc:Choice Requires="x14">
                        <control shapeId="2049" r:id="rId3" name="Drop Down 1">
                            <controlPr locked="0" defaultSize="0" autoLine="0" autoPict="0">
                                <anchor moveWithCells="1">
                                    <from>
                                        <xdr:col>1</xdr:col>
                                        <xdr:colOff>495300</xdr:colOff>
                                        <xdr:row>5</xdr:row>
                                        <xdr:rowOff>12700</xdr:rowOff>
                                    </from>
                                    <to>
                                        <xdr:col>2</xdr:col>
                                        <xdr:colOff>762000</xdr:colOff>
                                        <xdr:row>6</xdr:row>
                                        <xdr:rowOff>50800</xdr:rowOff>
                                    </to>
                                </anchor>
                            </controlPr>
                        </control>
                    </mc:Choice>
                </mc:AlternateContent>
            </controls>
        </mc:Choice>
    </mc:AlternateContent>

You can see this <xdr:col>1</xdr:col> doesn’t have any attributes.

And the xdr namespace is defined at very beginning:

<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main"
...
    xmlns:xdr="http://schemas.openxmlformats.org/drawingml/2006/spreadsheetDrawing"

From the source code it throws from WorksheetFragments.java#lines-146:158:

        private List<String> columnNames(final StartElement startElement) {
            try {
                List<String> names = new ArrayList<>();
                int min = Integer.parseUnsignedInt(startElement.getAttributeByName(MIN).getValue());
                int max = Integer.parseUnsignedInt(startElement.getAttributeByName(MAX).getValue());
                for (int i = min; i <= max; i++) {
                    names.add(indexToColumnName(i));
                }
                return names;
            } catch (NumberFormatException | NullPointerException e) {
                throw new OkapiBadFilterInputException("Invalid <col> element", e);
            }
        }

It is trying to get attribute of min and max which is not available from an <xdr:col> . And from its caller:

                } else if (COL.equals(se.getName().getLocalPart())) {
                    if (hidden(se)) {
                        // Column info blocks span one or more columns, which are referred to
                        // via 1-indexed min/max values.
                        this.hiddenColumns.addAll(columnNames(se));
                    }

It looks to me trying to collect hidden column names. But it only checks the local name to be col and ignores the namespace.

By an experiment of hiding some columns, I see the a regular hidden column is like this

<col min="65" max="65" width="1.6796875" style="3" customWidth="1"/>

which is under primary namespace xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main"

Conclusion

My inference: the code counting hidden columns didn’t check the namespace, so it accidentally consider a column under spreadsheetDrawing namespace as a regular worksheet column and trying to get min attribute which does not exist.

Suggestion

Add a check to verify the name space is "http://schemas.openxmlformats.org/spreadsheetml/2006/main", and also apply this validation to the hidden row counting.

Comments (6)

  1. Log in to comment