OpenXML Filter: XLSX: inline strings are not extracted when the shared strings part is absent

Issue #1116 resolved
Former user created an issue

Excel files created with SheetJS (https://sheetjs.com/) cannot be read correctly by Okapi: it does not pick up any TextUnits in the document. Nevertheless, running an Okapi filter pipeline that modifies the text and writes the file back produces an intact file with the original text left unchanged.

I'm attaching an example file created with SheetJS (npm install xlsx), version 0.17.5, using the following code:

const XLSX = require('xlsx');

// Create an empty workbook and register a single sheet name.
const wb = XLSX.utils.book_new();
wb.SheetNames.push("Test Sheet");
// One row with two text columns.
const ws_data = [['hello', 'world']];
const ws = XLSX.utils.aoa_to_sheet(ws_data);
wb.Sheets["Test Sheet"] = ws;
// Write the workbook to disk with the default write options.
XLSX.writeFile(wb, 'okapi-will-not-be able-to-see-the-text.xlsx');

The file opens fine in Excel. After it has been re-saved from Excel, Okapi processes it as expected.

Comments (6)

  1. Denis Konovalyenko

    Patrick, thank you for your report!

    It looks like the original document has no shared strings part: it is not created when all strings are listed inline, i.e.:

      <sheetData>
        <row r="1">
          <c r="A1" t="str">
            <v>hello</v>
          </c>
          <c r="B1" t="str">
            <v>world</v>
          </c>
        </row>
      </sheetData>
    

    Related to issue #982.
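
To make the quoted markup concrete, here is a minimal sketch of how text could be pulled from cells whose value sits directly in <v>, without consulting a shared strings part. It is not Okapi's implementation: the function names extractInlineCellText and decodeXmlEntities are hypothetical, the regular-expression scan is a simplification that a real filter would replace with an XML parser, and only cells typed t="str" (as in the markup above) are handled.

```javascript
// Hypothetical helpers for illustration; not part of Okapi or SheetJS.

// Decode the five predefined XML entities.
function decodeXmlEntities(text) {
  return text
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&quot;/g, '"')
    .replace(/&apos;/g, "'")
    .replace(/&amp;/g, '&'); // last, so "&amp;lt;" yields "&lt;" rather than "<"
}

// Return [{ ref, text }] for every <c t="str"> cell that carries a <v> value.
// Assumption: the worksheet XML uses no namespace prefixes on <c> and <v>.
function extractInlineCellText(sheetXml) {
  const cells = [];
  // Matches either a self-closing <c .../> or a <c ...>...</c> pair.
  const cellPattern = /<c\b([^>]*?)(?:\/>|>([\s\S]*?)<\/c>)/g;
  let match;
  while ((match = cellPattern.exec(sheetXml)) !== null) {
    const attributes = match[1];
    const body = match[2];
    if (!body) continue; // empty or self-closing cell
    const type = /\bt="([^"]*)"/.exec(attributes);
    if (!type || type[1] !== 'str') continue; // e.g. t="s" needs the shared strings part
    const value = /<v>([\s\S]*?)<\/v>/.exec(body);
    if (!value) continue;
    const ref = /\br="([^"]*)"/.exec(attributes);
    cells.push({ ref: ref ? ref[1] : null, text: decodeXmlEntities(value[1]) });
  }
  return cells;
}

// Usage with the markup from the comment above.
const sampleSheetXml = `
<sheetData>
  <row r="1">
    <c r="A1" t="str"><v>hello</v></c>
    <c r="B1" t="str"><v>world</v></c>
  </row>
</sheetData>`;
console.log(extractInlineCellText(sampleSheetXml));
```

The sketch skips every cell type other than "str" on purpose, so it shows only the case from this report: a worksheet whose text never passes through the shared strings part.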
