OpenXML Filter: XLSX: inline strings are not extracted when the shared strings part is absent
Excel files created with SheetJS (https://sheetjs.com/) cannot be read correctly by Okapi: it does not pick up any TextUnits in the document. Nevertheless, when running an Okapi filter pipeline that modifies the text and writes the file back, an intact file is created with the original text left unchanged.
I'm attaching an example file created with SheetJS (npm install xlsx), version 0.17.5, using the following code:
const XLSX = require('xlsx');
var wb = XLSX.utils.book_new();
wb.SheetNames.push("Test Sheet");
var ws_data = [['hello', 'world']]; // a row with 2 columns
var ws = XLSX.utils.aoa_to_sheet(ws_data);
wb.Sheets["Test Sheet"] = ws;
XLSX.writeFile(wb, 'okapi-will-not-be able-to-see-the-text.xlsx');
The file opens fine in Excel. After it is re-saved from Excel, Okapi processes it as expected.
Comments (6)
-
Patrick, thank you for your report!
It looks like the original document has no shared strings part; it is not created when all strings are written inline, i.e.:
<sheetData>
  <row r="1">
    <c r="A1" t="str">
      <v>hello</v>
    </c>
    <c r="B1" t="str">
      <v>world</v>
    </c>
  </row>
</sheetData>
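To illustrate the case above, here is a minimal JavaScript sketch of reading cell text without relying on a shared strings part. It is not the Okapi implementation: the function name `extractCellText` and the regex-based parsing are assumptions made for this example only, and XML entities are not decoded. It handles the three string-bearing cell types of SpreadsheetML: `t="s"` (index into the shared strings table), `t="str"` (string stored directly in `<v>`, as in the fragment above), and `t="inlineStr"` (text inside `<is><t>`).

```javascript
// Hypothetical helper: collect string cells from a worksheet XML fragment.
// sharedStrings is the already-parsed shared strings table, if the part exists.
function extractCellText(sheetXml, sharedStrings = []) {
  const cells = [];
  // Matches both <c ...>...</c> and self-closing <c .../> cells.
  const cellRe = /<c\b([^>]*?)(?:\/>|>([\s\S]*?)<\/c>)/g;
  let m;
  while ((m = cellRe.exec(sheetXml)) !== null) {
    const attrs = m[1];
    const body = m[2] || '';
    const ref = (/\br="([^"]*)"/.exec(attrs) || [])[1];
    const type = (/\bt="([^"]*)"/.exec(attrs) || [])[1];
    const v = (/<v>([\s\S]*?)<\/v>/.exec(body) || [])[1];
    let text = null;
    if (type === 's') {
      // <v> holds an index into the shared strings table.
      text = sharedStrings[Number(v)] ?? null;
    } else if (type === 'str') {
      // <v> holds the string itself; no shared strings part is needed.
      text = v ?? null;
    } else if (type === 'inlineStr') {
      // The string lives in <is><t>...</t></is>.
      text = (/<t\b[^>]*>([\s\S]*?)<\/t>/.exec(body) || [])[1] ?? null;
    }
    if (text !== null) cells.push({ ref, text });
  }
  return cells;
}

// The fragment from the attached file, which has no shared strings part.
const fragment = `
<sheetData>
  <row r="1">
    <c r="A1" t="str"><v>hello</v></c>
    <c r="B1" t="str"><v>world</v></c>
  </row>
</sheetData>`;

const cells = extractCellText(fragment);
console.log(cells);
```

With an empty shared strings table, both cells of the fragment are still returned, which is the behaviour this report asks for. A real filter would use a proper XML parser rather than regular expressions.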
Related to issue #982.
-
- changed title to OpenXML Filter: XLSX: inline strings are not extracted when the shared strings part is absent
-
One more issue that is probably related: #1069.
-
- changed milestone to 1.45.0
-
assigned issue to
A related pull request #661 was opened.
-
- changed status to resolved
Pull request #661 was merged.
I somehow was not logged in when I filed this bug report. It’s from me.