Translation of Excel table column name leads to "recover excel" dialogue box appearance

Create issue
Issue #1051 resolved
Former user created an issue

Got that "recover excel" dialogue after merging it back

To reproduce:

  1. I've attached a sample file. Download it
  2. Extract it (to XLIFF)
  3. Find "OrderDate" word on the <target> and replace it with something else (“NewOrderDate”, for example)
  4. Merge it back
  5. Open the merged file. You should see the "recover excel" dialogue box

Comments (22)

  1. Handika D

    Update: I’ve just found out that it’s happened not only on Arabic text, it’s also happened on normal text too

  2. Kuro Kurosaka (BH Lab)

    I was able to reproduce this error by changing “OrderDate” to “XOrderDate” and merge it. The simple round trip didn’t cause this, so I suspect some kind of inconsistency between files within the .xlsx.

  3. Chase Tingley

    @Kuro Kurosaka Can you send me the XLIFF you used when you were merging this? I’m having trouble reproducing.

  4. Kuro Kurosaka (BH Lab)

    @Chase Tingley , I'm attaching the .xlsx.xlf file and .out.xlsx file. The Excel that I'm using is an unlicensed (read-only) version of Microsoft Excel for Mac Version 16.36. Interestingly, LibreOffice can open the .out.xlsx file without a problem.

  5. Chase Tingley

    @Kuro Kurosaka Oh, I was also trying to open it in LibreOffice, maybe that’s why I didn’t see the problem. Possibly a place where Microsoft doesn’t follow their own spec.

    cc @Denis Konovalyenko , when you have a chance

  6. Kuro Kurosaka (BH Lab)

    The company this Excel sheet is referring to, https://www.contextures.com/index.html, is probably the company that made this file. Since they are claiming to be the Excel expert, I am guessing that it might be using some uncommon features against which Okapi has never been tested.

  7. Denis Konovalyenko

    @Chase Tingley , @Kuro Kurosaka , Handika D, a closer look at the problem described has been taken and the following nuance has been revealed:

    The document has a table (table1.xml part) with <tableColumn... name="SomeName"/> markup, where name attribute values represent the unique caption of the table column - this is what is displayed in the header row in the UI, and is referenced through functions. So, if any of the first-row cell values of sheet2.xml part is changed (translated), a corresponding tableColumn name attribute value has to be changed (translated) as well. This is not happening at the moment.

    So, a workaround would be to manually adjust every needed table column name to match it with the one that has been translated (see the sharedStrings.xml part). As an example, below you can find a screenshot with manually changed “OrderDate” to “NewOrderDate” value:

    For more information please refer to the SampleDataManuallyAdjusted.out.xlsx document.

  8. Handika D

    So you mean I have to manually open the SampleData file and change the table header (also manually) in Word?

    Thank you guys for this discussion anyway

  9. Kuro Kurosaka (BH Lab)

    Thank you @Denis Konovalyenko . I didn’t know there was a different type of things called table in Excel; I thought each sheet of Excel is a gigantic table. I’m learning everyday.

  10. Denis Konovalyenko

    Handika D, I am afraid, the OpenXML filter does not support the behaviour with table parts involved. So, yes, the merged (translated) document would require additional steps to fix it. I will try to make them a bit more verbose:

    1. Unpack a document with unzip
    2. Look at every xl/tables/tableN.xml part and find out what are the column names and their order
    3. Find related xl/worksheets/sheetN.xml parts and discover the indexes under which the shared strings are written in the xl/sharedStrings.xml part. For instance,

      xml <row r="1" spans="1:7" x14ac:dyDescent="0.25"> <!-- the first row --> <c r="A1" s="11" t="s"> <!-- the cell number --> <v>16</v> <!-- this is a 0-based index in the sharedStrings.xml --> </c> ... 4. Copy all translated column names found in the xl/sharedString.xml to their corresponding places in the xl/tables/tableN.xml parts 5. Pack the document with zip

    Hope this would be helpful.

    @Kuro Kurosaka , indeed, OOXML is not an easy thing to deal with, especially when implementations deviate from the original specification to some extent (happens with Microsoft products from time to time)…

  11. Handika Dwi

    I’m now trying to solve this issue. Finally I have the time for this even though I don’t have any experience on building localization tools

    I’ve inspected thoroughly the xl/tables/table{N}.xml for days

    I find that the most complicated part of the Table feature is I can’t find the sheet and column related to the content. Have read SpreadsheetML on the official docs but still can’t find the explanation related to this thing.

    @Kuro Kurosaka @Denis Konovalyenko Do you guys have opinion on this?

  12. Denis Konovalyenko

    @Handika Dwi, I appreciate your attitude towards the resolution of this issue. You are a goal-oriented person. Could you please let me know whether you are willing to patch some documents by hand or there is an intention to dive into a related coding experience?

    For the reasons of reducing the volume of data to store and the time to process Excel documents, the shared strings part is used for keeping unique strings as much as possible. They are referenced from worksheets by 0-based indexes.

    Furthermore, a table element within a table part refers to a range of columns and rows a worksheet represents (using A1 style referencing). There is ref=”A1:P701” in your example. So, I expect that 16 column names should appear in the first row of a worksheet. And here the picture is assembled - we can match the actual values (strings).

    I assume you may wonder about how to link a worksheet with a particular table. This can be done with the help of the parts relationships. Just look for the information like this <Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/table" Target="../tables/table1.xml"/> in worksheets/_rels/sheetX.xml.rels , where sheetX is a worksheet, and a table available for this worksheet can be found in the Target attribute. To be more concise, there can be even more reference data in a sheet. It looks a bit extra to me but still someone could use this for coming up with an idea of what tables are linked to a sheet. E.g.

      <tableParts count="1">
        <tablePart r:id="rId1"/>
      </tableParts>
    

    As for the programmatic solution for this issue, it is going to be considered in the scope of the issue #1059 (I have been looking into this for some time).

  13. Handika Dwi

    I want to channel my 3+ years of software engineering experience onto Okapi by solving this issue after I’ve used Okapi for the last 2 months. I’m keen on Java, Maven and its whatnot.

    However before I delve into Okapi and this issue for that matter, what should I know first related to Okapi’s things?

  14. Log in to comment