OpenXML Filter: Number fields do not appear in generated XLIFF

Issue #1336 new
Stefan Brankovikj created an issue

We have a simple XLSX file containing a cell with the value 1234. In the XLIFF output, this segment is nowhere to be found. I am not sure whether this is because the 1234 cell is not present in sharedStrings.xml or whether something else is going on.

The actual XLSX file:

The sharedStrings.xml:

The worksheet xml that contains the value 1234:

The 1234 is not found in the exported XLIFF; we tried this with different files. Is it possible to get the number to appear in the XLIFF so that we can apply some application logic to it? It is important for us that all of the segments are there, so that we can show them as context.
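
For illustration, here is a minimal sketch of why a number would not show up in sharedStrings.xml. In SpreadsheetML, a cell whose `t` attribute is `s` stores an index into the shared strings table, while a numeric cell (where `t` is omitted or `n`) keeps its value directly in the worksheet XML. The worksheet fragment and the `SHARED_STRINGS` list below are hand-written assumptions, not the contents of the attached file, and `read_cells` is a hypothetical helper rather than anything from Okapi.

```python
import xml.etree.ElementTree as ET

NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}

# Hypothetical worksheet fragment (an assumption, not the attached file).
SHEET_XML = """<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
  <sheetData>
    <row r="1">
      <c r="A1" t="s"><v>0</v></c>
      <c r="B1"><v>1234</v></c>
    </row>
  </sheetData>
</worksheet>"""

# Stand-in for the entries of xl/sharedStrings.xml (also an assumption).
SHARED_STRINGS = ["Hello"]


def read_cells(sheet_xml, shared_strings):
    """Return (cell reference, storage kind, value) for each cell with a <v> child."""
    cells = []
    root = ET.fromstring(sheet_xml)
    for c in root.iterfind(".//m:c", NS):
        v = c.find("m:v", NS)
        if v is None:
            continue
        cell_type = c.get("t", "n")  # "n" (number) is the default when t is omitted
        if cell_type == "s":
            # The <v> content is an index into the shared strings table.
            cells.append((c.get("r"), "shared string", shared_strings[int(v.text)]))
        else:
            # The <v> content is the value itself and never reaches sharedStrings.xml.
            cells.append((c.get("r"), "inline value (t=%s)" % cell_type, v.text))
    return cells


for ref, kind, value in read_cells(SHEET_XML, SHARED_STRINGS):
    print(ref, kind, value)
```

A filter that only walks the shared strings table would see "Hello" here but never the 1234 in B1.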

Our use case is basically this: we want the generated XLIFF file to always contain everything, translatable or not. Then, at the application level (we would have an API call that generates the XLIFF file in Java, and we use it in PHP in our actual backend), we can define which rows and columns are translatable, excluded, etc. Because the segments for the numbers are missing, our segment calculations are thrown off, and we also want to support multilingual segments.
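
As an illustration of the application-level step described above, here is a rough sketch of deciding per cell whether a segment is translatable. It assumes that every segment can be mapped back to a cell reference such as B2. The helper names, the sample segments and the rule format are all hypothetical.

```python
import re

CELL_REF = re.compile(r"^([A-Z]+)([0-9]+)$")


def split_cell_ref(ref):
    """Split a cell reference like 'B12' into ('B', 12)."""
    match = CELL_REF.match(ref)
    if match is None:
        raise ValueError("not a cell reference: %r" % ref)
    return match.group(1), int(match.group(2))


def is_translatable(ref, excluded_columns=frozenset(), excluded_rows=frozenset()):
    """A cell is translatable unless its column or its row is excluded."""
    column, row = split_cell_ref(ref)
    return column not in excluded_columns and row not in excluded_rows


# Hypothetical segments keyed by cell reference (made-up data).
segments = {"A1": "Product", "B1": "Price", "A2": "Chair", "B2": "1234"}

# Hypothetical rules: skip the header row and the numeric column.
rules = {"excluded_columns": {"B"}, "excluded_rows": {1}}

translatable = [ref for ref in segments if is_translatable(ref, **rules)]
print(translatable)
```

With these rules only A2 stays translatable. The remaining cells would still be present as context, which is why the missing number segments matter.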

Comments (9)

  1. Denis Konovalyenko

    @Stefan Brankovikj thank you for posting this!

    Did I get it right that the extraction of numbers (and probably dates and booleans) is for context purposes only? Would translate="no" in XLIFF be enough to mark whether the extracted text unit is translatable? I think the net.sf.okapi.filters.xml.BundledConfigsTest#untranslatableContentExtracted test case can give you more details on that.

  2. Stefan Brankovikj reporter

    @Denis Konovalyenko

    Thanks for the quick reply. But as far as I can see, this is for the okf_xml filter, right? It is not for OpenXML?

    And yes, we just need the numbers (and probably dates and booleans too) to be part of the XLIFF. I do not expect that we will modify them, but it is important that when we convert an XLSX file to XLIFF, the numbers are there, so that we can show them properly based on the XLSX structure.

    Because at the moment, our XLIFF looks like this:

    Which means the number from the XLSX is not included.

    I hope I am making sense.

    Thanks!

  3. Stefan Brankovikj reporter

    @Denis Konovalyenko

    Sorry to bother you again, are there any updates here?

    Thanks!

  4. Denis Konovalyenko

    Stefan, thank you for the reminder!

    I think extracting numbers, dates and booleans for context makes sense. The translate="no" attribute and its use case were mentioned as a reference for what a solution might look like.

    If we take into consideration the following document

    then the extracted XLIFF might look like this:

    <xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2" xmlns:okp="okapi-framework:xliff-extensions" xmlns:its="http://www.w3.org/2005/11/its" xmlns:itsxlf="http://www.w3.org/ns/its-xliff/" its:version="2.0">
    <file original="xl/sharedStrings.xml" source-language="en" target-language="fr" datatype="x-undefined">
    <body>
    <group id="P76C545-sg1" resname="Sheet1">
    <group id="P132303AB-sg1" resname="1">
    <trans-unit id="P147242AB-tu1" resname="Sheet1!A1" xml:space="preserve" translate="no">
    <source xml:lang="en">111</source> <!-- this is a raw number, may be extracted formatted -->
    <target xml:lang="fr"></target>
    </trans-unit> 
    </group>
    <group id="P132303AB-sg2" resname="2">
    <trans-unit id="P147242AB-tu2" resname="Sheet1!A2" xml:space="preserve" translate="no">
    <source xml:lang="en">44463.042372685188</source> <!-- this is a raw date with time, may be extracted formatted -->
    <target xml:lang="fr"></target>
    </trans-unit>
    <trans-unit id="P147242AB-tu3" resname="Sheet1!B2" xml:space="preserve" translate="no">
    <source xml:lang="en">1</source> <!-- this is a raw boolean, may be extracted formatted -->
    <target xml:lang="fr"></target>
    </trans-unit>
    </group>
    ...
    </group>
    </body>
    </file>
    </xliff>

    Please note the translate="no" attribute and the possible improvements in formatting the extracted data.
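
    As a sketch of what formatting the raw values could mean, the snippet below turns the raw date serial and the raw boolean from the sample above into readable values. It assumes the workbook uses the default 1900 date system, where day 0 corresponds to 1899-12-30 (this holds for serials from 61 upwards). A workbook using the 1904 date system would need a different epoch. Telling a date apart from a plain number also requires the cell's number format from the workbook styles, which is not shown here. The function names are hypothetical.

```python
from datetime import datetime, timedelta

# Day 0 of the default 1900 date system (an assumption about the workbook).
EXCEL_EPOCH = datetime(1899, 12, 30)


def excel_serial_to_datetime(serial):
    """Convert a raw serial (days, with the time as a fraction) to a datetime.

    The result is rounded to the nearest second to absorb floating-point noise.
    """
    seconds = round(serial * 86400)
    return EXCEL_EPOCH + timedelta(seconds=seconds)


def excel_boolean(raw):
    """Map the raw cell text '0' or '1' to False or True."""
    return {"0": False, "1": True}[raw]


print(excel_serial_to_datetime(44463.042372685188).isoformat(sep=" "))
print(excel_boolean("1"))
```

    For the serial in the sample, this yields 2021-09-24 01:01:01.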

    I will try to attach the document with more details.

  5. Stefan Brankovikj reporter

    Thanks!

    But what do we need to do in our case to achieve this? Do we need to change the code, or can this be done through a global setting? I can see that the format has changed and the units are now grouped with group tags.

    Because when I check the unit test you shared, it is for okf_xml and not for okf_openxml.

    Sorry if I am going a bit slow here, but this is an entirely new thing for us.

    Thanks again!

  6. Denis Konovalyenko

    @Stefan Brankovikj I appreciate your quick response!

    I have been clarifying the expected behaviour and proposing a possible way of solving your issue. This means that there is currently no way of getting the XLIFF I posted above other than making the required changes in the code base of the OpenXML filter (it handles many Office Open XML documents, including DOCX, PPTX and XLSX). Ideally, this would be done under a conditional parameter, something like boolean extractExcelUntranslatableValues, or a set of options like extractExcelUntranslatableNumbers, extractExcelUntranslatableDates and extractExcelUntranslatableBooleans to allow more granular handling.
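
    To make the idea of granular options more concrete, here is a purely hypothetical sketch of how three such flags could gate extraction by value kind. None of these options exist in the OpenXML filter today. The snake_case names mirror the proposed parameter names, the 'text'/'number'/'date'/'boolean' classification is made up for illustration, and the assumption is that all flags default to off so that current behaviour stays unchanged.

```python
from dataclasses import dataclass


@dataclass
class ExcelExtractionOptions:
    """Hypothetical options; they do not exist in the OpenXML filter."""
    extract_untranslatable_numbers: bool = False
    extract_untranslatable_dates: bool = False
    extract_untranslatable_booleans: bool = False


def should_extract(kind, options):
    """Decide whether a cell of the given kind would be written to the XLIFF.

    kind is one of 'text', 'number', 'date' or 'boolean' (a made-up
    classification); unknown kinds are not extracted.
    """
    if kind == "text":
        return True  # text cells are extracted regardless of the flags
    flags = {
        "number": options.extract_untranslatable_numbers,
        "date": options.extract_untranslatable_dates,
        "boolean": options.extract_untranslatable_booleans,
    }
    return flags.get(kind, False)


options = ExcelExtractionOptions(extract_untranslatable_numbers=True)
for kind in ("text", "number", "date", "boolean"):
    print(kind, should_extract(kind, options))
```

    In this sketch, any non-text value that passes the check would be emitted with translate="no", as in the XLIFF sample earlier in the thread.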

    Could you please let me know if anything is left unclear?

  7. Stefan Brankovikj reporter

    Oh no! Damn! Okay, I was really hoping that there is a solution for this.

    I am guessing that these things take time, and it won’t be something that is going to happen soon?

    And no, it makes sense, and in all honesty, the proposed solution seems like a way to go. Now it all makes sense, thanks!
