- changed status to new
OpenXML Filter: Number fields do not appear in generated XLIFF
We have a simple XLSX file with a cell containing the value 1234. In the XLIFF output, this segment is nowhere to be found. I am not sure whether this is because the 1234 cell is not stored in sharedStrings.xml, or whether something else is going on.
The actual XLSX file:
The sharedStrings.xml:
The worksheet XML that contains the value 1234:
The 1234 is not found in the exported XLIFF; I tried it with different files. Is it possible to get the number to appear in the XLIFF so that we can apply some application logic to it? It is important for us to have all of the segments there so that we can show them as context.
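For context, the guess about sharedStrings.xml matches how SpreadsheetML stores cells: a cell with t="s" holds an index into sharedStrings.xml, while a plain number is written directly into the worksheet part. The following is only a minimal sketch using the JDK's DOM parser on a hand-written worksheet fragment; the class name WorksheetCellKinds and the sample XML are assumptions made for illustration and are not part of Okapi.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper for illustration only; not part of the Okapi code base.
class WorksheetCellKinds {
    static final String NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main";

    // Maps each cell reference to a note saying where its value is stored.
    static Map<String, String> classify(String worksheetXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(worksheetXml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> kinds = new LinkedHashMap<>();
            NodeList cells = doc.getElementsByTagNameNS(NS, "c");
            for (int i = 0; i < cells.getLength(); i++) {
                Element cell = (Element) cells.item(i);
                NodeList values = cell.getElementsByTagNameNS(NS, "v");
                if (values.getLength() == 0) {
                    continue; // empty cell, or inline string stored in <is>
                }
                String value = values.item(0).getTextContent();
                // t="s" means <v> is an index into sharedStrings.xml;
                // otherwise the value lives in the worksheet part itself.
                String kind = "s".equals(cell.getAttribute("t"))
                        ? "sharedStrings index " + value
                        : "inline value " + value;
                kinds.put(cell.getAttribute("r"), kind);
            }
            return kinds;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse worksheet XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<worksheet xmlns=\"" + NS + "\"><sheetData><row r=\"1\">"
                + "<c r=\"A1\" t=\"s\"><v>0</v></c>"
                + "<c r=\"B1\"><v>1234</v></c>"
                + "</row></sheetData></worksheet>";
        // prints {A1=sharedStrings index 0, B1=inline value 1234}
        System.out.println(classify(xml));
    }
}
```

A tool that extracts only from sharedStrings.xml would therefore never see B1 in this fragment.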
Our use case is basically this: we want the generated XLIFF file to always contain everything, translatable or not. Then, at the application level (we would have an API call that generates the XLIFF file in Java, and we use it in PHP for our actual backend), we can define which rows and columns are translatable, excluded, etc. Because the segments for the numbers are missing, our segment calculations are thrown off, which matters because we also want to support multilingual segments.
Comments (9)
-
@Stefan Brankovikj thank you for posting this!
Did I get it right that the extraction of numbers (and probably dates and booleans) is for context purposes only? Would translate="no" in XLIFF be enough to indicate whether the extracted text unit is translatable? I think the net.sf.okapi.filters.xml.BundledConfigsTest#untranslatableContentExtracted test case can give you more details on that.
-
reporter @Denis Konovalyenko
Thanks for the quick reply. But as far as I can see, this is for the okf_xml filter, right? It is not for OpenXML?
And yes, we just need the numbers (probably dates and booleans too) to be part of the XLIFF. I do not expect that we will modify them, but it is important that when we convert an XLSX file to XLIFF, the numbers are there, so that we can show them properly based on the XLSX structure.
Because at the moment, our XLIFF looks like this:
Which means the number from the XLSX is not included.
I hope I am making sense.
Thanks!
-
reporter @Denis Konovalyenko
Sorry to bother you again, are there any updates here?
Thanks!
-
Stefan, thank you for the reminder!
I think extracting numbers, dates and booleans for context makes sense. The mentioned translate="no" and its use case were provided as a reference for what a solution might look like.
If we take into consideration the following document
then the extracted XLIFF may look like this:
<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2" xmlns:okp="okapi-framework:xliff-extensions" xmlns:its="http://www.w3.org/2005/11/its" xmlns:itsxlf="http://www.w3.org/ns/its-xliff/" its:version="2.0">
  <file original="xl/sharedStrings.xml" source-language="en" target-language="fr" datatype="x-undefined">
    <body>
      <group id="P76C545-sg1" resname="Sheet1">
        <group id="P132303AB-sg1" resname="1">
          <trans-unit id="P147242AB-tu1" resname="Sheet1!A1" xml:space="preserve" translate="no">
            <source xml:lang="en">111</source> <!-- this is a raw number, may be extracted formatted -->
            <target xml:lang="fr"></target>
          </trans-unit>
        </group>
        <group id="P132303AB-sg2" resname="2">
          <trans-unit id="P147242AB-tu2" resname="Sheet1!A2" xml:space="preserve" translate="no">
            <source xml:lang="en">44463.042372685188</source> <!-- this is a raw date with time, may be extracted formatted -->
            <target xml:lang="fr"></target>
          </trans-unit>
          <trans-unit id="P147242AB-tu2" resname="Sheet1!B2" xml:space="preserve" translate="no">
            <source xml:lang="en">1</source> <!-- this is a raw boolean, may be extracted formatted -->
            <target xml:lang="fr"></target>
          </trans-unit>
        </group>
        ...
      </group>
    </body>
  </file>
</xliff>
Please note the translate="no" attribute and the possible improvements in formatting the extracted data.
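To illustrate what "extracted formatted" could mean for the raw date above, here is a minimal sketch that converts an Excel serial value into a date and time. It assumes the workbook uses the 1900 date system and that the caller already knows the cell holds a date (the raw value alone does not say so); the class name ExcelSerialDates is hypothetical and not an Okapi API.

```java
import java.time.LocalDateTime;

// Hypothetical helper for illustration only; assumes the 1900 date system.
class ExcelSerialDates {
    // Day 0 of the 1900 date system, placed so that serials from
    // March 1900 onwards map to the correct calendar date.
    static final LocalDateTime EPOCH = LocalDateTime.of(1899, 12, 30, 0, 0);

    static LocalDateTime toDateTime(double serial) {
        long days = (long) Math.floor(serial);
        // The fractional part is the time of day; round to whole seconds.
        long seconds = Math.round((serial - days) * 86_400);
        return EPOCH.plusDays(days).plusSeconds(seconds);
    }

    public static void main(String[] args) {
        // prints 2021-09-24T01:01:01
        System.out.println(toDateTime(44463.042372685188));
    }
}
```

Workbooks that use the 1904 date system would need a different epoch, so a real implementation would have to read that setting from the workbook first.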
I will try to attach the document with more details.
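On the consuming side, an XLIFF shaped like the one above could be split into translatable and context-only units by reading the translate attribute. The sketch below is only an assumption about how an application might do that with the JDK's DOM parser; the class name XliffContextUnits and the sample input are made up for illustration, and translate attributes set on enclosing group elements are ignored.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical consumer-side helper; not part of Okapi.
class XliffContextUnits {
    static final String XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2";

    // Maps each trans-unit's resname (e.g. "Sheet1!A1") to whether it is translatable.
    static Map<String, Boolean> translatableByResname(String xliff) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xliff.getBytes(StandardCharsets.UTF_8)));
            Map<String, Boolean> result = new LinkedHashMap<>();
            NodeList units = doc.getElementsByTagNameNS(XLIFF_NS, "trans-unit");
            for (int i = 0; i < units.getLength(); i++) {
                Element unit = (Element) units.item(i);
                // A missing translate attribute is treated as translatable.
                boolean translatable = !"no".equals(unit.getAttribute("translate"));
                result.put(unit.getAttribute("resname"), translatable);
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XLIFF", e);
        }
    }

    public static void main(String[] args) {
        String xliff = "<xliff version=\"1.2\" xmlns=\"" + XLIFF_NS + "\">"
                + "<file original=\"xl/sharedStrings.xml\" source-language=\"en\" datatype=\"x-undefined\"><body>"
                + "<trans-unit id=\"tu1\" resname=\"Sheet1!A1\" translate=\"no\"><source>111</source></trans-unit>"
                + "<trans-unit id=\"tu2\" resname=\"Sheet1!B1\"><source>Hello</source></trans-unit>"
                + "</body></file></xliff>";
        // prints {Sheet1!A1=false, Sheet1!B1=true}
        System.out.println(translatableByResname(xliff));
    }
}
```

Units mapped to false would be shown as context only, while the others go to translation.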
-
reporter Thanks!
But what do we need to do in our case to achieve this? Do we need to make changes in the code, or can this be done as a global setting? I can see that the format has changed and the units are now grouped with group tags.
Also, when I check the unit test you shared, it is for okf_xml and not for okf_openxml.
Sorry if I am a bit slow here, but this is an entirely new thing for us.
Thanks again!
-
- attached numbers-dates-and-booleans.xlsx
The aforementioned document.
-
@Stefan Brankovikj I appreciate your quick response!
I have been clarifying the expected behaviour and proposing a possible way of solving your issue. This means that currently there is no way of getting the XLIFF I posted above other than making the required changes in the code base of the OpenXML filter (it handles many Office Open XML documents, including DOCX, PPTX and XLSX). Ideally, this has to be done under a conditional parameter, something like a boolean extractExcelUntranslatableValues, or a set of options like extractExcelUntranslatableNumbers, extractExcelUntranslatableDates and extractExcelUntranslatableBooleans to allow more granular handling.
Could you please let me know if anything is left unclear?
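As a rough sketch of the granular variant, the three proposed flags could gate extraction per cell kind, with extracted non-text values always marked as untranslatable. Nothing below exists in the OpenXML filter today; the class UntranslatableValueOptions, the CellKind enum and both methods are hypothetical names chosen for illustration.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical options holder; not an existing Okapi class or parameter set.
class UntranslatableValueOptions {
    enum CellKind { TEXT, NUMBER, DATE, BOOLEAN }

    private final Set<CellKind> extractedAsContext = EnumSet.noneOf(CellKind.class);

    // Each flag mirrors one of the proposed extractExcelUntranslatable* options.
    UntranslatableValueOptions(boolean numbers, boolean dates, boolean booleans) {
        if (numbers) extractedAsContext.add(CellKind.NUMBER);
        if (dates) extractedAsContext.add(CellKind.DATE);
        if (booleans) extractedAsContext.add(CellKind.BOOLEAN);
    }

    // Text is always extracted; other kinds only when their flag is on.
    boolean shouldExtract(CellKind kind) {
        return kind == CellKind.TEXT || extractedAsContext.contains(kind);
    }

    // Only text would be written without translate="no".
    boolean isTranslatable(CellKind kind) {
        return kind == CellKind.TEXT;
    }

    public static void main(String[] args) {
        UntranslatableValueOptions options = new UntranslatableValueOptions(true, false, false);
        System.out.println(options.shouldExtract(CellKind.NUMBER)); // prints true
        System.out.println(options.shouldExtract(CellKind.DATE));   // prints false
    }
}
```

Keeping all flags off by default would leave the current extraction output unchanged for existing users.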
-
reporter Oh no! Damn! Okay, I was really hoping that there was a solution for this.
I am guessing that these things take time, and it won’t be something that is going to happen soon?
And no, it makes sense, and in all honesty, the proposed solution seems like the way to go. Now it all makes sense, thanks!