- changed title to OpenXml Filter: HYPERLINK should apply to excel and powerpoint formats (currently only applies to word)
OpenXml Filter: HYPERLINK (and other complex fields) should apply to excel and powerpoint formats (currently only applies to word)
tsComplexFieldDefinitionsToExtract.i=1
cfd0=HYPERLINK
There may be other options, like TOC, etc.
BTW: I don’t see these options defined on the wiki page.
Changing the Rainbow UI to have these options as “general” would probably be the cleanest way to represent this.
Comments (15)
-
@jhargrave-straker thanks for reporting this case!
I was wondering whether PowerPoint or Excel are able to produce complex fields. Would it be possible to create sample documents in PPTX and XLSX formats to check the issue?
According to the spec, the instText tag, which contains a field code, can be found in runs of the WordprocessingML and SpreadsheetML namespaces; however, all the examples mention WordprocessingML only. So, it looks like the PPTX format, at least, should not have complex fields.
Furthermore, the same code (net.sf.okapi.filters.openxml.RunParser) is used for handling all runs. So, I believe SpreadsheetML should be covered as well.
-
@jhargrave-straker if this is about extracting external hyperlinks, then XLSX documents should be covered by the recent changes made in the scope of pull request #661.
-
reporter @DenisKonovalyenko However, it looks like in Rainbow the options are only available on the Word tab.
The exact requirement is the ability to extract URLs. Are these always represented as complex fields? How are URLs represented in PowerPoint?
- changed title to OpenXml Filter: HYPERLINK (and other complex fields) should apply to excel and powerpoint formats (currently only applies to word)
-
@jhargrave-straker it seems recent versions of MS Office try to stick to hyperlink tags as much as possible and convert complex fields of the hyperlink type when they appear in DOCX documents. To make it possible to extract such hyperlink targets, it is essential to set the bExtractExternalHyperlinks parameter to true. More details on this matter can be found here and here, and issue #533 should help to find the changes in the code base.
As for the “Translatable Fields” option, it should stay on the Word panel as it is now, and I would not add it to the Excel tab unless there is a sample document with such structures to test against.
As for the “Translate Hyperlink URLs” option, I agree that it can be specified on PowerPoint and Excel panels to reflect its availability. However,
-
reporter Ok I will have Dale attach those files next week. I’ll let him respond directly here. Thanks!
-
reporter - attached Example_of_a_PPTX_with_a_link.pptx
- attached Example_of_Excel_with_Link.xlsx
Example excel and powerpoint documents with links.
Note on the Excel: It looks like it links the entire cell, even when just the "linked" part is underlined. Still, the URL should be within the zip's folder structure.
-
reporter Would it be possible to prioritize this for the 1.45.0 release (probably Feb)?
-
reporter @Denis Konovalyenko Were you able to test these documents? Dale says that the Excel and PowerPoint files do have a hyperlink that is not getting extracted, even with the options you mentioned above.
-
@jhargrave-straker it looks like the extraction of hyperlinks is working…
The hyperlink-related part of the extracted XLIFF for the PPTX example:
<file original="ppt/slides/_rels/slide1.xml.rels" source-language="en" target-language="fr" datatype="x-undefined">
  <body>
    <trans-unit id="PE714DE9-tu1">
      <source xml:lang="en">https://google.com/</source>
      <target xml:lang="fr">https://google.com/</target>
    </trans-unit>
  </body>
</file>
And for the XLSX one:
<file original="xl/worksheets/_rels/sheet1.xml.rels" source-language="en" target-language="fr" datatype="x-undefined">
  <body>
    <trans-unit id="P3826FE68-tu1">
      <source xml:lang="en">https://google.com/</source>
      <target xml:lang="fr">https://google.com/</target>
    </trans-unit>
  </body>
</file>
The behaviour can be verified when the conditional filter parameter bExtractExternalHyperlinks is set to true (bExtractExternalHyperlinks.b=true).
At the same time, it turned out that:
- There is no default value assigned on reset (net.sf.okapi.filters.openxml.ConditionalParameters#reset)
- The UI option is shown on the “Word Options” tab in Rainbow only: “Translate Hyperlink URLs”
So, I will:
- Make the default value explicit: false
- Rename the UI label from “Translate Hyperlink URLs” to “Extract external hyperlinks”
- Move the label and the input field to the “General Options” tab
Notes:
- If a more granular configuration per document type is wanted, it would be better to create another issue for the split.
- If aligned naming of the same option across different filters is wanted, it would be better to create a separate issue for that.
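To illustrate how the parameters quoted in this thread read, and what an explicit default of false means for a configuration that omits the option, here is a rough Python sketch. It is not Okapi's parser: the suffix convention (.b for boolean, .i for integer, no suffix for a string) is inferred from the examples in this thread, and parse_params and extract_external_hyperlinks are hypothetical names.

```python
def parse_params(text):
    """Parse `key[.suffix]=value` lines into a dict of typed values.

    Assumed convention (inferred from the examples in this thread):
    `.b` marks a boolean, `.i` an integer, no suffix a plain string.
    """
    params = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comment lines, and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key.endswith(".b"):
            params[key[:-2]] = value.strip().lower() == "true"
        elif key.endswith(".i"):
            params[key[:-2]] = int(value)
        else:
            params[key] = value
    return params


def extract_external_hyperlinks(params):
    """Hypothetical helper: use the explicit default of false when the option is absent."""
    return params.get("bExtractExternalHyperlinks", False)


if __name__ == "__main__":
    sample = (
        "tsComplexFieldDefinitionsToExtract.i=1\n"
        "cfd0=HYPERLINK\n"
        "bExtractExternalHyperlinks.b=true\n"
    )
    print(parse_params(sample))
    print(extract_external_hyperlinks({}))
```

With a default of false, a configuration that never mentions the option leaves external hyperlinks out of the extraction, so users who need the URLs have to enable it explicitly.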
-
reporter Denis, Dale will work with you on the parameters. He is the primary user, especially of Rainbow, etc.
-
@jhargrave-straker thank you for letting me know about this. The changes mentioned above have been introduced in the open pull request #677.