OpenXml Filter: HYPERLINK (and other complex fields) should apply to excel and powerpoint formats (currently only applies to word)

Issue #1176 new
jhargrave-straker created an issue

tsComplexFieldDefinitionsToExtract.i=1

cfd0=HYPERLINK

There may be other field codes worth listing as well, such as TOC.

BTW: I don’t see these options defined on the wiki page.

Changing the Rainbow UI to have these options as “general” would probably be the cleanest way to represent this.

Comments (15)

  1. Denis Konovalyenko

    @jhargrave-straker thanks for reporting this case!

    I was wondering whether PowerPoint or Excel have the abilities to produce complex fields. Would it be possible to create sample documents for PPTX and XLSX formats to check the issue?

    According to the spec, the instrText tag, which contains a field code, can be found in runs of the WordprocessingML and SpreadsheetML namespaces; however, all the examples mention WordprocessingML only. So it looks like the PPTX format, at least, should not have complex fields.

    Furthermore, the same code (net.sf.okapi.filters.openxml.RunParser) is used for handling all runs. So, I believe the SpreadsheetML should be covered as well.
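
One rough way to check sample documents for complex fields, independent of the filter, is to scan every XML part of the package for field-instruction elements. The following is only a sketch under stated assumptions: it treats any element whose local name is instrText as a complex field marker regardless of namespace, and the function name count_field_markers is hypothetical, not part of Okapi.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def count_field_markers(package):
    """Return {part_name: count} of instrText elements found in XML parts.

    `package` is a path or file-like object for a DOCX, PPTX or XLSX file.
    (Hypothetical helper for inspection only, not an Okapi API.)
    """
    counts = {}
    with zipfile.ZipFile(package) as zf:
        for name in zf.namelist():
            if not name.endswith(".xml"):
                continue
            try:
                root = ET.fromstring(zf.read(name))
            except ET.ParseError:
                continue  # skip parts that are not well-formed XML
            # Compare local names only, so any namespace matches.
            hits = sum(
                1 for el in root.iter()
                if el.tag.rsplit("}", 1)[-1] == "instrText"
            )
            if hits:
                counts[name] = hits
    return counts


if __name__ == "__main__":
    # Tiny in-memory package standing in for a real sample document.
    sample = io.BytesIO()
    with zipfile.ZipFile(sample, "w") as zf:
        zf.writestr(
            "word/document.xml",
            '<w:document xmlns:w="http://schemas.openxmlformats.org/'
            'wordprocessingml/2006/main"><w:body><w:p><w:r>'
            '<w:instrText>HYPERLINK "https://example.com/"</w:instrText>'
            "</w:r></w:p></w:body></w:document>",
        )
    print(count_field_markers(sample))
```

An empty result for a PPTX or XLSX sample would suggest that the document stores its links some other way, for example as hyperlink relationships.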

  2. jhargrave-straker reporter

    @DenisKonovalyenko However, in Rainbow the options appear to be available only on the Word tab.

    The exact requirement is the ability to extract URLs. Are these always represented as complex fields? How are URLs represented in PowerPoint?

  3. Denis Konovalyenko

    @jhargrave-straker it seems recent versions of MS Office try to stick to hyperlink tags as much as possible and convert complex fields of the hyperlink type if they appear in DOCX documents. To extract such hyperlink targets, the bExtractExternalHyperlinks parameter must be set to true. More details on this matter can be found here and here, and issue #533 should help to find the changes in the code base.

    As for the “Translatable Fields” option, it should be present on the Word panel as it is now, and I would not allow it to be present on the Excel tab, unless there is a confirmation document with such structures to test against.

    As for the “Translate Hyperlink URLs” option, I agree that it can be specified on PowerPoint and Excel panels to reflect its availability. However,

  4. jhargrave-straker reporter

    Ok I will have Dale attach those files next week. I’ll let him respond directly here. Thanks!

  5. jhargrave-straker reporter

    @Denis Konovalyenko Were you able to test these documents? Dale says that the Excel and PowerPoint files do have a hyperlink that is not getting extracted, even with the options you mentioned above.

  6. Denis Konovalyenko

    @jhargrave-straker it looks like the extraction of hyperlinks is working…

    The hyperlink-related part of the extracted XLIFF for the PPTX example:

    <file original="ppt/slides/_rels/slide1.xml.rels" source-language="en" target-language="fr" datatype="x-undefined">
    <body>
    <trans-unit id="PE714DE9-tu1">
    <source xml:lang="en">https://google.com/</source>
    <target xml:lang="fr">https://google.com/</target>
    </trans-unit>
    </body>
    </file>
    

    And for the XLSX one:

    <file original="xl/worksheets/_rels/sheet1.xml.rels" source-language="en" target-language="fr" datatype="x-undefined">
    <body>
    <trans-unit id="P3826FE68-tu1">
    <source xml:lang="en">https://google.com/</source>
    <target xml:lang="fr">https://google.com/</target>
    </trans-unit>
    </body>
    </file>
    

    The behaviour can be verified when the conditional filter parameter bExtractExternalHyperlinks is set to true (bExtractExternalHyperlinks.b=true).
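
Both XLIFF fragments above come from .rels parts of the packages. As a cross-check that does not involve Okapi, the external hyperlink targets stored in a package can be listed directly. This is a minimal sketch: it assumes external hyperlinks are stored as Relationship entries with the hyperlink relationship type and TargetMode="External", and the function name external_hyperlinks is hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

RELS_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
HYPERLINK_TYPE = (
    "http://schemas.openxmlformats.org/officeDocument/2006/"
    "relationships/hyperlink"
)


def external_hyperlinks(package):
    """Return a sorted list of (rels_part, target) for external hyperlinks.

    `package` is a path or file-like object for a DOCX, PPTX or XLSX file.
    (Hypothetical helper for inspection only, not an Okapi API.)
    """
    found = []
    with zipfile.ZipFile(package) as zf:
        for name in zf.namelist():
            if not name.endswith(".rels"):
                continue
            root = ET.fromstring(zf.read(name))
            for rel in root.iter("{%s}Relationship" % RELS_NS):
                is_link = rel.get("Type") == HYPERLINK_TYPE
                is_external = rel.get("TargetMode") == "External"
                if is_link and is_external:
                    found.append((name, rel.get("Target")))
    return sorted(found)


if __name__ == "__main__":
    # Tiny in-memory package standing in for the PPTX sample.
    sample = io.BytesIO()
    with zipfile.ZipFile(sample, "w") as zf:
        zf.writestr(
            "ppt/slides/_rels/slide1.xml.rels",
            '<Relationships xmlns="%s">'
            '<Relationship Id="rId1" Type="%s" '
            'Target="https://google.com/" TargetMode="External"/>'
            "</Relationships>" % (RELS_NS, HYPERLINK_TYPE),
        )
    for part, target in external_hyperlinks(sample):
        print(part, target)
```

If a target listed this way does not show up in the extracted XLIFF while bExtractExternalHyperlinks.b=true is set, that would be worth reporting with the sample file attached.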

    At the same time, I found that:

    1. There is no default value assigned on reset (net.sf.okapi.filters.openxml.ConditionalParameters#reset)
    2. The UI option is mentioned on the “Word Options” tab in Rainbow only: “Translate Hyperlink URLs”

    So, I will:

    1. Make the default value explicit: false
    2. Rename the UI label from “Translate Hyperlink URLs” to “Extract external hyperlinks”
    3. Move the label and the input field to the “General Options” tab

    Notes:

    1. If a more granular configuration per document type is wanted, it would be better to create another issue for performing the split.
    2. If aligned naming of the same option across different filters is wanted, it would be better to create a separate issue for this.

  7. jhargrave-straker reporter

    Denis - Dale will work with you on the parameters. He is the primary user, especially of Rainbow.
