IDML Filter: the extraction of the hyperlink text source inner elements is not fully supported

Issue #1179 resolved
Efrem Tewelde created an issue

A HyperlinkTextSource story element can contain child elements of its own, most of which are the same as the elements allowed inside a CharacterStyleRange.

Below is the hyperlink text source description from the specification:

An example document with UI:

its related structure:

        <ParagraphStyleRange AppliedParagraphStyle="ParagraphStyle/$ID/NormalParagraphStyle" LeftIndent="18" FirstLineIndent="-18" BulletsAndNumberingListType="NumberedList" NumberingContinue="false">
            <CharacterStyleRange AppliedCharacterStyle="CharacterStyle/Hyperlink">
                <HyperlinkTextSource Self="u105" Name="http://hyperlink-1.net 1" Hidden="false" AppliedCharacterStyle="n">
                    <Content>http://hyperlink-1.net</Content>
                </HyperlinkTextSource>
            </CharacterStyleRange>
            <CharacterStyleRange AppliedCharacterStyle="CharacterStyle/$ID/[No character style]">
                <Br />
            </CharacterStyleRange>
        </ParagraphStyleRange>
        <ParagraphStyleRange AppliedParagraphStyle="ParagraphStyle/$ID/NormalParagraphStyle" LeftIndent="18" FirstLineIndent="-18" BulletsAndNumberingListType="NumberedList">
            <CharacterStyleRange AppliedCharacterStyle="CharacterStyle/Hyperlink">
                <HyperlinkTextSource Self="u108" Name="http://hyperlink-2.net 1" Hidden="false" AppliedCharacterStyle="n">
                    <Content>http://hyperlink-2.net</Content>
                    <Br />
                </HyperlinkTextSource>
            </CharacterStyleRange>
            <HyperlinkTextSource Self="u109" Name="http://hyperlink-3.net 1" Hidden="false" AppliedCharacterStyle="n">
                <CharacterStyleRange AppliedCharacterStyle="CharacterStyle/Hyperlink">
                    <Content>http://hyperlink-3.net</Content>
                    <Br />
                </CharacterStyleRange>
                <CharacterStyleRange AppliedCharacterStyle="CharacterStyle/Hyperlink" Underline="false">
                    <Properties>
                        <AppliedFont type="string">Arial</AppliedFont>
                    </Properties>
                    <Content>Hyperlink text source as a character style</Content>
                </CharacterStyleRange>
                <CharacterStyleRange AppliedCharacterStyle="CharacterStyle/Hyperlink">
                    <Br />
                </CharacterStyleRange>
            </HyperlinkTextSource>
        </ParagraphStyleRange>

and its extraction:

<file original="Stories/Story_ue7.xml" source-language="en" target-language="fr" datatype="xml">
<body>
<trans-unit id="P50B5830A-tu1" xml:space="preserve">
<source xml:lang="en"><g id="1"><g id="2">http://hyperlink-1.net</g></g></source>
<target xml:lang="fr"><g id="1"><g id="2">http://hyperlink-1.net</g></g></target>
</trans-unit>
<trans-unit id="P50B5830A-tu2" xml:space="preserve">
<source xml:lang="en"><g id="1"><g id="2">http://hyperlink-2.net<x id="3"/></g></g><g id="4"><g id="5">http://hyperlink-3.net<x id="6"/></g><g id="7">Hyperlink text source as a character style<x id="8"/></g></g></source>
<target xml:lang="fr"><g id="1"><g id="2">http://hyperlink-2.net<x id="3"/></g></g><g id="4"><g id="5">http://hyperlink-3.net<x id="6"/></g><g id="7">Hyperlink text source as a character style<x id="8"/></g></g></target>
</trans-unit>
</body>
</file>

can be found attached.

Currently, all inner elements of hyperlink text sources are extracted as inline codes. The extraction should be improved to account for this nesting, at a minimum by treating the Br tags as textual unit boundaries.
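
The idea of treating Br tags as textual unit boundaries can be illustrated with a minimal Python sketch. It is not the filter's implementation: the function name `split_at_breaks` is hypothetical, the markup is a trimmed copy of the second paragraph from the structure above, and formatting information is simply discarded rather than turned into inline codes.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the second ParagraphStyleRange from this issue
# (most attributes removed for brevity).
STORY = """
<ParagraphStyleRange>
  <CharacterStyleRange AppliedCharacterStyle="CharacterStyle/Hyperlink">
    <HyperlinkTextSource Self="u108">
      <Content>http://hyperlink-2.net</Content>
      <Br />
    </HyperlinkTextSource>
  </CharacterStyleRange>
  <HyperlinkTextSource Self="u109">
    <CharacterStyleRange AppliedCharacterStyle="CharacterStyle/Hyperlink">
      <Content>http://hyperlink-3.net</Content>
      <Br />
    </CharacterStyleRange>
    <CharacterStyleRange AppliedCharacterStyle="CharacterStyle/Hyperlink">
      <Content>Hyperlink text source as a character style</Content>
    </CharacterStyleRange>
    <CharacterStyleRange AppliedCharacterStyle="CharacterStyle/Hyperlink">
      <Br />
    </CharacterStyleRange>
  </HyperlinkTextSource>
</ParagraphStyleRange>
"""


def split_at_breaks(root):
    """Return text segments, starting a new segment at every Br element.

    Hypothetical helper: it ignores styling and only shows where the
    boundaries would fall, whatever the nesting order of
    HyperlinkTextSource and CharacterStyleRange.
    """
    segments = []
    current = []
    # iter() walks all descendants in document order, at any depth.
    for element in root.iter():
        if element.tag == "Content" and element.text:
            current.append(element.text)
        elif element.tag == "Br":
            if current:
                segments.append("".join(current))
            current = []
    if current:  # trailing text that is not followed by a Br
        segments.append("".join(current))
    return segments


segments = split_at_breaks(ET.fromstring(STORY))
for segment in segments:
    print(segment)
```

With this input the sketch yields three segments, one per hyperlink line, instead of the single trans-unit shown in the extraction above.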

Comments (10)

  1. Efrem Tewelde reporter

    Hi @Denis Konovalyenko, you have looked into an issue related to the IDML filter before here. Would you mind looking into this one, since you might have better context?

  2. Denis Konovalyenko

    A new configuration option was introduced:

    extractHyperlinkTextSourcesInline
    

    The default value is false.

    When it is set to false, hyperlink text sources are extracted as reference groups of textual units. For example:

    <trans-unit id="P50C39A8B-tu4" xml:space="preserve">
    <source xml:lang="en"><x id="1"/></source>
    <target xml:lang="fr"><x id="1"/></target>
    </trans-unit>
    <group id="P77553333-rg1">
    <trans-unit id="P8441FDF-tu1" xml:space="preserve">
    <source xml:lang="en">A hyperlink </source>
    <target xml:lang="fr">A hyperlink </target>
    </trans-unit>
    <trans-unit id="P8441FDF-tu2" xml:space="preserve">
    <source xml:lang="en">text source 1<g id="1"> and text source 2 and text source 3.</g></source>
    <target xml:lang="fr">text source 1<g id="1"> and text source 2 and text source 3.</g></target>
    </trans-unit>
    </group>
    

    for markup:

       <ParagraphStyleRange AppliedParagraphStyle="ParagraphStyle/Paragraph Style 1">
          <CharacterStyleRange AppliedCharacterStyle="CharacterStyle/Character Style 1">
            <HyperlinkTextSource Self="u124" Name="Hyperlink 4" Hidden="false"
                                 AppliedCharacterStyle="n">
              <Content>A hyperlink</Content>
              <CharacterStyleRange AppliedCharacterStyle="CharacterStyle/Character Style 1">
                <Br/>
              </CharacterStyleRange>
              <Content>text source 1</Content>
              <CharacterStyleRange AppliedCharacterStyle="CharacterStyle/$ID/[No character style]">
                <Content>and text source 2</Content>
              </CharacterStyleRange>
              <Content>and text source 3.</Content>
            </HyperlinkTextSource>
            <Br/>
          </CharacterStyleRange>
        </ParagraphStyleRange>
    
