OpenXML Filter: improve complex fields extraction when more than one text instruction is present

Issue #1083 resolved
Denis Konovalyenko created an issue

There are cases where complex field extraction is expected but not performed because “empty” text instructions follow a meaningful one:

      <w:r w:rsidR="00B4756F">
        <w:rPr>
          <w:lang w:val="en-US"/>
        </w:rPr>
        <w:fldChar w:fldCharType="begin"/>
      </w:r>
      <w:r w:rsidR="00B4756F">
        <w:rPr>
          <w:lang w:val="en-US"/>
        </w:rPr>
        <w:instrText xml:space="preserve"> HYPERLINK  \l "_top" </w:instrText>
      </w:r>
      <w:r w:rsidR="00B4756F">
        <w:rPr>
          <w:lang w:val="en-US"/>
        </w:rPr>
        <w:instrText xml:space="preserve"> </w:instrText>
      </w:r>
      <w:r w:rsidR="00B4756F">
        <w:rPr>
          <w:lang w:val="en-US"/>
        </w:rPr>
        <w:fldChar w:fldCharType="separate"/>
      </w:r>

Also, when a field contains two meaningful text instructions, the extraction decision has to be based on the first one. So, if DATE comes before HYPERLINK, then DATE has to be considered for extraction, and vice versa.

      <w:r w:rsidR="00B4756F">
        <w:rPr>
          <w:lang w:val="en-US"/>
        </w:rPr>
        <w:fldChar w:fldCharType="begin"/>
      </w:r>
      <w:r w:rsidR="00B4756F">
        <w:rPr>
          <w:lang w:val="en-US"/>
        </w:rPr>
        <w:instrText xml:space="preserve"> DATE </w:instrText>
      </w:r>
      <w:r w:rsidR="00B4756F">
        <w:rPr>
          <w:lang w:val="en-US"/>
        </w:rPr>
        <w:instrText xml:space="preserve"> HYPERLINK  \l "_top" </w:instrText>
      </w:r>
      <w:r w:rsidR="00B4756F">
        <w:rPr>
          <w:lang w:val="en-US"/>
        </w:rPr>
        <w:fldChar w:fldCharType="separate"/>
      </w:r>
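
To make the expected behaviour concrete, here is a minimal sketch of the "first meaningful text instruction" rule. It is not the filter's actual code: the function name `first_meaningful_instruction` is hypothetical, and it assumes a single, non-nested field whose instruction name is not split across several `w:instrText` runs.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, bound to the w: prefix in the snippets above.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = "{" + W_NS + "}"


def first_meaningful_instruction(runs_xml):
    """Return the field name from the first non-blank w:instrText, or None.

    Hypothetical helper: runs_xml is a string of sibling w:r elements.
    """
    # Wrap the runs in a root element that declares the w: prefix.
    root = ET.fromstring('<root xmlns:w="%s">%s</root>' % (W_NS, runs_xml))
    in_field = False
    for run in root.iter(W + "r"):
        for child in run:
            if child.tag == W + "fldChar":
                kind = child.get(W + "fldCharType")
                if kind == "begin":
                    in_field = True
                elif in_field:
                    # "separate" or "end": the instruction part is over.
                    return None
            elif child.tag == W + "instrText" and in_field:
                text = (child.text or "").strip()
                if text:
                    # Whitespace-only instructions are skipped; the first
                    # token of the first non-blank one is the field name.
                    return text.split()[0]
    return None
```

Under these assumptions, the first sample above yields `HYPERLINK` whether the blank instruction comes before or after it, and the second sample yields `DATE` because it precedes `HYPERLINK`.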

For more details, please refer to the attached documents.
