OpenXML Filter: improve complex fields extraction when more than one text instruction is present
Issue #1083
resolved
There are cases when complex fields extraction is expected but not performed because “empty” text instructions follow a meaningful one:
<w:r w:rsidR="00B4756F">
<w:rPr>
<w:lang w:val="en-US"/>
</w:rPr>
<w:fldChar w:fldCharType="begin"/>
</w:r>
<w:r w:rsidR="00B4756F">
<w:rPr>
<w:lang w:val="en-US"/>
</w:rPr>
<w:instrText xml:space="preserve"> HYPERLINK \l "_top" </w:instrText>
</w:r>
<w:r w:rsidR="00B4756F">
<w:rPr>
<w:lang w:val="en-US"/>
</w:rPr>
<w:instrText xml:space="preserve"> </w:instrText>
</w:r>
<w:r w:rsidR="00B4756F">
<w:rPr>
<w:lang w:val="en-US"/>
</w:rPr>
<w:fldChar w:fldCharType="separate"/>
</w:r>
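As an illustration only (this is not the filter's actual code), the handling of “empty” text instructions could be sketched as below. The function name `instruction_texts`, the wrapping `w:p` element, and the simplified sample without `w:rPr` are assumptions made for the sketch; nested fields are not handled.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, bound to the "w" prefix in the snippets.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def instruction_texts(runs_xml):
    """Return the non-blank w:instrText values found between the
    'begin' and 'separate' field characters (hypothetical helper)."""
    # The runs are wrapped in a paragraph so that the "w" prefix is declared.
    root = ET.fromstring('<w:p xmlns:w="%s">%s</w:p>' % (W, runs_xml))
    texts = []
    inside = False
    for node in root.iter():  # document order
        if node.tag == "{%s}fldChar" % W:
            kind = node.get("{%s}fldCharType" % W)
            if kind == "begin":
                inside = True
            elif kind in ("separate", "end"):
                inside = False
        elif node.tag == "{%s}instrText" % W and inside:
            text = (node.text or "").strip()
            if text:  # whitespace-only instructions are ignored
                texts.append(text)
    return texts


SAMPLE = r'''
<w:r><w:fldChar w:fldCharType="begin"/></w:r>
<w:r><w:instrText xml:space="preserve"> HYPERLINK \l "_top" </w:instrText></w:r>
<w:r><w:instrText xml:space="preserve"> </w:instrText></w:r>
<w:r><w:fldChar w:fldCharType="separate"/></w:r>
'''

print(instruction_texts(SAMPLE))
```

With the blank instruction dropped, only the HYPERLINK instruction remains, so the field can be evaluated as if the trailing empty run were absent.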
Also, if there are two meaningful text instructions, the extraction decision has to be based on the first one. So, if DATE comes first and HYPERLINK second, then DATE has to be considered for extraction, and vice versa.
<w:r w:rsidR="00B4756F">
<w:rPr>
<w:lang w:val="en-US"/>
</w:rPr>
<w:fldChar w:fldCharType="begin"/>
</w:r>
<w:r w:rsidR="00B4756F">
<w:rPr>
<w:lang w:val="en-US"/>
</w:rPr>
<w:instrText xml:space="preserve"> DATE </w:instrText>
</w:r>
<w:r w:rsidR="00B4756F">
<w:rPr>
<w:lang w:val="en-US"/>
</w:rPr>
<w:instrText xml:space="preserve"> HYPERLINK \l "_top" </w:instrText>
</w:r>
<w:r w:rsidR="00B4756F">
<w:rPr>
<w:lang w:val="en-US"/>
</w:rPr>
<w:fldChar w:fldCharType="separate"/>
</w:r>
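The "first meaningful text instruction" rule could be sketched roughly as follows. This is an assumption-laden illustration, not the filter's implementation: the helper names `first_field_code` and `should_extract` are hypothetical, and the set of extractable field codes is passed in by the caller because the issue does not say which codes are configured for extraction.

```python
def first_field_code(instructions):
    """Return the field code (first token, upper-cased) of the first
    non-blank instruction, or None if every instruction is blank."""
    for text in instructions:
        tokens = text.split()
        if tokens:
            return tokens[0].upper()
    return None


def should_extract(instructions, extractable_codes):
    """Decide extraction from the first meaningful instruction only."""
    code = first_field_code(instructions)
    return code is not None and code in extractable_codes


# Hypothetical configuration: only HYPERLINK fields are extractable.
codes = {"HYPERLINK"}

date_first = [" DATE ", ' HYPERLINK \\l "_top" ']
link_first = [' HYPERLINK \\l "_top" ', " DATE "]

print(first_field_code(date_first), should_extract(date_first, codes))
print(first_field_code(link_first), should_extract(link_first, codes))
```

Under this hypothetical configuration the DATE-first field is decided by DATE and the HYPERLINK-first field by HYPERLINK, regardless of what follows.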
For more details, please refer to the attached documents.
Comments (2)
- reporter - A related pull request #542 was opened.
- reporter - changed status to resolved. Pull request #542 was merged.