OpenXML: Ignore the _GoBack bookmark

Issue #485 resolved
Chase Tingley created an issue

Office inserts a hidden bookmark called _GoBack to save the last editing position in the document, so that you can jump back to that spot the next time the document is opened. (For a good explanation, see here.)

The bookmark uses the regular bookmark markup but can occur at any position, including in the middle of a word. This can produce ugly segments due to tag noise. Stripping it during filtering causes no data loss and will marginally improve segment quality.
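
As a rough illustration of the stripping idea, here is a minimal Python sketch. It is not the filter's actual implementation: strip_goback is a hypothetical helper, and the sketch assumes the markers are ordinary w:bookmarkStart / w:bookmarkEnd elements paired by w:id, as in regular bookmark markup. It removes only the _GoBack pair and leaves other bookmarks alone.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, bound to the w: prefix in document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W_NS)

BOOKMARK_START = f"{{{W_NS}}}bookmarkStart"
BOOKMARK_END = f"{{{W_NS}}}bookmarkEnd"
W_ID = f"{{{W_NS}}}id"
W_NAME = f"{{{W_NS}}}name"


def strip_goback(root):
    """Remove _GoBack bookmark markers in place (hypothetical helper).

    Returns the number of elements removed.
    """
    # A bookmarkEnd carries only an id, so collect the ids of the
    # _GoBack starts first and match the ends against them.
    goback_ids = {
        el.get(W_ID)
        for el in root.iter(BOOKMARK_START)
        if el.get(W_NAME) == "_GoBack"
    }
    removed = 0
    # Snapshot the element list before mutating the tree.
    for parent in list(root.iter()):
        for child in list(parent):
            is_start = child.tag == BOOKMARK_START and child.get(W_NAME) == "_GoBack"
            is_end = child.tag == BOOKMARK_END and child.get(W_ID) in goback_ids
            if is_start or is_end:
                # remove() also drops the element's tail, which is only
                # formatting whitespace between runs in this markup.
                parent.remove(child)
                removed += 1
    return removed


# Sample paragraph with the bookmark in the middle of the word "position".
SAMPLE = f"""<w:p xmlns:w="{W_NS}">
  <w:r><w:t>This is a sentence. The editing po</w:t></w:r>
  <w:bookmarkStart w:id="0" w:name="_GoBack"/>
  <w:bookmarkEnd w:id="0"/>
  <w:r><w:t>sition is in the middle of the file.</w:t></w:r>
</w:p>"""

paragraph = ET.fromstring(SAMPLE)
removed_count = strip_goback(paragraph)
text = "".join(t.text for t in paragraph.iter(f"{{{W_NS}}}t"))
print(removed_count, text)
```

With the markers gone, the two runs sit next to each other, so a filter that merges adjacent runs would no longer need an inline tag in the middle of the word.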

Comments (2)

  1. Chase Tingley reporter

    Attached a simple example. This produces:

    This is a sentence. The editing po<x id="1"/>sition is in the middle of the file.
    

    The <x> tag is from the _GoBack bookmark:

    <w:p w:rsidR="00474653" w:rsidRDefault="007B3CD1">
          <w:r>
            <w:t>This is a sentence. The editing po</w:t>
          </w:r>
          <w:bookmarkStart w:id="0" w:name="_GoBack"/>
          <w:bookmarkEnd w:id="0"/>
          <w:r>
            <w:t>sition is in the middle of the file.</w:t>
          </w:r>
    </w:p>
    