- attached goback.docx
OpenXML: Ignore the _GoBack bookmark
Issue #485
resolved
Office inserts a hidden bookmark called _GoBack to save the last editing position in the document, so that you can jump back to that spot the next time the document is opened. (For a good explanation, see here.)
This uses the regular bookmark markup, but it can occur in any position, including in the middle of a word. Left in place, it can produce ugly segments due to tag noise. Stripping it during filtering loses no data and will marginally improve segment quality.
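To illustrate the idea, here is a minimal sketch in Python of what stripping the bookmark could look like. It is not the filter's actual implementation: the function name `strip_goback` and the sample markup are hypothetical, and it assumes the bookmark appears as a `w:bookmarkStart` element with `w:name="_GoBack"` paired by `w:id` with a `w:bookmarkEnd`, as in ordinary WordprocessingML bookmark markup.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W_NS)

START = f"{{{W_NS}}}bookmarkStart"
END = f"{{{W_NS}}}bookmarkEnd"
ID = f"{{{W_NS}}}id"
NAME = f"{{{W_NS}}}name"


def strip_goback(document_xml: str) -> str:
    """Remove _GoBack bookmark start/end markers (hypothetical helper)."""
    root = ET.fromstring(document_xml)

    # Collect the ids of every bookmark named _GoBack.
    goback_ids = {
        el.get(ID) for el in root.iter(START) if el.get(NAME) == "_GoBack"
    }

    # Remove the start and end markers carrying one of those ids.
    # Assumption: bookmark ids are unique within the document part.
    # Note: ElementTree drops the removed element's tail text as well;
    # in document.xml that is normally only formatting whitespace.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag in (START, END) and child.get(ID) in goback_ids:
                parent.remove(child)

    return ET.tostring(root, encoding="unicode")


# Hypothetical sample: the bookmark sits in the middle of the word "Hello".
SAMPLE = (
    f'<w:document xmlns:w="{W_NS}"><w:body><w:p>'
    '<w:r><w:t>Hel</w:t></w:r>'
    '<w:bookmarkStart w:id="0" w:name="_GoBack"/><w:bookmarkEnd w:id="0"/>'
    '<w:r><w:t>lo</w:t></w:r>'
    '<w:bookmarkStart w:id="1" w:name="Chapter1"/><w:bookmarkEnd w:id="1"/>'
    '</w:p></w:body></w:document>'
)

if __name__ == "__main__":
    print(strip_goback(SAMPLE))
```

In this sketch only the _GoBack markers are dropped; other bookmarks are kept, and the two runs around the former bookmark stay separate. Merging adjacent runs would be a separate step.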
Comments (2)
reporter -
reporter - changed status to resolved
Fix issue #485 - Strip "_GoBack" bookmarks from word documents → <<cset c0c0869ce5fa>>
Attached a simple example. This produces:
The <x> tag is from the _GoBack bookmark: