OpenXML: Inline markups for spell and grammar checking in xlf document

Issue #440 resolved
Former user created an issue

Original issue 440 created by s.kar...@24technology.de on 2015-02-02T08:40:53.000Z:

What steps will reproduce the problem?
1. Take a docx/pptx document with at least one word marked as spelling/grammar error, e.g. "Sentence with an Eror."
2. Convert this document to xlf
3. The converted xlf document has additional g- and/or x-markups

What is the expected output? What do you see instead?
Expected output: No markup around the misspelled word.
Current output: Markup around the misspelled word.
converted docx:
<source xml:lang="en-us"><x id="1"/><g id="2">Sentence with an </g><x id="3"/><g id="4">Eror</g><x id="5"/><g id="6">.</g><x id="7"/></source>

converted pptx:
<source xml:lang="de-de"><g id="1">Sentence with an </g><g id="2">Eror</g><g id="3">.</g></source>

What version of the product are you using? On what operating system?
Okapi version: M27 (January 25 2014)
Operating System: Windows 7 Enterprise

Please provide any additional information below.

The XML document inside the docx contains this sentence as follows:
Sentence with an </w:t></w:r><w:proofErr w:type="spellStart"/><w:r><w:t>Eror</w:t></w:r><w:proofErr w:type="spellEnd"/><w:r><w:t xml:space="preserve">.
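
To check whether a given docx actually carries this proofing markup, something like the following can be used. This is only a rough sketch: it assumes the main document part is stored at word/document.xml and uses the standard WordprocessingML namespace; the helper name count_proof_errors is made up for illustration and is not part of Okapi.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from collections import Counter

# Standard WordprocessingML namespace (the "w:" prefix in the snippet above).
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def count_proof_errors(docx):
    """Count <w:proofErr> elements in word/document.xml, keyed by w:type.

    `docx` may be a file path or a binary file-like object.
    Hypothetical helper for inspection only, not Okapi code.
    """
    with zipfile.ZipFile(docx) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    tag = "{%s}proofErr" % W_NS
    attr = "{%s}type" % W_NS
    return Counter(element.get(attr) for element in root.iter(tag))
```

Calling count_proof_errors on the path of a docx file returns a counter keyed by type value, so a document with one marked spelling error should report one spellStart and one spellEnd.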

Comments (7)

  1. Former user Account Deleted

    Comment 2. originally posted by @ysavourel on 2015-03-26T21:23:38.000Z:

    Sample file with a spelling error or two, but I don't think there's grammatical error markup in there.

  2. Chase Tingley

    The markup structure in question looks like: <w:proofErr w:type="spellStart"/>

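    This element anchors both the start and the end of spelling and grammar errors; the w:type value tells them apart (spellStart/spellEnd for spelling, gramStart/gramEnd for grammar). The fix is to strip these elements before merging runs, so that they no longer keep adjacent runs apart.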
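
For illustration, the strip-then-merge idea can be sketched as below. This is not the Okapi implementation (Okapi is written in Java and its filter works differently); it is a minimal sketch that assumes a paragraph whose runs contain nothing but a single w:t child, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = "{" + W_NS + "}"
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"


def strip_proof_errors(paragraph):
    """Remove every <w:proofErr> child of a paragraph element, in place."""
    for child in list(paragraph):
        if child.tag == W + "proofErr":
            paragraph.remove(child)


def merge_plain_runs(paragraph):
    """Merge adjacent <w:r> children that hold exactly one <w:t> child.

    Assumption: runs with properties (<w:rPr>) or other content are left
    alone; a real filter would have to compare run properties as well.
    """
    previous = None
    for child in list(paragraph):
        is_plain_run = (
            child.tag == W + "r" and len(child) == 1 and child[0].tag == W + "t"
        )
        if is_plain_run and previous is not None:
            # Append this run's text to the previous run and drop this run.
            previous[0].text = (previous[0].text or "") + (child[0].text or "")
            previous[0].set(XML_SPACE, "preserve")
            paragraph.remove(child)
        elif is_plain_run:
            previous = child
        else:
            # Any other element (e.g. a leftover proofErr) breaks the sequence.
            previous = None


def paragraph_texts(xml_text, strip=True):
    """Return the text of each <w:t> left after optional stripping and merging."""
    paragraph = ET.fromstring(xml_text)
    if strip:
        strip_proof_errors(paragraph)
    merge_plain_runs(paragraph)
    return [t.text for t in paragraph.iter(W + "t")]


# The sentence from the report, wrapped in a paragraph element.
SAMPLE = (
    '<w:p xmlns:w="' + W_NS + '">'
    '<w:r><w:t xml:space="preserve">Sentence with an </w:t></w:r>'
    '<w:proofErr w:type="spellStart"/>'
    '<w:r><w:t>Eror</w:t></w:r>'
    '<w:proofErr w:type="spellEnd"/>'
    '<w:r><w:t xml:space="preserve">.</w:t></w:r>'
    '</w:p>'
)
```

With stripping enabled, paragraph_texts(SAMPLE) yields a single text run; with strip=False the proofErr elements keep the three runs apart, which corresponds to the three separate inline codes seen in the converted xlf.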

  3. Chase Tingley

    Fix issue #440 - Strip computed spelling and grammar markup

    This streamlines trans-units by dropping the <w:proofErr> tags from
    DOCX files before merging text runs.
    

    → <<cset 5575305e30bb>>
