OmT 2.2.3 TTX using plugin 0.1.0 fails

Issue #157 resolved
Former user created an issue

Original [issue 157](https://code.google.com/p/okapi/issues/detail?id=157) created by le...@absamail.co.za on 2010-12-26T10:23:23.000Z:

What steps will reproduce the problem?

1. Open the attached OmT project in OmT 2.2.3 with the TTX filter.

What is the expected output? What do you see instead?

Expected: three segments. Instead: two segments.

Expected: TagEditor opens TTX file flawlessly. Instead: TagEditor says "40007: Error reading TTX file: Expected end of tags 'Tuv'"

What version of the product are you using? On what operating system?

OmT 2.2.3, TTX filter 0.1.0, Win XP Pro SP2.

Please provide any additional information below.

The message I posted to omegat@yahoogroups.com:

The TTX filter works even on non pre-translated texts.

I tested it on a Word 2003 file that was TTX'ed. The file contained two paragraphs:

The rain in Spain falls mainly on the plains.

This is the house that Jack built. The cat sat on the mat.

The first paragraph was left untranslated (non-pre-translated). The first sentence of the second paragraph was translated.

[Added: TagEditor did segment the second paragraph into two segments; I translated the first of those segments and left the second untranslated (un-pre-translated).]

In OmT, the second sentence of the second paragraph is not present at all, and the first sentence of the second paragraph was untranslated.

There is a bit of tag soup, too... I had the words "rain", "Spain", "mainly" and "plains" in bold, underline, italics and highlighted (respectively), and this is what I see in OmT:

The <x1/><x2/><g3>rain<x4/><x5/></g3> in <x6/><x7/><g8>Spain<x9/><x10/></g8> falls <x11/><x12/><g13>mainly<x14/><x15/></g13> on the <g16>plains</g16>.

What is interesting is that the "g" tags aren't nested logically (but this is a function of TTX itself, if I remember correctly).

In TagEditor (the program), there is only one opening tag and one closing tag (presumably the "g" tags here) before and after every formatted word.

When I created the target file, TagEditor refused to open the TTX file, saying "40007: Error reading TTX file: Expected end of tags 'Tuv'". When I opened the TTX file, I saw that OmT had placed </Tuv></Tu> directly after the first sentence of the second paragraph, and therefore the second sentence of the second paragraph was "outside" a TUV pair. Fixing this in Notepad isn't trivial because it isn't clear where the end-of-unit tags have to be moved to.

Samuel
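The broken target file described above can be caught before it reaches TagEditor with a plain XML well-formedness check. This is a minimal sketch using Python's standard library; the two fragments are simplified, hypothetical stand-ins (they are not taken from the attached project), and the function name `check_well_formed` is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return None if xml_text parses as XML, else the parser's error message."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return str(exc)
    return None


# Hypothetical, simplified fragments: real TTX files carry more structure.
# In `bad`, the unit is closed while <df> is still open, which mirrors the
# kind of misplaced </Tuv></Tu> reported in this issue.
good = "<Tu><Tuv>First sentence. <df>Second</df> sentence.</Tuv></Tu>"
bad = "<Tu><Tuv>First sentence. <df>Second</Tuv></Tu></df> sentence."

for label, fragment in (("good", good), ("bad", bad)):
    error = check_well_formed(fragment)
    print(label, "->", "well-formed" if error is None else error)
```

A check like this only reports that the file is malformed and roughly where; it does not say where the end-of-unit tags should be moved to.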

Comments (4)

  1. Former user Account Deleted

    Comment [3.](https://code.google.com/p/okapi/issues/detail?id=157#c3) originally posted by @ysavourel on 2010-12-27T05:20:05.000Z:

    Update: The creation of the target file should be solved now. The problem had to do with closing </df> tags that became isolated when creating the segment markers.

    The latest snapshot has the fix (http://okapi.opentag.com/snapshots)

    The text not extracted is issue #151. I will work on it.

    The soup of tags is still there. We'll try to reduce it, but it's hard to preserve the <df> and the <ut> and show only one set of tags.

  2. Former user Account Deleted

    Comment [4.](https://code.google.com/p/okapi/issues/detail?id=157#c4) originally posted by @ysavourel on 2010-12-29T13:31:41.000Z:

    The non-extracted text problem should be resolved.

    Most problems with overlapping <df> elements causing XML errors should also be resolved, except in the cases where the original TTX has <df> elements that go across external <ut>. Those should be rare and correspond, for example, to badly written HTML like this: <p>ab cb</p><p>ef gh</p>. A different issue is open for those (158).

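The overlapping-element case mentioned in the last comment can be illustrated with a small tag-stack check. This is only a sketch: the HTML string below, in which a `<b>` opens in one paragraph and closes in the next, is a hypothetical example chosen for illustration (not the one from the comment), and `find_overlaps` is an assumed helper name, not part of the Okapi filter.

```python
import re

# Matches opening, closing and self-closing tags; attributes are ignored.
TAG = re.compile(r"<(/?)([A-Za-z][\w.-]*)[^>]*?(/?)>")


def find_overlaps(markup):
    """Return (innermost_open, closing) pairs for closing tags that do not
    match the innermost open element."""
    stack = []
    problems = []
    for match in TAG.finditer(markup):
        closing, name, self_closing = match.groups()
        if self_closing:
            continue  # e.g. <x1/> neither opens nor closes a span
        if not closing:
            stack.append(name)
        elif stack and stack[-1] == name:
            stack.pop()
        else:
            problems.append((stack[-1] if stack else None, name))
            if name in stack:
                # Unwind to the matching opener so later tags are still checked.
                while stack.pop() != name:
                    pass
    return problems


# Hypothetical badly written HTML: <b> spans a paragraph boundary.
overlapping = "<p>ab <b>cd</p><p>ef</b> gh</p>"
nested = "<p>ab <b>cd</b></p><p>ef gh</p>"

print(find_overlaps(overlapping))
print(find_overlaps(nested))
```

Each reported pair names the element that was still open and the closing tag that arrived instead, which is the situation where a formatting span crosses a structural boundary.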