If TTX file is unsegmented, OmT will create faux segmentation

Issue #164 new
Former user created an issue

Original issue 164 created by afrika... on 2011-02-12T18:43:07.000Z:

What steps will reproduce the problem?
1. Create new TTX file
2. Open in OmT and then create target file
3. Open resulting TTX in TagEditor

What is the expected output?
No change in the file

What do you see instead?
Changes (see attachment)

What version of the product are you using? On what operating system?
OmT 2.2.3_1, okapi-pluginForOmegaT_all-platforms_0.10.zip, Trados 2007, WinXP Pro SP2.

Comments (10)

  1. Former user Account Deleted

    Comment [3.](https://code.google.com/p/okapi/issues/detail?id=164#c3) originally posted by @ysavourel on 2011-02-12T21:19:46.000Z:

    It seems the issue described is this: given a paragraph with 2 sentences in the original TTX, the resulting TTX has segment markers around the paragraph rather than around each sentence, even though each sentence was translated as a separate segment in OmegaT (as attested by the project TM). That is the current behavior. The reason for this is related to the interface between the filter and OmegaT:

    - the filter provides the TTX as is to OmegaT's interface: one entry = one paragraph, since the TTX is unsegmented.

    - OmegaT then applies its segmentation rules to each entry, resulting in 2 segments in OmegaT's UI and TM

    - then OmegaT puts the entry back together to pass it to the filter that creates the output. Because it is a translated entry, the filter must add segment markers, and it has to do this for the whole entry because at its level it has no knowledge of the OmegaT segmentation.

    To get a TTX file with segment markers corresponding to sentences, one currently has to pre-segment the TTX file (in Trados or using Rainbow).

    A fix would be either:

    a) change the way OmegaT interacts with the filter so that it exposes segments rather than "paragraph" entries.

    b) change the way the filter works by adding a segmentation step before feeding the entries to OmegaT. This would also require the project not to segment (since it would already be done). Ideally such segmentation would use the same rules as OmegaT does. However, while close to SRX, the rules of OmegaT are currently proprietary and difficult to use in other software.

    Not sure how to resolve this. But it's certainly a valid issue that needs to be addressed.

    -ys
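
The round trip described in the comment above, and fix option (b), can be illustrated with a small sketch. This is not the Okapi filter or the OmegaT API: `segment`, `mark`, `current_round_trip` and `presegmented_round_trip` are hypothetical names, the sentence splitter is a naive stand-in for OmegaT's own rules, and `<Tu>...</Tu>` is a simplified stand-in for the real TTX segment markup.

```python
import re


def segment(paragraph):
    # Naive sentence splitter: an assumed stand-in for OmegaT's segmentation rules.
    return [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]


def mark(text):
    # Simplified stand-in for the segment markers the filter writes to the TTX.
    return f"<Tu>{text}</Tu>"


def current_round_trip(paragraph, tm):
    # The filter exposes one entry per paragraph; the host tool segments it,
    # translates each sentence, then joins the result back into one entry.
    translated = [tm.get(s, s) for s in segment(paragraph)]
    entry = " ".join(translated)
    # The filter only sees the rejoined entry, so it marks the whole paragraph.
    return mark(entry)


def presegmented_round_trip(paragraph, tm):
    # Option (b): the filter segments first and exposes one entry per sentence,
    # so each translated sentence gets its own markers.
    return " ".join(mark(tm.get(e, e)) for e in segment(paragraph))


tm = {"One.": "Un.", "Two.": "Deux."}
print(current_round_trip("One. Two.", tm))       # <Tu>Un. Deux.</Tu>
print(presegmented_round_trip("One. Two.", tm))  # <Tu>Un.</Tu> <Tu>Deux.</Tu>
```

In the first function the sentence boundaries exist only inside the host tool and are lost on reassembly, which is why the output carries paragraph-level markers. In the second, the boundaries are decided before the entries are handed over, so the host tool would have to leave its own segmentation switched off.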

  2. Former user Account Deleted

    Comment [4.](https://code.google.com/p/okapi/issues/detail?id=164#c4) originally posted by afrika... on 2011-02-12T21:35:24.000Z:

    In the meantime, a nice-to-have would be for OmegaT to refuse to attempt a file that is not source=target prepared (either without telling the user why his TTX file is being refused or by giving the user a helpful error message). But that's for the OmegaT people to decide, right?

  3. Former user Account Deleted

    Comment [5.](https://code.google.com/p/okapi/issues/detail?id=164#c5) originally posted by afrika... on 2011-02-12T21:44:59.000Z:

    It seems the issue described is this: given a paragraph with 2 sentences in the original TTX, the resulting TTX has segment markers around the paragraph rather than around each sentence, even though each sentence was translated as a separate segment in OmegaT (as attested by the project TM). <<<

    Actually, the problem I have is that the file that OmegaT creates in the end is mangled, in the sense that both source and target fields are now translatable. I have translated them in the attachment. This should not be possible.

  4. Former user Account Deleted

    Comment [6.](https://code.google.com/p/okapi/issues/detail?id=164#c6) originally posted by @ysavourel on 2011-02-13T00:39:09.000Z:

    It seems at least one cause of the issue is that the entries delimited by the TTX filter include the initial line-break. It gets included after <Tu> rather than before, and this causes TagEditor to open a segment inside existing segments. I'll see how we can fix this. -ys

  5. Former user Account Deleted

    Comment [7.](https://code.google.com/p/okapi/issues/detail?id=164#c7) originally posted by @ysavourel on 2011-02-13T15:39:03.000Z:

    I've made some changes to the filter so that, when the original content is unsegmented, the leading whitespace characters are moved outside of the created entries. TagEditor seems to work better with the resulting TTX.

    The handling of line-breaks between external codes is not changed yet. I guess we would need to force two segments in those cases. I want to test more files to see the implications on formats like HTML, etc. before implementing something.

    The changes are in the latest snapshot (http://okapi.opentag.com/snapshots/)
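
The whitespace change described in the comment above can be sketched as follows. This is an illustration only, not the actual filter code: `split_leading_whitespace` and `wrap_entry` are hypothetical helpers, and `<Tu>...</Tu>` is again a simplified stand-in for the real TTX markup.

```python
def split_leading_whitespace(text):
    # Separate leading whitespace (including line-breaks) from the rest.
    body = text.lstrip()
    lead = text[: len(text) - len(body)]
    return lead, body


def wrap_entry(text, move_whitespace=True):
    # With move_whitespace=True the leading line-break stays before the marker;
    # with False it ends up inside the entry, as in the earlier behavior.
    if move_whitespace:
        lead, body = split_leading_whitespace(text)
        return f"{lead}<Tu>{body}</Tu>"
    return f"<Tu>{text}</Tu>"


print(repr(wrap_entry("\nHello world.", move_whitespace=False)))  # '<Tu>\nHello world.</Tu>'
print(repr(wrap_entry("\nHello world.")))                         # '\n<Tu>Hello world.</Tu>'
```

Only leading whitespace is moved in this sketch; line-breaks between external codes, which the comment says are not handled yet, are left untouched here as well.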

  6. Former user Account Deleted

    Comment [8.](https://code.google.com/p/okapi/issues/detail?id=164#c8) originally posted by velior.ivan... on 2012-02-27T15:52:29.000Z:

    I just wanted to add that this is an important issue for me. While it is extremely useful to be able to translate an unsegmented TTX file directly in OmegaT, the resulting target TTX file with paragraph-level segmentation isn't always appropriate for delivery to clients because many clients really expect sentence-level segmentation. I try to translate pre-segmented TTX files where possible, but it's not always practical. Will be looking forward to a solution. Thanks a lot! Best regards, Roman
