tikal -m is unescaping

Issue #371 resolved
Former user created an issue

Original issue 371 created by johnt... on 2013-10-16T17:37:23.000Z:

What steps will reproduce the problem?
1.
2.
3.

What is the expected output? What do you see instead?

What version of the product are you using? On what operating system?

Please provide any additional information below.

Comments (4)

  1. Former user Account Deleted

    Comment 1. originally posted by @ysavourel on 2013-10-16T17:41:14.000Z:

    Could you please provide more information about the issue?
    -m is the Tikal command to merge. What is being un-escaped during the merge?
    An example would be useful.
    Thanks

  2. Former user Account Deleted

    Comment 2. originally posted by johnt... on 2013-10-16T18:20:22.000Z:

    I suspect my issue didn't submit correctly with the examples as I got an error message...

    So our source file we're translating has the following XLIFF markup:

    this is a <bpt id="1"></bpt> small house <ept id="1"></ept>

    we convert this to Moses inline format and get our translation:

    this is a <g id="1">small house </g>

    das ist ein <g id="1"> kleines haus </g>

    Finally, we run 'tikal -m' to put the original XLIFF from the source back into the translated target, and we get the following:

    this is a <bpt id="1"></bpt> small house <ept id="1"></ept>

    The &gt; entity has been unescaped back to the > character. Now, it comes to my attention that this may be intentional as we only need to escape the < in order to have valid XML...

    I'd be grateful if you could elaborate

  3. Former user Account Deleted

    Comment 3. originally posted by @ysavourel on 2013-10-16T18:55:52.000Z:

    First, you may be using the wrong command. 'Merging' translations from Moses is done by leveraging the file you prepared with the -xm command. The leveraging is done with -lm (not -m).

    See http://www.opentag.com/okapi/wiki/index.php?title=Tikal_-_Extraction_Commands#Merge_Files for more info.

    In any case: for any XLIFF document:

    "<bpt id="1"></bpt> small house <ept id="1"></ept>" and "<bpt id="1"></bpt> small house <ept id="1"></ept>" are identical from the XML parser viewpoint. As you noted: there is no need to escape the character '>'.

    cheers,
    -ys
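
The point that '>' needs no escaping in XML character data can be checked with a short sketch. It assumes a hypothetical inline code `<b>` inside the `bpt` element (not taken from the reporter's file) and uses Python's standard `xml.etree.ElementTree` purely for illustration; it is not part of Tikal.

```python
import xml.etree.ElementTree as ET

# Hypothetical inline code "<b>" stored in a bpt element, written two ways.
escaped = '<bpt id="1">&lt;b&gt;</bpt>'   # both '<' and '>' escaped
unescaped = '<bpt id="1">&lt;b></bpt>'    # only '<' escaped

# Parse each fragment and read the character data the parser reports.
text_a = ET.fromstring(escaped).text
text_b = ET.fromstring(unescaped).text

# A conforming parser yields the same text for both spellings.
same = text_a == text_b
print(repr(text_a), repr(text_b), same)
```

If the two values compare equal, any downstream XML tool sees the same content in both files, so the difference matters only to tools that compare the raw bytes.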

  4. Former user Account Deleted

    Comment 4. originally posted by johnt... on 2013-10-16T19:16:29.000Z:

    I tried the -lm command to the same effect, but not -xm. I'll try this next.

    Anyway, maybe we can get away with not having to worry about this case
    (will have to check with end users).

    Thanks for your help
    John
