Various SDLXLIFF issues

Issue #367 resolved

Original issue 367 created by @ysavourel on 2013-10-01T22:25:55.000Z:

From:
http://groups.yahoo.com/neo/groups/OmegaT/conversations/messages/29838

However, after making a single change to a file,
and then adding the BOM to the output file again,
Trados still does not accept the file.
Trados' error messages are even less helpful,
so I can't tell what the problem is:
http://i44.tinypic.com/10croer.png

That looks similar to the issue Roman reported.
We'll try to reproduce and debug it.
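For anyone reproducing this, re-adding the BOM by hand can be scripted. This is a minimal sketch that is not part of Okapi or OmegaT; `ensure_utf8_bom` and `add_bom_to_file` are hypothetical helpers. The script only touches the first three bytes of the file, so on its own it cannot explain why Trados still rejects the file.

```python
import codecs
from pathlib import Path


def ensure_utf8_bom(data: bytes) -> bytes:
    """Prepend the UTF-8 byte order mark unless the data already starts with it."""
    if data.startswith(codecs.BOM_UTF8):
        return data
    return codecs.BOM_UTF8 + data


def add_bom_to_file(path: Path) -> None:
    """Rewrite a file in place so that it starts with a UTF-8 BOM (hypothetical helper)."""
    path.write_bytes(ensure_utf8_bom(path.read_bytes()))


# In-memory usage; add_bom_to_file(Path("output.sdlxliff")) would do the same on disk.
print(ensure_utf8_bom(b"<xliff/>"))  # b'\xef\xbb\xbf<xliff/>'
```

Because the function checks for an existing BOM first, running it twice on the same file is harmless.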

    version="1.2" sdl:version="1.0"
    xmlns="urn:oasis:names:tc:xliff:document:1.2">
into this:
    xmlns="urn:oasis:names:tc:xliff:document:1.2" version="1.2"
    sdl:version="1.0">
...which shouldn't make a difference, but we all know
that "shouldn't" doesn't always work that way.
I've seen otherwise perfectly good programs choke on
the fact that a tag's attributes were not in the
sequence that the program expected them to be.

You are correct: the order of attributes does not matter in XML.
Once a document is parsed, there is often no way to know what the original order was.
Tools that expect a fixed order are not using an XML parser, and there is not much we can do about that.
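To illustrate the point, the following sketch uses Python's standard `xml.etree.ElementTree` (not Okapi's Java code) to parse the two header orderings from the report. It shows that a parser hands back the same element name and the same set of attributes for both. The `xmlns:sdl` declaration is an assumption added only so the fragments are well-formed, since the quoted snippet does not show it.

```python
import xml.etree.ElementTree as ET

# Two <xliff> start tags that differ only in attribute order, as in the report.
# The xmlns:sdl URI is an assumption added so the fragments are well-formed.
ORIGINAL = (
    '<xliff xmlns:sdl="http://sdl.com/FileTypes/SdlXliff/1.0" '
    'version="1.2" sdl:version="1.0" '
    'xmlns="urn:oasis:names:tc:xliff:document:1.2"/>'
)
REWRITTEN = (
    '<xliff xmlns:sdl="http://sdl.com/FileTypes/SdlXliff/1.0" '
    'xmlns="urn:oasis:names:tc:xliff:document:1.2" version="1.2" '
    'sdl:version="1.0"/>'
)


def parsed_view(xml_text):
    """Return the element name and its attributes as the parser reports them."""
    root = ET.fromstring(xml_text)
    # dict comparison ignores insertion order, just as XML ignores attribute order
    return root.tag, dict(root.attrib)


print(parsed_view(ORIGINAL) == parsed_view(REWRITTEN))  # True
```

Once the documents are parsed, nothing remains that would let a writer recover which order the original file used.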

  1. Also in the header, Okapi changes utf-8 to UTF-8.

Same here: the official IANA name is uppercase, and XML processors should match the encoding name in the declaration case-insensitively (see http://www.w3.org/TR/REC-xml/#charencoding).
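As a quick check of that rule against one real parser, this sketch feeds the same UTF-8 bytes to Python's `xml.etree.ElementTree` (which uses expat). Only the case of the declared encoding differs. It demonstrates the behavior of this one parser and says nothing about how Trados reads the declaration.

```python
import xml.etree.ElementTree as ET


def root_text(declared_encoding):
    # Same UTF-8 payload; only the case of the declared encoding name differs.
    doc = (
        f'<?xml version="1.0" encoding="{declared_encoding}"?>'
        '<source>caf\u00e9</source>'
    ).encode("utf-8")
    return ET.fromstring(doc).text


print(root_text("utf-8") == root_text("UTF-8"))  # True
```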

  1. Trados writes empty elements as self-closing tags, whereas Okapi writes them as start/end pairs.
    For example, Trados would write <foo/> but Okapi would write <foo></foo>.

Both notations are equivalent, but we could try to use the shorthand when writing the file back.
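The equivalence, and the idea of choosing the shorthand on output, can be shown with Python's `xml.etree.ElementTree`. This is only an illustration of a writer option in that library (`short_empty_elements`). It does not show Okapi's own writer or any existing Okapi setting.

```python
import xml.etree.ElementTree as ET

short = ET.fromstring("<foo/>")
paired = ET.fromstring("<foo></foo>")

# The parser reports the same element for both spellings: same name, no text, no children.
print(short.tag == paired.tag, short.text, paired.text, len(short), len(paired))
# True None None 0 0

# ElementTree's writer can produce either form; short_empty_elements selects which.
print(ET.tostring(paired, encoding="unicode"))  # <foo />
print(ET.tostring(short, encoding="unicode", short_empty_elements=False))  # <foo></foo>
```

Because the parsed trees are identical, the choice of form is purely a writer decision and can be made independently of how the input was spelled.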

  1. Trados escapes both < and > as entities, whereas Okapi uses
    an entity only for <. I understand that from a purist point
    of view, > does not need to be written as an entity,
    but I also know that some parsers don't hold that view
    and complain when it is left unescaped.

I would say that tools using an XML parser will have no problem with this.
Here again, once the character is parsed there is no way to know what its original form was.
We could force all > to be escaped, but then users who want them unescaped (for example, to compare the output with an original file where they are unescaped) would complain. There is no way we can win.
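A writer-side switch for this could look like the sketch below. `escape_text` and its `escape_gt` flag are hypothetical and not an existing Okapi option. The sketch shows that both outputs parse back to exactly the same text. One caveat from the XML spec applies even when > is left bare: it must still be escaped when it appears in the sequence `]]>` in content, so the sketch handles that case unconditionally.

```python
import xml.etree.ElementTree as ET


def escape_text(text: str, escape_gt: bool = False) -> str:
    """Escape character data for XML; escaping every '>' is optional (hypothetical option)."""
    out = text.replace("&", "&amp;").replace("<", "&lt;")
    if escape_gt:
        out = out.replace(">", "&gt;")
    else:
        # A bare '>' is fine except in ']]>', which XML forbids in content.
        out = out.replace("]]>", "]]&gt;")
    return out


original = "if a < b and b > c"
for escape_gt in (False, True):
    serialized = f"<source>{escape_text(original, escape_gt)}</source>"
    # Both serializations parse back to exactly the same text.
    print(serialized, ET.fromstring(serialized).text == original)
```

Since the parser erases the difference, the only reason to prefer one form is compatibility with a specific consuming tool or with a diff against the original file.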

  1. In the body, Okapi removes the "trans-unit" tags
    around any group of tags that carry no meaning
    in human language.
    http://i43.tinypic.com/6t20so.png

That is definitely an issue, and possibly the cause of your merging error.
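When reproducing this, one quick check is whether every trans-unit in the original file survives in the output. The sketch below compares trans-unit ids using Python's `xml.etree.ElementTree` and the XLIFF 1.2 namespace. It assumes ids are unique within a file. The two sample documents are made up for illustration and are not taken from the reported file.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"
TRANS_UNIT = f"{{{XLIFF_NS}}}trans-unit"


def trans_unit_ids(xml_text: str) -> list:
    """Return the id of every <trans-unit> in document order, including nested ones."""
    root = ET.fromstring(xml_text)
    return [tu.get("id") for tu in root.iter(TRANS_UNIT)]


def missing_trans_units(original: str, output: str) -> list:
    """Ids present in the original file but absent from the output file."""
    kept = set(trans_unit_ids(output))
    return [tu_id for tu_id in trans_unit_ids(original) if tu_id not in kept]


# Hypothetical files: unit 2 holds only an inline tag, no human-language text,
# and the output has lost its <trans-unit> wrapper.
ORIGINAL = (
    f'<xliff xmlns="{XLIFF_NS}" version="1.2"><file><body>'
    '<trans-unit id="1"><source>Hello</source></trans-unit>'
    '<trans-unit id="2"><source><x id="5"/></source></trans-unit>'
    '</body></file></xliff>'
)
OUTPUT = (
    f'<xliff xmlns="{XLIFF_NS}" version="1.2"><file><body>'
    '<trans-unit id="1"><source>Hello</source></trans-unit>'
    '<x id="5"/>'
    '</body></file></xliff>'
)
print(missing_trans_units(ORIGINAL, OUTPUT))  # ['2']
```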
