Various SDLXLIFF issues

Original issue 367 created by @ysavourel on 2013-10-01T22:25:55.000Z:

From:
http://groups.yahoo.com/neo/groups/OmegaT/conversations/messages/29838

However, after making a single change to a file,
and then adding the BOM to the output file again,
Trados still does not accept the file.
Trados' error messages are even less helpful,
so I can't tell what the problem is:
http://i44.tinypic.com/10croer.png

That looks similar to the issue Roman reported.
We'll try to reproduce and debug it.

version="1.2" sdl:version="1.0"
xmlns="urn:oasis:names:tc:xliff:document:1.2">
into this:
xmlns="urn:oasis:names:tc:xliff:document:1.2" version="1.2"
sdl:version="1.0">
...which shouldn't make a difference, but we all know
that "shouldn't" doesn't always work that way.
I've seen otherwise perfectly good programs choke on
the fact that a tag's attributes were not in the
sequence that the program expected them to be.

You are correct: the order of the attributes does not matter in XML.
Once the document is parsed, there is often no way to know what the original order was.
Tools that expect a fixed order are not using a real XML parser, and there is not much we can do about that.
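
To illustrate the point about attribute order, here is a minimal Python sketch using the standard xml.etree.ElementTree parser. The two strings mirror the attribute lists quoted above. The element name xliff and the urn:example:sdl URI bound to the sdl prefix are placeholders assumed for this example, not values taken from the actual file.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URI for the "sdl" prefix (an assumption for this sketch).
SDL_NS = "urn:example:sdl"

# Attribute order as quoted from the original file.
original = (
    '<xliff xmlns:sdl="' + SDL_NS + '" '
    'version="1.2" sdl:version="1.0" '
    'xmlns="urn:oasis:names:tc:xliff:document:1.2"/>'
)
# Attribute order as quoted from the rewritten file.
rewritten = (
    '<xliff xmlns:sdl="' + SDL_NS + '" '
    'xmlns="urn:oasis:names:tc:xliff:document:1.2" version="1.2" '
    'sdl:version="1.0"/>'
)


def same_element(xml_a: str, xml_b: str) -> bool:
    """Compare the root tag and attributes of two documents, ignoring order."""
    a = ET.fromstring(xml_a)
    b = ET.fromstring(xml_b)
    # attrib is a dict, so equality does not depend on attribute order.
    return a.tag == b.tag and a.attrib == b.attrib


print(same_element(original, rewritten))
```

A parser-based consumer sees the same element in both cases. Only a tool that matches the raw text would notice the difference.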

Also in the header, Okapi changes utf-8 to UTF-8.

Same here: the official IANA name is uppercase, and XML processors should match the encoding name in the encoding declaration case-insensitively (see http://www.w3.org/TR/REC-xml/#charencoding).
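
As a quick check of the case-insensitivity claim, the following Python sketch parses the same bytes with both spellings of the encoding name. The note element and its content are made up for the example, and this only shows how one conforming parser (expat, through xml.etree.ElementTree) behaves. It says nothing about what Trados does internally.

```python
import xml.etree.ElementTree as ET

# Hypothetical payload containing a non-ASCII character, encoded as UTF-8.
body = "<note>caf\u00e9</note>".encode("utf-8")

lower = b'<?xml version="1.0" encoding="utf-8"?>' + body
upper = b'<?xml version="1.0" encoding="UTF-8"?>' + body


def text_of(document: bytes) -> str:
    """Parse the document and return the text of its root element."""
    return ET.fromstring(document).text


# Both spellings of the encoding name decode to the same text.
print(text_of(lower) == text_of(upper))
```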

Trados closes standalone tags, whereas Okapi pairs them.
For example, Trados would write <foo/> but Okapi would write <foo></foo>.

Both notations are equivalent. But we could try to use the shorthand when writing back.
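
The equivalence can be sketched in Python with the standard library. Both forms parse to an element with no text and no children, and the serializer can be told which form to write. The short_empty_elements flag shown here belongs to Python's xml.etree.ElementTree, not to Okapi, so it only illustrates the kind of option meant by "use the shorthand when writing back".

```python
import xml.etree.ElementTree as ET

short = ET.fromstring("<foo/>")
paired = ET.fromstring("<foo></foo>")


def is_empty(element: ET.Element) -> bool:
    """True when the element has neither text nor child elements."""
    return element.text is None and len(element) == 0


# After parsing, the two notations are indistinguishable.
print(is_empty(short), is_empty(paired))

# The writer decides the output form, whatever the input looked like.
short_form = ET.tostring(paired, encoding="unicode")
long_form = ET.tostring(short, encoding="unicode", short_empty_elements=False)
print(short_form)
print(long_form)
```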

Trados entitises both < and >, whereas Okapi uses
entities only for <. I understand that from a purist point
of view, > does not need to be written as an entity,
but I also know that some parsers don't hold that view,
and dislike it when you do that.

I would say that tools using XML parsers will have no problem with this.
Here again, once the character is parsed there is no way to know what the original form was.
We could force all > to be escaped, but then users who want them un-escaped, so that they can compare the output with an original file where they are un-escaped, would complain. There is no way we can win.
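
A short Python sketch of the same point: a parser returns identical text whether > was written as an entity or not, so the original form cannot be recovered afterwards. The source element and its content are invented for the example. The escape function from xml.sax.saxutils stands in for an "always escape" writer policy and is not Okapi's writer.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical segment content, written once with &gt; and once with a raw >.
escaped = "<source>a &lt; b &gt; c</source>"
unescaped = "<source>a &lt; b > c</source>"


def parsed_text(xml: str) -> str:
    """Return the character data of the root element after parsing."""
    return ET.fromstring(xml).text


# Both inputs yield the same string; the entity form is lost at parse time.
print(parsed_text(escaped) == parsed_text(unescaped))

# An "always escape" policy: escape() rewrites &, < and >.
always_escaped = escape(parsed_text(unescaped))
print(always_escaped)
```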

In the BODY, Okapi removes the "trans-unit" tags
around any group of tags that has no
meaning in human language.
http://i43.tinypic.com/6t20so.png

That is definitely an issue,
and possibly the cause of your merging error.
