- changed title to Tikal replacing HTML encoded closing tag when converting to XLIFF both versions.
- edited description
Tikal replacing HTML encoded closing tag when extracting from TMX to XLIFF
It is replacing '&gt;' with '>'.
Steps to reproduce: use a TMX file containing HTML-encoded tags in seg:
&lt; &gt; &quot;
and convert it to XLIFF (-x -x1 -x2).
Did not have a chance to try with other input format.
Comments (8)
-
reporter -
reporter It is happening only with the encoded '&gt;' character; the other characters look fine so far (&lt; &quot;).
-
reporter - changed title to Tikal replacing HTML encoded closing tag when extracting from TMX to XLIFF
- edited description
-
The character "greater-than" can be represented as > or as &gt; in XML. XML parsers will read both forms as a normal '>'. The same goes for " and &quot;. On the other hand, < and & should always be escaped.
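As a quick illustration of the point above, here is a minimal sketch using Python's standard `xml.etree.ElementTree` parser rather than Okapi or Tikal. The `<seg>` strings are made-up sample data, not taken from the reporter's file. It only shows that a conforming parser yields the same text whether '>' and '"' arrive escaped or literal, and that an unescaped '<' is read as markup.

```python
import xml.etree.ElementTree as ET

# Hypothetical segments: the first escapes '>' and '"', the second leaves them literal.
escaped = "<seg>&lt;B&gt; bold &quot;text&quot;</seg>"
literal = "<seg>&lt;B> bold \"text\"</seg>"

text_escaped = ET.fromstring(escaped).text
text_literal = ET.fromstring(literal).text

# Both forms decode to the same plain-text string.
print(text_escaped)
print(text_literal == text_escaped)

# An unescaped '<' is different: the parser treats <B> as a real start tag,
# so this input is not well-formed (the <B> element is never closed).
try:
    ET.fromstring("<seg><B> bold</seg>")
    unescaped_lt_ok = True
except ET.ParseError:
    unescaped_lt_ok = False
print(unescaped_lt_ok)
```

Since the parsed text is identical in both cases, any difference between '&gt;' and '>' in the output file comes from the writer's choice, not from the content.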
-
reporter But only & gt; is being decoded. All the other encoded characters remain the same. To me it looks like a bug, since tags end up starting with & lt; and closing with >. Give it a try. Sorry for the spaces in the & lt; and & gt; entities. I am on mobile and not able to quote text properly.
-
Mmm... Maybe I'm not understanding the issue.
If your TMX segment contains something like
&lt;B&gt;
then from the viewpoint of TMX it's the plain text string "<B>", not a tag. It might have been an HTML tag originally, but in that notation in TMX it's just plain text.
For plain text, only < needs to be encoded (as &lt;) in order for the parser to make the distinction between a real tag and a string. That is because the parser "sees" > as a special character only when it has started to parse a tag. So both
&lt;stuff>
and
&lt;stuff&gt;
will be interpreted the same by the parser: as the literal text "<stuff>", not as a tag <stuff>.
-
reporter I do understand how encoded/decoded tags are interpreted, even at a low level. I will provide a file with some content and give a better example by Monday. I agree it is a minor issue, since tags can be parsed regardless of whether they are encoded or not. By the way, thanks for your prompt feedback, I really appreciate it.
-
This is not a bug. It's up to the discretion of the XML serializer to write out '>' as '>' or as '&gt;'. This is not the case for '<', which must always be serialized as '&lt;' according to the XML spec.
The handling of '>' is a matter of personal preference, so it could be an enhancement to allow this toggle. The XLIFF Filter already exposes this option when merging XLIFF files that came in as source; however, tikal calls the XLIFF writer directly and doesn't expose the full set of options on the command line.
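To make the serializer's choice concrete, here is a rough sketch of what such a toggle could look like. It is written in Python for brevity; `escape_text` and its `escape_gt` parameter are hypothetical names invented for this illustration and are not the Okapi XLIFF writer's actual API or option names.

```python
import xml.etree.ElementTree as ET

def escape_text(text: str, escape_gt: bool = True) -> str:
    """Escape text for XML element content (hypothetical helper, not Okapi's API)."""
    # '&' must be handled first so the entities added below are not double-escaped.
    out = text.replace("&", "&amp;").replace("<", "&lt;")
    if escape_gt:
        out = out.replace(">", "&gt;")
    else:
        # Even when '>' is left literal, the sequence ']]>' may not appear in content.
        out = out.replace("]]>", "]]&gt;")
    return out

original = "<B>bold</B> & more"
with_gt = escape_text(original, escape_gt=True)
without_gt = escape_text(original, escape_gt=False)

print(with_gt)     # &lt;B&gt;bold&lt;/B&gt; &amp; more
print(without_gt)  # &lt;B>bold&lt;/B> &amp; more

# Both serializations are well-formed and parse back to the same text.
round_trips = [ET.fromstring(f"<seg>{s}</seg>").text for s in (with_gt, without_gt)]
print(round_trips[0] == round_trips[1] == original)
```

Either output is valid XML. The toggle only changes how the file looks to someone reading or diffing the raw markup.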