Tikal replacing HTML encoded closing tag when extracting from TMX to XLIFF

Issue #552 new
Garcia JR created an issue
It is replacing '&gt;' by '>'

Steps to reproduce: use a TMX file containing HTML-encoded tags in seg:

&lt; &gt; &quot;

and convert it to XLIFF (-x -x1 -x2).

I did not have a chance to try other input formats.

Comments (8)

  1. Garcia JR reporter
    It happens only with the encoded '&gt;' character; the other characters look fine so far (&lt; &quot;)
    
  2. YvesS

    The character "greater-than" can be represented as > or as &gt; in XML. XML parsers will read both forms as a normal '>'. The same goes for " and &quot;. On the other hand, < and & should always be escaped.

  3. Garcia JR reporter

    But only & gt; is being decoded. All other encoded characters remain the same. To me it looks like a bug, since tags end up starting with & lt; and closing with >. Give it a try. Sorry for the spaces in the & lt; and & gt; entities. I am on mobile and not able to quote text properly.

  4. YvesS

    Mmm... Maybe I'm not understanding the issue.

    If your TMX segment contains something like &lt;B&gt;, then from the viewpoint of TMX it's the plain text string "<B>", not a tag. It might have been an HTML tag originally, but in that notation in TMX it's just plain text.

    For plain text, only < needs to be encoded as &lt; in order for the parser to make the distinction between a real tag and a string. That is because the parser "sees" > as a special character only when it has started to parse a tag. So both &lt;stuff> and &lt;stuff&gt; will be interpreted the same way by the parser: as the literal text "<stuff>", not as a tag <stuff>.
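
The point above can be checked with any conforming XML parser. Below is a minimal sketch, assuming Python's standard `xml.etree.ElementTree` as a stand-in (Tikal is a Java tool and uses its own parser); the `<seg>` wrapper strings and the `seg_text` helper are illustrative only and not taken from the reporter's file.

```python
import xml.etree.ElementTree as ET


def seg_text(xml: str) -> str:
    """Parse a one-element document and return its text content."""
    return ET.fromstring(xml).text


# Same segment, once with a literal '>' and once with '&gt;'.
raw_gt = seg_text("<seg>&lt;stuff></seg>")
escaped_gt = seg_text("<seg>&lt;stuff&gt;</seg>")

# Both forms decode to the same plain-text string, not to a tag.
print(raw_gt == escaped_gt)
```

If the parser follows the XML rules described in the comment, both calls return the same literal string, so a downstream tool cannot tell which form the input file used.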

  5. Garcia JR reporter

    I do understand how encoded/decoded tags are interpreted, even at a low level. I will provide a file with some content and give a better example by Monday. I agree it is a minor issue, since tags can be parsed whether or not they are encoded. By the way, thanks for your prompt feedback, I really appreciate it.

  6. Chase Tingley

    This is not a bug. It's up to the discretion of the XML serializer to write out > as > or &gt;. This is not the case for <, which must always be serialized as &lt; according to the XML spec.

    The handling of > is a matter of personal preference, so it could be an enhancement to allow this toggle. The XLIFF Filter already exposes this option when merging XLIFF files that came in as source; however, tikal calls the XLIFF writer directly and doesn't expose the full set of options on the command-line.
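
The serializer's freedom described above can be illustrated with two escaping strategies that produce different bytes but the same parsed text. This is a rough sketch in Python, not Okapi's XLIFF writer: `escape_minimal` is a hypothetical helper written for this example, while `xml.sax.saxutils.escape` is the standard-library function that escapes `&`, `<`, and `>`.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def escape_minimal(text: str) -> str:
    """Hypothetical helper: escape only '&' and '<', leaving '>' literal.

    It does not handle the ']]>' sequence, where XML requires '>' to be
    escaped in character data.
    """
    return text.replace("&", "&amp;").replace("<", "&lt;")


text = "<B>bold</B> & more"

minimal = f"<seg>{escape_minimal(text)}</seg>"  # '>' written as-is
full = f"<seg>{escape(text)}</seg>"             # '>' written as '&gt;'

# The serialized forms differ, but both parse back to the original text.
print(minimal)
print(full)
print(ET.fromstring(minimal).text == ET.fromstring(full).text == text)
```

A toggle like the one suggested in the comment would amount to choosing between these two strategies at write time; the parsed content is the same either way.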
