Entity reference in XML Stream filter

Issue #394 resolved
Former user created an issue

Original issue 394 created by @ysavourel on 2014-03-05T17:43:14.000Z:

The XML Stream filter seems to escape the ampersand of any entity reference.

<?xml version="1.0" ?>
<root>
<p>test &abcdef; text</p>
</root>

becomes:

<?xml version="1.0" ?>
<root>
<p>test &amp;abcdef; text</p>
</root>

When the entity is declared, it also duplicates the content of the declaration:

<?xml version="1.0" ?>
<!DOCTYPE root [
<!ENTITY abcdef "ABCDEF">
]>
<root>
<p>test &abcdef; text</p>
</root>

becomes:

<?xml version="1.0" ?>
<!DOCTYPE root [
<!ENTITY abcdef "ABCDEF">
]><!ENTITY abcdef "ABCDEF">
<root>
<p>test &amp;abcdef; text</p>
</root>
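
The escaping symptom reported above can be illustrated with a minimal sketch. This is not the filter's code: the function names and the regular expression are hypothetical, and the pattern is only an approximation of XML name syntax. It contrasts escaping every ampersand with escaping only the ampersands that do not start an entity reference.

```python
import re

# A bare ampersand is one that does NOT begin a named, decimal, or
# hexadecimal reference such as &abcdef; &#38; or &#x26;
# (approximation of XML name syntax, assumed for this sketch).
BARE_AMP = re.compile(r"&(?!(?:[A-Za-z_:][\w.:-]*|#[0-9]+|#x[0-9A-Fa-f]+);)")


def escape_all(text: str) -> str:
    """Naive escaping: every ampersand becomes &amp; (the reported symptom)."""
    return text.replace("&", "&amp;")


def escape_bare_ampersands(text: str) -> str:
    """Escape only ampersands that are not part of an entity reference."""
    return BARE_AMP.sub("&amp;", text)


if __name__ == "__main__":
    sample = "test &abcdef; text"
    print(escape_all(sample))
    print(escape_bare_ampersands(sample))
```

With the second function an undeclared reference such as `&abcdef;` passes through unchanged, while a free-standing `&` is still escaped.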

Comments (11)

  1. Former user Account Deleted

    Comment 1. originally posted by rhong... on 2014-03-06T01:19:41.000Z:

    Further, when I pre-process XML files with the okf_xmlstream filter, some characters ("&", "<" and so on) are escaped to their entity names.
    When I post-process the translated version, the reverse happens: those entity names (including the ones already present in the original XML files) are replaced by the characters themselves.
    For more details about such chars, please refer to: http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references#Predefined_entities_in_XML

    I use CodeFinder in the filter:
    useCodeFinder: true
    codeFinderRules: |-
      #v1
      count.i=1
      rule0=(&[^;]+?;|&|<|>|'|")
    But the bug is still there.
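
To see what the rule above would match, here is a small sketch that applies the same pattern with Python's `re` module. The filter evaluates the rule with Java regular expressions, so Python is used here only as an approximation; `find_codes` is a hypothetical helper, not part of Okapi.

```python
import re

# rule0 copied verbatim from the CodeFinder configuration above.
RULE0 = re.compile(r"""(&[^;]+?;|&|<|>|'|")""")


def find_codes(text: str) -> list[str]:
    """Return every substring the rule would turn into an inline code."""
    return [m.group(0) for m in RULE0.finditer(text)]


if __name__ == "__main__":
    print(find_codes("test &abcdef; text"))
    print(find_codes('a < b & "c"'))
    print(find_codes("A & B; C"))
```

One thing the sketch makes visible: because the first alternative accepts any characters up to the next semicolon, a bare ampersand followed later by a semicolon (as in `A & B; C`) is captured as one long match rather than as a single `&`.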

  2. Former user Account Deleted

    Comment 3. originally posted by @ysavourel on 2014-03-07T19:19:14.000Z:

    For the first case (&abcdef;), there's a question of what the correct behavior should be. I think there's a good argument to be made that the entity should be exposed for translation as a placeholder, since we probably don't know what it represents (the entity declaration may not even always be available). Most of the time, entities are used for content parameterization (product name, etc.), in which case you don't want the translators to be able to mess with them. So protecting the entity by converting it automatically to a code seems like the best behavior to me.

    Do others agree?
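
The placeholder idea described in this comment can be sketched as a round trip: pull each entity reference out of the text before translation, then put it back afterwards. This is an illustration under stated assumptions, not the filter's implementation. The `{n}` placeholder syntax, the function names, and the name pattern are all hypothetical, and the sketch treats every named reference alike, including predefined ones such as `&amp;`.

```python
import re

# Named entity references only (approximate XML name syntax, assumed here).
ENTITY_REF = re.compile(r"&[A-Za-z_:][\w.:-]*;")
PLACEHOLDER = re.compile(r"\{(\d+)\}")


def protect(text: str) -> tuple[str, list[str]]:
    """Replace each entity reference with a numbered placeholder like {1}."""
    codes: list[str] = []

    def repl(match: re.Match) -> str:
        codes.append(match.group(0))
        return f"{{{len(codes)}}}"

    return ENTITY_REF.sub(repl, text), codes


def restore(text: str, codes: list[str]) -> str:
    """Put the original entity references back in place of the placeholders."""
    return PLACEHOLDER.sub(lambda m: codes[int(m.group(1)) - 1], text)


if __name__ == "__main__":
    protected, codes = protect("test &abcdef; text")
    print(protected, codes)
    # The translator may move the placeholder but cannot alter the entity.
    print(restore("texte {1} test", codes))
```

Because the translator only ever sees the placeholder, the reference is written back exactly as it appeared in the source, whether or not its declaration was available.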

  3. Former user Account Deleted

    Comment 4. originally posted by @ysavourel on 2014-03-07T19:24:02.000Z:

    It makes sense to treat it as an inline code. We should have a type for
    this added to our global types: "entity".

    J

  4. Former user Account Deleted

    Comment 5. originally posted by @ysavourel on 2014-03-07T19:56:00.000Z:

    +1. The XML Filter does that by default.
    It has an option to expand the entities otherwise, but that should be very rarely used.

  5. Former user Account Deleted

    Comment 6. originally posted by rhong... on 2014-03-08T00:48:02.000Z:

    Hi, tingley. I think it would be a good way to process such entities as you said.

  6. Former user Account Deleted

    Comment 7. originally posted by @ysavourel on 2014-03-12T20:48:55.000Z:

    I fixed this issue:

    <?xml version="1.0" ?>
    <!DOCTYPE root [
    <!ENTITY abcdef "ABCDEF">
    ]><!ENTITY abcdef "ABCDEF">
    <root>
    <p>test &abcdef; text</p>
    </root>

  7. Former user Account Deleted

    Comment 8. originally posted by @ysavourel on 2014-04-07T23:53:33.000Z:

    Jim, can we resolve this?

  8. Former user Account Deleted

    Comment 9. originally posted by @ysavourel on 2014-04-08T01:23:43.000Z:

    You mean the second problem of escaping the entity reference? I won't be able to get to this for a while. Maybe in a week or so.

  9. Former user Account Deleted

    Comment 10. originally posted by @ysavourel on 2014-04-08T02:02:59.000Z:

    Oh my mistake - I meant could we close the bug, I didn't realize there were two separate issues.
