Entity reference in XML Stream filter
Original issue 394 created by @ysavourel on 2014-03-05T17:43:14.000Z:
The XML Stream filter seems to escape the ampersand of any entity reference.
<?xml version="1.0" ?>
<root>
<p>test &abcdef; text</p>
</root>
becomes:
<?xml version="1.0" ?>
<root>
<p>test &amp;abcdef; text</p>
</root>
When the entity is declared, it also duplicates the content of the declaration:
<?xml version="1.0" ?>
<!DOCTYPE root [
<!ENTITY abcdef "ABCDEF">
]>
<root>
<p>test &abcdef; text</p>
</root>
becomes:
<?xml version="1.0" ?>
<!DOCTYPE root [
<!ENTITY abcdef "ABCDEF">
]><!ENTITY abcdef "ABCDEF">
<root>
<p>test &abcdef; text</p>
</root>
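The two symptoms above can be checked mechanically. The following is a minimal sketch in Python, not Okapi code: the function name and the regular expressions are assumptions made for illustration. It compares a source document with the filter output and reports entity references whose ampersand was escaped (`&amp;abcdef;` in place of `&abcdef;`) and entity declarations that appear more often in the output than in the source.

```python
import re

# Matches a named entity reference such as &abcdef; and captures the name.
ENTITY_REF = re.compile(r"&([A-Za-z_][\w.-]*);")
# Matches the start of an entity declaration and captures the entity name.
ENTITY_DECL = re.compile(r"<!ENTITY\s+([^\s>]+)")


def find_symptoms(source: str, output: str) -> list:
    """Report the two problems described in this issue (illustrative only)."""
    problems = []
    # Symptom 1: a reference &name; in the source appears as &amp;name; in the output.
    for name in sorted(set(ENTITY_REF.findall(source))):
        if f"&amp;{name};" in output:
            problems.append(f"escaped reference: &{name};")
    # Symptom 2: an entity is declared more often in the output than in the source.
    declared_in_source = ENTITY_DECL.findall(source)
    declared_in_output = ENTITY_DECL.findall(output)
    for name in sorted(set(declared_in_source)):
        if declared_in_output.count(name) > declared_in_source.count(name):
            problems.append(f"duplicated declaration: {name}")
    return problems
```

Calling `find_symptoms(source, output)` with identical strings returns an empty list, so a check like this could serve as a round-trip regression test once both problems are fixed.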
Comments (11)
-
Account Deleted Comment 2. originally posted by @ysavourel on 2014-03-07T17:12:42.000Z:
-
Account Deleted Comment 3. originally posted by @ysavourel on 2014-03-07T19:19:14.000Z:
For the first case (&abcdef;), there's a question of what the correct behavior should be. I think there's a good argument to be made that the entity should be exposed for translation as a placeholder, since we probably don't know what it represents (the entity declaration may not even always be available). Most of the time, entities are used for content parameterization (product name, etc.), in which case you don't want translators to be able to mess with them. So protecting the entity by converting it automatically to a code seems like the best behavior to me.
Do others agree?
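To make the proposal concrete, here is a minimal sketch in Python of what "protecting the entity as a code" could look like. It is not the Okapi implementation: `Segment`, `protect_entities`, `merge` and the `code_type` field are hypothetical names, and leaving the five predefined XML entities in the text is an assumption of this sketch.

```python
import re
from dataclasses import dataclass

# Named entity references such as &abcdef; (character references are ignored here).
ENTITY_REF = re.compile(r"&([A-Za-z_][\w.-]*);")
# Assumption: the predefined XML entities are ordinary text, not placeholders.
PREDEFINED = {"amp", "lt", "gt", "apos", "quot"}


@dataclass
class Segment:
    kind: str            # "text" (translatable) or "code" (protected)
    data: str            # literal text, or the original reference for a code
    code_type: str = ""  # hypothetical code type; "entity" for references


def protect_entities(text: str) -> list:
    """Split text into translatable runs and protected entity codes."""
    segments = []
    pos = 0
    for match in ENTITY_REF.finditer(text):
        if match.group(1) in PREDEFINED:
            continue  # stays inside the surrounding text run
        if match.start() > pos:
            segments.append(Segment("text", text[pos:match.start()]))
        segments.append(Segment("code", match.group(0), "entity"))
        pos = match.end()
    if pos < len(text):
        segments.append(Segment("text", text[pos:]))
    return segments


def merge(segments) -> str:
    """Write the segments back out; codes are emitted exactly as they were read."""
    return "".join(segment.data for segment in segments)
```

With this approach, `test &abcdef; text` yields a text run, one code holding `&abcdef;`, and another text run. Merging writes the reference back unchanged, so the ampersand is never re-escaped and no entity declaration is needed.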
-
Account Deleted Comment 4. originally posted by @ysavourel on 2014-03-07T19:24:02.000Z:
It makes sense to treat it as an inline code. We should have a type for
this added to our global types: "entity".
-
Account Deleted Comment 5. originally posted by @ysavourel on 2014-03-07T19:56:00.000Z:
+1. The XML Filter does that by default.
It has an option to expand the entities instead, but that should be very rarely used.
-
Account Deleted Comment 6. originally posted by rhong... on 2014-03-08T00:48:02.000Z:
Hi, tingley. I agree that processing such entities the way you described would be a good approach.
-
Account Deleted Comment 7. originally posted by @ysavourel on 2014-03-12T20:48:55.000Z:
I fixed this issue:
<?xml version="1.0" ?>
<!DOCTYPE root [
<!ENTITY abcdef "ABCDEF">
]><!ENTITY abcdef "ABCDEF">
<root>
<p>test &abcdef; text</p>
</root>
-
Account Deleted Comment 8. originally posted by @ysavourel on 2014-04-07T23:53:33.000Z:
Jim, can we resolve this?
-
Account Deleted Comment 9. originally posted by @ysavourel on 2014-04-08T01:23:43.000Z:
You mean the second problem of escaping the entity reference? I won't be able to get to this for a while. Maybe in a week or so.
-
Account Deleted Comment 10. originally posted by @ysavourel on 2014-04-08T02:02:59.000Z:
Oh, my mistake. I meant: could we close the bug? I didn't realize there were two separate issues.
-
- changed status to resolved
Comment 1. originally posted by rhong... on 2014-03-06T01:19:41.000Z:
Further, when I pre-process XML files with the okf_xmlstream filter, some characters ("&", "<" and so on) are escaped to their entity names.
When I post-process the translated version, the reverse happens: those entity names (including the ones already present in the original XML files) are replaced by the characters themselves.
For more details about such chars, please refer to: http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references#Predefined_entities_in_XML
I use CodeFinder in the filter:
useCodeFinder: true
codeFinderRules: |-
  #v1
  count.i=1
  rule0=(&[^;]+?;|&|<|>|'|")
But the bug is still there.
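To see what the rule in this comment matches, the pattern can be tried outside Okapi. The following is a minimal sketch using Python's `re` module, on the assumption that Python and Java regular expressions behave the same way for this particular pattern. `find_codes` is a hypothetical helper and the sample strings are made up.

```python
import re

# rule0 from the configuration above, copied verbatim.
RULE0 = re.compile(r"""(&[^;]+?;|&|<|>|'|")""")


def find_codes(text: str) -> list:
    """Return every span that rule0 would mark as an inline code."""
    return [match.group(0) for match in RULE0.finditer(text)]
```

For `test &abcdef; text & <b>` this returns `&abcdef;`, `&`, `<` and `>`, so the rule does find the entity reference. Note that the first alternative runs from any `&` to the next `;`, so in `a & b; c` it matches `& b;` as a single code.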