HTML filter in wellFormed mode drops the end </li> tag when <script> is used
Issue #666
Status: new
When an <li>...</li> element in an HTML snippet file contains a <script> element, the </li> end tag is missing from the merged file.
The following command session demonstrates the bug.
$ cat ul-snippet.html
<ul>
<li><script type="text/x-nonsense">H</script> is the magnetic field intensity</li>
<li><i>J</i>is the conduction current density</li>
<li><span>D</span>is the electric flux density</li>
</ul>
$ mkdir translated
$ tikal.sh -fc okf_html-wellFormed -x ul-snippet.html
-------------------------------------------------------------------------------
Okapi Tikal - Localization Toolset
Version: 2.0.34
-------------------------------------------------------------------------------
Extraction
Source language: en
Target language: fr
Default input encoding: UTF-8
Filter configuration: okf_html-wellFormed
Output: /Users/Kuro/tmp/htmltest/ul-snippet.html.xlf
Input: /Users/Kuro/tmp/htmltest/ul-snippet.html
Done in 0.659s
$ emacs ul-snippet.html.xlf # Change the letter case of the fr translations
$ tikal.sh -fc okf_html-wellFormed -m ul-snippet.html.xlf -sd . -od translated
-------------------------------------------------------------------------------
Okapi Tikal - Localization Toolset
Version: 2.0.34
-------------------------------------------------------------------------------
Merging
Source language: en
Target language: fr
Default input encoding: UTF-8
Output encoding: UTF-8
Filter configuration: okf_html-wellFormed
XLIFF: ul-snippet.html.xlf
Output: translated/ul-snippet.html
Input: /Users/Kuro/tmp/htmltest/ul-snippet.html.xlf
Done in 0.709s
$ diff ul-snippet.html translated/
2,4c2,4
< <li><script type="text/x-nonsense">H</script> is the magnetic field intensity</li>
< <li><i>J</i>is the conduction current density</li>
< <li><span>D</span>is the electric flux density</li>
---
> <li><script type="text/x-nonsense">H</script>IS THE MAGNETIC FIELD INTENSITY
> <li><i>j</i>IS THE CONDUCTION CURRENT DENSITY</li>
> <li><span>d</span>IS THE ELECTRIC FLUX DENSITY</li>
Note that the first changed line of the merged file (the first line after "---" in the diff output) has no closing </li> tag.
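To spot the dropped end tag without reading the diff by eye, one can compare the number of <li> start and end tags on each line of the merged file. This is a minimal sketch, assuming every <li>...</li> pair is expected on a single line as in ul-snippet.html; the function name lines_missing_li_end is hypothetical, and the sample lines are copied from the diff output above.

```python
import re

# Matches an <li> start tag with or without attributes, but not <link> or </li>.
LI_START = re.compile(r"<li(?:\s[^>]*)?>", re.IGNORECASE)
LI_END = re.compile(r"</li\s*>", re.IGNORECASE)


def lines_missing_li_end(lines):
    """Return 1-based numbers of lines that open more <li> elements than they close.

    Assumes each <li>...</li> pair is expected on one line, as in ul-snippet.html.
    """
    flagged = []
    for number, line in enumerate(lines, start=1):
        if len(LI_START.findall(line)) > len(LI_END.findall(line)):
            flagged.append(number)
    return flagged


if __name__ == "__main__":
    # Lines 2-4 of translated/ul-snippet.html, as shown in the diff output.
    merged = [
        '<li><script type="text/x-nonsense">H</script>IS THE MAGNETIC FIELD INTENSITY',
        "<li><i>j</i>IS THE CONDUCTION CURRENT DENSITY</li>",
        "<li><span>d</span>IS THE ELECTRIC FLUX DENSITY</li>",
    ]
    print(lines_missing_li_end(merged))  # prints [1]
```

Run against the whole merged file (for example on open("translated/ul-snippet.html", encoding="utf-8").read().splitlines()), the same check would be expected to flag line 2, matching the diff.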
Further experiments show that:
- The same thing happens when a custom tag with ruleTypes: [EXCLUDE] is added to the HTML filter configuration and that tag pair is used within an <li> element.
- This does not happen (i.e., everything works fine) when -fc okf_html is specified instead of -fc okf_html-wellFormed.
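The custom-tag experiment in the first bullet can be set up by swapping the <script> element for an arbitrary element and declaring that element excluded in the filter configuration. The sketch below writes both files. The tag name x-symbol, the file names, and the YAML layout (an elements: map keyed by tag name) are assumptions for illustration; only the ruleTypes: [EXCLUDE] setting comes from the report. The fragment is meant to be merged into a copy of the okf_html-wellFormed configuration, and how that copy is registered with Tikal is not covered here.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical custom element; any tag name not otherwise defined in the filter config.
CUSTOM_TAG = "x-symbol"

# Same snippet as ul-snippet.html, with the <script> element replaced by the custom tag.
SNIPPET = """<ul>
<li><{tag}>H</{tag}> is the magnetic field intensity</li>
<li><i>J</i>is the conduction current density</li>
<li><span>D</span>is the electric flux density</li>
</ul>
"""

# Assumed YAML layout: an "elements" map keyed by tag name.
# Only "ruleTypes: [EXCLUDE]" itself is taken from the bug report.
CONFIG_FRAGMENT = """elements:
  {tag}:
    ruleTypes: [EXCLUDE]
"""


def write_custom_tag_case(directory: Path, tag: str = CUSTOM_TAG) -> tuple[Path, Path]:
    """Write the test snippet and the config fragment into directory; return both paths."""
    directory.mkdir(parents=True, exist_ok=True)
    snippet_path = directory / "ul-custom-snippet.html"
    config_path = directory / "custom-exclude-fragment.yml"
    snippet_path.write_text(SNIPPET.format(tag=tag), encoding="utf-8")
    config_path.write_text(CONFIG_FRAGMENT.format(tag=tag), encoding="utf-8")
    return snippet_path, config_path


if __name__ == "__main__":
    for path in write_custom_tag_case(Path("custom-tag-case")):
        print(path)
```

Using a tag name that the filter does not otherwise know keeps the EXCLUDE rule as the only behavior under test, which mirrors the reporter's observation that EXCLUDE (but not INLINE) triggers the symptom.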
Comments (2)
reporter: Using the script element in this context is not typical, but the HTML5 specification appears to allow it: the content model of li is flow content, and script is categorized as flow content. Any custom tag with ruleTypes: [EXCLUDE] (but not INLINE) shows the same symptom.