HTML filter in wellFormed mode drops the end </li> tag when <script> is used

Issue #666 new
Kuro Kurosaka created an issue

When a <script> element exists within the <li>...</li> pair in an HTML snippet file, the end tag </li> disappears in the merged file.

Below is a sample command session to demonstrate this bug.

$ cat ul-snippet.html 
<ul>
<li><script type="text/x-nonsense">H</script> is the magnetic field intensity</li>
<li><i>J</i>is the conduction current density</li>
<li><span>D</span>is the electric flux density</li>
</ul>
$ mkdir translated
$ tikal.sh -fc okf_html-wellFormed -x ul-snippet.html 
-------------------------------------------------------------------------------
Okapi Tikal - Localization Toolset
Version: 2.0.34
-------------------------------------------------------------------------------
Extraction
Source language: en
Target language: fr
Default input encoding: UTF-8
Filter configuration: okf_html-wellFormed
Output: /Users/Kuro/tmp/htmltest/ul-snippet.html.xlf
Input: /Users/Kuro/tmp/htmltest/ul-snippet.html
Done in 0.659s
$ emacs ul-snippet.html.xlf # Change cases in fr translations
$ tikal.sh -fc okf_html-wellFormed -m ul-snippet.html.xlf -sd . -od translated
-------------------------------------------------------------------------------
Okapi Tikal - Localization Toolset
Version: 2.0.34
-------------------------------------------------------------------------------
Merging
Source language: en
Target language: fr
Default input encoding: UTF-8
Output encoding: UTF-8
Filter configuration: okf_html-wellFormed
XLIFF: ul-snippet.html.xlf
Output: translated/ul-snippet.html
Input: /Users/Kuro/tmp/htmltest/ul-snippet.html.xlf
Done in 0.709s
$ diff ul-snippet.html translated/
2,4c2,4
< <li><script type="text/x-nonsense">H</script> is the magnetic field intensity</li>
< <li><i>J</i>is the conduction current density</li>
< <li><span>D</span>is the electric flux density</li>
---
> <li><script type="text/x-nonsense">H</script>IS THE MAGNETIC FIELD INTENSITY
> <li><i>j</i>IS THE CONDUCTION CURRENT DENSITY</li>
> <li><span>d</span>IS THE ELECTRIC FLUX DENSITY</li>

Note that the closing </li> is missing from the first merged line (the first line marked ">" in the diff output above).
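
To spot this kind of regression without reading a diff by eye, a small check along the following lines could be run on the merged file. This is only a sketch and is not part of Okapi or Tikal: the helper names `TagCounter` and `unbalanced_lines` are hypothetical, and it assumes that each <li> element opens and closes on the same line, as in the sample snippet above.

```python
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Counts start and end tags for one element name (hypothetical helper)."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.opened = 0
        self.closed = 0

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.opened += 1

    def handle_endtag(self, tag):
        if tag == self.tag:
            self.closed += 1


def unbalanced_lines(html_text, tag="li"):
    """Return 1-based numbers of lines whose <tag> and </tag> counts differ.

    Assumption: each element starts and ends on a single line.
    """
    result = []
    for number, line in enumerate(html_text.splitlines(), start=1):
        counter = TagCounter(tag)
        counter.feed(line)
        counter.close()
        if counter.opened != counter.closed:
            result.append(number)
    return result


# Sample data copied from the command session in this report.
ORIGINAL = """<ul>
<li><script type="text/x-nonsense">H</script> is the magnetic field intensity</li>
<li><i>J</i>is the conduction current density</li>
<li><span>D</span>is the electric flux density</li>
</ul>"""

MERGED = """<ul>
<li><script type="text/x-nonsense">H</script>IS THE MAGNETIC FIELD INTENSITY
<li><i>j</i>IS THE CONDUCTION CURRENT DENSITY</li>
<li><span>d</span>IS THE ELECTRIC FLUX DENSITY</li>
</ul>"""

if __name__ == "__main__":
    print(unbalanced_lines(ORIGINAL))  # prints []
    print(unbalanced_lines(MERGED))    # prints [2]
```

For the sample above, only line 2 of the merged output is reported, which matches the line where </li> was dropped.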

Further experiments show that:

  • The same thing happens when a custom tag with ruleTypes: [EXCLUDE] is added to the HTML filter configuration, and that tag pair is used within an <li> element.
  • This does not happen (i.e. everything works fine) when -fc okf_html is specified instead of -fc okf_html-wellFormed.

Comments (2)

  1. Kuro Kurosaka reporter

    Use of the script element in this context is not typical, but the HTML 5 spec seems to allow it, as the script element is categorized as flow content. Any custom tag with ruleTypes: [EXCLUDE] (but not INLINE) shows the same symptom.
