Using the well-formed HTML filter on a non-well-formed document silently misbehaves
Issue #668
new
When the filter okf_html-wellFormed is used to process HTML that is not well-formed, no warnings or errors are shown, and the resulting merged document has some elements out of order. For example:
$ tikal.sh -fc okf_html-wellFormed -x mini-not-well-formed-2.html
$ tikal.sh -fc okf_html-wellFormed -m mini-not-well-formed-2.html.xlf -sd . -od translated
$ sdiff mini-not-well-formed-2.html translated/
<html> <html>
<body border="2"> <body border="2">
<p>This table is lacking some closing tags.</p> <p>This table is lacking some closing tags.</p>
<table> <table>
<tr> <tr>
<th>OS</th> <th>OS</th>
<th>Dir Seprator</th> <th>Dir Seprator</th>
<th>EOL</th> <th>EOL</th>
</tr> </tr>
<tr> <tr>
<td>Windows<td>\<td>CR+LF | </tr></table><td>Windows<td>\ <tr>
</tr> <
<tr> <
<td>Unix&Linux</td><td>/</td><td>LF</td> <td>Unix&Linux</td><td>/</td><td>LF</td>
</tr> / </tr><td>CR+LF<tr><table>
</table> <
In the merged document, the </table> end tag is out of order, and the document ends with two open tags, <tr> and <table>.
While the okf_html-wellFormed filter is not expected to handle a non-well-formed HTML document properly, it should at least warn the user that the input document isn't well-formed.
With another non-well-formed HTML document, tikal -x generates an incomplete .xlf file, missing the </xliff> end tag, again without a warning. I will attach a sample file the next time I encounter this situation.
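To illustrate the kind of check that could drive such a warning, here is a minimal sketch in Python. It is not Okapi code and does not reflect how the filter is implemented. It assumes XML-style strictness: every start tag except a void element must have a matching end tag. The names WellFormedChecker and find_problems are hypothetical.

```python
from html.parser import HTMLParser

# Elements that never take an end tag in HTML.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class WellFormedChecker(HTMLParser):
    """Hypothetical helper: tracks open tags and records mismatches."""

    def __init__(self):
        super().__init__()
        self.open_tags = []   # stack of start tags still waiting for an end tag
        self.problems = []    # human-readable descriptions of each mismatch

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            self.open_tags.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Self-closing syntax such as <br/> opens and closes nothing.
        pass

    def handle_endtag(self, tag):
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()
            return
        line, _ = self.getpos()
        if tag in self.open_tags:
            self.problems.append(
                f"line {line}: </{tag}> found while <{self.open_tags[-1]}> is still open")
            # Recover by discarding everything opened since the matching start tag.
            while self.open_tags.pop() != tag:
                pass
        else:
            self.problems.append(f"line {line}: </{tag}> has no matching start tag")


def find_problems(html_text):
    """Return a list of well-formedness problems; an empty list means none found."""
    checker = WellFormedChecker()
    checker.feed(html_text)
    checker.close()
    problems = list(checker.problems)
    problems.extend(f"<{tag}> is never closed" for tag in checker.open_tags)
    return problems
```

Running find_problems on the contents of mini-not-well-formed-2.html should report the </tr> that arrives while a <td> is still open. The sketch does not check other well-formedness rules, such as the unescaped & in "Unix&Linux".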
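As a stopgap, a truncated .xlf file can be detected after extraction by checking that it parses as XML. The following is a minimal Python sketch using the standard library; the function name xliff_parse_error is hypothetical and nothing here is part of tikal.

```python
import xml.etree.ElementTree as ET


def xliff_parse_error(path):
    """Return None if the file parses as XML, else the parser's error message."""
    try:
        ET.parse(path)
    except ET.ParseError as exc:
        # A missing </xliff> end tag surfaces here as a parse error.
        return str(exc)
    return None
```

For example, xliff_parse_error("mini-not-well-formed-2.html.xlf") returns None when the file is complete and an error message when the closing tag is missing.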