Using the well-formed HTML filter on a non-well-formed document silently misbehaves

Issue #668 new
Kuro Kurosaka created an issue

When the filter ID okf_html-wellFormed is used to process HTML that is not well-formed, no warnings or errors are shown, and some elements in the resulting merged document are out of order. For example:

$ tikal.sh -fc okf_html-wellFormed -x mini-not-well-formed-2.html
$ tikal.sh -fc okf_html-wellFormed -m mini-not-well-formed-2.html.xlf -sd . -od translated
$ sdiff mini-not-well-formed-2.html translated/
<html>                              <html>
 <body border="2">                       <body border="2">
  <p>This table is lacking some closing tags.</p>         <p>This table is lacking some closing tags.</p>
  <table>                             <table>
    <tr>                                <tr>
      <th>OS</th>                             <th>OS</th>
      <th>Dir Seprator</th>                           <th>Dir Seprator</th> 
      <th>EOL</th>                            <th>EOL</th>
    </tr>                               </tr>
    <tr>                                <tr>
      <td>Windows<td>\<td>CR+LF                   |       </tr></table><td>Windows<td>\ <tr>
    </tr>                             <
    <tr>                              <
      <td>Unix&amp;Linux</td><td>/</td><td>LF</td>            <td>Unix&amp;Linux</td><td>/</td><td>LF</td>
    </tr>                             /     </tr><td>CR+LF<tr><table>
  </table>                            <

In the merged document, </table> is out of order, and the document ends with two unclosed tags, <tr> and <table>.

While the okf_html-wellFormed filter is not expected to process an HTML document correctly when it is not well-formed, it should at least emit a message warning the user that the input document is not well-formed.
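
To illustrate the kind of warning meant here, below is a minimal sketch of a standalone pre-check that could be run before tikal. It is not part of Okapi and does not use any Okapi API; the names WellFormedChecker and check_well_formed are made up for this example. It uses Python's standard html.parser and assumes that "well-formed" means every non-void element has an explicit, properly nested end tag.

```python
from html.parser import HTMLParser

# HTML void elements never take an end tag, so they are not tracked.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "param", "source", "track", "wbr"}


class WellFormedChecker(HTMLParser):
    """Hypothetical helper: collects tag-balance problems while parsing."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of (tag, line) for elements still open
        self.problems = []   # human-readable warnings

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            self.open_tags.append((tag, self.getpos()[0]))

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS:
            return
        line = self.getpos()[0]
        if tag not in [name for name, _ in self.open_tags]:
            self.problems.append(
                f"line {line}: </{tag}> has no matching start tag")
            return
        # Pop until the matching start tag, reporting anything left unclosed.
        while self.open_tags:
            name, opened_at = self.open_tags.pop()
            if name == tag:
                break
            self.problems.append(
                f"line {line}: <{name}> opened on line {opened_at} "
                f"is not closed before </{tag}>")


def check_well_formed(html_text):
    """Return a list of warnings; an empty list means no problem was found."""
    checker = WellFormedChecker()
    checker.feed(html_text)
    checker.close()
    problems = list(checker.problems)
    for name, opened_at in checker.open_tags:
        problems.append(f"<{name}> opened on line {opened_at} is never closed")
    return problems
```

Calling check_well_formed() on the text of mini-not-well-formed-2.html and printing each returned string would flag the unclosed <td> elements in the Windows row. A check along these lines inside the filter itself would be preferable, but that depends on how the filter's parser exposes such events, which this sketch does not attempt to model.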

With other HTML documents that are not well-formed, tikal -x generates an incomplete .xlf file that is missing the </xliff> end tag, again without a warning. I will attach a sample file the next time I encounter this situation.
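
Until a sample is available, a truncated .xlf file can at least be detected after extraction. The following is a small sketch using Python's standard xml.etree.ElementTree; the function name xliff_problem is invented for this example, and the check only confirms that the file parses as XML, not that it is valid XLIFF.

```python
import xml.etree.ElementTree as ET


def xliff_problem(xliff_data):
    """Return None if the data parses as XML, else a one-line description."""
    try:
        ET.fromstring(xliff_data)
    except ET.ParseError as err:
        # A missing </xliff> end tag surfaces here as a parse error.
        return f"not well-formed XML: {err}"
    return None
```

For example, passing the bytes of mini-not-well-formed-2.html.xlf to xliff_problem() and treating any non-None result as a failure would catch a missing </xliff> end tag before the merge step is attempted.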
