Incorrect parsing of XHTML

runa created an issue

I created a test resource for the attached XHTML file on Transifex. Transifex appears to be able to parse the file, but the strings appear in a somewhat random order and some of them are also broken up (Transifex splits the contents of <p></p> tags for some reason).

The main problem, however, occurs when you click on English and choose "Download for use". The file you get is an incorrectly formatted XHTML file. The resulting file seems to break at the point where <code></code> tags are used, and you will only be able to view the first half of the document in a browser.

Any idea what's going on?
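
One way to narrow down where the downloaded file breaks is to run it through a strict XML parser, since XHTML has to be well-formed XML. The following is a minimal sketch using Python's standard library; the function name and the sample strings are hypothetical and are not taken from the attached file. Note that ElementTree does not load external DTDs, so a full document that uses named entities such as &nbsp; may be reported as an error even when browsers accept it.

```python
import xml.etree.ElementTree as ET


def first_xml_error(text):
    """Return (line, column, message) for the first well-formedness error, or None."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        line, column = err.position
        return line, column, str(err)
    return None


# Hypothetical samples: in the first one the <code> element is never closed.
broken = "<body><p>Run <code>tor</p></body>"
well_formed = "<body><p>Run <code>tor</code></p></body>"

print(first_xml_error(broken))       # a tuple describing the mismatched tag
print(first_xml_error(well_formed))  # None
```

Running the same check on the source file and on the "Download for use" file would show whether the break is introduced by the export.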

Comments (6)

  1. Apostolis Bessas

    Hi,

    I just took a look and the attached file is a whole document. Transifex currently only supports fragments of XHTML (for instance, the part between the <body> tags). This will change soon. (This is not the case with HTML; you could try uploading the file as plain HTML.) We are also already working on an improved, more robust parser (it is almost ready). So, depending on how urgent this is, I would suggest waiting a bit.

  2. runa

    XHTML, with image: uploaded the file with only <body></body> and everything in between. Transifex parses the document correctly, but the very long data:img source line breaks the web interface.

    XHTML, replace image with comment: uploaded the file with only <body></body> and everything in between, replacing the data:img source with a simple comment. Transifex parses the document correctly, but the for_use file is broken and does not render in a browser. The comment is not included.

    HTML, with image: Transifex parses the document correctly, but the very long data:img source line breaks the web interface.

    HTML, replace image with comment: replaced the data:img source with a simple comment. Transifex parses the document correctly, but the for_use file is broken. The document does render in a browser, but the comment is not included.

    The short user manual is a file we send to Tor users via email. Some email providers block HTML attachments, which is why we have to use XHTML instead. Any thoughts on why the final document seems to break on <code></code> tags, and is this something the improved parser will fix?

  3. Apostolis Bessas

    Hi,

    I must have missed your reply. Anyway, the new parser is now live (there are just a couple of bug fixes in the queue that will be deployed shortly). The issue with lotte is still there; I will ping the right guy to fix this. Other than that, I uploaded the file and it seemed to work. Everything that is not handled correctly now is a bug.

    Regards, Apostolis

  4. Anonymous

    A long data:img source line still breaks the web interface. Is this being tracked in another bug report? If not, please reopen this issue.

  5. runa
    • changed status to open

    HTML, replace image with comment: replaced the data:img source with a simple comment. Transifex parses the document correctly, but the for_use file is broken. The document does render in a browser, but the comment is not included.

    This is still an issue.
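
The upload preparation described in comment 2 (keep only <body></body> and swap the data:img source for a simple comment) could be scripted roughly as follows. This is a sketch under assumptions: the function name and placeholder text are hypothetical, the input is assumed to be well-formed XHTML in the standard XHTML namespace without undeclared named entities, and it covers only the preparation of the file. It does not change how Transifex treats the comment in the for_use file.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Serialize XHTML elements without an "ns0:" prefix.
ET.register_namespace("", XHTML_NS)


def body_fragment_without_data_images(xhtml_text):
    """Return the <body> element as text, with inline data: images replaced by comments."""
    root = ET.fromstring(xhtml_text)
    body = root.find(f"{{{XHTML_NS}}}body")
    if body is None:
        raise ValueError("no XHTML <body> element found")
    # ElementTree has no parent pointers, so visit each element and inspect its children.
    for parent in list(body.iter()):
        for index, child in enumerate(list(parent)):
            is_img = child.tag == f"{{{XHTML_NS}}}img"
            if is_img and child.get("src", "").startswith("data:"):
                placeholder = ET.Comment(" inline image removed ")
                placeholder.tail = child.tail  # keep the text that followed the image
                parent[index] = placeholder
    return ET.tostring(body, encoding="unicode")


# Hypothetical sample document, not the attached manual.
sample = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><head><title>t</title></head>'
    '<body><p>Logo: <img src="data:image/png;base64,AAAA" alt="logo"/> done</p></body></html>'
)
print(body_fragment_without_data_images(sample))
```

Images that reference ordinary URLs are left untouched; only src values starting with data: are replaced.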
