Source

html2textile / runtests.py

#!/usr/bin/env python
# -*- coding: utf-8 -*-

r"""
Test the correct handling of whitespace around Textile mark-up:

>>> convert('Some <em>emphasised</em> text.')
Some _emphasised_ text.

>>> convert('Some<em> emphasised</em> text.')
Some _emphasised_ text.

>>> convert('Some <em>emphasised </em>text.')
Some _emphasised_ text.

>>> convert('Some<em>emphasised</em>text.')
Some _emphasised_ text.

>>> convert('Empty <strong></strong> tags.')
Empty  tags.

>>> convert('Whitespace <strong>   </strong> tags.')
Whitespace  tags.


Test that the spacing around HTML entities is not affected.
>>> convert('bangers &amp; mash')
bangers & mash

Test that unrecognised tags are not interfering with the output
>>> convert('Free text.<p>Paragraph text.</p>')
Free text.
<BLANKLINE>
Paragraph text.
>>> convert('<div>Free text.<p>Paragraph text.</p></div>')
Free text.
<BLANKLINE>
Paragraph text.

Test that attributes from unknown tags are being discarded.
>>> convert('<address id="some_id" class="some_class">Some text.</address>')
Some text.


Test that images are handled correctly:

>>> convert('<img src="image.jpg" />')
!image.jpg!

>>> convert('<img src="image.jpg" class="image" style="width:40px" />')
!(image){width:40px}image.jpg!

>>> convert('<img src="image.jpg" alt="An image." />')
!image.jpg(An image.)!

>>> convert('<img src="image.jpg" title="An image." />')
!image.jpg(An image.)!

Closing parentheses in alternative text need to be encoded as HTML entities,
so that they're not interpreted as terminating parentheses.
>>> convert('<img src="smiley.jpg" alt=":)" />')
!smiley.jpg(:&#41;)!

>>> convert('<img src="http://example.com/spaced(!) link.jpg" />')
!http://example.com/spaced%28%21)%20link.jpg!


Test paragraphs:

Test paragraph attributes.
>>> convert('<p class="first">Paragraph.</p><p class="second">Paragraph.</p>')
p(first). Paragraph.
<BLANKLINE>
p(second). Paragraph.

Test sane handling of paragraphs enclosed in links.
>>> convert('<a href="http://example.com"><p><em>Example.</em></p></a>')
"_Example._":http://example.com

Test <p /> tags.
>>> convert('Paragraph 1.<p/>Paragraph 2.')
Paragraph 1.
<BLANKLINE>
Paragraph 2.

>>> convert('<p><p/>Some text. <p/> Indented text. </p>')
Some text.
<BLANKLINE>
Indented text.

Textile doesn't support nested paragraph, only use the innermost one.
>>> convert('<p class="container"></p><p class="text">Text.</p>')
p(text). Text.

A more extensive test of nested paragraphs.
>>> convert('<div class="body"> <div class="body"> <h4>Heading.</h4> <p>Paragraph.<p></div></div>')
h4. Heading.
<BLANKLINE>
Paragraph.

And blockquotes.
>>> convert('<blockquote><p>Paragraph text.</p><blockquote>Quoted text.</blockquote></blockquote>')
Paragraph text.
<BLANKLINE>
bq. Quoted text.


Test line breaks:

>>> convert('Two<br />lines.')
Two
lines.

>>> convert('<strong><br />Text.</strong>')
*Text.*

>>> convert('<strong><em>Two<br/>lines.</em></strong>')
*_Two_*
*_lines._*

>>> convert('<em>Emphasised text.<br/> Indented emphasised text.</em>')
_Emphasised text._
_Indented emphasised text._

>>> convert('<em>Emphasised text.<br /></em><br />More text.')
_Emphasised text._
<BLANKLINE>
More text.


Ensure sole images aren't unnecessarily indented.
>>> convert('<div>Text.</div><div><img src="image.jpg" /></div>')
Text.
<BLANKLINE>
!image.jpg!


Test moving of simple tags inside links:

>>> convert('<strong><a href="http://example.com">Link.</a></strong>')
"*Link.*":http://example.com

>>> convert('<em><strong><a href="http://example.com">Link.</a></strong></em>')
"_*Link.*_":http://example.com

Test that tag attributes do not interfere with the process of moving them
inside. Unfortunately, we're not preserving the attributes.
>>> convert('<p><span style="color: navy"><span><a href="example.com">Link</a></span></span></p>')
"%Link%":example.com


Test classes and IDs.
>>> convert('<p class="some_class" id="some_id">Some text.</p>')
p(some_class #some_id). Some text.

Test nested identical tags.
>>> convert('<span><span>Text.</span></span>')
%Text.%

Textile doesn't support this either.
>>> convert('<span>Some text <span>inside</span> a span.</span>')
%Some text inside a span.%


Paragraphs inside list items cannot be represented in Textile
>>> convert('<ol><li><p>Paragraph.</p></li></ol>')
# Paragraph.

Test a complicated shambles of nested lists.
>>> convert('<ol><li><ol><li><ol type="i"><li>Three.</li></ol></li></ol></li></ol>')
# 
## 
### Three.


KNOWN ISSUES:

#>>> convert('<p>---<em><br />*Note.</em>')
---
_*Note.*

Ending simple tags after links.
#>>> convert('<em>Click <a href="http://www.example.com">here.</a></em>')
_Click ["here":http://example.com]_

Spaceless mark-up.
#>>> convert('L<sup>A</sup>T<sub>E</sub>X')
L[^A^]T[~E~]X

More than one paragraph inside a link (invalid HTML).
#>>> convert('<a href="http://example.com"><p class="first">1</p><p class="second">2</p></a>')


"""

if __name__ == "__main__":
    import doctest
    from html2textile import html2textile
    
    def convert(text):
        print html2textile(text)
    
    doctest.testmod(verbose=True)