html2textile /

#!/usr/bin/env python
# -*- coding: utf-8 -*-

Test the correct handling of whitespace around Textile mark-up:

>>> convert('Some <em>emphasised</em> text.')
Some _emphasised_ text.

>>> convert('Some<em> emphasised</em> text.')
Some _emphasised_ text.

>>> convert('Some <em>emphasised </em>text.')
Some _emphasised_ text.

>>> convert('Some<em>emphasised</em>text.')
Some _emphasised_ text.

>>> convert('Empty <strong></strong> tags.')
Empty  tags.

>>> convert('Whitespace <strong>   </strong> tags.')
Whitespace  tags.

Test that the spacing around HTML entities is not affected.
>>> convert('bangers &amp; mash')
bangers & mash

Test that unrecognised tags are not interfering with the output
>>> convert('Free text.<p>Paragraph text.</p>')
Free text.
Paragraph text.
>>> convert('<div>Free text.<p>Paragraph text.</p></div>')
Free text.
Paragraph text.

Test that attributes from unknown tags are being discarded.
>>> convert('<address id="some_id" class="some_class">Some text.</address>')
Some text.

Test that images are handled correctly:

>>> convert('<img src="image.jpg" />')

>>> convert('<img src="image.jpg" class="image" style="width:40px" />')

>>> convert('<img src="image.jpg" alt="An image." />')
!image.jpg(An image.)!

>>> convert('<img src="image.jpg" title="An image." />')
!image.jpg(An image.)!

Closing parentheses in alternative text need to be encoded as HTML entities,
so that they're not interpreted as terminating parentheses.
>>> convert('<img src="smiley.jpg" alt=":)" />')

>>> convert('<img src="!) link.jpg" />')

Test paragraphs:

Test paragraph attributes.
>>> convert('<p class="first">Paragraph.</p><p class="second">Paragraph.</p>')
p(first). Paragraph.
p(second). Paragraph.

Test sane handling of paragraphs enclosed in links.
>>> convert('<a href=""><p><em>Example.</em></p></a>')

Test <p /> tags.
>>> convert('Paragraph 1.<p/>Paragraph 2.')
Paragraph 1.
Paragraph 2.

>>> convert('<p><p/>Some text. <p/> Indented text. </p>')
Some text.
Indented text.

Textile doesn't support nested paragraph, only use the innermost one.
>>> convert('<p class="container"></p><p class="text">Text.</p>')
p(text). Text.

A more extensive test of nested paragraphs.
>>> convert('<div class="body"> <div class="body"> <h4>Heading.</h4> <p>Paragraph.<p></div></div>')
h4. Heading.

And blockquotes.
>>> convert('<blockquote><p>Paragraph text.</p><blockquote>Quoted text.</blockquote></blockquote>')
Paragraph text.
bq. Quoted text.

Test line breaks:

>>> convert('Two<br />lines.')

>>> convert('<strong><br />Text.</strong>')

>>> convert('<strong><em>Two<br/>lines.</em></strong>')

>>> convert('<em>Emphasised text.<br/> Indented emphasised text.</em>')
_Emphasised text._
_Indented emphasised text._

>>> convert('<em>Emphasised text.<br /></em><br />More text.')
_Emphasised text._
More text.

Ensure sole images aren't unnecessarily indented.
>>> convert('<div>Text.</div><div><img src="image.jpg" /></div>')

Test moving of simple tags inside links:

>>> convert('<strong><a href="">Link.</a></strong>')

>>> convert('<em><strong><a href="">Link.</a></strong></em>')

Test that tag attributes do not interfere with the process of moving them
inside. Unfortunately, we're not preserving the attributes.
>>> convert('<p><span style="color: navy"><span><a href="">Link</a></span></span></p>')

Test classes and IDs.
>>> convert('<p class="some_class" id="some_id">Some text.</p>')
p(some_class #some_id). Some text.

Test nested identical tags.
>>> convert('<span><span>Text.</span></span>')

Textile doesn't support this either.
>>> convert('<span>Some text <span>inside</span> a span.</span>')
%Some text inside a span.%

Paragraphs inside list items cannot be represented in Textile
>>> convert('<ol><li><p>Paragraph.</p></li></ol>')
# Paragraph.

Test a complicated shambles of nested lists.
>>> convert('<ol><li><ol><li><ol type="i"><li>Three.</li></ol></li></ol></li></ol>')
### Three.


#>>> convert('<p>---<em><br />*Note.</em>')

Ending simple tags after links.
#>>> convert('<em>Click <a href="">here.</a></em>')
_Click ["here":]_

Spaceless mark-up.
#>>> convert('L<sup>A</sup>T<sub>E</sub>X')

More than one paragraph inside a link (invalid HTML).
#>>> convert('<a href=""><p class="first">1</p><p class="second">2</p></a>')


if __name__ == "__main__":
    import doctest
    from html2textile import html2textile
    def convert(text):
        print html2textile(text)