Commits

Waylan Limberg committed 9aca193

Reorganized docs. Added AUTHORS and INSTALL files. INSTALL is incomplete.

Files changed (14)

CHANGE_LOG.txt

-PYTHON MARKDOWN CHANGELOG
-=========================
-
-August 2008: Updated included extensions to ElementTree. Added a
-separate command-line script. (v2.0-alpha)
-
-July 2008: Switched from home-grown NanoDOM to ElementTree and
-various related bugs (thanks Artem Yunusov).
-
-June 2008: Fixed issues with nested inline patterns and cleaned 
-up testing framework (thanks Artem Yunusov).
-
-May 2008: Added a number of additional extensions to the
-distribution and other minor changes. Moved repo to git from svn.
-
-Mar 2008: Refactored extension api to accept either an 
-extension name (as a string) or an instance of an extension
-(Thanks David Wolever). Fixed various bugs and added doc strings.
-
-Feb 2008: Various bugfixes mostly regarding extensions.
-
-Feb 18, 2008: Version 1.7.
-
-Feb 13, 2008: A little code cleanup and better documentation
-and inheritance for pre/post processors.
-
-Feb 9, 2008: Doublequotes no longer html escaped and rawhtml
-honors <?foo>, <@foo>, and <%foo> for those who run markdown on
-template syntax.
-
-Dec 12, 2007: Updated docs. Removed encoding arg from Markdown
-and markdown as per list discussion. Clean up in prep for 1.7.
-
-Nov 29, 2007: Added support for images inside links. Also fixed
-a few bugs in the footnote extension.
-
-Nov 19, 2007: `message` now uses python's logging module. Also removed 
-limit imposed by recursion in _process_section(). You can now parse as 
-long of a document as your memory can handle.
-
-Nov 5, 2007: Moved safe_mode code to a textPostprocessor and added 
-escaping option.
-
-Nov 3, 2007: Fixed convert method to accept empty strings.
-
-Oct 30, 2007: Fixed BOM removal (thanks Malcolm Tredinnick). Fixed 
-infinite loop in bracket regex for inline links.
-
-Oct 11, 2007: LineBreaks is now an inlinePattern. Fixed HR in 
-blockquotes. Refactored _processSection method (see tracker #1793419).
-
-Oct 9, 2007: Added textPreprocessor (from 1.6b).
-
-Oct 8, 2008: Fixed Lazy Blockquote. Fixed code block on first line. 
-Fixed empty inline image link.
-
-Oct 7, 2007: Limit recursion on inlinePatterns. Added a 'safe' tag 
-to htmlStash.
-
-March 18, 2007: Fixed or merged a bunch of minor bugs, including
-multi-line comments and markup inside links. (Tracker #s: 1683066,
-1671153, 1661751, 1627935, 1544371, 1458139.) -> v. 1.6b
-
-Oct 10, 2006: Fixed a bug that caused some text to be lost after
-comments.  Added "safe mode" (user's html tags are removed).
-
-Sept 6, 2006: Added exception for PHP tags when handling html blocks.
-
-August 7, 2006: Incorporated Sergej Chodarev's patch to fix a problem
-with ampersand normalization and html blocks.
-
-July 10, 2006: Switched to using optparse.  Added proper support for
-unicode.
-
-July 9, 2006: Fixed the <!--@address.com> problem (Tracker #1501354).  
-
-May 18, 2006: Stopped catching unquoted titles in reference links.
-Stopped creating blank headers.
-
-May 15, 2006: A bug with lists, recursion on block-level elements,
-run-in headers, spaces before headers, unicode input (thanks to Aaron
-Swartz). Sourceforge tracker #s: 1489313, 1489312, 1489311, 1488370,
-1485178, 1485176. (v. 1.5)
-
-Mar. 24, 2006: Switched to a not-so-recursive algorithm with
-_handleInline.  (Version 1.4)
-
-Mar. 15, 2006: Replaced some instance variables with class variables
-(a patch from Stelios Xanthakis).  Chris Clark's new regexps that do
-not trigger midword underlining.
-
-Feb. 28, 2006: Clean-up and command-line handling by Stewart
-Midwinter. (Version 1.3)
-
-Feb. 24, 2006: Fixed a bug with the last line of the list appearing
-again as a separate paragraph.  Incorporated Chris Clark's "mailto"
-patch.  Added support for <br /> at the end of lines ending in two or
-more spaces.  Fixed a crashing bug when using ImageReferencePattern.
-Added several utility methods to Nanodom.  (Version 1.2)
-
-Jan. 31, 2006: Added "hr" and "hr/" to BLOCK_LEVEL_ELEMENTS and
-changed <hr/> to <hr />.  (Thanks to Sergej Chodarev.)
-
-Nov. 26, 2005: Fixed a bug with certain tabbed lines inside lists
-getting wrapped in <pre><code>.  (v. 1.1)
-
-Nov. 19, 2005: Made "<!...", "<?...", etc. behave like block-level
-HTML tags.
-
-Nov. 14, 2005: Added entity code and email autolink fix by Tiago
-Cogumbreiro.  Fixed some small issues with backticks to get 100%
-compliance with John's test suite.  (v. 1.0)
-
-Nov. 7, 2005: Added an unlink method for documents to aid with memory
-collection (per Doug Sauder's suggestion).
-
-Oct. 29, 2005: Restricted a set of html tags that get treated as
-block-level elements.
-
-Sept. 18, 2005: Refactored the whole script to make it easier to
-customize it and made footnote functionality into an extension.
-(v. 0.9)
-
-Sept. 5, 2005: Fixed a bug with multi-paragraph footnotes.  Added
-attribute support.
-
-Sept. 1, 2005: Changed the way headers are handled to allow inline
-syntax in headers (e.g. links) and got the lists to use p-tags
-correctly (v. 0.8)
-
-Aug. 29, 2005: Added flexible tabs, fixed a few small issues, added
-basic support for footnotes.  Got rid of xml.dom.minidom and added
-pretty-printing. (v. 0.7)
-
-Aug. 13, 2005: Fixed a number of small bugs in order to conform to the
-test suite.  (v. 0.6)
-
-Aug. 11, 2005: Added support for inline html and entities, inline
-images, autolinks, underscore emphasis. Cleaned up and refactored the
-code, added some more comments.
-
-Feb. 19, 2005: Rewrote the handling of high-level elements to allow
-multi-line list items and all sorts of nesting.
-
-Feb. 3, 2005: Reference-style links, single-line lists, backticks,
-escape, emphasis in the beginning of the paragraph.
-
-Nov. 2004: Added links, blockquotes, html blocks to Manfred
-Stienstra's code
-
-Apr. 2004: Manfred's version at http://www.dwerg.net/projects/markdown/
-

MANIFEST

-README.txt
-README.html
-CHANGE_LOG.txt
-markdown.py
-mdx_codehilite.py
-mdx_fenced_code.py
-mdx_footnotes.py
-mdx_headerid.py
-mdx_imagelinks.py
-mdx_meta.py
-mdx_rss.py
-mdx_tables.py
-mdx_wikilink.py
-setup.py
-scripts/pymarkdown.py

README.html

-<h1><a href="http://freewisdom.org/projects/python-markdown">Python-Markdown</a></h1>
-<p>This is a Python implementation of John Gruber's <a href="http://daringfireball.net/projects/markdown/">Markdown</a>. 
-   It is almost completely compliant with the reference implementation,
-   though there are a few known issues. See <a href="http://www.freewisdom.org/projects/python-markdown/Features">Features</a> for information 
-   on what exactly is supported and what is not. Additional features are 
-   supported by the <a href="http://www.freewisdom.org/projects/python-markdown/Available_Extensions">Available Extensions</a>.
-</p>
-
-<h2>Installation</h2>
-<p>To install Python Markdown <a href="http://sourceforge.net/project/showfiles.php?group_id=153041">download</a> the zip file and extract the 
-   files.  If you want to install markdown as a module into your python 
-   tree, run <code>sudo python setup.py install</code> from a directory where you 
-   unzip the files.
-</p>
-
-<h2>Command Line Usage</h2>
-<p>To use markdown.py from the command line, run it as 
-</p>
-<pre><code>python markdown.py &lt;input_file&gt;
-</code></pre><p>or 
-</p>
-<pre><code>python markdown.py &lt;input_file&gt; &gt; &lt;output_file&gt;
-</code></pre><p>For more details, use the <code>-h</code> or <code>--help</code> options from the command line 
-   or read the <a href="http://www.freewisdom.org/projects/python-markdown/Command_Line">Command Line Docs</a> available online.
-</p>
-
-<h2>Using as a Python Module</h2>
-<p>To use markdown as a module:
-</p>
-<pre><code>import markdown
-html = markdown.markdown(your_text_string)
-</code></pre><p>For more details see the <a href="http://www.freewisdom.org/projects/python-markdown/Using_as_a_Module">Module Docs</a>.
-</p>
-
-<h2>Support</h2>
-<p>You may ask for help and discuss various other issues on the <a href="http://lists.sourceforge.net/lists/listinfo/python-markdown-discuss">mailing list</a> and report bugs on the <a href="http://sourceforge.net/tracker/?func=add&amp;group_id=153041&amp;atid=790198">bug tracker</a>.
-</p>
-
-<h2>Credits</h2>
-<ul>
- <li>
-     Most of the code currently in the module was written by <a href="http://www.freewisdom.org">Yuri Takhteyev</a>
-  while procrastinating from his Ph.D.
- </li>
-
- <li>
-     The original version of this script was written by <a href="http://www.dwerg.net/">Manfred Stienstra</a>,
-  who is responsible for about a quarter of the code.
- </li>
-
- <li>
-     Many recent bugs are being fixed by <a href="http://achinghead.com/">Waylan Limberg</a>.
- </li>
-</ul>
-<p>Other contributions:
-</p>
-<ul>
- <li>
-     Daniel Krech provided the setup.py script.
- </li>
-
- <li>
-     G. Clark Haynes submitted a patch for indented lists.
- </li>
-
- <li>
-     Tiago Cogumbreiro submitted an email autolink fix.
- </li>
-
- <li>
-     Sergej Chodarev submitted a patch for treatment of <code>&lt;hr/&gt;</code> tags.
- </li>
-
- <li>
-     Chris Clark submitted a patch to handle <code>&lt;mailto:...&gt;</code> syntax and a reg ex 
-  for &quot;smart&quot; emphasis (ignoring underscores within a word).
- </li>
-
- <li>
-     Steward Midwinter wrote command-line parser and cleaned up comments.
- </li>
-
- <li>
-     Many other people helped by reporting bugs.
- </li>
-</ul>
-
-<h2>License</h2>
-<p>The code is dual-licensed under <a href="http://www.gnu.org/copyleft/gpl.html)">GPL</a> and <a href="http://www.opensource.org/licenses/bsd-license.php">BSD License</a>.  Other
-   licensing arrangements can be discussed.
-</p>

README.txt

-[Python-Markdown][]
-===================
-
-This is a Python implementation of John Gruber's [Markdown][]. 
-It is almost completely compliant with the reference implementation,
-though there are a few known issues. See [Features][] for information 
-on what exactly is supported and what is not. Additional features are 
-supported by the [Available Extensions][].
-
-[Python-Markdown]: http://freewisdom.org/projects/python-markdown
-[Markdown]: http://daringfireball.net/projects/markdown/
-[Features]: http://www.freewisdom.org/projects/python-markdown/Features
-[Available Extensions]: http://www.freewisdom.org/projects/python-markdown/Available_Extensions
-
-
-Installation
-------------
-
-To install Python Markdown [download][] the zip file and extract the 
-files.  If you want to install markdown as a module into your python 
-tree, run `sudo python setup.py install` from a directory where you 
-unzip the files.
-
-[download]: http://sourceforge.net/project/showfiles.php?group_id=153041
-
-
-Command Line Usage
-------------------
-
-To use markdown.py from the command line, run it as 
-
-    python markdown.py <input_file>
-
-or 
-
-    python markdown.py <input_file> > <output_file>
-
-For more details, use the `-h` or `--help` options from the command line 
-or read the [Command Line Docs][] available online.
-
-[Command Line Docs]: http://www.freewisdom.org/projects/python-markdown/Command_Line
-
-
-
-Using as a Python Module
-------------------------
-
-To use markdown as a module:
-
-    import markdown
-    html = markdown.markdown(your_text_string)
-
-For more details see the [Module Docs][].
-
-[Module Docs]: http://www.freewisdom.org/projects/python-markdown/Using_as_a_Module
-
-Support
--------
-
-You may ask for help and discuss various other issues on the [mailing list][] and report bugs on the [bug tracker][].
-
-[mailing list]: http://lists.sourceforge.net/lists/listinfo/python-markdown-discuss
-[bug tracker]: http://sourceforge.net/tracker/?func=add&group_id=153041&atid=790198
-
-
-Credits
--------
-
-* Most of the code currently in the module was written by [Yuri Takhteyev][]
-  while procrastinating from his Ph.D.
-* The original version of this script was written by [Manfred Stienstra][],
-  who is responsible for about a quarter of the code.
-* Many recent bugs are being fixed by [Waylan Limberg][].
-
-Other contributions:
-
-* Daniel Krech provided the setup.py script.
-* G. Clark Haynes submitted a patch for indented lists.
-* Tiago Cogumbreiro submitted an email autolink fix.
-* Sergej Chodarev submitted a patch for treatment of `<hr/>` tags.
-* Chris Clark submitted a patch to handle `<mailto:...>` syntax and a reg ex 
-  for "smart" emphasis (ignoring underscores within a word).
-* Steward Midwinter wrote command-line parser and cleaned up comments.
-* Many other people helped by reporting bugs.
-
-[Yuri Takhteyev]: http://www.freewisdom.org
-[Manfred Stienstra]: http://www.dwerg.net/
-[Waylan Limberg]: http://achinghead.com/
-
-
-License
--------
-
-The code is dual-licensed under [GPL][] and [BSD License][].  Other
-licensing arrangements can be discussed.
-
-[GPL]: http://www.gnu.org/copyleft/gpl.html) 
-[BSD License]: http://www.opensource.org/licenses/bsd-license.php
+Primary Authors
+===============
+
+Yuri Takhteyev <http://freewisdom.org/>, who has written much of the current code
+while procrastinating from his Ph.D.
+
+Waylan Limberg <http://achinghead.com/>, who has written most of the available 
+extensions and later was asked to join Yuri, fixing numerous bugs, adding
+documentation and making general improvements to the existing codebase.
+
+Artem Yunusov, who, as part of a 2008 GSoC project, refactored inline 
+patterns, replaced NanoDOM with ElementTree support and made various other 
+improvements.
+
+Manfred Stienstra <http://www.dwerg.net/>, who wrote the original version of 
+the script and is responsible for various parts of the existing codebase.
+
+David Wolever, who refactored the extension API and made other improvements
+as he helped to integrate Markdown into Dr.Project.
+
+Other Contributors
+==================
+
+The individuals in the incomplete list below have provided patches 
+or otherwise contributed to the project in various ways. We would like to thank
+everyone who has contributed to the project in any way.
+
+Jeff Balogh
+Sergej Chodarev
+Chris Clark
+Tiago Cogumbreiro
+G. Clark Haynes
+Daniel Krech
+Stewart Midwinter
+Malcolm Tredinnick
+and many others who helped by reporting bugs
+PYTHON MARKDOWN CHANGELOG
+=========================
+
+August 2008: Updated included extensions to ElementTree. Added a
+separate command-line script. (v2.0-alpha)
+
+July 2008: Switched from home-grown NanoDOM to ElementTree and
+various related bugs (thanks Artem Yunusov).
+
+June 2008: Fixed issues with nested inline patterns and cleaned 
+up testing framework (thanks Artem Yunusov).
+
+May 2008: Added a number of additional extensions to the
+distribution and other minor changes. Moved repo to git from svn.
+
+Mar 2008: Refactored extension api to accept either an 
+extension name (as a string) or an instance of an extension
+(Thanks David Wolever). Fixed various bugs and added doc strings.
+
+Feb 2008: Various bugfixes mostly regarding extensions.
+
+Feb 18, 2008: Version 1.7.
+
+Feb 13, 2008: A little code cleanup and better documentation
+and inheritance for pre/post processors.
+
+Feb 9, 2008: Doublequotes no longer html escaped and rawhtml
+honors <?foo>, <@foo>, and <%foo> for those who run markdown on
+template syntax.
+
+Dec 12, 2007: Updated docs. Removed encoding arg from Markdown
+and markdown as per list discussion. Clean up in prep for 1.7.
+
+Nov 29, 2007: Added support for images inside links. Also fixed
+a few bugs in the footnote extension.
+
+Nov 19, 2007: `message` now uses python's logging module. Also removed 
+limit imposed by recursion in _process_section(). You can now parse as 
+long of a document as your memory can handle.
+
+Nov 5, 2007: Moved safe_mode code to a textPostprocessor and added 
+escaping option.
+
+Nov 3, 2007: Fixed convert method to accept empty strings.
+
+Oct 30, 2007: Fixed BOM removal (thanks Malcolm Tredinnick). Fixed 
+infinite loop in bracket regex for inline links.
+
+Oct 11, 2007: LineBreaks is now an inlinePattern. Fixed HR in 
+blockquotes. Refactored _processSection method (see tracker #1793419).
+
+Oct 9, 2007: Added textPreprocessor (from 1.6b).
+
+Oct 8, 2008: Fixed Lazy Blockquote. Fixed code block on first line. 
+Fixed empty inline image link.
+
+Oct 7, 2007: Limit recursion on inlinePatterns. Added a 'safe' tag 
+to htmlStash.
+
+March 18, 2007: Fixed or merged a bunch of minor bugs, including
+multi-line comments and markup inside links. (Tracker #s: 1683066,
+1671153, 1661751, 1627935, 1544371, 1458139.) -> v. 1.6b
+
+Oct 10, 2006: Fixed a bug that caused some text to be lost after
+comments.  Added "safe mode" (user's html tags are removed).
+
+Sept 6, 2006: Added exception for PHP tags when handling html blocks.
+
+August 7, 2006: Incorporated Sergej Chodarev's patch to fix a problem
+with ampersand normalization and html blocks.
+
+July 10, 2006: Switched to using optparse.  Added proper support for
+unicode.
+
+July 9, 2006: Fixed the <!--@address.com> problem (Tracker #1501354).  
+
+May 18, 2006: Stopped catching unquoted titles in reference links.
+Stopped creating blank headers.
+
+May 15, 2006: A bug with lists, recursion on block-level elements,
+run-in headers, spaces before headers, unicode input (thanks to Aaron
+Swartz). Sourceforge tracker #s: 1489313, 1489312, 1489311, 1488370,
+1485178, 1485176. (v. 1.5)
+
+Mar. 24, 2006: Switched to a not-so-recursive algorithm with
+_handleInline.  (Version 1.4)
+
+Mar. 15, 2006: Replaced some instance variables with class variables
+(a patch from Stelios Xanthakis).  Chris Clark's new regexps that do
+not trigger midword underlining.
+
+Feb. 28, 2006: Clean-up and command-line handling by Stewart
+Midwinter. (Version 1.3)
+
+Feb. 24, 2006: Fixed a bug with the last line of the list appearing
+again as a separate paragraph.  Incorporated Chris Clark's "mailto"
+patch.  Added support for <br /> at the end of lines ending in two or
+more spaces.  Fixed a crashing bug when using ImageReferencePattern.
+Added several utility methods to Nanodom.  (Version 1.2)
+
+Jan. 31, 2006: Added "hr" and "hr/" to BLOCK_LEVEL_ELEMENTS and
+changed <hr/> to <hr />.  (Thanks to Sergej Chodarev.)
+
+Nov. 26, 2005: Fixed a bug with certain tabbed lines inside lists
+getting wrapped in <pre><code>.  (v. 1.1)
+
+Nov. 19, 2005: Made "<!...", "<?...", etc. behave like block-level
+HTML tags.
+
+Nov. 14, 2005: Added entity code and email autolink fix by Tiago
+Cogumbreiro.  Fixed some small issues with backticks to get 100%
+compliance with John's test suite.  (v. 1.0)
+
+Nov. 7, 2005: Added an unlink method for documents to aid with memory
+collection (per Doug Sauder's suggestion).
+
+Oct. 29, 2005: Restricted a set of html tags that get treated as
+block-level elements.
+
+Sept. 18, 2005: Refactored the whole script to make it easier to
+customize it and made footnote functionality into an extension.
+(v. 0.9)
+
+Sept. 5, 2005: Fixed a bug with multi-paragraph footnotes.  Added
+attribute support.
+
+Sept. 1, 2005: Changed the way headers are handled to allow inline
+syntax in headers (e.g. links) and got the lists to use p-tags
+correctly (v. 0.8)
+
+Aug. 29, 2005: Added flexible tabs, fixed a few small issues, added
+basic support for footnotes.  Got rid of xml.dom.minidom and added
+pretty-printing. (v. 0.7)
+
+Aug. 13, 2005: Fixed a number of small bugs in order to conform to the
+test suite.  (v. 0.6)
+
+Aug. 11, 2005: Added support for inline html and entities, inline
+images, autolinks, underscore emphasis. Cleaned up and refactored the
+code, added some more comments.
+
+Feb. 19, 2005: Rewrote the handling of high-level elements to allow
+multi-line list items and all sorts of nesting.
+
+Feb. 3, 2005: Reference-style links, single-line lists, backticks,
+escape, emphasis in the beginning of the paragraph.
+
+Nov. 2004: Added links, blockquotes, html blocks to Manfred
+Stienstra's code
+
+Apr. 2004: Manfred's version at http://www.dwerg.net/projects/markdown/
+
+Installing Python-Markdown
+==========================
+
+Checking Dependencies
+---------------------
+
+Python-Markdown requires the ElementTree module to be installed. In Python 2.5+,
+ElementTree is included as part of the standard library. For earlier versions 
+of Python, open a Python shell and type the following:
+
+    >>> import cElementTree
+    >>> import ElementTree
+
+If at least one of those does not generate any errors, then you have a working
+copy of ElementTree installed on your system. As cElementTree is faster, you
+may want to install it if you don't already have it and it's available for your
+system.
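
The same check can be done from a script instead of an interactive shell. The sketch below is only an illustration: the helper name `find_elementtree` is hypothetical, and the candidate list is an assumption (the first two names are the ones given above, and `xml.etree.ElementTree` is the standard-library location in Python 2.5+).

```python
import importlib

# Candidate module names, fastest first. The first two come from the
# instructions above; the last is the standard-library location.
CANDIDATES = ("cElementTree", "ElementTree", "xml.etree.ElementTree")


def find_elementtree(candidates=CANDIDATES):
    """Return the first importable ElementTree module, or None."""
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            # This candidate is not installed; try the next one.
            continue
    return None


etree = find_elementtree()
if etree is None:
    print("ElementTree is not installed")
else:
    print("Using %s" % etree.__name__)
```

If the function returns `None`, none of the candidates could be imported and ElementTree needs to be installed before Python-Markdown will work.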
+
+The Easy Way
+------------
+
+The simplest way to install Python-Markdown is by using SetupTools. As an
+Admin/Root user on your system, run:
+
+    easy_install ElementTree
+    easy_install Markdown
+
+That's it, you're done.
+
+Installing on Windows
+---------------------
+
+Download the Windows installer (.exe) from PyPI:
+
+<http://pypi.python.org/pypi/Markdown>
+
+Double-click the file and follow the instructions.
+
+If you prefer to manually install Python-Markdown in Windows, download the
+Zip file, unzip it, and run the following on the command line from the
+directory you unzipped to:
+
+    python setup.py install
+
+If you plan to use the provided command-line script, make sure your
+Scripts directory is on your system path. On a typical Python install on Windows
+the Scripts directory is `C:\Python25\Scripts\`. Adjust according to your 
+system and add that directory to your system path.
+
+Installing on *nix Systems
+--------------------------
+
+From the command line do the following:
+
+    wget http://pypi.python.org/packages/source/M/Markdown/markdown-2.0.tar.gz
+    tar xvzf markdown-2.0.tar.gz
+    cd markdown-2.0/
+    sudo python setup.py install
+markdown.py
+mdx_codehilite.py
+mdx_fenced_code.py
+mdx_footnotes.py
+mdx_headerid.py
+mdx_imagelinks.py
+mdx_meta.py
+mdx_rss.py
+mdx_tables.py
+mdx_wikilink.py
+setup.py
+scripts/pymarkdown.py
+docs/README
+docs/README.html
+docs/CHANGE_LOG
+docs/INSTALL
+docs/AUTHORS
+docs/writing_extensions.txt
+
+[Python-Markdown][]
+===================
+
+This is a Python implementation of John Gruber's [Markdown][]. 
+It is almost completely compliant with the reference implementation,
+though there are a few known issues. See [Features][] for information 
+on what exactly is supported and what is not. Additional features are 
+supported by the [Available Extensions][].
+
+[Python-Markdown]: http://freewisdom.org/projects/python-markdown
+[Markdown]: http://daringfireball.net/projects/markdown/
+[Features]: http://www.freewisdom.org/projects/python-markdown/Features
+[Available Extensions]: http://www.freewisdom.org/projects/python-markdown/Available_Extensions
+
+
+Installation
+------------
+
+To install Python Markdown [download][] the zip file and extract the 
+files.  If you want to install markdown as a module into your python 
+tree, run `sudo python setup.py install` from a directory where you 
+unzip the files.
+
+[download]: http://sourceforge.net/project/showfiles.php?group_id=153041
+
+
+Command Line Usage
+------------------
+
+To use markdown.py from the command line, run it as 
+
+    python markdown.py <input_file>
+
+or 
+
+    python markdown.py <input_file> > <output_file>
+
+For more details, use the `-h` or `--help` options from the command line 
+or read the [Command Line Docs][] available online.
+
+[Command Line Docs]: http://www.freewisdom.org/projects/python-markdown/Command_Line
+
+
+
+Using as a Python Module
+------------------------
+
+To use markdown as a module:
+
+    import markdown
+    html = markdown.markdown(your_text_string)
+
+For more details see the [Module Docs][].
+
+[Module Docs]: http://www.freewisdom.org/projects/python-markdown/Using_as_a_Module
+
+Support
+-------
+
+You may ask for help and discuss various other issues on the [mailing list][] and report bugs on the [bug tracker][].
+
+[mailing list]: http://lists.sourceforge.net/lists/listinfo/python-markdown-discuss
+[bug tracker]: http://sourceforge.net/tracker/?func=add&group_id=153041&atid=790198
+
+
+Credits
+-------
+
+* Most of the code currently in the module was written by [Yuri Takhteyev][]
+  while procrastinating from his Ph.D.
+* The original version of this script was written by [Manfred Stienstra][],
+  who is responsible for about a quarter of the code.
+* Many recent bugs are being fixed by [Waylan Limberg][].
+
+Other contributions:
+
+* Daniel Krech provided the setup.py script.
+* G. Clark Haynes submitted a patch for indented lists.
+* Tiago Cogumbreiro submitted an email autolink fix.
+* Sergej Chodarev submitted a patch for treatment of `<hr/>` tags.
+* Chris Clark submitted a patch to handle `<mailto:...>` syntax and a reg ex 
+  for "smart" emphasis (ignoring underscores within a word).
+* Steward Midwinter wrote command-line parser and cleaned up comments.
+* Many other people helped by reporting bugs.
+
+[Yuri Takhteyev]: http://www.freewisdom.org
+[Manfred Stienstra]: http://www.dwerg.net/
+[Waylan Limberg]: http://achinghead.com/
+
+
+License
+-------
+
+The code is dual-licensed under [GPL][] and [BSD License][].  Other
+licensing arrangements can be discussed.
+
+[GPL]: http://www.gnu.org/copyleft/gpl.html) 
+[BSD License]: http://www.opensource.org/licenses/bsd-license.php
+<h1><a href="http://freewisdom.org/projects/python-markdown">Python-Markdown</a></h1>
+<p>This is a Python implementation of John Gruber's <a href="http://daringfireball.net/projects/markdown/">Markdown</a>. 
+   It is almost completely compliant with the reference implementation,
+   though there are a few known issues. See <a href="http://www.freewisdom.org/projects/python-markdown/Features">Features</a> for information 
+   on what exactly is supported and what is not. Additional features are 
+   supported by the <a href="http://www.freewisdom.org/projects/python-markdown/Available_Extensions">Available Extensions</a>.
+</p>
+
+<h2>Installation</h2>
+<p>To install Python Markdown <a href="http://sourceforge.net/project/showfiles.php?group_id=153041">download</a> the zip file and extract the 
+   files.  If you want to install markdown as a module into your python 
+   tree, run <code>sudo python setup.py install</code> from a directory where you 
+   unzip the files.
+</p>
+
+<h2>Command Line Usage</h2>
+<p>To use markdown.py from the command line, run it as 
+</p>
+<pre><code>python markdown.py &lt;input_file&gt;
+</code></pre><p>or 
+</p>
+<pre><code>python markdown.py &lt;input_file&gt; &gt; &lt;output_file&gt;
+</code></pre><p>For more details, use the <code>-h</code> or <code>--help</code> options from the command line 
+   or read the <a href="http://www.freewisdom.org/projects/python-markdown/Command_Line">Command Line Docs</a> available online.
+</p>
+
+<h2>Using as a Python Module</h2>
+<p>To use markdown as a module:
+</p>
+<pre><code>import markdown
+html = markdown.markdown(your_text_string)
+</code></pre><p>For more details see the <a href="http://www.freewisdom.org/projects/python-markdown/Using_as_a_Module">Module Docs</a>.
+</p>
+
+<h2>Support</h2>
+<p>You may ask for help and discuss various other issues on the <a href="http://lists.sourceforge.net/lists/listinfo/python-markdown-discuss">mailing list</a> and report bugs on the <a href="http://sourceforge.net/tracker/?func=add&amp;group_id=153041&amp;atid=790198">bug tracker</a>.
+</p>
+
+<h2>Credits</h2>
+<ul>
+ <li>
+     Most of the code currently in the module was written by <a href="http://www.freewisdom.org">Yuri Takhteyev</a>
+  while procrastinating from his Ph.D.
+ </li>
+
+ <li>
+     The original version of this script was written by <a href="http://www.dwerg.net/">Manfred Stienstra</a>,
+  who is responsible for about a quarter of the code.
+ </li>
+
+ <li>
+     Many recent bugs are being fixed by <a href="http://achinghead.com/">Waylan Limberg</a>.
+ </li>
+</ul>
+<p>Other contributions:
+</p>
+<ul>
+ <li>
+     Daniel Krech provided the setup.py script.
+ </li>
+
+ <li>
+     G. Clark Haynes submitted a patch for indented lists.
+ </li>
+
+ <li>
+     Tiago Cogumbreiro submitted an email autolink fix.
+ </li>
+
+ <li>
+     Sergej Chodarev submitted a patch for treatment of <code>&lt;hr/&gt;</code> tags.
+ </li>
+
+ <li>
+     Chris Clark submitted a patch to handle <code>&lt;mailto:...&gt;</code> syntax and a reg ex 
+  for &quot;smart&quot; emphasis (ignoring underscores within a word).
+ </li>
+
+ <li>
+     Steward Midwinter wrote command-line parser and cleaned up comments.
+ </li>
+
+ <li>
+     Many other people helped by reporting bugs.
+ </li>
+</ul>
+
+<h2>License</h2>
+<p>The code is dual-licensed under <a href="http://www.gnu.org/copyleft/gpl.html)">GPL</a> and <a href="http://www.opensource.org/licenses/bsd-license.php">BSD License</a>.  Other
+   licensing arrangements can be discussed.
+</p>

docs/writing_extensions.txt

+### Overview
+
+Python-Markdown includes an API for extension writers to plug their own 
+custom functionality and/or syntax into the parser. There are preprocessors
+which allow you to alter the source before it is passed to the parser, 
+inline patterns which allow you to add, remove or override the syntax of
+any inline elements, and postprocessors which allow munging of the
+output of the parser before it is returned.
+
+As the parser builds an [ElementTree][] DOM object which is later rendered 
+as Unicode text, there are also some helpers provided to make manipulation of 
+the DOM tree easier. Each part of the API is discussed in its respective 
+section below. You may find reading the source of some [[existing extensions]] 
+helpful as well. For example, the [[footnote]] extension uses most of the 
+features documented here.
+
+* [Preprocessors][]
+    * [TextPreprocessors][]
+    * [Line Preprocessors][]
+* [InlinePatterns][]
+* [Postprocessors][]
+    * [DOM Postprocessors][]
+    * [TextPostprocessors][]
+* [Working with the DOM][]
+* [Integrating your code into Markdown][]
+    * [extendMarkdown][]
+    * [Config Settings][]
+    * [makeExtension][]
+
+<h3 id="preprocessors">Preprocessors</h3>
+
+Preprocessors munge the source text before it is passed into the Markdown 
+core. This is an excellent place to clean up bad syntax, or to extract things
+the parser may otherwise choke on and perhaps even store them for later retrieval.
+
+There are two types of preprocessors: [TextPreprocessors][] and 
+[Line Preprocessors][].
+
+<h4 id="textpreprocessors">TextPreprocessors</h4>
+
+TextPreprocessors should inherit from `markdown.TextPreprocessor` and implement
+a `run` method with one argument `text`. The `run` method of each 
+TextPreprocessor will be passed the entire source text as a single Unicode
+string and should either return that single Unicode string, or an altered
+version of it.
+
+For example, a simple TextPreprocessor that normalizes newlines [^1] might look
+like this:
+
+    class NormalizePreprocessor(markdown.TextPreprocessor):
+        def run(self, text):
+            return text.replace("\r\n", "\n").replace("\r", "\n")
+
+[^1]: It should be noted that Markdown already normalizes newlines. This 
+example is for illustrative purposes only.
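
To make the string-in, string-out contract concrete, here is a minimal, self-contained sketch. The `TextPreprocessor` base class and the `run_text_preprocessors` helper are hypothetical stand-ins written for this illustration; they are not Markdown's own code, and how Markdown actually calls its preprocessors may differ.

```python
class TextPreprocessor(object):
    """Hypothetical stand-in for markdown.TextPreprocessor."""

    def run(self, text):
        # Default behaviour: return the text unchanged.
        return text


class NormalizePreprocessor(TextPreprocessor):
    def run(self, text):
        # Convert Windows (\r\n) and old Mac (\r) newlines to \n.
        return text.replace("\r\n", "\n").replace("\r", "\n")


def run_text_preprocessors(text, preprocessors):
    """Feed the output of each preprocessor into the next one."""
    for preprocessor in preprocessors:
        text = preprocessor.run(text)
    return text


source = "line one\r\nline two\rline three\n"
normalized = run_text_preprocessors(source, [NormalizePreprocessor()])
print(repr(normalized))
```

Because each `run` method receives and returns a single string, preprocessors written this way can be chained in any order.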
+
+<h4 id="linepreprocessors">Line Preprocessors</h4>
+
+Line Preprocessors should inherit from `markdown.Preprocessor` and implement 
+a `run` method with one argument `lines`. The `run` method of each Line
+Preprocessor will be passed the entire source text as a list of Unicode strings.
+Each string will contain one line of text. The `run` method should return
+either that list, or an altered list of Unicode strings.
+
+A pseudo example:
+
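+    class MyPreprocessor(markdown.Preprocessor):
+        def run(self, lines):
+            new_lines = []
+            for line in lines:
+                # MYREGEX is a compiled regular expression defined elsewhere
+                m = MYREGEX.match(line)
+                if m:
+                    # do stuff with the match (here the line is dropped)
+                    pass
+                else:
+                    new_lines.append(line)
+            return new_lines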
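
A runnable variant of the pseudo example above might look like the sketch below. The `Preprocessor` base class is a stand-in so that the snippet runs without Markdown installed, and `COMMENT_RE` with its `%%` comment syntax is hypothetical, not part of Markdown:

```python
import re


class Preprocessor(object):
    """Stand-in for markdown.Preprocessor (assumption, for illustration only)."""

    def run(self, lines):
        return lines


# Hypothetical syntax: lines starting with "%%" are treated as comments.
COMMENT_RE = re.compile(r'^%%')


class StripCommentsPreprocessor(Preprocessor):
    def run(self, lines):
        new_lines = []
        for line in lines:
            if COMMENT_RE.match(line):
                # Drop the matched line instead of passing it to the parser.
                continue
            new_lines.append(line)
        return new_lines


source = ["# Title", "%% a note to self", "Some text."]
result = StripCommentsPreprocessor().run(source)
print(result)
```

In a real extension the class would inherit from `markdown.Preprocessor` instead of the stand-in.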
+
+<h3 id="inlinepatterns">Inline Patterns</h3>
+
+Inline Patterns implement the inline HTML element syntax for Markdown such as
+`*emphasis*` or `[links](http://example.com)`. Pattern objects should be 
+instances of classes that inherit from `markdown.Pattern` or one of its 
+children. Each pattern object uses a single regular expression and must have 
+the following methods:
+
+* `getCompiledRegExp()`: Returns a compiled regular expression.
+* `handleMatch(m)`: Accepts a match object and returns an ElementTree
+element or a plain Unicode string.
+
+Note that any regular expression returned by `getCompiledRegExp` must capture
+the whole block. Therefore, they should all start with `r'^(.*?)'` and end
+with `r'(.*?)$'`. When using the default `getCompiledRegExp()` method provided 
+in the `Pattern` class, you can pass in a regular expression without those parts and 
+`getCompiledRegExp` will wrap your expression for you. This means that the first
+group of your match will be `m.group(2)`, as `m.group(1)` will match everything 
+before the pattern.
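
The effect of that wrapping on group numbers can be checked with plain `re`. This is only a sketch: the wrapping expression and the `re.DOTALL` flag are assumptions based on the description above, not copied from the Markdown source.

```python
import re

# An oversimplified pattern with a single capturing group.
pattern = r'\*([^*]+)\*'

# Wrap the pattern so that the match covers the whole block.
wrapped = re.compile("^(.*?)%s(.*?)$" % pattern, re.DOTALL)

m = wrapped.match("some *emphasized* text")
# group(1) is everything before the pattern, group(2) is the pattern's
# own first group, and group(3) is everything after the pattern.
before, inner, after = m.group(1), m.group(2), m.group(3)
print(repr(before), repr(inner), repr(after))
```

A pattern that defines two groups of its own would find its second group at `m.group(3)` and the trailing text at `m.group(4)`.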
+
+For an example, consider this simplified emphasis pattern:
+
+    class EmphasisPattern(markdown.Pattern):
+        def handleMatch(self, m):
+            el = markdown.etree.Element('em')
+            el.text = m.group(2)
+            return el
+
+As discussed in [Integrating Your Code Into Markdown][], an instance of this
+class will need to be provided to Markdown. That instance would be created
+like so:
+
+    # an oversimplified regex
+    MYPATTERN = r'\*([^*]+)\*'
+    # pass in pattern and create instance
+    emphasis = EmphasisPattern(MYPATTERN)
+
+Actually it would not be necessary to create that pattern (and not just because
+a more sophisticated emphasis pattern already exists in Markdown). The fact is,
+that example pattern is not very DRY. A pattern for `**strong**` text would
+be almost identical, with the exception that it would create a 'strong' element.
+Therefore, Markdown provides a number of generic pattern classes that can 
+provide some common functionality. For example, both emphasis and strong are
+implemented with separate instances of the `SimpleTagPattern` listed below. 
+Feel free to use or extend any of these Pattern classes.
+
+**Generic Pattern Classes**
+
+* `SimpleTextPattern(pattern)`:
+
+    Returns simple text of `group(2)` of a `pattern`.
+
+* `SimpleTagPattern(pattern, tag)`:
+
+    Returns an element of type "`tag`" with a text attribute of `group(3)`
+    of a `pattern`. `tag` should be a string naming an HTML element (e.g. 'em').
+
+* `SubstituteTagPattern(pattern, tag)`:
+
+    Returns an element of type "`tag`" with no children or text (e.g. 'br').
+
+There may be other Pattern classes in the Markdown source that you could extend
+or use as well. Read through the source and see if there is anything you can 
+use. You might even get a few ideas for different approaches to your specific
+situation.
+
+<h3 id="postprocessors">Postprocessors</h3>
+
+Postprocessors manipulate a document after it has passed through the Markdown 
+core. This is where stored text gets added back in, such as a list of footnotes, 
+a table of contents or raw HTML.
+
+There are two types of postprocessors: [DOM Postprocessors][] and 
+[TextPostprocessors][].
+
+<h4 id="dompostprocessors">DOM Postprocessors</h4>
+
+A DOM Postprocessor should inherit from `markdown.Postprocessor` and over-ride
+the `run` method which takes one argument `root` and should return either
+that root element or a modified root element.
+
+A pseudo example:
+
+    class MyPostprocessor(markdown.Postprocessor):
+        def run(self, root):
+            # do stuff to the root element
+            return root
+
+For specifics on manipulating the DOM, see [Working with the DOM][] below.
+
+<h4 id="textpostprocessors">TextPostprocessors</h4>
+
+A TextPostprocessor should inherit from `markdown.TextPostprocessor` and
+over-ride the `run` method which takes one argument `text` and returns a
+Unicode string.
+
+TextPostprocessors are run after the DOM has been serialized back into Unicode
+text.  For example, this may be an appropriate place to add a table of contents
+to a document:
+
+    class TocTextPostprocessor(markdown.TextPostprocessor):
+        def run(self, text):
+            return MYMARKERRE.sub(MyToc, text)
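
A self-contained sketch of the same idea follows. The `TextPostprocessor` base class is a stand-in so that the snippet runs without Markdown installed; the `[TOC]` marker, the two regular expressions and the generated list markup are all hypothetical choices made for illustration:

```python
import re


class TextPostprocessor(object):
    """Stand-in for markdown.TextPostprocessor (assumption)."""

    def run(self, text):
        return text


# Hypothetical marker that the author places where the TOC should appear.
TOC_MARKER_RE = re.compile(r'<p>\[TOC\]</p>')
# Hypothetical heading format: second-level headings that carry an id.
HEADING_RE = re.compile(r'<h2 id="([^"]+)">(.*?)</h2>')


class TocTextPostprocessor(TextPostprocessor):
    def run(self, text):
        items = ['<li><a href="#%s">%s</a></li>' % (id_, title)
                 for id_, title in HEADING_RE.findall(text)]
        toc = '<ul>%s</ul>' % ''.join(items)
        # Use a function as the replacement so backslashes in titles are
        # not interpreted as regex escapes.
        return TOC_MARKER_RE.sub(lambda m: toc, text)


html = '<p>[TOC]</p>\n<h2 id="intro">Intro</h2>\n<h2 id="usage">Usage</h2>'
output = TocTextPostprocessor().run(html)
print(output)
```

Because the substitution runs on serialized text, it has to agree with whatever HTML the serializer actually emits; working on the DOM avoids that coupling when the change is structural.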
+
+<h3 id="working_with_dom">Working with the DOM</h3>
+
+As mentioned, the Markdown parser converts a source document to an 
+[ElementTree][] DOM object before serializing that back to Unicode text. 
+Markdown has provided some helpers to ease that manipulation within the context 
+of the Markdown module...
+
+<h3 id="integrating_into_markdown">Integrating Your Code Into Markdown</h3>
+
+Once you have the various pieces of your extension built, you need to tell 
+Markdown about them and ensure that they are run in the proper sequence. 
+Markdown accepts an `Extension` instance for each extension. Therefore, you
+will need to define a class that extends `markdown.Extension` and over-rides
+the `extendMarkdown` method. Within this class you will manage configuration 
+options for your extension and attach the various processors and patterns to 
+the Markdown instance. 
+
+It is important to note that the order of the various processors and patterns 
+matters. For example, if we replace `http://...` links with `<a>` elements, and 
+*then* try to deal with inline HTML, we will end up with a mess. Therefore, 
+the various types of processors and patterns are stored within an instance of 
+the Markdown class within lists. Your `Extension` class will need to manipulate
+those lists appropriately. You may insert instances of your processors and
+patterns into the appropriate location in a list, remove a built-in instance,
+or replace a built-in instance with your own.
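
The three list operations mentioned above can be sketched with an ordinary Python list. The list below is only a stand-in for one of the lists held by a Markdown instance, and the names in it are hypothetical placeholders rather than Markdown's real patterns:

```python
# Stand-in for the list of inline patterns held by a Markdown instance;
# the names are hypothetical placeholders, not Markdown's real patterns.
inline_patterns = ['backtick', 'escape', 'link', 'emphasis']

# Insert a new pattern just before an existing one.
inline_patterns.insert(inline_patterns.index('link'), 'wikilink')

# Replace a built-in instance with your own.
inline_patterns[inline_patterns.index('emphasis')] = 'my_emphasis'

# Remove a built-in instance.
inline_patterns.remove('escape')

print(inline_patterns)
```

Looking up the position with `index()` rather than hard-coding a number keeps the extension working if the built-in list gains or loses entries.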
+
+<h4 id="extendmarkdown">`extendMarkdown`</h4>
+
+The `extendMarkdown` method of a `markdown.Extension` class accepts two 
+arguments:
+
+* `md`:
+
+    A pointer to the instance of the Markdown class. You should use this to 
+    access the lists of processors and patterns. They are found under the 
+    following attributes:
+
+    * `md.textPreprocessors`
+    * `md.preprocessors`
+    * `md.inlinePatterns`
+    * `md.postprocessors`
+    * `md.textPostprocessors`
+
+    Some other things you may want to access in the markdown instance are:
+
+    * `md.inlineStash`
+    * `md.htmlStash`
+    * `md.registerExtension()`
+
+* `md_globals`
+
+    Contains all the various global variables within the markdown module.
+
+Of course, with access to those items, theoretically you have the option of 
+changing anything through various monkeypatching techniques. However, you should
+be aware that the various undocumented or private parts of markdown may change
+without notice and your monkeypatches may no longer work. Therefore, what you
+really should be doing is inserting processors and patterns into the markdown
+pipeline.
+
+<h4 id="configsettings">Config Settings</h4>
+
+If an extension uses any parameters that the user may want to change,
+those parameters should be stored in `self.config` of your `markdown.Extension`
+class in the following format:
+
+    self.config = {parameter_1_name : [value1, description1],
+                   parameter_2_name : [value2, description2] }
+
+When stored this way the config parameters can be over-ridden from the
+command line or at the time Markdown is instantiated:
+
+    markdown.py -x "myextension(SOME_PARAM=2)" inputfile.txt > output.txt
+
+Note that parameters should always be assumed to be set to string
+values, and should be converted at run time. For example:
+
+    i = int(self.getConfig("SOME_PARAM"))
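
The round trip from a string override to a typed value might look like the sketch below. The `Extension` class here is a stand-in written for this example (its constructor and `setConfig` behaviour are assumptions), and `SOME_PARAM` is a hypothetical option:

```python
class Extension(object):
    """Stand-in for markdown.Extension (assumption, for illustration only)."""

    def __init__(self, configs=None):
        # Each entry is [value, description], as described above.
        self.config = {'SOME_PARAM': ['1', 'A hypothetical numeric option']}
        for key, value in (configs or {}).items():
            self.setConfig(key, value)

    def getConfig(self, key):
        return self.config[key][0]

    def setConfig(self, key, value):
        self.config[key][0] = value


# Overrides given on the command line arrive as strings.
ext = Extension(configs={'SOME_PARAM': '2'})
i = int(ext.getConfig('SOME_PARAM'))
print(i + 1)
```

Converting at the point of use means the same code works whether the value came from the default, the command line or a dictionary passed in from Python.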
+
+<h4 id="makeextension">`makeExtension`</h4>
+
+Each extension should ideally be placed in its own module starting
+with the  ``mdx_`` prefix (e.g. ``mdx_footnotes.py``).  The module must
+provide a module-level function called ``makeExtension`` that takes
+an optional parameter consisting of a dictionary of configuration over-rides 
+and returns an instance of the extension.  An example from the footnote extension:
+
+    def makeExtension(configs=None) :
+        return FootnoteExtension(configs=configs)
+
+By following the above example, when Markdown is passed the name of your 
+extension as a string (e.g. ``'footnotes'``), it will automatically import
+the module and call the ``makeExtension`` function, instantiating your extension.
+
+However, Markdown will also accept an already existing instance of an extension.
+For example:
+
+    import markdown, mdx_myextension
+    configs = {...}
+    myext = mdx_myextension.MyExtension(configs=configs)
+    md = markdown.Markdown(extensions=[myext])
+
+This is useful if you need to implement a large number of extensions with more
+than one residing in a module.
+
+[Preprocessors]: #preprocessors
+[TextPreprocessors]: #textpreprocessors
+[Line Preprocessors]: #linepreprocessors
+[InlinePatterns]: #inlinepatterns
+[Postprocessors]: #postprocessors
+[DOM Postprocessors]: #dompostprocessors
+[TextPostprocessors]: #textpostprocessors
+[Working with the DOM]: #working_with_dom
+[Integrating your code into Markdown]: #integrating_into_markdown
+[extendMarkdown]: #extendmarkdown
+[Config Settings]: #configsettings
+[makeExtension]: #makeextension
+

odt2txt.py

-"""
-ODT2TXT
-=======
-
-ODT2TXT convers files in Open Document Text format (ODT) into
-Markdown-formatted plain text.
-
-Writteby by [Yuri Takhteyev](http://www.freewisdom.org).
-
-Project website: http://www.freewisdom.org/projects/python-markdown/odt2txt.php
-Contact: yuri [at] freewisdom.org
-
-License: GPL 2 (http://www.gnu.org/copyleft/gpl.html) or BSD
-
-Version: 0.1 (April 7, 2006)
-
-"""
-
-
-
-import sys, zipfile, xml.dom.minidom
-
-IGNORED_TAGS = ["office:annotation"]
-
-FOOTNOTE_STYLES = ["Footnote"]
-
-
-class TextProps :
-    """ Holds properties for a text style. """
-
-    def __init__ (self):
-        
-        self.italic = False
-        self.bold = False
-        self.fixed = False
-
-    def setItalic (self, value) :
-        if value == "italic" :
-            self.italic = True
-
-    def setBold (self, value) :
-        if value == "bold" :
-            self.bold = True
-
-    def setFixed (self, value) :
-        self.fixed = value
-
-    def __str__ (self) :
-
-        return "[i=%s, h=i%s, fixed=%s]" % (str(self.italic),
-                                          str(self.bold),
-                                          str(self.fixed))
-
-class ParagraphProps :
-    """ Holds properties of a paragraph style. """
-
-    def __init__ (self):
-
-        self.blockquote = False
-        self.headingLevel = 0
-        self.code = False
-        self.title = False
-        self.indented = 0
-
-    def setIndented (self, value) :
-        self.indented = value
-
-    def setHeading (self, level) :
-        self.headingLevel = level
-
-    def setTitle (self, value):
-        self.title = value
-
-    def setCode (self, value) :
-        self.code = value
-
-
-    def __str__ (self) :
-
-        return "[bq=%s, h=%d, code=%s]" % (str(self.blockquote),
-                                           self.headingLevel,
-                                           str(self.code))
-
-
-class ListProperties :
-    """ Holds properties for a list style. """
-
-    def __init__ (self):
-        self.ordered = False
- 
-    def setOrdered (self, value) :
-        self.ordered = value
-
-
-    
-class OpenDocumentTextFile :
-
-
-    def __init__ (self, filepath) :
-        self.footnotes = []
-        self.footnoteCounter = 0
-        self.textStyles = {"Standard" : TextProps()}
-        self.paragraphStyles = {"Standard" : ParagraphProps()}
-        self.listStyles = {}
-        self.fixedFonts = []
-        self.hasTitle = 0
-
-        self.load(filepath)
-        
-
-    def processFontDeclarations (self, fontDecl) :
-        """ Extracts necessary font information from a font-declaration
-            element.
-            """
-        for fontFace in fontDecl.getElementsByTagName("style:font-face") :
-            if fontFace.getAttribute("style:font-pitch") == "fixed" :
-                self.fixedFonts.append(fontFace.getAttribute("style:name"))
-        
-
-
-    def extractTextProperties (self, style, parent=None) :
-        """ Extracts text properties from a style element. """
-        
-        textProps = TextProps()
-        
-        if parent :
-            parentProp = self.textStyles.get(parent, None)
-            if parentProp :
-                textProp = parentProp
-            
-        textPropEl = style.getElementsByTagName("style:text-properties")
-        if not textPropEl : return textProps
-        
-        textPropEl = textPropEl[0]
-
-        italic = textPropEl.getAttribute("fo:font-style")
-        bold = textPropEl.getAttribute("fo:font-weight")
-
-        textProps.setItalic(italic)
-        textProps.setBold(bold)
-
-        if textPropEl.getAttribute("style:font-name") in self.fixedFonts :
-            textProps.setFixed(True)
-
-        return textProps
-
-    def extractParagraphProperties (self, style, parent=None) :
-        """ Extracts paragraph properties from a style element. """
-
-        paraProps = ParagraphProps()
-
-        name = style.getAttribute("style:name")
-
-        if name.startswith("Heading_20_") :
-            level = name[11:]
-            try :
-                level = int(level)
-                paraProps.setHeading(level)
-            except :
-                level = 0
-
-        if name == "Title" :
-            paraProps.setTitle(True)
-        
-        paraPropEl = style.getElementsByTagName("style:paragraph-properties")
-        if paraPropEl :
-            paraPropEl = paraPropEl[0]
-            leftMargin = paraPropEl.getAttribute("fo:margin-left")
-            if leftMargin :
-                try :
-                    leftMargin = float(leftMargin[:-2])
-                    if leftMargin > 0.01 :
-                        paraProps.setIndented(True)
-                except :
-                    pass
-
-        textProps = self.extractTextProperties(style)
-        if textProps.fixed :
-            paraProps.setCode(True)
-
-        return paraProps
-    
-
-    def processStyles(self, styleElements) :
-        """ Runs through "style" elements extracting necessary information.
-            """
-
-        for style in styleElements :
-
-            name = style.getAttribute("style:name")
-
-            if name == "Standard" : continue
-
-            family = style.getAttribute("style:family")
-            parent = style.getAttribute("style:parent-style-name")
-
-            if family == "text" : 
-                self.textStyles[name] = self.extractTextProperties(style,
-                                                                   parent)
-
-            elif family == "paragraph":
-                self.paragraphStyles[name] = (
-                                 self.extractParagraphProperties(style,
-                                                                 parent))
-    def processListStyles (self, listStyleElements) :
-
-        for style in listStyleElements :
-            name = style.getAttribute("style:name")
-
-            prop = ListProperties()
-            if style.childNodes :
-                if ( style.childNodes[0].tagName
-                     == "text:list-level-style-number" ) :
-                    prop.setOrdered(True)
-
-            self.listStyles[name] = prop
-        
-
-    def load(self, filepath) :
-        """ Loads an ODT file. """
-        
-        zip = zipfile.ZipFile(filepath)
-
-        styles_doc = xml.dom.minidom.parseString(zip.read("styles.xml"))
-        self.processFontDeclarations(styles_doc.getElementsByTagName(
-            "office:font-face-decls")[0])
-        self.processStyles(styles_doc.getElementsByTagName("style:style"))
-        self.processListStyles(styles_doc.getElementsByTagName(
-            "text:list-style"))
-        
-        self.content = xml.dom.minidom.parseString(zip.read("content.xml"))
-        self.processFontDeclarations(self.content.getElementsByTagName(
-            "office:font-face-decls")[0])
-        self.processStyles(self.content.getElementsByTagName("style:style"))
-        self.processListStyles(self.content.getElementsByTagName(
-            "text:list-style"))
-
-    def compressCodeBlocks(self, text) :
-        """ Removes extra blank lines from code blocks. """
-
-        lines = text.split("\n")
-        buffer = ""
-        numLines = len(lines)
-        for i in range(numLines) :
-            
-            if (lines[i].strip() or i == numLines-1  or i == 0 or
-                not ( lines[i-1].startswith("    ")
-                      and lines[i+1].startswith("    ") ) ):
-                buffer += "\n" + lines[i]
-
-        return buffer
-
-
-
-    def listToString (self, listElement) :
-
-        buffer = ""
-
-        styleName = listElement.getAttribute("text:style-name")
-        props = self.listStyles.get(styleName, ListProperties())
-
-        
-            
-        i = 0
-        for item in listElement.childNodes :
-            i += 1
-            if props.ordered :
-                number = str(i)
-                number = number + "." + " "*(2-len(number))
-                buffer += number + self.paragraphToString(item.childNodes[0],
-                                                        indent=3)
-            else :
-                buffer += "* " + self.paragraphToString(item.childNodes[0],
-                                                        indent=2)
-            buffer += "\n\n"
-            
-        return buffer
-
-    def toString (self) :
-        """ Converts the document to a string. """
-        body = self.content.getElementsByTagName("office:body")[0]
-        text = self.content.getElementsByTagName("office:text")[0]
-
-        buffer = u""
-
-
-        paragraphs = [el for el in text.childNodes
-                      if el.tagName in ["text:p", "text:h",
-                                        "text:list"]]
-
-        for paragraph in paragraphs :
-            if paragraph.tagName == "text:list" :
-                text = self.listToString(paragraph)
-            else :
-                text = self.paragraphToString(paragraph)
-            if text :
-                buffer += text + "\n\n"
-
-        if self.footnotes :
-
-            buffer += "--------\n\n"
-            for cite, body in self.footnotes :
-                buffer += "[^%s]: %s\n\n" % (cite, body)
-
-
-        return self.compressCodeBlocks(buffer)
-
-
-    def textToString(self, element) :
-
-        buffer = u""
-
-        for node in element.childNodes :
-
-            if node.nodeType == xml.dom.Node.TEXT_NODE :
-                buffer += node.nodeValue
-
-            elif node.nodeType == xml.dom.Node.ELEMENT_NODE :
-                tag = node.tagName
-
-                if tag == "text:span" :
-
-                    text = self.textToString(node) 
-
-                    if not text.strip() :
-                        return ""  # don't apply styles to white space
-
-                    styleName = node.getAttribute("text:style-name")
-                    style = self.textStyles.get(styleName, None)
-
-                    #print styleName, str(style)
-
-                    if style.fixed :
-                        buffer += "`" + text + "`"
-                        continue
-                    
-                    if style :
-                        if style.italic and style.bold :
-                            mark = "***"
-                        elif style.italic :
-                            mark = "_"
-                        elif style.bold :
-                            mark = "**"
-                        else :
-                            mark = ""
-                    else :
-                        mark = "<" + styleName + ">"
-
-                    buffer += "%s%s%s" % (mark, text, mark)
-                    
-                elif tag == "text:note" :
-                    cite = (node.getElementsByTagName("text:note-citation")[0]
-                                .childNodes[0].nodeValue)
-                               
-                    body = (node.getElementsByTagName("text:note-body")[0]
-                                .childNodes[0])
-
-                    self.footnotes.append((cite, self.textToString(body)))
-
-                    buffer += "[^%s]" % cite
-
-                elif tag in IGNORED_TAGS :
-                    pass
-
-                elif tag == "text:s" :
-                    try :
-                        num = int(node.getAttribute("text:c"))
-                        buffer += " "*num
-                    except :
-                        buffer += " "
-
-                elif tag == "text:tab" :
-                    buffer += "    "
-
-
-                elif tag == "text:a" :
-
-                    text = self.textToString(node)
-                    link = node.getAttribute("xlink:href")
-                    buffer += "[%s](%s)" % (text, link)
-                    
-                else :
-                    buffer += " {" + tag + "} "
-
-        return buffer
-
-    def paragraphToString(self, paragraph, indent = 0) :
-
-
-        style_name = paragraph.getAttribute("text:style-name")
-        paraProps = self.paragraphStyles.get(style_name) #, None)
-        text = self.textToString(paragraph)
-
-        #print style_name
-
-        if paraProps and not paraProps.code :
-            text = text.strip()
-
-        if paraProps.title :
-            self.hasTitle = 1
-            return text + "\n" + ("=" * len(text))
-
-        if paraProps.headingLevel :
-
-            level = paraProps.headingLevel
-            if self.hasTitle : level += 1
-
-            if level == 1 :
-                return text + "\n" + ("=" * len(text))
-            elif level == 2 :
-                return text + "\n" + ("-" * len(text))
-            else :
-                return "#" * level + " " + text
-
-        elif paraProps.code :
-            lines = ["    %s" % line for line in text.split("\n")]
-            return "\n".join(lines)
-
-        if paraProps.indented :
-            return self.wrapParagraph(text, indent = indent, blockquote = True)
-
-        else :
-            return self.wrapParagraph(text, indent = indent)
-        
-
-    def wrapParagraph(self, text, indent = 0, blockquote=False) :
-
-        counter = 0
-        buffer = ""
-        LIMIT = 50
-
-        if blockquote :
-            buffer += "> "
-        
-        for token in text.split() :
-
-            if counter > LIMIT - indent :
-                buffer += "\n" + " "*indent
-                if blockquote :
-                    buffer += "> "
-                counter = 0
-
-            buffer += token + " "
-            counter += len(token)
-
-        return buffer
-        
-
-
-if __name__ == "__main__" :
-
-
-    odt = OpenDocumentTextFile(sys.argv[1])
-
-    #print odt.fixedFonts
-
-    #sys.exit(0)
-    #out = open("out.txt", "wb")
-
-    unicode = odt.toString()
-    out_utf8 = unicode.encode("utf-8")
-
-    sys.stdout.write(out_utf8)
-
-    #out.write(

scripts/odt2txt.py

+"""
+ODT2TXT
+=======
+
+ODT2TXT converts files in Open Document Text format (ODT) into
+Markdown-formatted plain text.
+
+Written by [Yuri Takhteyev](http://www.freewisdom.org).
+
+Project website: http://www.freewisdom.org/projects/python-markdown/odt2txt.php
+Contact: yuri [at] freewisdom.org
+
+License: GPL 2 (http://www.gnu.org/copyleft/gpl.html) or BSD
+
+Version: 0.1 (April 7, 2006)
+
+"""
+
+
+
+import sys, zipfile, xml.dom.minidom
+
+IGNORED_TAGS = ["office:annotation"]
+
+FOOTNOTE_STYLES = ["Footnote"]
+
+
+class TextProps :
+    """ Holds properties for a text style. """
+
+    def __init__ (self):
+        
+        self.italic = False
+        self.bold = False
+        self.fixed = False
+
+    def setItalic (self, value) :
+        if value == "italic" :
+            self.italic = True
+
+    def setBold (self, value) :
+        if value == "bold" :
+            self.bold = True
+
+    def setFixed (self, value) :
+        self.fixed = value
+
+    def __str__ (self) :
+
+        return "[i=%s, b=%s, fixed=%s]" % (str(self.italic),
+                                          str(self.bold),
+                                          str(self.fixed))
+
+class ParagraphProps :
+    """ Holds properties of a paragraph style. """
+
+    def __init__ (self):
+
+        self.blockquote = False
+        self.headingLevel = 0
+        self.code = False
+        self.title = False
+        self.indented = 0
+
+    def setIndented (self, value) :
+        self.indented = value
+
+    def setHeading (self, level) :
+        self.headingLevel = level
+
+    def setTitle (self, value):
+        self.title = value
+
+    def setCode (self, value) :
+        self.code = value
+
+
+    def __str__ (self) :
+
+        return "[bq=%s, h=%d, code=%s]" % (str(self.blockquote),
+                                           self.headingLevel,
+                                           str(self.code))
+
+
+class ListProperties :
+    """ Holds properties for a list style. """
+
+    def __init__ (self):
+        self.ordered = False
+ 
+    def setOrdered (self, value) :
+        self.ordered = value
+
+
+    
+class OpenDocumentTextFile :
+
+
+    def __init__ (self, filepath) :
+        self.footnotes = []
+        self.footnoteCounter = 0
+        self.textStyles = {"Standard" : TextProps()}
+        self.paragraphStyles = {"Standard" : ParagraphProps()}
+        self.listStyles = {}
+        self.fixedFonts = []
+        self.hasTitle = 0
+
+        self.load(filepath)
+        
+
+    def processFontDeclarations (self, fontDecl) :
+        """ Extracts necessary font information from a font-declaration
+            element.
+            """
+        for fontFace in fontDecl.getElementsByTagName("style:font-face") :
+            if fontFace.getAttribute("style:font-pitch") == "fixed" :
+                self.fixedFonts.append(fontFace.getAttribute("style:name"))
+        
+
+
+    def extractTextProperties (self, style, parent=None) :
+        """ Extracts text properties from a style element. """
+        
+        textProps = TextProps()
+        
+        if parent :
+            parentProp = self.textStyles.get(parent, None)
+            if parentProp :
+                textProp = parentProp
+            
+        textPropEl = style.getElementsByTagName("style:text-properties")
+        if not textPropEl : return textProps
+        
+        textPropEl = textPropEl[0]
+
+        italic = textPropEl.getAttribute("fo:font-style")
+        bold = textPropEl.getAttribute("fo:font-weight")
+
+        textProps.setItalic(italic)
+        textProps.setBold(bold)
+
+        if textPropEl.getAttribute("style:font-name") in self.fixedFonts :
+            textProps.setFixed(True)
+
+        return textProps
+
+    def extractParagraphProperties (self, style, parent=None) :
+        """ Extracts paragraph properties from a style element. """
+
+        paraProps = ParagraphProps()
+
+        name = style.getAttribute("style:name")
+
+        if name.startswith("Heading_20_") :
+            level = name[11:]
+            try :
+                level = int(level)
+                paraProps.setHeading(level)
+            except :
+                level = 0
+
+        if name == "Title" :
+            paraProps.setTitle(True)
+        
+        paraPropEl = style.getElementsByTagName("style:paragraph-properties")
+        if paraPropEl :
+            paraPropEl = paraPropEl[0]
+            leftMargin = paraPropEl.getAttribute("fo:margin-left")
+            if leftMargin :
+                try :
+                    leftMargin = float(leftMargin[:-2])
+                    if leftMargin > 0.01 :
+                        paraProps.setIndented(True)
+                except :
+                    pass
+
+        textProps = self.extractTextProperties(style)
+        if textProps.fixed :
+            paraProps.setCode(True)
+
+        return paraProps
+    
+
+    def processStyles(self, styleElements) :
+        """ Runs through "style" elements extracting necessary information.
+            """
+
+        for style in styleElements :
+
+            name = style.getAttribute("style:name")
+
+            if name == "Standard" : continue
+
+            family = style.getAttribute("style:family")
+            parent = style.getAttribute("style:parent-style-name")
+
+            if family == "text" : 
+                self.textStyles[name] = self.extractTextProperties(style,
+                                                                   parent)
+
+            elif family == "paragraph":
+                self.paragraphStyles[name] = (
+                                 self.extractParagraphProperties(style,
+                                                                 parent))
+    def processListStyles (self, listStyleElements) :
+
+        for style in listStyleElements :
+            name = style.getAttribute("style:name")
+
+            prop = ListProperties()
+            if style.childNodes :
+                if ( style.childNodes[0].tagName
+                     == "text:list-level-style-number" ) :
+                    prop.setOrdered(True)
+
+            self.listStyles[name] = prop
+        
+
+    def load(self, filepath) :
+        """ Loads an ODT file. """
+        
+        zip = zipfile.ZipFile(filepath)
+
+        styles_doc = xml.dom.minidom.parseString(zip.read("styles.xml"))
+        self.processFontDeclarations(styles_doc.getElementsByTagName(
+            "office:font-face-decls")[0])
+        self.processStyles(styles_doc.getElementsByTagName("style:style"))
+        self.processListStyles(styles_doc.getElementsByTagName(
+            "text:list-style"))
+        
+        self.content = xml.dom.minidom.parseString(zip.read("content.xml"))
+        self.processFontDeclarations(self.content.getElementsByTagName(
+            "office:font-face-decls")[0])
+        self.processStyles(self.content.getElementsByTagName("style:style"))
+        self.processListStyles(self.content.getElementsByTagName(
+            "text:list-style"))
+
+    def compressCodeBlocks(self, text) :
+        """ Removes extra blank lines from code blocks. """
+
+        lines = text.split("\n")
+        buffer = ""
+        numLines = len(lines)
+        for i in range(numLines) :
+            
+            if (lines[i].strip() or i == numLines-1  or i == 0 or
+                not ( lines[i-1].startswith("    ")
+                      and lines[i+1].startswith("    ") ) ):
+                buffer += "\n" + lines[i]
+
+        return buffer
+
+
+
+    def listToString (self, listElement) :
+
+        buffer = ""
+
+        styleName = listElement.getAttribute("text:style-name")
+        props = self.listStyles.get(styleName, ListProperties())
+
+        
+            
+        i = 0
+        for item in listElement.childNodes :
+            i += 1
+            if props.ordered :
+                number = str(i)
+                number = number + "." + " "*(2-len(number))
+                buffer += number + self.paragraphToString(item.childNodes[0],
+                                                        indent=3)
+            else :
+                buffer += "* " + self.paragraphToString(item.childNodes[0],
+                                                        indent=2)
+            buffer += "\n\n"
+            
+        return buffer
+
+    def toString (self) :
+        """ Converts the document to a string. """
+        body = self.content.getElementsByTagName("office:body")[0]
+        text = self.content.getElementsByTagName("office:text")[0]
+
+        buffer = u""
+
+
+        paragraphs = [el for el in text.childNodes
+                      if el.tagName in ["text:p", "text:h",
+                                        "text:list"]]
+
+        for paragraph in paragraphs :
+            if paragraph.tagName == "text:list" :
+                text = self.listToString(paragraph)
+            else :
+                text = self.paragraphToString(paragraph)
+            if text :
+                buffer += text + "\n\n"
+
+        if self.footnotes :
+
+            buffer += "--------\n\n"
+            for cite, body in self.footnotes :
+                buffer += "[^%s]: %s\n\n" % (cite, body)
+
+
+        return self.compressCodeBlocks(buffer)
+
+
+    def textToString(self, element) :
+
+        buffer = u""
+
+        for node in element.childNodes :
+
+            if node.nodeType == xml.dom.Node.TEXT_NODE :
+                buffer += node.nodeValue
+
+            elif node.nodeType == xml.dom.Node.ELEMENT_NODE :
+                tag = node.tagName
+
+                if tag == "text:span" :
+
+                    text = self.textToString(node) 
+
+                    if not text.strip() :
+                        buffer += text  # don't apply styles to white space
+                        continue
+
+                    styleName = node.getAttribute("text:style-name")
+                    style = self.textStyles.get(styleName, None)
+
+                    #print styleName, str(style)
+
+                    if style and style.fixed :
+                        buffer += "`" + text + "`"
+                        continue
+                    
+                    if style :
+                        if style.italic and style.bold :
+                            mark = "***"
+                        elif style.italic :
+                            mark = "_"
+                        elif style.bold :
+                            mark = "**"
+                        else :
+                            mark = ""
+                    else :
+                        mark = "<" + styleName + ">"
+
+                    buffer += "%s%s%s" % (mark, text, mark)
+                    
+                elif tag == "text:note" :
+                    cite = (node.getElementsByTagName("text:note-citation")[0]
+                                .childNodes[0].nodeValue)
+                               
+                    body = (node.getElementsByTagName("text:note-body")[0]
+                                .childNodes[0])
+
+                    self.footnotes.append((cite, self.textToString(body)))
+
+                    buffer += "[^%s]" % cite
+
+                elif tag in IGNORED_TAGS :
+                    pass
+
+                elif tag == "text:s" :
+                    try :
+                        num = int(node.getAttribute("text:c"))
+                        buffer += " "*num
+                    except ValueError :
+                        buffer += " "
+
+                elif tag == "text:tab" :
+                    buffer += "    "
+
+
+                elif tag == "text:a" :
+
+                    text = self.textToString(node)
+                    link = node.getAttribute("xlink:href")
+                    buffer += "[%s](%s)" % (text, link)
+                    
+                else :
+                    buffer += " {" + tag + "} "
+
+        return buffer
+
+    def paragraphToString(self, paragraph, indent = 0) :
+
+
+        style_name = paragraph.getAttribute("text:style-name")
+        paraProps = self.paragraphStyles.get(style_name) #, None)
+        text = self.textToString(paragraph)
+
+        #print style_name
+
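+        if not paraProps :
+            # Unknown style: treat the text as a plain paragraph.
+            return self.wrapParagraph(text, indent = indent)
+
+        if not paraProps.code :
+            text = text.strip()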
+
+        if paraProps.title :
+            self.hasTitle = 1
+            return text + "\n" + ("=" * len(text))
+
+        if paraProps.headingLevel :
+
+            level = paraProps.headingLevel
+            if self.hasTitle : level += 1
+
+            if level == 1 :
+                return text + "\n" + ("=" * len(text))
+            elif level == 2 :
+                return text + "\n" + ("-" * len(text))
+            else :
+                return "#" * level + " " + text
+
+        elif paraProps.code :
+            lines = ["    %s" % line for line in text.split("\n")]
+            return "\n".join(lines)
+
+        if paraProps.indented :
+            return self.wrapParagraph(text, indent = indent, blockquote = True)
+
+        else :
+            return self.wrapParagraph(text, indent = indent)
+        
+
+    def wrapParagraph(self, text, indent = 0, blockquote=False) :
+
+        counter = 0
+        buffer = ""
+        LIMIT = 50
+
+        if blockquote :
+            buffer += "> "
+        
+        for token in text.split() :
+
+            if counter > LIMIT - indent :
+                buffer += "\n" + " "*indent
+                if blockquote :
+                    buffer += "> "
+                counter = 0
+
+            buffer += token + " "
+            counter += len(token)
+
+        return buffer
+        
+
+
+if __name__ == "__main__" :
+
+
+    odt = OpenDocumentTextFile(sys.argv[1])
+
+    #print odt.fixedFonts
+
+    #sys.exit(0)
+    #out = open("out.txt", "wb")
+
+    text = odt.toString()
+    out_utf8 = text.encode("utf-8")
+
+    sys.stdout.write(out_utf8)
+
+    #out.write(

writing_extensions.txt

-### Overview
-
-Python-Markdown includes an API for extension writers to plug their own 
-custom functionality and/or syntax into the parser. There are preprocessors
-which allow you to alter the source before it is passed to the parser, 
-inline patterns which allow you to add, remove or override the syntax of
-any inline elements, and postprocessors which allow munging of the
-output of the parser before it is returned.
-
-As the parser builds an [ElementTree][] DOM object which is later rendered 
-as Unicode text, there are also some helpers provided to make manipulation of 
-the DOM tree easier. Each part of the API is discussed in its respective 
-section below. You may find reading the source of some [[existing extensions]] 
-helpful as well. For example, the [[footnote]] extension uses most of the 
-features documented here.
-
-* [Preprocessors][]
-    * [TextPreprocessors][]
-    * [Line Preprocessors][]
-* [InlinePatterns][]
-* [Postprocessors][]
-    * [DOM Postprocessors][]
-    * [TextPostprocessors][]
-* [Working with the DOM][]
-* [Integrating your code into Markdown][]
-    * [extendMarkdown][]
-    * [Config Settings][]
-    * [makeExtension][]
-
-<h3 id="preprocessors">Preprocessors</h3>
-
-Preprocessors munge the source text before it is passed into the Markdown 
-core. This is an excellent place to clean up bad syntax, extract things the 
-parser may otherwise choke on and perhaps even store it for later retrieval.
-
-There are two types of preprocessors: [TextPreprocessors][] and 
-[Line Preprocessors][].
-
-<h4 id="textpreprocessors">TextPreprocessors</h4>
-
-TextPreprocessors should inherit from `markdown.TextPreprocessor` and implement
-a `run` method with one argument `text`. The `run` method of each 
-TextPreprocessor will be passed the entire source text as a single Unicode
-string and should either return that single Unicode string, or an altered
-version of it.
-
-For example, a simple TextPreprocessor that normalizes newlines [^1] might look
-like this:
-
-    class NormalizePreprocessor(markdown.TextPreprocessor):
-        def run(self, text):
-            return text.replace("\r\n", "\n").replace("\r", "\n")
-
-[^1]: It should be noted that Markdown already normalizes newlines. This 
-example is for illustrative purposes only.
-
-<h4 id="linepreprocessors">Line Preprocessors</h4>
-
-Line Preprocessors should inherit from `markdown.Preprocessor` and implement 
-a `run` method with one argument `lines`. The `run` method of each Line
-Preprocessor will be passed the entire source text as a list of Unicode strings.
-Each string will contain one line of text. The `run` method should return
-either that list, or an altered list of Unicode strings.
-
-A pseudo example:
-
-    class MyPreprocessor(markdown.Preprocessor):
-        def run(self, lines):
-            new_lines = []
-            for line in lines:
-                m = MYREGEX.match(line)
-                if m:
-                    pass  # do stuff with the match here
-                else:
-                    new_lines.append(line)
-            return new_lines
-
-<h3 id="inlinepatterns">Inline Patterns</h3>
-
-Inline Patterns implement the inline HTML element syntax for Markdown such as
-`*emphasis*` or `[links](http://example.com)`. Pattern objects should be 
-instances of classes that inherit from `markdown.Pattern` or one of its 
-children. Each pattern object uses a single regular expression and must have 
-the following methods:
-
-* `getCompiledRegExp()`: Returns a compiled regular expression.
-* `handleMatch(m)`: Accepts a match object and returns an ElementTree
-element or a plain Unicode string.
-
-Note that any regular expression returned by `getCompiledRegExp` must capture
-the whole block. Therefore, they should all start with `r'^(.*?)'` and end
-with `r'(.*?)$'`. When using the default `getCompiledRegExp()` method provided 
-in the `Pattern` you can pass in a regular expression without that and 
-`getCompiledRegExp` will wrap your expression for you. This means that the first
-group of your match will be `m.group(2)` as `m.group(1)` will match everything 
-before the pattern.
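The group numbering described above can be checked with the standard `re` module alone. This is a minimal sketch: the `wrap` helper is hypothetical and only mimics the wrapping the text describes (a leading `^(.*?)` and a trailing `(.*?)$`); it is not Markdown's own code, and the sample pattern and input are chosen purely for illustration.

```python
import re

def wrap(pattern):
    # Hypothetical stand-in for the wrapping described above:
    # group(1) captures everything before the pattern,
    # the last group captures everything after it.
    return re.compile(r"^(.*?)%s(.*?)$" % pattern, re.DOTALL)

MYPATTERN = r"\*([^*]+)\*"  # a simple emphasis-like regex
m = wrap(MYPATTERN).match("some *emphasized* text")

before = m.group(1)   # text preceding the match
content = m.group(2)  # first group of the user-supplied pattern
after = m.group(3)    # text following the match
```

Under this assumed wrapping, the pattern's own first group lands in `m.group(2)`, which is why a `handleMatch` method has to count from 2 rather than 1.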
-
-For an example, consider this simplified emphasis pattern:
-
-    class EmphasisPattern(markdown.Pattern):
-        def handleMatch(self, m):
-            el = markdown.etree.Element('em')
-            el.text = m.group(2)
-            return el
-
-As discussed in [Integrating Your Code Into Markdown][], an instance of this
-class will need to be provided to Markdown. That instance would be created
-like so:
-
-    # an oversimplified regex
-    MYPATTERN = r'\*([^*]+)\*'
-    # pass in pattern and create instance
-    emphasis = EmphasisPattern(MYPATTERN)
-
-Actually it would not be necessary to create that pattern (and not just because
-a more sophisticated emphasis pattern already exists in Markdown). The fact is,
-that example pattern is not very DRY. A pattern for `**strong**` text would
-be almost identical, with the exception that it would create a 'strong' element.
-Therefore, Markdown provides a number of generic pattern classes that can 
-provide some common functionality. For example, both emphasis and strong are
-implemented with separate instances of the `SimpleTagPattern` listed below. 
-Feel free to use or extend any of these Pattern classes.
-
-**Generic Pattern Classes**
-
-* `SimpleTextPattern(pattern)`:
-
-    Returns simple text of `group(2)` of a `pattern`.
-
-* `SimpleTagPattern(pattern, tag)`:
-
-    Returns an element of type "`tag`" with a text attribute of `group(3)`
-    of a `pattern`. `tag` should be a string naming an HTML element (e.g. 'em').
-
-* `SubstituteTagPattern(pattern, tag)`:
-
-    Returns an element of type "`tag`" with no children or text (e.g. 'br').
-
-There may be other Pattern classes in the Markdown source that you could extend
-or use as well. Read through the source and see if there is anything you can 
-use. You might even get a few ideas for different approaches to your specific
-situation.
-
-<h3 id="postprocessors">Postprocessors</h3>
-
-Postprocessors manipulate a document after it has passed through the Markdown 
-core. This is where stored text gets added back in, such as a list of footnotes, 
-a table of contents or raw HTML.
-
-There are two types of postprocessors: [DOM Postprocessors][] and 
-[TextPostprocessors][].
-
-<h4 id="dompostprocessors">DOM Postprocessors</h4>
-
-A DOM Postprocessor should inherit from `markdown.Postprocessor` and override
-the `run` method which takes one argument `root` and should return either
-that root element or a modified root element.
-
-A pseudo example:
-
-    class MyPostprocessor(markdown.Postprocessor):
-        def run(self, root):
-            # do stuff
-            return my_modified_root
-
-For specifics on manipulating the DOM, see [Working with the DOM][] below.
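As a rough illustration of what a `run` method might do with `root`, the sketch below uses only the standard library's `xml.etree.ElementTree`. The function name `add_footer` and the footer text are made up for this example, and it is written as a plain function rather than a `markdown.Postprocessor` subclass so that it runs on its own.

```python
import xml.etree.ElementTree as etree

def add_footer(root):
    # Append a <p class="footer"> element to the end of the tree
    # and hand the (modified) root back, as a run() method would.
    footer = etree.SubElement(root, "p")
    footer.set("class", "footer")
    footer.text = "Generated from Markdown"
    return root

# Build a tiny tree to stand in for the parser's output.
root = etree.Element("div")
etree.SubElement(root, "p").text = "Hello"
root = add_footer(root)
html = etree.tostring(root).decode("utf-8")
```

Returning the same root object after modifying it in place is the simplest choice; a postprocessor could equally build and return a new root.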
-
-<h4 id="textpostprocessors">TextPostprocessors</h4>
-
-A TextPostprocessor should inherit from `markdown.TextPostprocessor` and
-override the `run` method which takes one argument `text` and returns a
-Unicode string.
-
-TextPostprocessors are run after the DOM has been serialized back into Unicode
-text.  For example, this may be an appropriate place to add a table of contents
-to a document:
-
-    class TocTextPostprocessor(markdown.TextPostprocessor):
-        def run(self, text):
-            return MYMARKERRE.sub(MyToc, text)
-
-<h3 id="working_with_dom">Working with the DOM</h3>
-
-As mentioned, the Markdown parser converts a source document to an 
-[ElementTree][] DOM object before serializing that back to Unicode text.