Waylan Limberg  committed 220bc1f

First draft of writing_extensions.txt. It's a little rough yet and still lacks documentation regarding the use of ElementTree.

  • Parent commits 5a4d760


Files changed (1)

File writing_extensions.txt

+### Overview
+Python-Markdown includes an API for extension writers to plug their own 
+custom functionality and/or syntax into the parser. There are preprocessors
+which allow you to alter the source before it is passed to the parser, 
+inline patterns which allow you to add, remove or override the syntax of
+any inline elements, and postprocessors which allow munging of the
+output of the parser before it is returned.
+As the parser builds an [ElementTree][] DOM object which is later rendered 
+as Unicode text, there are also some helpers provided to make manipulation of 
+the DOM tree easier. Each part of the API is discussed in its respective 
+section below. You may find reading the source of some [[existing extensions]] 
+helpful as well. For example, the [[footnote]] extension uses most of the 
+features documented here.
+* [Preprocessors][]
+    * [TextPreprocessors][]
+    * [Line Preprocessors][]
+* [InlinePatterns][]
+* [Postprocessors][]
+    * [DOM Postprocessors][]
+    * [TextPostprocessors](#textpostprocessors)
+* [Working with the DOM][]
+* [Integrating your code into Markdown][]
+    * [extendMarkdown][]
+    * [Config Settings][]
+    * [makeExtension][]
+<h3 id="preprocessors">Preprocessors</h3>
+Preprocessors munge the source text before it is passed into the Markdown 
+core. This is an excellent place to clean up bad syntax, extract things the
+parser may otherwise choke on, and perhaps even store them for later retrieval.
+There are two types of preprocessors: [TextPreprocessors][] and 
+[Line Preprocessors][].
+<h4 id="textpreprocessors">TextPreprocessors</h4>
+TextPreprocessors should inherit from `markdown.TextPreprocessor` and implement
+a `run` method with one argument `text`. The `run` method of each 
+TextPreprocessor will be passed the entire source text as a single Unicode
+string and should return either that string unchanged or an altered
+version of it.
+For example, a simple TextPreprocessor that normalizes newlines [^1] might look
+like this:
+    class NormalizePreprocessor(markdown.TextPreprocessor):
+        def run(self, text):
+            return text.replace("\r\n", "\n").replace("\r", "\n")
+[^1]: It should be noted that Markdown already normalizes newlines. This 
+example is for illustrative purposes only.
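
To try the example without a Markdown installation, the sketch below uses a hypothetical stand-in for `markdown.TextPreprocessor`. The stand-in only mirrors the `run(text)` contract described above and is not the real class. It then runs `NormalizePreprocessor` on a string with mixed line endings.

```python
class TextPreprocessor:
    """Hypothetical stand-in for markdown.TextPreprocessor.

    Mirrors only the contract described above: run() takes the whole
    source as one string and returns a (possibly altered) string.
    """

    def run(self, text):
        return text


class NormalizePreprocessor(TextPreprocessor):
    def run(self, text):
        # Replace Windows "\r\n" first, then old Mac "\r", so that a
        # "\r\n" pair is not turned into two newlines.
        return text.replace("\r\n", "\n").replace("\r", "\n")


source = "line one\r\nline two\rline three\n"
print(repr(NormalizePreprocessor().run(source)))
# 'line one\nline two\nline three\n'
```

The order of the two `replace` calls matters. Replacing `"\r"` first would leave `"\n\n"` wherever there had been a `"\r\n"`.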
+<h4 id="linepreprocessors">Line Preprocessors</h4>
+Line Preprocessors should inherit from `markdown.Preprocessor` and implement 
+a `run` method with one argument `lines`. The `run` method of each Line
+Preprocessor will be passed the entire source text as a list of Unicode strings.
+Each string will contain one line of text. The `run` method should return
+either that list, or an altered list of Unicode strings.
+A pseudo example:
+    class MyPreprocessor(markdown.Preprocessor):
+        def run(self, lines):
+            new_lines = []
+            for line in lines:
+                m = MYREGEX.match(line)
+                if m:
+                    # do stuff with the match; as written, the line is dropped
+                    pass
+                else:
+                    new_lines.append(line)
+            return new_lines
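
Here is a concrete, runnable version of the pseudo example above. It assumes a hypothetical syntax in which lines starting with `%%` are author comments that should never reach the output. It also uses a hypothetical stand-in for `markdown.Preprocessor` so that the example runs on its own.

```python
import re


class Preprocessor:
    """Hypothetical stand-in for markdown.Preprocessor: run() receives
    the source as a list of lines and returns a list of lines."""

    def run(self, lines):
        return lines


# Hypothetical syntax for this sketch: "%%" at the start of a line
# marks an author comment.
COMMENT_RE = re.compile(r'^%%')


class StripCommentsPreprocessor(Preprocessor):
    def run(self, lines):
        new_lines = []
        for line in lines:
            if COMMENT_RE.match(line):
                continue  # drop the comment line entirely
            new_lines.append(line)
        return new_lines


lines = ["# Title", "%% TODO: expand this", "Some text."]
print(StripCommentsPreprocessor().run(lines))
# ['# Title', 'Some text.']
```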
+<h3 id="inlinepatterns">Inline Patterns</h3>
+Inline Patterns implement the inline HTML element syntax for Markdown such as
+`*emphasis*` or `[links](`. Pattern objects should be 
+instances of classes that inherit from `markdown.Pattern` or one of its 
+children. Each pattern object uses a single regular expression and must have 
+the following methods:
+* `getCompiledRegExp()`: Returns a compiled regular expression.
+* `handleMatch(m)`: Accepts a match object and returns an ElementTree
+element or a plain Unicode string.
+Note that any regular expression returned by `getCompiledRegExp` must capture
+the whole block. Therefore, they should all start with `r'^(.*?)'` and end
+with `r'(.*?)$'`. When using the default `getCompiledRegExp()` method provided
+in the `Pattern` class, you can pass in a regular expression without that
+wrapper and `getCompiledRegExp` will wrap your expression for you. This means
+that the first group of your match will be `m.group(2)`, as `m.group(1)` will
+match everything before the pattern.
+For an example, consider this simplified emphasis pattern:
+    class EmphasisPattern(markdown.Pattern):
+        def handleMatch(self, m):
+            el = markdown.etree.Element('em')
+            el.text = m.group(2)  # group(1) is the text before the match
+            return el
+As discussed in [Integrating Your Code Into Markdown][] below, an instance of this
+class will need to be provided to Markdown. That instance would be created
+like so:
+    # an oversimplified regex
+    MYPATTERN = r'\*([^*]+)\*'
+    # pass in pattern and create instance
+    emphasis = EmphasisPattern(MYPATTERN)
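
The sketch below shows the wrapping described above in action. It assumes a hypothetical stand-in for `markdown.Pattern` that builds `^(.*?)` + your pattern + `(.*?)$`. Passing `re.DOTALL` so that the outer groups can span newlines is also an assumption of this sketch. The standard library's `xml.etree.ElementTree` stands in for `markdown.etree`.

```python
import re
import xml.etree.ElementTree as etree


class Pattern:
    """Hypothetical stand-in for markdown.Pattern.

    Wraps the user's pattern so that group(1) holds the text before the
    match and the last group holds the text after it.
    """

    def __init__(self, pattern):
        # re.DOTALL lets the wrapper groups span newlines (assumption).
        self.compiled_re = re.compile(r'^(.*?)' + pattern + r'(.*?)$', re.DOTALL)

    def getCompiledRegExp(self):
        return self.compiled_re

    def handleMatch(self, m):
        raise NotImplementedError


class EmphasisPattern(Pattern):
    def handleMatch(self, m):
        el = etree.Element('em')
        el.text = m.group(2)  # group(1) is everything before the match
        return el


emphasis = EmphasisPattern(r'\*([^*]+)\*')
m = emphasis.getCompiledRegExp().match('some *emphasized* text')
print(m.groups())
# ('some ', 'emphasized', ' text')
print(etree.tostring(emphasis.handleMatch(m), encoding='unicode'))
# <em>emphasized</em>
```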
+In fact, it is not necessary to create that pattern at all (and not only because
+a more sophisticated emphasis pattern already exists in Markdown). That example
+pattern is not very DRY: a pattern for `**strong**` text would be almost
+identical, except that it would create a `strong` element.
+Therefore, Markdown provides a number of generic pattern classes that
+provide common functionality. For example, both emphasis and strong are
+implemented with separate instances of the `SimpleTagPattern` listed below.
+Feel free to use or extend any of these Pattern classes.
+**Generic Pattern Classes**
+* `SimpleTextPattern(pattern)`:
+    Returns simple text of `group(2)` of a `pattern`.
+* `SimpleTagPattern(pattern, tag)`:
+    Returns an element of type "`tag`" with a text attribute of `group(3)`
+    of a `pattern`. `tag` should be a string of an HTML element (e.g. 'em').
+* `SubstituteTagPattern(pattern, tag)`:
+    Returns an element of type "`tag`" with no children or text (e.g. 'br').
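
The sketch below shows how a single `SimpleTagPattern`-style class can serve both emphasis and strong. The classes are hypothetical stand-ins, not the real Markdown ones. The regexes are also illustrative assumptions. They show one way `group(3)` can end up holding the text: the delimiter is the pattern's first group, which becomes `group(2)` after wrapping, so the closing delimiter is the back-reference `\2`.

```python
import re
import xml.etree.ElementTree as etree


class Pattern:
    """Hypothetical stand-in for markdown.Pattern (wraps the regex so
    that group(1) is the text before the match)."""

    def __init__(self, pattern):
        self.compiled_re = re.compile(r'^(.*?)' + pattern + r'(.*?)$', re.DOTALL)

    def getCompiledRegExp(self):
        return self.compiled_re


class SimpleTagPattern(Pattern):
    """Sketch of the generic class described above: one class, any tag."""

    def __init__(self, pattern, tag):
        Pattern.__init__(self, pattern)
        self.tag = tag

    def handleMatch(self, m):
        el = etree.Element(self.tag)
        el.text = m.group(3)  # group(2) is the delimiter, group(3) the text
        return el


# Hypothetical regexes for this sketch.
EMPHASIS_RE = r'(\*)([^*]+)\2'
STRONG_RE = r'(\*\*)([^*]+)\2'

emphasis = SimpleTagPattern(EMPHASIS_RE, 'em')
strong = SimpleTagPattern(STRONG_RE, 'strong')

for pattern, text in [(emphasis, 'an *em* word'), (strong, 'a **bold** word')]:
    m = pattern.getCompiledRegExp().match(text)
    print(etree.tostring(pattern.handleMatch(m), encoding='unicode'))
# <em>em</em>
# <strong>bold</strong>
```

Only the regex and the tag name differ between the two instances, which is the duplication the generic classes are meant to remove.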
+There may be other Pattern classes in the Markdown source that you could extend
+or use as well. Read through the source and see if there is anything you can
+use. You might even get a few ideas for different approaches to your specific
+problem.
+<h3 id="postprocessors">Postprocessors</h3>
+Postprocessors manipulate a document after it has passed through the Markdown
+core. This is where stored text gets added back in, such as a list of footnotes,
+a table of contents or raw HTML.
+There are two types of postprocessors: [DOM Postprocessors][] and
+[TextPostprocessors](#textpostprocessors).
+<h4 id="dompostprocessors">DOM Postprocessors</h4>
+A DOM Postprocessor should inherit from `markdown.Postprocessor` and override
+the `run` method, which takes one argument, `root`, and should return either
+that root element or a modified root element.
+A pseudo example:
+    class MyPostprocessor(markdown.Postprocessor):
+        def run(self, root):
+            # do stuff: modify root in place (or build a new tree)
+            return root
+For specifics on manipulating the DOM, see [Working with the DOM][] below.
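
The sketch below is a concrete DOM postprocessor. It is a hypothetical example that adds `rel="nofollow"` to every link. It subclasses a hypothetical stand-in for `markdown.Postprocessor` and uses the standard library's `xml.etree.ElementTree` so that it runs on its own.

```python
import xml.etree.ElementTree as etree


class Postprocessor:
    """Hypothetical stand-in for markdown.Postprocessor: run() receives
    the root element and returns a root element."""

    def run(self, root):
        return root


class NofollowPostprocessor(Postprocessor):
    """Hypothetical example: add rel="nofollow" to every link."""

    def run(self, root):
        for link in root.iter('a'):
            link.set('rel', 'nofollow')
        return root  # the same root, modified in place


root = etree.fromstring('<div><p>See <a href="http://example.com">this</a>.</p></div>')
root = NofollowPostprocessor().run(root)
print(etree.tostring(root, encoding='unicode'))
# <div><p>See <a href="http://example.com" rel="nofollow">this</a>.</p></div>
```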
+<h4 id="textpostprocessors">TextPostprocessors</h4>
+A TextPostprocessor should inherit from `markdown.TextPostprocessor` and
+override the `run` method, which takes one argument, `text`, and returns a
+Unicode string.
+TextPostprocessors are run after the DOM has been serialized back into Unicode
+text. For example, this may be an appropriate place to add a table of contents
+to a document:
+    class TocTextPostprocessor(markdown.TextPostprocessor):
+        def run(self, text):
+            return MYMARKERRE.sub(MyToc, text)
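
The sketch below fills in the example above under stated assumptions. The marker the author places in the document is assumed to serialize as `<p>[TOC]</p>` (a hypothetical convention). `MYMARKERRE` matches that marker, and a flat `<ul>` is built from the serialized headings. The base class is a hypothetical stand-in for `markdown.TextPostprocessor`.

```python
import re


class TextPostprocessor:
    """Hypothetical stand-in for markdown.TextPostprocessor: run()
    receives the serialized document and returns a string."""

    def run(self, text):
        return text


# Hypothetical marker an author would place where the TOC should appear.
MYMARKERRE = re.compile(r'<p>\[TOC\]</p>')
HEADER_RE = re.compile(r'<h([1-6])>(.*?)</h\1>')


class TocTextPostprocessor(TextPostprocessor):
    def run(self, text):
        # Collect heading text in document order (nesting ignored here).
        items = ''.join('<li>%s</li>' % title for _, title in HEADER_RE.findall(text))
        toc = '<ul>%s</ul>' % items
        # A function replacement avoids backslash handling in the TOC text.
        return MYMARKERRE.sub(lambda m: toc, text)


html = '<p>[TOC]</p>\n<h2>Install</h2>\n<p>Text.</p>\n<h2>Usage</h2>'
print(TocTextPostprocessor().run(html))
# the <p>[TOC]</p> line becomes <ul><li>Install</li><li>Usage</li></ul>
```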
+<h3 id="working_with_dom">Working with the DOM</h3>
+As mentioned, the Markdown parser converts a source document to an 
+[ElementTree][] DOM object before serializing that back to Unicode text. 
+Markdown has provided some helpers to ease that manipulation within the context 
+of the Markdown module...
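
The Markdown-specific helpers are not documented here yet. In the meantime, the sketch below shows the plain `xml.etree.ElementTree` calls (Python standard library) that this kind of manipulation builds on. It does not use any Markdown helper. Note how text that follows a child element is stored in that child's `tail`, not in the parent's `text`.

```python
import xml.etree.ElementTree as etree

# Build <div><p>Hello <em>world</em>!</p></div> by hand.
div = etree.Element('div')
p = etree.SubElement(div, 'p')
p.text = 'Hello '          # text before the first child
em = etree.SubElement(p, 'em')
em.text = 'world'
em.tail = '!'              # text after </em>, still inside <p>

print(etree.tostring(div, encoding='unicode'))
# <div><p>Hello <em>world</em>!</p></div>

# Find and modify existing elements.
for el in div.iter('em'):
    el.tag = 'strong'
print(etree.tostring(div, encoding='unicode'))
# <div><p>Hello <strong>world</strong>!</p></div>
```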
+<h3 id="integrating_into_markdown">Integrating Your Code Into Markdown</h3>
+Once you have the various pieces of your extension built, you need to tell 
+Markdown about them and ensure that they are run in the proper sequence. 
+Markdown accepts an `Extension` instance for each extension. Therefore, you
+will need to define a class that extends `markdown.Extension` and overrides
+the `extendMarkdown` method. Within this class you will manage configuration 
+options for your extension and attach the various processors and patterns to 
+the Markdown instance. 
+It is important to note that the order of the various processors and patterns 
+matters. For example, if we replace `http://...` links with `<a>` elements, and
+*then* try to deal with inline HTML, we will end up with a mess. Therefore,
+the various types of processors and patterns are stored in lists on the
+Markdown instance. Your `Extension` class will need to manipulate
+those lists appropriately. You may insert instances of your processors and
+patterns into the appropriate location in a list, remove a built-in instance,
+or replace a built-in instance with your own.
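
The sketch below makes the ordering problem concrete. It uses two hypothetical text processors, one that turns bare `http://` URLs into `<a>` elements and one that escapes raw HTML, and runs them in both orders. Plain Python lists stand in for Markdown's processor lists.

```python
import re


class AutolinkProcessor:
    """Hypothetical: wrap bare http:// URLs in <a> elements."""

    URL_RE = re.compile(r'(http://\S+)')

    def run(self, text):
        return self.URL_RE.sub(r'<a href="\1">\1</a>', text)


class EscapeHtmlProcessor:
    """Hypothetical: escape raw HTML so it is displayed literally."""

    def run(self, text):
        return text.replace('&', '&amp;').replace('<', '&lt;').replace('>', '&gt;')


def run_all(processors, text):
    # Each processor sees the output of the one before it.
    for processor in processors:
        text = processor.run(text)
    return text


source = 'See http://example.com'
print(run_all([EscapeHtmlProcessor(), AutolinkProcessor()], source))
# See <a href="http://example.com">http://example.com</a>
print(run_all([AutolinkProcessor(), EscapeHtmlProcessor()], source))
# See &lt;a href="http://example.com"&gt;http://example.com&lt;/a&gt;

# Placing a new processor relative to an existing one in a list:
processors = [EscapeHtmlProcessor()]
processors.insert(1, AutolinkProcessor())  # run after the escaper, not before
```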
+<h4 id="extendmarkdown">`extendMarkdown`</h4>
+The `extendMarkdown` method of a `markdown.Extension` class accepts two arguments:
+* `md`:
+    A pointer to the instance of the Markdown class. You should use this to 
+    access the lists of processors and patterns. They are found under the 
+    following attributes:
+    * `md.textPreprocessors`
+    * `md.preprocessors`
+    * `md.inlinePatterns`
+    * `md.postprocessors`
+    * `md.textPostprocessors`
+    Some other things you may want to access in the markdown instance are:
+    * `md.inlineStash`
+    * `md.htmlStash`
+    * `md.registerExtension()`
+* `md_globals`:
+    Contains all the various global variables within the markdown module.
+Of course, with access to those items, you theoretically have the option of
+changing anything through various monkeypatching techniques. However, you should
+be aware that the various undocumented or private parts of Markdown may change
+without notice and your monkeypatches may no longer work. Therefore, what you
+really should be doing is inserting processors and patterns into the lists on
+the Markdown instance.
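
Here is a minimal `extendMarkdown` sketch under stated assumptions. `Extension` and `FakeMarkdown` are hypothetical stand-ins, and `FakeMarkdown` holds only the `textPostprocessors` list. `FooterPostprocessor` is an illustrative processor. The point is the shape of the method: it receives the Markdown instance and attaches its processors to the instance's lists.

```python
class Extension:
    """Hypothetical stand-in for markdown.Extension."""

    def extendMarkdown(self, md, md_globals):
        raise NotImplementedError


class FakeMarkdown:
    """Hypothetical stand-in for a Markdown instance, holding only one of
    the processor lists named above."""

    def __init__(self):
        self.textPostprocessors = []


class FooterPostprocessor:
    """Hypothetical text postprocessor: appends an HTML comment."""

    def run(self, text):
        return text + '\n<!-- generated by Markdown -->'


class FooterExtension(Extension):
    def extendMarkdown(self, md, md_globals):
        # Append, so it runs after any text postprocessors already present.
        md.textPostprocessors.append(FooterPostprocessor())


md = FakeMarkdown()
FooterExtension().extendMarkdown(md, globals())

text = '<p>Hello</p>'
for processor in md.textPostprocessors:
    text = processor.run(text)
print(text)
# <p>Hello</p>
# <!-- generated by Markdown -->
```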
+<h4 id="configsettings">Config Settings</h4>
+If an extension uses any parameters that the user may want to change,
+those parameters should be stored in `self.config` of your `markdown.Extension`
+class in the following format:
+    self.config = {parameter_1_name : [value1, description1],
+                   parameter_2_name : [value2, description2] }
+When stored this way, the config parameters can be overridden from the
+command line or at the time Markdown is initialized:
+    -x myextension(SOME_PARAM=2) inputfile.txt > output.txt
+Note that parameters should always be assumed to be set to string
+values, and should be converted at run time. For example:
+    i = int(self.getConfig("SOME_PARAM"))
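
The sketch below shows the config format in use, with a hypothetical stand-in for the base class's config handling. `getConfig` matches the call used above. The `setConfig` helper and the way overrides are applied in `__init__` are assumptions of this sketch. Overrides arrive as strings, as they would from the command line, so the value is converted when it is used.

```python
class Extension:
    """Hypothetical stand-in for markdown.Extension's config handling."""

    def __init__(self, configs=None):
        # configs: a dict of overrides; values are assumed to be strings.
        for key, value in (configs or {}).items():
            self.setConfig(key, value)

    def getConfig(self, key):
        return self.config[key][0]

    def setConfig(self, key, value):
        self.config[key][0] = value


class MyExtension(Extension):
    def __init__(self, configs=None):
        # {name: [default value, description]}, set before applying overrides
        self.config = {'SOME_PARAM': ['1', 'How many times to do something']}
        Extension.__init__(self, configs)


ext = MyExtension(configs={'SOME_PARAM': '2'})
i = int(ext.getConfig('SOME_PARAM'))  # convert at run time
print(i + 1)
# 3
```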
+<h4 id="makeextension">`makeExtension`</h4>
+Each extension should ideally be placed in its own module starting
+with the ``mdx_`` prefix (e.g. ````). The module must
+provide a module-level function called ``makeExtension`` that takes
+an optional parameter consisting of a dictionary of configuration overrides
+and returns an instance of the extension. An example from the footnote extension:
+    def makeExtension(configs=None):
+        return FootnoteExtension(configs=configs)
+By following the above example, when Markdown is passed the name of your
+extension as a string (e.g. ``'footnotes'``), it will automatically import
+the module and call the ``makeExtension`` function to initialize your extension.
+However, Markdown will also accept an already existing instance of an extension. For example:
+    import markdown, mdx_myextension
+    configs = {...}
+    myext = mdx_myextension.MyExtension(configs=configs)
+    md = markdown.Markdown(extensions=[myext])
+This is useful if you need to implement a large number of extensions with more
+than one residing in a module.
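
The sketch below imitates both loading paths. It is a hypothetical `load_extension` function, not Markdown's actual loader. Given a string, it imports `mdx_<name>` and calls `makeExtension(configs=...)`. Given an existing instance, it returns that instance unchanged. To avoid needing a file on disk, the sketch registers an in-memory module named `mdx_myextension` (also hypothetical) in `sys.modules`.

```python
import importlib
import sys
import types


class MyExtension:
    """Hypothetical extension class for this sketch."""

    def __init__(self, configs=None):
        self.configs = configs or {}


def makeExtension(configs=None):
    return MyExtension(configs=configs)


# Normally this would be a file named mdx_myextension.py on the path.
module = types.ModuleType('mdx_myextension')
module.makeExtension = makeExtension
sys.modules['mdx_myextension'] = module


def load_extension(name, configs=None):
    """Hypothetical loader: accept an instance as-is, or import mdx_<name>
    and call its makeExtension()."""
    if not isinstance(name, str):
        return name  # already an extension instance
    ext_module = importlib.import_module('mdx_' + name)
    return ext_module.makeExtension(configs=configs)


ext = load_extension('myextension', {'SOME_PARAM': '2'})
print(type(ext).__name__, ext.configs)
# MyExtension {'SOME_PARAM': '2'}
print(load_extension(ext) is ext)
# True
```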
+[Preprocessors]: #preprocessors
+[TextPreprocessors]: #textpreprocessors
+[Line Preprocessors]: #linepreprocessors
+[InlinePatterns]: #inlinepatterns
+[Postprocessors]: #postprocessors
+[DOM Postprocessors]: #dompostprocessors
+[TextProstprocessors]: #textpostprocessors
+[Working with the DOM]: #working_with_dom
+[Integrating your code into Markdown]: #integrating_into_markdown
+[extendMarkdown]: #extendmarkdown
+[Config Settings]: #configsettings
+[makeExtension]: #makeextension