Geoffrey Sneddon committed ec226b1

Name change.

Files changed (24)

 <!DOCTYPE html><html lang=en-gb-x-sneddy><head><meta charset=utf-8>
-<title>spec-gen 1.0</title>
+<title>anolis 1.0</title>
 <link href=style.css rel=stylesheet>
 <style>
 a:not([href]) {
 </style>
 
 </head><body><header>
-	<h1>spec-gen 1.0</h1>
-	<h2 class="no-num no-toc">Documentation — 27 August 2008</h2>
+	<h1>anolis 1.0</h1>
+	<h2 class="no-num no-toc">Documentation — 28 August 2008</h2>
 </header>
 
 <h2 class="no-num no-toc" id=contents>Contents</h2>
 <!--begin-toc-->
 <ol class=toc>
 	<li><a href=#introduction><span class=secno>1 </span>Introduction</a></li>
-	<li><a href=#installing-the-spec-gen><span class=secno>2 </span>Installing the spec-gen</a>
+	<li><a href=#installing-anolis><span class=secno>2 </span>Installing anolis</a>
 		<ol>
 			<li><a href=#requirements><span class=secno>2.1 </span>Requirements</a></li>
 			<li><a href=#obtaining-a-copy><span class=secno>2.2 </span>Obtaining a copy</a></li>
 			<li><a href=#installation><span class=secno>2.3 </span>Installation</a></li>
 			<li><a href=#running-the-test-suite><span class=secno>2.4 </span>Running the test suite</a></li></ol></li>
-	<li><a href=#using-the-spec-gen><span class=secno>3 </span>Using the spec-gen</a></li>
+	<li><a href=#using-anolis><span class=secno>3 </span>Using anolis</a></li>
 	<li><a href=#processes><span class=secno>4 </span>Processes</a>
 		<ol>
 			<li><a href=#cross-referencing><span class=secno>4.1 </span>Cross-referencing</a></li>
 
 <h2 id=introduction><span class=secno>1 </span>Introduction</h2>
 
-<p>The need for the spec-gen came from the need for long technical documents to
+<p>The need for anolis came from the need for long technical documents to
 include niceties such as cross-references and a table of contents for the
 purpose of easy navigation — doing this manually can be a great chore
 especially when sections are numbered and a section is added, consequently
 changing the numbering of many others, leading to it being advantageous to do it
 programmatically.</p>
 
-<p>The spec-gen does this on HTML documents, as a number of sequential
-processes. Currently cross-referencing, section numbering, table of contents
-creation, and a number of substitutions are done (mainly relating to the current
+<p>Anolis does this for HTML documents as a series of sequential processes.
+It currently performs cross-referencing, section numbering, table of contents
+generation, and a number of substitutions (mainly relating to the current
 date).</p>
 
-<h2 id=installing-the-spec-gen><span class=secno>2 </span>Installing the spec-gen</h2>
+<h2 id=installing-anolis><span class=secno>2 </span>Installing anolis</h2>
 
 <h3 id=requirements><span class=secno>2.1 </span>Requirements</h3>
 
 <h3 id=obtaining-a-copy><span class=secno>2.2 </span>Obtaining a copy</h3>
 
 <p>Releases are occasionally made. A link to the latest release can be found
-from the <a href=http://spec-gen.gsnedders.com>spec-gen website</a>.</p>
+from the <a href=http://anolis.gsnedders.com>anolis website</a>.</p>
 
 <p>Alternatively, a copy can be obtained from <dfn id=our-mercurial-repository>our <a href=http://www.selenic.com/mercurial/>Mercurial</a> repository</dfn>: this is
 where our ongoing development occurs, and allows any revision (and therefore any
 release) to be downloaded. Our repository is located at
-<code><!--begin-link--><a href=http://hg.gsnedders.com/spec-gen/>http://hg.gsnedders.com/spec-gen/</a><!--end-link--></code>.
+<code><!--begin-link--><a href=http://hg.gsnedders.com/anolis/>http://hg.gsnedders.com/anolis/</a><!--end-link--></code>.
 
 </p><h3 id=installation><span class=secno>2.3 </span>Installation</h3>
 
 
 <p><code>python runtests.py</code></p>
 
-<p>Any test failures should be reported at our <dfn id=bug-tracker><a href=http://bugs.gsnedders.com/projects/show/spec-gen>bug
-tracker</a></dfn>.</p>
+<p>Any test failures should be reported at our <dfn id=bug-tracker><a href=http://bugs.gsnedders.com/projects/show/anolis>bug tracker</a></dfn>.</p>
 
-<h2 id=using-the-spec-gen><span class=secno>3 </span>Using the spec-gen</h2>
+<h2 id=using-anolis><span class=secno>3 </span>Using anolis</h2>
 
-<p>The spec-gen is invoked through the <code>spec-gen</code> command. The
+<p>Anolis is invoked through the <code>anolis</code> command. The
 <dfn id=help><code>--help</code></dfn> (or <dfn id=h><code>-h</code></dfn>) option gives some
 basic help.</p>
 
 <a href=#fatal-error>fatal error</a>)<!--; passing the XXX: need double hyphen
 <dfn><code>xml</code></dfn> option uses libxml2's XML parser instead-->.</p>
 
-<p>The spec-gen offers a <dfn id=compatibility-mode>compatibility mode</dfn>, which aims to be
-compatible with the <a href=http://www.w3.org/Style/Group/css3-src/bin/postprocess>CSS3 module
-postprocessor</a> (within reason). This is mainly provided for the sake of
-pre-existing <a href=http://w3.org/>W3C</a> documents. The
+<p>Anolis offers a <dfn id=compatibility-mode>compatibility mode</dfn>, which aims to be compatible
+with the <a href=http://www.w3.org/Style/Group/css3-src/bin/postprocess>CSS3
+module postprocessor</a> (within reason). This is mainly provided for the sake
+of pre-existing <a href=http://w3.org/>W3C</a> documents. The
 <dfn id=w3c-compat><code>--w3c-compat</code></dfn> option turns on this compatibility mode,
 although specific options that turn on just one compatibility feature at a time
 are also available (and are documented below under each <a href=#processes title=processes>process</a>) — these are all implied by the
 <code>dfn</code> element: the <a href=#definition>definition</a> itself is taken from the
 <code>title</code> attribute if it is present, otherwise it is taken from the
 <a href=#textcontent>textContent</a> property of the <code>dfn</code> element. By default,
-the spec-gen will throw a <a href=#fatal-error>fatal error</a> if a <a href=#term>term</a> is
-defined more than once: this behaviour can be turned off (causing the final
+anolis will throw a <a href=#fatal-error>fatal error</a> if a <a href=#term>term</a> is defined
+more than once: this behaviour can be turned off (causing the final
 <a href=#definition>definition</a> of the <a href=#term>term</a> to be the one that is used) by
 the <dfn id=allow-duplicate-dfns><code>--allow-duplicate-dfns</code></dfn> option.
 
 which this is partially based, and (with <a href=#w3c-compat><code>--w3c-compat</code></a>) claims to be
 partially compatible with. Further special thanks to Bert Bos for creating a
 number of things (especially as the algorithm for finding the <a href=#w3c-status>W3C
-status</a>) that took the author of the spec-gen many hours to reverse
+status</a>) that took the author of anolis many hours to reverse
 engineer.</p>
 </body></html>
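The cross-referencing rules documented above (a term comes from the dfn's title attribute if present, otherwise from its textContent; a repeated term is a fatal error unless --allow-duplicate-dfns is given, in which case the last definition wins) can be sketched with the standard library's ElementTree. This is an illustration under assumptions, not anolis's code: DuplicateTermError and collect_terms are hypothetical names, and no term normalization (case, whitespace, or the --w3c-compat-xref-normalization rules) is attempted.

```python
import xml.etree.ElementTree as ET

class DuplicateTermError(Exception):
    """Hypothetical stand-in for anolis's fatal error on duplicate terms."""

def collect_terms(root, allow_duplicate_dfns=False):
    """Map each defined term to its dfn element (sketch, not anolis's API)."""
    terms = {}
    for dfn in root.iter("dfn"):
        # The title attribute wins; otherwise use the element's text content,
        # including text inside child elements such as <code>.
        term = dfn.get("title")
        if term is None:
            term = "".join(dfn.itertext())
        if term in terms and not allow_duplicate_dfns:
            raise DuplicateTermError("term defined more than once: %r" % term)
        # With duplicates allowed, later definitions overwrite earlier ones.
        terms[term] = dfn
    return terms
```

Using a dict keyed by term makes "the last definition wins" fall out of plain assignment order.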
 <!doctype html>
 <html lang="en-gb-x-sneddy">
 <meta charset="utf-8">
-<title>spec-gen 1.0</title>
+<title>anolis 1.0</title>
 <link rel="stylesheet" href="style.css">
 <style>
 a:not([href]) {
 
 <h2>Introduction</h2>
 
-<p>The need for the spec-gen came from the need for long technical documents to
+<p>The need for anolis came from the need for long technical documents to
 include niceties such as cross-references and a table of contents for the
 purpose of easy navigation — doing this manually can be a great chore
 especially when sections are numbered and a section is added, consequently
 changing the numbering of many others, leading to it being advantageous to do it
 programmatically.</p>
 
-<p>The spec-gen does this on HTML documents, as a number of sequential
-processes. Currently cross-referencing, section numbering, table of contents
-creation, and a number of substitutions are done (mainly relating to the current
+<p>Anolis does this for HTML documents as a series of sequential processes.
+It currently performs cross-referencing, section numbering, table of contents
+generation, and a number of substitutions (mainly relating to the current
 date).</p>
 
-<h2>Installing the spec-gen</h2>
+<h2>Installing anolis</h2>
 
 <h3>Requirements</h3>
 
 <h3>Obtaining a copy</h3>
 
 <p>Releases are occasionally made. A link to the latest release can be found
-from the <a href="http://spec-gen.gsnedders.com">spec-gen website</a>.</p>
+from the <a href="http://anolis.gsnedders.com">anolis website</a>.</p>
 
 <p>Alternatively, a copy can be obtained from <dfn>our <a
 href="http://www.selenic.com/mercurial/">Mercurial</a> repository</dfn>: this is
 where our ongoing development occurs, and allows any revision (and therefore any
 release) to be downloaded. Our repository is located at
-<code><!--begin-link-->http://hg.gsnedders.com/spec-gen/<!--end-link--></code>.
+<code><!--begin-link-->http://hg.gsnedders.com/anolis/<!--end-link--></code>.
 
 <h3>Installation</h3>
 
 <p><code>python runtests.py</code></p>
 
 <p>Any test failures should be reported at our <dfn><a
-href="http://bugs.gsnedders.com/projects/show/spec-gen">bug
-tracker</a></dfn>.</p>
+href="http://bugs.gsnedders.com/projects/show/anolis">bug tracker</a></dfn>.</p>
 
-<h2>Using the spec-gen</h2>
+<h2>Using anolis</h2>
 
-<p>The spec-gen is invoked through the <code>spec-gen</code> command. The
+<p>Anolis is invoked through the <code>anolis</code> command. The
 <dfn><code>--help</code></dfn> (or <dfn><code>-h</code></dfn>) option gives some
 basic help.</p>
 
 <span>fatal error</span>)<!--; passing the XXX: need double hyphen
 <dfn><code>xml</code></dfn> option uses libxml2's XML parser instead-->.</p>
 
-<p>The spec-gen offers a <dfn>compatibility mode</dfn>, which aims to be
-compatible with the <a
-href="http://www.w3.org/Style/Group/css3-src/bin/postprocess">CSS3 module
-postprocessor</a> (within reason). This is mainly provided for the sake of
-pre-existing <a href="http://w3.org/">W3C</a> documents. The
+<p>Anolis offers a <dfn>compatibility mode</dfn>, which aims to be compatible
+with the <a href="http://www.w3.org/Style/Group/css3-src/bin/postprocess">CSS3
+module postprocessor</a> (within reason). This is mainly provided for the sake
+of pre-existing <a href="http://w3.org/">W3C</a> documents. The
 <dfn><code>--w3c-compat</code></dfn> option turns on this compatibility mode,
 although specific options that turn on just one compatibility feature at a time
 are also available (and are documented below under each <span
 <code>dfn</code> element: the <span>definition</span> itself is taken from the
 <code>title</code> attribute if it is present, otherwise it is taken from the
 <span>textContent</span> property of the <code>dfn</code> element. By default,
-the spec-gen will throw a <span>fatal error</span> if a <span>term</span> is
-defined more than once: this behaviour can be turned off (causing the final
+anolis will throw a <span>fatal error</span> if a <span>term</span> is defined
+more than once: this behaviour can be turned off (causing the final
 <span>definition</span> of the <span>term</span> to be the one that is used) by
 the <dfn><code>--allow-duplicate-dfns</code></dfn> option.
 
 <p>Special thanks to Bert Bos for creating the CSS3 Module Postprocessor, on
 which this is partially based, and (with <code>--w3c-compat</code>) claims to be
 partially compatible with. Further special thanks to Bert Bos for creating a
-number of things (especially as the algorithm for finding the <span>W3C
-status</span>) that took the author of the spec-gen many hours to reverse
+number of things (especially the algorithm for finding the <span>W3C
+status</span>) that took the author of anolis many hours to reverse
 engineer.</p>
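The documentation says the substitutions mainly relate to the current date. The sub.py module added in this commit defines [YEAR], [DATE] and [CDATE] placeholder patterns, filled with the formats %Y, %d %B %Y (leading zero stripped) and %Y%m%d respectively. A minimal standard-library sketch of applying them to a plain string follows; substitute_dates is a hypothetical helper (the real process works on text inside an lxml tree), and it takes an optional struct_time so the output can be checked.

```python
import re
import time

# Same patterns as sub.py: a placeholder may carry extra text before the "]".
YEAR = re.compile(r"\[YEAR[^\]]*\]")
DATE = re.compile(r"\[DATE[^\]]*\]")
CDATE = re.compile(r"\[CDATE[^\]]*\]")

def substitute_dates(text, now=None):
    """Replace date placeholders in a string (sketch, not anolis's API)."""
    t = time.gmtime() if now is None else now
    subs = (
        (YEAR, time.strftime("%Y", t)),
        # "05 August 2008" becomes "5 August 2008".
        (DATE, time.strftime("%d %B %Y", t).lstrip("0")),
        (CDATE, time.strftime("%Y%m%d", t)),
    )
    for pattern, replacement in subs:
        text = pattern.sub(replacement, text)
    return text
```

The DATE pattern does not swallow [CDATE], because it requires "[" immediately before "DATE". Month names come from %B, so they follow the C locale unless the program changes LC_TIME.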
+#!/usr/bin/env python
+"""usage: anolis [options] input output
+
+Post-process a document, adding cross-references, table of contents, etc.
+"""
+
+import cProfile
+from optparse import OptionParser, SUPPRESS_HELP
+import sys
+import html5lib
+from html5lib import treebuilders, treewalkers, serializer
+import lxml.html
+from lxml import etree
+
+from anolislib import generator, utils
+
+def main():
+	# Create the options parser
+	optParser = getOptParser()
+	opts, args = optParser.parse_args()
+	
+	# Check we have enough arguments
+	if len(args) >= 2:
+		try:
+			# Get input
+			input = file(args[0], "r")
+			
+			# Parse as XML:
+			#if opts.xml:
+			if False:
+				tree = etree.parse(input)
+			# Parse as HTML using lxml.html
+			elif opts.lxml_html:
+				tree = lxml.html.parse(input)
+			# Parse as HTML using html5lib
+			else:
+				parser = html5lib.HTMLParser(tree=treebuilders.getTreeBuilder("lxml", etree))
+				tree = parser.parse(input)
+			
+			# Close the input file
+			input.close()
+			
+			# Remove the option we pass as an argument
+			processes = opts.processes
+			del opts.processes
+			
+			# Turn the options into a dict
+			kwargs = vars(opts)
+			
+			# Run the generator, and profile, or not, as the case may be
+			if kwargs["profile"]:
+				cProfile.runctx("gen.process(tree, processes, **kwargs)", {}, {"gen": generator, "tree": tree, "processes": processes, "kwargs": kwargs})
+			else:
+				generator.process(tree, processes, **kwargs)
+			
+			# Serialize to XML
+			#if opts.xml:
+			if False:
+				rendered = etree.tostring(tree, encoding="utf-8")
+			# Serialize to HTML using lxml.html
+			elif opts.lxml_html:
+				rendered = lxml.html.tostring(tree, encoding="utf-8")
+			# Serialize to HTML using html5lib
+			else:
+				walker = treewalkers.getTreeWalker("lxml")
+				s = serializer.htmlserializer.HTMLSerializer(**kwargs)
+				rendered = s.render(walker(tree), encoding="utf-8")
+			
+			# Get the output
+			output = file(args[1], "w")
+			
+			# Write to the output
+			output.write(rendered)
+			
+			# Close the output
+			output.close()
+		except (utils.AnolisException, IOError, etree.XMLSyntaxError), e:
+			sys.stderr.write(unicode(e) + u"\n")
+			sys.exit(1)
+	else:
+		sys.stderr.write(u"anolis expects two arguments. Use -h for help\n")
+		sys.exit(2)
+
+def getOptParser():
+	parser = OptionParser(usage = __doc__, version="%prog 1.0")
+	
+	parser.add_option("", "--enable", action="callback", callback=enable,
+		type="string", dest="processes", help="Enable the process given as the option value")
+	
+	parser.add_option("", "--disable", action="callback", callback=disable,
+		type="string", help="Disable the process given as the option value")
+	
+	#parser.add_option("", "", action="store_true",
+	#	dest="xml", help="Use an XML parser/serializer.")
+	
+	parser.add_option("", "--lxml.html", action="store_true",
+		dest="lxml_html", help="Use lxml's HTML parser/serializer.")
+	
+	parser.add_option("", "--newline-char", action="store", type="string",
+		dest="newline_char", help="Set the newline character or string used when inserting new line breaks. This should match the newlines used in the rest of the document.")
+	
+	parser.add_option("", "--indent-char", action="store", type="string",
+		dest="indent_char", help="Set the character or string used when indenting newly created blocks of (X)HTML. This should match the indentation used in the rest of the document.")
+	
+	parser.add_option("", "--force-html4-id", action="store_true",
+		dest="force_html4_id", help="Force the ID generation algorithm to create HTML 4 compliant IDs regardless of the DOCTYPE.")
+	
+	parser.add_option("", "--min-depth", action="store", type="int",
+		default=2, dest="min_depth", help="Highest-ranking heading level to number and insert into the TOC.")
+	
+	parser.add_option("", "--max-depth", action="store", type="int",
+		default=6, dest="max_depth", help="Lowest-ranking heading level to number and insert into the TOC.")
+	
+	parser.add_option("", "--allow-duplicate-dfns", action="store_true",
+		dest="allow_duplicate_dfns", help="Allow multiple definitions of terms when cross-referencing (the last instance of the term is used when referencing it).")
+	
+	parser.add_option("", "--w3c-compat", action="store_true",
+		dest="w3c_compat", help="Behave in a way (mostly) compatible with the W3C CSS WG's Postprocessor (this implies all of the other --w3c-compat options except --w3c-compat-crazy-substitutions, as that is too crazy).")
+	
+	parser.add_option("", "--w3c-compat-xref-elements", action="store_true",
+		dest="w3c_compat_xref_elements", help="Uses the same list of elements to look for cross-references in as the W3C CSS WG's Postprocessor, even when the elements shouldn't semantically be used for cross-reference terms.")
+	
+	parser.add_option("", "--w3c-compat-xref-a-placement", action="store_true",
+		dest="w3c_compat_xref_a_placement", help="When cross-referencing elements apart from span, put the a element inside the element instead of outside the element.")
+	
+	parser.add_option("", "--w3c-compat-xref-normalization", action="store_true",
+		dest="w3c_compat_xref_normalization", help="Only use ASCII letters, numbers, and spaces in comparison of cross-reference terms.")
+	
+	parser.add_option("", "--w3c-compat-class-toc", action="store_true",
+		dest="w3c_compat_class_toc", help="Add @class='toc' on every ol element in the table of contents (instead of only the root ol element).")
+	
+	parser.add_option("", "--w3c-compat-substitutions", action="store_true",
+		dest="w3c_compat_substitutions", help="Do W3C specific substitutions.")
+	
+	parser.add_option("", "--w3c-compat-crazy-substitutions", action="store_true",
+		dest="w3c_compat_crazy_substitutions", help="Do crazy W3C specific substitutions, which may cause unexpected behaviour (i.e., replacing random strings within the document with no special marker).")
+	
+	parser.add_option("", "--profile", action="store_true",
+		dest="profile", help=SUPPRESS_HELP)
+	
+	parser.add_option("", "--inject-meta-charset", action="store_true",
+		dest="inject_meta_charset", help=SUPPRESS_HELP)
+	
+	parser.add_option("", "--strip-whitespace", action="store_true",
+		dest="strip_whitespace", help=SUPPRESS_HELP)
+
+	parser.add_option("", "--omit-optional-tags", action="store_true",
+		dest="omit_optional_tags", help=SUPPRESS_HELP)
+
+	parser.add_option("", "--quote-attr-values", action="store_true",
+		dest="quote_attr_values", help=SUPPRESS_HELP)
+
+	parser.add_option("", "--use-best-quote-char", action="store_true",
+		dest="use_best_quote_char", help=SUPPRESS_HELP)
+
+	parser.add_option("", "--no-minimize-boolean-attributes",
+		action="store_false", default=True,
+		dest="minimize_boolean_attributes", help=SUPPRESS_HELP)
+
+	parser.add_option("", "--use-trailing-solidus", action="store_true",
+		dest="use_trailing_solidus", help=SUPPRESS_HELP)
+
+	parser.add_option("", "--space-before-trailing-solidus",
+		action="store_true", default=False,
+		dest="space_before_trailing_solidus", help=SUPPRESS_HELP)
+
+	parser.add_option("", "--escape-lt-in-attrs", action="store_true",
+		dest="escape_lt_in_attrs", help=SUPPRESS_HELP)
+
+	parser.add_option("", "--escape-rcdata", action="store_true",
+		dest="escape_rcdata", help=SUPPRESS_HELP)
+	
+	parser.set_defaults(
+		processes=set(["sub", "xref", "toc"]),
+		xml=False,
+		lxml_html=False,
+		newline_char=u"\n",
+		indent_char=u"\t",
+		force_html4_id=False,
+		min_depth=2,
+		max_depth=6,
+		allow_duplicate_dfns=False,
+		w3c_compat=False,
+		w3c_compat_xref_elements=False,
+		w3c_compat_xref_a_placement=False,
+		w3c_compat_xref_normalization=False,
+		w3c_compat_class_toc=False,
+		w3c_compat_substitutions=False,
+		w3c_compat_crazy_substitutions=False,
+		profile=False,
+		inject_meta_charset=False,
+		strip_whitespace=False,
+		omit_optional_tags=False,
+		quote_attr_values=False,
+		use_best_quote_char=False,
+		minimize_boolean_attributes=True,
+		use_trailing_solidus=False,
+		space_before_trailing_solidus=False,
+		escape_lt_in_attrs=False,
+		escape_rcdata=False
+	)
+
+	return parser
+
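+def enable(option, opt_str, value, parser, *args, **kwargs):
+	# value is the process name; opt_str is just "--enable".
+	# Build a new set so the shared default set is not mutated.
+	parser.values.processes = parser.values.processes | set([value])
+
+def disable(option, opt_str, value, parser, *args, **kwargs):
+	parser.values.processes = parser.values.processes - set([value])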
+
+if __name__ == "__main__":
+	main()
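In optparse, a callback receives (option, opt_str, value, parser): opt_str is the option string as typed (here "--enable" or "--disable") and value is its argument, so the process name lives in value. Below is a Python 3 sketch of just the --enable/--disable handling. It assigns a fresh set rather than mutating the default, because optparse only copies its defaults dict shallowly, so an in-place add() would alter the default set itself. get_opt_parser and DEFAULT_PROCESSES are illustrative names.

```python
from optparse import OptionParser

# Immutable default, so no callback can change it by accident.
DEFAULT_PROCESSES = frozenset(["sub", "xref", "toc"])

def enable(option, opt_str, value, parser):
    # value is the argument (e.g. "toc"); opt_str is only "--enable".
    parser.values.processes = set(parser.values.processes) | {value}

def disable(option, opt_str, value, parser):
    parser.values.processes = set(parser.values.processes) - {value}

def get_opt_parser():
    parser = OptionParser(usage="%prog [options] input output")
    parser.add_option("--enable", action="callback", callback=enable,
                      type="string", dest="processes",
                      help="Enable the process given as the option value")
    parser.add_option("--disable", action="callback", callback=disable,
                      type="string",
                      help="Disable the process given as the option value")
    parser.set_defaults(processes=DEFAULT_PROCESSES)
    return parser
```

For example, parsing ["--disable", "toc", "--enable", "foo", "in.html", "out.html"] leaves processes as {"sub", "xref", "foo"} and the two file names as positional arguments.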

anolislib/__init__.py

+from generator import *

anolislib/generator.py

+# coding=UTF-8
+# Copyright (c) 2008 Geoffrey Sneddon
+# 
+# Permission is hereby granted, free of charge, to any person obtaining a copy
+# of this software and associated documentation files (the "Software"), to deal
+# in the Software without restriction, including without limitation the rights
+# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+# copies of the Software, and to permit persons to whom the Software is
+# furnished to do so, subject to the following conditions:
+# 
+# The above copyright notice and this permission notice shall be included in
+# all copies or substantial portions of the Software.
+# 
+# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+# THE SOFTWARE.
+
+def process(tree, processes=set(["sub", "toc", "xref"]), **kwargs):
+	""" Process the given tree. """
+	
+	# Run each requested process in turn (a set has no guaranteed iteration order)
+	for process in processes:
+		try:
+			process_module = getattr(__import__('processes', globals(), locals(), [process], -1), process)
+		except ImportError:
+			process_module = __import__(process, globals(), locals(), [], -1)
+		
+		getattr(process_module, process)(tree, **kwargs)
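generator.process iterates over a set, and Python makes no guarantee about set iteration order, so sub, toc and xref may run in any order. If a fixed order matters (an assumption; the commit does not say so), one option is to order the requested names against a preferred sequence before dispatching. The sketch below uses a plain dict of callables in place of the __import__ lookup of anolislib.processes.<name>.<name>; run_processes, registry and the default order tuple are all hypothetical.

```python
def run_processes(tree, processes, registry, order=("sub", "toc", "xref"), **kwargs):
    """Run each enabled process on tree in a deterministic order (sketch).

    registry maps a process name to a callable taking (tree, **kwargs); it is a
    hypothetical stand-in for importing anolislib.processes.<name>.
    """
    unknown = set(processes) - set(registry)
    if unknown:
        raise ValueError("unknown process(es): " + ", ".join(sorted(unknown)))
    # Names from the preferred order come first; anything else runs afterwards, sorted.
    ordered = [name for name in order if name in processes]
    ordered += sorted(set(processes) - set(order))
    for name in ordered:
        registry[name](tree, **kwargs)
    return ordered
```

Failing early on an unknown name also gives a clearer message than the ImportError the dynamic import would raise.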

anolislib/processes/__init__.py

Empty file added.

anolislib/processes/outliner.py

+# coding=UTF-8
+# Copyright (c) 2008 Geoffrey Sneddon
+# 
+# Permission is hereby granted, free of charge, to any person obtaining a copy
+# of this software and associated documentation files (the "Software"), to deal
+# in the Software without restriction, including without limitation the rights
+# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+# copies of the Software, and to permit persons to whom the Software is
+# furnished to do so, subject to the following conditions:
+# 
+# The above copyright notice and this permission notice shall be included in
+# all copies or substantial portions of the Software.
+# 
+# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+# THE SOFTWARE.
+
+from lxml import etree
+
+from anolislib import utils
+
+# Rank of heading elements (these are negative so h1 > h6)
+rank = {u"h1": -1, u"h2": -2, u"h3": -3, u"h4": -4, u"h5": -5, u"h6": -6, u"header": -1}
+
+class section(list):
+	"""Represents the section of a document."""
+	
+	header = None
+	
+	def __repr__(self):
+		return "<section %s>" % (repr(self.header))
+
+	def append(self, child):
+		list.append(self, child)
+		child.parent = self
+	
+	def extend(self, children):
+		list.extend(self, children)
+		for child in children:
+			child.parent = self
+
+class Outliner:
+	"""Build the outline of an HTML document."""
+	
+	def __init__(self, ElementTree, **kwargs):
+		self.ElementTree = ElementTree
+		self.stack = []
+		self.outlines = {}
+		self.current_outlinee = None
+		self.current_section = None
+	
+	def build(self, **kwargs):
+		for action, element in etree.iterwalk(self.ElementTree, events=("start", "end")):
+			# If the top of the stack is an element, and you are exiting that element
+			if action == "end" and self.stack and self.stack[-1] == element:
+				# Note: The element being exited is a heading content element.
+				assert element.tag in utils.heading_content
+				# Pop that element from the stack.
+				self.stack.pop()
+			
+			# If the top of the stack is a heading content element
+			elif self.stack and self.stack[-1].tag in utils.heading_content:
+				# Do nothing.
+				pass
+			
+			# When entering a sectioning content element or a sectioning root element
+			elif action == "start" and (element.tag in utils.sectioning_content or element.tag in utils.sectioning_root):
+				# If current outlinee is not null, push current outlinee onto the stack.
+				if self.current_outlinee is not None:
+					self.stack.append(self.current_outlinee)
+				# Let current outlinee be the element that is being entered.
+				self.current_outlinee = element
+				# Let current section be a newly created section for the current outlinee element.
+				self.current_section = section()
+				# Let there be a new outline for the new current outlinee, initialized with just the new current section as the only section in the outline.
+				self.outlines[self.current_outlinee] = [self.current_section]
+				
+			# When exiting a sectioning content element, if the stack is not empty
+			elif action == "end" and element.tag in utils.sectioning_content and self.stack:
+				# Pop the top element from the stack, and let the current outlinee be that element.
+				self.current_outlinee = self.stack.pop()
+				# Let current section be the last section in the outline of the current outlinee element.
+				self.current_section = self.outlines[self.current_outlinee][-1]
+				# Append the outline of the sectioning content element being exited to the current section. (This does not change which section is the last section in the outline.)
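+				# Use extend() rather than += (list.__iadd__ bypasses the overridden extend), so each child's parent is set
+				self.current_section.extend(self.outlines[element])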
+				
+			# When exiting a sectioning root element, if the stack is not empty
+			elif action == "end" and element.tag in utils.sectioning_root and self.stack:
+				# Pop the top element from the stack, and let the current outlinee be that element.
+				self.current_outlinee = self.stack.pop()
+				# Let current section be the last section in the outline of the current outlinee element.
+				self.current_section = self.outlines[self.current_outlinee][-1]
+				# Loop: If current section has no child sections, stop these steps.
+				while self.current_section:
+					# Let current section be the last child section of the current current section.
+					assert self.current_section != self.current_section[-1]
+					self.current_section = self.current_section[-1]
+					# Go back to the substep labeled Loop.
+					
+			# When exiting a sectioning content element or a sectioning root element
+			elif action == "end" and (element.tag in utils.sectioning_content or element.tag in utils.sectioning_root):
+				# Note: The current outlinee is the element being exited.
+				assert self.current_outlinee == element
+				# Let current section be the first section in the outline of the current outlinee element.
+				self.current_section = self.outlines[self.current_outlinee][0]
+				# Skip to the next step in the overall set of steps. (The walk is over.)
+				break
+				
+			# If the current outlinee is null.
+			elif self.current_outlinee is None:
+				# Do nothing.
+				pass
+			
+			# When entering a heading content element
+			elif action == "start" and element.tag in utils.heading_content:
+				# If the current section has no heading, let the element being entered be the heading for the current section.
+				if self.current_section.header is None:
+					self.current_section.header = element
+				
+				# Otherwise, if the element being entered has a rank equal to or greater than the heading of the last section of the outline of the current outlinee, then create a new section and append it to the outline of the current outlinee element, so that this new section is the new last section of that outline. Let current section be that new section. Let the element being entered be the new heading for the current section.
+				elif rank[element.tag] >= rank[self.outlines[self.current_outlinee][-1].header.tag]:
+					self.current_section = section()
+					self.outlines[self.current_outlinee].append(self.current_section)
+					self.current_section.header = element
+				
+				# Otherwise, run these substeps:
+				else:
+					# Let candidate section be current section.
+					candidate_section = self.current_section
+					while True:
+						# If the element being entered has a rank lower than the rank of the heading of the candidate section, then create a new section, and append it to candidate section. (This does not change which section is the last section in the outline.) Let current section be this new section. Let the element being entered be the new heading for the current section. Abort these substeps.
+						if rank[element.tag] < rank[candidate_section.header.tag]:
+							self.current_section = section()
+							candidate_section.append(self.current_section)
+							self.current_section.header = element
+							break
+						# Let new candidate section be the section that contains candidate section in the outline of current outlinee.
+						# Let candidate section be new candidate section.
+						candidate_section = candidate_section.parent
+						# Return to step 2.
+				# Push the element being entered onto the stack. (This causes the algorithm to skip any descendants of the element.)
+				self.stack.append(element)
+		
+		# If the current outlinee is null, then there was no sectioning content element or sectioning root element in the DOM. There is no outline.
+		try:
+			return self.outlines[self.current_outlinee]
+		except KeyError:
+			return None
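The heading-content branch of Outliner.build is the part that turns a flat run of h1 to h6 elements into nested sections. The first heading names the initial section. After that, a heading starts a new top-level section if it ranks at least as high as the last top-level section's heading; otherwise it is nested under the nearest section, walking up from the current one, whose heading outranks it. Below is a Python 3 sketch of only that step, working on a list of tag names. It ignores sectioning elements and the element stack entirely. Section, build_outline and describe are illustrative names. Like section.append and section.extend above, add_child records each child's parent, which the upward walk relies on.

```python
RANK = {"h1": -1, "h2": -2, "h3": -3, "h4": -4, "h5": -5, "h6": -6}

class Section(list):
    """A section: a heading tag name plus a list of child sections."""

    def __init__(self, heading=None):
        super().__init__()
        self.heading = heading
        self.parent = None

    def add_child(self, child):
        child.parent = self
        self.append(child)

def build_outline(headings):
    """Nest a flat sequence of heading tag names into sections (sketch)."""
    sections = [Section()]
    current = sections[0]
    for tag in headings:
        if current.heading is None:
            # First heading: it names the initial section.
            current.heading = tag
        elif RANK[tag] >= RANK[sections[-1].heading]:
            # At least as high as the last top-level heading: new top-level section.
            current = Section(tag)
            sections.append(current)
        else:
            # Walk up from the current section to the first heading that outranks tag.
            candidate = current
            while RANK[tag] >= RANK[candidate.heading]:
                candidate = candidate.parent
            current = Section(tag)
            candidate.add_child(current)
    return sections

def describe(sections):
    """Convert sections into nested (heading, children) tuples for display."""
    return [(s.heading, describe(s)) for s in sections]
```

For example, h1, h2, h3, h2, h1 yields two top-level sections. The first contains an h2 section (with an h3 inside it) followed by a second h2 section. The upward walk always stops at or before the last top-level section, because the elif branch has already handled any heading that ranks at least as high as that section's heading.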

anolislib/processes/sub.py

+# coding=UTF-8
+# Copyright (c) 2008 Geoffrey Sneddon
+# 
+# Permission is hereby granted, free of charge, to any person obtaining a copy
+# of this software and associated documentation files (the "Software"), to deal
+# in the Software without restriction, including without limitation the rights
+# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+# copies of the Software, and to permit persons to whom the Software is
+# furnished to do so, subject to the following conditions:
+# 
+# The above copyright notice and this permission notice shall be included in
+# all copies or substantial portions of the Software.
+# 
+# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+# THE SOFTWARE.
+
+import re
+import time
+from lxml import etree
+from copy import deepcopy
+
+from anolislib import utils
+
+latest_version = re.compile(u"latest[%s]+version" % utils.spaceCharacters, re.IGNORECASE)
+
+w3c_tr_url_status = re.compile(r"http://www\.w3\.org/TR/[^/]*/(MO|WD|CR|PR|REC|PER|NOTE)-")
+
+year = re.compile(r"\[YEAR[^\]]*\]")
+year_sub = time.strftime(u"%Y", time.gmtime())
+year_identifier = u"[YEAR"
+
+date = re.compile(r"\[DATE[^\]]*\]")
+date_sub = time.strftime(u"%d %B %Y", time.gmtime()).lstrip(u"0")
+date_identifier = u"[DATE"
+
+cdate = re.compile(r"\[CDATE[^\]]*\]")
+cdate_sub = time.strftime(u"%Y%m%d", time.gmtime())
+cdate_identifier = u"[CDATE"
+
+title = re.compile(r"\[TITLE[^\]]*\]")
+title_identifier = u"[TITLE"
+
+status = re.compile(r"\[STATUS[^\]]*\]")
+status_identifier = u"[STATUS"
+
+longstatus = re.compile(r"\[LONGSTATUS[^\]]*\]")
+longstatus_identifier = u"[LONGSTATUS"
+longstatus_map = {
+	u"MO": u"W3C Member-only Draft",
+	u"ED": u"Editor's Draft",
+	u"WD": u"W3C Working Draft",
+	u"CR": u"W3C Candidate Recommendation",
+	u"PR": u"W3C Proposed Recommendation",
+	u"REC": u"W3C Recommendation",
+	u"PER": u"W3C Proposed Edited Recommendation",
+	u"NOTE": u"W3C Working Group Note"
+}
+
+w3c_stylesheet = re.compile(r"http://www\.w3\.org/StyleSheets/TR/W3C-[A-Z]+")
+w3c_stylesheet_identifier = u"http://www.w3.org/StyleSheets/TR/W3C-"
+
+string_subs = ((year, year_sub, year_identifier),
+               (date, date_sub, date_identifier),
+               (cdate, cdate_sub, cdate_identifier))
+
+logo = u"logo"
+logo_sub = etree.fromstring(u'<p><a href="http://www.w3.org/"><img alt="W3C" src="http://www.w3.org/Icons/w3c_home"/></a></p>')
+
+copyright = u"copyright"
+copyright_sub = etree.fromstring(u'<p class="copyright"><a href="http://www.w3.org/Consortium/Legal/ipr-notice#Copyright">Copyright</a> &#xA9; ' + time.strftime(u"%Y", time.gmtime()) + u' <a href="http://www.w3.org/"><acronym title="World Wide Web Consortium">W3C</acronym></a><sup>&#xAE;</sup> (<a href="http://www.csail.mit.edu/"><acronym title="Massachusetts Institute of Technology">MIT</acronym></a>, <a href="http://www.ercim.org/"><acronym title="European Research Consortium for Informatics and Mathematics">ERCIM</acronym></a>, <a href="http://www.keio.ac.jp/">Keio</a>), All Rights Reserved. W3C <a href="http://www.w3.org/Consortium/Legal/ipr-notice#Legal_Disclaimer">liability</a>, <a href="http://www.w3.org/Consortium/Legal/ipr-notice#W3C_Trademarks">trademark</a> and <a href="http://www.w3.org/Consortium/Legal/copyright-documents">document use</a> rules apply.</p>')
+
+basic_comment_subs = ()
+
+class sub(object):
+	"""Perform substitutions."""
+	
+	def __init__(self, ElementTree, w3c_compat=False, w3c_compat_substitutions=False, w3c_compat_crazy_substitutions=False, **kwargs):
+		if w3c_compat or w3c_compat_substitutions or w3c_compat_crazy_substitutions:
+			self.w3c_status = self.getW3CStatus(ElementTree, **kwargs)
+		self.stringSubstitutions(ElementTree, w3c_compat, w3c_compat_substitutions, w3c_compat_crazy_substitutions, **kwargs)
+		self.commentSubstitutions(ElementTree, w3c_compat, w3c_compat_substitutions, w3c_compat_crazy_substitutions, **kwargs)
+	
+	def stringSubstitutions(self, ElementTree, w3c_compat=False, w3c_compat_substitutions=False, w3c_compat_crazy_substitutions=False, **kwargs):
+		# Get doc_title from the title element
+		try:
+			doc_title = utils.textContent(ElementTree.getroot().find(u"head").find(u"title"))
+		except (AttributeError, TypeError):
+			doc_title = u""
+		
+		if w3c_compat or w3c_compat_substitutions:
+			# Get the right long status
+			doc_longstatus = longstatus_map[self.w3c_status]
+		
+		if w3c_compat_crazy_substitutions:
+			# Get the right stylesheet
+			doc_w3c_stylesheet = u"http://www.w3.org/StyleSheets/TR/W3C-" + self.w3c_status
+		
+		# Get all the subs we want
+		instance_string_subs = string_subs + ((title, doc_title, title_identifier),)
+		
+		# And even more in compat. mode
+		if w3c_compat or w3c_compat_substitutions:
+			instance_string_subs += ((status, self.w3c_status, status_identifier),
+			                         (longstatus, doc_longstatus, longstatus_identifier))
+		
+		# And more that aren't even enabled by default in compat. mode
+		if w3c_compat_crazy_substitutions:
+			instance_string_subs += ((w3c_stylesheet, doc_w3c_stylesheet, w3c_stylesheet_identifier),)
+		
+		for node in ElementTree.iter():
+			for regex, sub, identifier in instance_string_subs:
+				if node.text is not None and identifier in node.text:
+					node.text = regex.sub(sub, node.text)
+				if node.tail is not None and identifier in node.tail:
+					node.tail = regex.sub(sub, node.tail)
+				for name, value in node.attrib.items():
+					if identifier in value:
+						node.attrib[name] = regex.sub(sub, value)
+	
+	def commentSubstitutions(self, ElementTree, w3c_compat=False, w3c_compat_substitutions=False, w3c_compat_crazy_substitutions=False, **kwargs):
+		# Basic substitutions
+		instance_basic_comment_subs = basic_comment_subs
+		
+		# Add more basic substitutions in compat. mode
+		if w3c_compat or w3c_compat_substitutions:
+			instance_basic_comment_subs += ((logo, logo_sub),
+			                                (copyright, copyright_sub))
+		
+		# Set of nodes to remove
+		to_remove = set()
+		
+		# Link
+		in_link = False
+		for node in ElementTree.iter():
+			if in_link:
+				if node.tag is etree.Comment and node.text.strip(utils.spaceCharacters) == u"end-link":
+					if node.getparent() is not link_parent:
+						raise DifferentParentException, u"begin-link and end-link have different parents"
+					utils.removeInteractiveContentChildren(link)
+					link.set(u"href", utils.textContent(link))
+					in_link = False
+				else:
+					if node.getparent() is link_parent:
+						link.append(deepcopy(node))
+					to_remove.add(node)
+			elif node.tag is etree.Comment and node.text.strip(utils.spaceCharacters) == u"begin-link":
+				link_parent = node.getparent()
+				in_link = True
+				link = etree.Element(u"a")
+				link.text = node.tail
+				node.tail = None
+				node.addnext(link)
+		
+		# Basic substitutions
+		for comment, sub in instance_basic_comment_subs:
+			begin_sub = u"begin-" + comment
+			end_sub = u"end-" + comment
+			in_sub = False
+			for node in ElementTree.iter():
+				if in_sub:
+					if node.tag is etree.Comment and node.text.strip(utils.spaceCharacters) == end_sub:
+						if node.getparent() is not sub_parent:
+							raise DifferentParentException, u"%s and %s have different parents" % (begin_sub, end_sub)
+						in_sub = False
+					else:
+						to_remove.add(node)
+				elif node.tag is etree.Comment:
+					if node.text.strip(utils.spaceCharacters) == begin_sub:
+						sub_parent = node.getparent()
+						in_sub = True
+						node.tail = None
+						node.addnext(deepcopy(sub))
+					elif node.text.strip(utils.spaceCharacters) == comment:
+						node.addprevious(etree.Comment(begin_sub))
+						node.addprevious(deepcopy(sub))
+						node.addprevious(etree.Comment(end_sub))
+						node.getprevious().tail = node.tail
+						to_remove.add(node)
+		
+		# Remove nodes
+		for node in to_remove:
+			node.getparent().remove(node)
+	
+	def getW3CStatus(self, ElementTree, **kwargs):
+		# Get all text nodes that contain case-insensitively "latest version" with any amount of whitespace inside the phrase, or contain http://www.w3.org/TR/
+		for text in ElementTree.xpath(u"//text()[contains(translate(., 'LATEST', 'latest'), 'latest') and contains(translate(., 'VERSION', 'version'), 'version') or contains(., 'http://www.w3.org/TR/')]"):
+			if latest_version.search(text):
+				return u"ED"
+			match = w3c_tr_url_status.search(text)
+			if match is not None:
+				return match.group(1)
+		# Didn't find any status, return the default (ED)
+		else:
+			return u"ED"
+
+class DifferentParentException(utils.AnolisException):
+	"""begin-link and end-link do not have the same parent."""
+	pass
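The string-substitution pass in sub.py comes down to a cheap substring check followed by a regex replacement on every element's text, tail and attribute values. Below is a minimal stand-alone sketch of that idea. It assumes the standard library's xml.etree.ElementTree in place of lxml and covers only the default [YEAR], [DATE] and [CDATE] placeholders; the names substitute_strings and STRING_SUBS are illustrative, not part of anolis.

```python
import re
import time
import xml.etree.ElementTree as ET

# Illustrative table mirroring string_subs in sub.py:
# (regex, replacement, cheap substring pre-check).
_now = time.gmtime()
STRING_SUBS = (
    (re.compile(r"\[YEAR[^\]]*\]"), time.strftime("%Y", _now), "[YEAR"),
    (re.compile(r"\[DATE[^\]]*\]"), time.strftime("%d %B %Y", _now).lstrip("0"), "[DATE"),
    (re.compile(r"\[CDATE[^\]]*\]"), time.strftime("%Y%m%d", _now), "[CDATE"),
)


def substitute_strings(root, subs=STRING_SUBS):
    """Replace placeholders in the text, tail and attributes of every element."""
    for node in root.iter():
        for regex, replacement, identifier in subs:
            # The substring test skips the regex for the vast majority of nodes.
            if node.text is not None and identifier in node.text:
                node.text = regex.sub(replacement, node.text)
            if node.tail is not None and identifier in node.tail:
                node.tail = regex.sub(replacement, node.tail)
            # Copy the items so attributes can be rewritten while looping.
            for name, value in list(node.attrib.items()):
                if identifier in value:
                    node.set(name, regex.sub(replacement, value))
    return root


root = ET.fromstring('<p title="[CDATE]">Copyright [YEAR] <b>x</b> [DATE: today]</p>')
substitute_strings(root)
# root.text now reads "Copyright " followed by the current UTC year.
```

In sub.py the same loop also handles [TITLE] and, in W3C compatibility mode, [STATUS] and [LONGSTATUS]; those are just extra tuples appended to the table before the walk.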

anolislib/processes/toc.py

+# coding=UTF-8
+# Copyright (c) 2008 Geoffrey Sneddon
+# 
+# Permission is hereby granted, free of charge, to any person obtaining a copy
+# of this software and associated documentation files (the "Software"), to deal
+# in the Software without restriction, including without limitation the rights
+# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+# copies of the Software, and to permit persons to whom the Software is
+# furnished to do so, subject to the following conditions:
+# 
+# The above copyright notice and this permission notice shall be included in
+# all copies or substantial portions of the Software.
+# 
+# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+# THE SOFTWARE.
+
+from lxml import etree
+from copy import deepcopy
+
+from anolislib import utils
+from anolislib.processes import outliner
+
+# These are just the non-interactive elements to be removed
+remove_elements_from_toc = frozenset([u"dfn",])
+# These are, however, all the attributes to be removed
+remove_attributes_from_toc = frozenset([u"id",])
+
+class toc(object):
+	"""Build and add TOC."""
+	
+	toc = None
+	
+	def __init__(self, ElementTree, **kwargs):
+		self.toc = etree.Element(u"ol", {u"class": u"toc"})
+		self.buildToc(ElementTree, **kwargs)
+		self.addToc(ElementTree, **kwargs)
+	
+	def buildToc(self, ElementTree, min_depth = 2, max_depth = 6, w3c_compat = False, w3c_compat_class_toc = False, **kwargs):
+		# Build the outline of the document
+		outline_creator = outliner.Outliner(ElementTree, **kwargs)
+		outline = outline_creator.build(**kwargs)
+		
+		# Get a list of all the top level sections, and their depth (0)
+		sections = [(section, 0) for section in reversed(outline)]
+		
+		# Numbering
+		num = []
+		
+		# Set of elements to remove (due to odd behaviour of Element.iter() this has to be done afterwards)
+		to_remove = set()
+		
+		# Loop over all sections in a DFS
+		while sections:
+			# Get the section and depth at the end of list
+			section, depth = sections.pop()
+			
+			# If we have a header, regardless of how deep we are
+			if section.header is not None:
+				# Get the element that represents the section header's text
+				if section.header.tag == u"header":
+					i = 1
+					while i <= 6:
+						section_header_text_element = section.header.find(u"h" + unicode(i))
+						if section_header_text_element is not None:
+							break
+						i += 1
+					else:
+						section_header_text_element = None
+				else:
+					section_header_text_element = section.header
+			else:
+				section_header_text_element = None
+			
+			# If we have a section heading text element, regardless of depth
+			if section_header_text_element is not None:
+				# Remove any existing number
+				for element in section_header_text_element.iter(u"span"):
+					if utils.elementHasClass(element, u"secno"):
+						# Preserve the element tail
+						if element.tail is not None:
+							if element.getprevious() is not None:
+								if element.getprevious().tail is None:
+									element.getprevious().tail = element.tail
+								else:
+									element.getprevious().tail += element.tail
+							else:
+								if element.getparent().text is None:
+									element.getparent().text = element.tail
+								else:
+									element.getparent().text += element.tail
+						# Remove the element
+						to_remove.add(element)
+			
+			# Check we're in the valid depth range (min/max_depth are 1 based, depth is 0 based)
+			if depth >= min_depth - 1 and depth <= max_depth - 1:
+				# Calculate the corrected depth (i.e., the actual depth within the numbering/TOC)
+				corrected_depth = depth - min_depth + 1
+				
+				# Numbering:
+				# No children, no sibling, move back to parent's sibling
+				if corrected_depth + 1 < len(num):
+					del num[corrected_depth + 1:]
+				# Children
+				elif corrected_depth == len(num):
+					num.append(0)
+				
+				# Increment the current section's number
+				if (section_header_text_element is not None and not utils.elementHasClass(section_header_text_element, u"no-num")) or (section_header_text_element is None and section):
+					num[-1] += 1
+				
+				# Get the current TOC section for this depth, and add another item to it
+				if (section_header_text_element is not None and not utils.elementHasClass(section_header_text_element, u"no-toc")) or (section_header_text_element is None and section):
+					# Find the appropriate section of the TOC 
+					i = 0
+					toc_section = self.toc
+					while i < corrected_depth:
+						try:
+							# If the final li has no children, or its last child isn't an ol element
+							if len(toc_section[-1]) == 0 or toc_section[-1][-1].tag != u"ol":
+								toc_section[-1].append(etree.Element(u"ol"))
+								self.indentNode(toc_section[-1][-1], (i + 1) * 2, **kwargs)
+								if w3c_compat or w3c_compat_class_toc:
+									toc_section[-1][-1].set(u"class", u"toc")
+						except IndexError:
+							# If the current ol has no li in it
+							toc_section.append(etree.Element(u"li"))
+							self.indentNode(toc_section[0], (i + 1) * 2 - 1, **kwargs)
+							toc_section[0].append(etree.Element(u"ol"))
+							self.indentNode(toc_section[0][0], (i + 1) * 2, **kwargs)
+							if w3c_compat or w3c_compat_class_toc:
+								toc_section[0][0].set(u"class", u"toc")
+						# TOC Section is now the final child (ol) of the final item (li) in the previous section
+						assert toc_section[-1].tag == u"li"
+						assert toc_section[-1][-1].tag == u"ol"
+						toc_section = toc_section[-1][-1]
+						i += 1
+					# Add the current item to the TOC
+					item = etree.Element(u"li")
+					toc_section.append(item)
+					self.indentNode(item, (i + 1) * 2 - 1, **kwargs)
+					
+				# If we have a header
+				if section_header_text_element is not None:
+					# Remove all the elements in the list of nodes to remove (so that the removal of existing numbers doesn't lead to crazy IDs)
+					for element in to_remove:
+						element.getparent().remove(element)
+					to_remove = set()
+					
+					# Add ID to header
+					id = utils.generateID(section_header_text_element, **kwargs)
+					if section_header_text_element.get(u"id") is not None:
+						del section_header_text_element.attrib[u"id"]
+					section.header.set(u"id", id)
+					
+					# Add number, if @class doesn't contain no-num
+					if not utils.elementHasClass(section_header_text_element, u"no-num"):
+						section_header_text_element[0:0] = [etree.Element(u"span", {u"class": u"secno"})]
+						section_header_text_element[0].tail = section_header_text_element.text
+						section_header_text_element.text = None
+						section_header_text_element[0].text = u".".join(map(unicode, num))
+						section_header_text_element[0].text += u" "
+					# Add to TOC, if @class doesn't contain no-toc
+					if not utils.elementHasClass(section_header_text_element, u"no-toc"):
+						link = deepcopy(section_header_text_element)
+						item.append(link)
+						# Make it link to the header
+						link.tag = u"a"
+						link.set(u"href", u"#" + id)
+						# Remove interactive content child elements
+						utils.removeInteractiveContentChildren(link)
+						# Remove other child elements
+						for element_name in remove_elements_from_toc:
+							# Iterate over all the descendants of the new link with that element name
+							for element in link.iterdescendants(element_name):
+								# Copy content, to prepare for the node being removed
+								utils.copyContentForRemoval(element)
+								# Add the element to the list of elements to remove
+								to_remove.add(element)
+						# Remove unwanted attributes
+						for element in link.iter(tag=etree.Element):
+							for attribute_name in remove_attributes_from_toc:
+								if element.get(attribute_name) is not None:
+									del element.attrib[attribute_name]
+						# We don't want the old tail (or any tail, for that matter)
+						link.tail = None
+			# Add subsections in reverse order (so the next one is executed next) with a higher depth value
+			sections.extend((child_section, depth + 1) for child_section in reversed(section))
+		# Remove all the elements in the list of nodes to remove
+		for element in to_remove:
+			element.getparent().remove(element)
+	
+	def addToc(self, ElementTree, **kwargs):
+		to_remove = set()
+		in_toc = False
+		for node in ElementTree.iter():
+			if in_toc:
+				if node.tag is etree.Comment and node.text.strip(utils.spaceCharacters) == u"end-toc":
+					if node.getparent() is not toc_parent:
+						raise DifferentParentException, u"begin-toc and end-toc have different parents"
+					in_toc = False
+				else:
+					to_remove.add(node)
+			elif node.tag is etree.Comment:
+				if node.text.strip(utils.spaceCharacters) == u"begin-toc":
+					toc_parent = node.getparent()
+					in_toc = True
+					node.tail = None
+					node.addnext(deepcopy(self.toc))
+					self.indentNode(node.getnext(), 0, **kwargs)
+				elif node.text.strip(utils.spaceCharacters) == u"toc":
+					node.addprevious(etree.Comment(u"begin-toc"))
+					self.indentNode(node.getprevious(), 0, **kwargs)
+					node.addprevious(deepcopy(self.toc))
+					self.indentNode(node.getprevious(), 0, **kwargs)
+					node.addprevious(etree.Comment(u"end-toc"))
+					self.indentNode(node.getprevious(), 0, **kwargs)
+					node.getprevious().tail = node.tail
+					to_remove.add(node)
+		for node in to_remove:
+			node.getparent().remove(node)
+	
+	def indentNode(self, node, indent=0, newline_char=u"\n", indent_char=u"\t", **kwargs):
+		whitespace = newline_char + indent_char * indent
+		if node.getprevious() is not None:
+			if node.getprevious().tail is None:
+				node.getprevious().tail = whitespace
+			else:
+				node.getprevious().tail += whitespace
+		else:
+			if node.getparent().text is None:
+				node.getparent().text = whitespace
+			else:
+				node.getparent().text += whitespace
+
+class DifferentParentException(utils.AnolisException):
+	"""begin-toc and end-toc do not have the same parent."""
+	pass
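The numbering in toc.buildToc keeps one counter per level in the num list: moving back up the outline truncates the list, going one level deeper appends a fresh counter, and each numbered section increments the last counter. The sketch below isolates that bookkeeping. It assumes the outline has already been flattened into 0-based depths in depth-first order and ignores the no-num and no-toc classes; number_sections is a hypothetical helper, not an anolis function.

```python
def number_sections(depths, min_depth=2, max_depth=6):
    """Yield a dotted section number, or None, for each 0-based outline depth.

    min_depth and max_depth are 1-based, as in buildToc; depths outside that
    range are not numbered and leave the counters untouched.
    """
    num = []
    for depth in depths:
        if not (min_depth - 1 <= depth <= max_depth - 1):
            yield None
            continue
        corrected_depth = depth - min_depth + 1
        if corrected_depth + 1 < len(num):
            # Back up the outline: drop the counters of the deeper levels.
            del num[corrected_depth + 1:]
        elif corrected_depth == len(num):
            # One level deeper: start a new counter for this level.
            num.append(0)
        num[-1] += 1
        yield ".".join(str(n) for n in num)


# h1, h2, h3, h3, h2, h3, h4, h2 in document order:
print(list(number_sections([0, 1, 2, 2, 1, 2, 3, 1])))
# [None, '1', '1.1', '1.2', '2', '2.1', '2.1.1', '3']
```

Because the outline is walked depth first, a section's depth is at most one more than the previous in-range section's, so appending a single counter is enough when descending.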

anolislib/processes/xref.py

+# coding=UTF-8
+# Copyright (c) 2008 Geoffrey Sneddon
+# 
+# Permission is hereby granted, free of charge, to any person obtaining a copy
+# of this software and associated documentation files (the "Software"), to deal
+# in the Software without restriction, including without limitation the rights
+# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+# copies of the Software, and to permit persons to whom the Software is
+# furnished to do so, subject to the following conditions:
+# 
+# The above copyright notice and this permission notice shall be included in
+# all copies or substantial portions of the Software.
+# 
+# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+# THE SOFTWARE.
+
+import re
+from lxml import etree
+from copy import deepcopy
+
+from anolislib import utils
+
+instance_elements = frozenset([u"span", u"abbr", u"code", u"var", u"i"])
+w3c_instance_elements = frozenset([u"abbr", u"acronym", u"b", u"bdo", u"big", u"code", u"del", u"em", u"i", u"ins", u"kbd", u"label", u"legend", u"q", u"samp", u"small", u"span", u"strong", u"sub", u"sup", u"tt", u"var"])
+
+# Instances cannot be in the stack with any of these elements, or with interactive elements
+instance_not_in_stack_with = frozenset([u"dfn",])
+
+non_alphanumeric_spaces = re.compile(r"[^a-zA-Z0-9 \-]+")
+
+class xref(object):
+	"""Add cross-references."""
+	
+	def __init__(self, ElementTree, **kwargs):
+		self.dfns = {}
+		self.buildReferences(ElementTree, **kwargs)
+		self.addReferences(ElementTree, **kwargs)
+	
+	def buildReferences(self, ElementTree, allow_duplicate_dfns=False, **kwargs):
+		for dfn in ElementTree.iter(u"dfn"):
+			term = self.getTerm(dfn, **kwargs)
+			
+			if len(term) > 0:
+				if not allow_duplicate_dfns and term in self.dfns:
+					raise DuplicateDfnException, u'The term "%s" is defined more than once' % term
+				
+				link_to = dfn
+				
+				for parent_element in dfn.iterancestors(tag=etree.Element):
+					if parent_element.tag in utils.heading_content:
+						link_to = parent_element
+						break
+				
+				id = utils.generateID(link_to, **kwargs)
+				
+				link_to.set(u"id", id)
+				
+				self.dfns[term] = id
+	
+	def addReferences(self, ElementTree, w3c_compat = False, w3c_compat_xref_elements = False, w3c_compat_xref_a_placement = False, **kwargs):
+		for element in ElementTree.iter(tag=etree.Element):
+			if element.tag in instance_elements or (w3c_compat or w3c_compat_xref_elements) and element.tag in w3c_instance_elements:
+				term = self.getTerm(element, w3c_compat=w3c_compat, **kwargs)
+				
+				if term in self.dfns:
+					goodParentingAndChildren = True
+					
+					for parent_element in element.iterancestors(tag=etree.Element):
+						if parent_element.tag in instance_not_in_stack_with or utils.isInteractiveContent(parent_element):
+							goodParentingAndChildren = False
+							break
+					else:
+						for child_element in element.iterdescendants(tag=etree.Element):
+							if child_element.tag in instance_not_in_stack_with or utils.isInteractiveContent(child_element):
+								goodParentingAndChildren = False
+								break
+					
+					if goodParentingAndChildren:
+						if element.tag == u"span":
+							element.tag = u"a"
+							element.set(u"href", u"#" + self.dfns[term])
+						else:
+							link = etree.Element(u"a", {u"href": u"#" + self.dfns[term]})
+							if w3c_compat or w3c_compat_xref_a_placement:
+								for node in element:
+									link.append(node)
+								link.text = element.text
+								element.text = None
+								element.append(link)
+							else:
+								element.addprevious(link)
+								link.append(element)
+								link.tail = link[0].tail
+								link[0].tail = None
+	
+	def getTerm(self, element, w3c_compat = False, w3c_compat_xref_normalization = False, **kwargs):
+		if element.get(u"title") is not None:
+			term = element.get(u"title")
+		else:
+			term = utils.textContent(element)
+		
+		term = term.strip(utils.spaceCharacters).lower()
+		
+		term = utils.spacesRegex.sub(u" ", term)
+		
+		if w3c_compat or w3c_compat_xref_normalization:
+			term = non_alphanumeric_spaces.sub(u"", term)
+		
+		return term
+
+class DuplicateDfnException(utils.AnolisException):
+	"""Term already defined."""
+	pass
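getTerm in xref.py decides whether a dfn and an instance refer to the same thing by comparing normalized strings. The title attribute wins over text content, surrounding whitespace is stripped, the term is lower-cased, internal whitespace runs are collapsed, and in W3C compatibility mode anything outside letters, digits, spaces and hyphens is dropped. A plain-string sketch of that normalization follows. It assumes the five HTML5 space characters that html5lib exports (tab, LF, FF, CR and space), and normalize_term is an illustrative name.

```python
import re

# Assumed equivalent of utils.spaceCharacters: the five HTML5 space characters.
SPACE_CHARACTERS = "\t\n\f\r "
spaces = re.compile("[%s]+" % SPACE_CHARACTERS)
non_alphanumeric_spaces = re.compile(r"[^a-zA-Z0-9 \-]+")


def normalize_term(text, title=None, w3c_compat=False):
    """Normalize a dfn or instance term following the steps of getTerm."""
    # A title attribute, when present, takes precedence over the text content.
    term = title if title is not None else text
    term = term.strip(SPACE_CHARACTERS).lower()
    # Collapse every run of space characters to a single space.
    term = spaces.sub(" ", term)
    if w3c_compat:
        # W3C compatibility mode: keep only letters, digits, spaces and hyphens.
        term = non_alphanumeric_spaces.sub("", term)
    return term


# A dfn and an instance that normalize to the same term get linked:
dfns = {normalize_term("Event\n  loop"): "event-loop"}
print(dfns[normalize_term(" event loop ")])  # event-loop
```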

anolislib/utils.py

+# coding=UTF-8
+# Copyright (c) 2008 Geoffrey Sneddon
+# 
+# Permission is hereby granted, free of charge, to any person obtaining a copy
+# of this software and associated documentation files (the "Software"), to deal
+# in the Software without restriction, including without limitation the rights
+# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+# copies of the Software, and to permit persons to whom the Software is
+# furnished to do so, subject to the following conditions:
+# 
+# The above copyright notice and this permission notice shall be included in
+# all copies or substantial portions of the Software.
+# 
+# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+# THE SOFTWARE.
+
+import re
+import sys
+from lxml import etree
+
+from html5lib.constants import spaceCharacters
+
+ids = {}
+
+spaceCharacters = u"".join(spaceCharacters)
+spacesRegex = re.compile(u"[%s]+" % spaceCharacters)
+
+heading_content = frozenset([u"h1", u"h2", u"h3", u"h4", u"h5", u"h6", u"header"])
+sectioning_content = frozenset([u"body", u"section", u"nav", u"article", u"aside"])
+sectioning_root = frozenset([u"blockquote", u"figure", u"td", u"datagrid"])
+
+always_interactive_content = frozenset([u"a", u"bb", u"details", u"datagrid"])
+media_elements = frozenset([u"audio", u"video"])
+
+non_sgml_name = re.compile("[^A-Za-z0-9_:.]+")
+
+if sys.maxunicode == 0xFFFF:
+	# UTF-16 Python
+	non_ifragment = re.compile(u"([\u0000-\u0020\u0022\u0023\u0025\\\u002D\u003C\u003E\u005B-\u005E\u0060\u007B-\u007D\u007F-\u0099\uD800-\uF8FF\uFDD0-\uFDDF\uFFF0-\uFFFF]|\U0001FFFE|\U0001FFFF|\U0002FFFE|\U0002FFFF|\U0003FFFE|\U0003FFFF|\U0004FFFE|\U0004FFFF|\U0005FFFE|\U0005FFFF|\U0006FFFE|\U0006FFFF|\U0007FFFE|\U0007FFFF|\U0008FFFE|\U0008FFFF|\U0009FFFE|\U0009FFFF|\U000AFFFE|\U000AFFFF|\U000BFFFE|\U000BFFFF|\U000CFFFE|\U000CFFFF|\uDB3F[\uDFFE-\uDFFF]|[\uDB40-\uDB43][\uDC00-\uDFFF]|\uDB7F[\uDFFE-\uDFFF]|[\uDB80-\uDBFF][\uDC00-\uDFFF])+")
+else:
+	# UTF-32 Python
+	non_ifragment = re.compile(u"[^A-Za-z0-9._~!$&'()*+,;=:@/?\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF\U00010000-\U0001FFFD\U00020000-\U0002FFFD\U00030000-\U0003FFFD\U00040000-\U0004FFFD\U00050000-\U0005FFFD\U00060000-\U0006FFFD\U00070000-\U0007FFFD\U00080000-\U0008FFFD\U00090000-\U0009FFFD\U000A0000-\U000AFFFD\U000B0000-\U000BFFFD\U000C0000-\U000CFFFD\U000D0000-\U000DFFFD\U000E1000-\U000EFFFD]+")
+
+def splitOnSpaces(string):
+	return spacesRegex.split(string)
+
+def elementHasClass(Element, class_name):
+	if Element.get(u"class") and class_name in splitOnSpaces(Element.get(u"class")):
+		return True
+	else:
+		return False
+
+def generateID(Element, force_html4_id=False, **kwargs):
+	if Element.get(u"id") is not None:
+		return Element.get(u"id")
+	elif Element.get(u"title") is not None and Element.get(u"title").strip(spaceCharacters) != u"":
+		source = Element.get(u"title")
+	else:
+		source = textContent(Element)
+	
+	source = source.strip(spaceCharacters).lower()
+	
+	if source == u"":
+		source = u"generatedID"
+	elif force_html4_id or Element.getroottree().docinfo.public_id in \
+		(u"-//W3C//DTD HTML 4.0//EN",
+		 u"-//W3C//DTD HTML 4.0 Transitional//EN",
+		 u"-//W3C//DTD HTML 4.0 Frameset//EN",
+		 u"-//W3C//DTD HTML 4.01//EN",
+		 u"-//W3C//DTD HTML 4.01 Transitional//EN",
+		 u"-//W3C//DTD HTML 4.01 Frameset//EN",
+		 u"ISO/IEC 15445:2000//DTD HyperText Markup Language//EN",
+		 u"ISO/IEC 15445:2000//DTD HTML//EN",
+		 u"-//W3C//DTD XHTML 1.0 Strict//EN",
+		 u"-//W3C//DTD XHTML 1.0 Transitional//EN",
+		 u"-//W3C//DTD XHTML 1.0 Frameset//EN",
+		 u"-//W3C//DTD XHTML 1.1//EN"):
+		source = non_sgml_name.sub(u"-", source).strip(u"-")
+		try:
+			if not source[0].isalpha():
+				source = u"x" + source
+		except IndexError:
+			source = u"generatedID"
+	else:
+		source = non_ifragment.sub(u"-", source).strip(u"-")
+	
+	# Initially set the id to the source
+	id = source
+	
+	i = 0
+	while getElementById(Element.getroottree().getroot(), id) is not None:
+		id = source + u"-" + unicode(i)
+		i += 1
+	
+	ids[Element.getroottree().getroot()][id] = Element
+	
+	return id
+
+def textContent(Element):
+	return etree.tostring(Element, encoding=unicode, method='text', with_tail=False)
+
+def getElementById(base, id):
+	if base in ids:
+		try:
+			return ids[base][id]
+		except KeyError:
+			return None
+	else:
+		ids[base] = {}
+		for element in base.iter(tag=etree.Element):
+			if element.get(u"id"):
+				ids[base][element.get(u"id")] = element
+		return getElementById(base, id)
+
+def escapeXPathString(string):
+	return u"concat('', '%s')" % string.replace(u"'", u"', \"'\", '")
+
+def removeInteractiveContentChildren(element):
+	# Set of elements to remove
+	to_remove = set()
+	
+	# Iterate over descendants of element
+	for child in element.iterdescendants(etree.Element):
+		if isInteractiveContent(child):
+			# Copy content, to prepare for the node being removed
+			copyContentForRemoval(child)
+			# Add the element to the list of elements to remove
+			to_remove.add(child)
+	
+	# Remove all elements to be removed
+	for element in to_remove:
+		element.getparent().remove(element)
+
+def isInteractiveContent(element):
+	if element.tag in always_interactive_content \
+	or element.tag in media_elements and element.get(u"controls") is not None \
+	or element.tag == u"menu" and element.get(u"type") is not None and element.get(u"type").lower() == u"toolbar":
+		return True
+	else:
+		return False
+
+def copyContentForRemoval(node):
+	# Preserve the text, if it is an element
+	if isinstance(node.tag, basestring) and node.text is not None:
+		if node.getprevious() is not None:
+			if node.getprevious().tail is None:
+				node.getprevious().tail = node.text
+			else:
+				node.getprevious().tail += node.text
+		else:
+			if node.getparent().text is None:
+				node.getparent().text = node.text
+			else:
+				node.getparent().text += node.text
+	# Re-parent all the children of the element we're removing
+	for child in node:
+		node.addprevious(child)
+	# Preserve the element tail
+	if node.tail is not None:
+		if node.getprevious() is not None:
+			if node.getprevious().tail is None:
+				node.getprevious().tail = node.tail
+			else:
+				node.getprevious().tail += node.tail
+		else:
+			if node.getparent().text is None:
+				node.getparent().text = node.tail
+			else:
+				node.getparent().text += node.tail
+
+class AnolisException(Exception):
+	"""Generic anolis error."""
+	pass
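generateID in utils.py derives an ID from an element's title or text content: the source is stripped and lower-cased, invalid characters are collapsed to hyphens, a leading letter is forced for HTML 4 DOCTYPEs, and -0, -1, and so on are appended until the ID is unused in the document. The sketch below covers only the HTML 4 branch. It tracks taken IDs in a plain set instead of the per-document ids cache and uses str.strip() rather than the HTML5 space characters; generate_id is a hypothetical name.

```python
import re

# Characters outside this set are replaced when building HTML 4 IDs (same regex as utils.py).
non_sgml_name = re.compile("[^A-Za-z0-9_:.]+")


def generate_id(source, taken):
    """Return an unused HTML 4-style ID derived from source and record it in taken."""
    source = non_sgml_name.sub("-", source.strip().lower()).strip("-")
    if not source:
        # Nothing usable left (empty or all punctuation).
        source = "generatedID"
    elif not source[0].isalpha():
        # HTML 4 IDs must begin with a letter.
        source = "x" + source
    candidate = source
    i = 0
    # Append -0, -1, ... until the ID is not already in use.
    while candidate in taken:
        candidate = "%s-%d" % (source, i)
        i += 1
    taken.add(candidate)
    return candidate


taken = set()
print(generate_id("Cross-referencing", taken))  # cross-referencing
print(generate_id("Cross-referencing", taken))  # cross-referencing-0
print(generate_id("2.1 Requirements", taken))   # x2.1-requirements
```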
 
 from lxml import etree
 
-from specGen import generator
+from anolislib import generator
 
 def get_files(*args):
 	return glob.glob(os.path.join(*args))
 				# Get the expected result
 				expected = open(file_name[:-9] + ".html", "r")
 				
-				# Run the spec-gen
+				# Run anolis
 				generator.process(tree)
 				
 				# Get the output
 from distutils.core import setup
 
-setup(name = "specGen",
+setup(name = "anolislib",
 	license="""MIT""",
 	version = "1.0",
 	author = "Geoffrey Sneddon",
 	author_email = "geoffers@gmail.com",
-	packages = ["specGen", "specGen/processes"],
-	scripts = ["spec-gen"],
+	packages = ["anolislib", "anolislib/processes"],
+	scripts = ["anolis"],
 	)

spec-gen

-#!/usr/bin/env python
-"""usage: spec-gen [options] input output
-
-Post-process a document, adding cross-references, table of contents, etc.
-"""
-
-import cProfile
-from optparse import OptionParser, SUPPRESS_HELP
-import sys
-import html5lib
-from html5lib import treebuilders, treewalkers, serializer
-import lxml.html
-from lxml import etree
-
-from specGen import generator, utils
-
-def main():
-	# Create the options parser
-	optParser = getOptParser()
-	opts, args = optParser.parse_args()
-	
-	# Check we have enough arguments
-	if len(args) >= 2:
-		try:
-			# Get input
-			input = file(args[0], "r")
-			
-			# Parse as XML:
-			#if opts.xml:
-			if False:
-				tree = etree.parse(input)
-			# Parse as HTML using lxml.html
-			elif opts.lxml_html:
-				tree = lxml.html.parse(input)
-			# Parse as HTML using html5lib
-			else:
-				parser = html5lib.HTMLParser(tree=treebuilders.getTreeBuilder("lxml", etree))
-				tree = parser.parse(input)
-			
-			# Close the input file
-			input.close()
-			
-			# Remove the option we pass as an argument
-			processes = opts.processes
-			del opts.processes
-			
-			# Turn the options into a dict
-			kwargs = vars(opts)
-			
-			# Run the generator, and profile, or not, as the case may be
-			if kwargs["profile"]:
-				cProfile.runctx("gen.process(tree, processes, **kwargs)", {}, {"gen": generator, "tree": tree, "processes": processes, "kwargs": kwargs})
-			else:
-				generator.process(tree, processes, **kwargs)
-			
-			# Serialize to XML
-			#if opts.xml:
-			if False:
-				rendered = etree.tostring(tree, encoding="utf-8")
-			# Serialize to HTML using lxml.html
-			elif opts.lxml_html:
-				rendered = lxml.html.tostring(tree, encoding="utf-8")
-			# Serialize to HTML using html5lib
-			else:
-				walker = treewalkers.getTreeWalker("lxml")
-				s = serializer.htmlserializer.HTMLSerializer(**kwargs)
-				rendered = s.render(walker(tree), encoding="utf-8")
-			
-			# Get the output
-			output = file(args[1], "w")
-			
-			# Write to the output
-			output.write(rendered)
-			
-			# Close the output
-			output.close()
-		except (utils.SpecGenException, IOError, etree.XMLSyntaxError), e:
-			sys.stderr.write(unicode(e) + u"\n")
-			sys.exit(1)
-	else:
-		sys.stderr.write(u"spec-gen expects two arguments. Use -h for help\n")
-		sys.exit(2)
-
-def getOptParser():
-	parser = OptionParser(usage = __doc__, version="%prog 1.0")
-	
-	parser.add_option("", "--enable", action="callback", callback=enable,
-		type="string", dest="processes", help="Enable the process given as the option value")
-	
-	parser.add_option("", "--disable", action="callback", callback=disable,
-		type="string", help="Disable the process given as the option value")
-	
-	#parser.add_option("", "", action="store_true",
-	#	dest="xml", help="Use an XML parser/serializer.")
-	
-	parser.add_option("", "--lxml.html", action="store_true",
-		dest="lxml_html", help="Use lxml's HTML parser/serializer.")
-	
-	parser.add_option("", "--newline-char", action="store", type="string",
-		dest="newline_char", help="Set the newline character/string used when creating new newlines. This should match the rest of the newlines in the document.")
-	
-	parser.add_option("", "--indent-char", action="store", type="string",
-		dest="indent_char", help="Set the character/string used when indenting new blocks of (X)HTML. This should match the rest of the indentation in the document.")
-	
-	parser.add_option("", "--force-html4-id", action="store_true",
-		dest="force_html4_id", help="Force the ID generation algorithm to create HTML 4 compliant IDs regardless of the DOCTYPE.")
-	
-	parser.add_option("", "--min-depth", action="store", type="int",
-		default=2, dest="min_depth", help="Highest ranking header to number/insert into TOC.")
-	
-	parser.add_option("", "--max-depth", action="store", type="int",
-		default=6, dest="max_depth", help="Lowest ranking header to number/insert into TOC.")
-	
-	parser.add_option("", "--allow-duplicate-dfns", action="store_true",
-		dest="allow_duplicate_dfns", help="Allow multiple definitions of terms when cross-referencing (the last instance of the term is used when referencing it).")
-	
-	parser.add_option("", "--w3c-compat", action="store_true",
-		dest="w3c_compat", help="Behave in a (mostly) compatible way to the W3C CSS WG's Postprocessor (this implies all of the other --w3c-compat options with the exception of --w3c-compat-crazy-substitutions, as that is too crazy).")
-	
-	parser.add_option("", "--w3c-compat-xref-elements", action="store_true",
-		dest="w3c_compat_xref_elements", help="Use the same list of elements to look for cross-references in as the W3C CSS WG's Postprocessor, even when the elements shouldn't semantically be used for cross-reference terms.")
-	
-	parser.add_option("", "--w3c-compat-xref-a-placement", action="store_true",
-		dest="w3c_compat_xref_a_placement", help="When cross-referencing elements apart from span, put the a element inside the element instead of outside the element.")
-	
-	parser.add_option("", "--w3c-compat-xref-normalization", action="store_true",
-		dest="w3c_compat_xref_normalization", help="Only use ASCII letters, numbers, and spaces in comparison of cross-reference terms.")
-	
-	parser.add_option("", "--w3c-compat-class-toc", action="store_true",
-		dest="w3c_compat_class_toc", help="Add @class='toc' on every ol element in the table of contents (instead of only the root ol element).")
-	
-	parser.add_option("", "--w3c-compat-substitutions", action="store_true",
-		dest="w3c_compat_substitutions", help="Do W3C specific substitutions.")
-	
-	parser.add_option("", "--w3c-compat-crazy-substitutions", action="store_true",
-		dest="w3c_compat_crazy_substitutions", help="Do crazy W3C specific substitutions, which may cause unexpected behaviour (i.e., replacing random strings within the document with no special marker).")
-	
-	parser.add_option("", "--profile", action="store_true",
-		dest="profile", help=SUPPRESS_HELP)
-	
-	parser.add_option("", "--inject-meta-charset", action="store_true",
-		dest="inject_meta_charset", help=SUPPRESS_HELP)
-	
-	parser.add_option("", "--strip-whitespace", action="store_true",
-		dest="strip_whitespace", help=SUPPRESS_HELP)
-
-	parser.add_option("", "--omit-optional-tags", action="store_true",
-		dest="omit_optional_tags", help=SUPPRESS_HELP)
-
-	parser.add_option("", "--quote-attr-values", action="store_true",
-		dest="quote_attr_values", help=SUPPRESS_HELP)
-
-	parser.add_option("", "--use-best-quote-char", action="store_true",
-		dest="use_best_quote_char",	help=SUPPRESS_HELP)
-
-	parser.add_option("", "--no-minimize-boolean-attributes",
-		action="store_false", default=True,
-		dest="minimize_boolean_attributes", help=SUPPRESS_HELP)
-
-	parser.add_option("", "--use-trailing-solidus", action="store_true",
-		dest="use_trailing_solidus", help=SUPPRESS_HELP)
-
-	parser.add_option("", "--space-before-trailing-solidus",
-		action="store_true", default=False,
-		dest="space_before_trailing_solidus", help=SUPPRESS_HELP)
-
-	parser.add_option("", "--escape-lt-in-attrs", action="store_true",
-		dest="escape_lt_in_attrs", help=SUPPRESS_HELP)
-
-	parser.add_option("", "--escape-rcdata", action="store_true",
-		dest="escape_rcdata", help=SUPPRESS_HELP)
-	
-	parser.set_defaults(
-		processes=set(["sub", "xref", "toc"]),
-		xml=False,
-		lxml_html=False,
-		newline_char=u"\n",
-		indent_char=u"\t",
-		force_html4_id=False,
-		min_depth=2,
-		max_depth=6,
-		allow_duplicate_dfns=False,
-		w3c_compat=False,
-		w3c_compat_xref_elements=False,
-		w3c_compat_xref_a_placement=False,
-		w3c_compat_xref_normalization=False,
-		w3c_compat_class_toc=False,
-		w3c_compat_substitutions=False,
-		w3c_compat_crazy_substitutions=False,
-		profile=False,
-		inject_meta_charset=False,
-		omit_optional_tags=False,
-		quote_attr_values=False,
-		use_best_quote_char=False,
-		minimize_boolean_attributes=True,
-		use_trailing_solidus=False,
-		space_before_trailing_solidus=False,
-		escape_lt_in_attrs=False,
-		escape_rcdata=False
-	)
-
-	return parser
-
-def enable(option, opt_str, value, parser, *args, **kwargs):
-	parser.values.processes.add(value)
-
-def disable(option, opt_str, value, parser, *args, **kwargs):
-	parser.values.processes.discard(value)
-
-if __name__ == "__main__":
-	main()
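The `--enable` and `--disable` options above are optparse callbacks whose help text says they act on "the process given as the option value". A minimal Python 3 sketch of that pattern follows; it is illustrative rather than the project's code, and `make_parser` and the process name `extra` are hypothetical. It assumes the process name is taken from the callback's `value` argument, since optparse passes the flag itself (for example `--enable`) as `opt_str`.

```python
from optparse import OptionParser


def enable(option, opt_str, value, parser):
    # `value` is the option's argument (the process name);
    # `opt_str` is the flag that triggered the callback, e.g. "--enable".
    parser.values.processes.add(value)


def disable(option, opt_str, value, parser):
    # discard() rather than remove(): disabling a process that isn't enabled is a no-op.
    parser.values.processes.discard(value)


def make_parser():
    parser = OptionParser(usage="%prog [options] input output")
    parser.add_option("--enable", action="callback", callback=enable,
                      type="string", help="Enable the named process")
    parser.add_option("--disable", action="callback", callback=disable,
                      type="string", help="Disable the named process")
    # parse_args() only shallow-copies its defaults, so this set object is
    # mutated in place; build a new parser for each parse.
    parser.set_defaults(processes={"sub", "xref", "toc"})
    return parser


opts, args = make_parser().parse_args(
    ["--disable", "xref", "--enable", "extra", "in.html", "out.html"])
print(sorted(opts.processes), args)  # → ['extra', 'sub', 'toc'] ['in.html', 'out.html']
```

Because the defaults are shared with the parser, creating the parser inside a function keeps one run's `--enable`/`--disable` choices from leaking into the next.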

specGen/__init__.py

-from generator import *

specGen/generator.py

-# coding=UTF-8
-# Copyright (c) 2008 Geoffrey Sneddon
-# 
-# Permission is hereby granted, free of charge, to any person obtaining a copy
-# of this software and associated documentation files (the "Software"), to deal
-# in the Software without restriction, including without limitation the rights
-# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
-# copies of the Software, and to permit persons to whom the Software is
-# furnished to do so, subject to the following conditions:
-# 
-# The above copyright notice and this permission notice shall be included in
-# all copies or substantial portions of the Software.
-# 
-# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
-# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
-# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
-# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
-# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
-# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
-# THE SOFTWARE.
-
-def process(tree, processes=set(["sub", "toc", "xref"]), **kwargs):
-	""" Process the given tree. """
-	
-	# Run each requested process in turn
-	for process in processes:
-		try:
-			process_module = getattr(__import__('processes', globals(), locals(), [process], -1), process)
-		except ImportError:
-			process_module = __import__(process, globals(), locals(), [], -1)
-		
-		getattr(process_module, process)(tree, **kwargs)
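`generator.process` loads each process by name, first as a submodule of the `processes` package and then as a top-level module, and calls the function of the same name with the tree and the remaining options. A rough Python 3 equivalent using `importlib.import_module` (Python 3 rejects the `level=-1` argument passed to `__import__` here) might look like the sketch below; `load_process` and the stand-in `demo` module are hypothetical. One design point: the original iterates over a `set`, whose iteration order is not guaranteed, so the sketch takes an ordered sequence to make the run order explicit.

```python
import importlib
import sys
import types


def load_process(name, package="processes"):
    """Return the callable `name` from `package.name`, falling back to a top-level `name` module."""
    try:
        module = importlib.import_module(f"{package}.{name}")
    except ImportError:
        # Note: this also swallows ImportErrors raised *inside* package.name.
        module = importlib.import_module(name)
    return getattr(module, name)


def process(tree, processes=("sub", "toc", "xref"), package="processes", **kwargs):
    """Run each named process over `tree` in the given order, passing the options through."""
    for name in processes:
        load_process(name, package)(tree, **kwargs)


# Usage with a stand-in module registered in sys.modules (illustrative only).
calls = []
demo = types.ModuleType("demo")
demo.demo = lambda tree, **kwargs: calls.append((tree, kwargs))
sys.modules["demo"] = demo
process("<tree>", processes=("demo",), package="no_such_package", indent_char="\t")
print(calls)  # → [('<tree>', {'indent_char': '\t'})]
```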

specGen/processes/__init__.py

Empty file removed.

specGen/processes/outliner.py

-# coding=UTF-8
-# Copyright (c) 2008 Geoffrey Sneddon
-# 
-# Permission is hereby granted, free of charge, to any person obtaining a copy
-# of this software and associated documentation files (the "Software"), to deal
-# in the Software without restriction, including without limitation the rights
-# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
-# copies of the Software, and to permit persons to whom the Software is
-# furnished to do so, subject to the following conditions:
-# 
-# The above copyright notice and this permission notice shall be included in
-# all copies or substantial portions of the Software.
-# 
-# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
-# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
-# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
-# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
-# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
-# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
-# THE SOFTWARE.
-
-from lxml import etree
-
-from specGen import utils
-
-# Rank of heading elements (these are negative so h1 > h6)
-rank = {u"h1": -1, u"h2": -2, u"h3": -3, u"h4": -4, u"h5": -5, u"h6": -6, u"header": -1}
-
-class section(list):
-	"""Represents the section of a document."""
-	
-	header = None
-	
-	def __repr__(self):
-		return "<section %s>" % (repr(self.header))
-
-	def append(self, child):
-		list.append(self, child)
-		child.parent = self
-	
-	def extend(self, children):
-		list.extend(self, children)
-		for child in children:
-			child.parent = self
-
-class Outliner:
-	"""Build the outline of an HTML document."""
-	
-	def __init__(self, ElementTree, **kwargs):
-		self.ElementTree = ElementTree
-		self.stack = []
-		self.outlines = {}
-		self.current_outlinee = None
-		self.current_section = None
-	
-	def build(self, **kwargs):
-		for action, element in etree.iterwalk(self.ElementTree, events=("start", "end")):
-			# If the top of the stack is an element, and you are exiting that element
-			if action == "end" and self.stack and self.stack[-1] == element:
-				# Note: The element being exited is a heading content element.
-				assert element.tag in utils.heading_content
-				# Pop that element from the stack.
-				self.stack.pop()
-			
-			# If the top of the stack is a heading content element
-			elif self.stack and self.stack[-1].tag in utils.heading_content:
-				# Do nothing.
-				pass
-			
-			# When entering a sectioning content element or a sectioning root element
-			elif action == "start" and (element.tag in utils.sectioning_content or element.tag in utils.sectioning_root):
-				# If current outlinee is not null, push current outlinee onto the stack.
-				if self.current_outlinee is not None:
-					self.stack.append(self.current_outlinee)
-				# Let current outlinee be the element that is being entered.
-				self.current_outlinee = element
-				# Let current section be a newly created section for the current outlinee element.
-				self.current_section = section()
-				# Let there be a new outline for the new current outlinee, initialized with just the new current section as the only section in the outline.
-				self.outlines[self.current_outlinee] = [self.current_section]
-				
-			# When exiting a sectioning content element, if the stack is not empty
-			elif action == "end" and element.tag in utils.sectioning_content and self.stack:
-				# Pop the top element from the stack, and let the current outlinee be that element.
-				self.current_outlinee = self.stack.pop()
-				# Let current section be the last section in the outline of the current outlinee element.
-				self.current_section = self.outlines[self.current_outlinee][-1]
-				# Append the outline of the sectioning content element being exited to the current section. (This does not change which section is the last section in the outline.)
-				self.current_section += self.outlines[element]
-				
-			# When exiting a sectioning root element, if the stack is not empty
-			elif action == "end" and element.tag in utils.sectioning_root and self.stack:
-				# Pop the top element from the stack, and let the current outlinee be that element.
-				self.current_outlinee = self.stack.pop()
-				# Let current section be the last section in the outline of the current outlinee element.
-				self.current_section = self.outlines[self.current_outlinee][-1]
-				# Loop: If current section has no child sections, stop these steps.
-				while self.current_section:
-					# Let current section be the last child section of the current current section.
-					assert self.current_section != self.current_section[-1]
-					self.current_section = self.current_section[-1]
-					# Go back to the substep labeled Loop.
-					
-			# When exiting a sectioning content element or a sectioning root element
-			elif action == "end" and (element.tag in utils.sectioning_content or element.tag in utils.sectioning_root):
-				# Note: The current outlinee is the element being exited.
-				assert self.current_outlinee == element
-				# Let current section be the first section in the outline of the current outlinee element.
-				self.current_section = self.outlines[self.current_outlinee][0]
-				# Skip to the next step in the overall set of steps. (The walk is over.)
-				break
-				
-			# If the current outlinee is null.
-			elif self.current_outlinee is None:
-				# Do nothing.
-				pass
-			
-			# When entering a heading content element
-			elif action == "start" and element.tag in utils.heading_content:
-				# If the current section has no heading, let the element being entered be the heading for the current section.
-				if self.current_section.header is None:
-					self.current_section.header = element
-				
-				# Otherwise, if the element being entered has a rank equal to or greater than the heading of the last section of the outline of the current outlinee, then create a new section and append it to the outline of the current outlinee element, so that this new section is the new last section of that outline. Let current section be that new section. Let the element being entered be the new heading for the current section.
-				elif rank[element.tag] >= rank[self.outlines[self.current_outlinee][-1].header.tag]:
-					self.current_section = section()
-					self.outlines[self.current_outlinee].append(self.current_section)
-					self.current_section.header = element
-				
-				# Otherwise, run these substeps:
-				else:
-					# Let candidate section be current section.
-					candidate_section = self.current_section
-					while True:
-						# If the element being entered has a rank lower than the rank of the heading of the candidate section, then create a new section, and append it to candidate section. (This does not change which section is the last section in the outline.) Let current section be this new section. Let the element being entered be the new heading for the current section. Abort these substeps.
-						if rank[element.tag] < rank[candidate_section.header.tag]:
-							self.current_section = section()
-							candidate_section.append(self.current_section)
-							self.current_section.header = element
-							break
-						# Let new candidate section be the section that contains candidate section in the outline of current outlinee.
-						# Let candidate section be new candidate section.
-						candidate_section = candidate_section.parent
-						# Return to step 2.
-				# Push the element being entered onto the stack. (This causes the algorithm to skip any descendants of the element.)
-				self.stack.append(element)
-		
-		# If the current outlinee is null, then there was no sectioning content element or sectioning root element in the DOM. There is no outline.
-		try:
-			return self.outlines[self.current_outlinee]
-		except KeyError:
-			return None
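The heading-content steps above are the part of the outliner that produces nesting. With sectioning elements, the stack and the `header` element stripped out, they reduce to the Python 3 sketch below. It assumes the headings of a single sectioning element arrive as a flat list of `(tag, text)` pairs; `Section`, `outline` and `titles` are hypothetical names, and this is not the project's `Outliner` class.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Same convention as the outliner above: ranks are negative so that h1 > h6.
RANK = {f"h{n}": -n for n in range(1, 7)}


@dataclass
class Section:
    tag: Optional[str] = None  # heading element name, e.g. "h2"; None while untitled
    text: str = ""
    parent: Optional["Section"] = None
    children: List["Section"] = field(default_factory=list)


def outline(headings):
    """Outline a flat list of (tag, text) headings inside one sectioning element."""
    sections = [Section()]  # top-level sections of the outline
    current = sections[0]
    for tag, text in headings:
        if current.tag is None:
            # The current section has no heading yet: this heading becomes it.
            current.tag, current.text = tag, text
        elif RANK[tag] >= RANK[sections[-1].tag]:
            # Ranks at least as high as the last top-level heading: new top-level section.
            current = Section(tag, text)
            sections.append(current)
        else:
            # Climb from the current section until a higher-ranked heading is found,
            # then nest the new section under it.
            candidate = current
            while RANK[tag] >= RANK[candidate.tag]:
                candidate = candidate.parent
            current = Section(tag, text, parent=candidate)
            candidate.children.append(current)
    return sections


def titles(sections, depth=0):
    """Flatten an outline into indented title strings."""
    for s in sections:
        yield "  " * depth + s.text
        yield from titles(s.children, depth + 1)


doc = [("h1", "Doc"), ("h2", "Intro"), ("h3", "Scope"), ("h2", "Usage"), ("h1", "Appendix")]
print("\n".join(titles(outline(doc))))
```

Here "Intro" and "Usage" nest under "Doc", "Scope" nests under "Intro", and the second `h1` starts a new top-level section. The climb always terminates because the current section lies inside the last top-level section, whose heading outranks the new one in that branch.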

specGen/processes/sub.py

-# coding=UTF-8
-# Copyright (c) 2008 Geoffrey Sneddon
-# 
-# Permission is hereby granted, free of charge, to any person obtaining a copy
-# of this software and associated documentation files (the "Software"), to deal
-# in the Software without restriction, including without limitation the rights
-# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
-# copies of the Software, and to permit persons to whom the Software is
-# furnished to do so, subject to the following conditions:
-# 
-# The above copyright notice and this permission notice shall be included in
-# all copies or substantial portions of the Software.
-# 
-# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
-# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
-# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
-# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
-# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
-# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
-# THE SOFTWARE.
-
-import re
-import time
-from lxml import etree
-from copy import deepcopy
-
-from specGen import utils
-
-latest_version = re.compile(u"latest[%s]+version" % utils.spaceCharacters, re.IGNORECASE)
-
-w3c_tr_url_status = re.compile(r"http://www\.w3\.org/TR/[^/]*/(MO|WD|CR|PR|REC|PER|NOTE)-")
-
-year = re.compile(r"\[YEAR[^\]]*\]")
-year_sub = time.strftime(u"%Y", time.gmtime())
-year_identifier = u"[YEAR"
-
-date = re.compile(r"\[DATE[^\]]*\]")
-date_sub = time.strftime(u"%d %B %Y", time.gmtime()).lstrip(u"0")
-date_identifier = u"[DATE"
-
-cdate = re.compile(r"\[CDATE[^\]]*\]")
-cdate_sub = time.strftime(u"%Y%m%d", time.gmtime())
-cdate_identifier = u"[CDATE"
-
-title = re.compile(r"\[TITLE[^\]]*\]")
-title_identifier = u"[TITLE"
-
-status = re.compile(r"\[STATUS[^\]]*\]")
-status_identifier = u"[STATUS"
-
-longstatus = re.compile(r"\[LONGSTATUS[^\]]*\]")
-longstatus_identifier = u"[LONGSTATUS"
-longstatus_map = {
-	u"MO": u"W3C Member-only Draft",
-	u"ED": u"Editor's Draft",
-	u"WD": u"W3C Working Draft",
-	u"CR": u"W3C Candidate Recommendation",
-	u"PR": u"W3C Proposed Recommendation",
-	u"REC": u"W3C Recommendation",
-	u"PER": u"W3C Proposed Edited Recommendation",
-	u"NOTE": u"W3C Working Group Note"
-}
-
-w3c_stylesheet = re.compile(r"http://www\.w3\.org/StyleSheets/TR/W3C-[A-Z]+")
-w3c_stylesheet_identifier = u"http://www.w3.org/StyleSheets/TR/W3C-"
-
-string_subs = ((year, year_sub, year_identifier),
-               (date, date_sub, date_identifier),
-               (cdate, cdate_sub, cdate_identifier))
-
-logo = u"logo"
-logo_sub = etree.fromstring(u'<p><a href="http://www.w3.org/"><img alt="W3C" src="http://www.w3.org/Icons/w3c_home"/></a></p>')
-
-copyright = u"copyright"
-copyright_sub = etree.fromstring(u'<p class="copyright"><a href="http://www.w3.org/Consortium/Legal/ipr-notice#Copyright">Copyright</a> &#xA9; ' + time.strftime(u"%Y", time.gmtime()) + u' <a href="http://www.w3.org/"><acronym title="World Wide Web Consortium">W3C</acronym></a><sup>&#xAE;</sup> (<a href="http://www.csail.mit.edu/"><acronym title="Massachusetts Institute of Technology">MIT</acronym></a>, <a href="http://www.ercim.org/"><acronym title="European Research Consortium for Informatics and Mathematics">ERCIM</acronym></a>, <a href="http://www.keio.ac.jp/">Keio</a>), All Rights Reserved. W3C <a href="http://www.w3.org/Consortium/Legal/ipr-notice#Legal_Disclaimer">liability</a>, <a href="http://www.w3.org/Consortium/Legal/ipr-notice#W3C_Trademarks">trademark</a> and <a href="http://www.w3.org/Consortium/Legal/copyright-documents">document use</a> rules apply.</p>')
-
-basic_comment_subs = ()
-
-class sub(object):
-	"""Perform substitutions."""
-	
-	def __init__(self, ElementTree, w3c_compat=False, w3c_compat_substitutions=False, w3c_compat_crazy_substitutions=False, **kwargs):
-		if w3c_compat or w3c_compat_substitutions or w3c_compat_crazy_substitutions:
-			self.w3c_status = self.getW3CStatus(ElementTree, **kwargs)
-		self.stringSubstitutions(ElementTree, w3c_compat, w3c_compat_substitutions, w3c_compat_crazy_substitutions, **kwargs)
-		self.commentSubstitutions(ElementTree, w3c_compat, w3c_compat_substitutions, w3c_compat_crazy_substitutions, **kwargs)
-	
-	def stringSubstitutions(self, ElementTree, w3c_compat=False, w3c_compat_substitutions=False, w3c_compat_crazy_substitutions=False, **kwargs):
-		# Get doc_title from the title element
-		try:
-			doc_title = utils.textContent(ElementTree.getroot().find(u"head").find(u"title"))
-		except (AttributeError, TypeError):
-			doc_title = u""
-		
-		if w3c_compat or w3c_compat_substitutions:
-			# Get the right long status
-			doc_longstatus = longstatus_map[self.w3c_status]
-		
-		if w3c_compat_crazy_substitutions:
-			# Get the right stylesheet
-			doc_w3c_stylesheet = u"http://www.w3.org/StyleSheets/TR/W3C-" + self.w3c_status
-		
-		# Get all the subs we want
-		instance_string_subs = string_subs + ((title, doc_title, title_identifier),)
-		
-		# And even more in compat. mode
-		if w3c_compat or w3c_compat_substitutions:
-			instance_string_subs += ((status, self.w3c_status, status_identifier),
-			                         (longstatus, doc_longstatus, longstatus_identifier))
-		
-		# And more that aren't even enabled by default in compat. mode
-		if w3c_compat_crazy_substitutions:
-			instance_string_subs += ((w3c_stylesheet, doc_w3c_stylesheet, w3c_stylesheet_identifier),)
-		
-		for node in ElementTree.iter():
-			for regex, sub, identifier in instance_string_subs:
-				if node.text is not None and identifier in node.text:
-					node.text = regex.sub(sub, node.text)
-				if node.tail is not None and identifier in node.tail:
-					node.tail = regex.sub(sub, node.tail)
-				for name, value in node.attrib.items():
-					if identifier in value:
-						node.attrib[name] = regex.sub(sub, value)
-	
-	def commentSubstitutions(self, ElementTree, w3c_compat=False, w3c_compat_substitutions=False, w3c_compat_crazy_substitutions=False, **kwargs):
-		# Basic substitutions
-		instance_basic_comment_subs = basic_comment_subs
-		
-		# Add more basic substitutions in compat. mode
-		if w3c_compat or w3c_compat_substitutions:
-			instance_basic_comment_subs += ((logo, logo_sub),
-			                                (copyright, copyright_sub))
-		
-		# Set of nodes to remove
-		to_remove = set()
-		
-		# Link
-		in_link = False
-		for node in ElementTree.iter():
-			if in_link:
-				if node.tag is etree.Comment and node.text.strip(utils.spaceCharacters) == u"end-link":
-					if node.getparent() is not link_parent:
-						raise DifferentParentException, u"begin-link and end-link have different parents"
-					utils.removeInteractiveContentChildren(link)
-					link.set(u"href", utils.textContent(link))
-					in_link = False
-				else:
-					if node.getparent() is link_parent:
-						link.append(deepcopy(node))
-					to_remove.add(node)
-			elif node.tag is etree.Comment and node.text.strip(utils.spaceCharacters) == u"begin-link":
-				link_parent = node.getparent()
-				in_link = True
-				link = etree.Element(u"a")
-				link.text = node.tail
-				node.tail = None
-				node.addnext(link)
-		
-		# Basic substitutions
-		for comment, sub in instance_basic_comment_subs:
-			begin_sub = u"begin-" + comment
-			end_sub = u"end-" + comment
-			in_sub = False
-			for node in ElementTree.iter():
-				if in_sub:
-					if node.tag is etree.Comment and node.text.strip(utils.spaceCharacters) == end_sub:
-						if node.getparent() is not sub_parent:
-							raise DifferentParentException, u"%s and %s have different parents" % (begin_sub, end_sub)
-						in_sub = False
-					else:
-						to_remove.add(node)
-				elif node.tag is etree.Comment:
-					if node.text.strip(utils.spaceCharacters) == begin_sub:
-						sub_parent = node.getparent()
-						in_sub = True
-						node.tail = None
-						node.addnext(deepcopy(sub))
-					elif node.text.strip(utils.spaceCharacters) == comment:
-						node.addprevious(etree.Comment(begin_sub))
-						node.addprevious(deepcopy(sub))
-						node.addprevious(etree.Comment(end_sub))
-						node.getprevious().tail = node.tail
-						to_remove.add(node)
-		
-		# Remove nodes
-		for node in to_remove:
-			node.getparent().remove(node)
-	
-	def getW3CStatus(self, ElementTree, **kwargs):
-		# Get all text nodes that contain case-insensitively "latest version" with any amount of whitespace inside the phrase, or contain http://www.w3.org/TR/
-		for text in ElementTree.xpath(u"//text()[contains(translate(., 'LATEST', 'latest'), 'latest') and contains(translate(., 'VERSION', 'version'), 'version') or contains(., 'http://www.w3.org/TR/')]"):
-			if latest_version.search(text):
-				return u"ED"
-			elif w3c_tr_url_status.search(text):
-				return w3c_tr_url_status.search(text).group(1)
-		# Didn't find any status, return the default (ED)
-		else:
-			return u"ED"
-
-class DifferentParentException(utils.SpecGenException):
-	"""begin-link and end-link do not have the same parent."""
-	pass
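The string substitutions above replace bracketed markers such as `[YEAR]`, `[DATE]`, `[CDATE]` and `[TITLE]` in text, tails and attribute values. A Python 3 sketch of the same idea over a plain string follows; it is not the project's code, and `PATTERNS`, `substitutions` and `substitute` are hypothetical names. One deliberate difference: the replacement is passed to `re.sub` as a function, so a title containing a backslash is inserted literally instead of being parsed as a regex escape, as can happen with a string replacement.

```python
import re
import time

# Bracketed markers such as [YEAR] or [DATE: anything], mirroring sub.py's regexes.
PATTERNS = {
    "YEAR": re.compile(r"\[YEAR[^\]]*\]"),
    "DATE": re.compile(r"\[DATE[^\]]*\]"),
    "CDATE": re.compile(r"\[CDATE[^\]]*\]"),
    "TITLE": re.compile(r"\[TITLE[^\]]*\]"),
}


def substitutions(title, when=None):
    """Map each marker name to its replacement text, using UTC like sub.py."""
    t = time.gmtime() if when is None else when
    return {
        "YEAR": time.strftime("%Y", t),
        "DATE": time.strftime("%d %B %Y", t).lstrip("0"),  # "05 ..." -> "5 ..."
        "CDATE": time.strftime("%Y%m%d", t),
        "TITLE": title,
    }


def substitute(text, values):
    for name, regex in PATTERNS.items():
        # Cheap substring check before running the regex, like sub.py's *_identifier strings.
        if "[" + name in text:
            replacement = values[name]
            # A function replacement is inserted literally (no backslash-escape processing).
            text = regex.sub(lambda _match: replacement, text)
    return text


values = substitutions("anolis 1.0", time.strptime("2008-08-28", "%Y-%m-%d"))
print(substitute("<h2>[TITLE] ([DATE], [CDATE])</h2>", values))
# → <h2>anolis 1.0 (28 August 2008, 20080828)</h2>
```

Month names come from `%B`, so they follow the process locale (English in the default C locale).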

specGen/processes/toc.py

-# coding=UTF-8
-# Copyright (c) 2008 Geoffrey Sneddon
-# 
-# Permission is hereby granted, free of charge, to any person obtaining a copy
-# of this software and associated documentation files (the "Software"), to deal
-# in the Software without restriction, including without limitation the rights
-# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
-# copies of the Software, and to permit persons to whom the Software is
-# furnished to do so, subject to the following conditions:
-# 
-# The above copyright notice and this permission notice shall be included in
-# all copies or substantial portions of the Software.
-# 
-# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
-# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
-# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
-# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
-# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
-# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
-# THE SOFTWARE.
-
-from lxml import etree
-from copy import deepcopy
-
-from specGen import utils
-from specGen.processes import outliner
-
-# These are just the non-interactive elements to be removed
-remove_elements_from_toc = frozenset([u"dfn",])
-# These are, however, all the attributes to be removed
-remove_attributes_from_toc = frozenset([u"id",])
-
-class toc(object):
-	"""Build and add TOC."""
-	
-	toc = None
-	
-	def __init__(self, ElementTree, **kwargs):
-		self.toc = etree.Element(u"ol", {u"class": u"toc"})
-		self.buildToc(ElementTree, **kwargs)
-		self.addToc(ElementTree, **kwargs)
-	
-	def buildToc(self, ElementTree, min_depth = 2, max_depth = 6, w3c_compat = False, w3c_compat_class_toc = False, **kwargs):
-		# Build the outline of the document
-		outline_creator = outliner.Outliner(ElementTree, **kwargs)
-		outline = outline_creator.build(**kwargs)
-		
-		# Get a list of all the top level sections, and their depth (0)
-		sections = [(section, 0) for section in reversed(outline)]
-		
-		# Numbering
-		num = []
-		
-		# Set of elements to remove (due to odd behaviour of Element.iter() this has to be done afterwards)
-		to_remove = set()
-		
-		# Loop over all sections in a DFS
-		while sections:
-			# Get the section and depth at the end of list
-			section, depth = sections.pop()
-					
-			# If we have a header, regardless of how deep we are
-			if section.header is not None:
-				# Get the element that represents the section header's text
-				if section.header.tag == u"header":
-					i = 1
-					while i <= 6:
-						section_header_text_element = section.header.find(u"h" + unicode(i))
-						if section_header_text_element is not None:
-							break
-						i += 1
-					else:
-						section_header_text_element = None
-				else:
-					section_header_text_element = section.header
-			else:
-				section_header_text_element = None
-			
-			# If we have a section heading text element, regardless of depth
-			if section_header_text_element is not None:
-				# Remove any existing number
-				for element in section_header_text_element.iter(u"span"):
-					if utils.elementHasClass(element, u"secno"):
-						# Preserve the element tail
-						if element.tail is not None:
-							if element.getprevious() is not None:
-								if element.getprevious().tail is None:
-									element.getprevious().tail = element.tail
-								else:
-									element.getprevious().tail += element.tail
-							else:
-								if element.getparent().text is None:
-									element.getparent().text = element.tail
-								else:
-									element.getparent().text += element.tail
-						# Remove the element
-						to_remove.add(element)
-			
-			# Check we're in the valid depth range (min/max_depth are 1 based, depth is 0 based)
-			if depth >= min_depth - 1 and depth <= max_depth - 1:
-				# Calculate the corrected depth (i.e., the actual depth within the numbering/TOC)
-				corrected_depth = depth - min_depth + 1
-				
-				# Numbering:
-				# No children, no sibling, move back to parent's sibling
-				if corrected_depth + 1 < len(num):
-					del num[corrected_depth + 1:]
-				# Children
-				elif corrected_depth == len(num):
-					num.append(0)
-				
-				# Increment the current section's number
-				if section_header_text_element is not None and not utils.elementHasClass(section_header_text_element, u"no-num") or section_header_text_element is None and section:
-					num[-1] += 1
-				
-				# Get the current TOC section for this depth, and add another item to it
-				if section_header_text_element is not None and not utils.elementHasClass(section_header_text_element, u"no-toc") or section_header_text_element is None and section:
-					# Find the appropriate section of the TOC 
-					i = 0
-					toc_section = self.toc
-					while i < corrected_depth:
-						try:
-							# If the final li has no children, or its last child isn't an ol element
-							if len(toc_section[-1]) == 0 or toc_section[-1][-1].tag != u"ol":
-								toc_section[-1].append(etree.Element(u"ol"))
-								self.indentNode(toc_section[-1][-1], (i + 1) * 2, **kwargs)
-								if w3c_compat or w3c_compat_class_toc:
-									toc_section[-1][-1].set(u"class", u"toc")
-						except IndexError:
-							# If the current ol has no li in it
-							toc_section.append(etree.Element(u"li"))
-							self.indentNode(toc_section[0], (i + 1) * 2 - 1, **kwargs)
-							toc_section[0].append(etree.Element(u"ol"))
-							self.indentNode(toc_section[0][0], (i + 1) * 2, **kwargs)
-							if w3c_compat or w3c_compat_class_toc:
-								toc_section[0][0].set(u"class", u"toc")
-						# TOC Section is now the final child (ol) of the final item (li) in the previous section
-						assert toc_section[-1].tag == u"li"
-						assert toc_section[-1][-1].tag == u"ol"
-						toc_section = toc_section[-1][-1]
-						i += 1
-					# Add the current item to the TOC
-					item = etree.Element(u"li")
-					toc_section.append(item)
-					self.indentNode(item, (i + 1) * 2 - 1, **kwargs)
-					
-				# If we have a header
-				if section_header_text_element is not None:
-					# Remove all the elements in the list of nodes to remove (so that the removal of existing numbers doesn't lead to crazy IDs)
-					for element in to_remove:
-						element.getparent().remove(element)
-					to_remove = set()
-					
-					# Add ID to header
-					id = utils.generateID(section_header_text_elemen