Commits

Geoffrey Sneddon committed 9531983

Allow any whitespace in "latest version".

Files changed (3)

 
 </head><body><header>
 	<h1>spec-gen 1.0RC1-dev</h1>
-	<h2 class="no-num no-toc">Documentation — 26 August 2008</h2>
+	<h2 class="no-num no-toc">Documentation — 27 August 2008</h2>
 </header>
 
 <h2 class="no-num no-toc" id=contents>Contents</h2>
 order):</p>
 
 <ol>
-	<li>If the node contains, case-insensitively, "latest version" (where the
-middle space is one or more U+0020 SPACE, U+0009 CHARACTER TABULATION (tab),
-U+000A LINE FEED (LF), or U+000D CARRIAGE RETURN (CR) characters — a difference from the normal definition of <a href=#whitespace>whitespace</a>, as it does not include
-U+000C FORM FEED (FF)), searching stops, and the default is used (ED).
+	<li>If the node contains, case-insensitively, "latest", followed by one or
+more <a href=#whitespace>whitespace</a> characters, followed by "version", searching stops,
+and the default is used (ED).
 	</li><li>Otherwise, if the node, case-sensitively, contains
 "http://www.w3.org/TR/" followed by one of "MO", "WD", "CR", "PR", "REC", "PER",
 or "NOTE", which in turn is followed by U+002D HYPHEN-MINUS (-), then searching
 order):</p>
 
 <ol>
-	<li>If the node contains, case-insensitively, "latest version" (where the
-middle space is one or more U+0020 SPACE, U+0009 CHARACTER TABULATION (tab),
-U+000A LINE FEED (LF), or U+000D CARRIAGE RETURN (CR) characters — a difference from the normal definition of <span>whitespace</span>, as it does not include
-U+000C FORM FEED (FF)), searching stops, and the default is used (ED).
+	<li>If the node contains, case-insensitively, "latest", followed by one or
+more <span>whitespace</span> characters, followed by "version", searching stops,
+and the default is used (ED).
 	<li>Otherwise, if the node, case-sensitively, contains
 "http://www.w3.org/TR/" followed by one of "MO", "WD", "CR", "PR", "REC", "PER",
 or "NOTE", which in turn is followed by U+002D HYPHEN-MINUS (-), then searching

specGen/processes/sub.py

 
 from specGen import utils
 
+latest_version = re.compile(u"latest[%s]+version" % utils.spaceCharacters, re.IGNORECASE)
+
 w3c_tr_url_status = re.compile(r"http://www\.w3\.org/TR/[^/]*/(MO|WD|CR|PR|REC|PER|NOTE)-")
 
 year = re.compile(r"\[YEAR[^\]]*\]")
 	
 	def getW3CStatus(self, ElementTree, **kwargs):
 		# Get all text nodes that contain case-insensitively "latest version" with any amount of whitespace inside the phrase, or contain http://www.w3.org/TR/
-		for text in ElementTree.xpath(u"//text()[contains(normalize-space(translate(., 'AEILNORSTV', 'aeilnorstv')), 'latest version') or contains(., 'http://www.w3.org/TR/')]"):
-			if u"latest version" in text.lower():
+		for text in ElementTree.xpath(u"//text()[contains(translate(., 'LATEST', 'latest'), 'latest') and contains(translate(., 'VERSION', 'version'), 'version') or contains(., 'http://www.w3.org/TR/')]"):
+			if latest_version.search(text):
 				return u"ED"
 			elif w3c_tr_url_status.search(text):
 				return w3c_tr_url_status.search(text).group(1)
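
The matching behaviour this commit introduces can be tried out in isolation with the sketch below. It is not the project's code: `SPACE_CHARACTERS` is an assumed stand-in for `utils.spaceCharacters` (taken here to be tab, LF, FF, CR and space, in line with the documentation's remark that whitespace includes U+000C FORM FEED), and `status_from_text` is a hypothetical helper that applies the two checks to a single string rather than to the text nodes of a tree.

```python
import re

# Assumption: stand-in for utils.spaceCharacters, which is not shown in the
# diff. Taken to be U+0009, U+000A, U+000C, U+000D and U+0020.
SPACE_CHARACTERS = u"\t\n\x0c\r "

# Same patterns as in the diff, built against the assumed character set.
latest_version = re.compile(u"latest[%s]+version" % SPACE_CHARACTERS, re.IGNORECASE)
w3c_tr_url_status = re.compile(r"http://www\.w3\.org/TR/[^/]*/(MO|WD|CR|PR|REC|PER|NOTE)-")


def status_from_text(text):
    """Hypothetical helper: return the status implied by one text string, or None."""
    # "latest", one or more whitespace characters, then "version" means ED.
    if latest_version.search(text):
        return u"ED"
    # Otherwise take the status from a dated /TR/ URL, if one is present.
    match = w3c_tr_url_status.search(text)
    if match:
        return match.group(1)
    return None


if __name__ == "__main__":
    print(status_from_text(u"Latest\x0cVersion"))
    print(status_from_text(u"http://www.w3.org/TR/2008/WD-html5-20080610/"))
```

Under these assumptions a form feed between the two words is accepted, which the previous definition of the "middle space" excluded, while "latestversion" with no separator still does not match.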