Jason R. Coombs committed 78a5561

Adding changes from SimpleParse 2.1.1 as found on PyPI

Files changed (24)

 general documentation.  See license.txt for licensing
 information.  (This is a BSD-licensed package).
 '''
+__version__="2.1.1"
 <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
-<html>
-<head>
+<html><head>
   <meta content="en-us" http-equiv="Content-Language">
-  <meta content="text/html; charset=windows-1252"
- http-equiv="Content-Type">
+  <meta content="text/html; charset=windows-1252" http-equiv="Content-Type">
   <meta content="Microsoft FrontPage 4.0" name="GENERATOR">
   <meta content="FrontPage.Editor.Document" name="ProgId">
-  <title>SimpleParse 2.1</title>
-  <link href="sitestyle.css" type="text/css" rel="stylesheet">
-  <meta content="Mike C. Fletcher" name="author">
-</head>
+  
+  <title>SimpleParse 2.1</title><link href="sitestyle.css" type="text/css" rel="stylesheet">
+  <meta content="Mike C. Fletcher" name="author"></head>
 <body>
 <h1>SimpleParse <font size="-2">A Parser Generator for mxTextTools
 v2.1.0</font></h1>
   <li><a href="common_problems.html">Common Problems</a> -- description
 of a number of common bugs, errors, pitfalls and anti-patterns when
 using the engine.</li>
-  <li><a
- href="http://www.ibm.com/developerworks/linux/library/l-simple.html">IBM
+  <li><a href="http://www.ibm.com/developerworks/linux/library/l-simple.html">IBM
 DeveloperWorks Article</a> by Dr. David Mertz -- discusses (and teaches
 the use of) SimpleParse 1.0, contrasting the EBNF-based parser with
 tools such as regexen for text-processing tasks. &nbsp;Watch also
   </li>
 </ul>
 <h2>Acquisition and Installation</h2>
-<p> You will need a copy of Python with <a
- href="http://www.python.org/sigs/distutils-sig/download.html">distutils</a>
+<p> You will need a copy of Python with <a href="http://www.python.org/sigs/distutils-sig/download.html">distutils</a>
 support (Python versions 2.0 and above include this). You'll also need
 a C
 compiler compatible with your Python build and understood by distutils.</p>
-<p>To install the base SimpleParse engine, <a
- href="http://sourceforge.net/project/showfiles.php?group_id=55673">download
+<p>To install the base SimpleParse engine, <a href="http://sourceforge.net/project/showfiles.php?group_id=55673">download
 the latest version</a> in your preferred format. If you are using the
 Win32 installer, simply run the executable. If you are using one of the
 source distributions, unpack the distribution into a
 your system.<br>
 </p>
 <h2>Features/Changelog</h2>
-<p>New in 2.1.0a1:</p>
+<p>New in 2.1.1a2:</p><ul><li>Disable all of the mxDebugPrintf functionality, which should allow us to build on Win32 with Mingw32 for Python 2.6</li></ul><p>New in 2.1.1a1:</p><ul><li>Fixes to build under Python 2.6</li><li>Rename of simpleparse.xml to simpleparse.xmlparser to avoid conflicts with standard library "xml"</li><li>Eliminate use of .message on exceptions, as this has been deprecated in Python 2.6</li></ul><p>New in 2.1.0a1:</p>
 <ul>
   <li>Includes (patched) mxTextTools extension as part of SimpleParse,
 no longer uses stand-alone mxTextTools installations<br>
 children are reported as if the enclosing production did not exist
 (allows you to use productions for organisational as well as
 reporting purposes)</li>
-  <li>Exposure of <a
- href="processing_result_trees.html#nonstandardresulttrees">callout
+  <li>Exposure of <a href="processing_result_trees.html#nonstandardresulttrees">callout
 mechanism</a> in mxTextTools</li>
   <li>Exposure of "LookAhead" mechanism in mxTextTools (allows you to
 spell "is followed by", "is not followed by", or "matches x but
 group to specify that all subsequent items must succeed. &nbsp;You can
 specify an error message format by using a string literal after the !
 character.</li>
-  <li>Library of common constructs (<a
- href="pydoc/simpleparse.common.html">simpleparse.common</a> package)
+  <li>Library of common constructs (<a href="pydoc/simpleparse.common.html">simpleparse.common</a> package)
 which are easily included in your grammars<br>
   </li>
   <li>Hexidecimal escapes for string and character ranges</li>
   <li>The library of common patterns is extremely sparse</li>
   <li>Unicode support</li>
   <li>There is no analysis and only minimal reduction done on the
-grammar. &nbsp;Having now read most of <a
- href="http://www.cs.vu.nl/%7Edick/PTAPG.html">Parsing Techniques - A
+grammar. &nbsp;Having now read most of <a href="http://www.cs.vu.nl/%7Edick/PTAPG.html">Parsing Techniques - A
 Practical Guide</a>, I can see how some fairly significant changes will
 be required to support such operations (and thereby the more common
 parsing techniques).<br>
 this may seem silly, but it would be nice to implement a more advanced
 parsing algorithm directly in C, without going through the
 assembly-like
-interface of mxTextTools. &nbsp;Given that Marc-Andr&eacute; isn't
+interface of mxTextTools. &nbsp;Given that Marc-André isn't
 interested
 in adopting the non-recursive codebase, there's not much point
 retaining compatability with mxTextTools, so moving to a more
 argue for using the non-recursive rewrite.</p>
 <p>To build the non-recursive TextTools engine, you'll need to
 get the source distribution for the non-recursive implementation from
-the <a
- href="http://sourceforge.net/project/showfiles.php?group_id=55673">SimpleParse
+the <a href="http://sourceforge.net/project/showfiles.php?group_id=55673">SimpleParse
 file repository</a>.&nbsp; Note,
 there are incompatabilities in the mxBase 2.1 versions that make it
 necessary to use the versions specified below to build the
 non-recursive versions.<br>
 </p>
 <ul>
-  <li>Python 2.2.x, <a
- href="http://lists.egenix.com/mailman-archives/egenix-users/2002-August/000078.html">mxBase
-2.1b5</a>, non-recursive <a
- href="https://sourceforge.net/project/showfiles.php?group_id=55673&amp;package_id=53017&amp;release_id=108636">1.0.0b4</a><br>
+  <li>Python 2.2.x, <a href="http://lists.egenix.com/mailman-archives/egenix-users/2002-August/000078.html">mxBase
+2.1b5</a>, non-recursive <a href="https://sourceforge.net/project/showfiles.php?group_id=55673&amp;package_id=53017&amp;release_id=108636">1.0.0b4</a><br>
   </li>
-  <li>Python 2.3.x, <a
- href="http://lists.egenix.com/mailman-archives/egenix-users/2003-August/000262.html">mxBase
-2.1</a> August 2003 Shapshot, non-recursive <a
- href="https://sourceforge.net/project/showfiles.php?group_id=55673&amp;package_id=53017">1.0.0b5+</a></li>
+  <li>Python 2.3.x, <a href="http://lists.egenix.com/mailman-archives/egenix-users/2003-August/000262.html">mxBase
+2.1</a> August 2003 Shapshot, non-recursive <a href="https://sourceforge.net/project/showfiles.php?group_id=55673&amp;package_id=53017">1.0.0b5+</a></li>
 </ul>
 <p>This archive is intended to be expanded over the
 mxBase source archive from the top-level directory, replacing one file
 <p>Extensions to the eGenix extensions (most significantly the rewrite
 of the core loop) are copyright Mike Fletcher and released under the
 SimpleParse License below:</p>
-<p>&nbsp;&nbsp;&nbsp; Copyright &copy; 2003-2006, Mike Fletcher</p>
+<p>&nbsp;&nbsp;&nbsp; Copyright © 2003-2006, Mike Fletcher</p>
 <p>SimpleParse License:</p>
-<p style="margin-left: 80px;">Copyright &copy; 1998-2006, Copyright by
+<p style="margin-left: 80px;">Copyright © 1998-2006, Copyright by
 Mike C. Fletcher; All Rights Reserved.<br>
 mailto: <a href="mailto:mcfletch@users.sourceforge.net">mcfletch@users.sourceforge.net</a>
 </p>
 WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER
 TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR
 PERFORMANCE OF THIS SOFTWARE!</p>
-<p align="center">A <a href="http://sourceforge.net"> <img
- alt="SourceForge Logo"
- src="http://sourceforge.net/sflogo.php?group_id=55673&amp;type=5"
- border="0" height="62" width="210"></a><br>
+<p align="center">A <a href="http://sourceforge.net"> <img alt="SourceForge Logo" src="http://sourceforge.net/sflogo.php?group_id=55673&amp;type=5" border="0" height="62" width="210"></a><br>
 Open Source <a href="http://simpleparse.sourceforge.net/">project</a><br>
 </p>
-</body>
-</html>
+</body></html>
 	line = -1
 	production = ""
 	expected = ""
+	error_message = None
 	DEFAULTTEMPLATE = """Failed parsing production "%(production)s" @pos %(position)s (~line %(line)s:%(lineChar)s).\nExpected syntax: %(expected)s\nGot text: %(text)s"""
 	def __str__( self ):
 		"""Create a string representation of the error"""
-		if self.message:
-			return '%s: %s'%( self.__class__.__name__, self.messageFormat(self.message) )
+		if self.error_message:
+			return '%s: %s'%( self.__class__.__name__, self.messageFormat(self.error_message) )
 		else:
 			return '%s: %s'%( self.__class__.__name__, self.messageFormat() )
 	def messageFormat( self, template=None):

objectgenerator.py

 	def __call__( self, text, position, end ):
 		"""Method called by mxTextTools iff the base production fails"""
 		error = ParserSyntaxError( self.message )
-		error.message = self.message
+		error.error_message = self.message
 		error.production = self.production
 		error.expected= self.expected
 		error.buffer = text
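
The two hunks above move the custom message from the `message` attribute to `error_message`, which matches the changelog note that `.message` on exceptions is deprecated in Python 2.6. A minimal sketch of that pattern follows. `SketchParseError` and `build_error` are hypothetical names used only for illustration, and the fallback text is simplified; this is not SimpleParse's actual class.

```python
class SketchParseError(Exception):
    """Hypothetical stand-in for a parser error class (illustration only)."""
    production = ""
    # Class-level default, so __str__ works even when no message was set.
    # Using a custom name avoids the deprecated BaseException.message.
    error_message = None

    def __str__(self):
        """Create a string representation of the error."""
        if self.error_message:
            return '%s: %s' % (self.__class__.__name__, self.error_message)
        return '%s: failed parsing production "%s"' % (
            self.__class__.__name__, self.production)


def build_error(message, production):
    """Populate the error the way the error-on-fail callback does above."""
    error = SketchParseError(message)
    error.error_message = message
    error.production = production
    return error
```

The class-level `error_message = None` default is what lets `__str__` test the attribute safely on instances that never had a message assigned.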
 	python setup.py install
 to install the packages from the source archive.
 """
-from setuptools import setup, Extension
+#from setuptools import setup, Extension
+from distutils.core import setup, Extension
 import os, sys, string
 
+def findVersion( ):
+	a = {}
+	exec( open( '__init__.py' ).read(), a, a )
+	return a['__version__']
+
 def isPackage( filename ):
 	"""Is the given filename a Python package"""
 	return (
 		}
 	setup (
 		name = "SimpleParse",
-		version = "2.1.0a2",
+		version = findVersion(),
 		description = "A Parser Generator for Python (w/mxTextTools derivative)",
 		author = "Mike C. Fletcher",
 		author_email = "mcfletch@users.sourceforge.net",
 		options = options,
 
 		packages = packages.keys(),
-		include_package_data = True,
-		zip_safe = False,
+#		include_package_data = True,
+#		zip_safe = False,
 		ext_modules=[
 			Extension(
 				"simpleparse.stt.TextTools.mxTextTools.mxTextTools", 
 					'stt/TextTools/mxTextTools/mxte.c',
 					'stt/TextTools/mxTextTools/mxbmse.c',
 				],
-				include_dirs=['stt/TextTools/mxTextTools']
+				include_dirs=['stt/TextTools/mxTextTools'],
+				define_macros=[ ('MX_BUILDING_MXTEXTTOOLS',1) ],
 			),
 		],
 		**extraArguments
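
The `findVersion` helper added above keeps the version number in one place: `setup.py` executes `__init__.py` and reads `__version__` from the resulting namespace, so the package does not have to be importable at build time. The sketch below shows the same idea under a few assumptions: the names `find_version` and `demo` are hypothetical, the path is passed in rather than hard-coded, and the file is closed explicitly. Executing the file runs all of its top-level code, so this only suits an `__init__.py` that is cheap and safe to run.

```python
import os
import tempfile


def find_version(path):
    """Execute the file at `path` in a fresh namespace and return its __version__."""
    namespace = {}
    with open(path) as source:
        exec(source.read(), namespace, namespace)
    return namespace['__version__']


def demo():
    """Write a throwaway __init__.py and read the version back from it."""
    directory = tempfile.mkdtemp()
    path = os.path.join(directory, '__init__.py')
    with open(path, 'w') as handle:
        handle.write('__version__="2.1.1"\n')
    return find_version(path)
```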

stt/TextTools/TextTools.py

 # Extra stuff useful in combination with the C functions
 #
 
-def replace(text,what,with,start=0,stop=None,
+def replace(text,what,with_what,start=0,stop=None,
 
             SearchObject=TextSearch,join=join,joinlist=joinlist,tag=tag,
             string_replace=string.replace,type=type,
         what = so.match
     if stop is None:
         if start == 0 and len(what) < 2:
-            return string_replace(text,what,with)
+            return string_replace(text,what,with_what)
         stop = len(text)
     t = ((text,sWordStart,so,+2),
          # Found something, replace and continue searching
-         (with,Skip+AppendTagobj,len(what),-1,-1),
+         (with_what,Skip+AppendTagobj,len(what),-1,-1),
          # Rest of text
          (text,Move,ToEOF)
          )
 
 # Alternative (usually slower) versions using different techniques:
 
-def _replace2(text,what,with,start=0,stop=None,
+def _replace2(text,what,with_what,start=0,stop=None,
 
               join=join,joinlist=joinlist,tag=tag,
               TextSearchType=TextSearchType,TextSearch=TextSearch):
 
-    """Analogon to string.replace; returns a string with all occurences
-       of what in text[start:stop] replaced by with.
+    """Analogon to string.replace; returns a string with all occurrences
+       of what in text[start:stop] replaced by with_what.
        
        This version uses a one entry tag-table and a
        Boyer-Moore-Search-object.  what can be a string or a
         stop = len(text)
     if type(what) is not TextSearchType:
         what=TextSearch(what)
-    t = ((with,sFindWord,what,+1,+0),)
+    t = ((with_what,sFindWord,what,+1,+0),)
     found,taglist,last = tag(text,t,start,stop)
     if not found: 
         return text
     return join(joinlist(text,taglist))
 
-def _replace3(text,what,with,
+def _replace3(text,what,with_what,
 
               join=string.join,TextSearch=TextSearch,
               TextSearchType=TextSearchType):
     l = []
     x = 0
     for left,right in slices:
-        l.append(text[x:left] + with)
+        l.append(text[x:left] + with_what)
         x = right
     l.append(text[x:])
     return join(l,'')
 
-def _replace4(text,what,with,
+def _replace4(text,what,with_what,
 
               join=join,joinlist=joinlist,tag=tag,TextSearch=TextSearch,
               TextSearchType=TextSearchType):
         return text
     repl = [None]*len(slices)
     for i in range(len(slices)):
-        repl[i] = (with,)+slices[i]
+        repl[i] = (with_what,)+slices[i]
     return join(joinlist(text,repl))
 
 def multireplace(text,replacements,start=0,stop=None,
     """ Apply multiple replacement to a text at once.
 
         replacements must be list of tuples (replacement, left,
-        right).  It is used to replace the slice text[left:right] with
+        right).  It is used to replace the slice text[left:right] with
         the string replacement.
 
         Note that the replacements do not affect one another.  Indices
         tag() this funtion *does* make copies of the found stings,
         though.
 
-        Returns a tuple (rc,tagdict,next) with the same meaning of rc
+        Returns a tuple (rc,tagdict,next) with the same meaning of rc
         and next as tag(); tagdict is the new dictionary or None in
         case rc is 0.
           
 
 def invset(chars):
     
-    """ Return a set with all characters *except* the ones in chars.
+    """ Return a set with all characters *except* the ones in chars.
     """
     return set(chars,0)
 
         characters into one space.
 
         The result is a one line text string. Tim Peters will like
-        this function called with '-' separator ;-)
+        this function called with '-' separator ;-)
         
     """
     return join(charset.split(text), separator)
         print 'Replacing strings'
         print '-'*72
         print
-        for what,with in (('m','M'),('mx','MX'),('mxText','MXTEXT'),
+        for what,with_what in (('m','M'),('mx','MX'),('mxText','MXTEXT'),
                           ('hmm','HMM'),('hmmm','HMM'),('hmhmm','HMM')):
-            print 'Replace "%s" with "%s"' % (what,with)
+            print 'Replace "%s" with "%s"' % (what,with_what)
             t.start()
             for i in range(100):
-                rtext = string.replace(text,what,with)
+                rtext = string.replace(text,what,with_what)
             print 'with string.replace:',t.stop(),'sec.'
             t.start()
             for i in range(100):
-                ttext = replace(text,what,with)
+                ttext = replace(text,what,with_what)
             print 'with tag.replace:',t.stop(),'sec.'
             if ttext != rtext:
                 print 'results are NOT ok !'
                 mismatch(rtext,ttext)
             t.start()
             for i in range(100):
-                ttext = _replace2(text,what,with)
+                ttext = _replace2(text,what,with_what)
             print 'with tag._replace2:',t.stop(),'sec.'
             if ttext != rtext:
                 print 'results are NOT ok !'
                 print rtext
             t.start()
             for i in range(100):
-                ttext = _replace3(text,what,with)
+                ttext = _replace3(text,what,with_what)
             print 'with tag._replace3:',t.stop(),'sec.'
             if ttext != rtext:
                 print 'results are NOT ok !'
                 print rtext
             t.start()
             for i in range(100):
-                ttext = _replace4(text,what,with)
+                ttext = _replace4(text,what,with_what)
             print 'with tag._replace4:',t.stop(),'sec.'
             if ttext != rtext:
                 print 'results are NOT ok !'
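
The parameter rename from `with` to `with_what` throughout this file is needed because `with` is a reserved keyword from Python 2.6 onward, so it can no longer be used as an identifier. A small sketch of how to check a candidate name follows. The `replace` function here is a simplified, hypothetical stand-in built on `str.replace`, not the tag-table implementation in `TextTools.py`.

```python
import keyword


def is_usable_identifier(name):
    """Return True if `name` is not a reserved word in the running Python."""
    return not keyword.iskeyword(name)


def replace(text, what, with_what):
    """Simplified stand-in: replace every occurrence of `what` by `with_what`."""
    return text.replace(what, with_what)
```

Only identifiers need the new name. Occurrences of the English word "with" inside docstrings and printed strings are unaffected by the keyword change.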

stt/TextTools/mxTextTools/highcommands.h

-/* non-recursive high-level commands 
-
-  The contract here is:
-
-	The commands may alter any of the tag-specific variables
-
-	errors may be indicated if encountered in childReturnCode and the error* variables
-
-*/
-
-	case MATCH_SWORDSTART:
-	case MATCH_SWORDEND:
-	case MATCH_SFINDWORD:
-		/* these items basically follow the low-level contract, with the
-			only exception being that MATCH_SFINDWORD will change childStart
-		*/
-	    {
-			int wordstart, wordend;
-			int returnCode;
-
-			DPRINTF("\nsWordStart/End/sFindWord :\n"
-				" in string   = '%.40s'\n",text+childPosition);
-			childStart = childPosition;
-			returnCode = TE_SEARCHAPI(
-				match,
-				text,
-				childStart,
-				sliceright,
-				&wordstart,
-				&wordend
-			);
-			if (returnCode < 0) {
-				childReturnCode = ERROR_CODE;
-				errorType = PyExc_SystemError;
-				errorMessage = PyString_FromFormat(
-					 "Search-object search returned value < 0 (%i): probable bug in text processing engine",
-					 returnCode
-				);
-			} else if (returnCode == 0) { 
-				/* not matched */
-				DPRINTF(" (no success)\n");
-				childReturnCode = FAILURE_CODE;
-			} else { 
-				/* matched, adjust childPosition according to the word start/end/find requirements */
-				if (command == MATCH_SWORDSTART) {
-					childPosition = wordstart;
-				} else {
-					childPosition = wordend;
-				}
-				if (command == MATCH_SFINDWORD) {
-					/* XXX logic problem with lookahead
-					should it reset to real childStart or 
-					the fake one created here? */
-					childStart = wordstart;
-				}
-				DPRINTF(" [%i:%i] (matched and remembered this slice)\n",
-					childStart,childPosition);
-			}
-			break;
-	    }
-
-	case MATCH_LOOP:
-		/* No clue what this is supposed to do, real surprising if it works...
-
-		*/
-	    DPRINTF("\nLoop: pre loop counter = %i\n",loopcount);
-	    
-		if (loopcount > 0) {
-			/* we are inside a loop */
-			loopcount--;
-		} else if (loopcount < 0) {
-			/* starting a new loop */
-			if (PyInt_Check(match)) {
-				loopcount = PyInt_AS_LONG(match);
-				loopstart = childPosition;
-			} else {
-				childReturnCode = ERROR_CODE;
-				errorType = PyExc_TypeError;
-				errorMessage = PyString_FromFormat(
-					 "Tag Table entry %i: expected an integer (command=Loop) got a %.50s",
-					 index,
-					 match->ob_type->tp_name
-				);
-			}
-		}
-		if (childReturnCode == NULL_CODE ) {
-
-			if (loopcount == 0) {
-				/* finished loop */
-				loopcount = -1;
-			}
-			if (loopstart == childPosition) {
-				/* not matched */
-				childReturnCode = FAILURE_CODE;
-			} else {
-				childReturnCode = SUCCESS_CODE;
-				/* on success, add match from start of the whole loop to end of current iteration? 
-				
-				Would be really good if I had a clue what this is supposed to do :) .
-				*/
-				childStart = loopstart;
-			}
-			DPRINTF("\nloop: post loop counter = %i\n",loopcount);
-		}
-		break;
-
-	case MATCH_LOOPCONTROL:
-
-	    DPRINTF("\nLoopControl: loop counter = %i, "
-		    "setting it to = %li\n",
-		    loopcount,PyInt_AS_LONG(match));
-
-	    loopcount = PyInt_AS_LONG(match);
-		break;
-
-	case MATCH_CALL:
-	case MATCH_CALLARG:
-		/* call and callarg actually follow the low-level contract */
-
-	    {
-			PyObject *fct = NULL;
-			int argc = -1;
-			
-			if (!PyTuple_Check(match)) {
-				argc = 0;
-				fct = match;
-			} else {
-				argc = PyTuple_GET_SIZE(match) - 1;
-				if (argc < 0) {
-					/* how is this even possible? */
-					childReturnCode = ERROR_CODE;
-					errorType = PyExc_TypeError;
-					errorMessage = PyString_FromFormat(
-						"Tag Table entry %i: "
-						"expected a tuple (fct,arg0,arg1,...)"
-						"(command=CallArg)",
-						index
-					);
-				} else {
-					fct = PyTuple_GET_ITEM(match,0);
-				}
-			}
-			
-			if (childReturnCode == NULL_CODE && PyCallable_Check(fct)) {
-				PyObject *args;
-				register PyObject *w;
-				register int argIndex;
-
-				DPRINTF("\nCall[Arg] :\n");
-			
-				childStart = childPosition;
-
-				/* Build args = (textobj,childStart,sliceright[,arg0,arg1,...]) */
-				args = PyTuple_New(3 + argc);
-				if (!args) {
-					childReturnCode = ERROR_CODE;
-					errorType = PyExc_SystemError;
-					errorMessage = PyString_FromFormat(
-						 "Unable to create argument tuple for CallArgs command at index %i",
-						 index
-					);
-				} else {
-					Py_INCREF(textobj);
-					PyTuple_SET_ITEM(args,0,textobj);
-					w = PyInt_FromLong(childStart);
-					if (!w){
-						childReturnCode = ERROR_CODE;
-						errorType = PyExc_SystemError;
-						errorMessage = PyString_FromFormat(
-							 "Unable to convert an integer %i to a Python Integer",
-							 childStart
-						);
-					} else {
-						PyTuple_SET_ITEM(args,1,w);
-						w = PyInt_FromLong(sliceright);
-						if (!w) {
-							childReturnCode = ERROR_CODE;
-							errorType = PyExc_SystemError;
-							errorMessage = PyString_FromFormat(
-								 "Unable to convert an integer %i to a Python Integer",
-								 sliceright
-							);
-						} else {
-							PyTuple_SET_ITEM(args,2,w);
-							for (argIndex = 0; argIndex < argc; argIndex++) {
-							w = PyTuple_GET_ITEM(match,argIndex + 1);
-							Py_INCREF(w);
-							PyTuple_SET_ITEM(args,3 + argIndex,w);
-							}
-							/* now actually call the object */
-							w = PyEval_CallObject(fct,args);
-							Py_DECREF(args);
-							if (w == NULL) {
-								childReturnCode = ERROR_CODE;
-								/* child's error should be allowed to propagate */
-							} else if (!PyInt_Check(w)) {
-								childReturnCode = ERROR_CODE;
-								errorType = PyExc_TypeError;
-								errorMessage = PyString_FromFormat(
-									 "Tag Table entry %i: matching function has to return an integer, returned a %.50s",
-									 index,
-									 w->ob_type->tp_name
-								);
-							} else {
-								childPosition = PyInt_AS_LONG(w);
-								Py_DECREF(w);
-
-								if (childStart == childPosition) { 
-									/* not matched */
-									DPRINTF(" (no success)\n");
-									childReturnCode = FAILURE_CODE;
-								}
-							}
-						}
-					}
-				}
-			} else {
-				childReturnCode = ERROR_CODE;
-				errorType = PyExc_TypeError;
-				errorMessage = PyString_FromFormat(
-					"Tag Table entry %i: "
-					"expected a callable object, got a %.50s"
-					"(command=Call[Arg])",
-					index,
-					fct->ob_type->tp_name
-				);
-			}
-			break;
-		}
+/* non-recursive high-level commands 
+
+  The contract here is:
+
+	The commands may alter any of the tag-specific variables
+
+	errors may be indicated if encountered in childReturnCode and the error* variables
+
+*/
+
+	case MATCH_SWORDSTART:
+	case MATCH_SWORDEND:
+	case MATCH_SFINDWORD:
+		/* these items basically follow the low-level contract, with the
+			only exception being that MATCH_SFINDWORD will change childStart
+		*/
+	    {
+			Py_ssize_t wordstart, wordend;
+			int returnCode;
+
+			DPRINTF("\nsWordStart/End/sFindWord :\n"
+				" in string   = '%.40s'\n",text+childPosition);
+			childStart = childPosition;
+			returnCode = TE_SEARCHAPI(
+				match,
+				text,
+				childStart,
+				sliceright,
+				&wordstart,
+				&wordend
+			);
+			if (returnCode < 0) {
+				childReturnCode = ERROR_CODE;
+				errorType = PyExc_SystemError;
+				errorMessage = PyString_FromFormat(
+					 "Search-object search returned value < 0 (%i): probable bug in text processing engine",
+					 returnCode
+				);
+			} else if (returnCode == 0) { 
+				/* not matched */
+				DPRINTF(" (no success)\n");
+				childReturnCode = FAILURE_CODE;
+			} else { 
+				/* matched, adjust childPosition according to the word start/end/find requirements */
+				if (command == MATCH_SWORDSTART) {
+					childPosition = wordstart;
+				} else {
+					childPosition = wordend;
+				}
+				if (command == MATCH_SFINDWORD) {
+					/* XXX logic problem with lookahead
+					should it reset to real childStart or 
+					the fake one created here? */
+					childStart = wordstart;
+				}
+				DPRINTF(" [%i:%i] (matched and remembered this slice)\n",
+					childStart,childPosition);
+			}
+			break;
+	    }
+
+	case MATCH_LOOP:
+		/* No clue what this is supposed to do, real surprising if it works...
+
+		*/
+	    DPRINTF("\nLoop: pre loop counter = %i\n",loopcount);
+	    
+		if (loopcount > 0) {
+			/* we are inside a loop */
+			loopcount--;
+		} else if (loopcount < 0) {
+			/* starting a new loop */
+			if (PyInt_Check(match)) {
+				loopcount = PyInt_AS_LONG(match);
+				loopstart = childPosition;
+			} else {
+				childReturnCode = ERROR_CODE;
+				errorType = PyExc_TypeError;
+				errorMessage = PyString_FromFormat(
+					 "Tag Table entry %d: expected an integer (command=Loop) got a %.50s",
+					 (unsigned int)index,
+					 match->ob_type->tp_name
+				);
+			}
+		}
+		if (childReturnCode == NULL_CODE ) {
+
+			if (loopcount == 0) {
+				/* finished loop */
+				loopcount = -1;
+			}
+			if (loopstart == childPosition) {
+				/* not matched */
+				childReturnCode = FAILURE_CODE;
+			} else {
+				childReturnCode = SUCCESS_CODE;
+				/* on success, add match from start of the whole loop to end of current iteration? 
+				
+				Would be really good if I had a clue what this is supposed to do :) .
+				*/
+				childStart = loopstart;
+			}
+			DPRINTF("\nloop: post loop counter = %i\n",loopcount);
+		}
+		break;
+
+	case MATCH_LOOPCONTROL:
+
+	    DPRINTF("\nLoopControl: loop counter = %i, "
+		    "setting it to = %li\n",
+		    loopcount,PyInt_AS_LONG(match));
+
+	    loopcount = PyInt_AS_LONG(match);
+		break;
+
+	case MATCH_CALL:
+	case MATCH_CALLARG:
+		/* call and callarg actually follow the low-level contract */
+
+	    {
+			PyObject *fct = NULL;
+			int argc = -1;
+			
+			if (!PyTuple_Check(match)) {
+				argc = 0;
+				fct = match;
+			} else {
+				argc = PyTuple_GET_SIZE(match) - 1;
+				if (argc < 0) {
+					/* how is this even possible? */
+					childReturnCode = ERROR_CODE;
+					errorType = PyExc_TypeError;
+					errorMessage = PyString_FromFormat(
+						"Tag Table entry %d: "
+						"expected a tuple (fct,arg0,arg1,...)"
+						"(command=CallArg)",
+						(unsigned int)index
+					);
+				} else {
+					fct = PyTuple_GET_ITEM(match,0);
+				}
+			}
+			
+			if (childReturnCode == NULL_CODE && PyCallable_Check(fct)) {
+				PyObject *args;
+				register PyObject *w;
+				register Py_ssize_t argIndex;
+
+				DPRINTF("\nCall[Arg] :\n");
+			
+				childStart = childPosition;
+
+				/* Build args = (textobj,childStart,sliceright[,arg0,arg1,...]) */
+				args = PyTuple_New(3 + argc);
+				if (!args) {
+					childReturnCode = ERROR_CODE;
+					errorType = PyExc_SystemError;
+					errorMessage = PyString_FromFormat(
+						 "Unable to create argument tuple for CallArgs command at index %d",
+						 (unsigned int)index
+					);
+				} else {
+					Py_INCREF(textobj);
+					PyTuple_SET_ITEM(args,0,textobj);
+					w = PyInt_FromLong(childStart);
+					if (!w){
+						childReturnCode = ERROR_CODE;
+						errorType = PyExc_SystemError;
+						errorMessage = PyString_FromFormat(
+							 "Unable to convert an integer %d to a Python Integer",
+							 (unsigned int)childStart
+						);
+					} else {
+						PyTuple_SET_ITEM(args,1,w);
+						w = PyInt_FromLong(sliceright);
+						if (!w) {
+							childReturnCode = ERROR_CODE;
+							errorType = PyExc_SystemError;
+							errorMessage = PyString_FromFormat(
+								 "Unable to convert an integer %d to a Python Integer",
+								 (unsigned int)sliceright
+							);
+						} else {
+							PyTuple_SET_ITEM(args,2,w);
+							for (argIndex = 0; argIndex < argc; argIndex++) {
+							w = PyTuple_GET_ITEM(match,argIndex + 1);
+							Py_INCREF(w);
+							PyTuple_SET_ITEM(args,3 + argIndex,w);
+							}
+							/* now actually call the object */
+							w = PyEval_CallObject(fct,args);
+							Py_DECREF(args);
+							if (w == NULL) {
+								childReturnCode = ERROR_CODE;
+								/* child's error should be allowed to propagate */
+							} else if (!PyInt_Check(w)) {
+								childReturnCode = ERROR_CODE;
+								errorType = PyExc_TypeError;
+								errorMessage = PyString_FromFormat(
+									 "Tag Table entry %d: matching function has to return an integer, returned a %.50s",
+									 (unsigned int)index,
+									 w->ob_type->tp_name
+								);
+							} else {
+								childPosition = PyInt_AS_LONG(w);
+								Py_DECREF(w);
+
+								if (childStart == childPosition) { 
+									/* not matched */
+									DPRINTF(" (no success)\n");
+									childReturnCode = FAILURE_CODE;
+								}
+							}
+						}
+					}
+				}
+			} else {
+				childReturnCode = ERROR_CODE;
+				errorType = PyExc_TypeError;
+				errorMessage = PyString_FromFormat(
+					"Tag Table entry %d: "
+					"expected a callable object, got a %.50s"
+					"(command=Call[Arg])",
+					(unsigned int)index,
+					fct->ob_type->tp_name
+				);
+			}
+			break;
+		}

stt/TextTools/mxTextTools/lowlevelcommands.h

-/* Low-level matching commands code fragment
-
-  The contract here is:
-
-	all commands move forward through the buffer
-	
-	failure to move forward indicates failure of the tag
-	
-	moving forward indicates success of the tag
-	
-	errors may be indicated if encountered in childReturnCode and the error* variables
-	
-	only childPosition should be updated otherwise
-
-*/
-TE_CHAR *m = TE_STRING_AS_STRING(match);
-if (m == NULL) {
-	childReturnCode = ERROR_CODE;
-	errorType = PyExc_TypeError;
-	errorMessage = PyString_FromFormat(
-		 "Low-level command (%i) argument in entry %i couldn't be converted to a string object, is a %.50s",
-		 command,
-		 index,
-		 textobj->ob_type->tp_name
-
-	);
-} else {
-
-switch (command) {
-
-	case MATCH_ALLIN:
-
-		{
-			register int ml = TE_STRING_GET_SIZE(match);
-			register TE_CHAR *tx = &text[childPosition];
-
-			DPRINTF("\nAllIn :\n"
-				" looking for   = '%.40s'\n"
-				" in string     = '%.40s'\n",m,tx);
-
-			if (ml > 1) {
-				for (; childPosition < sliceright; tx++, childPosition++) {
-					register int j;
-					register TE_CHAR *mj = m;
-					register TE_CHAR ctx = *tx;
-					for (j=0; j < ml && ctx != *mj; mj++, j++) ;
-					if (j == ml) break;
-				}
-			} else if (ml == 1) {
-				/* one char only: use faster variant: */
-				for (; childPosition < sliceright && *tx == *m; tx++, childPosition++) ;
-			}
-			break;
-		}
-
-	case MATCH_ALLNOTIN:
-
-		{
-			register int ml = TE_STRING_GET_SIZE(match);
-			register TE_CHAR *tx = &text[childPosition];
-
-			DPRINTF("\nAllNotIn :\n"
-				" looking for   = '%.40s'\n"
-				" not in string = '%.40s'\n",m,tx);
-
-			if (ml != 1) {
-				for (; childPosition < sliceright; tx++, childPosition++) {
-					register int j;
-					register TE_CHAR *mj = m;
-					register TE_CHAR ctx = *tx;
-					for (j=0; j < ml && ctx != *mj; mj++, j++) ;
-					if (j != ml) break;
-				}
-			} else {
-				/* one char only: use faster variant: */
-				for (; childPosition < sliceright && *tx != *m; tx++, childPosition++) ;
-			}
-			break;
-		}
-
-	case MATCH_IS: 
-		
-		{
-			DPRINTF("\nIs :\n"
-				" looking for   = '%.40s'\n"
-				" in string     = '%.40s'\n",m,text+childPosition);
-
-			if (childPosition < sliceright && *(&text[childPosition]) == *m) {
-				childPosition++;
-			}
-			break;
-		}
-
-	case MATCH_ISIN:
-
-	{
-		register int ml = TE_STRING_GET_SIZE(match);
-		register TE_CHAR ctx = text[childPosition];
-
-		DPRINTF("\nIsIn :\n"
-			" looking for   = '%.40s'\n"
-			" in string     = '%.40s'\n",m,text+childPosition);
-
-		if (ml > 0 && childPosition < sliceright) {
-		register int j;
-		register TE_CHAR *mj = m;
-		for (j=0; j < ml && ctx != *mj; mj++, j++) ;
-		if (j != ml) childPosition++;
-		}
-
-		break;
-	}
-
-	case MATCH_ISNOTIN:
-
-	{
-		register int ml = TE_STRING_GET_SIZE(match);
-		register TE_CHAR ctx = text[childPosition];
-
-		DPRINTF("\nIsNotIn :\n"
-			" looking for   = '%.40s'\n"
-			" not in string = '%.40s'\n",m,text+childPosition);
-
-		if (ml > 0 && childPosition < sliceright) {
-		register int j;
-		register TE_CHAR *mj = m;
-		for (j=0; j < ml && ctx != *mj; mj++, j++) ;
-		if (j == ml) childPosition++;
-		}
-		else
-		childPosition++;
-
-		break;
-	}
-
-	case MATCH_WORD:
-
-	{
-		int ml1 = TE_STRING_GET_SIZE(match) - 1;
-		register TE_CHAR *tx = &text[childPosition + ml1];
-		register int j = ml1;
-		register TE_CHAR *mj = &m[j];
-
-		DPRINTF("\nWord :\n"
-			" looking for   = '%.40s'\n"
-			" in string     = '%.40s'\n",m,&text[childPosition]);
-
-		if (childPosition+ml1 >= sliceright) break;
-		
-		/* compare from right to left */
-		for (; j >= 0 && *tx == *mj;
-		 tx--, mj--, j--) ;
-
-		if (j >= 0) /* not matched */
-		childPosition = startPosition; /* reset */
-		else
-		childPosition += ml1 + 1;
-		break;
-	}
-
-	case MATCH_WORDSTART:
-	case MATCH_WORDEND:
-
-	{
-		int ml1 = TE_STRING_GET_SIZE(match) - 1;
-
-		if (ml1 >= 0) {
-		register TE_CHAR *tx = &text[childPosition];
-			
-		DPRINTF("\nWordStart/End :\n"
-			" looking for   = '%.40s'\n"
-			" in string     = '%.40s'\n",m,tx);
-
-		/* Brute-force method; from right to left */
-		for (;;) {
-			register int j = ml1;
-			register TE_CHAR *mj = &m[j];
-
-			if (childPosition+j >= sliceright) {
-			/* reached eof: no match, rewind */
-			childPosition = startPosition;
-			break;
-			}
-
-			/* scan from right to left */
-			for (tx += j; j >= 0 && *tx == *mj; 
-			 tx--, mj--, j--) ;
-			/*
-			DPRINTF("match text[%i+%i]: %c == %c\n",
-					childPosition,j,*tx,*mj);
-			*/
-
-			if (j < 0) {
-			/* found */
-			if (command == MATCH_WORDEND) childPosition += ml1 + 1;
-			break;
-			}
-			/* not found: rewind and advance one char */
-			tx -= j - 1;
-			childPosition++;
-		}
-		}
-
-		break;
-	}
-
-#if (TE_TABLETYPE == MXTAGTABLE_STRINGTYPE)
-
-	/* Note: These two only work for 8-bit set strings. */
-	case MATCH_ALLINSET:
-
-	{
-		register TE_CHAR *tx = &text[childPosition];
-		unsigned char *m = PyString_AS_STRING(match);
-
-		DPRINTF("\nAllInSet :\n"
-			" looking for   = set at 0x%lx\n"
-			" in string     = '%.40s'\n",(long)match,tx);
-
-		for (;
-		 childPosition < sliceright &&
-		 (m[((unsigned char)*tx) >> 3] & 
-		  (1 << (*tx & 7))) > 0;
-		 tx++, childPosition++) ;
-
-		break;
-	}
-
-	case MATCH_ISINSET:
-
-	{
-		register TE_CHAR *tx = &text[childPosition];
-		unsigned char *m = PyString_AS_STRING(match);
-
-		DPRINTF("\nIsInSet :\n"
-			" looking for   = set at 0x%lx\n"
-			" in string     = '%.40s'\n",(long)match,tx);
-
-		if (childPosition < sliceright &&
-		(m[((unsigned char)*tx) >> 3] & 
-		 (1 << (*tx & 7))) > 0)
-		childPosition++;
-
-		break;
-	}
-
-#endif
-
-	case MATCH_ALLINCHARSET:
-
-	{
-		int matching;
-
-		DPRINTF("\nAllInCharSet :\n"
-			" looking for   = CharSet at 0x%lx\n"
-			" in string     = '%.40s'\n",
-			(long)match, &text[childPosition]);
-		
-		matching = mxCharSet_Match(match,
-					   textobj,
-					   childPosition,
-					   sliceright,
-					   1);
-		if (matching < 0) {
-			childReturnCode = ERROR_CODE;
-			errorType = PyExc_SystemError;
-			errorMessage = PyString_FromFormat(
-				 "Character set match returned value < 0 (%i): probable bug in text processing engine",
-				 matching
-			);
-		} else {
-			childPosition += matching;
-		}
-		break;
-	}
-
-	case MATCH_ISINCHARSET:
-
-		{
-			int test;
-
-			DPRINTF("\nIsInCharSet :\n"
-				" looking for   = CharSet at 0x%lx\n"
-				" in string     = '%.40s'\n",
-				(long)match, &text[childPosition]);
-
-#if (TE_TABLETYPE == MXTAGTABLE_STRINGTYPE)
-			test = mxCharSet_ContainsChar(match, text[childPosition]);
-#else
-			test = mxCharSet_ContainsUnicodeChar(match, text[childPosition]);
-#endif
-			if (test < 0) {
-				childReturnCode = ERROR_CODE;
-				errorType = PyExc_SystemError;
-				errorMessage = PyString_FromFormat(
-					 "Character set match returned value < 0 (%i): probable bug in text processing engine",
-					 test
-				);
-			} else if (test) {
-				childPosition++;
-			}
-			break;
-		}
-	default:
-		{
-			childReturnCode = ERROR_CODE;
-			errorType = PyExc_ValueError;
-			errorMessage = PyString_FromFormat(
-				 "Unrecognised Low-Level command code %i, maximum low-level code is %i",
-				 command,
-				 MATCH_MAX_LOWLEVEL
-			);
-		}
-/* end of the switch, this child is finished */
-}
-} /* end of the wrapping if-check */
-
-/* simple determination for these commands (hence calling them low-level) */
-if (childReturnCode == NULL_CODE) {
-	if (childPosition > childStart) {
-		childReturnCode = SUCCESS_CODE;
-	} else {
-		childReturnCode = FAILURE_CODE;
-	}
-}
+/* Low-level matching commands code fragment
+
+  The contract here is:
+
+	all commands move forward through the buffer
+	
+	failure to move forward indicates failure of the tag
+	
+	moving forward indicates success of the tag
+	
+	errors may be indicated if encountered in childReturnCode and the error* variables
+	
+	only childPosition should be updated otherwise
+
+*/
+TE_CHAR *m = TE_STRING_AS_STRING(match);
+if (m == NULL) {
+	childReturnCode = ERROR_CODE;
+	errorType = PyExc_TypeError;
+	errorMessage = PyString_FromFormat(
+		 "Low-level command (%i) argument in entry %d couldn't be converted to a string object, is a %.50s",
+		 command,
+		 (unsigned int)index,
+		 textobj->ob_type->tp_name
+
+	);
+} else {
+
+switch (command) {
+
+	case MATCH_ALLIN:
+
+		{
+			register Py_ssize_t ml = TE_STRING_GET_SIZE(match);
+			register TE_CHAR *tx = &text[childPosition];
+
+			DPRINTF("\nAllIn :\n"
+				" looking for   = '%.40s'\n"
+				" in string     = '%.40s'\n",m,tx);
+
+			if (ml > 1) {
+				for (; childPosition < sliceright; tx++, childPosition++) {
+					register Py_ssize_t j;
+					register TE_CHAR *mj = m;
+					register TE_CHAR ctx = *tx;
+					for (j=0; j < ml && ctx != *mj; mj++, j++) ;
+					if (j == ml) break;
+				}
+			} else if (ml == 1) {
+				/* one char only: use faster variant: */
+				for (; childPosition < sliceright && *tx == *m; tx++, childPosition++) ;
+			}
+			break;
+		}
+
+	case MATCH_ALLNOTIN:
+
+		{
+			register Py_ssize_t ml = TE_STRING_GET_SIZE(match);
+			register TE_CHAR *tx = &text[childPosition];
+
+			DPRINTF("\nAllNotIn :\n"
+				" looking for   = '%.40s'\n"
+				" not in string = '%.40s'\n",m,tx);
+
+			if (ml != 1) {
+				for (; childPosition < sliceright; tx++, childPosition++) {
+					register Py_ssize_t j;
+					register TE_CHAR *mj = m;
+					register TE_CHAR ctx = *tx;
+					for (j=0; j < ml && ctx != *mj; mj++, j++) ;
+					if (j != ml) break;
+				}
+			} else {
+				/* one char only: use faster variant: */
+				for (; childPosition < sliceright && *tx != *m; tx++, childPosition++) ;
+			}
+			break;
+		}
+
+	case MATCH_IS: 
+		
+		{
+			DPRINTF("\nIs :\n"
+				" looking for   = '%.40s'\n"
+				" in string     = '%.40s'\n",m,text+childPosition);
+
+			if (childPosition < sliceright && *(&text[childPosition]) == *m) {
+				childPosition++;
+			}
+			break;
+		}
+
+	case MATCH_ISIN:
+
+	{
+		register Py_ssize_t ml = TE_STRING_GET_SIZE(match);
+		register TE_CHAR ctx = text[childPosition];
+
+		DPRINTF("\nIsIn :\n"
+			" looking for   = '%.40s'\n"
+			" in string     = '%.40s'\n",m,text+childPosition);
+
+		if (ml > 0 && childPosition < sliceright) {
+		register Py_ssize_t j;
+		register TE_CHAR *mj = m;
+		for (j=0; j < ml && ctx != *mj; mj++, j++) ;
+		if (j != ml) childPosition++;
+		}
+
+		break;
+	}
+
+	case MATCH_ISNOTIN:
+
+	{
+		register Py_ssize_t ml = TE_STRING_GET_SIZE(match);
+		register TE_CHAR ctx = text[childPosition];
+
+		DPRINTF("\nIsNotIn :\n"
+			" looking for   = '%.40s'\n"
+			" not in string = '%.40s'\n",m,text+childPosition);
+
+		if (ml > 0 && childPosition < sliceright) {
+		register Py_ssize_t j;
+		register TE_CHAR *mj = m;
+		for (j=0; j < ml && ctx != *mj; mj++, j++) ;
+		if (j == ml) childPosition++;
+		}
+		else
+		childPosition++;
+
+		break;
+	}
+
+	case MATCH_WORD:
+
+	{
+		Py_ssize_t ml1 = TE_STRING_GET_SIZE(match) - 1;
+		register TE_CHAR *tx = &text[childPosition + ml1];
+		register Py_ssize_t j = ml1;
+		register TE_CHAR *mj = &m[j];
+
+		DPRINTF("\nWord :\n"
+			" looking for   = '%.40s'\n"
+			" in string     = '%.40s'\n",m,&text[childPosition]);
+
+		if (childPosition+ml1 >= sliceright) break;
+		
+		/* compare from right to left */
+		for (; j >= 0 && *tx == *mj;
+		 tx--, mj--, j--) ;
+
+		if (j >= 0) /* not matched */
+		childPosition = startPosition; /* reset */
+		else
+		childPosition += ml1 + 1;
+		break;
+	}
+
+	case MATCH_WORDSTART:
+	case MATCH_WORDEND:
+
+	{
+		Py_ssize_t ml1 = TE_STRING_GET_SIZE(match) - 1;
+
+		if (ml1 >= 0) {
+		register TE_CHAR *tx = &text[childPosition];
+			
+		DPRINTF("\nWordStart/End :\n"
+			" looking for   = '%.40s'\n"
+			" in string     = '%.40s'\n",m,tx);
+
+		/* Brute-force method; from right to left */
+		for (;;) {
+			register Py_ssize_t j = ml1;
+			register TE_CHAR *mj = &m[j];
+
+			if (childPosition+j >= sliceright) {
+			/* reached eof: no match, rewind */
+			childPosition = startPosition;
+			break;
+			}
+
+			/* scan from right to left */
+			for (tx += j; j >= 0 && *tx == *mj; 
+			 tx--, mj--, j--) ;
+			/*
+			DPRINTF("match text[%i+%i]: %c == %c\n",
+					childPosition,j,*tx,*mj);
+			*/
+
+			if (j < 0) {
+			/* found */
+			if (command == MATCH_WORDEND) childPosition += ml1 + 1;
+			break;
+			}
+			/* not found: rewind and advance one char */
+			tx -= j - 1;
+			childPosition++;
+		}
+		}
+
+		break;
+	}
+
+#if (TE_TABLETYPE == MXTAGTABLE_STRINGTYPE)
+
+	/* Note: These two only work for 8-bit set strings. */
+	case MATCH_ALLINSET:
+
+	{
+		register TE_CHAR *tx = &text[childPosition];
+		unsigned char *m = (unsigned char *)PyString_AS_STRING(match);
+
+		DPRINTF("\nAllInSet :\n"
+			" looking for   = set at 0x%lx\n"
+			" in string     = '%.40s'\n",(long)match,tx);
+
+		for (;
+		 childPosition < sliceright &&
+		 (m[((unsigned char)*tx) >> 3] & 
+		  (1 << (*tx & 7))) > 0;
+		 tx++, childPosition++) ;
+
+		break;
+	}
+
+	case MATCH_ISINSET:
+
+	{
+		register TE_CHAR *tx = &text[childPosition];
+		unsigned char *m = (unsigned char *)PyString_AS_STRING(match);
+
+		DPRINTF("\nIsInSet :\n"
+			" looking for   = set at 0x%lx\n"
+			" in string     = '%.40s'\n",(long)match,tx);
+
+		if (childPosition < sliceright &&
+		(m[((unsigned char)*tx) >> 3] & 
+		 (1 << (*tx & 7))) > 0)
+		childPosition++;
+
+		break;
+	}
+
+#endif
+
+	case MATCH_ALLINCHARSET:
+
+	{
+		Py_ssize_t matching;
+
+		DPRINTF("\nAllInCharSet :\n"
+			" looking for   = CharSet at 0x%lx\n"
+			" in string     = '%.40s'\n",
+			(long)match, &text[childPosition]);
+		
+		matching = mxCharSet_Match(match,
+					   textobj,
+					   childPosition,
+					   sliceright,
+					   1);
+		if (matching < 0) {
+			childReturnCode = ERROR_CODE;
+			errorType = PyExc_SystemError;
+			errorMessage = PyString_FromFormat(
+				 "Character set match returned value < 0 (%d): probable bug in text processing engine",
+				 (unsigned int)matching
+			);
+		} else {
+			childPosition += matching;
+		}
+		break;
+	}
+
+	case MATCH_ISINCHARSET:
+
+		{
+			int test;
+
+			DPRINTF("\nIsInCharSet :\n"
+				" looking for   = CharSet at 0x%lx\n"
+				" in string     = '%.40s'\n",
+				(long)match, &text[childPosition]);
+
+#if (TE_TABLETYPE == MXTAGTABLE_STRINGTYPE)
+			test = mxCharSet_ContainsChar(match, text[childPosition]);
+#else
+			test = mxCharSet_ContainsUnicodeChar(match, text[childPosition]);
+#endif
+			if (test < 0) {
+				childReturnCode = ERROR_CODE;
+				errorType = PyExc_SystemError;
+				errorMessage = PyString_FromFormat(
+					 "Character set match returned value < 0 (%i): probable bug in text processing engine",
+					 test
+				);
+			} else if (test) {
+				childPosition++;
+			}
+			break;
+		}
+	default:
+		{
+			childReturnCode = ERROR_CODE;
+			errorType = PyExc_ValueError;
+			errorMessage = PyString_FromFormat(
+				 "Unrecognised Low-Level command code %i, maximum low-level code is %i",
+				 command,
+				 MATCH_MAX_LOWLEVEL
+			);
+		}
+/* end of the switch, this child is finished */
+}
+} /* end of the wrapping if-check */
+
+/* simple determination for these commands (hence calling them low-level) */
+if (childReturnCode == NULL_CODE) {
+	if (childPosition > childStart) {
+		childReturnCode = SUCCESS_CODE;
+	} else {
+		childReturnCode = FAILURE_CODE;
+	}
+}
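
The contract stated in the header comment of this fragment (a command succeeds only if it moves `childPosition` forward) can be sketched outside the engine roughly as follows. This is a minimal illustration, not the engine's code: `all_in` and `classify` are hypothetical helper names, the two result codes only borrow their names from the diff, and `memchr` stands in for the hand-written inner loop of `MATCH_ALLIN`.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes, named after the ones used in the diff. */
enum { FAILURE_CODE = 0, SUCCESS_CODE = 1 };

/* Advance past every leading character of text[pos..right) that occurs
   in set (set_len bytes). Returns the new position; never moves backwards. */
static size_t all_in(const char *text, size_t pos, size_t right,
                     const char *set, size_t set_len)
{
    while (pos < right && memchr(set, text[pos], set_len) != NULL)
        pos++;
    return pos;
}

/* Mirror of the final check in the fragment: the command succeeded
   only if it consumed at least one character. */
static int classify(size_t start, size_t end)
{
    return end > start ? SUCCESS_CODE : FAILURE_CODE;
}
```

With this shape the matcher itself never reports success or failure; the caller derives the result from the positions, which is presumably why the fragment only sets `childReturnCode` directly on errors.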

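The `MATCH_ALLINSET` and `MATCH_ISINSET` cases above test membership with `m[c >> 3] & (1 << (c & 7))`, that is, one bit per 8-bit character. A small sketch of that lookup, assuming the set is a 32-byte (256-bit) bitmap; the size and the helper names `set_add`, `set_contains` and `all_in_set` are assumptions made for illustration and do not appear in the diff:

```c
#include <stddef.h>

/* Mark character c as a member of a 32-byte (256-bit) set. */
static void set_add(unsigned char set[32], unsigned char c)
{
    set[c >> 3] |= (unsigned char)(1u << (c & 7));
}

/* Byte index is c / 8, bit index is c % 8, as in the diff's expression. */
static int set_contains(const unsigned char set[32], unsigned char c)
{
    return (set[c >> 3] & (1u << (c & 7))) != 0;
}

/* Analogue of MATCH_ALLINSET: skip every leading character in the set. */
static size_t all_in_set(const char *text, size_t pos, size_t right,
                         const unsigned char set[32])
{
    while (pos < right && set_contains(set, (unsigned char)text[pos]))
        pos++;
    return pos;
}
```

The cast to `unsigned char` before indexing matters on platforms where plain `char` is signed; without it a character above 127 would produce a negative index.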
stt/TextTools/mxTextTools/mcfpyapi.h

-/* Marc-Andre's Hex version determination code */
-#ifndef PY_VERSION_HEX
-# if PYTHON_API_VERSION == 1007
-#  define PY_VERSION_HEX 0x010500F0
-# endif
-# if PYTHON_API_VERSION == 1006
-#  define PY_VERSION_HEX 0x010400F0
-# endif
-# if PYTHON_API_VERSION < 1006
-#  define PY_VERSION_HEX 0
-# endif
-#endif
-
-
-#if PY_HEX_VERSION < 0x02020000
-/*  Python 2.2 features backported to earlier Python versions */
-
-#ifndef PYSTRING_FROMFORMAT_BACKPORT
-#define PYSTRING_FROMFORMAT_BACKPORT
-/* PyString_FromFormat back-porting code
-
-	There are no docs for when PyString_FromFormat shows up that I can see,
-	appears to be Python version 2.2.0
-
-	This PyString_FromFormat back-porting code is from Python 2.2.1:
-		Copyright (c) 2001, 2002 Python Software Foundation.
-		All Rights Reserved.
-
-		Copyright (c) 2000 BeOpen.com.
-		All Rights Reserved.
-
-		Copyright (c) 1995-2001 Corporation for National Research Initiatives.
-		All Rights Reserved.
-
-		Copyright (c) 1991-1995 Stichting Mathematisch Centrum, Amsterdam.
-		All Rights Reserved.
-
-*/
-
-
-#include <ctype.h>
-PyObject *
-PyString_FromFormatV(const char *format, va_list vargs)
-{
-	va_list count;
-	int n = 0;
-	const char* f;
-	char *s;
-	PyObject* string;
-
-#ifdef VA_LIST_IS_ARRAY
-	memcpy(count, vargs, sizeof(va_list));
-#else
-	count = vargs;
-#endif
-	/* step 1: figure out how large a buffer we need */
-	for (f = format; *f; f++) {
-		if (*f == '%') {
-			const char* p = f;
-			while (*++f && *f != '%' && !isalpha(Py_CHARMASK(*f)))
-				;
-
-			/* skip the 'l' in %ld, since it doesn't change the
-			   width.  although only %d is supported (see
-			   "expand" section below), others can be easily
-			   added */
-			if (*f == 'l' && *(f+1) == 'd')
-				++f;
-			
-			switch (*f) {
-			case 'c':
-				(void)va_arg(count, int);
-				/* fall through... */
-			case '%':
-				n++;
-				break;
-			case 'd': case 'i': case 'x':
-				(void) va_arg(count, int);
-				/* 20 bytes is enough to hold a 64-bit
-				   integer.  Decimal takes the most space.
-				   This isn't enough for octal. */
-				n += 20;
-				break;
-			case 's':
-				s = va_arg(count, char*);
-				n += strlen(s);
-				break;
-			case 'p':
-				(void) va_arg(count, int);
-				/* maximum 64-bit pointer representation:
-				 * 0xffffffffffffffff
-				 * so 19 characters is enough.
-				 * XXX I count 18 -- what's the extra for?
-				 */
-				n += 19;
-				break;
-			default:
-				/* if we stumble upon an unknown
-				   formatting code, copy the rest of
-				   the format string to the output
-				   string. (we cannot just skip the
-				   code, since there's no way to know
-				   what's in the argument list) */ 
-				n += strlen(p);
-				goto expand;
-			}
-		} else
-			n++;
-	}
- expand:
-	/* step 2: fill the buffer */
-	/* Since we've analyzed how much space we need for the worst case,
-	   use sprintf directly instead of the slower PyOS_snprintf. */
-	string = PyString_FromStringAndSize(NULL, n);
-	if (!string)
-		return NULL;
-	
-	s = PyString_AsString(string);
-
-	for (f = format; *f; f++) {
-		if (*f == '%') {
-			const char* p = f++;
-			int i, longflag = 0;
-			/* parse the width.precision part (we're only
-			   interested in the precision value, if any) */
-			n = 0;
-			while (isdigit(Py_CHARMASK(*f)))
-				n = (n*10) + *f++ - '0';
-			if (*f == '.') {
-				f++;
-				n = 0;
-				while (isdigit(Py_CHARMASK(*f)))
-					n = (n*10) + *f++ - '0';
-			}
-			while (*f && *f != '%' && !isalpha(Py_CHARMASK(*f)))
-				f++;
-			/* handle the long flag, but only for %ld.  others
-			   can be added when necessary. */
-			if (*f == 'l' && *(f+1) == 'd') {
-				longflag = 1;
-				++f;
-			}
-
-			switch (*f) {
-			case 'c':
-				*s++ = va_arg(vargs, int);
-				break;
-			case 'd':
-				if (longflag)
-					sprintf(s, "%ld", va_arg(vargs, long));
-				else
-					sprintf(s, "%d", va_arg(vargs, int));
-				s += strlen(s);
-				break;
-			case 'i':
-				sprintf(s, "%i", va_arg(vargs, int));
-				s += strlen(s);
-				break;
-			case 'x':
-				sprintf(s, "%x", va_arg(vargs, int));
-				s += strlen(s);
-				break;
-			case 's':
-				p = va_arg(vargs, char*);
-				i = strlen(p);
-				if (n > 0 && i > n)
-					i = n;
-				memcpy(s, p, i);
-				s += i;
-				break;
-			case 'p':
-				sprintf(s, "%p", va_arg(vargs, void*));
-				/* %p is ill-defined:  ensure leading 0x. */
-				if (s[1] == 'X')
-					s[1] = 'x';
-				else if (s[1] != 'x') {
-					memmove(s+2, s, strlen(s)+1);
-					s[0] = '0';
-					s[1] = 'x';
-				}
-				s += strlen(s);
-				break;
-			case '%':
-				*s++ = '%';
-				break;
-			default:
-				strcpy(s, p);
-				s += strlen(s);
-				goto end;
-			}
-		} else
-			*s++ = *f;
-	}
-	
- end:
-	_PyString_Resize(&string, s - PyString_AS_STRING(string));
-	return string;
-}
-	
-PyObject *
-PyString_FromFormat(const char *format, ...) 
-{
-	PyObject* ret;
-	va_list vargs;
-
-#ifdef HAVE_STDARG_PROTOTYPES
-	va_start(vargs, format);
-#else
-	va_start(vargs);
-#endif
-	ret = PyString_FromFormatV(format, vargs);
-	va_end(vargs);
-	return ret;
-}
-/* end PyString_FromFormat back-porting code */
-#endif /* PYSTRING_FROMFORMAT_BACKPORT */
-
-#endif /* < Python 2.2 */
-
+/* Marc-Andre's Hex version determination code */
+#ifndef PY_VERSION_HEX
+# if PYTHON_API_VERSION == 1007
+#  define PY_VERSION_HEX 0x010500F0
+# endif
+# if PYTHON_API_VERSION == 1006
+#  define PY_VERSION_HEX 0x010400F0
+# endif
+# if PYTHON_API_VERSION < 1006
+#  define PY_VERSION_HEX 0
+# endif
+#endif
+
+
+#if 0
+/* PY_HEX_VERSION < 0x02020000 */
+/*  Python 2.2 features backported to earlier Python versions */
+
+
+#ifndef PYSTRING_FROMFORMAT_BACKPORT
+#define PYSTRING_FROMFORMAT_BACKPORT
+/* PyString_FromFormat back-porting code
+
+	There are no docs for when PyString_FromFormat shows up that I can see,
+	appears to be Python version 2.2.0
+
+	This PyString_FromFormat back-porting code is from Python 2.2.1:
+		Copyright (c) 2001, 2002 Python Software Foundation.
+		All Rights Reserved.
+
+		Copyright (c) 2000 BeOpen.com.
+		All Rights Reserved.
+
+		Copyright (c) 1995-2001 Corporation for National Research Initiatives.
+		All Rights Reserved.
+
+		Copyright (c) 1991-1995 Stichting Mathematisch Centrum, Amsterdam.
+		All Rights Reserved.
+
+*/
+
+
+#include <ctype.h>
+PyObject *
+PyString_FromFormatV(const char *format, va_list vargs)
+{
+	va_list count;
+	int n = 0;
+	const char* f;
+	char *s;
+	PyObject* string;
+
+#ifdef VA_LIST_IS_ARRAY
+	memcpy(count, vargs, sizeof(va_list));
+#else
+	count = vargs;
+#endif
+	/* step 1: figure out how large a buffer we need */
+	for (f = format; *f; f++) {
+		if (*f == '%') {
+			const char* p = f;
+			while (*++f && *f != '%' && !isalpha(Py_CHARMASK(*f)))
+				;
+
+			/* skip the 'l' in %ld, since it doesn't change the
+			   width.  although only %d is supported (see
+			   "expand" section below), others can be easily
+			   added */
+			if (*f == 'l' && *(f+1) == 'd')
+				++f;
+			
+			switch (*f) {
+			case 'c':
+				(void)va_arg(count, int);
+				/* fall through... */
+			case '%':
+				n++;
+				break;
+			case 'd': case 'i': case 'x':
+				(void) va_arg(count, int);
+				/* 20 bytes is enough to hold a 64-bit
+				   integer.  Decimal takes the most space.
+				   This isn't enough for octal. */
+				n += 20;
+				break;
+			case 's':
+				s = va_arg(count, char*);
+				n += strlen(s);
+				break;
+			case 'p':
+				(void) va_arg(count, int);
+				/* maximum 64-bit pointer representation:
+				 * 0xffffffffffffffff
+				 * so 19 characters is enough.
+				 * XXX I count 18 -- what's the extra for?
+				 */
+				n += 19;
+				break;
+			default:
+				/* if we stumble upon an unknown
+				   formatting code, copy the rest of
+				   the format string to the output
+				   string. (we cannot just skip the
+				   code, since there's no way to know
+				   what's in the argument list) */ 
+				n += strlen(p);
+				goto expand;
+			}
+		} else
+			n++;
+	}
+ expand:
+	/* step 2: fill the buffer */
+	/* Since we've analyzed how much space we need for the worst case,
+	   use sprintf directly instead of the slower PyOS_snprintf. */
+	string = PyString_FromStringAndSize(NULL, n);
+	if (!string)
+		return NULL;
+	
+	s = PyString_AsString(string);
+
+	for (f = format; *f; f++) {
+		if (*f == '%') {
+			const char* p = f++;
+			int i, longflag = 0;
+			/* parse the width.precision part (we're only
+			   interested in the precision value, if any) */
+			n = 0;
+			while (isdigit(Py_CHARMASK(*f)))
+				n = (n*10) + *f++ - '0';
+			if (*f == '.') {
+				f++;
+				n = 0;
+				while (isdigit(Py_CHARMASK(*f)))
+					n = (n*10) + *f++ - '0';
+			}
+			while (*f && *f != '%' && !isalpha(Py_CHARMASK(*f)))
+				f++;
+			/* handle the long flag, but only for %ld.  others
+			   can be added when necessary. */
+			if (*f == 'l' && *(f+1) == 'd') {
+				longflag = 1;
+				++f;
+			}
+
+			switch (*f) {
+			case 'c':
+				*s++ = va_arg(vargs, int);
+				break;
+			case 'd':
+				if (longflag)
+					sprintf(s, "%ld", va_arg(vargs, long));
+				else
+					sprintf(s, "%d", va_arg(vargs, int));
+				s += strlen(s);
+				break;
+			case 'i':
+				sprintf(s, "%i", va_arg(vargs, int));
+				s += strlen(s);
+				break;
+			case 'x':
+				sprintf(s, "%x", va_arg(vargs, int));
+				s += strlen(s);
+				break;
+			case 's':
+				p = va_arg(vargs, char*);
+				i = strlen(p);
+				if (n > 0 && i > n)
+					i = n;
+				memcpy(s, p, i);
+				s += i;
+				break;
+			case 'p':
+				sprintf(s, "%p", va_arg(vargs, void*));
+				/* %p is ill-defined:  ensure leading 0x. */
+				if (s[1] == 'X')
+					s[1] = 'x';
+				else if (s[1] != 'x') {
+					memmove(s+2, s, strlen(s)+1);
+					s[0] = '0';
+					s[1] = 'x';
+				}
+				s += strlen(s);
+				break;
+			case '%':
+				*s++ = '%';
+				break;
+			default:
+				strcpy(s, p);
+				s += strlen(s);
+				goto end;
+			}
+		} else
+			*s++ = *f;
+	}
+	
+ end:
+	_PyString_Resize(&string, s - PyString_AS_STRING(string));
+	return string;
+}
+	
+PyObject *
+PyString_FromFormat(const char *format, ...) 
+{
+	PyObject* ret;
+	va_list vargs;
+
+#ifdef HAVE_STDARG_PROTOTYPES
+	va_start(vargs, format);
+#else
+	va_start(vargs);
+#endif
+	ret = PyString_FromFormatV(format, vargs);
+	va_end(vargs);
+	return ret;
+}
+/* end PyString_FromFormat back-porting code */
+#endif /* PYSTRING_FROMFORMAT_BACKPORT */
+
+#endif /* < Python 2.2 */
+
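
One detail worth noting about the guard this commit replaces with `#if 0`: the removed line tested `PY_HEX_VERSION`, while the macro defined at the top of the header is `PY_VERSION_HEX`. In C, an identifier that is not defined as a macro evaluates to 0 inside `#if`, so, assuming nothing else defines `PY_HEX_VERSION`, the old condition would have been true on every Python version. The sketch below illustrates only that preprocessor rule; the version value and the `*_GUARD_TAKEN` names are made up for the example.

```c
/* Stand-in for the real macro; the value is illustrative (Python 2.5.0 final). */
#define PY_VERSION_HEX 0x020500F0

/* Misspelled name, as in the removed guard: it is not defined anywhere,
   so the preprocessor treats it as 0 and the comparison is true. */
#if PY_HEX_VERSION < 0x02020000
# define MISSPELLED_GUARD_TAKEN 1
#else
# define MISSPELLED_GUARD_TAKEN 0
#endif

/* Correctly spelled name: the comparison uses the actual version. */
#if PY_VERSION_HEX < 0x02020000
# define CORRECT_GUARD_TAKEN 1
#else
# define CORRECT_GUARD_TAKEN 0
#endif

static int misspelled_guard_taken(void) { return MISSPELLED_GUARD_TAKEN; }
static int correct_guard_taken(void) { return CORRECT_GUARD_TAKEN; }
```

Compilers can flag this kind of slip; GCC and Clang, for example, warn about undefined identifiers in `#if` when `-Wundef` is enabled.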

stt/TextTools/mxTextTools/mxTextTools.c

 PyObject *mxTextTools_ToUpper(void)
 {
     char tr[256];
-    int i;
+    Py_ssize_t i;
     
     for (i = 0; i < 256; i++)
 	tr[i] = toupper((char)i);
 PyObject *mxTextTools_ToLower(void)
 {
     char tr[256];
-    int i;
+    Py_ssize_t i;
     
     for (i = 0; i < 256; i++)
 	tr[i] = tolower((char)i);
 /* Get the match length from an TextSearch object or -1 in case of an
    error. */
 
-int mxTextSearch_MatchLength(PyObject *self)
+Py_ssize_t mxTextSearch_MatchLength(PyObject *self)
 {
     Py_Assert(mxTextSearch_Check(self),
 	      PyExc_TypeError,
 }
 
 static
-int trivial_search(const char *text,
-		   int start,
-		   int stop,
+Py_ssize_t trivial_search(const char *text,
+		   Py_ssize_t start,
+		   Py_ssize_t stop,
 		   const char *match,
-		   int match_len)
+		   Py_ssize_t match_len)
 {
-    int ml1 = match_len - 1;
+    Py_ssize_t ml1 = match_len - 1;
     register const char *tx = &text[start];
-    register int x = start;
+    register Py_ssize_t x = start;
 
     if (ml1 < 0) 
 	return start;
 
     /* Brute-force method; from right to left */
     for (;;) {
-	register int j = ml1;
+	register Py_ssize_t j = ml1;
 	register const char *mj = &match[j];
 
 	if (x + j >= stop)
 
 #ifdef HAVE_UNICODE
 static
-int trivial_unicode_search(const Py_UNICODE *text,
-			   int start,
-			   int stop,
+Py_ssize_t trivial_unicode_search(const Py_UNICODE *text,
+			   Py_ssize_t start,
+			   Py_ssize_t stop,
 			   const Py_UNICODE *match,
-			   int match_len)
+			   Py_ssize_t match_len)
 {
-    int ml1 = match_len - 1;
+    Py_ssize_t ml1 = match_len - 1;
     register const Py_UNICODE *tx = &text[start];
-    register int x = start;
+    register Py_ssize_t x = start;
 
     if (ml1 < 0) 
 	return start;
 
     /* Brute-force method; from right to left */
     for (;;) {
-	register int j = ml1;
+	register Py_ssize_t j = ml1;
 	register const Py_UNICODE *mj = &match[j];
 
 	if (x + j >= stop)
 
 */
 
-int mxTextSearch_SearchBuffer(PyObject *self,
+Py_ssize_t mxTextSearch_SearchBuffer(PyObject *self,
 			      char *text,
-			      int start,
-			      int stop,
-			      int *sliceleft,
-			      int *sliceright)
+			      Py_ssize_t start,
+			      Py_ssize_t stop,
+			      Py_ssize_t *sliceleft,
+			      Py_ssize_t *sliceright)
 {
-    int nextpos;
-    int match_len;
+    Py_ssize_t nextpos;
+    Py_ssize_t match_len;
 
     Py_Assert(mxTextSearch_Check(self),
 	      PyExc_TypeError,
 }
 
 #ifdef HAVE_UNICODE
-int mxTextSearch_SearchUnicode(PyObject *self,
+Py_ssize_t mxTextSearch_SearchUnicode(PyObject *self,
 			       Py_UNICODE *text,
-			       int start,
-			       int stop,
-			       int *sliceleft,
-			       int *sliceright)
+			       Py_ssize_t start,
+			       Py_ssize_t stop,
+			       Py_ssize_t *sliceleft,
+			       Py_ssize_t *sliceright)
 {
-    int nextpos;
-    int match_len;
+    Py_ssize_t nextpos;
+    Py_ssize_t match_len;
 
     Py_Assert(mxTextSearch_Check(self),
 	      PyExc_TypeError,
 	       "where the substring was found, (start,start) otherwise.")
 {
     PyObject *text;
-    int start = 0;
-    int stop = INT_MAX;
-    int sliceleft, sliceright;
+    Py_ssize_t start = 0;
+    Py_ssize_t stop = INT_MAX;
+    Py_ssize_t sliceleft, sliceright;
     int rc;
 
     Py_Get3Args("O|ii:TextSearch.search",
 	       "where the substring was found, -1 otherwise.")
 {
     PyObject *text;
-    int start = 0;
-    int stop = INT_MAX;
-    int sliceleft, sliceright;
+    Py_ssize_t start = 0;
+    Py_ssize_t stop = INT_MAX;
+    Py_ssize_t sliceleft, sliceright;
     int rc;
 
     Py_Get3Args("O|ii:TextSearch.find",
 {
     PyObject *text;
     PyObject *list = 0;
-    int start = 0;
-    int stop = INT_MAX;