clach04  committed 1996d3f Draft

Jython 2.2 support; fixes a problem where the protocol prefix of URLs was duplicated.
URLs such as "http://google.com" end up with href="http://http://google.com".

Test suite changes for Jython, before:

Ran 72 tests in 1.951s

FAILED (failures=19, errors=20)

After:

Ran 72 tests in 3.569s

FAILED (failures=16, errors=18)

This is not as clean as a Python 2.4 run but it is an improvement.

Original attempt at a fix simply converted tokens into unicode, e.g.:

def handle_url(self, t):
- if not protocol_pattern.match(t):
+ if not protocol_pattern.match(unicode(t)):

This fixed plain URLs (tests such as test_simple_url_1()) but not YouTube links (tests such as test_youtube_embed_1()).

Comment from Matt on this original attempt:

matt chisholm Date 2012-03-14

I think I understand this. The variable t is an instance of class Token, which derives from unicode. The re.match method in Jython or Python 2.2 probably fails when passed something that is not an instance of str or unicode, and this probably works in later versions of Python because the built-in types unicode and str didn't both inherit from basestring until 2.3. Casting the Token instance to unicode allows re.match to operate on it.

I think this is the wrong way to fix this for Python 2.2; if I add another pattern match against a token somewhere else, it will break in the same way, require the same process of deduction to figure out what's going on, and then require another similar patch. A better fix would fix it systemwide in one place, although I don't know what that would look like.

PottyMouth was written for 2.4 originally, was never intended to support 2.2, and 2.2 is almost ten years old. I don't want to clutter the code with patches for 2.2.
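
The fix in this commit answers that comment by wrapping compiled patterns so the conversion happens in one place. A minimal sketch of that idea follows, not the committed code: it runs on a modern Python, so str stands in for Python 2's unicode, and the names CoercingPattern, compile_coercing, the Token stand-in, and the protocol_pattern regex are all hypothetical. Unlike the commit, it does not monkey patch re.compile globally.

```python
import re


class Token(str):
    """Stand-in for PottyMouth's Token (which derives from unicode on Python 2)."""


class CoercingPattern(object):
    """Hypothetical proxy around a compiled pattern that coerces match() input."""

    def __init__(self, pattern):
        self._pattern = pattern

    def __getattr__(self, name):
        # Only invoked when normal lookup fails, so everything except
        # match() is delegated to the wrapped pattern object.
        return getattr(self._pattern, name)

    def match(self, text, *args):
        # Convert string subclasses (such as Token) to a plain string first.
        return self._pattern.match(str(text), *args)


def compile_coercing(*args, **kwargs):
    """Hypothetical drop-in for re.compile that returns the proxy."""
    return CoercingPattern(re.compile(*args, **kwargs))


# Assumed pattern for illustration; the real protocol_pattern is not shown here.
protocol_pattern = compile_coercing(r'^\w+://', re.IGNORECASE)

has_protocol = protocol_pattern.match(Token('http://google.com')) is not None
no_protocol = protocol_pattern.match(Token('google.com')) is None
```

Because every call site goes through the same wrapper, a new pattern match against a token added elsewhere would not need its own unicode(t) conversion.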

  • Parent commits 724196e

Files changed (1)

File python/pottymouth.py

 short_line_length = 50
 encoding = 'utf8' # Default output encoding
 
+if sys.version_info < (2, 3):
+    # Monkey patch re (regex) compiled objects so that .match()
+    # params are always converted into unicode; only tested with Jython 2.2.1.
+    # PottyMouth Tokens are derived from unicode, and unicode/str are only
+    # derived from basestring as of Python 2.3.
+    
+    
+    class ProxyPatternObject(object):
+        """Simple proxy, implements an alternative match() method"""
+        
+        def __init__(self, thing_to_wrap):
+            self.__thing = thing_to_wrap
+            # thing may be a class or an object/function
+        
+        def __call__(self, *args, **kwargs):
+            return self.__thing(*args, **kwargs)
+        
+        def __getattr__(self, attr):
+            if self.__dict__.has_key(attr):
+                return self.__dict__[attr]
+            else:
+                return getattr(self.__thing, attr)
+        
+        def __setattr__(self, attr, value):
+            thing_name = '_%s__thing' % self.__class__.__name__
+            # Check if attribute(s) is being set in the __init__ method
+            if attr == thing_name:
+                return dict.__setattr__(self, attr, value)
+            else:
+                # set it in the thing
+                setattr(self.__thing, attr, value)
+
+        def match(self, x):
+            return self.__thing.match(unicode(x))
+    
+    
+    re_compile = re.compile
+    
+    def wrapped_re_compile(*args, **kwargs):
+        print 'mmm, curry wrapped_re_compile'
+        result = re_compile(*args, **kwargs)
+        return ProxyPatternObject(result)
+    
+    re.compile = wrapped_re_compile
 
 class TokenMatcher(object):
 
                    url_white_lists=('https?://www\.mysite\.com/allowed/url\?id=\d+',),
                    )
 
-    if not sys.stdin.isatty():
-        parse_and_print(w, sys.stdin.read())
-        raise SystemExit(0)
-    elif len(sys.argv) >= 2:
+    if len(sys.argv) >= 2:
         # simple command line processing of file names
         for i, filename in enumerate(sys.argv[1:]):
             if i: print '=' * 70
             fileobj.close()
             parse_and_print(w, text)
         raise SystemExit(0)
+    elif not sys.stdin.isatty():
+        parse_and_print(w, sys.stdin.read())
+        raise SystemExit(0)
 
     EOF_DESCRIPTION = 'Ctrl-D'
     if sys.platform == 'win32':