Got an exception using PyPy

Issue #221 resolved
Chuancong Gao created an issue

When using the latest version of PyPy on OS X El Capitan, the following exception is raised. The regex module is imported by another module, agate. It looks like a bug.

  File "/Users/genaminer/.virtualenv/pypy-5.4.1/site-packages/", line 345, in compile
    return _compile(pattern, flags, kwargs)
  File "/Users/genaminer/.virtualenv/pypy-5.4.1/site-packages/", line 535, in _compile
    req_offset, req_chars, req_flags = _get_required_string(parsed, info.flags)
  File "/Users/genaminer/.virtualenv/pypy-5.4.1/site-packages/", line 4236, in _get_required_string
    req_offset, required = parsed.get_required_string(bool(flags & REVERSE))
  File "/Users/genaminer/.virtualenv/pypy-5.4.1/site-packages/", line 1910, in get_required_string
    return self.max_width(), None
  File "/Users/genaminer/.virtualenv/pypy-5.4.1/site-packages/", line 2411, in max_width
    return max(b.max_width() for b in self.branches)
  File "/Users/genaminer/.virtualenv/pypy-5.4.1/site-packages/", line 2411, in <genexpr>
    return max(b.max_width() for b in self.branches)
  File "/Users/genaminer/.virtualenv/pypy-5.4.1/site-packages/", line 3454, in max_width
    return sum(s.max_width() for s in self.items)
  File "/Users/genaminer/.virtualenv/pypy-5.4.1/site-packages/", line 3454, in <genexpr>
    return sum(s.max_width() for s in self.items)
  File "/Users/genaminer/.virtualenv/pypy-5.4.1/site-packages/", line 3903, in max_width
    [])
  File "/Users/genaminer/.virtualenv/pypy-5.4.1/site-packages/", line 3902, in <genexpr>
    return max(len(_regex.fold_case(fold_flags, i)) for i in
SystemError: An exception was set, but function returned a value

  1. Matthew Barnett repo owner

    What was the pattern?

    Could you edit "/Users/genaminer/.virtualenv/pypy-5.4.1/site-packages/" to print out (using 'ascii') the pattern, the flags and kwargs just before calling _compile?

    It might be difficult to track down the bug if I can't reproduce it.
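The request above, printing the arguments in ASCII-only form just before the compile call, could be sketched roughly as follows. This is only an illustration: `describe_compile_args` is a hypothetical helper that is not part of regex, and the sketch assumes Python 3, where the built-in `ascii()` is available (on Python 2, `repr()` gives a similar ASCII-only result).

```python
def describe_compile_args(pattern, flags=0, **kwargs):
    """Hypothetical helper: describe compile() arguments using ASCII only."""
    # ascii() escapes every non-ASCII character (for example, the
    # ellipsis character is rendered as \u2026), so the output survives
    # terminals and log files with a limited encoding.
    return "pattern=%s flags=%s kwargs=%s" % (
        ascii(pattern), ascii(flags), ascii(kwargs))


# Example call with one of the patterns and flag values from this report.
print(describe_compile_args("[^\\p{AlNum}]+", 2))
```

A line like the `print` call could be placed temporarily just before the call to `_compile`, passing the same pattern, flags and keyword arguments.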

  2. Chuancong Gao reporter

    Sorry for the very late reply; I was travelling recently. I just reproduced this problem on macOS Sierra using PyPy 5.4.1. Python 2.7.12 and PyPy 5.3.0 work fine.

    I added the following statement in _compile (line 417) in

    print pattern, flags, kwargs

    and got the following output:

     0 {}
                \p{Uppercase_Letter} {2,}                          # 2 or more adjacent letters - UP always
                \p{Uppercase_Letter}                               # target one uppercase letter, then
                        [^\p{Lowercase_Letter}…\p{Term}--,،﹐,]+    # not chars breaks possible UP (…abc.?!:;)
                        \p{Uppercase_Letter} {2}                   # and 2 uppercase letters
                    \p{Uppercase_Letter} {2}                       # 2 uppercase letters
                    [^\p{Lowercase_Letter}…\p{Term}--,،﹐,]+       # not chars breaks possible UP (…abc.?!:;), then
                \p{Uppercase_Letter}                               # target one uppercase letter, then
                        \p{Lowercase_Letter}                       # not lowercase letter
                        […\p{Term}--,،﹐,]\p{Uppercase_Letter}      # and not dot (.?…!:;) with uppercase letter
         320 {}
    [^\p{AlNum}]+ 2 {}
    [^\p{AlNum}]+ 2 {}
    [^\p{AlNum}]+ 2 {}
    [^\p{AlNum}]+ 2 {}
    [^\p{AlNum}]+ 2 {}
    [^\p{AlNum}]+ 2 {}
    [^\p{AlNum}]+ 2 {}
    [^\p{AlNum}]+ 2 {}
    [^\p{AlNum}]+ 2 {}
    [^\p{AlNum}]+|(?<!\p{AlNum})(?:\L<stop_words>)(?!\p{AlNum}) 2 {'stop_words': ('a', 'an', 'the')}

  3. Matthew Barnett repo owner

    I've updated the sources in this repository (but not on PyPI) with the hope that it'll reveal what the exception is, because I have no idea what the problem is!

  4. Chuancong Gao reporter

    I installed the version from the repository source. Now I get this error:

    debug: OperationError:
    debug:  operror-type: TypeError
    debug:  operror-value: exceptions must be old-style classes or derived from BaseException, not NotImplemented

    The issue can be reproduced easily with the code below. It only happens when re.IGNORECASE is set.

    import regex as re
    x = r'(?:\L<stop_words>)'
    y = ('test',)
    re.compile(x, re.IGNORECASE, stop_words=y)

  5. Matthew Barnett repo owner

    Fixed in regex 2016.10.22.

    Bytestrings are usually handled via the buffer protocol, but PyPy was complaining for some reason (very strange!), so I've coded around it...
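For readers unfamiliar with the term, the buffer protocol mentioned above can be illustrated at the Python level with `memoryview`. This is only a sketch of the general mechanism, not the actual fix, whose code is not shown in this thread; `as_bytes_via_buffer` is a hypothetical name introduced for illustration.

```python
def as_bytes_via_buffer(obj):
    """Hypothetical helper: read an object's raw bytes through the buffer protocol."""
    # memoryview() only succeeds for objects that expose the buffer
    # protocol (bytes, bytearray, array.array, ...); anything else
    # raises TypeError.
    try:
        view = memoryview(obj)
    except TypeError:
        return None
    # tobytes() copies the underlying buffer into a new bytes object.
    return view.tobytes()


print(as_bytes_via_buffer(b"test"))
print(as_bytes_via_buffer(bytearray(b"test")))
print(as_bytes_via_buffer(123))
```

An object that does not support the protocol, such as an integer, yields `None` here instead of raising an exception.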

