Invalid match when using negative lookbehind and pipe

Issue #216 resolved
Marti R. created an issue

Another bug discovered with Topy (https://github.com/intgr/topy). When a negative lookbehind contains an alternation (|), the lookbehind seems to be ignored entirely:

In [1]: import regex
In [2]: regex.match('foo(?<!foo|x)', 'foo')
Out[2]: <regex.Match object; span=(0, 3), match='foo'>

But I think it should be equivalent to the following regex, which works as I expect:

In [3]: regex.match('foo(?<!foo)(?<!x)', 'foo')
Out[3]: None

Using Python 3.5.1, 64-bit on OS X, installed from Homebrew.

The built-in re also handles this correctly (shown here with fox instead of x, because re only accepts fixed-width lookbehinds):

In [4]: import re
In [5]: re.match('foo(?<!foo|fox)', 'foo')
Out[5]: None
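
The equivalence claimed above, that a negative lookbehind over A|B should behave like two chained negative lookbehinds, can be checked with a minimal sketch. It uses only the built-in re module, so the alternatives are kept at the same width; the sample pattern `\w+`, the helper name `rejected`, and the sample strings are illustrative assumptions, not taken from Topy or from the regex module's test suite.

```python
import re

# Both alternatives are three characters wide, because the built-in re
# module rejects variable-width lookbehind patterns.
combined = re.compile(r'\w+(?<!foo|bar)')
chained = re.compile(r'\w+(?<!foo)(?<!bar)')


def rejected(pattern, text):
    """Return True when the pattern cannot match the whole of text."""
    return pattern.fullmatch(text) is None


# "not (A or B)" should agree with "(not A) and (not B)" on every input.
for text in ['foo', 'xbar', 'xbaz', 'fo']:
    assert rejected(combined, text) == rejected(chained, text)
    print(text, rejected(combined, text))
```

Running the same loop with `import regex as re` should give the same results on a regex release that includes the fix for this issue.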

Comments (5)

  1. animalize

    Still matches in FULLCASE mode on regex 2016.07.14. Run this in a UTF-8 environment:

    print(regex.search(r'(?fi)^.*$(?!<ß|x)', 'ss'))
    print(regex.search(r'(?fi)^.*$(?!<ss|x)', 'ß'))
    

    output:

    <regex.Match object; span=(0, 2), match='ss'>
    <regex.Match object; span=(0, 1), match='ß'>
    

    Full-case folding and fuzzy matching are two code bombs; they have caused too much trouble.
    I half expect that one day you will rip them out on impulse. (Half serious, half joking.)

  2. Matthew Barnett repo owner

    @animalize: negative lookbehind starts (?<!; you have (?!<, which is a negative lookahead whose pattern starts with <.

    This is weird: the email that arrived in my inbox says "regex 2016.07.04" instead of "regex 2016.07.14", and doesn't have the text below the second box...

  3. animalize

    Sorry.

    I edited the comment after posting it, but when I clicked the submit button the server took a minute to respond. Maybe something was wrong with the Bitbucket server and it didn't record the revision.

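
The distinction raised in comment 2 between (?<! and (?!< can be seen in a small sketch. It uses the built-in re module, which has no full-case-folding flag, so the (?fi) prefix from the comment is dropped and plain ASCII text is used instead; the alternatives are also padded to the same width (xx rather than x) because re needs fixed-width lookbehinds. These substitutions are assumptions made to keep the example self-contained.

```python
import re

text = 'ss'

# (?!<ss|xx) is a negative LOOKAHEAD for the literal text "<ss" or "xx".
# Nothing follows the end of the string, so the assertion always passes
# and the pattern matches.
lookahead = re.search(r'^.*$(?!<ss|xx)', text)

# (?<!ss|xx) is a negative LOOKBEHIND: it fails when the text just before
# the current position is "ss" or "xx", so the pattern does not match.
lookbehind = re.search(r'^.*$(?<!ss|xx)', text)

print(lookahead)   # a match object spanning the whole string
print(lookbehind)  # None
```

The typo therefore produces a pattern that is valid but tests something else entirely, which is why the output in comment 1 still showed matches.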