* operator not working correctly with sub()

Issue #106 resolved
Anonymous created an issue
>>> import re, regex
>>> regex.sub('.*', 'x', 'test')
u'xx' <--- This is wrong

>>> regex.sub('.+', 'x', 'test')
u'x'

>>> re.sub('.*', 'x', 'test')
u'x' <--- This is correct

>>> regex.sub('.*?', '|', 'test')
u'|||||||||' <--- This is wrong

>>> re.sub('.*?', '|', 'test')
u'|t|e|s|t|' <--- This is correct

Python 2.7, 64-bit Linux, compiled from source, regex version 2.4.39

Comments (3)

  1. Anonymous

    How it should behave is a bit of a grey area.

    The re module says 'x' and '|t|e|s|t|'.

    Perl and PCRE say 'xx' and '|||||||||'.

    This is because .* and .*? can match zero characters immediately after a match of one or more characters. There are cases where the re module definitely gets it wrong, so it's not clear whether the re module is getting it right here.

  2. Anonymous

    Hmm, well, I don't really have an opinion. The behavior of re seems intuitively correct to me, but that may just be because I have been using re for years.

    I just thought I'd report the discrepancy, as one of the goals of regex is (as I understand it) to replace re as seamlessly as possible.

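The two behaviors discussed in the first comment can be reproduced with a small hand-written replacement loop. This is only a rough sketch, not the code of either module: `sub_sketch`, `shortest_nonempty` and the `perl_style` flag are hypothetical names, the replacement is treated as a literal string, and the retry after an empty match takes the shortest non-empty match, which is an approximation of what a real engine would prefer.

```python
import re


def shortest_nonempty(compiled, text, pos):
    """Return the shortest non-empty match starting at pos, or None.

    Approximation: a real engine would take its next preferred alternative,
    and limiting the end position can change how anchors behave.
    """
    for end in range(pos + 1, len(text) + 1):
        m = compiled.fullmatch(text, pos, end)
        if m is not None:
            return m
    return None


def sub_sketch(pattern, repl, text, perl_style):
    """Replace every match of pattern in text with the literal string repl.

    perl_style=True:  an empty match directly after a non-empty match is
                      replaced; after an empty match, a non-empty match is
                      retried at the same position.
    perl_style=False: an empty match directly after a previous match is
                      skipped; after an empty match, one character is copied.
    """
    compiled = re.compile(pattern)
    out = []
    pos = 0
    last_end = None        # end position of the previous match
    need_nonempty = False  # Perl-style only: last match at pos was empty
    while pos <= len(text):
        m = compiled.match(text, pos)
        if m is not None and m.end() == pos:  # the engine offers an empty match
            if perl_style and need_nonempty:
                m = shortest_nonempty(compiled, text, pos)
            elif not perl_style and last_end == pos:
                m = None
        if m is None:
            out.append(text[pos:pos + 1])  # copy one character unchanged
            pos += 1
            need_nonempty = False
            continue
        out.append(repl)
        last_end = m.end()
        if m.end() > pos:
            pos = m.end()
            need_nonempty = False
        elif perl_style:
            need_nonempty = True           # stay here, retry for a non-empty match
        else:
            out.append(text[pos:pos + 1])  # step past the empty match
            pos += 1
    return ''.join(out)


print(sub_sketch('.*', 'x', 'test', perl_style=False))   # x
print(sub_sketch('.*', 'x', 'test', perl_style=True))    # xx
print(sub_sketch('.*?', '|', 'test', perl_style=False))  # |t|e|s|t|
print(sub_sketch('.*?', '|', 'test', perl_style=True))   # |||||||||
```

With `perl_style=False` the sketch gives the results reported for re above, and with `perl_style=True` it gives the results reported for regex, Perl and PCRE. The only difference between the two runs is whether an empty match is accepted immediately after a previous match.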