* operator not working correctly with sub()

Issue #106 resolved
Former user created an issue
>>> regex.sub('.*', 'x', 'test')
u'xx' <--- This is wrong

>>> regex.sub('.+', 'x', 'test')

>>> re.sub('.*', 'x', 'test')
u'x' <--- This is correct

>>> regex.sub('.*?', '|', 'test')
u'|||||||||' <--- This is wrong

>>> re.sub('.*?', '|', 'test')
u'|t|e|s|t|' <--- This is correct

Python 2.7, 64-bit Linux, compiled from source, regex version 2.4.39

Comments (3)

  1. Former user Account Deleted

    How it should behave is a bit of a grey area.

    The re module says 'x' and '|t|e|s|t|'.

    Perl and PCRE say 'xx' and '|||||||||'.

    This is because .* and .*? can still match 0 characters immediately after a match of more than 0 characters, at the position where that match ended. There are cases where the re module definitely gets this wrong, so it's not clear whether the re module is getting it right here.

  2. Former user Account Deleted

    Hmm, well, I don't really have an opinion. The behavior of re seems intuitively correct to me, but that may just be because I have been using re for years.

    I just thought I'd report the discrepancy, as one of the goals of regex is (as I understand it) to replace re as seamlessly as possible.

  3. Former user Account Deleted

    Fixed in regex 2014.01.30.

    In the version 0 behaviour, it now behaves more like the re module.

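The grey area discussed in the comments can be sketched with the standard re module alone. This is a minimal illustration, not the implementation of either re or regex: `spans` and `sub_with_policy` are hypothetical helper names, and the replacement is treated as a literal string. The sketch lists the match spans for `.*` on 'test' (one non-empty match, then an empty match at the position where it ended) and rebuilds the string under the two policies, one that skips that trailing empty match and one that replaces it. It avoids calling re.sub directly because, as far as I know, re.sub itself replaces such adjacent empty matches from Python 3.7 onwards, so its output depends on the interpreter version.

```python
import re


def spans(pattern, text):
    """Return the (start, end) span of every match that re.finditer reports."""
    return [m.span() for m in re.finditer(pattern, text)]


def sub_with_policy(pattern, repl, text, skip_adjacent_empty):
    """Hypothetical helper: rebuild the string from the match spans.

    skip_adjacent_empty=True drops an empty match that starts exactly where
    the previous match ended (the result reported for re in this issue).
    skip_adjacent_empty=False replaces every match (the result reported for
    Perl and PCRE). repl is inserted literally, with no group expansion.
    """
    out = []
    last_end = 0      # end of the text already copied to the output
    prev_end = None   # end of the previous match that was replaced
    for start, end in spans(pattern, text):
        if skip_adjacent_empty and start == end and start == prev_end:
            continue
        out.append(text[last_end:start])
        out.append(repl)
        last_end = end
        prev_end = end
    out.append(text[last_end:])
    return "".join(out)


print(spans(".*", "test"))                                           # [(0, 4), (4, 4)]
print(sub_with_policy(".*", "x", "test", skip_adjacent_empty=True))   # x
print(sub_with_policy(".*", "x", "test", skip_adjacent_empty=False))  # xx
```

Only `.*` is shown because its match spans are the same across Python versions. The spans reported for `.*?` differ between older and newer interpreters, so no output is claimed for that pattern here.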