#2 bug of POSIX matching?

Issue #183 invalid
animalize created an issue

Hi, I'm not sure whether this is a bug.

On regex 2015.11.22:

>>> regex.search(r'(?p)a*?(.*?)', 'aaabbb').group(1)
'aaabbb'
>>> regex.search(r'(?p)a*?(.*)', 'aaabbb').group(1)
'aaabbb'

On GNU sed 4.2.1:

user@linux:~$ echo "aaabbb" | sed -E "s/a*?(.*?)/\\1/"
bbb
user@linux:~$ echo "aaabbb" | sed -E "s/a*?(.*)/\\1/"
bbb

Comments (5)

  1. Matthew Barnett repo owner

    POSIX-standard regexes don't have lazy quantifiers, so it's hard to know how they should behave, especially when mixed with greedy quantifiers. I think I read somewhere that some implementations simply replace any lazy quantifier with its greedy equivalent.

  2. Grant Welch

    @mrabarnett is correct. GNU sed ('gsed' below) ignores the non-greedy operators, whereas BSD sed ('sed' below) throws an error.

    bash$ echo "aaabbb" | sed -E "s/a*?(.*?)/\\1/"
    sed: 1: "s/a*?(.*?)/\1/": RE error: repetition-operator operand invalid
    bash$ echo "aaabbb" | gsed -E "s/a*?(.*?)/\\1/"
    bbb
    bash$ echo "aaabbb" | sed -E "s/a*(.*)/\\1/"
    bbb
    
    bash$ echo "aaabbb" | sed -E "s/a*?(.*)/\\1/"
    sed: 1: "s/a*?(.*)/\1/": RE error: repetition-operator operand invalid
    bash$ echo "aaabbb" | gsed -E "s/a*?(.*)/\\1/"
    bbb
    

    Using Ruby, we can compare GNU sed and BSD sed with the Oniguruma engine, as another reference implementation for regex.

    irb(main):006:0> "aaabbb".sub(%r{a*(.*)}, '\1')
    => "bbb"
    irb(main):007:0> "aaabbb".sub(%r{a*?(.*)}, '\1')
    => "aaabbb"
    irb(main):008:0> "aaabbb".sub(%r{a*?(.*?)}, '\1')
    => "aaabbb"
    

    And Perl, whose regex syntax PCRE is modeled on:

    bash$ echo "aaabbb" | perl -pe 's|a*(.*)|\1|'
    bbb
    bash$ echo "aaabbb" | perl -pe 's|a*?(.*)|\1|'
    aaabbb
    bash$ echo "aaabbb" | perl -pe 's|a*?(.*?)|\1|'
    aaabbb
    
    1. https://en.wikipedia.org/wiki/Oniguruma
    2. https://en.wikipedia.org/wiki/Comparison_of_regular_expression_engines
  3. animalize reporter

    Thanks, Grant. I'm going to close this issue.

    BTW, the POSIX flag is a good helper for fuzzy matching.

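For anyone who wants to reproduce the comparison without sed, Ruby or Perl at hand, here is a minimal sketch using Python's standard `re` module as a stand-in for the backtracking engines. The helper `drop_lazy_markers` is hypothetical and deliberately naive (it ignores escapes and character classes); it only illustrates the "replace lazy quantifiers with their greedy equivalent" idea mentioned above and is not how sed or regex actually implement anything.

```python
import re

def drop_lazy_markers(pattern: str) -> str:
    """Hypothetical helper: rewrite *?, +?, ?? and {m,n}? as greedy forms.

    Naive on purpose: it does not understand escapes or character classes.
    """
    return re.sub(r'([*+?}])\?', r'\1', pattern)

def sed_like(pattern: str, text: str) -> str:
    """Mimic `sed -E 's/PATTERN/\\1/'`: replace only the first match with group 1."""
    return re.sub(pattern, r'\1', text, count=1)

patterns = (r'a*(.*)', r'a*?(.*)', r'a*?(.*?)')

# Lazy quantifiers honoured, as in the Ruby and Perl runs above.
results = {p: sed_like(p, 'aaabbb') for p in patterns}

# Lazy quantifiers rewritten to greedy first, which gives the GNU sed output.
greedy_results = {p: sed_like(drop_lazy_markers(p), 'aaabbb') for p in patterns}

print(results)         # {'a*(.*)': 'bbb', 'a*?(.*)': 'aaabbb', 'a*?(.*?)': 'aaabbb'}
print(greedy_results)  # {'a*(.*)': 'bbb', 'a*?(.*)': 'bbb', 'a*?(.*?)': 'bbb'}
```

Note that with `a*?(.*?)` the lazy match is empty, so the substitution leaves the input untouched; the `aaabbb` in that row is the unchanged string, not the content of group 1.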