sed dies with 'invalid argument to regex routine' for some regex patterns

Issue #124 closed
Takehiko NOZAKI repo owner created an issue

Try the following:

$ echo -n 'A' | sed -e 's/=*$//g'
sed: RE error: invalid argument to regex routine

TNF (NetBSD) 7.1 has a different bug, a trailing newline is added to the output:

$ echo -n 'A' | sed -e 's/=*$//g' | od -x
0000000     0a41
0000001

OpenBSD 6.1 and FreeBSD 11.0 are fine:

$ echo -n 'A' | sed -E -e 's/=*$//g' | od -x
0000000     0041
0000001
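
The `od -x` dumps above are easy to misread, because `od -x` prints 16-bit words in host byte order rather than bytes in file order. The sketch below is a small Python model of that grouping, not od itself; the helper name `od_x_words` is made up for illustration, and it assumes a little-endian host (e.g. x86) and that an odd trailing byte is padded with zero.

```python
def od_x_words(data, byteorder="little"):
    """Group bytes into 16-bit words and format them like `od -x` does.

    Hypothetical helper for illustration; assumes the host byte order is
    passed in `byteorder` and that an odd final byte is zero-padded.
    """
    if len(data) % 2:
        data += b"\x00"                      # pad the odd trailing byte
    return [
        format(int.from_bytes(data[i:i + 2], byteorder), "04x")
        for i in range(0, len(data), 2)
    ]


# 'A' followed by a newline: bytes 41 0a are shown as the word 0a41.
print(od_x_words(b"A\n"))   # ['0a41']
# 'A' alone: byte 41 plus zero padding is shown as the word 0041.
print(od_x_words(b"A"))     # ['0041']
```

Read this way, `0a41` means the output was `A` plus a newline that the input did not have, while `0041` is the single byte `A`.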

Comments (5)

  1. Takehiko NOZAKI reporter

    BUGFIX: Issue #124 -- sed dies with 'invalid argument to regex routine' for some regex patterns. The patch is taken from OpenBSD; see the original commit messages:

    http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/usr.bin/sed/process.c#rev1.18

    Rewrite the main loop of the "sed s/..." command, to fix multiple issues regarding the replacement of zero-length strings.

    This commit brings back rev. 1.16, but without the regression that forced the backout: No NUL bytes will be output now, not even when the input file lacks a trailing newline character and there is a zero-length match at the end.

    OK otto@ deraadt@; and naddy@ (who originally found the regression) checked that the regression is indeed fixed.

    http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/usr.bin/sed/process.c#rev1.17

    Backout previous, naddy@ found the following regression: When the input does not end in a trailing newline character and there is an empty match at the end, the new code adds a spurious '\0' character. I have a fix, but otto@ prefers backout and full re-evaluation after release.

    http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/usr.bin/sed/process.c#rev1.16

    Rewrite the main loop of the "sed s/..." command, shortening it by ten lines and simplifying it by removing the switch statement implementing /g, /1, and /2 separately and repetitively. The idea to make the loop control variable slen, i.e. the length of the string remaining to be processed, signed, and stay in the loop even when slen == 0 (i.e. at the end of the string), lifted from FreeBSD by otto@. On i386, process.o shrinks by 440 bytes, and the sed binary by 23 bytes.

    This fixes multiple aspects of the replacement of multiple (/g) or specific (e.g. /2) instances of zero-length matches, both with BREs and EREs, both with and without a trailing newline character on the input.

    Feedback and OK otto@.

    → <<cset 1c88d32b903f>>

  2. Takehiko NOZAKI reporter

    F/O people found a regression in this fix: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=195929

    The failing case is:

    echo z |  sed -n -e 's/^a*/b/2p'
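
The commit messages quoted in this thread describe a substitution loop that keeps running when the remaining length is 0, so that a zero-length match at the end of the line is still found. The following is a rough Python model of that idea, not a translation of process.c. The function name `sed_s` and its parameters are invented for illustration, patterns use Python `re` syntax rather than POSIX BRE/ERE, backreferences in the replacement are not handled, and the rule "an empty match directly after a non-empty match is not counted" is an assumption about the intended behaviour.

```python
import re


def sed_s(pattern, repl, line, nth=None):
    """Model `s/pattern/repl/g` (nth=None) or `s/pattern/repl/N` (nth=N).

    Hypothetical sketch. Returns (new_line, changed); `changed` is what
    would decide whether the `p` flag prints the line.
    """
    regex = re.compile(pattern)
    out = []
    pos = 0                 # everything before pos has been emitted
    seen = 0                # matches counted so far
    prev_nonempty_end = -1  # end offset of the previous non-empty match
    changed = False

    # "<=" rather than "<": run once more at the end of the line so an
    # empty match there (as in `=*$`) is still found.
    while pos <= len(line):
        m = regex.search(line, pos)
        if m is None:
            break
        start, end = m.span()
        out.append(line[pos:start])          # keep text before the match

        # Assumption: an empty match right after a non-empty one is skipped.
        if not (start == end and start == prev_nonempty_end):
            seen += 1
            wanted = nth is None or seen == nth
            if wanted:
                out.append(repl)
                changed = True
            else:
                out.append(line[start:end])  # not the requested occurrence
            if wanted and nth is not None:
                pos = end                    # done after the N-th match
                break

        if start == end:
            # Empty match: copy one character and step past it, so the
            # next search cannot return the same empty match again.
            out.append(line[end:end + 1])
            pos = end + 1
        else:
            prev_nonempty_end = end
            pos = end

    out.append(line[pos:])                   # keep whatever is left
    return "".join(out), changed


print(sed_s(r"=*$", "", "A"))            # ('A', True)
print(sed_s(r"^a*", "b", "z", nth=2))    # ('z', False)
print(sed_s(r"^a*", "b", "z", nth=1))    # ('bz', True)
```

In this model the original report (`s/=*$//g` on `A`) leaves the line unchanged apart from an empty replacement at the end. For the regression case, `^a*` matches only once in `z`, so there is no second occurrence to replace and `2p` would print nothing under `-n`. That reading of the `N` and `p` flags is mine; the linked FreeBSD report is the authoritative description of the regression.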
