Partial matches

Issue #102 resolved
Geert Jansen created an issue

Partial matches would be very useful.

A partial match is when a pattern did not match because the input ended, but could have matched if more input had been available. This is very useful, e.g., when tokenizing input character by character using regular expressions.
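To make the three possible outcomes concrete, here is a minimal sketch using only the standard library `re` module. It is not how any regex engine implements partial matching; it assumes a simplified pattern given as a list of single-character sub-patterns. The names `match_status` and `PHONE` are hypothetical and exist only for this illustration.

```python
import re

def match_status(units, text):
    """Classify text against a fixed-length pattern.

    `units` is a list of regexes that each match exactly one character
    (an assumption of this sketch, not a general regex).
    Returns "full", "partial" or "none".
    """
    if len(text) > len(units):
        return "none"          # too much input: can never match
    for unit, ch in zip(units, text):
        if re.fullmatch(unit, ch) is None:
            return "none"      # a character already contradicts the pattern
    # Every character seen so far fits; the only question is whether
    # the input ended before the pattern did.
    return "full" if len(text) == len(units) else "partial"

# Hypothetical example pattern: ddd-dddd
PHONE = [r"\d"] * 3 + ["-"] + [r"\d"] * 4
```

With this sketch, `match_status(PHONE, "555-1")` reports a partial match, `"555-1234"` a full match, and `"55a"` no match, which is the distinction a character-by-character tokenizer needs.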

Boost has it here:

Java has a hitEnd() method:

Comments (5)

  1. Former user Account Deleted

    Partial matching is something I've considered adding, but I think it'll be too difficult to retrofit it into the existing implementation.

  2. Former user Account Deleted

    I recently tried to find this exact feature, and likewise ended up finding it in Boost and Java. I think it would be a very good feature in a language like Python, because it can e.g. be used to traverse a directory tree and match whole path names, or to validate user input, etc.

    However in the process I came to think of a more "advanced" version of it. In Boost and Java it is an API feature, i.e. a flag you set and it is applied to the whole match.

    Just in case it's worth considering, one could also view it as an alternative for ranges... consider:

    \d{1,3} - matches up to three digits.

    You could also use any regex instead of \d, but specify allowable amounts of characters to consume (with this or an alternate syntax), for example:


    This would allow the partial matches a, ab, 1, 12 and the full matches abc and 123. The information about how deep the match went could be stored in the group. It would be a unique feature of the library.

    I don't know many use cases (except for "partial matching" the whole regex), but at least it would use a somewhat existing feature of the language, which simply amounts to counting characters: instead of counting matches of a character class, you would be counting the characters consumed by the regex.

    Without looking at the code, it could also make the implementation easier.

  3. Former user Account Deleted

    Probably it would also be the same as allowing up to N deletions at the end... so that would be another possibility.

  4. Former user Account Deleted

    The syntax:


    already has a meaning; it's 1..3 repeats of (abc|123).
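The difference between the two readings discussed in comments 2 and 4 can be sketched as follows. This assumes the syntax in question is the standard quantifier form `(abc|123){1,3}`, which is how "1..3 repeats of (abc|123)" is normally written. The helper `prefix_depth` is hypothetical: it only illustrates the proposed "count consumed characters" idea for literal alternatives and is not part of any library.

```python
import re

# Existing meaning: one to three repeats of the whole group.
REPEATS = re.compile(r"(abc|123){1,3}")

def prefix_depth(alternatives, text, min_chars=1):
    """Hypothetical helper for the proposed reading.

    Returns the number of characters of some alternative that `text`
    covers (the "depth" of the match), or None if `text` is shorter
    than `min_chars` or is not a prefix of any alternative.
    Only literal alternatives are handled in this sketch.
    """
    if len(text) < min_chars:
        return None
    for alt in alternatives:
        if alt.startswith(text):
            return len(text)
    return None
```

Under the existing meaning, `REPEATS.fullmatch("abc123")` succeeds and `REPEATS.fullmatch("ab")` fails. Under the proposed reading, `prefix_depth(["abc", "123"], "ab")` would report a depth of 2, which is why the same syntax cannot carry both meanings.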
