Regression from 2020.4.4 -> 2020.5.7 in non-fuzzy matching pattern

Issue #372 resolved
Tom Milligan created an issue

I recently upgraded regex from 2020.4.4 to 2020.5.7 and hit a regression. The code below hangs for a long time (>10 seconds) on 2020.5.7, but runs as expected on 2020.4.4:

# -*- coding: utf-8 -*-
# ^ the coding declaration above is required to make the code below hang in some situations

import regex

RX_FIND_MARKERS = regex.compile(
    "(?<=m)e*((?:t+e*)+)m", regex.V0 | regex.UNICODE
)
markers = "tstttettettttetetemetttttttttttttttttttttttttttttttttttttttttttttt"

# hangs for a long time on 2020.5.7
assert list(RX_FIND_MARKERS.finditer(markers)) == []

The only changes to the codebase between these versions appear to be around fuzzy matching, but this pattern does not contain any fuzzy matching syntax. It does contain a lookbehind group.

The regex isn’t mine; it comes from the talon library, so I can’t work around the issue by changing the pattern. For now I have rolled back to a previous regex version.

From my debugging while pinning down this issue, both the regex.UNICODE flag and the UTF-8 coding declaration of the Python source file in which the pattern is compiled appear to matter.
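
For anyone comparing the two releases, it may help to run the reproducer in a child interpreter with a time limit, so that a hang is reported instead of blocking the session. The following is only a sketch: `run_snippet` and `REPRO` are names made up for this illustration, and the snippet is written to a temporary file (rather than passed with `-c`) on the assumption that the coding declaration matters only for real source files.

```python
import os
import subprocess
import sys
import tempfile

# The reproducer from this report as a standalone script.
# It needs the third-party `regex` package in the interpreter that runs it.
REPRO = '''# -*- coding: utf-8 -*-
import regex

rx = regex.compile("(?<=m)e*((?:t+e*)+)m", regex.V0 | regex.UNICODE)
markers = "tstttettettttetetemetttttttttttttttttttttttttttttttttttttttttttttt"
assert list(rx.finditer(markers)) == []
'''


def run_snippet(source, timeout_s=10.0):
    """Run `source` as a script in a fresh interpreter.

    Returns "ok" (exit code 0), "error" (non-zero exit code),
    or "timeout" (still running after `timeout_s` seconds).
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "repro.py")
        # Write a real source file so the coding declaration is honoured.
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(source)
        try:
            proc = subprocess.run(
                [sys.executable, path],
                capture_output=True,
                timeout=timeout_s,  # the child is killed when this expires
            )
        except subprocess.TimeoutExpired:
            return "timeout"
    return "ok" if proc.returncode == 0 else "error"
```

Going by the behaviour described above, `run_snippet(REPRO)` would be expected to return "timeout" with 2020.5.7 installed and "ok" with 2020.4.4. An "error" result most likely means regex is not importable in that interpreter.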
