I recently upgraded from
2020.5.7 and experienced a regression. The below code will hang for a long time (>10 seconds) in 2020.5.7, but run as expected in 2020.4.4:
# -*- coding: utf-8 -*- # ^ this unicode line ^ is required to make the below hang in some situations import regex RX_FIND_MARKERS = regex.compile( "(?<=m)e*((?:t+e*)+)m", regex.V0 | regex.UNICODE ) markers = "tstttettettttetetemetttttttttttttttttttttttttttttttttttttttttttttt" # will hang for a long time assert list(RX_FIND_MARKERS.finditer(markers)) == 
The only changes to the codebase between these versions appear to be around fuzzy matching, but this pattern does not contain any fuzzy matching syntax. It does contain a lookbehind group.
The regex isn’t mine, but from the talon library, so I can’t work around the issue in some other way. For now I have rolled back to a previous
Just from my debugging in pinning down this issue, the Unicode flag and unicode encoding of the python source file where regex compilation is triggered appear to be important.