finditer(..., overlap=True, timeout=5) hangs on a difficult input despite the timeout

Issue #378 resolved
Marcin Wojnarski created an issue

I run a complex regex pattern that matches a sequence of 3-4 consecutive numbers, written in different numeric formats and possibly separated by some text, WITH OVERLAP, using the finditer(text, overlap=True) method of a compiled regex.

The pattern works fine on most inputs (tested on 100K different real-world texts), but it hangs on an input containing dense sequences of several hundred nearly consecutive numbers. This is a difficult input, no doubt. However, even after adding a timeout (for example, timeout=5), the method STILL HANGS for many minutes or longer, which indicates a problem with the way “timeout” is processed. htop shows the process is busy the whole time (100% CPU).

Ubuntu 20.04, 64-bit, Python 3.8.2. First tried on regex-2020.6.8, then upgraded to the latest version (2020.7.14); the problem occurs with both versions.
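The report includes no code. As a rough sketch of what "matching with overlap" means, here is a version that uses only the standard `re` module and emulates overlapped matching with a zero-width lookahead. The pattern is a hypothetical, heavily simplified stand-in (three integers separated by 1-10 non-digit characters), not the reporter's pattern. The third-party `regex` module supports this natively: its documented keyword is `overlapped=True`, and it also accepts `timeout=`. Stdlib `re` has no timeout parameter at all, and the lookahead trick does nothing to bound backtracking.

```python
import re

# Hypothetical, heavily simplified stand-in for the reporter's pattern
# (not the original): three integers separated by 1-10 non-digit chars.
# (?<!\d) stops a match from starting in the middle of a number, and the
# zero-width lookahead (?=(...)) captures the match without consuming it.
OVERLAPPING = re.compile(r"(?<!\d)(?=(\d+(?:\D{1,10}\d+){2}))")
NON_OVERLAPPING = re.compile(r"\d+(?:\D{1,10}\d+){2}")


def overlapping_matches(text):
    """Return every match, including ones that overlap earlier matches.

    Because the lookahead consumes nothing, finditer moves on to the next
    character after each (empty) match, so a new match can start inside
    the text captured by the previous one.
    """
    return [m.group(1) for m in OVERLAPPING.finditer(text)]


if __name__ == "__main__":
    text = "1, 2, 3, 4"
    print(NON_OVERLAPPING.findall(text))  # ['1, 2, 3']
    print(overlapping_matches(text))      # ['1, 2, 3', '2, 3, 4']
```

On dense runs of numbers, every start position launches a fresh attempt. A pattern with heavy backtracking therefore costs far more in overlapped mode, which is where a timeout matters.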

Comments (3)

  1. Marcin Wojnarski reporter

    Sorry, the problem seems to occur in a different pattern, the one compiled with standard “re” and without timeout. Closing this issue.

  2. Marcin Wojnarski reporter

    The problem seems to occur in a different pattern, the one compiled with standard "re" and without timeout.

  3. Matthew Barnett repo owner

    For future reference, you didn’t provide any code that demonstrated the problem, so I wouldn’t have been able to investigate it anyway.
