I've noticed that the following regex:
    <!--(?>[^-]++|-(?!->))*-->
(which is supposed to do the same job as the good old comment regex) ends up with a MemoryError when there is a very long comment in the string.
    regex.findall(r'<!--(?>[^-]++|-(?!->))*-->', paste())
    Traceback (most recent call last):
      File "<input>", line 1, in <module>
      File "...\Python37\lib\site-packages\regex.py", line 333, in findall
        overlapped, concurrent)
    MemoryError
Why is that happening? Since all the groups in the regex are atomic ones, the engine only needs to remember the last successful matching position before each atomic group and proceed towards the end as soon as each group fails. That shouldn't require that much memory, should it?
Now, I know that what I just said is a gross simplification of what actually happens under the hood, and I might be wrong about it, but I thought there might be something that can be improved, so I decided to create this issue.
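To make that expectation concrete, here is a rough sketch of the same comment search written as a plain left-to-right scan. It is only an illustration of the behaviour I would expect, not a description of how the regex module works internally; find_comments and sample are names made up for this example. The scan keeps nothing but the current position, so its extra memory does not grow with the length of a comment.

```python
def find_comments(text):
    """Return every <!-- ... --> span, scanning left to right with no backtracking."""
    comments = []
    pos = 0
    while True:
        start = text.find("<!--", pos)
        if start == -1:
            break  # no further opening delimiter
        # The closing delimiter must begin after the 4-character opener.
        end = text.find("-->", start + 4)
        if end == -1:
            break  # no closing delimiter remains, so no later comment can match either
        comments.append(text[start:end + 3])
        pos = end + 3  # continue after this comment (non-overlapping matches)
    return comments


# Hypothetical input: one short comment and one very long comment.
sample = "a<!-- x -->b<!--" + "y- " * 1_000_000 + "-->c"
found = find_comments(sample)
print(len(found), len(found[1]))
```

On an input like this, I would expect the atomic-group regex to return the same two comments without running out of memory.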
Here is a sample text you can download: https://test.wikipedia.org/wiki/User:Dalba/comment-regex-test?action=raw&ctype=text/x-wiki