Kernel crash on fuzzy regex search

Issue #334 new
Former user created an issue

The following regex fuzzy search will crash the Python kernel and give a message: “Kernel died, restarting”.

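import regex as f_re

str_to_search = 'al, '
# ENHANCEMATCH (the same as the inline (?e) at the start of the pattern)
# makes fuzzy matching try to improve the fit of the match it finds.
flags = f_re.IGNORECASE | f_re.DOTALL | f_re.ENHANCEMATCH
# {e<=1} allows at most one error (insertion, deletion or substitution).
pattern = r'(?e)(?:(?:(?=(?P<if_2_3>expression1\W+)?)(?P=if_2_3))?(?(if_2_3)expression2|expression3)){e<=1}'
regex_comp = f_re.compile(pattern, flags)
print(str_to_search)
for match in regex_comp.finditer(str_to_search):
    print('does not even get to here, kernel crash before this')
    print(match.group())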

It took me a long time to pin down the exact crash conditions. I also found that the regex itself contains an error: the ? in expression1\W+)? doesn’t make any sense. Still, the module should not crash the whole kernel.
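The pattern was presumably meant to match “expression2” when it is preceded by “expression1” plus some non-word characters, and “expression3” otherwise. That reading is an assumption. The sketch below expresses it with the standard library `re` module, without the stray `?` and without fuzzy matching, so it does not exercise the crash at all; it only illustrates the look-ahead, back-reference and conditional construction.

```python
import re

# The look-ahead captures "expression1" plus trailing non-word characters,
# and the back-reference (?P=if_2_3) then consumes exactly that text.
# The conditional (?(if_2_3)yes|no) requires "expression2" if the group
# took part in the match, or "expression3" if it did not.
PATTERN = re.compile(
    r'(?:(?=(?P<if_2_3>expression1\W+))(?P=if_2_3))?'
    r'(?(if_2_3)expression2|expression3)'
)

for text in ['expression1, expression2', 'expression3', 'expression2']:
    match = PATTERN.fullmatch(text)
    print(repr(text), '->', match.group() if match else None)
```

Here “expression2” on its own is rejected, because without the preceding “expression1” the conditional falls through to the “expression3” branch.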

I am using regex version '2.5.33' (64-bit) with Python 3.7 on Windows 10 64-bit, installed manually from a wheel file. These are the most relevant package versions:

- anaconda 2018.12 py37_0
- ipykernel 5.1.0 py37h39e3cac_0
- ipython 7.2.0 py37h39e3cac_0
- python 3.7.1 h8c8aaf0_6
- spyder 3.3.2 py37_0
- spyder-kernels 0.3.0 py37_0
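Details like these can be collected with a short script. The following is a sketch that is not part of the original report; `environment_report` is a hypothetical helper name, and it assumes the installed module exposes `__version__`, falling back to 'unknown' if it does not.

```python
import importlib
import platform
import struct
import sys


def environment_report(module_name='regex'):
    """Collect interpreter and module details for a bug report (hypothetical helper)."""
    report = {
        'python': sys.version.split()[0],
        # The pointer size distinguishes a 32-bit build from a 64-bit one.
        'bits': struct.calcsize('P') * 8,
        'os': platform.platform(),
    }
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        report[module_name] = 'not installed'
    else:
        # Not every module defines __version__, so fall back to 'unknown'.
        report[module_name] = getattr(module, '__version__', 'unknown')
        # The file path shows which copy of the module was actually imported.
        report[module_name + '_file'] = getattr(module, '__file__', 'unknown')
    return report


if __name__ == '__main__':
    for key, value in environment_report().items():
        print(f'{key}: {value}')
```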

P.S. Thanks for the great module.

Comments (11)

  1. Matthew Barnett repo owner

    Just tried it (Windows 10 64-bit, Python 3.7). No crash. Tried Python 3.6 too. No crash.

  2. Bruno BC

    Thanks for your interest, Matthew.

    I can confirm that I still have the bug. It is a very strange one: it doesn’t happen if I change the search string or if I run the search without the fuzzy options. It must be something related to my installation.

    This is the windows event log crash report:

    <Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
      <System>
        <Provider Name="Application Error" />
        <EventID Qualifiers="0">1000</EventID>
        <Level>2</Level>
        <Task>100</Task>
        <Keywords>0x80000000000000</Keywords>
        <TimeCreated SystemTime="2019-07-02T07:19:35.116398900Z" />
        <EventRecordID>37498</EventRecordID>
        <Channel>Application</Channel>
        <Security />
      </System>
      <EventData>
        <Data>python.exe</Data>
        <Data>3.7.1150.1013</Data>
        <Data>5c0f4332</Data>
        <Data>_regex.pyd</Data>
        <Data>0.0.0.0</Data>
        <Data>5a318168</Data>
        <Data>c0000005</Data>
        <Data>0000000000001ec1</Data>
        <Data>2db0</Data>
        <Data>01d530a6801bbc05</Data>
        <Data>C:\Program Files\Anaconda\python.exe</Data>
        <Data>C:\Program Files\Anaconda\lib\site-packages\_regex.pyd</Data>
        <Data>0bdc30c4-b70b-4396-944d-9b61ae0c8031</Data>
        <Data />
        <Data />
      </EventData>
    </Event>

    If you are interested (and can tell me how), I can try to generate some additional logging from within the script.
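One way to get more than the Event Viewer entry, not something suggested in this thread, is CPython’s standard `faulthandler` module. It dumps the Python traceback when the process dies from a fatal signal or, on Windows, from an exception such as the access violation (code c0000005) in the log above. Running a script as `python -X faulthandler script.py` enables it globally; the sketch below writes to a file instead, on the assumption that an IPython/Spyder kernel’s stderr may not reach the console. `run_with_crash_log` and `crash_log.txt` are hypothetical names.

```python
import faulthandler


def run_with_crash_log(func, log_path='crash_log.txt'):
    """Call func() with faulthandler dumping tracebacks to log_path on a fatal crash."""
    # The file must stay open while func runs: on a crash, faulthandler
    # writes straight to its file descriptor instead of going through Python.
    with open(log_path, 'w') as log_file:
        faulthandler.enable(file=log_file, all_threads=True)
        try:
            return func()
        finally:
            # Stop using the file before the with-block closes it.
            faulthandler.disable()
```

With the names from the original script, this would be called as `run_with_crash_log(lambda: list(regex_comp.finditer(str_to_search)))`; if the kernel dies, `crash_log.txt` should show which Python line was executing.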

    Regards,

    B.

  3. Matthew Barnett repo owner

    Could you try these at the Python prompt to confirm what you’re getting:

    import regex
    pattern = regex.compile(r'(?e)(?:(?:(?=(?P<if_2_3>expression1\W+)?)(?P=if_2_3))?(?(if_2_3)expression2|expression3)){e<=1}')
    
    # Should print None.
    print(pattern.search('al, '))
    
    s = pattern.finditer('al, ')
    # Should raise StopIteration.
    next(s)
    
    pattern = regex.compile(r'(?:(?:(?=(?P<if_2_3>expression1\W+)?)(?P=if_2_3))?(?(if_2_3)expression2|expression3)){e<=1}')
    
    # Should print None.
    print(pattern.search('al, '))
    
    s = pattern.finditer('al, ')
    # Should raise StopIteration.
    next(s)
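Because each of these checks can take the whole interpreter down, it may be easier to run them in a child process and only look at the exit code. The sketch below is not from the thread; `runs_cleanly` is a hypothetical helper built on the standard `subprocess` module (`capture_output` needs Python 3.7 or later).

```python
import subprocess
import sys


def runs_cleanly(snippet, timeout=60):
    """Run a Python snippet in a fresh interpreter; return (ok, returncode, stderr)."""
    # A segfault or access violation kills only the child process,
    # so the calling kernel or session survives.
    result = subprocess.run(
        [sys.executable, '-c', snippet],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.returncode == 0, result.returncode, result.stderr
```

A native crash typically shows up as a negative return code on Linux (for example -11 for SIGSEGV) and as 3221225477 (0xC0000005) on Windows. An ordinary uncaught Python exception gives exit code 1, so the `next(s)` lines, which are expected to raise StopIteration, would also report 1 rather than a crash.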
    
  4. Bruno BC

    Hello Matthew,

    I cannot get past the print(pattern.search('al, ')) command. As soon as I execute this command, the kernel crashes.

    This is a screenshot of my third try (note: “Python dejó de funcionar” means “Python stopped working”).

    Thanks,

    Bruno

  5. Matthew Barnett repo owner

    I was wondering how to test it, given that it doesn’t crash on my machine, when it occurred to me that I do have another machine I could test it on: a Raspberry Pi. It crashes there!

  6. Matthew Barnett repo owner

    I think the problem might be that I previously re-organised the module’s files, and if you install a new version of regex over an old one, some of the old files get left behind. That somehow works most of the time, but sometimes doesn’t.

    I found that problem on the Raspberry Pi and fixed it by uninstalling regex, removing any remaining files related to the regex module from the site-packages subfolder of Python, and then re-installing regex.

  7. Bruno BC

    Thanks Matthew, I will try that.

    Did you get the chance to test any other expressions? This is the only one that crashes on my system; I can run other fuzzy searches without problems.

    Regards,

    Bruno

  8. Matthew Barnett repo owner

    I reduced it down to:

    import regex
    print(regex.match(r'(?:(?=(e)?)\1){e<=1}', ' '))
    

    which still crashed.
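A reduction like this can also be automated. The sketch below is not from the thread: it greedily deletes one character at a time from a pattern and keeps each deletion as long as a caller-supplied predicate says the problem still occurs, a much cruder cousin of delta debugging. `reduce_pattern` and `still_fails` are hypothetical names.

```python
def reduce_pattern(pattern, still_fails):
    """Greedily delete characters while still_fails(candidate) stays True."""
    current = pattern
    changed = True
    while changed:
        changed = False
        for i in range(len(current)):
            # Try the string with the i-th character removed.
            candidate = current[:i] + current[i + 1:]
            if still_fails(candidate):
                # Keep the shorter string and restart the scan from the start.
                current = candidate
                changed = True
                break
    return current
```

For a crash, `still_fails` would need to run the candidate (for example `regex.match(candidate, ' ')`) in a separate interpreter and return True only for a crash exit code, not for exit code 1, because many candidates will simply be invalid patterns that raise an ordinary error. The loop makes on the order of n² predicate calls in the worst case, which is tolerable for a pattern of this size.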

  9. Matthew Barnett repo owner

    Here’s a script you could try to ‘clean’ the regex installation:

    # Script to 'clean' an installation of the regex module.
    from glob import glob
    from os import remove
    from os.path import dirname, isdir, join
    
    import regex
    site_folder = dirname(dirname(regex.__file__))
    del regex
    
    old_files = glob(join(site_folder, 'regex.*')) + glob(join(site_folder, '_regex.*'))
    
    if isdir(join(site_folder, 'regex')):
        if old_files:
            for path in old_files:
                print(f'Found old file {path}')
                # Uncomment the next line to remove the old file.
                #remove(path)
        else:
            print('No old files')
    else:
        print('New regex not found')
    
  10. Talha Moosani

    Hi Matthew, I am having the same issue with the regex below.

    mypattern = r'(?<=Ves\/Voy\/Dir[\s]*)([\D]+[\s]*[\D]+)[\s]([\w]+.+)'
    matchNum  = re.search(mypattern,text,re.IGNORECASE)
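The snippet does not show which module `re` is bound to. If it is the standard library `re`, the pattern would not even compile, because its look-behind contains `[\s]*` and `re` only accepts fixed-width look-behinds; the `regex` module does allow variable-width look-behind. A quick check, assuming nothing beyond the standard library (`compiles_with_stdlib_re` is a hypothetical helper name):

```python
import re

mypattern = r'(?<=Ves\/Voy\/Dir[\s]*)([\D]+[\s]*[\D]+)[\s]([\w]+.+)'


def compiles_with_stdlib_re(pattern, flags=re.IGNORECASE):
    """Return True if the standard library re module accepts the pattern."""
    try:
        re.compile(pattern, flags)
    except re.error as exc:
        # For mypattern this should be the fixed-width look-behind error.
        print('re.error:', exc)
        return False
    return True


print(compiles_with_stdlib_re(mypattern))
```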
    

  11. Matthew Barnett repo owner

    What is the value of ‘text’? Without it I’m unable to help. I need a complete example that shows the problem, ideally the smallest one that still shows it.
