Issue #340 resolved

more_like infinite loop

Anonymous created an issue

When I upgraded from 2.3 to 2.5.1, my application began failing: more_like was entering an infinite loop. Here is the stack trace from when I pressed Ctrl-C to abort the loop:

  File "/home/julian/src/parentmap/handler/management/commands/test.py", line 12, in handle
    relatedTo(Dummy())
  File "/home/julian/src/parentmap/handler/related.py", line 87, in relatedTo
    docs = searcher.more_like(num, 'text',)
  File "/usr/local/lib/python2.7/dist-packages/whoosh/searching.py", line 587, in more_like
    return self.search(q, limit=top, filter=filter, mask=set([docnum]))
  File "/usr/local/lib/python2.7/dist-packages/whoosh/searching.py", line 787, in search
    self.search_with_collector(q, c)
  File "/usr/local/lib/python2.7/dist-packages/whoosh/searching.py", line 820, in search_with_collector
    collector.run()
  File "/usr/local/lib/python2.7/dist-packages/whoosh/collectors.py", line 143, in run
    self.collect_matches()
  File "/usr/local/lib/python2.7/dist-packages/whoosh/collectors.py", line 730, in collect_matches
    for sub_docnum in child.matches():
  File "/usr/local/lib/python2.7/dist-packages/whoosh/collectors.py", line 408, in matches
    self.skipped_times += matcher.skip_to_quality(minscore)
  File "/usr/local/lib/python2.7/dist-packages/whoosh/matching/combo.py", line 281, in skip_to_quality
    self._read_part()
  File "/usr/local/lib/python2.7/dist-packages/whoosh/matching/combo.py", line 207, in _read_part
    a[i] = 0
KeyboardInterrupt

This is running under Django, if that matters. Happy to provide more details if needed. I'm sticking with 2.3 for now, although I'm also running out of file handles, so I need to move forward someday soon.
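
A traceback like the one above can also be captured without pressing Ctrl-C by hand. The following is a minimal sketch, assuming the suspect call can be run in a worker thread. `run_with_watchdog` is a hypothetical helper written for illustration; it is not part of Whoosh or Django. It relies on `sys._current_frames()`, which CPython provides for this kind of debugging.

```python
import sys
import threading
import traceback


def run_with_watchdog(func, timeout, *args, **kwargs):
    """Run func in a worker thread and report whether it finished in time.

    Returns (finished, result, stack). If the call is still running after
    `timeout` seconds, `stack` holds the worker's current stack as text.
    Hypothetical helper: an exception inside func is not propagated here,
    so in that case the result is simply None.
    """
    box = {}

    def target():
        box["result"] = func(*args, **kwargs)

    worker = threading.Thread(target=target)
    worker.daemon = True  # do not keep the process alive if func never returns
    worker.start()
    worker.join(timeout)

    if not worker.is_alive():
        return True, box.get("result"), None

    # The call is still running: snapshot where the worker thread is right now.
    frame = sys._current_frames().get(worker.ident)
    stack = "".join(traceback.format_stack(frame)) if frame is not None else ""
    return False, None, stack
```

For the call in the traceback, usage would presumably look like `finished, docs, stack = run_with_watchdog(searcher.more_like, 10, num, 'text')`, with a 10-second timeout chosen arbitrarily. Note that the worker thread is not stopped; it keeps spinning until the process exits, so this is only suitable for a throwaway test script.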

Comments (9)

  1. haight6716

    Stack trace more readable?:

      File "/home/julian/src/parentmap/handler/management/commands/test.py", line 12, in handle
        relatedTo(Dummy())
      File "/home/julian/src/parentmap/handler/related.py", line 87, in relatedTo
        docs = searcher.more_like(num, 'text',)
      File "/usr/local/lib/python2.7/dist-packages/whoosh/searching.py", line 587, in more_like
        return self.search(q, limit=top, filter=filter, mask=set([docnum]))
      File "/usr/local/lib/python2.7/dist-packages/whoosh/searching.py", line 787, in search
        self.search_with_collector(q, c)
      File "/usr/local/lib/python2.7/dist-packages/whoosh/searching.py", line 820, in search_with_collector
        collector.run()
      File "/usr/local/lib/python2.7/dist-packages/whoosh/collectors.py", line 143, in run
        self.collect_matches()
      File "/usr/local/lib/python2.7/dist-packages/whoosh/collectors.py", line 730, in collect_matches
        for sub_docnum in child.matches():
      File "/usr/local/lib/python2.7/dist-packages/whoosh/collectors.py", line 408, in matches
        self.skipped_times += matcher.skip_to_quality(minscore)
      File "/usr/local/lib/python2.7/dist-packages/whoosh/matching/combo.py", line 281, in skip_to_quality
        self._read_part()
      File "/usr/local/lib/python2.7/dist-packages/whoosh/matching/combo.py", line 207, in _read_part
        a[i] = 0
    KeyboardInterrupt
    
  2. haight6716

    Thanks for the immediate response, wow! It just spins at 100% CPU. I haven't fully debugged it, but I guess skip_to_quality's while loop is never satisfied.

  3. Matt Chaput repo owner

    OK, I thought maybe there were repeated statements that were cut off in the traceback, but if not, then yes, it's probably a "while" problem in a single function (hopefully that makes it easier to debug ;)

    Yes, please, it would be awesome if you could send me your index (either email to matt@whoosh.ca or send me a download link) and some code that triggers the bug. Thanks very much!

  4. haight6716

    Yeah, I cut off some of the call stack above your code, but I did put some debug statements around the call into whoosh to ensure it wasn't something in my code.

    Here it is, it's 30M, but I boiled down the test code to a very simple situation. http://www.julianhaight.com/test_whoosh.tgz

    I re-built the index with 2.5.1.

    By the way, when I run the same test with a db that is from 2.3, I get this instead, which looks like a different problem:

    Traceback (most recent call last):
      File "./test_whoosh.py", line 14, in <module>
        test()
      File "./test_whoosh.py", line 8, in test
        db = open_dir(DBDIR)
      File "/usr/local/lib/python2.7/dist-packages/whoosh/index.py", line 123, in open_dir
        return FileIndex(storage, schema=schema, indexname=indexname)
      File "/usr/local/lib/python2.7/dist-packages/whoosh/index.py", line 421, in __init__
        TOC.read(self.storage, self.indexname, schema=self._schema)
      File "/usr/local/lib/python2.7/dist-packages/whoosh/index.py", line 646, in read
        schema, segments = loader(stream, gen, schema, version)
      File "/usr/local/lib/python2.7/dist-packages/whoosh/legacy.py", line 68, in load_110_toc
        segments = stream.read_pickle()
      File "/usr/local/lib/python2.7/dist-packages/whoosh/filedb/structfile.py", line 245, in read_pickle
        return load_pickle(self.file)
    ImportError: No module named fileindex
    

    Not as serious a problem as far as I'm concerned - I can rebuild the index when I upgrade. But not everyone can do that perhaps?

    Thanks!
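
The `ImportError: No module named fileindex` in the second traceback is raised from `load_pickle`, which is consistent with how Python's pickle works in general: a pickle records the module path of each class it contains, and loading fails if that module can no longer be imported. The sketch below reproduces that failure mode using only the standard library. The module name `legacy_fileindex` and the `Segment` class are hypothetical stand-ins, not Whoosh's actual names, and nothing here is Whoosh's own loading code.

```python
import pickle
import sys
import types

MODULE_NAME = "legacy_fileindex"  # hypothetical name, for illustration only


def install_legacy_module():
    """Register a throwaway module holding a Segment class; return the class."""
    module = types.ModuleType(MODULE_NAME)
    segment_cls = type("Segment", (object,), {"__module__": MODULE_NAME})
    module.Segment = segment_cls
    sys.modules[MODULE_NAME] = module
    return segment_cls


def try_load(data):
    """Return (obj, None) on success, or (None, error) if an import fails."""
    try:
        return pickle.loads(data), None
    except ImportError as exc:
        return None, exc


# "Old version": the module exists, so the object pickles without trouble.
segment = install_legacy_module()()
segment.name = "seg_0"
data = pickle.dumps(segment)

# "New version": the module is gone, so unpickling cannot import it.
del sys.modules[MODULE_NAME]
obj, error = try_load(data)
print(type(error).__name__, error)

# Restoring a module under the old name lets the same bytes load again.
install_legacy_module()
restored, error_after = try_load(data)
print(restored.name, error_after)
```

The last step shows one general way such data can be kept loadable (keeping an importable module under the old name). Whether that is appropriate for old Whoosh indexes is a question for the maintainer; rebuilding the index, as mentioned above, avoids the problem entirely.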
