Issue #331 new

ReadTooFar Exception

Anonymous created an issue

I get a ReadTooFar exception when calling searcher.search(). I'm using Whoosh 2.5.0 and have tried applying fix 310, but it didn't help.

Strangely, the error occurs only with certain query terms; most of my queries work fine.

My query looks like this:

from whoosh.query import And, Or, Term
query = And([Or([Term('name', u'ix'), Term('repository', u'ix'), Term('index', u'ix'), Term('shelfmark', u'ix'), Term('date', u'ix'), Term('type', u'ix'), Term('scriptorium', u'ix')]), Term(u'type', u'scribes')])

My call looks like this:

results = searcher.search(query, limit=None)

Let me know if you need further information.

Comments (7)

  1. goffer_looney

    Here's some debugging information:

    Environment:
    
    
    Request Method: GET
    Request URL: http://localhost/digipal/search/?terms=ix&basic_search_type=hands&ordering=&years=&result_type=&scribes=&repository=&place=&date=
    
    Django Version: 1.4.3
    Python Version: 2.7.2
    Installed Applications:
    ['mezzanine.boot',
     'django.contrib.auth',
     'django.contrib.contenttypes',
     'django.contrib.redirects',
     'django.contrib.sessions',
     'django.contrib.sites',
     'django.contrib.sitemaps',
     'django.contrib.staticfiles',
     'mezzanine.conf',
     'mezzanine.core',
     'mezzanine.generic',
     'mezzanine.blog',
     'mezzanine.forms',
     'mezzanine.pages',
     'mezzanine.galleries',
     'mezzanine.twitter',
     'pagination',
     'digipal',
     'haystack',
     'reversion',
     'south',
     'django_extensions',
     'filebrowser_safe',
     'grappelli_safe',
     'django.contrib.admin',
     'django.contrib.comments']
    Installed Middleware:
    ['digipal_django.middleware.HttpsAdminMiddleware',
     'django.contrib.sessions.middleware.SessionMiddleware',
     'django.contrib.auth.middleware.AuthenticationMiddleware',
     'django.contrib.redirects.middleware.RedirectFallbackMiddleware',
     'django.middleware.common.CommonMiddleware',
     'django.middleware.csrf.CsrfViewMiddleware',
     'django.contrib.messages.middleware.MessageMiddleware',
     'mezzanine.core.request.CurrentRequestMiddleware',
     'mezzanine.core.middleware.TemplateForDeviceMiddleware',
     'mezzanine.core.middleware.TemplateForHostMiddleware',
     'mezzanine.core.middleware.AdminLoginInterfaceSelectorMiddleware',
     'mezzanine.core.middleware.SitePermissionMiddleware',
     'mezzanine.pages.middleware.PageMiddleware',
     'pagination.middleware.PaginationMiddleware']
    
    
    Traceback:
    File "c:\Users\Geoff\workspace\digipal\dp\dp\lib\site-packages\django\core\handlers\base.py" in get_response
      111.                         response = callback(request, *callback_args, **callback_kwargs)
    File "c:\Users\Geoff\workspace\digipal\dp\digipal-django\digipal\views\search.py" in search_page
      72.                 context['results'] = type.build_queryset(request, term)
    File "c:\Users\Geoff\workspace\digipal\dp\digipal-django\digipal\views\content_type\search_content_type.py" in build_queryset
      164.                 results = searcher.search(query, limit=None)
    File "c:\Users\Geoff\workspace\digipal\dp\dp\lib\site-packages\whoosh\searching.py" in search
      787.         self.search_with_collector(q, c)
    File "c:\Users\Geoff\workspace\digipal\dp\dp\lib\site-packages\whoosh\searching.py" in search_with_collector
      820.         collector.run()
    File "c:\Users\Geoff\workspace\digipal\dp\dp\lib\site-packages\whoosh\collectors.py" in run
      142.                 self.set_subsearcher(subsearcher, offset)
    File "c:\Users\Geoff\workspace\digipal\dp\dp\lib\site-packages\whoosh\collectors.py" in set_subsearcher
      170.         self.matcher = self.q.matcher(subsearcher, self.context)
    File "c:\Users\Geoff\workspace\digipal\dp\dp\lib\site-packages\whoosh\query\compound.py" in matcher
      206.             m = self._matcher(subs, searcher, context)
    File "c:\Users\Geoff\workspace\digipal\dp\dp\lib\site-packages\whoosh\query\compound.py" in _matcher
      265.                                   context, q_weight_fn)
    File "c:\Users\Geoff\workspace\digipal\dp\dp\lib\site-packages\whoosh\query\compound.py" in _tree_matcher
      229.             m = make_weighted_tree(mcls, w_subms, **kwargs)
    File "c:\Users\Geoff\workspace\digipal\dp\dp\lib\site-packages\whoosh\util\__init__.py" in make_weighted_tree
      87.         insort(ls, (a[0] + b[0], fn(a[1], b[1])))
    File "c:\Users\Geoff\workspace\digipal\dp\dp\lib\site-packages\whoosh\matching\binary.py" in __init__
      414.         self._find_first()
    File "c:\Users\Geoff\workspace\digipal\dp\dp\lib\site-packages\whoosh\matching\binary.py" in _find_first
      425.             self._find_next()
    File "c:\Users\Geoff\workspace\digipal\dp\dp\lib\site-packages\whoosh\matching\binary.py" in _find_next
      485.                 rb = b.skip_to(a_id)
    File "c:\Users\Geoff\workspace\digipal\dp\dp\lib\site-packages\whoosh\matching\combo.py" in skip_to
      266.             subm.skip_to(docnum)
    File "c:\Users\Geoff\workspace\digipal\dp\dp\lib\site-packages\whoosh\codec\whoosh3.py" in skip_to
      945.             raise ReadTooFar
    
    Exception Type: ReadTooFar at /digipal/search/
    Exception Value: 
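The last frames of the traceback show a union matcher (`combo.py`, `skip_to`) forwarding `skip_to(docnum)` to each of its sub-matchers, and a leaf matcher in `whoosh3.py` raising `ReadTooFar`. One plausible reading, which is an assumption not confirmed anywhere in this thread, is that `skip_to` reached a sub-matcher whose postings were already exhausted. The pure-Python model below illustrates that failure mode and an `is_active()` guard against it. `PostingReader` and `UnionMatcher` are hypothetical names; only the method names `is_active`, `id` and `skip_to` echo Whoosh's matcher vocabulary, and this is not Whoosh's code.

```python
# Hypothetical pure-Python model; NOT Whoosh's implementation.
# Method names mimic Whoosh's matcher vocabulary (is_active, id, skip_to).


class ReadTooFar(Exception):
    """Raised when an exhausted reader is asked to move again."""


class PostingReader:
    """Walks a sorted list of document numbers for one term."""

    def __init__(self, doc_ids):
        self._ids = sorted(doc_ids)
        self._i = 0

    def is_active(self):
        return self._i < len(self._ids)

    def id(self):
        return self._ids[self._i]

    def skip_to(self, target):
        # Mirrors the failing frame: moving an exhausted reader is an error.
        if not self.is_active():
            raise ReadTooFar()
        while self._i < len(self._ids) and self._ids[self._i] < target:
            self._i += 1


class UnionMatcher:
    """OR of several readers; guard=False reproduces an unchecked loop."""

    def __init__(self, readers, guard=True):
        self.readers = readers
        self.guard = guard

    def is_active(self):
        return any(r.is_active() for r in self.readers)

    def id(self):
        # Smallest current doc id among readers that still have postings.
        return min(r.id() for r in self.readers if r.is_active())

    def skip_to(self, target):
        for r in self.readers:
            if self.guard and not r.is_active():
                continue  # skip readers that have run out of postings
            r.skip_to(target)


if __name__ == "__main__":
    for guard in (False, True):
        u = UnionMatcher([PostingReader([1]), PostingReader([1, 5, 9])], guard)
        u.skip_to(5)  # the first reader is now exhausted
        try:
            u.skip_to(9)
            print("guard=%s: now on doc %d" % (guard, u.id()))
        except ReadTooFar:
            print("guard=%s: ReadTooFar" % guard)
```

With `guard=False` the second `skip_to` raises, because the one-posting reader is already exhausted; with the guard it simply lands on doc 9. Whether Whoosh's own fix takes this shape is not established here.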
    
  2. goffer_looney

    My schema looks like this:

    ('character', ID(format=Existence(boost=1.0), vector=None, scorable=None, stored=False, unique=False))
    ('component', TEXT(format=Positions(boost=1.0), vector=None, scorable=True, stored=False, unique=None))
    ('date', TEXT(format=Positions(boost=1.0), vector=None, scorable=True, stored=False, unique=None))
    ('description', TEXT(format=Positions(boost=1.0), vector=None, scorable=True, stored=False, unique=None))
    ('feature', TEXT(format=Positions(boost=1.0), vector=None, scorable=True, stored=False, unique=None))
    ('id', TEXT(format=Positions(boost=1.0), vector=None, scorable=True, stored=True, unique=None))
    ('index', TEXT(format=Positions(boost=1.0), vector=None, scorable=True, stored=False, unique=None))
    ('label', TEXT(format=Positions(boost=1.0), vector=None, scorable=True, stored=False, unique=None))
    ('locus', TEXT(format=Positions(boost=1.0), vector=None, scorable=True, stored=False, unique=None))
    ('name', TEXT(format=Positions(boost=1.0), vector=None, scorable=True, stored=False, unique=None))
    ('place', TEXT(format=Positions(boost=1.0), vector=None, scorable=True, stored=False, unique=None))
    ('repository', TEXT(format=Positions(boost=1.0), vector=None, scorable=True, stored=False, unique=None))
    ('scribes', TEXT(format=Positions(boost=1.0), vector=None, scorable=True, stored=False, unique=None))
    ('scriptorium', TEXT(format=Positions(boost=1.0), vector=None, scorable=True, stored=False, unique=None))
    ('shelfmark', TEXT(format=Positions(boost=1.0), vector=None, scorable=True, stored=False, unique=None))
    ('type', TEXT(format=Positions(boost=1.0), vector=None, scorable=True, stored=True, unique=None))
    
  3. goffer_looney

    OK, my code and index have changed since I posted this, so I'll need to reproduce the problem and then send you a zip of the index. (I've patched my project to prevent the exception, but I'm not sure the patch is logically sound.)

  4. arescope

    Hi,

    I'm having the same problem.

    My schema is

    Schema(id=ID(unique=True, stored=True), content=TEXT(stored=True))
    

    I run the search like this:

    with self._textIdx.searcher() as searcher:
        qp = QueryParser("content", self._textIdx.schema)
        query = qp.parse(text)
        results = [(record["id"], record["content"]) for record in searcher.search(query, limit=limit)]
    return results
    

    The problem occurs only when the search text contains spaces.
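A possible link to the spaces observation: Whoosh's `QueryParser` joins multiple words with its default group, which is AND unless another `group` is passed, so `hello world` becomes roughly `And([Term('content', u'hello'), Term('content', u'world')])`. Matching a conjunction means leapfrogging through posting lists with `skip_to`, the call that fails in the traceback above, whereas a single-word query can usually just be iterated. This is a hypothesis, not a diagnosis. The sketch below models that intersection in plain Python; `intersect` is a hypothetical helper and the doc ids are made up.

```python
# Hypothetical illustration of a conjunctive ("AND") match over two sorted
# posting lists; NOT Whoosh's implementation.
from bisect import bisect_left


def intersect(a, b):
    """Return the document ids present in both sorted lists a and b."""
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            # Equivalent of a.skip_to(b[j]): jump to the first id >= b[j].
            i = bisect_left(a, b[j], i)
        else:
            # Equivalent of b.skip_to(a[i]).
            j = bisect_left(b, a[i], j)
    return out


if __name__ == "__main__":
    # e.g. made-up postings for "hello" and "world" in the query "hello world"
    print(intersect([1, 3, 5, 9], [2, 3, 9, 10]))  # [3, 9]
```

Each `bisect_left` call plays the role of `skip_to`, jumping the lagging list forward instead of stepping one posting at a time; that jumping is exactly the operation a single-term query rarely needs.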

    Thank you so much in advance.
