Matt Chaput / whoosh / Issues
Issue #135 open

Infinite search


I have a requirement to show the user a virtually infinite list of results. It is far easier to visually see results and scroll through them than to get a short list of results and then have to keep typing to broaden your query to get more.

In my current implementation I wrap search and see if there are fewer than limit results. If there is only one item then I do more_like_this and append those to the Results. If more than one then I get key_terms and append a search for those.
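A minimal sketch of that wrapper, assuming three hypothetical callables (`search`, `more_like_this`, `key_terms_query`) that stand in for the real searcher calls. The names and signatures are illustrative only and are not the library's API:

```python
def padded_search(query, limit, search, more_like_this, key_terms_query):
    """Run `query`; if it returns too few hits, pad the list.

    All three callables are hypothetical stand-ins:
      search(query)            -> ranked list of doc ids
      more_like_this(doc_id)   -> ranked list of similar doc ids
      key_terms_query(doc_ids) -> a query built from the key terms of those docs
    """
    results = list(search(query))[:limit]
    if not results or len(results) >= limit:
        return results  # nothing to pad from, or already enough
    if len(results) == 1:
        # A single hit: pad with documents similar to it
        extra = more_like_this(results[0])
    else:
        # Several hits: search again for their key terms
        extra = search(key_terms_query(results))
    seen = set(results)
    for doc in extra:
        if len(results) >= limit:
            break
        if doc not in seen:  # never list a document twice
            seen.add(doc)
            results.append(doc)
    return results
```

The original hits always stay at the top, and the padding is only computed when the first search comes up short.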

Spelling correction can also be mixed into this. If the search was for 'niel' (a misspelling of 'neil') and it so happens that one doc matches, I'd like the documents that follow it to include matches for the likely better spelling.

My enhancement request is for a function that does infinite search (always returns limit results) and uses the existing matches plus knowledge of key terms, spelling etc to fill out the remainder of the list.

Comments (3)

  1. rogerb_aviga reporter

    A first approximation is a more_like_these function. On Results it gathers up the docnums of the top results and then calls more_like_these on Searcher. more_like_these on Searcher is virtually identical to more_like_this taking docnums instead of docnum. I then extend the existing results with these.

        # Added as a method on Searcher, next to more_like_this; it relies on
        # names that module already has in scope (classify, query, BitVector).
        def more_like_these(self, docnums, fieldname, top=10, numterms=5, normalize=False, model=classify.Bo1Model):
            """Return documents similar to the given set of documents."""
            # Same approach as more_like_this, but over several docnums
            kts = self.key_terms(docnums, fieldname, numterms=numterms,
                                 model=model, normalize=normalize)
            # Create an Or query from the key terms
            q = query.Or([query.Term(fieldname, word, boost=weight)
                          for word, weight in kts])
            # Filter the source documents out of the results using a bit vector
            # with every bit set except those of the source documents
            exclude = set(docnums)  # set membership is O(1) per lookup
            size = self.doc_count_all()
            comb = BitVector(size, [n for n in xrange(size)
                                    if n not in exclude])
            return self.search(q, limit=top, filter=comb, optimize=False)
  2. Matt Chaput repo owner
    • changed status to open

    This is interesting. I'll try to do something with this (at least as an example, if not as core functionality) when I finish with the better spell checking.

    Maybe a good first step to try to reach an arbitrary limit, before MLT or autocorrecting, would be to add on results from rewriting the query to be less restrictive (e.g. convert any AND clauses into ORs).
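A rough sketch of that relaxation idea. The `Term`, `And` and `Or` types here are hypothetical stand-ins for a query tree, not whoosh's own query classes, and `search` is an assumed callable returning a ranked list of doc ids:

```python
from collections import namedtuple

# Hypothetical query tree nodes, for illustration only
Term = namedtuple("Term", "field text")
And = namedtuple("And", "subqueries")
Or = namedtuple("Or", "subqueries")


def relax(q):
    """Return a copy of q in which every And node has become an Or node."""
    if isinstance(q, (And, Or)):
        return Or(tuple(relax(sub) for sub in q.subqueries))
    return q  # leaf terms are left unchanged


def search_with_fallback(search, q, limit):
    """Search strictly first, then top up from the relaxed query."""
    hits = list(search(q))[:limit]
    if len(hits) < limit:
        seen = set(hits)
        for doc in search(relax(q)):
            if doc not in seen:
                seen.add(doc)
                hits.append(doc)
                if len(hits) == limit:
                    break
    return hits
```

Strict matches keep their place at the top of the list; the relaxed query only fills whatever slots are left.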

  3. rogerb_aviga reporter

    I use dismax so all queries are OR. My final implementation:

    • Do the query
    • If no results, repeat query without filter but exclude these results from final results
    • While there are fewer than 10 results, keep calling more like these on whatever results are present until there are at least 10
    • If there are fewer than 50 results, call more like these on the first 10 to extend the results

    (I know the offset and limit passed in so extra work is only done if the results will be looked at.) This whole approach works very well. In the future I'd want to mix in stemming, double metaphone, spelling correction etc.
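The last two steps of that list could be sketched roughly as follows. This is an illustration rather than the code actually in use: `more_like_these` is a hypothetical callable taking a list of doc ids and returning a ranked list of similar ids, the thresholds are parameters, and the unfiltered retry from the second step is left out:

```python
def fill_results(initial, more_like_these, minimum=10, target=50, seed_size=10):
    """Extend `initial` (a ranked list of doc ids) with similar documents.

    `more_like_these(ids)` is a hypothetical stand-in that returns a ranked
    list of doc ids similar to `ids`.
    """
    results = list(initial)
    seen = set(results)

    def extend(candidates):
        # Append unseen candidates in order; report how many were added
        added = 0
        for doc in candidates:
            if doc not in seen:
                seen.add(doc)
                results.append(doc)
                added += 1
        return added

    # Keep expanding from whatever is present until the minimum is reached
    while results and len(results) < minimum:
        if extend(more_like_these(results)) == 0:
            break  # nothing new came back, so stop instead of looping forever
    # Still short of the target: expand once more from the top results
    if results and len(results) < target:
        extend(more_like_these(results[:seed_size]))
    return results
```

The early `break` is an addition of this sketch; without it a small index could keep the first loop spinning when no new documents turn up.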
