OR by default - awful or useful?

Issue #271 new
Thomas Waldmann created an issue

I attended https://ep2012.europython.eu/conference/talks/full-text-search-for-trac-with-apache-solr, where Alex mentioned that Solr uses OR by default. I found that rather surprising, because I have had awful experiences with bad search engines that use OR by default (or even support only OR).

But he explained that OR is not as bad as it sounds and may even be useful, because (in Solr) if you query for A OR B, results that contain both A and B still score higher than results that contain only A or only B.
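
To see why OR ranking can still favour documents that match more terms, here is a minimal sketch of additive disjunctive scoring. It assumes a toy corpus and a deliberately simplified per-term score (raw term frequency). This is not Solr's actual formula. The only property it illustrates is that a document's OR score is the sum of the scores of the clauses it matches.

```python
from collections import Counter

# Hypothetical toy corpus: doc id -> text (illustrative data only).
DOCS = {
    "d1": "apple banana",
    "d2": "apple cherry",
    "d3": "banana cherry",
    "d4": "cherry date",
}

def term_score(term, text):
    # Simplified per-term score: term frequency in the document.
    # Real engines use TF-IDF or BM25 instead, which are also summed across clauses.
    return Counter(text.split())[term]

def or_search(terms, docs):
    """Return (doc_id, score) pairs for docs matching ANY term, best first.

    A document's score is the sum of the scores of the clauses it matches,
    so documents containing more of the query terms tend to rank higher.
    """
    results = []
    for doc_id, text in docs.items():
        score = sum(term_score(t, text) for t in terms)
        if score > 0:  # OR semantics: matching at least one term is enough
            results.append((doc_id, score))
    # Highest score first; ties broken by doc id for a stable order.
    results.sort(key=lambda pair: (-pair[1], pair[0]))
    return results

print(or_search(["apple", "banana"], DOCS))  # [('d1', 2), ('d2', 1), ('d3', 1)]
```

With raw term frequency, a document that repeats only A many times could still outrank one that contains both A and B once. That is why the sketch only claims that documents matching both terms *tend* to rank higher.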

I just wanted to share (and record) the idea; maybe it is interesting or useful.

Comments (2)

  1. Matt Chaput repo owner

    Added ArrayUnionMatcher as optimization for OR queries. Changed Query.matcher() signature.

    The new OR matcher is several times faster (the speedup grows with the number of sub-clauses), but it does not allow access to the "current document". The new context argument lets the Or query know when it can use the optimized matcher.

    While instantiating the matcher tree, I needed a way to tell the query whether access to the current document was needed. Rather than add another keyword argument to Query.matcher(), I replaced the weighting argument with a "context" argument. It subsumes the weighting and the new "needs_current" functionality, and it also replaces the "requires_matcher" signal that categorizers sent to collectors.

    See issue #271.
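
The comment above contrasts a general OR matcher, which exposes a "current document", with a faster array-based union that does not. Below is a minimal plain-Python sketch of that general idea. It is not Whoosh's actual ArrayUnionMatcher or Query.matcher() code. `heap_union`, `array_union`, `SearchContextSketch` and `or_matcher` are all hypothetical names. The heap-based union walks documents in order, so a current document exists at every step, but it pays a heap operation per posting. The array-based union just adds each posting's score into a slot indexed by document number. For simplicity the sketch allocates one array for the whole index; a real implementation would more likely work in fixed-size windows of document numbers to bound memory.

```python
import heapq
from dataclasses import dataclass

def heap_union(postings_lists):
    """Merge sorted (docnum, score) postings one document at a time.

    Documents come out in docnum order, so the caller can inspect the
    "current document" at every step, at the cost of heap work per posting.
    """
    current, total = None, 0.0
    for docnum, score in heapq.merge(*postings_lists):
        if docnum != current:
            if current is not None:
                yield current, total
            current, total = docnum, 0.0
        total += score
    if current is not None:
        yield current, total

def array_union(postings_lists, doc_count):
    """Accumulate scores into a flat array indexed by document number.

    Each posting costs one array addition instead of a heap operation,
    but there is no meaningful "current document" while filling the array.
    """
    scores = [0.0] * doc_count
    matched = [False] * doc_count
    for postings in postings_lists:
        for docnum, score in postings:
            scores[docnum] += score
            matched[docnum] = True
    return [(d, scores[d]) for d in range(doc_count) if matched[d]]

@dataclass
class SearchContextSketch:
    """Hypothetical context object bundling the weighting with a needs_current flag."""
    weighting: object = None  # placeholder for the scoring model
    needs_current: bool = False

def or_matcher(postings_lists, doc_count, context):
    # Use the faster array-based union only when nothing downstream
    # needs access to the current document.
    if context.needs_current:
        return list(heap_union(postings_lists))
    return array_union(postings_lists, doc_count)

a = [(0, 1.0), (2, 1.0)]
b = [(2, 0.5), (3, 0.5)]
fast = or_matcher([a, b], 4, SearchContextSketch())
slow = or_matcher([a, b], 4, SearchContextSketch(needs_current=True))
print(fast == slow, fast)  # True [(0, 1.0), (2, 1.5), (3, 0.5)]
```

Both paths return the same scored documents. The only difference is whether per-document access is possible along the way. Passing one context object, rather than adding keyword arguments, leaves room for more such flags later.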
