multitoken_query - strange defaults lead to strange results
We (at MoinMoin) stumbled upon strange whoosh behaviour that was easy to explain once we found multitoken_query. :)
I see you added this in whoosh 1.5 and defaulted it to "first" everywhere for compatibility reasons. While I understand wanting to stay compatible, this looks more like a bug to me, one that should be fixed in the default rather than kept for compatibility.
E.g. if one has a TEXT field and indexes "foo bar baz", it gets tokenized into "foo", "bar", "baz", and those terms are put into the index.
If one then queries for "foo bar", the query is tokenized into "foo", "bar". Because of the multitoken_query="first" default, "bar" is thrown away and only "foo" is searched for, embarrassing the user with strange search results (documents that contain "foo" but not "bar" show up as hits).
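To make the difference between the modes concrete, here is a minimal, self-contained simulation. It does not use whoosh at all: `tokenize`, `matches` and `search` are hypothetical stand-ins (a lowercase, split-on-word-characters analyzer and a toy in-memory index), written only to mirror the "first", "and", "or" and "phrase" behaviours described above.

```python
import re


def tokenize(text):
    """Hypothetical stand-in for an analyzer: lowercase, keep runs of word chars."""
    return re.findall(r"\w+", text.lower())


# Toy "index": document id -> token list (positions kept so phrases work).
DOCS = {
    1: "foo bar baz",
    2: "foo qux",
    3: "bar only",
}
INDEX = {doc_id: tokenize(text) for doc_id, text in DOCS.items()}


def matches(tokens, query_tokens, mode):
    """Decide whether one document's tokens match the query under a given mode."""
    if mode == "first":
        # Only the first query token survives; the rest are silently dropped.
        return query_tokens[0] in tokens
    if mode == "and":
        return all(t in tokens for t in query_tokens)
    if mode == "or":
        return any(t in tokens for t in query_tokens)
    if mode == "phrase":
        # Query tokens must appear consecutively, in order.
        n = len(query_tokens)
        return any(tokens[i:i + n] == query_tokens
                   for i in range(len(tokens) - n + 1))
    raise ValueError(f"unknown mode: {mode!r}")


def search(query_text, mode):
    """Return the ids of all documents matching query_text under mode."""
    query_tokens = tokenize(query_text)
    if not query_tokens:
        return set()
    return {doc_id for doc_id, tokens in INDEX.items()
            if matches(tokens, query_tokens, mode)}


for mode in ("first", "and", "or", "phrase"):
    print(mode, sorted(search("foo bar", mode)))
# first [1, 2]
# and [1]
# or [1, 2, 3]
# phrase [1]
```

With "first", document 2 ("foo qux") is returned for the query "foo bar" even though it has no "bar" at all, which is exactly the surprising result described above.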
I only discovered this by reading the source. Afterwards I also found some docs about it, but IIRC I didn't see it mentioned in the tutorials or anywhere else except the FieldType docs (which one usually discovers rather late).
So, how could one improve this?
Default to "and": multiple query terms are also ANDed by default, and "or" is usually stupid/annoying. ("phrase" might also make sense, but maybe not as a default.)
If there is only one token, AND(token) behaves the same as "first", so maybe this is good enough for compatibility? And if a user gives more than one token, they probably expect whoosh to make use of it. :)
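The compatibility argument can be checked with a small sketch. Again this is not whoosh code: `build_query`, `evaluate` and `same_results` are hypothetical helpers using a nested-tuple query representation, chosen only to show that an AND over a single term matches exactly the same documents as "first", while the two modes diverge as soon as there are two or more tokens.

```python
def build_query(tokens, mode):
    """Hypothetical query builder: ("term", t) or ("and", [subqueries])."""
    if not tokens:
        raise ValueError("need at least one token")
    if mode == "first":
        # Keep only the first token, as multitoken_query="first" does.
        return ("term", tokens[0])
    if mode == "and":
        return ("and", [("term", t) for t in tokens])
    raise ValueError(f"unknown mode: {mode!r}")


def evaluate(query, doc_tokens):
    """Return True if the document's token list satisfies the query."""
    kind, arg = query
    if kind == "term":
        return arg in doc_tokens
    if kind == "and":
        return all(evaluate(sub, doc_tokens) for sub in arg)
    raise ValueError(f"unknown query kind: {kind!r}")


# A few example documents, already tokenized.
DOCS = [["foo", "bar", "baz"], ["foo", "qux"], ["bar"], []]


def same_results(tokens):
    """True if "first" and "and" agree on every example document."""
    return all(
        evaluate(build_query(tokens, "first"), doc)
        == evaluate(build_query(tokens, "and"), doc)
        for doc in DOCS
    )


print(same_results(["foo"]))         # True: AND(foo) is just foo
print(same_results(["foo", "bar"]))  # False: "first" also matches ["foo", "qux"]
```

So switching the default to "and" would leave single-token queries unchanged and only alter the multi-token case, which is the case this report is about.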
In any case (whether or not you change the default), document it in a more visible place: the tutorial and other "prose" parts of the docs, not just the FieldType docs.