Issue #320 invalid

Parser returns _NullQuery on parse(u"you")

Anonymous created an issue

The parser returns _NullQuery when I pass in u"you" (example below), but works fine when I pass in u"document".

test.py

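import os
from whoosh.index import create_in
from whoosh.fields import *

schema = Schema(title=TEXT(stored=True), path=ID(stored=True), content=TEXT)
# create_in() expects the index directory to exist already
if not os.path.exists("C:/dev/whoosh/test"):
    os.makedirs("C:/dev/whoosh/test")
ix = create_in("C:/dev/whoosh/test", schema)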
writer = ix.writer()

writer.add_document(title=u"First document", path=u"c:/where/art/though", content=u"you document")
writer.add_document(title=u"second document", path=u"c:/yes/though", content=u"second doc")

writer.commit()

query.py

from whoosh.qparser import QueryParser
from whoosh.index import open_dir
ix = open_dir("C:/dev/whoosh/test")

with ix.searcher() as searcher:
    print(ix.schema)
    parser = QueryParser("content", ix.schema)
    query = parser.parse(u"you")
    print(query)  # prints <_NullQuery> for u"you"
    results = searcher.search(query)
    print(results)

Comments (2)

  1. Matt Chaput repo owner
    • edited description
    • changed component to Indexing

    The default analyzer for TEXT fields (whoosh.analysis.StandardAnalyzer) includes a stop word filter (whoosh.analysis.StopFilter), and "you" is in the default list of stop words (whoosh.analysis.STOP_WORDS). The word is therefore never indexed, so no query can find it. The query parser runs your query text through the same analyzer, "you" is removed there too, and with no terms left the parser replaces the search with a NullQuery.

    You can use a SimpleAnalyzer instead (which doesn't include a stop word filter), or pass your own list of stop words to StandardAnalyzer, e.g.:

    from whoosh import analysis, fields

    # Custom stop list; "you" is deliberately left out
    stops = ["the", "a", "an", "it", "is", "was", "be"]
    ana = analysis.StandardAnalyzer(stoplist=stops)
    schema = fields.Schema(content=fields.TEXT(analyzer=ana))
    