Matt Chaput / whoosh
Issue #195 resolved

key_terms() is slow?

Thomas Waldmann
created an issue

I read this: http://packages.python.org/Whoosh/keywords.html

Then I followed this suggestion from it: "You can extract key terms for the top N results from a query and suggest them to the user as additional/alternate query terms to try."
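For context, here is a rough pure-Python sketch of what key-term extraction computes. It assumes that key_terms() pools term counts from the top `docs` hits and scores each candidate term with a Bo1 expansion model, which I believe is Whoosh's default. I have not checked this against the Whoosh source, and the function names and toy data are hypothetical. What it illustrates is that every candidate term needs its collection-wide frequency, so the work grows with the vocabulary of the top documents, not with the number of hits.

```python
import math
from collections import Counter


def bo1_score(weight_in_top, weight_in_collection, num_docs):
    """Bo1 (Bose-Einstein) expansion weight for one candidate term.

    Assumption: this is the textbook Bo1 formula, not Whoosh's actual code.
    """
    f = weight_in_collection / num_docs
    return weight_in_top * math.log2((1.0 + f) / f) + math.log2(1.0 + f)


def rank_key_terms(top_docs, collection_freq, num_docs, numterms=5):
    """Hypothetical helper: rank terms from the top documents by Bo1 weight.

    top_docs: list of token lists, one per top-ranked document.
    collection_freq: term -> frequency in the whole index (must be >= 1
    for every term that appears in top_docs).
    """
    # Pool term counts over the top N documents.
    top_counts = Counter()
    for tokens in top_docs:
        top_counts.update(tokens)
    # Score every candidate; each one needs a collection-wide frequency,
    # which on a large on-disk index means one lookup per distinct term.
    scored = [
        (term, bo1_score(count, collection_freq[term], num_docs))
        for term, count in top_counts.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:numterms]


top = [["whoosh", "index", "search"], ["whoosh", "search", "wiki"]]
cf = {"whoosh": 3, "index": 500, "search": 800, "wiki": 40}
print([term for term, _ in rank_key_terms(top, cf, num_docs=1000, numterms=2)])
# → ['whoosh', 'wiki']
```

Rare terms that occur repeatedly in the top hits ("whoosh") score highest, while terms common across the whole index ("search") are pushed down.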

Code:
{{{
flaskg.clock.start('search')
results = searcher.search(q, limit=100)
flaskg.clock.stop('search')

flaskg.clock.start('search suggestions')
name_suggestions = u', '.join([word for word, score in results.key_terms(NAME, docs=20, numterms=10)])
content_suggestions = u', '.join([word for word, score in results.key_terms(CONTENT, docs=20, numterms=10)])
flaskg.clock.stop('search suggestions')
}}}
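flaskg.clock is MoinMoin's own timer. To reproduce the measurement outside MoinMoin, and to time the NAME and CONTENT key_terms() calls separately, a small stdlib-only context manager is enough. This is a hypothetical helper (the name `timed` is made up), sketched with time.perf_counter:

```python
import logging
import time
from contextlib import contextmanager

log = logging.getLogger("timing")


@contextmanager
def timed(name, results=None):
    """Hypothetical stand-in for flaskg.clock: time the with-block in ms.

    If a dict is passed as results, the elapsed time is also stored there
    under name, so separate calls can be compared afterwards.
    """
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        if results is not None:
            results[name] = elapsed_ms
        log.info("timer %s: %.2fms", name, elapsed_ms)


timings = {}
with timed("busy loop", timings):
    sum(range(100000))  # replace with e.g. one key_terms() call
print(sorted(timings))
# → ['busy loop']
```

Wrapping each key_terms() call in its own block would show whether one field dominates the 15 seconds.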

Output:
{{{
2011-09-02 16:17:20,167 INFO MoinMoin.util.clock:40 timer search(0): 18.41ms
2011-09-02 16:17:36,029 INFO MoinMoin.util.clock:40 timer search suggestions(0): 15862.10ms

2011-09-02 16:18:52,512 INFO MoinMoin.util.clock:40 timer search(0): 1.93ms
2011-09-02 16:19:07,739 INFO MoinMoin.util.clock:40 timer search suggestions(0): 15227.01ms
}}}

So it looks like .key_terms() is a good way to kill performance: in these two runs, the two key_terms() calls together took roughly 860x and 7900x as long as the search itself.
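The slowdown factors can be recomputed from the MoinMoin log lines. A small sketch; the regular expression is my assumption about the clock line format, based only on the lines shown in this issue:

```python
import re

# Assumed format, taken from the lines above: "... timer search(0): 18.41ms"
TIMER_RE = re.compile(r"timer (?P<name>.+?)\(\d+\): (?P<ms>[\d.]+)ms")


def slowdown_ratios(log_text):
    """Pair each 'search' timing with the next 'search suggestions' timing
    and return suggestions_ms / search_ms for each pair."""
    ratios = []
    search_ms = None
    for match in TIMER_RE.finditer(log_text):
        name, ms = match.group("name"), float(match.group("ms"))
        if name == "search":
            search_ms = ms
        elif name == "search suggestions" and search_ms is not None:
            ratios.append(ms / search_ms)
            search_ms = None
    return ratios


LOG = """\
2011-09-02 16:17:20,167 INFO MoinMoin.util.clock:40 timer search(0): 18.41ms
2011-09-02 16:17:36,029 INFO MoinMoin.util.clock:40 timer search suggestions(0): 15862.10ms
2011-09-02 16:18:52,512 INFO MoinMoin.util.clock:40 timer search(0): 1.93ms
2011-09-02 16:19:07,739 INFO MoinMoin.util.clock:40 timer search suggestions(0): 15227.01ms
"""
print([round(r) for r in slowdown_ratios(LOG)])
# → [862, 7890]
```

Because the search time varies much more between runs (18.41ms vs 1.93ms) than the suggestion time does, the ratio swings by an order of magnitude, while the absolute cost of key_terms() stays around 15 seconds.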

The index size is about 1GB.

Not sure if this is a bug. Am I doing something wrong? Can I tune it somehow?

Comments (5)

  1. Thomas Waldmann reporter

    Nope, that did not help:

    2011-09-02 17:28:51,120 INFO MoinMoin.util.clock:40 timer search(0): 2.89ms
    2011-09-02 17:30:28,525 INFO MoinMoin.util.clock:40 timer search suggestions(0): 97405.01ms
    
  2. Thomas Waldmann reporter

    I changed my Schema as shown below and rebuilt the index:

    -            NAME: TEXT(stored=True, multitoken_query="and", analyzer=item_name_analyzer(), field_boost=2.0),
    +            NAME: TEXT(stored=True, vector=Frequency, multitoken_query="and", analyzer=item_name_analyzer(), field_boost=2.0),
    -            CONTENT: TEXT(stored=True, multitoken_query="and"),
    +            CONTENT: TEXT(stored=True, vector=Frequency, multitoken_query="and"),
    
  3. Matt Chaput repo owner
    • changed status to open
    • changed component to Search

    Hi Thomas, sorry for the late reply; I was on vacation. I'll try to look at this soon. I never tested that feature at large scale, so hopefully I'm just doing something inefficient that can be fixed.
