Benchmarks

Whoosh is quite fast!

Because Whoosh is written in pure Python, a common suspicion is that it must be significantly slower than search solutions that are not pure Python (for example, search code written in C, C++ or Java with a Python wrapper).

The benchmarks below are not very scientific and do not cover every use case, but they do show that one should be careful with such suspicions.

The benchmark code is available here (results produced with different versions of the benchmark code are NOT comparable): https://bitbucket.org/thomaswaldmann/python-search-benchmark/

If you have more test code or adaptations for other Python search libraries, please contribute!

How the benchmark works

N documents are generated. Each document contains a search word (a random word, 10 characters long) plus 10 extra fields with 100 characters of random text each (just to increase the size of the document).
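
The document layout described above can be sketched as follows. This is only an illustration, not the actual benchmark code: the field names ("word", "extra0" to "extra9"), the helper names, and the choice of lowercase ASCII letters as the random alphabet are assumptions. The constants are taken from the parameters listed with the results further down.

```python
import random
import string

DOC_COUNT = 3000
WORD_LEN = 10
EXTRA_FIELD_COUNT = 10
EXTRA_FIELD_LEN = 100


def random_text(rng, length):
    """Return `length` random lowercase ASCII letters (assumed alphabet)."""
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def generate_documents(count=DOC_COUNT, seed=0):
    """Build `count` documents: one search word plus the padding fields."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    docs = []
    for _ in range(count):
        # "word" and "extraN" are hypothetical field names
        doc = {"word": random_text(rng, WORD_LEN)}
        for i in range(EXTRA_FIELD_COUNT):
            doc["extra%d" % i] = random_text(rng, EXTRA_FIELD_LEN)
        docs.append(doc)
    return docs


docs = generate_documents()
```

With these parameters each document carries 10 + 10 * 100 = 1010 characters of text, so 3000 documents amount to roughly 3 MB of raw input.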

For indexing, all fields are indexed and stored.

For searching, all words are searched in random order and all stored fields are retrieved.
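
The index-then-search procedure can be sketched as a small timing harness. This is a rough sketch under assumptions, not the benchmark's own code: `run_benchmark`, `index_docs` and `search_word` are hypothetical names, and the backend shown is a plain in-memory dict used as a stand-in, not Whoosh or xappy.

```python
import random
import time


def run_benchmark(docs, index_docs, search_word):
    """Time indexing all docs, then searching every word in random order."""
    start = time.perf_counter()
    index_docs(docs)
    index_seconds = time.perf_counter() - start

    words = [doc["word"] for doc in docs]
    random.shuffle(words)  # all words are searched in random order

    hits = 0
    start = time.perf_counter()
    for word in words:
        hits += len(search_word(word))  # each hit carries its stored fields
    search_seconds = time.perf_counter() - start

    return {
        "hits": hits,
        "index_rate": len(docs) / max(index_seconds, 1e-9),
        "search_rate": len(words) / max(search_seconds, 1e-9),
    }


# Stand-in backend (NOT Whoosh or xappy): maps each word to its stored documents.
store = {}


def index_docs(docs):
    for doc in docs:
        store.setdefault(doc["word"], []).append(doc)


def search_word(word):
    return store.get(word, [])


sample_docs = [{"word": "word%06d" % i, "extra0": "x" * 100} for i in range(50)]
result = run_benchmark(sample_docs, index_docs, search_word)
print("Indexing: %.1f docs/s" % result["index_rate"])
print("Searching: %.1f searches/s" % result["search_rate"])
```

A real backend would plug in by supplying its own pair of indexing and searching callables, so the timing logic stays the same for every library.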

For Whoosh, we used the multiprocessing writer to build the index. This explains why it indexes faster than xappy: it used all 4 cores, not just 1.

For searching, xappy/xapian is faster (no parallel processing was used there).

Still, the speed difference between xappy and Whoosh may not be as big as you expected.

Index Size about 12MB

# Phenom II X4 840, 8GB RAM, HDD
# Python 2.7.2+ (default, Oct  4 2011, 20:06:09) 
# [GCC 4.6.1] on linux2

Params:
DOC_COUNT: 3000 WORD_LEN: 10
EXTRA_FIELD_COUNT: 10 EXTRA_FIELD_LEN: 100

Benchmarking: xappy 0.5 / xapian 1.2.5
Indexing takes 2.8s (1068.9/s)
Searching takes 0.5s (6635.8/s)

Benchmarking: whoosh 2.3.2
Indexing takes 0.8s (3575.6/s)
Searching takes 0.8s (3714.8/s)
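
To put the numbers above in relation, the rates can be compared directly. The short calculation below uses only the per-second figures reported in this run; the variable names are illustrative.

```python
# Rates (operations per second) copied from the results above.
rates = {
    "xappy 0.5 / xapian 1.2.5": {"index": 1068.9, "search": 6635.8},
    "whoosh 2.3.2": {"index": 3575.6, "search": 3714.8},
}

xappy = rates["xappy 0.5 / xapian 1.2.5"]
whoosh = rates["whoosh 2.3.2"]

# Whoosh indexing used 4 cores, xappy used 1.
index_ratio = whoosh["index"] / xappy["index"]
# Searching was single-process for both.
search_ratio = xappy["search"] / whoosh["search"]

print("Whoosh indexes %.1fx as fast as xappy" % index_ratio)
print("xappy searches %.1fx as fast as Whoosh" % search_ratio)
```

In this run, Whoosh indexed about 3.3 times as fast as xappy (with 4 cores against 1), while xappy/xapian searched about 1.8 times as fast as Whoosh.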
