# Whoosh

Whoosh is a fast, featureful full-text indexing and searching library implemented in pure Python. Programmers can use it to easily add search functionality to their applications and websites. Every part of how Whoosh works can be extended or replaced to meet your needs exactly.

Some of Whoosh's features include:

• Pythonic API.
• Pure-Python. No compilation or binary packages needed, no mysterious crashes.
• Fielded indexing and search.
• Fast indexing and retrieval -- faster than any other pure-Python, scoring, full-text search solution I know of.
• Pluggable scoring algorithm (including BM25F), text analysis, storage, posting format, etc.
• Powerful query language.
• Pure Python spell-checker (as far as I know, the only one).
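
To make "fielded indexing" and "scoring" from the list above a bit more concrete, here is a toy sketch in plain Python. It is not Whoosh's code or API: the names `TinyIndex`, `add` and `search` are made up for illustration, the tokenizer is a bare regular expression, and the scoring is ordinary per-field BM25 (with the commonly used defaults `k1=1.2`, `b=0.75`) rather than BM25F.

```python
import math
import re
from collections import Counter, defaultdict


class TinyIndex:
    """Toy fielded inverted index with BM25 scoring (illustration only)."""

    def __init__(self, k1=1.2, b=0.75):
        self.k1, self.b = k1, b
        # postings[field][term] -> {doc_id: term frequency in that field}
        self.postings = defaultdict(lambda: defaultdict(dict))
        # lengths[field][doc_id] -> number of tokens in that field
        self.lengths = defaultdict(dict)

    @staticmethod
    def tokenize(text):
        # Lowercase and split on anything that is not a word character.
        return re.findall(r"\w+", text.lower())

    def add(self, doc_id, **fields):
        # Each keyword argument is indexed as a separate field.
        for field, text in fields.items():
            tokens = self.tokenize(text)
            self.lengths[field][doc_id] = len(tokens)
            for term, tf in Counter(tokens).items():
                self.postings[field][term][doc_id] = tf

    def search(self, field, query):
        # Return (doc_id, score) pairs for one field, best match first.
        lengths = self.lengths.get(field, {})
        if not lengths:
            return []
        n_docs = len(lengths)
        avg_len = sum(lengths.values()) / n_docs
        scores = defaultdict(float)
        for term in self.tokenize(query):
            docs = self.postings[field].get(term, {})
            # Rarer terms get a higher inverse document frequency.
            idf = math.log(1 + (n_docs - len(docs) + 0.5) / (len(docs) + 0.5))
            for doc_id, tf in docs.items():
                # Longer-than-average fields are penalised via b.
                norm = 1 - self.b + self.b * lengths[doc_id] / avg_len
                scores[doc_id] += idf * tf * (self.k1 + 1) / (tf + self.k1 * norm)
        return sorted(scores.items(), key=lambda item: item[1], reverse=True)


index = TinyIndex()
index.add("a", title="Whoosh search", body="pure Python full-text search library")
index.add("b", title="Cooking", body="how to search for good recipes")
index.add("c", title="Python tips", body="notes about Python and more Python")

title_hits = index.search("title", "python")  # only "c" has the term in its title
body_hits = index.search("body", "python")    # "c" and "a" match in the body
```

Searching the same term in different fields gives different results, which is the point of fielded search. A real library adds a good deal on top of this sketch (persistent storage, text analysis, a query language), so treat it only as a picture of the underlying idea.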

Whoosh might be useful in the following circumstances:

• Anywhere a pure-Python solution is desirable to avoid having to build/compile native libraries (or force users to build/compile them).
• As a research platform (at least for programmers who find Python easier to read and work with than Java ;)
• When an easy-to-use Pythonic interface is more important to you than raw speed.

Whoosh was created and is maintained by Matt Chaput. It was originally created for use in the online help system of Side Effects Software's 3D animation software Houdini. Side Effects Software Inc. graciously agreed to open-source the code.

## Installing Whoosh

If you have setuptools or pip installed, you can use easy_install or pip to download and install Whoosh automatically:

$ easy_install Whoosh

or

$ pip install Whoosh


## Getting the source

You can check out the latest version of the source code using Mercurial:

hg clone http://bitbucket.org/mchaput/whoosh

