Currently, this contains probability.py, a module for building FreqDists on top of Redis. Also included is a copy of the Python redis.py interface. It is based on the 0.9.9 release of NLTK and the 0.096 release of Redis. Because it relies on the internal implementation of FreqDist, and because Redis is still very much in beta, I can't guarantee it will work with future versions of NLTK or Redis.
RedisFreqDist works just like NLTK's FreqDist, but stores samples and frequency counts as Redis keys and values. That means samples must be strings. You'll also need a running redis-server for RedisFreqDist to work. Below is a simple example function for creating a RedisFreqDist and counting samples.
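    def make_freq_dist(samples, host='localhost', port=6379, db=0):
        # connect to the running redis-server
        freqs = RedisFreqDist(host=host, port=port, db=db)
        # each sample must be a string, since it is stored as a key
        for sample in samples:
            freqs.inc(sample)
        return freqs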
All of the other FreqDist functions are supported, so you can get a list of all the samples and look up the frequency count of each sample. For more info, see my article Building a NLTK FreqDist on Redis.
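To make the key/value idea concrete, here is a minimal sketch that runs without a redis-server. It is not the code in probability.py. InMemoryStore and KeyValueFreqDist are hypothetical names invented for this illustration, and the store only imitates the incr, get and keys operations of a key/value server. The method names inc, samples and N are borrowed from NLTK's FreqDist so the usage looks familiar.

```python
class InMemoryStore:
    """Hypothetical stand-in for a Redis client (assumption, not redis.py)."""

    def __init__(self):
        self._data = {}

    def incr(self, key, amount=1):
        # add amount to the stored counter, creating it at 0 if missing
        self._data[key] = self._data.get(key, 0) + amount
        return self._data[key]

    def get(self, key):
        # None for a missing key, like a key/value server would report
        return self._data.get(key)

    def keys(self):
        return list(self._data)


class KeyValueFreqDist:
    """Toy frequency distribution: each sample is a key, its count the value."""

    def __init__(self, store):
        self._store = store

    def inc(self, sample, count=1):
        # samples become keys, so only strings are accepted
        if not isinstance(sample, str):
            raise TypeError('samples must be strings')
        self._store.incr(sample, count)

    def __getitem__(self, sample):
        value = self._store.get(sample)
        return int(value) if value is not None else 0

    def samples(self):
        return self._store.keys()

    def N(self):
        # total number of sample outcomes counted so far
        return sum(self[sample] for sample in self.samples())


freqs = KeyValueFreqDist(InMemoryStore())
for word in 'the cat saw the dog'.split():
    freqs.inc(word)
```

Because every count lives in the store rather than in the Python object, a second KeyValueFreqDist built on the same store sees the same counts. That is the property that makes a Redis-backed FreqDist useful across processes.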
probability.py also includes ConditionalRedisFreqDist, which works just like NLTK's ConditionalFreqDist. There's also RedisConditionalFreqDist for when you have a large number of conditions, but it's not very well tested.