Overview

python-tools

This package is a random collection of tools that might be useful to python programmers. All of the code in this repo is released under a BSD license (see the top-level LICENSE file).

Some of the code in this repository was released courtesy of S7 Labs (http://s7labs.com) and/or Songza Media, Inc (http://songza.com). Thank you to S7 and Songza for allowing this to happen.

mongoengine

Tools for use with the excellent MongoEngine package (http://mongoengine.org/)

extract-model.py

Reverse engineer a mongoengine Document subclass from a MongoDB collection.

This can be useful when retrofitting mongoengine to an existing MongoDB database. It examines a collection and prints a skeleton of a Document subclass which includes field and index declarations, as well as stubs of some standard methods, such as __unicode__(), which you should flesh out.

The index declarations should be correct, since there is sufficient information in the database to figure out what they should be. If it gets them wrong, that's a bug that needs to get fixed.
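To illustrate why the indexes can be recovered mechanically, here is a minimal sketch that converts index metadata, in the shape returned by pymongo's index_information(), into entries suitable for a mongoengine meta 'indexes' list. The helper name to_meta_indexes and the sample data are hypothetical, and this is not necessarily how extract-model.py does it.

```python
def to_meta_indexes(index_info):
    """Convert pymongo-style index metadata to mongoengine 'indexes' entries.

    Only plain ascending/descending keys are handled in this sketch;
    geo, text and hashed indexes would need their own prefixes.
    """
    specs = []
    for name, info in sorted(index_info.items()):
        if name == "_id_":
            continue  # the default _id index needs no declaration
        # mongoengine marks a descending key with a leading '-'.
        fields = [("-" if direction == -1 else "") + key
                  for key, direction in info["key"]]
        spec = {"fields": fields}
        if info.get("unique"):
            spec["unique"] = True
        if info.get("sparse"):
            spec["sparse"] = True
        specs.append(spec)
    return specs


# Hypothetical sample, shaped like the output of index_information().
sample = {
    "_id_": {"key": [("_id", 1)]},
    "username_1": {"key": [("username", 1)], "unique": True},
    "created_-1_owner_1": {"key": [("created", -1), ("owner", 1)]},
}
meta_indexes = to_meta_indexes(sample)
```

The resulting list could be pasted into the meta dictionary of the generated Document subclass.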

The fields are more of a guess. The best we can do is examine a bunch of documents, observe what fields exist, and infer how each should be declared from the data found. Because MongoDB is schema-free, we can do no better than guess. Don't accept the output blindly; consider it a starting point to save yourself a lot of typing, and examine it carefully to ensure it makes sense for your data. If you have suggestions for better heuristics, please let me know. The current logic is quite simplistic.
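A minimal sketch of this kind of guessing, assuming the documents have already been fetched as plain dictionaries. The type mapping and the guess_fields name are hypothetical; the real script's heuristics may differ.

```python
import datetime
from collections import defaultdict

# Hypothetical mapping from Python value types to mongoengine field names.
FIELD_FOR_TYPE = {
    bool: "BooleanField",
    int: "IntField",
    float: "FloatField",
    str: "StringField",
    datetime.datetime: "DateTimeField",
    list: "ListField",
    dict: "DictField",
}


def guess_fields(documents):
    """Guess a field declaration for every key seen in the documents."""
    seen = defaultdict(set)
    for doc in documents:
        for key, value in doc.items():
            types = seen[key]  # records the key even if the value is None
            if value is not None:
                types.add(type(value))
    guesses = {}
    for key, types in seen.items():
        if len(types) == 1:
            guesses[key] = FIELD_FOR_TYPE.get(next(iter(types)), "DynamicField")
        else:
            # No usable values, or conflicting types: fall back to a catch-all.
            guesses[key] = "DynamicField"
    return guesses


docs = [
    {"name": "a", "age": 3},
    {"name": "b", "age": 4.5, "tags": []},
]
fields = guess_fields(docs)
```

Here age shows up as both an int and a float, so the sketch falls back to DynamicField, which is exactly the sort of result you would want to review by hand.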

logs

Tools for analyzing various sorts of log files (mostly related to web servers).

stackprint.py

Finds and prints python stack dumps in a text file, using some heuristics. The text file is ostensibly a web server log file, but could really be anything. We use rsyslog for logging, which appears to escape embedded newlines as #012. We use that as a hint that a line contains a stack dump.

In our case, the stack dumps are generated by some django middleware which catches all uncaught exceptions and formats them with traceback.format_tb().
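For reference, here is a rough sketch of how such a single-line log entry might come about. It formats the current exception with traceback.format_tb() and then imitates the rsyslog escaping by replacing newlines with #012. The function name is hypothetical, and the wiring into django middleware is left out.

```python
import sys
import traceback


def format_exception_for_log():
    """Return the current exception and its traceback as one log line."""
    exc_type, exc_value, tb = sys.exc_info()
    text = "".join(traceback.format_tb(tb))
    # Imitate what we see in rsyslog output: newline becomes '#012'.
    escaped = text.replace("\n", "#012")
    return "%s: %s %s" % (exc_type.__name__, exc_value, escaped)


def broken():
    return 1 / 0


try:
    broken()
except ZeroDivisionError:
    line = format_exception_for_log()
```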

Every stack has a signature, which is made up of the file and function names for each frame. This is not a perfect way to identify stack dumps, but it's convenient and good enough for our purposes. All of the stacks found in the input are categorized by signature. A printout is produced summarizing all the stacks found, sorted by frequency.
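The idea can be sketched as follows. This is an illustration under assumptions, not the code in stackprint.py: the regular expression, the function names, and the sample log lines are all made up.

```python
import re
from collections import Counter

# Matches the 'File "...", line N, in func' part of a formatted traceback frame.
FRAME_RE = re.compile(r'File "([^"]+)", line \d+, in (\S+)')


def stack_signature(line):
    """Return a tuple of (file, function) pairs, or None if no stack is found."""
    if "#012" not in line:
        return None  # no escaped newlines, so probably not a stack dump
    frames = []
    # Split on the escaped newlines first so '#012' never sticks to a name.
    for part in line.split("#012"):
        match = FRAME_RE.search(part)
        if match:
            frames.append(match.groups())
    return tuple(frames) or None


def summarize(lines):
    """Count stacks by signature, most frequent first."""
    counts = Counter()
    for line in lines:
        signature = stack_signature(line)
        if signature:
            counts[signature] += 1
    return counts.most_common()


# Hypothetical log lines for demonstration.
log = [
    'GET /a 200',
    'ERROR   File "views.py", line 10, in index#012    foo()#012'
    '  File "util.py", line 3, in foo#012    1 / 0#012',
    'ERROR   File "views.py", line 22, in detail#012    bar()#012',
    'ERROR   File "views.py", line 10, in index#012    foo()#012'
    '  File "util.py", line 3, in foo#012    1 / 0#012',
]
summary = summarize(log)
```

Line numbers are deliberately left out of the signature, so the same call path still groups together after unrelated edits shift the code around.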

In theory, you should have no uncaught exceptions from your web server. In practice, shit happens. This is a convenient tool to identify which shit happens the most, so you can attack the worst problems first.

django

This is a collection of small utilities (and supporting middleware) which have proven useful when writing django apps.

Parsing query parameters usually involves the same chores: verifying that required parameters are present, supplying defaults for optional ones, and checking that parameters which are supposed to be integers, booleans, or whatever really are strings representing those types. The family of _param() functions simplifies all this.

For example, imagine you have a view which requires a query parameter, username. In your view, you can simply write:

username = _param(request, 'username')

and everything is taken care of for you. If the username parameter is missing, BadRequest is raised, and the middleware arranges to return a 400 status code with an appropriate message in the body:

missing required parameter (username)

Likewise,

limit = _int_param(request, 'limit', 10)

will return the value of the limit parameter converted to an int. If the parameter is missing, the default value of 10 will be returned. If the request included the query parameter "limit=x", _int_param() would detect that "x" is not a valid integer, and the response would be a 400 with a body of

parameter 'limit' isn't a valid integer (u'x')