CherryPy / sphinx / source / concepts / dispatching.rst

Dispatching

The resource is not the storage object. The resource is not a mechanism that the server uses to handle the storage object. The resource is a conceptual mapping -- the server receives the identifier (which identifies the mapping) and applies it to its current mapping implementation (usually a combination of collection-specific deep tree traversal and/or hash tables) to find the currently responsible handler implementation and the handler implementation then selects the appropriate action+response based on the request content. All of these implementation-specific issues are hidden behind the Web interface; their nature cannot be assumed by a client that only has access through the Web interface.

Roy Fielding

When you wish to serve a resource on the Web, you never actually serve the resource, because "resources" are concepts. What you serve are representations of a resource, and page handlers are what you use in CherryPy to do that. Page handlers are functions that you write; CherryPy calls one for each request and uses its response (a string of HTML, for example) as the representation.

For the user, a web application is just like a website with static files. The user types (or clicks) a URL, and gets to the desired webpage. A conventional webserver uses the URL to retrieve a static file from the filesystem. A web application server, on the other hand, not only serves the content from static files; it can also map the URL it receives into some object and call it. The result is then sent back to the user's browser, where it is rendered into a viewable page. The result is a dynamic web application; for each URL, a unique object can be called into action. The key to understand how to write a new web application is to understand how this mapping occurs.

CherryPy takes the output of the appropriate page handler function, binds it to :attr:`cherrypy.response.body <cherrypy._cprequest.Response.body>`, and sends it as the HTTP response entity body. Your page handler function (and almost any other part of CherryPy) can directly set :attr:`cherrypy.response.status <cherrypy._cprequest.Response.status>` and :attr:`cherrypy.response.headers <cherrypy._cprequest.Response.headers>` as desired.

Dispatchers

Before CherryPy can call your page handlers, it has to know 1) where they are, and 2) which one to call for a given 'identifier' (URI). In CherryPy, we use a Dispatcher object to:

  1. Understand the arrangement of handlers
  2. Find the appropriate page handler function
  3. Wrap your actual handler function in a :class:`PageHandler <cherrypy._cpdispatch.PageHandler>` object (see below)
  4. Set :attr:`cherrypy.request.handler <cherrypy._cprequest.Request.handler>` (to the :class:`PageHandler <cherrypy._cpdispatch.PageHandler>` wrapper)
  5. Collect configuration entries into :attr:`cherrypy.request.config <cherrypy._cprequest.Request.config>`
  6. Collect "virtual path" components

CherryPy has a default arrangement of handlers (see next), but also allows you to trade it for any arrangement you can think up and implement.

Default Dispatcher

By default, CherryPy uses a fairly straightforward mapping procedure. The root of the site is the :attr:`Application.root <cherrypy._cptree.Application.root>` object. When it receives a URL, it breaks it into its path components, and proceeds looking down into the site until it finds an object that is the 'best match' for that particular URL. For each path component it tries to find an object with the same name, starting from root, and going down for each component it finds, until it can't find a match. An example shows it better:

root = HelloWorld()
root.onepage = OnePage()
root.otherpage = OtherPage()

In the example above, the URL http://localhost/onepage will point at the first object and the URL http://localhost/otherpage will point at the second one. As usual, this search is done automatically. But it goes even further:

root.some = Page()
root.some.page = Page()

In this example, the URL http://localhost/some/page will be mapped to the root.some.page object. If this object is exposed (or alternatively, its index method is), it will be called for that URL.

In our HelloWorld example, adding the http://localhost/onepage/ mapping to OnePage().index could be done like this:

class OnePage(object):
    def index(self):
        return "one page!"
    index.exposed = True

class HelloWorld(object):
    onepage = OnePage()

    def index(self):
        return "hello world"
    index.exposed = True

cherrypy.quickstart(HelloWorld())

Normal methods

CherryPy can directly call methods on the mounted objects, if it receives a URL that is directly mapped to them. For example:

def foo(self):
    return 'Foo!'
foo.exposed = True

root.foo = foo

In the example, root.foo contains a function object, named foo. When CherryPy receives a request for the /foo URL, it will automatically call the foo() function. Note that it can be a plain function, or a method of any object; any callable will do it.

Index methods

The index method has a special role in CherryPy: it handles intermediate URI's that end in a slash; for example, the URI /orders/items/ might map to root.orders.items.index. The index method can take additional keyword arguments if the request includes querystring or POST params; see :ref:`kwargs`, next. However, unlike all other page handlers, it cannot take positional arguments (see :ref:`args`, below).

The default dispatcher will always try to find a method named index at the end of the branch traversal. In the example above, the URI "/onepage/" would result in the call: app.root.onepage.index(). Depending on the use of the :func:`trailing_slash Tool <cherrypy.lib.cptools.trailing_slash>`, that might be interrupted with an HTTPRedirect, but otherwise, both "/onepage" (no trailing slash) and "/onepage/" (trailing slash) will result in the same call.

Keyword Arguments

Any page handler that is called by CherryPy (index, or any other suitable method) can receive additional data from HTML or other forms using keyword arguments. For example, the following login form sends the username and the password as form arguments using the POST method:

<form action="doLogin" method="post">
    <p>Username</p>
    <input type="text" name="username" value=""
        size="15" maxlength="40"/>
    <p>Password</p>
    <input type="password" name="password" value=""
        size="10" maxlength="40"/>
    <p><input type="submit" value="Login"/></p>
    <p><input type="reset" value="Clear"/></p>
</form>

The following code can be used to handle this URL:

class Root:
    def doLogin(self, username=None, password=None):
        # check the username & password
        ...
    doLogin.exposed = True

Both arguments have to be declared as keyword arguments. The default value can be used either to provide a suitable default value for optional arguments, or to provide means for the application to detect if some values were missing from the request.

CherryPy supports both the GET and POST method for HTML forms. Arguments are passed the same way, regardless of the original method used by the browser to send data to the web server.

Positional Arguments

When a request is processed, the URI is split into its components, and each one is matched in order against the nodes in the tree. Any trailing components are "virtual path" components and are passed as positional arguments. For example, the URI "/branch/leaf/4" might result in the call: app.root.branch.leaf(4), or app.root.index(branch, leaf, 4) depending on how you have your handlers arranged.

Partial matches can happen when a URL contains components that do not map to the object tree. This can happen for a number of reasons. For example, it may be an error; the user just typed the wrong URL. But it also can mean that the URL contains extra arguments.

For example, assume that you have a blog-like application written in CherryPy that takes the year, month and day as part of the URL http://localhost/blog/2005/01/17. This URL can be handled by the following code:

class Root:
    def blog(self, year, month, day):
        ...
    blog.exposed = True

root = Root()

So the URL above will be mapped as a call to:

root.blog('2005', '01', '17')

In this case, there is a partial match up to the blog component. The rest of the URL can't be found in the mounted object tree. In this case, the blog() method will be called, and the positional parameters will receive the remaining path segments as arguments. The values are passed as strings; in the above mentioned example, the arguments would still need to be converted back into numbers, but the idea is correctly presented.

Default methods

If the default dispatcher is not able to locate a suitable page handler by walking down the tree, it has a last-ditch option: it starts walking back ''up'' the tree looking for default methods. Default methods work just like any other method with positional arguments, but are defined one level further down, in case you have multiple methods to expose. For example, we could have written the above "blog" example equivalently with a "default" method instead:

class Blog:
    def default(self, year, month, day):
        ...
    default.exposed = True

class Root: pass

root = Root()
root.blog = Blog()

So the URL http://localhost/blog/2005/01/17 will be mapped as a call to:

root.blog.default('2005', '01', '17')

You could achieve the same effect by defining a __call__ method in this case, but "default" just reads better. ;)

Special characters

You can use dots in a URI like /path/to/my.html, but Python method names don't allow dots. To work around this, the default dispatcher converts all dots in the URI to underscores before trying to find the page handler. In the example, therefore, you would name your page handler "def my_html". However, this means the page is also available at the URI /path/to/my_html. If you need to protect the resource (e.g. with authentication), you must protect both URLs.

Other Dispatchers

But Mr. Fielding mentions two kinds of "mapping implementations" above: trees and hash tables ('dicts' in Python). Some web developers claim trees are difficult to change as an application evolves, and prefer to use dicts (or a list of tuples) instead. Under these schemes, the mapping key is often a regular expression, and the value is the handler function. For example:

def root_index(name):
    return "Hello, %s!" % name

def branch_leaf(size):
    return str(int(size) + 3)

mappings = [
    (r'^/([^/]+)$', root_index),
    (r'^/branch/leaf/(\d+)$', branch_leaf),
    ]

CherryPy allows you to use a :class:`Dispatcher<cherrypy._cpdispatch.Dispatcher>` other than the default if you wish. By using another :class:`Dispatcher <cherrypy._cpdispatch.Dispatcher>` (or writing your own), you gain complete control over the arrangement and behavior of your page handlers (and config). To use another dispatcher, set the request.dispatch config entry to the dispatcher you like:

d = cherrypy.dispatch.RoutesDispatcher()
d.connect(name='hounslow', route='hounslow', controller=City('Hounslow'))
d.connect(name='surbiton', route='surbiton', controller=City('Surbiton'),
          action='index', conditions=dict(method=['GET']))
d.mapper.connect('surbiton', controller='surbiton',
                 action='update', conditions=dict(method=['POST']))

conf = {'/': {'request.dispatch': d}}
cherrypy.tree.mount(root=None, config=conf)

A couple of notes about the example above:

  • Since Routes has no controller hierarchy, there's nothing to pass as a root to :func:`cherrypy.tree.mount <cherrypy._cptree.Tree.mount>`; pass None in this case.
  • Usually you'll use the same dispatcher for an entire app, so specifying it at the root ("/") is common. But you can use different dispatchers for different paths if you like.
  • Because the dispatcher is so critical to finding handlers (and their ancestors), this is one of the few cases where you cannot use :ref:`_cp_config <cp_config>`; it's a chicken-and-egg problem: you can't ask a handler you haven't found yet how it wants to be found.
  • Since Routes are explicit, there's no need to set the exposed attribute. All routes are always exposed.

CherryPy ships with additional Dispatchers in :mod:`cherrypy._cpdispatch`.

PageHandler Objects

Because the Dispatcher sets :attr:`cherrypy.request.handler <cherrypy._cprequest.Request.handler>`, it can also control the input and output of that handler function by wrapping the actual handler. The default Dispatcher passes "virtual path" components as positional arguments and passes query-string and entity (GET and POST) parameters as keyword arguments. It uses a :class:`PageHandler <cherrypy._cpdispatch.PageHandler>` object for this, which looks a lot like:

class PageHandler(object):
    """Callable which sets response.body."""

    def __init__(self, callable, *args, **kwargs):
        self.callable = callable
        self.args = args
        self.kwargs = kwargs

    def __call__(self):
        return self.callable(*self.args, **self.kwargs)

The actual default PageHandler is a little bit more complicated (because the args and kwargs are bound later), but you get the idea. And you can see how easy it would be to provide your own behavior, whether your own inputs or your own way of modifying the output. Remember, whatever is returned from the handler will be bound to :attr:`cherrypy.response.body <cherrypy._cprequest.Response.body>` and will be used as the response entity.

Replacing page handlers

The handler that's going to be called during a request is available at :attr:`cherrypy.request.handler <cherrypy._cprequest.Request.handler`, which means your code has a chance to replace it before the handler runs. It's a snap to write a Tool to do so with a :class:`HandlerWrapperTool <cherrypy._cptools.HandlerWrapperTool>`:

to_skip = (KeyboardInterrupt, SystemException, cherrypy.HTTPRedirect)
def PgSQLWrapper(next_handler, *args, **kwargs):
    trans.begin()
    try:
        result = next_handler(*args, **kwargs)
        trans.commit()
    except Exception, e:
        if not isinstance(e, to_skip):
            trans.rollback()
        raise
    trans.end()
    return result

cherrypy.tools.pgsql = cherrypy._cptools.HandlerWrapperTool(PgSQLWrapper)

Configuration

The default arrangement of CherryPy handlers is a tree. This enables a very powerful configuration technique: config can be attached to a node in the tree and cascade down to all children of that node. Since the mapping of URI's to handlers is not always 1:1, this provides a flexibility which is not as easily definable in other, flatter arrangements.

However, because the arrangement of config is directly related to the arrangement of handlers, it is the responsibility of the Dispatcher to collect per-handler config, merge it with per-URI and global config, and bind the resulting dict to :attr:`cherrypy.request.config <cherrypy._cprequest.Request.config>`. This dict is of depth 1 and will contain all config entries which are in effect for the current request.

Tip: Filter by directory path e.g. /media app.js to search for public/media/app.js.
Tip: Use camelCasing e.g. ProjME to search for ProjectModifiedEvent.java.
Tip: Filter by extension type e.g. /repo .js to search for all .js files in the /repo directory.
Tip: Separate your search with spaces e.g. /ssh pom.xml to search for src/ssh/pom.xml.
Tip: Use ↑ and ↓ arrow keys to navigate and return to view the file.
Tip: You can also navigate files with Ctrl+j (next) and Ctrl+k (previous) and view the file with Ctrl+o.
Tip: You can also navigate files with Alt+j (next) and Alt+k (previous) and view the file with Alt+o.