[This document is complete to rev 1460.]

What's new in CherryPy 3.0

This document only describes new features in CherryPy 3.0. A detailed "How To Upgrade" document is at [wiki:UpgradeTo30 UpgradeTo30].

Speed

CherryPy 3 is much faster than CherryPy 2 (as much as three times faster in benchmarks).

Config

_cp_config: attaching config to handlers

In CP 2, you could only specify "config" in a config file or dict, where it was always keyed by URL. For example:

[/path/to/page]
methods_with_bodies = ("POST", "PUT", "PROPPATCH")

It's obvious that the extra method is the norm for that path; in fact, the code could be considered broken without it. In CherryPy 3, you can attach that bit of config directly on the page handler:

def page(self):
    return "Hello, world!"
page.exposed = True
page._cp_config = {"request.methods_with_bodies": ("POST", "PUT", "PROPPATCH")}

This can be done at any point in the cherrypy tree; for example, we could have attached that config to a class which contains the page method:

class SetOPages:

    _cp_config = {"request.methods_with_bodies": ("POST", "PUT", "PROPPATCH")}

    def page(self):
        return "Hullo, Werld!"
    page.exposed = True

This technique allows you to:

  • Put config near where it's used for improved readability and maintainability.
  • Attach config to objects instead of URLs. This allows multiple URLs to point to the same object, yet you only need to define the config once.
  • Provide defaults which are still overridable in a config file.
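The override order in that last point can be modelled with plain dicts. This is a toy sketch, not CherryPy's implementation: effective_config is a hypothetical helper, and it assumes that later sources win (class defaults, then handler defaults, then config-file entries).

```python
class SetOPages:
    # Class-level default, as in the example above.
    _cp_config = {"request.methods_with_bodies": ("POST", "PUT", "PROPPATCH")}

    def page(self):
        return "Hullo, Werld!"
    page.exposed = True
    page._cp_config = {"response.stream": True}


def effective_config(cls, handler_name, file_config=None):
    """Hypothetical helper: merge _cp_config dicts, later sources winning."""
    merged = {}
    merged.update(getattr(cls, "_cp_config", {}))
    handler = getattr(cls, handler_name)
    merged.update(getattr(handler, "_cp_config", {}))
    # Entries from a config file (if any) override the in-code defaults.
    merged.update(file_config or {})
    return merged


defaults = effective_config(SetOPages, "page")
overridden = effective_config(
    SetOPages, "page", {"request.methods_with_bodies": ("POST", "PUT")})
```

Here defaults carries both the class-level and the handler-level entries, while overridden shows a config-file value replacing the class default.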

Separate configuration scopes

CherryPy 2 used a single config dict for global, per-application, and per-path config. CherryPy 3 separates these scopes in a couple of ways:

First, and most '''important''', cherrypy.config now only holds global config data; that is, config entries which affect all mounted applications. Each Application object keeps its own config in app.config. You must pass global config to cherrypy.config.update, and per-application config to cherrypy.tree.mount. You ''may'' use a single config file and hand the same file (or filename) to both methods; put your global config in a [global] section to signal cherrypy.config.update which entries to grab.

Second, when a request is processed, these two config sources (global and per-application) are merged and collapsed to form a single config dict stored inside cherrypy.request.config. This dict contains only those config entries which apply to the given request; that is, per-path config. Note that when you do an InternalRedirect, this config is recalculated for the new path.
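To make the two steps concrete, here is a rough model using plain dicts. The helpers split_scopes and collapse are hypothetical names, and the rule that deeper path sections override shallower ones is an assumption of this sketch rather than a statement about CherryPy's internals.

```python
def split_scopes(conf):
    """Separate the [global] section from the per-path sections."""
    global_conf = dict(conf.get("global", {}))
    app_conf = {k: dict(v) for k, v in conf.items() if k != "global"}
    return global_conf, app_conf


def collapse(global_conf, app_conf, path):
    """Flatten global + per-path config into one dict for `path`."""
    merged = dict(global_conf)
    segments = [s for s in path.split("/") if s]
    # Visit "/", "/a", "/a/b", ... so deeper sections override shallower ones.
    trail = ["/"] + ["/" + "/".join(segments[:i + 1])
                     for i in range(len(segments))]
    for section in trail:
        merged.update(app_conf.get(section, {}))
    return merged


conf = {
    "global": {"tools.log_headers.on": True},
    "/": {"tools.etags.on": False},
    "/docs": {"tools.etags.on": True},
}
global_conf, app_conf = split_scopes(conf)
request_config = collapse(global_conf, app_conf, "/docs/intro")
root_config = collapse(global_conf, app_conf, "/")
```

A request for /docs/intro ends up with the "/docs" value for tools.etags.on, while a request for "/" keeps the root value.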

Configuration namespaces

In CherryPy 2, config entries were somewhat haphazard about their naming and scope. They were always inspected as late as possible, often multiple times, and their default values were locked away inside the source code.

In CherryPy 3, all config entries (except "environment") are now prefixed with a namespace. When you provide a config entry, it is now bound as early as possible to the actual object referenced by the namespace; for example, CP 2's "stream_response" is now "response.stream", and actually sets the "stream" attribute of cherrypy.response. In this way, you can easily determine the default value by firing up a python interpreter and typing:

>>> import cherrypy
>>> cherrypy.response.stream
False

This also means that some objects (the Request class in particular) have grown a number of new attributes, to avoid the need for config.get().

Entries from each namespace may be allowed in the global, application root ("/") or per-path config, or a combination:

Scope      Global   Application Root   App Path
engine     X
hooks      X        X                  X
log        X        X
request    X        X                  X
response   X        X                  X
server     X
tools      X        X                  X

Custom config namespaces

You can define your own namespaces if you like, and they can do far more than simply set attributes. The test/test_config module, for example, shows an example of a custom namespace that coerces incoming params and outgoing body content. The _cpwsgi module includes an additional, builtin namespace for invoking WSGI middleware.

In essence, a config namespace handler is just a function that gets passed any config entries in its namespace. You add it to a namespaces registry (a dict), where keys are namespace names and values are handler functions. When a config entry for your namespace is encountered, the corresponding handler function will be called, passing the config key and value; that is, namespaces[namespace](k, v). For example, if you write:

def db_namespace(k, v):
    if k == 'connstring':
        orm.connect(v)
cherrypy.config.namespaces['db'] = db_namespace

...then cherrypy.config.update({"db.connstring": "Oracle:host=1.10.100.200;sid=TEST"}) will call db_namespace('connstring', 'Oracle:host=1.10.100.200;sid=TEST').
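The namespaces[namespace](k, v) call can be imitated without CherryPy at all. In this sketch the registry is an ordinary dict, dispatch_config is a hypothetical stand-in for what a config update does, and the handler records the connection string instead of calling orm.connect.

```python
namespaces = {}      # stand-in for a namespaces registry
connections = []     # records what the handler was asked to do


def db_namespace(k, v):
    if k == "connstring":
        connections.append(v)   # a real handler might call orm.connect(v)


namespaces["db"] = db_namespace


def dispatch_config(conf):
    """Route each 'namespace.key' entry to its registered handler."""
    for entry, value in conf.items():
        # Split at the first dot only: "db.connstring" -> ("db", "connstring").
        namespace, _, key = entry.partition(".")
        handler = namespaces.get(namespace)
        if handler is not None:
            handler(key, value)


dispatch_config({"db.connstring": "Oracle:host=1.10.100.200;sid=TEST"})
```

Entries whose namespace has no registered handler are simply skipped in this model.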

The point at which your namespace handler is called depends on where you add it:

Namespace                          Handler is called in
config.namespaces                  cherrypy.config.update
Application.namespaces             Application.merge (which is called by cherrypy.tree.mount)
engine.request_class.namespaces    Request.configure (called for each request, after the handler is looked up)

If you need additional code to run when all your namespace keys are collected, you can supply a callable context manager in place of a normal function for the handler. Context managers are defined in [http://www.python.org/dev/peps/pep-0343/ PEP 343].

Tools

Using builtin tools

Filters are gone! In their place are Tools, which allow for much more flexibility. If your favorite builtin filter has changed to a tool, it's easy to convert your code. See [wiki:UpgradeTo30 UpgradeTo30] for a complete list of name changes. Instead of this:

[/docroot]
static_filter.on: True
static_filter.root: "/path/to/app"
static_filter.dir: 'static'

...use the "tools" namespace like this:

[/docroot]
tools.staticdir.on: True
tools.staticdir.root: "/path/to/app"
tools.staticdir.dir: 'static'

We can also use our new friend _cp_config (see above):

class docroot(object):

    _cp_config = {'tools.staticdir.on': True,
                  'tools.staticdir.root': "/path/to/app",
                  'tools.staticdir.dir': 'static'}

But we can do even better by using the '''builtin decorator support''' that all Tools have:

class docroot(object):

    @tools.staticdir(root="/path/to/app", dir='static')
    def page(self):
       ...

...and in this case, we can do even '''better''' because tools.staticdir is a 'HandlerTool', and therefore can be used directly as a page handler:

class docroot(object):

    static = tools.staticdir.handler(section='static', root="/path/to/app",
                                     dir='static')

Finally, you can use (most) Tools directly, by calling the function they wrap. They expose this via the 'callable' attribute:

def page(self):
    tools.response_headers.callable([('Content-Language', 'fr')])
    return "Bonjour, le Monde!"
page.exposed = True

Because the underlying function is wrapped in a tool, you need to call help(tools.whatevertool.callable) if you want the docstring for it. Using help(tools.whatevertool) will give you help on how to use it as a Tool (for example, as a decorator).

Tools are also '''inspectable''' automatically. They expose their own arguments as attributes:

>>> dir(cherrypy.tools.session_auth)
[..., 'anonymous', 'callable', 'check_username_and_password',
'do_check', 'do_login', 'do_logout', 'handler', 'login_screen',
'on_check', 'on_login', 'on_logout', 'run', 'session_key']

This makes IDE calltips especially useful, even when writing config files!
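For a feel of how one object can be a decorator, expose a 'callable' and mirror the wrapped function's arguments, here is a small stand-alone model. ToyTool is hypothetical and is not CherryPy's Tool class. The staticdir function below is an empty placeholder whose argument names are borrowed from the examples above, and storing the decorator's settings in _cp_config is an assumption of the sketch.

```python
import inspect


class ToyTool:
    """Hypothetical stand-in for a Tool; not CherryPy's class."""

    def __init__(self, name, callable_):
        self.name = name
        self.callable = callable_
        # Mirror the wrapped function's argument names as attributes (all None).
        for arg in inspect.signature(callable_).parameters:
            setattr(self, arg, None)

    def __call__(self, **kwargs):
        """Decorator form: store the tool's settings in handler._cp_config."""
        prefix = "tools.%s." % self.name

        def decorate(handler):
            conf = dict(getattr(handler, "_cp_config", {}))
            conf[prefix + "on"] = True
            for key, value in kwargs.items():
                conf[prefix + key] = value
            handler._cp_config = conf
            return handler
        return decorate


def staticdir(section=None, dir=None, root=""):
    """Serve files from a directory (body omitted in this sketch)."""


toy_staticdir = ToyTool("staticdir", staticdir)


@toy_staticdir(root="/path/to/app", dir="static")
def page():
    return "..."
```

After decoration, page._cp_config holds the same three "tools.staticdir.*" entries as the hand-written _cp_config example, and dir(toy_staticdir) lists 'section', 'dir' and 'root'.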

New and improved builtin tools

tools.proxy

This replaces and enhances the old baseurl_filter. The old way:

baseurl_filter.base_url = "http://myhost"
baseurl_filter.use_x_forwarded_host = False

The new way:

tools.proxy(base=None, local='X-Forwarded-Host', 
            remote='X-Forwarded-For', scheme='X-Forwarded-Proto')

This changes the base URL (scheme:host[:port][/path]), and is most useful when running a CP server behind Apache or some other webserver.

tools.proxy.local defines the request header which will be used to auto-fill the new request.base. If you want the new request.base to include path info (not just the host), you must explicitly set base to the full base path, and ALSO set tools.proxy.local to "" (empty string), so that the X-Forwarded-Host request header (which never includes path info) does not override it.

New in CP 3: cherrypy.request.remote.ip (the IP address of the client) will be rewritten if the header specified by tools.proxy.remote is valid. By default, 'remote' is set to 'X-Forwarded-For'. If you do not want to rewrite remote.ip, set the 'remote' arg to an empty string.

tools.log_tracebacks

This replaces the CP 2 feature: "server.log_tracebacks".

tools.log_headers

This replaces the CP 2 feature: "server.log_request_headers".

tools.err_redirect

Turn this tool on to redirect all unhandled errors to a different page. Supply the new URL via tools.err_redirect.url. By default, this raises InternalRedirect. To use HTTPRedirect, set tools.err_redirect.internal to False.

tools.etags

This new tool validates the current ETag response header against If-Match and If-None-Match headers, and raises "304 Not Modified" or "412 Precondition Failed" as needed. If tools.etags.autotags is True, an ETag response-header value will be provided from an MD5 hash of the response body (unless some other code has already provided an ETag header). If False (the default), the ETag will not be automatic.
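The validation logic can be sketched with the standard library alone. check_etag is a hypothetical function that returns a status code instead of raising; it assumes a GET request, compares tags as exact strings, and ignores weak validators.

```python
import hashlib


def check_etag(body, request_headers, response_headers, autotags=False):
    """Return the status a conditional GET would get (toy model)."""
    etag = response_headers.get("ETag")
    if etag is None and autotags:
        # Derive a tag from an MD5 hash of the response body.
        etag = '"%s"' % hashlib.md5(body).hexdigest()
        response_headers["ETag"] = etag
    if etag is None:
        return 200

    def listed(header):
        raw = request_headers.get(header, "")
        return [t.strip() for t in raw.split(",") if t.strip()]

    if_match = listed("If-Match")
    if if_match and "*" not in if_match and etag not in if_match:
        return 412   # Precondition Failed
    if_none_match = listed("If-None-Match")
    if "*" in if_none_match or etag in if_none_match:
        return 304   # Not Modified
    return 200
```

With autotags left at False and no ETag already set, the function does nothing and returns 200, matching the default described above.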

tools.expires

A tool for influencing cache mechanisms using the 'Expires' header.

tools.expires.secs must be either an int or a datetime.timedelta, and indicates the number of seconds between response.time and when the response should expire. The 'Expires' header will be set to (response.time + secs). If zero (the default), the following "cache prevention" headers are also set:

'Pragma': 'no-cache'
'Cache-Control': 'no-cache'

If tools.expires.force is False (the default), the following headers are checked: 'Etag', 'Last-Modified', 'Age', 'Expires'. If any are already present, none of the above response headers are set.
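Those rules fit in a few lines. expires_headers is a hypothetical helper that returns the headers it would add rather than touching a real response; it looks header names up exactly as spelled, which a real implementation would likely do case-insensitively.

```python
import datetime
from email.utils import formatdate


def expires_headers(response_time, existing, secs=0, force=False):
    """Return the headers this toy 'expires' step would add."""
    if isinstance(secs, datetime.timedelta):
        secs = 86400 * secs.days + secs.seconds
    if not force:
        # Leave the response alone if a caching header is already present.
        for name in ("Etag", "Last-Modified", "Age", "Expires"):
            if name in existing:
                return {}
    added = {"Expires": formatdate(response_time + secs, usegmt=True)}
    if secs == 0:
        # Zero means "do not cache at all".
        added["Pragma"] = "no-cache"
        added["Cache-Control"] = "no-cache"
    return added


no_cache = expires_headers(0, {})
one_hour = expires_headers(0, {}, secs=datetime.timedelta(hours=1))
```

Passing response_time as a plain epoch timestamp keeps the example deterministic; in a real request it would be the time the response was created.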

tools.basic_auth

A tool for doing basic authentication. It takes a "realm" setting (a string) and a "users" dict of {username: password} pairs (or a callable which returns that dict). If authentication fails, 401 Unauthorized is raised.
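As a rough illustration of the "users" setting, the sketch below checks a Basic Authorization header against a dict or a callable returning one. check_basic_auth is a hypothetical function that returns a (status, headers) pair instead of raising, and it compares plaintext passwords, as the {username: password} description suggests.

```python
import base64


def check_basic_auth(authorization, realm, users):
    """Return (status, headers) for a Basic 'Authorization' header value."""
    if callable(users):
        users = users()
    challenge = (401, {"WWW-Authenticate": 'Basic realm="%s"' % realm})
    if not authorization or not authorization.startswith("Basic "):
        return challenge
    try:
        decoded = base64.b64decode(authorization[6:]).decode("utf-8")
    except (ValueError, UnicodeDecodeError):
        return challenge
    # Credentials arrive as "username:password".
    username, sep, password = decoded.partition(":")
    if sep and users.get(username) == password:
        return (200, {})
    return challenge


users = {"alice": "secret"}
good = "Basic " + base64.b64encode(b"alice:secret").decode("ascii")
bad = "Basic " + base64.b64encode(b"alice:wrong").decode("ascii")
```

check_basic_auth(good, "intranet", users) succeeds, while a missing or wrong header yields the 401 challenge.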

tools.digest_auth

A tool for doing Digest authentication (RFC 2617). It takes a "realm" setting (a string) and a "users" dict of {username: password} pairs (or a callable which returns that dict). If authentication fails, 401 Unauthorized is raised.

tools.trailing_slash

A tool that lets you control whether URLs with a missing or extra trailing slash should raise HTTPRedirect. It's on by default, with these settings:

tools.trailing_slash.on = True
tools.trailing_slash.missing = True
tools.trailing_slash.extra = False

That is, if a trailing slash is missing for an index handler, HTTPRedirect is raised. But if a non-index handler has an extra slash, it's not redirected by default.
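The decision table is small enough to write down directly. trailing_slash_redirect is a hypothetical helper that returns the URL to redirect to (or None) instead of raising HTTPRedirect.

```python
def trailing_slash_redirect(path, is_index, missing=True, extra=False):
    """Return the path to redirect to, or None if no redirect is needed."""
    if is_index:
        # Index handlers want a trailing slash.
        if missing and not path.endswith("/"):
            return path + "/"
    elif extra and path.endswith("/") and path != "/":
        # Non-index handlers should not have one.
        return path[:-1]
    return None
```

With the defaults, "/blog" served by an index handler redirects to "/blog/", while "/page/" served by a non-index handler is left alone unless extra is set to True.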

tools.accept

A tool for verifying that the client is willing to accept the Content-Type of the response.

tools.accept.media, if provided, should be the Content-Type value (as a string) or values (as a list or tuple of strings) which the current request can emit. The client's acceptable media ranges (as declared in the Accept request header) will be matched in order to these Content-Type values; the first such string is returned. That is, the return value will always be one of the strings provided in the 'media' arg (or None if 'media' is None).

The return value doesn't mean anything when used as a Tool, but you can call tools.accept.callable(media) directly to dispatch based on the client's preferred Content-Type:

def select(self):
    mtype = tools.accept.callable(['text/html', 'text/plain'])
    if mtype == 'text/html':
        return "<h2>Page Title</h2>"
    else:
        return "PAGE TITLE"
select.exposed = True

Regardless of whether you call it directly or just turn on the Tool, if no match is found, then HTTPError 406 (Not Acceptable) is raised. Note that most web browsers send */* as a (low-quality) acceptable media range, which should match any Content-Type. In addition, "...if no Accept header field is present, then it is assumed that the client accepts all media types."

Custom tools

You can make your own tools and register them to gain all the benefits the builtin Tools enjoy. Usually, this is as simple as:

cherrypy.tools.my_tool = cherrypy.Tool('before_request_body', my_callback)

cherrypy.tools is an instance of _cptools.Toolbox. When you add your Tool to it, then config entries in the "tools.my_tool.*" namespace automatically get passed to your callback as keyword arguments.

See cherrypy._cptools for more examples.

Custom toolboxes

If you're building a framework on top of CherryPy, you might want to use your own toolbox to avoid conflicting with builtin tools. It's just a single line: mytools = cherrypy._cptools.Toolbox("mytools"). This one line creates a new Toolbox and automatically registers the "mytools" config namespace.

Hooks

Tools use hooks under the covers. Each Hook has a "callback" attribute, and is registered at a "hook point" in a HookMap called cherrypy.request.hooks. As a request is processed, hooks are called at the following hook points: 'on_start_resource', 'before_request_body', 'before_handler', 'before_finalize', 'on_end_resource', 'on_end_request', 'before_error_response', and 'after_error_response'.

If you can't make a Tool, you can provide custom hooks in config by writing hooks.<hookpoint> = function, and the function you provide will be called at that hook point. If you want to do it in code (especially for a custom Tool, see above), use cherrypy.request.hooks.attach(point, callback, failsafe=None, priority=None, **kwargs).

Some Hook objects are "failsafe", which means that they are guaranteed to run even if other Hooks in the same hook point raise exceptions (if more than one fails, they are all logged, but only the last exception is raised). You can either set Hook.failsafe = True, or provide it as Hook(callback, failsafe=True). Additionally, you may be able to set callback.failsafe = True, in which case the Hook will automatically copy that value to itself.

Hook objects also have a "priority", in the closed interval of [0, 100]. By default, Hook.priority is 50, but you can change it (as with failsafe, above). This is a necessary evil to make sure that, for example, the encoding Tool's hooks run before the gzip Tool's hooks (if they were reversed, the request would almost certainly fail, because the encoding Tool was designed to operate on text output, not binary).
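Priority ordering and failsafe behaviour are easier to see in a toy model. ToyHook and ToyHookMap below are hypothetical classes written for this page, not the ones in CherryPy, and the priority numbers in the example are arbitrary.

```python
class ToyHook:
    def __init__(self, callback, failsafe=None, priority=None):
        self.callback = callback
        # Fall back to attributes set on the callback, then to the defaults.
        if failsafe is None:
            failsafe = getattr(callback, "failsafe", False)
        if priority is None:
            priority = getattr(callback, "priority", 50)
        self.failsafe = failsafe
        self.priority = priority


class ToyHookMap(dict):
    def attach(self, point, callback, failsafe=None, priority=None):
        self.setdefault(point, []).append(ToyHook(callback, failsafe, priority))

    def run(self, point):
        exc = None
        # Lower priority numbers run first.
        for hook in sorted(self.get(point, []), key=lambda h: h.priority):
            # After a failure, only failsafe hooks still run.
            if exc is None or hook.failsafe:
                try:
                    hook.callback()
                except Exception as e:
                    exc = e
        if exc is not None:
            raise exc


calls = []


def broken():
    raise ValueError("boom")


hooks = ToyHookMap()
hooks.attach("before_finalize", lambda: calls.append("gzip"), priority=80)
hooks.attach("before_finalize", lambda: calls.append("encode"), priority=70)
hooks.attach("on_end_request", broken, priority=10)
hooks.attach("on_end_request", lambda: calls.append("skipped"), priority=20)
hooks.attach("on_end_request", lambda: calls.append("cleanup"),
             failsafe=True, priority=30)

hooks.run("before_finalize")       # encode runs before gzip
try:
    hooks.run("on_end_request")    # broken fails, cleanup still runs
except ValueError:
    pass
```

The ordinary hook after the failure is skipped, the failsafe one still runs, and the exception is re-raised once the hook point is finished.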

Dispatch

"Dispatch" refers to the way the framework looks up and calls application code. By default, CherryPy traverses a tree of objects to find a page handler that you've written. Then it calls that function, passing any virtual path segments as positional arguments and any request parameters (form or querystring values) as keyword arguments. In CherryPy 2, this process was hard-coded into the core; to change it, you had to subclass the Request object.

CherryPy 3 separates dispatch into a new "request.dispatch" object, which you can specify in config per-path. It must refer to a callable that 1) takes a path_info argument, and 2) sets cherrypy.request.handler (a callable that takes no arguments) and cherrypy.request.config (a flat dict containing all config entries that apply to the current request).
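The default traversal described above can be approximated in a few lines. find_handler is a hypothetical, much simplified model: it only follows attributes, falls back to index, and checks the exposed flag. Blog and Root are invented for the example.

```python
def find_handler(root, path_info):
    """Walk attributes of `root` along path_info; leftovers become vpath."""
    segments = [s for s in path_info.strip("/").split("/") if s]
    node, consumed = root, 0
    for name in segments:
        child = getattr(node, name, None)
        if child is None:
            break
        node, consumed = child, consumed + 1
    if not callable(node):
        # A container object is served by its index method, if it has one.
        node = getattr(node, "index", None)
    if node is None or not getattr(node, "exposed", False):
        return None, []
    return node, segments[consumed:]


class Blog:
    def index(self):
        return "blog index"
    index.exposed = True

    def entry(self, year, month, **params):
        return "entry %s/%s %s" % (year, month, sorted(params.items()))
    entry.exposed = True


class Root:
    blog = Blog()


handler, vpath = find_handler(Root(), "/blog/entry/2006/12")
# Virtual path segments become positional args, request params keyword args.
body = handler(*vpath, **{"format": "short"})
```

For "/blog/entry/2006/12" the walk stops at entry, and the leftover segments "2006" and "12" are passed as positional arguments.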

There's a new MethodDispatcher and RoutesDispatcher in cherrypy.dispatch, too. Feel free to try them out.

URL construction

There's a new cherrypy.url(path) function which can be used to construct portable URLs for your application. It calculates new paths relative to the current SCRIPT_NAME (if you pass a path which starts with "/") or relative to the current PATH_INFO (if you pass a path which ''doesn't'' start with "/").
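The two cases can be illustrated with urllib. build_url is a hypothetical function that takes the base, SCRIPT_NAME and PATH_INFO explicitly (the real function reads them from the current request), and it assumes relative paths resolve the way a browser resolves links.

```python
from urllib.parse import urljoin


def build_url(path, base, script_name, path_info):
    if path.startswith("/"):
        # Relative to the application's mount point.
        return base + script_name + path
    # Relative to the page currently being served.
    return base + urljoin(script_name + path_info, path)


absolute = build_url("/about", "http://example.com", "/app", "/docs/intro")
relative = build_url("setup", "http://example.com", "/app", "/docs/intro")
```

Here absolute is "http://example.com/app/about" and relative is "http://example.com/app/docs/setup", so neither call needs to know where the application is mounted.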

Autoreload

The autoreload feature has been completely reworked. In CherryPy 2.x, it would immediately start a second process (using os.spawnve(os.P_WAIT, ...)). This caused repeated confusion and complaints when applications would "mysteriously" run startup code twice.

In CherryPy 3, the autoreload mechanism does nothing to the initial process; it simply replaces its own process when needed (using os.execv). You can also now trigger this behavior yourself, outside of the autoreload file-checking logic, by calling cherrypy.engine.reexec. Finally, if your platform supports the HUP signal, then a SIGHUP will automatically call cherrypy.engine.reexec (whereas SIGTERM shuts down CherryPy, now).

We've also borrowed an idea from TurboGears: engine.autoreload_match is a regular expression pattern (default .*) that you can change to filter which files are monitored.
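The effect of such a pattern can be previewed on a list of names. This sketch assumes the pattern is applied with re.match (anchored at the start of each name); monitored is a hypothetical helper and the paths are made up.

```python
import re


def monitored(names, pattern=".*"):
    """Keep only the names that the pattern matches from the start."""
    return [name for name in names if re.match(pattern, name)]


names = ["/srv/myapp/root.py", "/usr/lib/python/os.py"]
everything = monitored(names)                  # default pattern matches all
only_app = monitored(names, r"/srv/myapp/")    # restrict to the application
```

With the default pattern every name is kept; the narrower pattern leaves only the application's own file.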

WSGI improvements

WSGI server

The builtin WSGI server is now HTTP/1.1 compliant! It correctly handles persistent connections, pipelining, Expect/100-continue, and the "chunked" transfer-coding (receive only).

It also now emits a custom WSGI environ entry: ACTUAL_SERVER_PROTOCOL. Clients can calculate min(SERVER_PROTOCOL, ACTUAL_SERVER_PROTOCOL) in order to determine which level of HTTP features to support. CherryPy applications can see this min() value in cherrypy.request.protocol.
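The min() comparison needs the versions as numbers, not strings. A minimal sketch, where parse_protocol and effective_protocol are hypothetical helpers and the environ is a hand-built dict:

```python
def parse_protocol(value):
    """'HTTP/1.1' -> (1, 1), so that versions compare numerically."""
    major, minor = value.split("/", 1)[1].split(".")
    return int(major), int(minor)


def effective_protocol(environ):
    return min(parse_protocol(environ["SERVER_PROTOCOL"]),
               parse_protocol(environ["ACTUAL_SERVER_PROTOCOL"]))


# An HTTP/1.0 client talking to an HTTP/1.1-capable server.
level = effective_protocol({
    "SERVER_PROTOCOL": "HTTP/1.0",
    "ACTUAL_SERVER_PROTOCOL": "HTTP/1.1",
})
```

Here level is (1, 0), so the application should stick to HTTP/1.0 features for this request.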

It also supports HTTPS/SSL! Just set server.ssl_certificate and server.ssl_private_key in your config to the filenames of the certificate and the private key.

As always, the code in wsgiserver.py is usable anywhere, as it doesn't depend on CherryPy in any way. Feel free to use it with other WSGI stacks.

WSGI applications

cherrypy.Application objects are now WSGI applications, automatically. Whenever you call cherrypy.tree.mount(Root()), the "Root" object you pass is wrapped up in an Application object, and added to cherrypy.tree.apps.

One big difference between CherryPy Application objects and a lot of other WSGI applications is that CherryPy apps usually know their own SCRIPT_NAME before being called. If you cannot or don't want to set this in stone, set app.script_name to None, and the Application will provide it from the WSGI environ['SCRIPT_NAME'] on each request.

In addition, cherrypy.tree is also usable as a "WSGI application"; it acts as dispatching middleware to all mounted apps.

WSGI middleware

In addition to mounting cherrypy.Application objects onto cherrypy.tree, you can also mount plain ol' WSGI callables, using cherrypy.tree.graft(wsgi_callable, script_name=""). Then hand cherrypy.tree to your WSGI server, and it will happily dispatch to both CherryPy apps and foreign WSGI apps.
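Dispatching middleware of this kind is simple to model. ToyTree below is a hypothetical stand-in that only supports graft and picks the longest matching mount point; it is not cherrypy.tree, and foreign_app and call are helpers invented for the example.

```python
class ToyTree:
    """Hypothetical WSGI dispatcher keyed by script_name."""

    def __init__(self):
        self.apps = {}

    def graft(self, wsgi_callable, script_name=""):
        self.apps[script_name.rstrip("/")] = wsgi_callable

    def __call__(self, environ, start_response):
        path = environ.get("SCRIPT_NAME", "") + environ.get("PATH_INFO", "")
        # Longest matching mount point wins.
        for script_name in sorted(self.apps, key=len, reverse=True):
            if path == script_name or path.startswith(script_name + "/"):
                environ = dict(environ)
                environ["SCRIPT_NAME"] = script_name
                environ["PATH_INFO"] = path[len(script_name):]
                return self.apps[script_name](environ, start_response)
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"Not Found"]


def foreign_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [("foreign:" + environ["PATH_INFO"]).encode("utf-8")]


def call(app, path):
    """Invoke a WSGI app with a minimal environ; return (status, body)."""
    seen = []
    chunks = app({"PATH_INFO": path}, lambda status, headers: seen.append(status))
    return seen[0], b"".join(chunks)


tree = ToyTree()
tree.graft(foreign_app, "/legacy")
```

A request for "/legacy/report" reaches foreign_app with PATH_INFO set to "/report"; anything outside a mount point gets a 404 from the tree itself.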

The profile module is now implemented as WSGI middleware, too. Use cherrypy.lib.profiler.make_app(nextapp, path, aggregate=False) to use it. If 'aggregate' is False, a separate profile dump will be made for each request. If True, all requests (for the same 'nextapp') will be aggregated together into a single results file.

Finally, there's a new "pipeline" helper in cherrypy.wsgi. The config entry wsgi.pipeline = [(name, wsgiapp_factory), ...] will pipe the request through the supplied wsgiapps before handing it off to the CherryPy application. See help(cherrypy.wsgi.CPWSGIApp) for details. If you want to do it in code instead of config, write:

app = cherrypy.Application(Root())
app.wsgiapp.pipeline.append((name, wsgiapp_factory))
cherrypy.tree.mount(app, config={'/': root_conf})

Logging

CherryPy 3 now uses the standard library's logging module, which means you have access to its RotatingFileHandler(s), SocketHandler, SysLogHandler, NTEventLogHandler, SMTPHandler, and HTTPHandler (and other goodies).

In CherryPy 2, log config was specifiable per-path (since it used very simple handlers). Now, there are separate error and access logs for each mounted Application (named "cherrypy.error.%s" % id(app) and "cherrypy.access.%s" % id(app)), as well as global error and access logs (named "cherrypy.error" and "cherrypy.access"). This naming scheme means that messages sent to "cherrypy.access.723863" will automatically also be sent to the global "cherrypy.access" log.
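The propagation comes from the logging module's dotted-name hierarchy, so it can be checked without CherryPy. In this sketch ListHandler is a hypothetical handler that just collects records, and the logger names are the ones quoted above.

```python
import logging


class ListHandler(logging.Handler):
    """Collect (logger name, message) pairs instead of writing them out."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append((record.name, record.getMessage()))


# Handler attached to the global access log only.
global_access = logging.getLogger("cherrypy.access")
global_access.setLevel(logging.INFO)
collector = ListHandler()
global_access.addHandler(collector)

# A per-application logger is a child because of its dotted name.
app_access = logging.getLogger("cherrypy.access.723863")
app_access.info("GET /index")
```

The message logged on the child logger shows up in collector.messages even though no handler was attached to the child.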

Code inspection

A lot of work has been done to make CherryPy 3 play nice with the interactive interpreter. If you don't know or can't recall how something works or even what features are available, start with help(cherrypy), and work your way through the available attributes. Two items in particular need mentioning:

  • The cherrypy.request and response objects are dummy objects, and exist only for your benefit when you write an application. The values of their attributes should be considered read-only, and are only intended to let you see default values easily.
  • Tools have two faces. They have their own answers to help() that tell you how to use them as tools; if you want to see the docstrings for the functions they wrap, try help(tools.<toolname>.callable) instead. They ''do'', however, copy the argument names of the wrapped function to themselves as attributes (all None), so you should be able to use dir(tools.<toolname>) with no problems.

Redirection and Deadlock

The cherrypy.request object now has improved support for InternalRedirect situations. First, on redirect, it creates an entirely new Request object, and sets Request.prev to point to the previous Request object. It also inspects the list of seen URLs at each redirect, and, if the new path + querystring has already been visited during this request, raises an error. This stops infinite redirect loops. If for some reason you ''want'' to visit the same path twice in a single request, set wsgi.iredir.recursive = True in config.
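The loop check amounts to remembering every URL visited so far. A toy version, where follow_redirects and RedirectLoop are hypothetical names and the redirects are given as a simple dict of path to next path:

```python
class RedirectLoop(Exception):
    """Raised when a redirect target was already visited in this request."""


def follow_redirects(start, redirects):
    """Follow internal redirects from `start`, refusing to revisit a URL."""
    seen = [start]
    current = start
    while current in redirects:
        current = redirects[current]
        if current in seen:
            raise RedirectLoop("%s already visited: %s" % (current, seen))
        seen.append(current)
    return seen


chain = follow_redirects("/a", {"/a": "/b", "/b": "/c"})
```

A chain such as /a, /b, /c is followed to the end, while a table that sends /b back to /a raises RedirectLoop on the second visit.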

You may also now raise InternalRedirect at any time during the run of a Request. In the past, you could only do so during the "before_main" hook and inside page handlers.

Each response object also has a time attribute (set to time.time() when created), a timeout attribute (default 300 seconds), and a timed_out attribute, a bool. Assuming cherrypy.engine.deadlock_poll_freq is greater than 0, a monitor thread will check if now > response.time + response.timeout; if so, it sets response.timed_out. This is checked at various places in the core, and cherrypy.TimeoutError is raised if response.timed_out is True. Feel free to check it and raise TimeoutError in your own code's critical sections.

Drop privileges

There is a new engine.drop_privileges function which may setuid/gid and/or set a new umask, or raise NotImplemented, depending on your platform. If you're on UNIX, you'll probably see engine.uid, engine.gid (names or numbers), and engine.umask attributes which you can set (from config, if you want). If you're on Windows, you'll only see the umask attribute. Other platforms may see none of these. Whatever happens, it'll get logged so you know when it works and it'll raise errors when it doesn't work.

Native support for mod_python

The popular "mpcp" module has been ... uh ... "embraced and extended" and is now included in the standard CherryPy 3 distribution as cherrypy._cpmodpy. Thanks to Jamie Turner for his ingenuity and generosity!

Multiple HTTP server support

The new cherrypy.server object can now control more than one HTTP server. Add additional ones via server.httpservers[myserver] = (host, port). This can be used to listen on multiple ports or protocols.
