                          The CherryPy HTTP framework

Abstract
--------

CherryPy is a framework for developing and deploying HTTP applications.


CONTENTS
========

    1 Introduction
        1 Purpose
        2 Requirements
        3 Terminology
        4 Overview
    2 Core
        1 Applications
        2 Requests and Responses
            1 The Request Object
            2 The Response Object
            3 Serving the Request and Response
            4 Request Execution
            5 Cleanup
        3 Dispatchers
            1 Invocation
            2 request.handler
            3 request.config
        4 HTTP Servers
        5 WSGI
        6 Engines
    3 Extensions
        1 Hooks
            1 Hook points
            2 Hook objects
        2 Tools
            1 Decorators
            2 Callables
            3 Handlers
        3 Toolboxes
        4 Configuration
            1 Scopes
            2 Namespaces
                1 Namespace handlers
            3 Page Handler Attributes
    4 Footnotes and References


1 Introduction

CherryPy is a framework for developing and deploying HTTP applications.

1.1 Purpose

This specification defines the composition and interaction of CherryPy
components.

1.2 Requirements

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119.
See http://www.ietf.org/rfc/rfc2119.txt.

An implementation is not compliant if it fails to satisfy one or more of
the MUST or REQUIRED level requirements for the protocols it implements.
An implementation that satisfies all the MUST or REQUIRED level and all the
SHOULD level requirements for its protocols is said to be "unconditionally
compliant"; one that satisfies all the MUST level requirements but not all
the SHOULD level requirements for its protocols is said to be "conditionally
compliant."

1.3 Terminology

Unless otherwise specified, all terminology used in this specification
should be interpreted as that of "Hypertext Transfer Protocol -- HTTP/1.1"
(RFC 2616) and "Uniform Resource Identifiers (URI): Generic Syntax and
Semantics" (RFC 2396).

Additional terms:

handler (page handler)
    A callable which responds to a request, usually by returning an HTTP
    response body.

handler (namespace handler)
    A callable which parses and applies a configuration entry based on
    a hierarchy of entry names.

unexpected exception
    In the normal course of responding to requests, CherryPy raises known
    exceptions such as HTTPError, HTTPRedirect, and InternalRedirect in
    order to skip various parts of the request process. In addition, the
    exceptions SystemExit and KeyboardInterrupt are never handled by request
    objects, but are always passed outward to the caller. These are all
    "expected exceptions", and any other exception, therefore, is defined
    as an "unexpected exception".

1.4 Overview

CherryPy consists of not one, but four separate API layers.

The APPLICATION LAYER is the simplest. CherryPy applications are written as
a tree of classes and methods, where each branch in the tree corresponds to
a branch in the URL path. Each method is a 'page handler', which receives
GET and POST params as keyword arguments, and returns or yields the (HTML)
body of the response. The special method name 'index' is used for paths
that end in a slash, and the special method name 'default' is used to
handle multiple paths via a single handler. This layer also includes:

 * the 'exposed' attribute (and cherrypy.expose)
 * cherrypy.quickstart()
 * _cp_config attributes
 * cherrypy.tools (including cherrypy.session)
 * cherrypy.url()

The ENVIRONMENT LAYER is used by developers at all levels. It provides
information about the current request and response, plus the application
and server environment, via a (default) set of top-level objects:

 * cherrypy.request
 * cherrypy.response
 * cherrypy.engine
 * cherrypy.server
 * cherrypy.tree
 * cherrypy.config
 * cherrypy.thread_data
 * cherrypy.log
 * cherrypy.HTTPError, NotFound, and HTTPRedirect
 * cherrypy.lib

The EXTENSION LAYER allows advanced users to construct and share their own
plugins. See Section 3.

Finally, there is the CORE LAYER, which uses the core API's to construct
the default components which are available at higher layers. You can think
of the default components as the 'reference implementation' for CherryPy.
Megaframeworks (and advanced users) may replace the default components
with customized or extended components. The core API's are discussed in
Section 2.


2 Core

2.1 Applications

CherryPy uses an application object to implement a collection of URI's
which maps to a collection of page handlers. This terminology is taken
directly from Fielding, "...the server receives the identifier (which
identifies the mapping) and applies it to its current mapping implementation
(usually a combination of collection-specific deep tree traversal and/or
hash tables) to find the currently responsible handler implementation and
the handler implementation then selects the appropriate action+response
based on the request content."

The exact implementation of that mapping is dependent on the dispatcher(s)
(section 2.3) which the application employs internally; by default, the
external application interface only exposes a "script name" (root URI)
for the entire collection.

An application object MUST contain the following three attributes:

    * script_name: a string, containing the "mount point" for this object.
        A mount point is that portion of the URI which is constant for all
        URIs that are serviced by this application; it does not include
        scheme, host, or proxy ("virtual host") portions of the URI.
        It MUST NOT end in a slash. If the script_name refers to the
        root of the URI, it MUST be an empty string (not "/").
    * config: a nested dict, containing configuration entries which apply
        to this application, of the form: {section: {entry name: value}}.
        The 'section' keys MUST be strings. If they represent URI paths,
        they MUST begin with a slash, and MUST be relative to this object's
        script_name. If they do not begin with a slash, they SHOULD be
        treated as arbitrary section names, which applications MAY use as
        they see fit. The 'entry name' keys MUST be strings, and in the
        case of path sections, SHOULD be namespaced (section 3.4).
        The values may be arbitrary Python values.
    * namespaces: a dict of configuration namespace names and handlers.
        See section 3.4.

Application objects also MUST possess a "merge" method, that takes a single
"config" argument, which MUST be a dict, nested in the same manner as the
application object's config. The "merge" method MUST combine the supplied
config with the application object's existing config dict in such a way that
the supplied config overrides (overwrites) entries in the existing config.
The "merge" method MUST NOT remove any values in the existing config unless
replacing them with a new value, or performing the removal via a namespace
handler. The "merge" method MUST pass all entries in the supplied config to
the proper namespace handler (if any). It MUST NOT pass any entries from the
existing config to namespace handlers, since these entries will have already
been handled when they were first merged. Callers SHOULD NOT attempt to add
config entries to the application object via any means other than passing a
new config dict to the "merge" method.
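
The following is a minimal sketch of an object that satisfies the three
required attributes and the "merge" contract described above. It is not the
reference implementation. The class name, the constructor signature, and the
choice to split entry names on the first dot (borrowed from the "dotted
attribute" notation of section 3.4.2) are assumptions made for illustration.

```python
class Application:
    """Minimal application object: script_name, config, namespaces, merge."""

    def __init__(self, script_name="", config=None, namespaces=None):
        if script_name.endswith("/"):
            raise ValueError(
                "script_name MUST NOT end in a slash; use '' for the root")
        self.script_name = script_name
        self.config = {}
        # namespace name -> callable taking (key, value); see section 3.4.2.1
        self.namespaces = dict(namespaces or {})
        if config:
            self.merge(config)

    def merge(self, config):
        """Overlay `config` on the existing config, then notify handlers."""
        for section, entries in config.items():
            # New values overwrite old ones; nothing else is removed.
            self.config.setdefault(section, {}).update(entries)
            for name, value in entries.items():
                # Assumption: dotted entry names, split on the first dot.
                namespace, _, key = name.partition(".")
                handler = self.namespaces.get(namespace)
                if handler is not None:
                    # Only entries from the supplied config reach a handler.
                    handler(key, value)


seen = []


def log_namespace(key, value):
    # Hypothetical namespace handler that just records what it is given.
    seen.append((key, value))


app = Application("/blog", namespaces={"log": log_namespace})
app.merge({"/": {"log.screen": True, "tools.gzip.on": False}})
app.merge({"/": {"tools.gzip.on": True}})
```

The second call to "merge" overrides one entry and leaves the other intact;
the "log" handler is not called again, because the existing entries were
already handled when they were first merged.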

The specification of application objects excludes calling syntax by design;
their implementation, however, MAY include additional methods which are used
to associate them with an HTTP request, and even initiate the handling of
each request. For example, the reference implementation extends the spec
by adding a __call__ method which acts as a "WSGI application interface";
WSGI servers and middleware may then hand off request processing to such
an application object by calling it.

In addition, application objects MAY possess other attributes and methods
which consumers can use to differentiate them. For example, a consumer
might wish to use different application objects based on the "Accept" HTTP
request header, in which case a cooperating creator of application objects
could give each object an additional "accept" attribute.


2.2 Requests and Responses

The CherryPy Request API involves the creation and handling of Request and
Response objects, and also a caller. The caller is usually an HTTP server
(section 2.4), although it may act through intermediaries such as a WSGI
adapter (section 2.5) and/or an Engine (section 2.6). The rest of this
section uses "HTTP server" to mean any combination of calling code,
regardless of its architecture.

The API is quite simple, and consists of five steps:

2.2.1 The Request Object

An HTTP server obtains a request object by instantiating it directly.
Each HTTP request MUST result in a separate request object.

The constructor arguments for the request object are:

    local_host: an instance of http.Host corresponding to the server socket.
    remote_host: an instance of http.Host corresponding to the client socket.
    scheme: a string containing the protocol actually used for the HTTP
        conversation, lowercased. Usually, this will be either "http" or
        "https", but is open to extension. This should be provided by the
        server based on its own awareness of the conversation details;
        that is, it should not be obtained from any part of the request
        message itself.
    server_protocol: a string containing the HTTP-Version for which the
        server is at least conditionally compliant. Servers which meet all
        of the MUSTs in RFC 2616 should set this to "HTTP/1.1"; all others
        should use "HTTP/1.0" (lower versions are not explicitly supported).

Once the HTTP server obtains the request object, it is free to modify it in
any way it sees fit. Generally, this involves adding new server environment
attributes such as 'login', 'multithread', 'app', 'prev' and so on. Some such
additional attributes MAY be required by individual request implementations.

Request objects SHOULD use hooks (section 3.1) and tools (section 3.2) to
implement extensions.


2.2.2 The Response Object

The HTTP server obtains a response object by instantiating it; there are
no arguments. Each HTTP request MUST result in a separate response object.

Once the HTTP server obtains the response object, it is free to modify it
in any way it sees fit. Some additional attributes MAY be required by
individual response implementations.


2.2.3 Serving the Request and Response

Once the HTTP server has obtained a request and response object (and before
executing the request object, section 2.2.4), it MUST register them both via:

    cherrypy.serving.load(req, resp)

This makes the request and response objects available via cherrypy.request
and cherrypy.response, respectively.


2.2.4 Request Execution

When ready, the HTTP server calls the 'run' method of the Request.
It takes the following arguments; the first four SHOULD be obtained
directly from the HTTP Request-Line.

    * method: a string containing the HTTP request method token.
        Methods are case-sensitive.
    * path: a string containing the Request-URI, minus any query string.
        This string MUST be "% HEX HEX" decoded.
    * query_string: a string containing the query string from the URI.
        This string SHOULD NOT be "% HEX HEX" decoded.
    * req_protocol: a string containing the HTTP-Version of the request
        message; for example, "HTTP/1.1".

    * headers: a list of (name, value) tuples containing the request headers.
    * rfile: a file-like object containing the HTTP request entity.

The 'run' method handles the request in any way it sees fit. The only
constraint is that it MUST return the cherrypy.response object, which MUST
be the same object that the HTTP server created, and which MUST have the
following three attributes upon return:

    * status: a valid HTTP Status-Code and Reason-Phrase, e.g. "200 OK".
    * header_list: a list of (name, value) tuples of the response headers.
    * body: an iterable yielding strings.

The HTTP server SHOULD then use these response attributes to build the
outbound stream. Due to the vagaries of socket communications, and to
reduce the burden on server authors, the HTTP server MAY iterate over
the entire response body, or it may not. CherryPy application authors
should not assume that page handlers which are generators will run to
completion.


2.2.5 Cleanup

Regardless of whether the HTTP server iterates over the entire response
body or not, it MUST call the 'close' method of the request object
once it has finished with the body. The 'close' method takes no args,
and MUST be idempotent.

Once an HTTP server obtains a request object, it MUST call the 'close'
method, even if exceptions occur during the remainder of the process.
Once the 'close' method returns (or errors), the HTTP server SHOULD
delete all references to the request and response objects.

In addition, the HTTP server MUST clear the serving object as follows:

    cherrypy.serving.clear()
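
The five steps can be sketched end to end with stand-in objects. Nothing
below imports CherryPy: the Serving, Request, and Response classes are
hypothetical stubs that only mimic the calls named in this section, plain
tuples stand in for http.Host instances, and handle_one_request is an
invented name for the server-side code.

```python
import io


class Serving:
    """Stand-in for cherrypy.serving: holds the current request/response."""

    request = None
    response = None

    def load(self, request, response):
        self.request, self.response = request, response

    def clear(self):
        self.request = self.response = None


class Response:
    def __init__(self):
        self.status = None
        self.header_list = []
        self.body = []


class Request:
    def __init__(self, local_host, remote_host, scheme="http",
                 server_protocol="HTTP/1.1"):
        self.local_host = local_host
        self.remote_host = remote_host
        self.scheme = scheme
        self.server_protocol = server_protocol
        self.closed = False

    def run(self, method, path, query_string, req_protocol, headers, rfile):
        # Must return the same response object the server registered.
        response = serving.response
        response.status = "200 OK"
        response.header_list = [("Content-Type", "text/plain")]
        response.body = ["%s %s" % (method, path)]
        return response

    def close(self):
        # Idempotent: calling it again changes nothing.
        self.closed = True


serving = Serving()


def handle_one_request(method, path, query_string, headers, rfile):
    """The five steps, as an HTTP server might perform them."""
    request = Request(("127.0.0.1", 8080), ("127.0.0.1", 50000))  # 2.2.1
    response = Response()                                          # 2.2.2
    serving.load(request, response)                                # 2.2.3
    try:
        result = request.run(method, path, query_string, "HTTP/1.1",
                             headers, rfile)                       # 2.2.4
        return result.status, result.header_list, "".join(result.body)
    finally:
        request.close()                                            # 2.2.5
        serving.clear()


status, headers, body = handle_one_request(
    "GET", "/hello", "", [], io.BytesIO(b""))
```

The try/finally block is one way to guarantee that 'close' and the clearing
of the serving object happen even if 'run' raises.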


2.3 Dispatchers

A 'dispatcher' is the function or callable object which looks up the
'page handler' callable and collects config for the current request
based on the path_info, other request attributes, and the application
architecture.

The default dispatcher discovers the page handler by matching path_info
to a hierarchical arrangement of objects, starting at request.app.root.
Other dispatchers MAY use other techniques to map the given URI (and
other message parameters) to the proper handler.
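
As a rough illustration of matching path_info against a hierarchy of
objects, the sketch below walks attribute names from a root object and
falls back to 'index' and 'default' methods as described in section 1.4.
It is a simplification written for this document, not the default
dispatcher itself: find_handler, Root, and Blog are hypothetical names,
and LookupError stands in for a "not found" response.

```python
def find_handler(root, path_info):
    """Walk the object tree along path_info; return (handler, extra_args)."""
    names = [n for n in path_info.strip("/").split("/") if n]
    node = root
    fallback = None   # deepest exposed 'default' seen so far
    for i, name in enumerate(names):
        default = getattr(node, "default", None)
        if getattr(default, "exposed", False):
            fallback = (default, names[i:])
        child = getattr(node, name, None)
        if child is None:
            break
        node = child
    else:
        # Every segment matched an attribute.
        if callable(node) and getattr(node, "exposed", False):
            return node, []
        for special in ("index", "default"):
            method = getattr(node, special, None)
            if getattr(method, "exposed", False):
                return method, []
    if fallback is None:
        raise LookupError(path_info)
    return fallback


class Blog:
    def index(self):
        return "posts"
    index.exposed = True

    def default(self, *segments):
        return "post " + "/".join(segments)
    default.exposed = True


class Root:
    blog = Blog()

    def index(self):
        return "home"
    index.exposed = True

    def secret(self):
        return "hidden"   # not exposed, so never served


def serve(path_info):
    handler, args = find_handler(Root(), path_info)
    return handler(*args)
```

Here serve("/blog/2008/intro") is answered by Blog.default, which receives
the unmatched segments as positional arguments, while serve("/secret")
raises LookupError because the method lacks the 'exposed' attribute.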

2.3.1 Invocation

Request objects MUST look up and call a dispatcher as early as possible
after the headers are read and parsed, and MUST pass a single 'path_info'
argument to the dispatcher.

Dispatchers MUST be callable, and MUST take a single 'path_info' argument
(a string). When called, they MUST set request.handler and request.config.
In addition, if the handler is an "index" handler (designed to map to URI's
which end in a slash ("/")), the dispatcher SHOULD set request.is_index
to True.

2.3.2 request.handler

The value bound to request.handler MUST be a callable object that takes
no arguments. Note that instances of the builtin exceptions HTTPError,
NotFound, and HTTPRedirect may be set as handlers, if appropriate.

Because request.handler MUST take no arguments, it MAY be wrapped in an
intermediary object which calls the "real" handler, allowing the "real"
handler to be passed arguments which have been stored in the intermediary.
For example, the LateParamPageHandler in the reference implementation
wraps the "real" handler so that it can decide which arguments to pass
to the handler (and can decide as late as possible). Such intermediaries
SHOULD provide read/write access to the wrapped handler and to the
positional and keyword arguments which they will eventually pass to it.
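
A minimal sketch of such an intermediary, together with a dispatcher that
installs it, might look like the following. The names PageHandler,
make_dispatcher, and not_found are hypothetical, the Request class is a bare
stand-in, and the dispatcher reaches the request through a closure rather
than through cherrypy.request. Treating any path that ends in a slash as an
"index" path is also a simplification.

```python
class PageHandler:
    """Zero-argument callable that stores arguments for the real handler."""

    def __init__(self, func, *args, **kwargs):
        self.callable = func   # read/write: the wrapped handler
        self.args = args       # read/write: positional arguments
        self.kwargs = kwargs   # read/write: keyword arguments

    def __call__(self):
        return self.callable(*self.args, **self.kwargs)


class Request:
    """Bare stand-in holding only what a dispatcher must set."""
    handler = None
    config = None
    is_index = False


def not_found():
    # Stand-in for a handler that produces a 404 response.
    raise LookupError("no handler for this path")


def make_dispatcher(request, routes):
    """Return a dispatcher; `routes` maps path_info strings to callables."""
    def dispatch(path_info):
        request.config = {}   # a new dict for every request
        func = routes.get(path_info)
        if func is None:
            request.handler = not_found
        else:
            request.handler = PageHandler(func)
            request.is_index = path_info.endswith("/")
    return dispatch


def greet(name="world"):
    return "Hello, %s!" % name


request = Request()
dispatch = make_dispatcher(request, {"/greet": greet})
dispatch("/greet")
# Arguments can be filled in late, e.g. after the request body is parsed.
request.handler.kwargs["name"] = "CherryPy"
body = request.handler()
```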

2.3.3 request.config

The value bound to request.config MUST be a new dict object (that is,
not shared between requests) and MUST contain all entries found in
cherrypy.config, and any entries found in cherrypy.request.app.config
which apply to the current path_info or one of its hierarchical ancestors.
Entries from app.config MUST override entries from cherrypy.config,
and multiple entries in app.config MUST be collapsed into a single
entry by retaining the value with the longest URI path.
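
One way to picture the collapsing rule is to apply the matching app.config
sections from the shortest path to the longest, so that the longest path is
applied last and wins. The function below is a sketch under that reading;
collect_config and the sample entries are invented for illustration, and
_cp_config handling is left out.

```python
def collect_config(global_config, app_config, path_info):
    """Build a fresh per-request dict; the longest matching path wins."""
    merged = dict(global_config)   # new dict, never shared between requests
    segments = [s for s in path_info.split("/") if s]
    # "/", "/a", "/a/b", ...: ancestors first, the full path last.
    trail = ["/"] + ["/" + "/".join(segments[:i + 1])
                     for i in range(len(segments))]
    for section in trail:
        merged.update(app_config.get(section, {}))
    return merged


global_config = {"log.screen": True, "tools.gzip.on": False}
app_config = {
    "/": {"tools.gzip.on": True},
    "/static": {"tools.staticdir.on": True},
    "/static/big": {"tools.gzip.on": False},
    "/other": {"tools.caching.on": True},
}

big = collect_config(global_config, app_config, "/static/big/file.bin")
logo = collect_config(global_config, app_config, "/static/logo.png")
```

For "/static/big/file.bin" the entry from "/static/big" overrides the one
from "/", while "/static/logo.png" inherits gzip from "/". Entries under
"/other" apply to neither path.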

The request.config dict SHOULD also contain _cp_config entries from handler
methods and their containers (such as controller classes) and merge those
values into request.config. However, since the very nature of different
dispatchers is to enable different controller architectures, the decision
of where to attach and collect _cp_config entries is dispatcher-specific.
Also, dispatchers SHOULD allow app.config entries to override _cp_config
entries; this allows deployers to more easily override developer defaults.

Dispatchers may be nested, and therefore a given dispatcher MAY call
another and pass it a different 'path_info' argument (for example,
the builtin VirtualHost dispatcher adds a prefix to the path_info
value it receives before calling the next dispatcher). Some consumers
may even wish to attach dispatchers as methods on their controller
classes (which would then presumably set request.handler to a found
method of that controller).


2.4 HTTP Servers

An "HTTP server" is a component "that accepts connections in order
to service [HTTP] requests by sending back [HTTP] responses."
"HTTP communication usually takes place over TCP/IP connections."

Server objects MUST possess the following attributes:

    * protocol_version: a string containing the HTTP-Version for which
        the server is at least conditionally compliant.
    * start: a method which starts the HTTP server. In order to make servers
        easier to write, this method MAY block until the server is stopped
        or interrupted.
    * ready: a boolean state flag, which the server MUST set internally to
        signal whether or not it is ready to receive requests from clients.
    * stop: a method which stops the HTTP server. This method MUST block
        until the server is truly stopped (all threads idle or shut down
        and all sockets closed, including the listening socket).
    * restart: a method which calls stop, then start.
    * max_request_body_size:
    * max_request_header_size:
    * thread_pool:

Servers which communicate over TCP SHOULD possess these additional attributes:

    * reverse_dns:
    * socket_file:
    * socket_host:
    * socket_port:
    * socket_queue_size:
    * socket_timeout:

Servers which use SSL SHOULD possess these additional attributes:

    * ssl_certificate:
    * ssl_private_key:


2.5 WSGI

See PEP 333.


2.6 Engines

Engine objects MUST possess the following attributes:

    * state: a state flag, one of:
        * STOPPED = 0
        * STARTING = None
        * STARTED = 1
    * block: a method which MUST block until the 'state' is STOPPED or an
        exception is raised. This allows a main thread to wait while child
        threads respond to HTTP requests. If any exception is raised, the
        method SHOULD call its own 'stop' method. If KeyboardInterrupt or
        SystemExit is raised, the method MUST call server.stop.
    * restart: a method which MUST call the 'stop' method, and then the
        'start' method.
    * start: a method which takes a single optional 'blocking' argument.
        If True, the 'start' method MUST call the 'block' method.
        The 'start' method MAY temporarily set 'state' to STARTING,
        but MUST set it to STARTED before either returning or blocking.
    * stop: a method which MUST set 'state' to STOPPED. Note that this
        will signal any thread which has called 'block' to stop blocking.
    * wait: a method which MUST block until the 'state' is STARTED.
        This allows a main thread to wait until the engine has started
        without having to block after that point.
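
The state transitions above can be sketched with a condition variable from
the standard library. This is an illustrative model only: it starts no
servers or worker threads, so the requirement to call server.stop on
KeyboardInterrupt or SystemExit is represented by the engine's own 'stop',
and the private helper _set_state is an assumption of the sketch.

```python
import threading

STOPPED, STARTING, STARTED = 0, None, 1


class Engine:
    def __init__(self):
        self.state = STOPPED
        self._changed = threading.Condition()

    def _set_state(self, state):
        # Change the state and wake every thread in block() or wait().
        with self._changed:
            self.state = state
            self._changed.notify_all()

    def start(self, blocking=False):
        self._set_state(STARTING)
        # A real engine would start servers and worker threads here.
        self._set_state(STARTED)
        if blocking:
            self.block()

    def stop(self):
        self._set_state(STOPPED)

    def restart(self):
        self.stop()
        self.start()

    def wait(self):
        with self._changed:
            self._changed.wait_for(lambda: self.state == STARTED)

    def block(self):
        try:
            with self._changed:
                self._changed.wait_for(lambda: self.state == STOPPED)
        except BaseException:
            # Covers KeyboardInterrupt and SystemExit as well.
            self.stop()
            raise


engine = Engine()
engine.start()
engine.wait()                       # returns at once: already STARTED
state_after_start = engine.state

# Another thread stops the engine shortly after the main thread blocks.
threading.Timer(0.05, engine.stop).start()
engine.block()
state_after_block = engine.state
```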

3 Extensions

3.1 Hooks

Hooks are optional callables which are invoked at various points in the
request-handling process. They MAY be declared (attached) by the core,
by application developers, and by deployers.

3.1.1 Hook points

Each hook callable is bound to a "hook point", a named calling point
inside the request-handling process. The exact list of available hook
points is flexible, and SHOULD be specified by the request object
(section 2.2.1). Request objects SHOULD implement the following
hook points, and SHOULD call them according to the corresponding
descriptions:

    * on_start_resource: called after the headers are read and parsed,
        and a page handler is located.
    * before_request_body: called just before the request entity body
        is read from the incoming stream.
    * before_handler: called just before the page handler is called.
    * before_finalize: called just before the response entity is checked
        for validity. For page handlers which buffer their output, this
        should be called after the entire response body has been buffered.
        For page handlers which stream their output, this should be called
        after the generator has been returned, but before it has been
        iterated over. This may be called more than once if errors occur.
    * on_end_resource: called just before the "run" method of the request
        object returns.
    * on_end_request: called after the entire response message has been
        written out to the client. This allows hook callables to run
        after unbuffered page handlers have terminated. In general,
        this should be run inside the request object's "close" method.
    
    * before_error_response: called just before generating a response
        due to an unexpected exception.
    * after_error_response: called just after generating a response
        due to an unexpected exception.

3.1.2 Hook objects

In order to facilitate the declaration, inspection, and invocation of hook
callables, each one MUST be wrapped in a Hook object. Each Hook object MUST
possess the following attributes:

    * callback: The hook callable that this Hook object is wrapping,
        which will be called when the Hook is called.
    * failsafe: If True, the callback MUST be guaranteed to run even if
        other callbacks from the same call point raise any exceptions
        (other than KeyboardInterrupt and SystemExit). Because errors
        may be silenced by failsafe hooks, unexpected exceptions which
        occur during the execution of a hook MUST be logged.
    * priority: Defines the order of execution for a list of Hooks at
        the same hook point. Priority numbers SHOULD be limited to the
        closed interval [0, 100], but values outside this range are
        acceptable, as are fractional values.
    * kwargs: A set of keyword arguments that will be passed to the
        callable on each call.
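
The interplay of 'priority' and 'failsafe' can be sketched as follows. The
sketch assumes that lower priority numbers run first and that 50 is the
default priority; this section does not fix either choice. The run_hooks
function and the sample callbacks are hypothetical.

```python
import logging


class Hook:
    def __init__(self, callback, failsafe=False, priority=50, **kwargs):
        self.callback = callback
        self.failsafe = failsafe
        self.priority = priority
        self.kwargs = kwargs

    def __call__(self):
        return self.callback(**self.kwargs)


def run_hooks(hooks):
    """Run hooks in priority order; failsafe hooks run even after an error."""
    first_error = None
    for hook in sorted(hooks, key=lambda h: h.priority):
        if first_error is not None and not hook.failsafe:
            continue   # an earlier hook failed; only failsafe hooks still run
        try:
            hook()
        except (KeyboardInterrupt, SystemExit):
            raise
        except Exception as exc:
            # Errors MUST be logged, since failsafe hooks may mask them.
            logging.exception("hook %r failed", hook.callback)
            if first_error is None:
                first_error = exc
    if first_error is not None:
        raise first_error


calls = []


def record(name):
    calls.append(name)


def boom():
    raise ValueError("boom")


hooks = [
    Hook(record, priority=80, failsafe=True, name="cleanup"),
    Hook(boom, priority=20),
    Hook(record, priority=60, name="skipped"),
    Hook(record, priority=10, name="first"),
]

try:
    run_hooks(hooks)
except ValueError:
    calls.append("error re-raised")
```

The hook at priority 60 is skipped because an earlier hook failed, but the
failsafe hook at priority 80 still runs before the error propagates.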

3.2 Tools

The Tool interface allows pluggable extensions, both simple and complex,
to be declared by a uniform API. It also allows request objects to run
code between the page handler lookup (section 2.3.2) and the first hook
(section 3.1). This is essential to provide dynamic hook declarations
based on the configuration in effect for each request.

Tool objects MUST possess a single "_setup" method which takes no arguments.
This method MUST be called after the request.handler has been obtained,
and before the first hook point is reached. The reference implementation
uses toolboxes (section 3.3), each with its own configuration namespace
(section 3.4.2), to accomplish this. Tools SHOULD belong to a toolbox.
The "_setup" method SHOULD attach hooks in order to invoke functionality
at appropriate points in the request process.
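
A sketch of a "_setup" method that attaches a hook only when the request's
configuration enables the tool is shown below. It assumes a "tools." prefix
and an "on" entry for each tool, in the style of the reference
implementation's config names. The Request stub, the request_getter
argument, and the sample "stamp" tool are hypothetical.

```python
class Request:
    """Stand-in for the request: per-request config plus attached hooks."""

    def __init__(self, config):
        self.config = config
        self.hooks = {}   # hook point name -> list of (callback, kwargs)


class Tool:
    def __init__(self, name, point, callback, request_getter):
        self.name = name
        self.point = point
        self.callback = callback
        # Called with no arguments to find the request being served.
        self._get_request = request_getter

    def _setup(self):
        """Attach this tool's hook if the request's config turns it on."""
        request = self._get_request()
        prefix = "tools.%s." % self.name
        conf = {key[len(prefix):]: value
                for key, value in request.config.items()
                if key.startswith(prefix)}
        if conf.pop("on", False):
            hooks = request.hooks.setdefault(self.point, [])
            hooks.append((self.callback, conf))


def add_header(name, value):
    return (name, value)


current = Request({
    "tools.stamp.on": True,
    "tools.stamp.name": "X-Served-By",
    "tools.stamp.value": "sketch",
    "tools.gzip.on": False,
})
stamp = Tool("stamp", "before_finalize", add_header, lambda: current)
gzip = Tool("gzip", "before_finalize", lambda: None, lambda: current)

# After request.handler is known and before the first hook point:
for tool in (stamp, gzip):
    tool._setup()
```

Only the "stamp" tool attaches a hook, because "tools.gzip.on" is False for
this request.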

3.2.1 Decorators

Tool objects SHOULD be callable, and this feature SHOULD be used as a
decorator to declare that a given tool applies to a given handler. For
example, given a Tool object called "tools.proxy", the following code
snippet would enable the tool for the given handler:

    @tools.proxy(base="https://www.mydomain.cz")
    def whats_my_base(self):
        return cherrypy.request.base
    whats_my_base.exposed = True

Note in particular that the Tool object must be called to be used in
this fashion. This allows application developers to supply keyword
arguments to the decorator that will then be used by the tool when
its "_setup" method is called. That is, the following code is not
expected to work (its behavior is undefined by this specification),
since tools.proxy is used as a decorator itself, rather than the
result of tools.proxy():

    @tools.proxy
    def whats_my_base(self):
        return cherrypy.request.base
    whats_my_base.exposed = True

Note also that the reference implementation does not wrap the original
function; instead, it asserts that the decorated handler function has a
configuration attribute (section 3.4.3) which enables the tool. Tool
implementations SHOULD do likewise.
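
The "do not wrap" behaviour can be sketched as a decorator factory that
records configuration on the function and returns the same function object.
The Tool class below is a hypothetical stand-in; the "_cp_config" attribute
name comes from section 1.4, while the "tools.<name>.on" key layout is an
assumption modelled on the dotted names of section 3.4.2.

```python
class Tool:
    def __init__(self, name, namespace="tools"):
        self.name = name
        self.namespace = namespace

    def __call__(self, **kwargs):
        """Return a decorator that records config on the handler."""
        def decorate(func):
            # Copy, so a dict shared with other functions is not mutated.
            conf = dict(getattr(func, "_cp_config", {}))
            prefix = "%s.%s." % (self.namespace, self.name)
            conf[prefix + "on"] = True
            for key, value in kwargs.items():
                conf[prefix + key] = value
            func._cp_config = conf
            return func   # the original function, not a wrapper
        return decorate


proxy = Tool("proxy")


def whats_my_base():
    return "unchanged"


# Equivalent to writing @proxy(base=...) above the definition.
decorated = proxy(base="https://www.mydomain.cz")(whats_my_base)
```

Because the function is returned unchanged, calling it directly (for example
in a unit test) bypasses the tool entirely; the tool only takes effect when
the request machinery reads the attached configuration.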

3.2.2 Callables

Tool objects SHOULD expose an attribute named "callable", which allows
the functionality of the tool to be invoked anywhere, most likely from
within a page handler. If the tool object does not have invokable
functionality, or if it uses cooperating hooks that are not useful
in isolation, it SHOULD NOT expose the "callable" attribute.

3.2.3 Handlers

Some tools are designed to circumvent the normal calling of a page handler;
for example, a tool which finds static files and serves them as the response
does not need to then call a separate handler. Such tools SHOULD expose a
"handler" method, which allows the tool to be declared in place of a "normal"
page handler method:

    from cherrypy.tools import staticdir
    
    class Root:
        nav = staticdir.handler(section="/nav", dir="nav", root=absDir)

The "handler" method, if provided, MUST return a callable which can be used
as a request.handler callable. That callable SHOULD have its "exposed"
attribute set to True before being returned from the "handler" method.

The reference implementation includes a HandlerTool class which implements
these recommendations.

3.3 Toolboxes

A toolbox is a set of tools sharing a single namespace. CherryPy uses the
"tools" namespace for the built-in tools. Distinct toolboxes should be
unaware of each other.

3.4 Configuration

In CherryPy, "configuration" refers to the (declarative) values and attributes
which affect the (imperative) behavior of a running program. Implementations
MUST provide a means of declaring configuration values (indeed, they can
hardly prevent normal code from being one); they MAY do so in formats other
than Python code (such as INI-style config files).

3.4.1 Scopes

CherryPy configuration is separated in several ways, each set of boundaries
mapping directly to some user need.

Configuration data MUST allow for two independent layers: that which applies
to a single application and that which applies to ALL applications. The former
is called "(per-)application" config, and the latter is called "global"
(or "site-wide") config.

Application config is further separated by URI in a hierarchical fashion.
That is, each configuration entry for a given URI MUST apply to that URI
and all its child URI's (all URI's that begin with the given URI), unless
explicitly counteracted by an opposing entry for a child URI.

In some cases, two different applications may share a common URI. For example,
a WSGI dispatcher may choose one over another based on the contents of the
"Accept" header. A more common example occurs when one application is "mounted"
at "/" and another mounted at "/foo". When this occurs, the configuration of
each application MUST be isolated to that application; that is, configuration
entries from one application MUST NOT "leak" into another, even if they share
the same URI-space.

3.4.2 Namespaces

CherryPy config entries, whether global- or application-scoped, SHOULD be
"namespaced"; that is, they should use a hierarchical naming scheme for
the keys. The reference implementation, for example, adopts the Python
"dotted attribute" notation, so that e.g. "tools.sessions.name" refers
to a "tools" container (object) with a "sessions" attribute, and a "name"
subattribute. This allows the parsing and activation of configuration data
to be controlled by smaller "handler" components (at the least, one for each
top-level namespace), rather than by a monolithic parser.

3.4.2.1 Namespace handlers

Namespace handlers are objects which parse and activate configuration
entries based on a hierarchy. In order to reduce confusion and allow for
easy extension, CherryPy implementations SHOULD use sets of namespace
handlers exclusively for this task.

Each handler in a set MUST be either a callable which takes a key and a
value argument, or a Python 2.5-style context manager [1] whose __enter__
method returns such a callable. The "key" argument MUST be a string, and
that key MAY include further hierarchical delimiters (which the callable will
parse on its own). The value's type and range are variable for each entry.
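
Both kinds of handler can be driven by the same loop, as in the sketch
below. The apply_namespaces function, the ToolboxNamespace class, and the
sample entries are hypothetical; the split on the first dot follows the
"dotted attribute" notation of section 3.4.2.

```python
def apply_namespaces(config, namespaces):
    """Route each 'namespace.key' entry to the handler for its namespace."""
    for namespace, handler in namespaces.items():
        prefix = namespace + "."
        entries = [(key[len(prefix):], value)
                   for key, value in config.items()
                   if key.startswith(prefix)]
        if hasattr(handler, "__enter__"):
            # Context manager: __enter__ returns the (key, value) callable.
            with handler as callback:
                for key, value in entries:
                    callback(key, value)
        else:
            for key, value in entries:
                handler(key, value)


class ToolboxNamespace:
    """Context-manager handler: collects entries, then acts on exit."""

    def __init__(self):
        self.enabled = []
        self._pending = {}

    def __enter__(self):
        self._pending = {}

        def collect(key, value):
            # The key still contains a delimiter: "<tool>.<option>".
            tool, _, option = key.partition(".")
            self._pending.setdefault(tool, {})[option] = value
        return collect

    def __exit__(self, exc_type, exc, tb):
        self.enabled = sorted(tool for tool, conf in self._pending.items()
                              if conf.get("on"))
        return False   # never swallow exceptions


server_settings = {}
toolbox = ToolboxNamespace()

apply_namespaces(
    {
        "server.socket_port": 8080,
        "tools.gzip.on": True,
        "tools.gzip.mime_types": ["text/html"],
        "tools.proxy.on": False,
    },
    {"server": server_settings.__setitem__, "tools": toolbox},
)
```

The context-manager form is useful when a handler needs to see all of its
entries before acting, as the toolbox does here when it decides which tools
are enabled.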

3.4.3 Page Handler Attributes

In addition to allowing application developers and deployers to associate
configuration with specific URI's, the implementation SHOULD allow them
to associate configuration entries with specific page handlers. Because
the mapping of URI's to page handlers is not 1:1, this allows maximum
developer flexibility.


4 Footnotes and References

[1] For a complete discussion of the use and requirements of context
    managers, see http://www.python.org/dev/peps/pep-0343/
