
Title: Python Web Server Gateway Interface v2.0
Version: $Revision: 71593 $
Last-Modified: $Date: 2009-04-13 13:58:19 -0700 (Mon, 13 Apr 2009) $
Author: Armin Ronacher <armin.ronacher@active-4.com>
Discussions-To: Python Web-SIG <web-sig@python.org>
Status: Draft
Type: Informational
Content-Type: text/x-rst
Created: 19-Sep-2009


This document specifies a proposed standard interface between web
servers and Python web applications or frameworks, to promote web
application portability across a variety of web servers.

It supersedes :pep:`0333` for unicode-aware applications on both
Python 2.x and Python 3.

Rationale and Goals

Starting with Python 3.0, Python now features two distinct string
types for text and binary data.  This also made it necessary to
specify a new revision of WSGI that is based on unicode.

Specification Overview

This specification only highlights the differences between WSGI 1.0
and WSGI 2.0.

String Types

The following string types are used throughout the specification:

-   byte string
-   unicode string
-   native string

A 'native string' is the primary string type for a particular Python
implementation.  For Python 2.x this is a byte string; for Python 3.x
it is a unicode string.

=========== =============== ===============
String type Python 2.x      Python 3.x
=========== =============== ===============
native      `str` (bytes)   `str` (unicode)
bytes       `str`           `bytes`
unicode     `unicode`       `str`
=========== =============== ===============
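
As a rough illustration of the table above, the three string types can
be told apart at runtime as follows.  This is a minimal sketch only: the
names `PY2`, `text_type`, `binary_type` and `to_native` are hypothetical
helpers and are not part of this specification.

```python
import sys

# Hypothetical helper names; nothing here is mandated by the specification.
PY2 = sys.version_info[0] == 2

# The conditional expression only evaluates the selected branch, so the
# name ``unicode`` is never looked up on Python 3.
text_type = unicode if PY2 else str    # unicode string type
binary_type = str if PY2 else bytes    # byte string type


def to_native(value, encoding='iso-8859-1'):
    """Convert a byte or unicode string to the native string type."""
    if isinstance(value, str):
        return value                    # already native on both versions
    if PY2:
        return value.encode(encoding)   # unicode -> byte string
    return value.decode(encoding)       # bytes -> unicode string
```

The default of `iso-8859-1` mirrors the encoding this document uses for
CGI variables; a caller could pass a different one.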

Differences to WSGI 1.0

Headers and Environment

- The application is passed an instance of a Python dictionary containing what
  is referred to as the WSGI environment.  All keys in this dictionary are
  native strings.  Where native strings are unicode strings, the names of
  CGI variables are decoded as `iso-8859-1`.

- For the WSGI variables ``'wsgi.url_scheme'`` and ``'wsgi.uri_encoding'``
  contained in the WSGI environment, the value of the variable should be a
  unicode string.

- For the CGI variables contained in the WSGI environment, the values of the
  variables are unicode strings.  `iso-8859-1` encoding is used for
  decoding such that the original character data is preserved and as necessary
  the unicode string can be converted back to bytes and thence decoded to
  unicode again using a different encoding.  (URI values are an exception;
  see the URI Decoding section.)

- The WSGI input stream ``'wsgi.input'`` contained in the WSGI environment and
  from which request content is read, MUST yield byte strings.

- The status line specified by the WSGI application should be a unicode string
  but might also be a byte string.  If a unicode string is used, the
  server encodes the value as `iso-8859-1`.

- The list of response headers specified by the WSGI application should
  contain tuples consisting of two values, where each value is a unicode or
  byte string.  If a unicode string is used it is encoded as `iso-8859-1`.

- The iterable returned by the application and from which response content
  is derived, MUST yield byte strings.

- The version information in the WSGI environment (`wsgi.version`) is ``(2, 0)``.
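
The rules above can be sketched as a small application together with a
toy driver that plays the role of the server.  This assumes Python 3;
`HTTP_X_NAME`, `to_wire` and `call_once` are made-up names used only for
illustration, and a real server does considerably more.

```python
def application(environ, start_response):
    # CGI values arrive as unicode strings decoded with iso-8859-1.
    # Encoding with the same codec recovers the original bytes, which this
    # application then chooses to interpret as utf-8.  HTTP_X_NAME is a
    # made-up header used only for illustration.
    raw = environ.get('HTTP_X_NAME', '').encode('iso-8859-1')
    name = raw.decode('utf-8', 'replace')

    body = ('Hello, %s' % name).encode('utf-8')   # response body: bytes
    headers = [
        ('Content-Type', 'text/plain; charset=utf-8'),
        ('Content-Length', str(len(body))),
    ]
    start_response('200 OK', headers)             # unicode status line
    return [body]


def to_wire(value):
    """What a server might do with a status line or header value."""
    if isinstance(value, bytes):
        return value                     # byte strings pass through
    return value.encode('iso-8859-1')    # unicode is encoded as iso-8859-1


def call_once(app, environ):
    """Toy driver: run the app and return wire-level status, headers, body."""
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured['status'] = to_wire(status)
        captured['headers'] = [(to_wire(k), to_wire(v)) for k, v in headers]

    body = b''.join(app(environ, start_response))
    return captured['status'], captured['headers'], body
```

Because every byte value maps to exactly one `iso-8859-1` character, the
encode step in `application` always succeeds and loses no data.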

URI Decoding

For the keys ``SCRIPT_NAME`` and ``PATH_INFO`` (and ``REQUEST_URI`` if
available, although that variable will most likely contain only ASCII
characters because it is quoted), the server has to use the following
algorithm for decoding:

-   it decodes all values as `utf-8`.
-   if that fails, it decodes all values as `iso-8859-1`.

The latter will always work.  The encoding the server used to decode the
value is then stored in ``'wsgi.uri_encoding'``.  The application MUST use this
value to decode the ``'QUERY_STRING'`` as well.

Example implementation (this assumes that `path_info_bytes` and
`script_name_bytes` are bytes)::

    try:
        path_info = path_info_bytes.decode('utf-8')
        script_name = script_name_bytes.decode('utf-8')
        uri_encoding = 'utf-8'
    except UnicodeError:
        path_info = path_info_bytes.decode('iso-8859-1')
        script_name = script_name_bytes.decode('iso-8859-1')
        uri_encoding = 'iso-8859-1'

    environ['PATH_INFO'] = path_info
    environ['SCRIPT_NAME'] = script_name
    environ['wsgi.uri_encoding'] = uri_encoding
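
On the application side, decoding the query string with the recorded
encoding might look like the sketch below.  It assumes Python 3 and a
percent-encoded ``QUERY_STRING``; `parse_query` is a hypothetical helper
name.

```python
from urllib.parse import parse_qs


def parse_query(environ):
    """Parse QUERY_STRING using the encoding the server recorded."""
    # The server stores the codec it used for PATH_INFO and SCRIPT_NAME
    # in 'wsgi.uri_encoding'; percent-escapes are decoded with the same one.
    encoding = environ['wsgi.uri_encoding']
    # keep_blank_values=True so that parameters such as 'a=' are kept.
    return parse_qs(environ.get('QUERY_STRING', ''),
                    keep_blank_values=True,
                    encoding=encoding)
```

This sketch does not handle raw (unquoted) non-ASCII characters in the
query string; those would first have to be encoded back to bytes with
`iso-8859-1`.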

URI Re-decoding

A middleware may re-decode the values only if `wsgi.uri_encoding` is
`iso-8859-1`.  If it re-decodes them, it must update the value of
`wsgi.uri_encoding` as well.  For example, if an application accepts
legacy URLs encoded as `iso-8859-7`, the following middleware might be
used::

    import codecs
    iso_8859_1 = codecs.lookup('iso-8859-1')

    def redecode(string):
        return string.encode('iso-8859-1').decode('iso-8859-7')

    def latin7_fallback_middleware(application):
        def new_application(environ, start_response):
            if codecs.lookup(environ['wsgi.uri_encoding']) == iso_8859_1:
                environ['PATH_INFO'] = redecode(environ['PATH_INFO'])
                environ['SCRIPT_NAME'] = redecode(environ['SCRIPT_NAME'])
                environ['wsgi.uri_encoding'] = 'iso-8859-7'
            return application(environ, start_response)
        return new_application
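
To see why the round trip works, the following sketch walks one Greek
path through the server-side decoding and the re-decoding step.  It
assumes Python 3; `server_decode` is a hypothetical stand-in for the
server logic from the URI Decoding section.

```python
def server_decode(raw):
    """Decode URI bytes the way the URI Decoding section describes."""
    try:
        return raw.decode('utf-8'), 'utf-8'
    except UnicodeError:
        return raw.decode('iso-8859-1'), 'iso-8859-1'


def redecode(string, encoding='iso-8859-7'):
    # Undo the server's iso-8859-1 decoding, then apply the legacy codec.
    return string.encode('iso-8859-1').decode(encoding)


# A legacy client sends a Greek path encoded as iso-8859-7.
original = '/\u03b1\u03b2\u03b3'
raw = original.encode('iso-8859-7')

# These bytes are not valid utf-8, so the server falls back to iso-8859-1.
path_info, uri_encoding = server_decode(raw)

# The middleware step recovers the text the client intended.
restored = redecode(path_info)
```

Note that the second decode may raise `UnicodeDecodeError` if the bytes
are not valid in the legacy encoding, so a production middleware would
probably want to catch that and leave the values untouched.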

URI Encoding

If the application encodes URIs, it is required to encode them as
`utf-8`, independent of the value of `wsgi.uri_encoding`.
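
A minimal sketch of this rule, assuming Python 3 and the standard
library's `urllib.parse.quote`; `encode_path` is a hypothetical helper
name.

```python
from urllib.parse import quote


def encode_path(path):
    """Percent-encode a unicode path as utf-8, whatever the request used."""
    # ``safe='/'`` keeps path separators readable; this is a choice made
    # for the sketch, not a requirement of the specification.
    return quote(path, safe='/', encoding='utf-8')
```

The function deliberately ignores `wsgi.uri_encoding`: even when the
incoming request was decoded as `iso-8859-1`, generated URLs use `utf-8`.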


The WSGI server has to provide a `write()` function that works exactly
like the function in WSGI 1.0, but it is required to emit a deprecation
warning that this function is obsolete.  `write()` will be removed in a
later revision of WSGI that builds on WSGI 2.0.

The same rule applies to the `exc_info` parameter of the `start_response`
function.  If this parameter is used in WSGI 2.0, the server must still
handle it, but must also emit a deprecation warning.
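
One way a server could satisfy both requirements is sketched below using
the standard `warnings` module.  `make_start_response` and the warning
messages are hypothetical, and the actual handling of `exc_info` is left
out.

```python
import warnings


def make_start_response(output):
    """Build a toy start_response that appends written bytes to `output`."""

    def write(data):
        # Still functional, but flagged as obsolete on every call.
        warnings.warn('write() is deprecated in WSGI 2.0',
                      DeprecationWarning, stacklevel=2)
        output.append(data)

    def start_response(status, headers, exc_info=None):
        if exc_info is not None:
            warnings.warn('exc_info is deprecated in WSGI 2.0',
                          DeprecationWarning, stacklevel=2)
            # A real server would still handle exc_info here as WSGI 1.0
            # requires; that logic is omitted from this sketch.
        return write

    return start_response
```

Using `stacklevel=2` attributes the warning to the application code that
called `write()` or passed `exc_info`, rather than to the server itself.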


This document has been placed in the public domain.

..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70