Title: Python Web Server Gateway Interface v2.0
Version: $Revision: 71593 $
Last-Modified: $Date: 2009-04-13 13:58:19 -0700 (Mon, 13 Apr 2009) $
Author: Armin Ronacher <firstname.lastname@example.org>
Discussions-To: Python Web-SIG <email@example.com>
This document specifies a proposed standard interface between web
servers and Python web applications or frameworks, to promote web
application portability across a variety of web servers.
It supersedes :pep:`0333` for unicode-aware applications on both
Python 2.x and Python 3.
Rationale and Goals
Starting with Python 3.0, Python features two distinct string
types for text and binary data. This made it necessary to
specify a new revision of WSGI that is based on unicode.
This specification only highlights the differences between WSGI 1.1
and WSGI 2.0.
The following string types are used throughout the specification:
- byte string
- unicode string
- native string
A 'native string' is the primary string type of a particular Python
implementation: a byte string on Python 2.x and a unicode string on
Python 3.x.
===========  ===============  ===============
String type  Python 2.x       Python 3.x
===========  ===============  ===============
native       `str` (bytes)    `str` (unicode)
bytes        `str`            `bytes`
unicode      `unicode`        `str`
===========  ===============  ===============
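
As a rough illustration of the three string types, the following sketch
converts a value to the native string type on either Python line. The helper
name `to_native` and the default `iso-8859-1` encoding are assumptions made
for this example and are not part of the specification.

```python
import sys

PY2 = sys.version_info[0] == 2
text_type = type(u'')   # `unicode` on Python 2.x, `str` on Python 3.x
bytes_type = type(b'')  # `str` on Python 2.x, `bytes` on Python 3.x


def to_native(value, encoding='iso-8859-1'):
    """Return `value` as the native string type (hypothetical helper)."""
    if PY2:
        # Native strings are byte strings here: encode unicode input.
        return value.encode(encoding) if isinstance(value, text_type) else value
    # Native strings are unicode strings here: decode byte input.
    return value.decode(encoding) if isinstance(value, bytes_type) else value
```
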
Differences to WSGI 1.0
Headers and Environment
- The application is passed an instance of a Python dictionary containing what
is referred to as the WSGI environment. All keys in this dictionary are
native strings. The names of CGI variables are limited to `iso-8859-1`,
so where native strings are unicode strings, that encoding is used for
the names of CGI variables.
- For the WSGI variables ``'wsgi.url_scheme'`` and ``'wsgi.uri_encoding'``
contained in the WSGI environment, the value of the variable should be a
native string.
- For the CGI variables contained in the WSGI environment, the values of the
variables are unicode strings. They are decoded as `iso-8859-1` so that the
original character data is preserved; where necessary, the unicode string
can be converted back to bytes and then decoded to unicode again using a
different encoding. (URI values are the exception; see the URI Decoding
section.)
- The WSGI input stream ``'wsgi.input'`` contained in the WSGI environment and
from which request content is read, MUST yield byte strings.
- The status line specified by the WSGI application should be a unicode string
but might also be a byte string. If a unicode string is used, the
server encodes the value as `iso-8859-1`.
- The list of response headers specified by the WSGI application should
contain tuples consisting of two values, where each value is a unicode or
byte string. If a unicode string is used it is encoded as `iso-8859-1`.
- The iterable returned by the application and from which response content
is derived, MUST yield byte strings.
- The version information in the WSGI environment (`wsgi.version`) is ``(2, 0)``.
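
The rules above can be illustrated with a small application sketch. The names
`recode` and `application`, and the use of `HTTP_X_NAME` as a sample request
header, are hypothetical; the sketch assumes Python 3, where native strings
are unicode strings.

```python
def recode(value, encoding='utf-8'):
    """Recover the original bytes of a CGI value and decode them again."""
    # iso-8859-1 maps every byte to exactly one code point, so encoding the
    # server-provided string back to bytes is lossless.
    return value.encode('iso-8859-1').decode(encoding)


def application(environ, start_response):
    # Hypothetical header whose raw bytes were utf-8 on the wire.
    name = recode(environ.get('HTTP_X_NAME', ''))
    # Status and headers may be unicode strings; the server encodes them
    # as iso-8859-1.
    start_response('200 OK', [('Content-Type', 'text/plain; charset=utf-8')])
    # The response iterable must yield byte strings.
    return [('Hello %s' % name).encode('utf-8')]
```
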
URI Decoding
For the keys ``SCRIPT_NAME``, ``PATH_INFO`` (and ``REQUEST_URI`` if
available, although that variable will most likely contain only ASCII
characters because it is quoted), the server has to use the following
algorithm for decoding:
- it decodes all values as `utf-8`.
- if that fails, it decodes all values as `iso-8859-1`.
The latter always succeeds, because every byte sequence is valid
`iso-8859-1`. The encoding the server used to decode the values is then
stored in ``'wsgi.uri_encoding'``. The application MUST use this
value to decode the ``'QUERY_STRING'`` as well.
Example implementation (this assumes that `path_info_bytes` and
`script_name_bytes` are bytes)::

    try:
        path_info = path_info_bytes.decode('utf-8')
        script_name = script_name_bytes.decode('utf-8')
        uri_encoding = 'utf-8'
    except UnicodeError:
        # Fall back to iso-8859-1, which can decode any byte sequence.
        path_info = path_info_bytes.decode('iso-8859-1')
        script_name = script_name_bytes.decode('iso-8859-1')
        uri_encoding = 'iso-8859-1'
    environ['PATH_INFO'] = path_info
    environ['SCRIPT_NAME'] = script_name
    environ['wsgi.uri_encoding'] = uri_encoding

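
On the application side, the requirement to decode ``'QUERY_STRING'`` with
the same encoding might be met along these lines. This is a sketch that
assumes Python 3 and a percent-encoded query string; it relies on the
standard library's `urllib.parse.parse_qs`, and the function name
`parse_query` is hypothetical.

```python
from urllib.parse import parse_qs


def parse_query(environ):
    """Parse QUERY_STRING using the encoding the server chose for the URI."""
    encoding = environ['wsgi.uri_encoding']
    # Percent-escapes are decoded with `encoding`; by default parse_qs
    # replaces undecodable sequences instead of raising.
    return parse_qs(environ.get('QUERY_STRING', ''), encoding=encoding)
```
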
A middleware may re-encode the values if, and only if, `wsgi.uri_encoding` is
`iso-8859-1`. If it performs re-encoding, it must alter the value of
`wsgi.uri_encoding` as well. For example, if an application accepts legacy
URLs encoded as `iso-8859-7`, a middleware like this might be used
(`reencode` is an assumed helper; `application` is the wrapped WSGI
application)::

    import codecs

    iso_8859_1 = codecs.lookup('iso-8859-1')

    def reencode(value):
        # Assumed helper: recover the raw bytes, then decode as iso-8859-7.
        return value.encode('iso-8859-1').decode('iso-8859-7')

    def new_application(environ, start_response):
        if codecs.lookup(environ['wsgi.uri_encoding']) == iso_8859_1:
            environ['PATH_INFO'] = reencode(environ['PATH_INFO'])
            environ['SCRIPT_NAME'] = reencode(environ['SCRIPT_NAME'])
            environ['wsgi.uri_encoding'] = 'iso-8859-7'
        return application(environ, start_response)

If the application encodes URIs, it is required to encode them as
`utf-8`, independent of the value of `wsgi.uri_encoding`.
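
For instance, an application that generates a link could percent-encode a
path as `utf-8` with the standard library. This sketch assumes Python 3;
`build_path` is a hypothetical helper.

```python
from urllib.parse import quote


def build_path(path):
    """Percent-encode a unicode path as utf-8, whatever wsgi.uri_encoding is."""
    # '/' is left unescaped so that path segments stay intact.
    return quote(path, safe='/', encoding='utf-8')
```
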
The WSGI server has to provide a `write()` function that works
exactly like the function in WSGI 1.0, but it is required to emit a
deprecation warning stating that this function is obsolete.
`write()` will be removed in a future revision of WSGI.
The same rule applies to the `exc_info` parameter of the `start_response`
function. If this parameter is used in WSGI 2.0, the server must still
handle it, but must emit a deprecation warning.
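
A server could emit both warnings along these lines. This is only a sketch:
`make_start_response` and the `send` callable are hypothetical names, and a
real server would additionally buffer headers and handle `exc_info` as
described in WSGI 1.0.

```python
import warnings


def make_start_response(send):
    """Build a start_response callable; `send` is a hypothetical callable
    that transmits a byte string to the client."""
    state = {}

    def write(data):
        # Still functional, but flagged as obsolete.
        warnings.warn('write() is deprecated', DeprecationWarning,
                      stacklevel=2)
        send(data)

    def start_response(status, headers, exc_info=None):
        if exc_info is not None:
            # The parameter is accepted, but its use is flagged.
            # (Actual exc_info handling is omitted from this sketch.)
            warnings.warn('exc_info is deprecated', DeprecationWarning,
                          stacklevel=2)
        state['status'], state['headers'] = status, headers
        return write

    return start_response, state
```
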
This document has been placed in the public domain.