mitsuhiko  committed b518934

Changed the unicode section for SCRIPT_NAME and PATH_INFO.

  • Parent commits 03162ee
  • Branches default

Files changed (1)

File pep-0333.txt

 That is, they must either be ISO-8859-1 characters, or use RFC 2047
 MIME encoding.
-.. comment:
-   Should this next paragraph be deleted?
-On Python platforms where the ``str`` or ``StringType`` type is in
-fact Unicode-based (e.g. Jython, IronPython, Python 3000, etc.), all
-"strings" referred to in this specification must contain only 
-code points representable in ISO-8859-1 encoding (``\u0000`` through
+The strings used in WSGI are byte strings only.  In Python 3 an implementation
+is required to use ``bytes`` instead of ``str`` to match the
+specification.  If a platform does not provide a byte string type at all, it
+may provide the data as a string that must contain only code points
+representable in ISO-8859-1 encoding (``\u0000`` through
 ``\u00FF``, inclusive).  It is a fatal error for an application to 
 supply strings containing any other Unicode character or code point.
 Similarly, servers and gateways **must not** supply
 strings to an application containing any other Unicode characters.
-Again, all strings referred to in this specification **must** be
-of type ``str`` or ``StringType``, and **must not** be of type
-``unicode`` or ``UnicodeType``.  And, even if a given platform allows
-for more than 8 bits per character in ``str``/``StringType`` objects,
-only the lower 8 bits may be used, for any value referred to in
-this specification as a "string".
+The main problem with strings is the unquoted values in the WSGI
+``environ`` (``PATH_INFO`` and ``SCRIPT_NAME``).  All the others (except for
+server-provided values such as ``SERVER_NAME``) will not contain non-ASCII
+data because they are either numeric or URL-encoded.
+Because of this, future revisions of WSGI will most likely move away from
+a raw CGI environment and require the server to provide these values
+in quoted form under a different key.
+If a server is unable to determine the encoding of the unquoted values
+because it was lost in the process, it **should** encode the value to
+ISO-8859-1, which is most likely what the data was originally, or use a more
+reliable way to get the data (such as decoding and splitting the
+``REQUEST_URI`` environ key if available).
 Error Handling
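
Illustration (not part of the commit): a minimal Python 3 sketch of the two ideas in the changed paragraphs, namely checking that a string contains only code points representable in ISO-8859-1, and recovering the unquoted path from ``REQUEST_URI`` when a server provides it. The helper names ``is_latin1_representable`` and ``path_from_request_uri`` are hypothetical, ``REQUEST_URI`` is a non-standard key that only some servers set, and splitting the result into ``SCRIPT_NAME`` and ``PATH_INFO`` is left out.

```python
from urllib.parse import unquote_to_bytes, urlsplit


def is_latin1_representable(value):
    """Return True if every code point in ``value`` lies in U+0000..U+00FF."""
    return all(ord(ch) <= 0xFF for ch in value)


def path_from_request_uri(environ):
    """Hypothetical helper: recover the unquoted path from REQUEST_URI.

    Returns the path as a str whose code points mirror the raw bytes
    (ISO-8859-1), or None when REQUEST_URI is not present.
    """
    request_uri = environ.get("REQUEST_URI")
    if request_uri is None:
        return None
    # Drop the query string, then undo percent-encoding at the byte level.
    raw_path = unquote_to_bytes(urlsplit(request_uri).path)
    # ISO-8859-1 maps each byte to the code point of the same value, so
    # this decode cannot fail and can be reversed without loss.
    return raw_path.decode("iso-8859-1")
```

Because the ISO-8859-1 round trip is lossless, an application that knows (or guesses) the real encoding can re-decode the value itself, for example ``path.encode("iso-8859-1").decode("utf-8")``.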