PEP: 444
Title: Python Web3 Interface
Version: $Revision$
Last-Modified: $Date$
Author: Chris McDonough <chrism@plope.com>,
        Armin Ronacher <armin.ronacher@active-4.com>
Discussions-To: Python Web-SIG <web-sig@python.org>
Status: Draft
Type: Informational
Content-Type: text/x-rst
Created: 19-Jul-2010


Abstract
========

This document specifies a proposed second-generation standard
interface between web servers and Python web applications or
frameworks.


Rationale and Goals
===================

This protocol and specification is influenced heavily by the Web
Server Gateway Interface (WSGI) 1.0 standard described in PEP 333
[1]_.  The high-level rationale for having any standard that allows
Python-based web servers and applications to interoperate is outlined
in PEP 333.  This document essentially uses PEP 333 as a template, and
changes its wording in various places for the purpose of forming a
different standard.

Python currently boasts a wide variety of web application frameworks
which use the WSGI 1.0 protocol.  However, due to changes in the
language, the WSGI 1.0 protocol is not compatible with Python 3.  This
specification describes a standardized WSGI-like protocol that lets
Python 2.6, 2.7 and 3.1+ applications communicate with web servers.
Web3 is clearly a WSGI derivative; it only uses a different name than
"WSGI" in order to indicate that it is not in any way backwards
compatible.

Applications and servers which are written to this specification are
meant to work properly under Python 2.6.X, Python 2.7.X and Python
3.1+.  Neither an application nor a server that implements the Web3
specification can easily be written to work under Python 2 versions
earlier than 2.6 or Python 3 versions earlier than 3.1.

.. note::

   Whatever Python 3 version fixed http://bugs.python.org/issue4006 so
   ``os.environ['foo']`` returns surrogates (ala PEP 383) when the
   value of 'foo' cannot be decoded using the current locale instead
   of failing with a KeyError is the *true* minimum Python 3 version.
   In particular, however, Python 3.0 is not supported.

.. note::

   Python 2.6 is the first Python version that supported an alias for
   ``bytes`` and the ``b"foo"`` literal syntax.  This is why it is the
   minimum version supported by Web3.

Explicability and documentability are the main technical drivers for
the decisions made within the standard.


Differences from WSGI
=====================

- All protocol-specific environment names are prefixed with ``web3.``
  rather than ``wsgi.``, e.g. ``web3.input`` rather than
  ``wsgi.input``.

- All values present as environment dictionary *values* are explicitly
  *bytes* instances instead of native strings.  (Environment *keys*
  however are native strings, always ``str`` regardless of
  platform).

- All values returned by an application must be bytes instances,
  including status code, header names and values, and the body.

- Wherever WSGI 1.0 referred to an ``app_iter``, this specification
  refers to a ``body``.

- No ``start_response()`` callback (and therefore no ``write()``
  callable nor ``exc_info`` data).

- The ``readline()`` function of ``web3.input`` must support a size
  hint parameter.

- The ``read()`` function of ``web3.input`` must be length delimited.
  A call without a size argument must not read more than the content
  length header specifies.  In case a content length header is absent
  the stream must not return anything on read.  It must never request
  more data than specified from the client.

- No requirement for middleware to yield an empty string if it needs
  more information from an application to produce output (e.g. no
  "Middleware Handling of Block Boundaries").

- Filelike objects passed to a "file_wrapper" must have an
  ``__iter__`` which returns bytes (never text).

- ``wsgi.file_wrapper`` is not supported.

- ``QUERY_STRING``, ``SCRIPT_NAME``, ``PATH_INFO`` values required to
  be placed in environ by server (each as the empty bytes instance if
  no associated value is received in the HTTP request).

- ``web3.path_info`` and ``web3.script_name`` should be put into the
  Web3 environment, if possible, by the origin Web3 server.  When
  available, each is the original, plain 7-bit ASCII, URL-encoded
  variant of its CGI equivalent derived directly from the request URI
  (with %2F segment markers and other meta-characters intact).  If the
  server cannot provide one (or both) of these values, it must omit
  the value(s) it cannot provide from the environment.

- This requirement was removed: "middleware components **must not**
  block iteration waiting for multiple values from an application
  iterable.  If the middleware needs to accumulate more data from the
  application before it can produce any output, it **must** yield an
  empty string."

- ``SERVER_PORT`` must be a bytes instance (not an integer).

- The server must not inject an additional ``Content-Length`` header
  by guessing the length from the response iterable.  This must be set
  by the application itself in all situations.

- If the origin server advertises that it has the ``web3.async``
  capability, a Web3 application callable used by the server is
  permitted to return a callable that accepts no arguments.  When it
  does so, this callable is to be called periodically by the origin
  server until it returns a non-``None`` response, which must be a
  normal Web3 response tuple.

  .. XXX (chrism) Needs a section of its own for explanation.
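
The ``web3.async`` item above describes a polling contract between the
origin server and the application.  The following is a minimal sketch
of how a server might honour it, not a definitive implementation.  The
names ``wait_for_response`` and ``make_slow_app``, and the
``poll_interval`` and ``timeout`` parameters, are hypothetical and are
not defined by this specification.

```python
import time


def wait_for_response(rv, poll_interval=0.01, timeout=5.0):
    """Return a (body, status, headers) tuple from an application result.

    ``rv`` is either the tuple itself or an argumentless callable that
    returns ``None`` until the tuple is ready.  ``poll_interval`` and
    ``timeout`` are illustrative knobs, not part of the specification.
    """
    if not hasattr(rv, '__call__'):
        return rv  # synchronous response, nothing to poll
    deadline = time.time() + timeout
    while True:
        result = rv()
        if result is not None:
            return result
        if time.time() >= deadline:
            raise RuntimeError('application did not produce a response')
        time.sleep(poll_interval)


def make_slow_app(polls_needed):
    """Build a demo application that is ready after ``polls_needed`` polls."""
    def app(environ):
        state = {'polls': 0}

        def poll():
            state['polls'] += 1
            if state['polls'] < polls_needed:
                return None  # not ready yet
            return [b'done\n'], b'200 OK', [(b'Content-Type', b'text/plain')]
        return poll
    return app


body, status, headers = wait_for_response(make_slow_app(3)({}))
```

A real server would more likely interleave the polling with other work
than sleep in a loop; the loop only illustrates the "call until
non-``None``" rule.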


Specification Overview
======================

The Web3 interface has two sides: the "server" or "gateway" side, and
the "application" or "framework" side.  The server side invokes a
callable object that is provided by the application side.  The
specifics of how that object is provided are up to the server or
gateway.  It is assumed that some servers or gateways will require an
application's deployer to write a short script to create an instance
of the server or gateway, and supply it with the application object.
Other servers and gateways may use configuration files or other
mechanisms to specify where an application object should be imported
from, or otherwise obtained.

In addition to "pure" servers/gateways and applications/frameworks, it
is also possible to create "middleware" components that implement both
sides of this specification.  Such components act as an application to
their containing server, and as a server to a contained application,
and can be used to provide extended APIs, content transformation,
navigation, and other useful functions.

Throughout this specification, we will use the term "application
callable" to mean "a function, a method, or an instance with a
``__call__`` method".  It is up to the server, gateway, or application
implementing the application callable to choose the appropriate
implementation technique for their needs.  Conversely, a server,
gateway, or application that is invoking a callable **must not** have
any dependency on what kind of callable was provided to it.
Application callables are only to be called, not introspected upon.


The Application/Framework Side
------------------------------

The application object is simply a callable object that accepts one
argument.  The term "object" should not be misconstrued as requiring
an actual object instance: a function, method, or instance with a
``__call__`` method are all acceptable for use as an application
object.  Application objects must be able to be invoked more than
once, as virtually all servers/gateways (other than CGI) will make
such repeated requests.  If this cannot be guaranteed by the
implementation of the actual application, it has to be wrapped in a
function that creates a new instance on each call.

.. note::

   Although we refer to it as an "application" object, this should not
   be construed to mean that application developers will use Web3 as a
   web programming API.  It is assumed that application developers
   will continue to use existing, high-level framework services to
   develop their applications.  Web3 is a tool for framework and
   server developers, and is not intended to directly support
   application developers.

An example of an application which is a function (``simple_app``)::

    def simple_app(environ):
        """Simplest possible application object"""
        status = b'200 OK'
        headers = [(b'Content-type', b'text/plain')]
        body = [b'Hello world!\n']
        return body, status, headers

An example of an application which is an instance (``simple_app``)::

    class AppClass(object):

        """Produce the same output, but using an instance.  An
        instance of this class must be instantiated before it is
        passed to the server.  """

        def __call__(self, environ):
            status = b'200 OK'
            headers = [(b'Content-type', b'text/plain')]
            body = [b'Hello world!\n']
            return body, status, headers

    simple_app = AppClass()

Alternatively, an application callable may return a callable instead
of the tuple if the server supports asynchronous execution.  See the
discussion of ``web3.async`` for details.


The Server/Gateway Side
-----------------------

The server or gateway invokes the application callable once for each
request that it receives from an HTTP client and that is directed at
the application.  To illustrate, here is a simple CGI gateway, implemented
as a function taking an application object.  Note that this simple
example has limited error handling, because by default an uncaught
exception will be dumped to ``sys.stderr`` and logged by the web
server.

::

    import locale
    import os
    import sys

    encoding = locale.getpreferredencoding()

    stdout = sys.stdout

    if hasattr(sys.stdout, 'buffer'):
        # Python 3 compatibility; we need to be able to push bytes out
        stdout = sys.stdout.buffer

    def get_environ():
        d = {}
        for k, v in os.environ.items():
            # Python 3 compatibility
            if not isinstance(v, bytes):
                # We must explicitly encode the string to bytes under
                # Python 3.1+
                v = v.encode(encoding, 'surrogateescape')
            d[k] = v
        return d

    def run_with_cgi(application):

        environ = get_environ()
        environ['web3.input']        = sys.stdin
        environ['web3.errors']       = sys.stderr
        environ['web3.version']      = (1, 0)
        environ['web3.multithread']  = False
        environ['web3.multiprocess'] = True
        environ['web3.run_once']     = True
        environ['web3.async']        = False

        if environ.get('HTTPS', b'off') in (b'on', b'1'):
            environ['web3.url_scheme'] = b'https'
        else:
            environ['web3.url_scheme'] = b'http'

        rv = application(environ)
        if hasattr(rv, '__call__'):
            raise TypeError('This webserver does not support asynchronous '
                            'responses.')
        body, status, headers = rv

        CRLF = b'\r\n'

        try:
            stdout.write(b'Status: ' + status + CRLF)
            for header_name, header_val in headers:
                stdout.write(header_name + b': ' + header_val + CRLF)
            stdout.write(CRLF)
            for chunk in body:
                stdout.write(chunk)
                stdout.flush()
        finally:
            if hasattr(body, 'close'):
                body.close()
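
The gateway above hands the raw standard input to the application.
This specification also requires ``web3.input`` to be length
delimited.  One way a server might enforce that is sketched below,
under the assumption that the underlying stream is a binary file-like
object.  ``LengthLimitedInput`` and ``make_input`` are hypothetical
names, not part of Web3.

```python
import io


class LengthLimitedInput(object):
    """Wrap a binary stream so reads never pass CONTENT_LENGTH bytes."""

    def __init__(self, stream, content_length):
        self._stream = stream
        self._remaining = content_length

    def _clamp(self, size):
        # A missing or negative size means "everything that is left".
        if size is None or size < 0 or size > self._remaining:
            return self._remaining
        return size

    def read(self, size=-1):
        data = self._stream.read(self._clamp(size))
        self._remaining -= len(data)
        return data

    def readline(self, size=-1):
        # ``size`` is the size hint required by this specification.
        line = self._stream.readline(self._clamp(size))
        self._remaining -= len(line)
        return line


def make_input(stream, environ):
    # An absent or empty CONTENT_LENGTH means nothing may be read.
    length = int(environ.get('CONTENT_LENGTH') or b'0')
    return LengthLimitedInput(stream, length)


# The bytes after the first 8 belong to something else on the wire.
raw = io.BytesIO(b'a=1\nb=2\nNEXT REQUEST')
wrapped = make_input(raw, {'CONTENT_LENGTH': b'8'})
first_line = wrapped.readline()
rest = wrapped.read()
```

Because the wrapper tracks the remaining byte count itself, it never
asks the underlying stream for more data than the client declared.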


Middleware: Components that Play Both Sides
-------------------------------------------

A single object may play the role of a server with respect to some
application(s), while also acting as an application with respect to
some server(s).  Such "middleware" components can perform such
functions as:

* Routing a request to different application objects based on the
  target URL, after rewriting the ``environ`` accordingly.

* Allowing multiple applications or frameworks to run side-by-side in
  the same process.

* Load balancing and remote processing, by forwarding requests and
  responses over a network.

* Performing content postprocessing, such as applying XSL stylesheets.

The presence of middleware in general is transparent to both the
"server/gateway" and the "application/framework" sides of the
interface, and should require no special support.  A user who desires
to incorporate middleware into an application simply provides the
middleware component to the server, as if it were an application, and
configures the middleware component to invoke the application, as if
the middleware component were a server.  Of course, the "application"
that the middleware wraps may in fact be another middleware component
wrapping another application, and so on, creating what is referred to
as a "middleware stack".

A middleware must support asynchronous execution if possible or fall
back to disabling itself.

Here is a middleware that changes the ``HTTP_HOST`` key if an
``X-Host`` header exists and adds a comment to all HTML responses::

    import time

    def apply_filter(app, environ, filter_func):
        """Helper function that passes the return value from an
        application to a filter function when the results are
        ready.
        """
        app_response = app(environ)

        # synchronous response, filter now
        if not hasattr(app_response, '__call__'):
            return filter_func(*app_response)

        # asynchronous response.  filter when results are ready
        def polling_function():
            rv = app_response()
            if rv is not None:
                return filter_func(*rv)
        return polling_function

    def proxy_and_timing_support(app):
        def new_application(environ):
            def filter_func(body, status, headers):
                now = time.time()
                for key, value in headers:
                    if key.lower() == b'content-type' and \
                       value.split(b';')[0] == b'text/html':
                        # assumes ascii compatible encoding in body,
                        # but the middleware should actually parse the
                        # content type header and figure out the
                        # encoding when doing that.
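                        # body is an iterable of bytes chunks, so the
                        # comment is appended as one more chunk
                        comment = ('<!-- Execution time: %.2fsec -->' %
                                   (now - then)).encode('ascii')
                        body = list(body) + [comment]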
                        break
                return body, status, headers
            then = time.time()
            host = environ.get('HTTP_X_HOST')
            if host is not None:
                environ['HTTP_HOST'] = host

            # use the apply_filter function that applies a given filter
            # function for both async and sync responses.
            return apply_filter(app, environ, filter_func)
        return new_application

    app = proxy_and_timing_support(app)


Specification Details
=====================

The application callable must accept one positional argument.  For the
sake of illustration, we have named it ``environ``, but it is not
required to have this name.  A server or gateway **must** invoke the
application object using a positional (not keyword) argument.
(E.g. by calling ``body, status, headers = application(environ)`` as
shown above.)

The ``environ`` parameter is a dictionary object, containing CGI-style
environment variables.  This object **must** be a builtin Python
dictionary (*not* a subclass, ``UserDict`` or other dictionary
emulation), and the application is allowed to modify the dictionary in
any way it desires.  The dictionary must also include certain
Web3-required variables (described in a later section), and may also
include server-specific extension variables, named according to a
convention that will be described below.

When called by the server, the application object must return a tuple
of three elements: ``body``, ``status`` and ``headers``, or, if
supported by an async server, an argumentless callable which returns
either ``None`` or a tuple of those three elements.

The ``status`` element is a status in bytes of the form ``b'999
Message here'``.

``headers`` is a Python list of ``(header_name, header_value)`` pairs
describing the HTTP response header.  The ``headers`` structure must
be a literal Python list; it must yield two-tuples.  Both
``header_name`` and ``header_value`` must be bytes values.
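
The type rules for ``status`` and ``headers`` can be checked
mechanically.  The sketch below is one possible checker, assuming the
``body, status, headers`` ordering used by the examples earlier in
this document.  ``find_response_problems`` is a hypothetical helper
name, and the messages it returns are illustrative only.

```python
def find_response_problems(rv):
    """Return a list of human-readable problems with a response tuple."""
    problems = []
    body, status, headers = rv

    # status: bytes of the form b'999 Message here'
    if not isinstance(status, bytes):
        problems.append('status must be a bytes instance')
    else:
        code = status.split(b' ', 1)[0]
        if len(code) != 3 or not code.isdigit():
            problems.append('status must start with a three-digit code')

    # headers: a literal list of (name, value) two-tuples of bytes
    if type(headers) is not list:
        problems.append('headers must be a literal list')
    else:
        for item in headers:
            if not isinstance(item, tuple) or len(item) != 2:
                problems.append('each header must be a two-tuple')
            elif not all(isinstance(part, bytes) for part in item):
                problems.append('header names and values must be bytes')
    return problems


good = ([b'Hello world!\n'], b'200 OK', [(b'Content-type', b'text/plain')])
bad = ([b'Hello world!\n'], '200 OK', [('Content-type', 'text/plain')])
good_problems = find_response_problems(good)
bad_problems = find_response_problems(bad)
```

On Python 3, ``bad`` fails both checks because its status and headers
are text rather than bytes.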

The ``body`` is an iterable yielding zero or more bytes instances.
This can be accomplished in a variety of ways, such as by returning a
list containing bytes instances as ``body``, or by returning a
generator as ``body`` that yields bytes instances, or by the
``body`` being an instance of a class which is iterable.  Regardless
of how it is accomplished, the application object must always return a
``body`` iterable yielding zero or more bytes instances.
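
The three approaches mentioned above can be sketched as follows.  All
function and class names here are hypothetical and chosen only for
illustration.

```python
def list_body():
    # A plain list of bytes instances.
    return [b'Hello ', b'world!\n']


def generator_body():
    # Calling this function returns a generator that yields bytes.
    yield b'Hello '
    yield b'world!\n'


class IterableBody(object):
    """An instance of a class that is iterable over bytes chunks."""

    def __init__(self, chunks):
        self._chunks = chunks

    def __iter__(self):
        return iter(self._chunks)


def app_with_generator(environ):
    # Any of the three body forms could be returned here.
    headers = [(b'Content-type', b'text/plain')]
    return generator_body(), b'200 OK', headers


bodies = (list_body(), generator_body(),
          IterableBody([b'Hello ', b'world!\n']))
outputs = [b''.join(body) for body in bodies]
```

A server only iterates over ``body``, so all three forms are
interchangeable from its point of view.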

The server or gateway must transmit the yielded bytes to the client in
an unbuffered fashion, completing the transmission of each set of
bytes before requesting another one.  (In other words, applications
**should** perform their own buffering.  See the `Buffering and
Streaming`_ section below for more on how application output must be
handled.)

The server or gateway should treat the yielded bytes as binary byte
sequences: in particular, it should ensure that line endings are not
altered.  The application is responsible for ensuring that the
string(s) to be written are in a format suitable for the client.  (The
server or gateway **may** apply HTTP transfer encodings, or perform
other transformations for the purpose of implementing HTTP features
such as byte-range transmission.  See `Other HTTP Features`_, below,
for more details.)

If the ``body`` iterable returned by the application has a ``close()``
method, the server or gateway **must** call that method upon
completion of the current request, whether the request was completed
normally, or terminated early due to an error.  This is to support
resource release by the application and is intended to complement PEP
325's generator support, and other common iterables with ``close()``
methods.
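
A minimal sketch of the ``close()`` rule follows, assuming the server
transmits through a hypothetical ``write`` callable.  ``send_body``
and ``TrackedBody`` are illustrative names, not part of this
specification.

```python
def send_body(body, write):
    """Transmit every chunk, then release the body's resources."""
    try:
        for chunk in body:
            write(chunk)
    finally:
        # Runs on normal completion and on early termination alike.
        if hasattr(body, 'close'):
            body.close()


class TrackedBody(object):
    """Demo iterable that records whether ``close()`` was called."""

    def __init__(self, chunks):
        self._chunks = chunks
        self.closed = False

    def __iter__(self):
        return iter(self._chunks)

    def close(self):
        self.closed = True


sent = []
ok_body = TrackedBody([b'a', b'b'])
send_body(ok_body, sent.append)


def failing_write(chunk):
    raise IOError('client went away')


failed_body = TrackedBody([b'a'])
try:
    send_body(failed_body, failing_write)
except IOError:
    pass  # the error still propagates; close() has already run
```

The ``try``/``finally`` shape is what guarantees the call to
``close()`` when transmission fails part-way through.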

Finally, servers and gateways **must not** directly use any other
attributes of the ``body`` iterable returned by the application.


``environ`` Variables
---------------------

The ``environ`` dictionary is required to contain various CGI
environment variables, as defined by the Common Gateway Interface
specification [2]_.

The following CGI variables **must** be present.  Each key is a native
string.  Each value is a bytes instance.

.. note::

   In Python 3.1+, a "native string" is a ``str`` type decoded using
   the ``surrogateescape`` error handler, as done by
   ``os.environ.__getitem__``.  In Python 2.6 and 2.7, a "native
   string" is a ``str`` type representing a set of bytes.

``REQUEST_METHOD``
  The HTTP request method, such as ``"GET"`` or ``"POST"``.

``SCRIPT_NAME``
  The initial portion of the request URL's "path" that corresponds to
  the application object, so that the application knows its virtual
  "location".  This may be the empty bytes instance if the application
  corresponds to the "root" of the server.  SCRIPT_NAME will be a
  bytes instance representing a sequence of URL-encoded segments
  separated by the slash character (``/``).  It is assumed that
  ``%2F`` characters will be decoded into literal slash characters
  within ``PATH_INFO`` , as per CGI.

``PATH_INFO``
  The remainder of the request URL's "path", designating the virtual
  "location" of the request's target within the application.  This
  **may** be the empty bytes instance if the request URL targets the
  application root and does not have a trailing slash.  PATH_INFO will
  be a bytes instance representing a sequence of URL-encoded segments
  separated by the slash character (``/``).  It is assumed that
  ``%2F`` characters will be decoded into literal slash characters
  within ``PATH_INFO`` , as per CGI.

``QUERY_STRING``
  The portion of the request URL (in bytes) that follows the ``"?"``,
  if any, or the empty bytes instance.

``SERVER_NAME``, ``SERVER_PORT``
  When combined with ``SCRIPT_NAME`` and ``PATH_INFO`` (or their raw
  equivalents), these variables can be used to complete the URL.
  Note, however, that ``HTTP_HOST``, if present, should be used in
  preference to ``SERVER_NAME`` for reconstructing the request URL.
  See the `URL Reconstruction`_ section below for more detail.
  ``SERVER_PORT`` should be a bytes instance, not an integer.

``SERVER_PROTOCOL``
  The version of the protocol the client used to send the request.
  Typically this will be something like ``"HTTP/1.0"`` or
  ``"HTTP/1.1"`` and may be used by the application to determine how
  to treat any HTTP request headers.  (This variable should probably
  be called ``REQUEST_PROTOCOL``, since it denotes the protocol used
  in the request, and is not necessarily the protocol that will be
  used in the server's response.  However, for compatibility with CGI
  we have to keep the existing name.)

The following CGI values **may** be present in the Web3 environment.
Each key is a native string.  Each value is a bytes instance.

``CONTENT_TYPE``
  The contents of any ``Content-Type`` fields in the HTTP request.

``CONTENT_LENGTH``
  The contents of any ``Content-Length`` fields in the HTTP request.

``HTTP_`` Variables
  Variables corresponding to the client-supplied HTTP request headers
  (i.e., variables whose names begin with ``"HTTP_"``).  The presence
  or absence of these variables should correspond with the presence or
  absence of the appropriate HTTP header in the request.
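
As an illustration of the naming convention for these variables, the
sketch below maps request headers to environment entries following
the usual CGI rule (upper-case the name, replace ``-`` with ``_``,
prefix with ``HTTP_``).  It assumes Python 3, assumes the headers
arrive as ``(name, value)`` pairs of bytes, and does not handle
repeated header fields.  ``headers_to_environ`` is a hypothetical
helper name.

```python
def headers_to_environ(request_headers):
    """Map (name, value) request header pairs to environ entries."""
    environ = {}
    for name, value in request_headers:
        # Keys are native strings; values stay as bytes.
        key = name.decode('ascii').upper().replace('-', '_')
        # Content-Type and Content-Length carry no HTTP_ prefix in CGI.
        if key not in ('CONTENT_TYPE', 'CONTENT_LENGTH'):
            key = 'HTTP_' + key
        environ[key] = value
    return environ


environ = headers_to_environ([
    (b'Host', b'example.com'),
    (b'X-Host', b'proxy.example'),
    (b'Content-Type', b'text/plain'),
])
```

Headers that the client did not send produce no key at all, which
matches the presence-or-absence rule stated above.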

A server or gateway **should** attempt to provide as many other CGI
variables as are applicable, each with a string for its key and a
bytes instance for its value.  In addition, if SSL is in use, the
server or gateway **should** also provide as many of the Apache SSL
environment variables [5]_ as are applicable, such as ``HTTPS=on`` and
``SSL_PROTOCOL``.  Note, however, that an application that uses any
CGI variables other than the ones listed above is necessarily
non-portable to web servers that do not support the relevant
extensions.  (For example, web servers that do not publish files will
not be able to provide a meaningful ``DOCUMENT_ROOT`` or
``PATH_TRANSLATED``.)

A Web3-compliant server or gateway **should** document what variables
it provides, along with their definitions as appropriate.
Applications **should** check for the presence of any variables they
require, and have a fallback plan in the event such a variable is
absent.

Note that CGI variable *values* must be bytes instances, if they are
present at all.  It is a violation of this specification for a CGI
variable's value to be of any type other than ``bytes``.  On Python 2,
this means they will be of type ``str``.  On Python 3, this means they
will be of type ``bytes``.

The *keys* of all CGI and non-CGI variables in the environ, however,
must be "native strings" (on both Python 2 and Python 3, they will be
of type ``str``).
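
The key and value rules can be checked with a few lines of code.  The
sketch below assumes Python 3, and it assumes that any key containing
a dot (such as ``web3.version`` or a server extension) is not a
CGI-style variable and so is exempt from the bytes rule; that
exemption is a simplification made for this example, not a statement
of the specification.  ``find_environ_problems`` is a hypothetical
helper name.

```python
def find_environ_problems(environ):
    """Return a list of problems with the types used in ``environ``."""
    if type(environ) is not dict:
        return ['environ must be a builtin dict, not a subclass']
    problems = []
    for key, value in environ.items():
        if not isinstance(key, str):
            problems.append('key %r is not a native string' % (key,))
        elif '.' not in key and not isinstance(value, bytes):
            # Dotted keys are assumed to be web3.* or extension entries.
            problems.append('value of %s is not bytes' % key)
    return problems


good_environ = {
    'REQUEST_METHOD': b'GET',
    'SERVER_PORT': b'80',
    'web3.version': (1, 0),
}
bad_environ = {'REQUEST_METHOD': b'GET', 'SERVER_PORT': 80}
good_problems = find_environ_problems(good_environ)
bad_problems = find_environ_problems(bad_environ)
```

The integer ``SERVER_PORT`` in ``bad_environ`` is the kind of value
that WSGI servers commonly get wrong and that this specification rules
out.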

In addition to the CGI-defined variables, the ``environ`` dictionary
**may** also contain arbitrary operating-system "environment
variables", and **must** contain the following Web3-defined variables.

=====================  ===============================================
Variable               Value
=====================  ===============================================
``web3.version``       The tuple ``(1, 0)``, representing Web3
                       version 1.0.

``web3.url_scheme``    A bytes value representing the "scheme" portion of
                       the URL at which the application is being
                       invoked.  Normally, this will have the value
                       ``b"http"`` or ``b"https"``, as appropriate.

``web3.input``         An input stream (file-like object) from which bytes
                       constituting the HTTP request body can be read.
                       (The server or gateway may perform reads
                       on-demand as requested by the application, or
                       it may pre-read the client's request body and
                       buffer it in-memory or on disk, or use any
                       other technique for providing such an input
                       stream, according to its preference.)

``web3.errors``        An output stream (file-like object) to which error
                       output text can be written, for the purpose of
                       recording program or other errors in a
                       standardized and possibly centralized location.
                       This should be a "text mode" stream; i.e.,
                       applications should use ``"\n"`` as a line
                       ending, and assume that it will be converted to
                       the correct line ending by the server/gateway.
                       Applications may *not* send bytes to the
                       'write' method of this stream; they may only
                       send text.

                       For many servers, ``web3.errors`` will be the
                       server's main error log. Alternatively, this
                       may be ``sys.stderr``, or a log file of some
                       sort.  The server's documentation should
                       include an explanation of how to configure this
                       or where to find the recorded output.  A server
                       or gateway may supply different error streams
                       to different applications, if this is desired.

``web3.multithread``   This value should evaluate true if the
                       application object may be simultaneously
                       invoked by another thread in the same process,
                       and should evaluate false otherwise.

``web3.multiprocess``  This value should evaluate true if an
                       equivalent application object may be
                       simultaneously invoked by another process, and
                       should evaluate false otherwise.

``web3.run_once``      This value should evaluate true if the server
                       or gateway expects (but does not guarantee!)
                       that the application will only be invoked this
                       one time during the life of its containing
                       process.  Normally, this will only be true for
                       a gateway based on CGI (or something similar).

``web3.script_name``   The non-URL-decoded ``SCRIPT_NAME`` value.
                       Through a historical inequity, by virtue of the
                       CGI specification, ``SCRIPT_NAME`` is present
                       within the environment as an already
                       URL-decoded string.  This is the original
                       URL-encoded value derived from the request URI.
                       If the server cannot provide this value, it
                       must omit it from the environ.

``web3.path_info``     The non-URL-decoded ``PATH_INFO`` value.
                       Through a historical inequity, by virtue of the
                       CGI specification, ``PATH_INFO`` is present
                       within the environment as an already
                       URL-decoded string.  This is the original
                       URL-encoded value derived from the request URI.
                       If the server cannot provide this value, it
                       must omit it from the environ.

``web3.async``         This is ``True`` if the webserver supports
                       async invocation.  In that case an application
                       is allowed to return a callable instead of a
                       tuple with the response.  The exact semantics
                       are not specified by this specification.

=====================  ===============================================

Finally, the ``environ`` dictionary may also contain server-defined
variables.  These variables should have names which are native
strings, composed of only lower-case letters, numbers, dots, and
underscores, and should be prefixed with a name that is unique to the
defining server or gateway.  For example, ``mod_web3`` might define
variables with names like ``mod_web3.some_variable``.


Input Stream
~~~~~~~~~~~~

The input stream (``web3.input``) provided by the server must support
the following methods:

=====================  ========
Method                 Notes
=====================  ========
``read(size)``         1,4
``readline([size])``   1,2,4
``readlines([size])``  1,3,4
``__iter__()``         4
=====================  ========

The semantics of each method are as documented in the Python Library
Reference, except for these notes as listed in the table above:

1. The server is not required to read past the client's specified
   ``Content-Length``, and is allowed to simulate an end-of-file
   condition if the application attempts to read past that point.  The
   application **should not** attempt to read more data than is
   specified by the ``CONTENT_LENGTH`` variable.

2. The implementation must support the optional ``size`` argument to
   ``readline()``.

3. The application is free to not supply a ``size`` argument to
   ``readlines()``, and the server or gateway is free to ignore the
   value of any supplied ``size`` argument.

4. The ``read`` and ``readline`` methods must return a bytes instance,
   and iterating over the stream (``__iter__``) must yield bytes
   instances.  The ``readlines`` method must return a sequence which
   contains instances of bytes.

The methods listed in the table above **must** be supported by all
servers conforming to this specification.  Applications conforming to
this specification **must not** use any other methods or attributes of
the ``input`` object.  In particular, applications **must not**
attempt to close this stream, even if it possesses a ``close()``
method.

The input stream should silently ignore attempts to read more than the
content length of the request.  If no content length is specified the
stream must be a dummy stream that does not return anything.
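
One way for an application to honor note 1 is to bound every read by
``CONTENT_LENGTH``.  The sketch below assumes a hypothetical helper
name, ``read_request_body``, and treats a missing or malformed
``CONTENT_LENGTH`` as an empty body.

```python
def read_request_body(environ):
    """Read at most CONTENT_LENGTH bytes from web3.input."""
    raw_length = environ.get("CONTENT_LENGTH", b"")
    try:
        # int() accepts a bytes value made of ASCII digits.
        length = int(raw_length)
    except ValueError:
        length = 0
    if length <= 0:
        return b""
    # Only read() is used; the application never closes the stream.
    return environ["web3.input"].read(length)
```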


Error Stream
~~~~~~~~~~~~

The error stream (``web3.errors``) provided by the server must support
the following methods:

===================   ==========  ========
Method                Stream      Notes
===================   ==========  ========
``flush()``           ``errors``  1
``write(str)``        ``errors``  2
``writelines(seq)``   ``errors``  2
===================   ==========  ========

The semantics of each method are as documented in the Python Library
Reference, except for these notes as listed in the table above:

1. Since the ``errors`` stream may not be rewound, servers and
   gateways are free to forward write operations immediately, without
   buffering.  In this case, the ``flush()`` method may be a no-op.
   Portable applications, however, cannot assume that output is
   unbuffered or that ``flush()`` is a no-op.  They must call
   ``flush()`` if they need to ensure that output has in fact been
   written.  (For example, to minimize intermingling of data from
   multiple processes writing to the same error log.)

2. The ``write()`` method must accept a string argument, but needn't
   necessarily accept a bytes argument.  The ``writelines()`` method
   must accept a sequence argument that consists entirely of strings,
   but needn't necessarily accept any bytes instance as a member of
   the sequence.

The methods listed in the table above **must** be supported by all
servers conforming to this specification.  Applications conforming to
this specification **must not** use any other methods or attributes of
the ``errors`` object.  In particular, applications **must not**
attempt to close this stream, even if it possesses a ``close()``
method.
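
A portable application might therefore log through ``web3.errors``
roughly as follows.  The function name ``log_error`` is hypothetical;
the sketch only uses the ``write()`` and ``flush()`` methods listed in
the table above.

```python
def log_error(environ, message):
    """Write one line of text to web3.errors and flush it."""
    errors = environ["web3.errors"]
    # Text only, with "\n" as the line ending; the server or gateway
    # is expected to convert it to the correct line ending.
    errors.write(message + "\n")
    # Portable code cannot assume that the stream is unbuffered.
    errors.flush()
```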


Values Returned by A Web3 Application
-------------------------------------

Web3 applications return a tuple in the form (``status``, ``headers``,
``body``).  If the server supports asynchronous applications
(``web3.async``), the response may be a callable object (which accepts no
arguments).

The ``status`` value is assumed by a gateway or server to be an HTTP
"status" bytes instance like ``b'200 OK'`` or ``b'404 Not Found'``.
That is, it is a string consisting of a Status-Code and a
Reason-Phrase, in that order and separated by a single space, with no
surrounding whitespace or other characters.  (See RFC 2616, Section
6.1.1 for more information.)  The string **must not** contain control
characters, and must not be terminated with a carriage return,
linefeed, or combination thereof.

The ``headers`` value is assumed by a gateway or server to be a
literal Python list of ``(header_name, header_value)`` tuples.  Each
``header_name`` must be a bytes instance representing a valid HTTP
header field-name (as defined by RFC 2616, Section 4.2), without a
trailing colon or other punctuation.  Each ``header_value`` must be a
bytes instance and **must not** include any control characters,
including carriage returns or linefeeds, either embedded or at the
end.  (These requirements are to minimize the complexity of any
parsing that must be performed by servers, gateways, and intermediate
response processors that need to inspect or modify response headers.)
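
The ``status`` and ``headers`` rules above can be approximated by a
small checker such as the one below.  It is a sketch under stated
assumptions, not a complete validator: ``check_response`` and
``has_control_chars`` are hypothetical names, and the check does not
verify that header names are valid RFC 2616 tokens.

```python
def has_control_chars(value):
    """True if a bytes value contains an ASCII control character."""
    # Iterating a bytearray yields integers on Python 2 and Python 3.
    return any(b < 32 or b == 127 for b in bytearray(value))


def check_response(status, headers):
    """Return a list of problems found in a status/headers pair."""
    problems = []
    if not isinstance(status, bytes):
        problems.append("status is not a bytes instance")
    else:
        code, sep, _reason = status.partition(b" ")
        if len(code) != 3 or not code.isdigit() or not sep:
            problems.append("status must be '<3-digit code> <reason>'")
        if has_control_chars(status):
            problems.append("status contains control characters")
    if not isinstance(headers, list):
        problems.append("headers is not a list")
        return problems
    for item in headers:
        if not (isinstance(item, tuple) and len(item) == 2):
            problems.append("header entry %r is not a 2-tuple" % (item,))
            continue
        name, value = item
        if not isinstance(name, bytes) or not isinstance(value, bytes):
            problems.append("header %r is not a pair of bytes" % (item,))
        elif name.endswith(b":") or has_control_chars(name + value):
            problems.append("header %r is malformed" % (item,))
    return problems
```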

In general, the server or gateway is responsible for ensuring that
correct headers are sent to the client: if the application omits a
header required by HTTP (or other relevant specifications that are in
effect), the server or gateway **must** add it.  For example, the HTTP
``Date:`` and ``Server:`` headers would normally be supplied by the
server or gateway.  However, the gateway must not override a header
that the application has already emitted under the same name.

(A reminder for server/gateway authors: HTTP header names are
case-insensitive, so be sure to take that into consideration when
examining application-supplied headers!)

Applications and middleware are forbidden from using HTTP/1.1
"hop-by-hop" features or headers, any equivalent features in HTTP/1.0,
or any headers that would affect the persistence of the client's
connection to the web server.  These features are the exclusive
province of the actual web server, and a server or gateway **should**
consider it a fatal error for an application to attempt sending them,
and raise an error if they are supplied as return values from an
application in the ``headers`` structure.  (For more specifics on
"hop-by-hop" features and headers, please see the `Other HTTP
Features`_ section below.)


Dealing with Compatibility Across Python Versions
-------------------------------------------------

Creating Web3 code that runs under both Python 2.6/2.7 and Python 3.1+
requires some care on the part of the developer.  In general, the Web3
specification assumes a certain level of equivalence between the
Python 2 ``str`` type and the Python 3 ``bytes`` type.  For example,
under Python 2, the values present in the Web3 ``environ`` will be
instances of the ``str`` type; in Python 3, these will be instances of
the ``bytes`` type.  The Python 3 ``bytes`` type does not possess all
the methods of the Python 2 ``str`` type, and some methods which it
does possess behave differently than the Python 2 ``str`` type.
Effectively, to ensure that Web3 middleware and applications work
across Python versions, developers must do these things:

#) Do not assume comparison equivalence between text values and bytes
   values.  If you do so, your code may work under Python 2, but it
   will not work properly under Python 3.  For example, don't write
   ``somebytes == 'abc'``.  This will sometimes be true on Python 2
   but it will never be true on Python 3, because a sequence of bytes
   never compares equal to a string under Python 3.  Instead, always
   compare a bytes value with a bytes value, e.g.
   ``somebytes == b'abc'``.  Code which does this is compatible with
   and works the same in Python 2.6, 2.7, and 3.1.  The ``b`` in front
   of ``'abc'`` signals to Python 3 that the value is a literal bytes
   instance; under Python 2 it's a forward compatibility placebo.

#) Don't use the ``__contains__`` method (directly or indirectly) of
   items that are meant to be byteslike without ensuring that its
   argument is also a bytes instance.  If you do so, your code may
   work under Python 2, but it will not work properly under Python 3.
   For example, ``'abc' in somebytes`` will raise a ``TypeError``
   under Python 3, but it will return ``True`` under Python 2.6 and
   2.7.  However, ``b'abc' in somebytes`` will work the same on both
   versions.  In Python 3.2, this restriction may be partially
   removed, as it's rumored that bytes types may obtain a ``__mod__``
   implementation.

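#) Don't index into bytes values with an integer (``__getitem__``).
   ``somebytes[0]`` returns a one-character ``str`` under Python 2 but
   an ``int`` under Python 3.  Slicing, e.g. ``somebytes[0:1]``,
   returns a bytes value on both.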

   .. XXX

#) Don't try to use the ``format`` method or the ``__mod__`` method of
   instances of bytes (directly or indirectly).  In Python 2, the
   ``str`` type, which we treat as equivalent to Python 3's ``bytes``,
   supports these methods, but actual Python 3 ``bytes`` instances
   don't.  If you use these methods, your code will work under
   Python 2, but not under Python 3.

#) Do not try to concatenate a bytes value with a string value.  This
   may work under Python 2, but it will not work under Python 3.  For
   example, doing ``'abc' + somebytes`` will work under Python 2, but
   it will result in a ``TypeError`` under Python 3.  Instead, always
   make sure you're concatenating two items of the same type,
   e.g. ``b'abc' + somebytes``.

Web3 expects byte values in other places, such as in all the values
returned by an application.

In short, to ensure compatibility of Web3 application code between
Python 2 and Python 3, in Python 2, treat CGI and server variable
values in the environment as if they had the Python 3 ``bytes`` API
even though they actually have a more capable API.  Likewise for all
stringlike values returned by a Web3 application.
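
The guidelines above translate into a handful of idioms that behave
the same on Python 2.6+ and Python 3.  The functions below are
illustrative only; their names are hypothetical, and they assume the
environ already follows the bytes-value rules of this specification.

```python
def is_json_request(environ):
    """Compare and test containment using bytes literals only."""
    content_type = environ.get("CONTENT_TYPE", b"")
    return content_type == b"application/json" or b"json" in content_type


def make_location(environ, path):
    """Build a bytes URL by concatenation, not %-formatting or format()."""
    host = environ.get("HTTP_HOST", b"localhost")
    return environ["web3.url_scheme"] + b"://" + host + path


def first_byte(value):
    """Slice instead of indexing; value[0] is an int on Python 3."""
    return value[:1]
```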


Buffering and Streaming
-----------------------

Generally speaking, applications will achieve the best throughput by
buffering their (modestly-sized) output and sending it all at once.
This is a common approach in existing frameworks: the output is
buffered in a StringIO or similar object, then transmitted all at
once, along with the response headers.

The corresponding approach in Web3 is for the application to simply
return a single-element ``body`` iterable (such as a list) containing
the response body as a single bytes instance.  This is the recommended
approach for the vast majority of application functions, which render
HTML pages whose text easily fits in memory.
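
A minimal sketch of this buffered style follows.  The application name
``simple_app`` and the page content are placeholders; the
``Content-Length`` header is computed from the single body block.

```python
def simple_app(environ):
    """Return the whole response body as one bytes block."""
    body = b"<html><body>Hello</body></html>"
    headers = [
        (b"Content-Type", b"text/html"),
        # Header values must be bytes, so encode the decimal length.
        (b"Content-Length", str(len(body)).encode("ascii")),
    ]
    return (b"200 OK", headers, [body])
```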

For large files, however, or for specialized uses of HTTP streaming
(such as multipart "server push"), an application may need to provide
output in smaller blocks (e.g. to avoid loading a large file into
memory).  It's also sometimes the case that part of a response may be
time-consuming to produce, but it would be useful to send ahead the
portion of the response that precedes it.

In these cases, applications will usually return a ``body`` iterator
(often a generator-iterator) that produces the output in a
block-by-block fashion.  These blocks may be broken to coincide with
multipart boundaries (for "server push"), or just before
time-consuming tasks (such as reading another block of an on-disk
file).
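
The block-by-block style might look like the sketch below, which
streams a binary file-like object in fixed-size blocks.  The names
``iter_blocks`` and ``make_streaming_app``, and the default block size,
are assumptions chosen for the example.

```python
def iter_blocks(fileobj, block_size=8192):
    """Yield successive blocks read from a binary file-like object."""
    while True:
        block = fileobj.read(block_size)
        if not block:
            break
        yield block


def make_streaming_app(fileobj, block_size=8192):
    """Build a Web3 application that streams fileobj block by block."""
    def app(environ):
        headers = [(b"Content-Type", b"application/octet-stream")]
        # The body is a generator-iterator; nothing is read until the
        # server starts iterating over it.
        return (b"200 OK", headers, iter_blocks(fileobj, block_size))
    return app
```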

Web3 servers, gateways, and middleware **must not** delay the
transmission of any block; they **must** either fully transmit the
block to the client, or guarantee that they will continue transmission
even while the application is producing its next block.  A
server/gateway or middleware may provide this guarantee in one of
three ways:

1. Send the entire block to the operating system (and request that any
   O/S buffers be flushed) before returning control to the
   application, OR

2. Use a different thread to ensure that the block continues to be
   transmitted while the application produces the next block.

3. (Middleware only) send the entire block to its parent
   gateway/server.

By providing this guarantee, Web3 allows applications to ensure that
transmission will not become stalled at an arbitrary point in their
output data.  This is critical for proper functioning of
e.g. multipart "server push" streaming, where data between multipart
boundaries should be transmitted in full to the client.
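
The first of the three options can be sketched as a loop that hands
each block to the output stream and flushes it before asking the
application for the next one.  ``transmit_body`` and ``out`` are
hypothetical; a real server would write to a socket, and calling
``flush()`` on a Python file-like object only approximates a request
that O/S buffers be flushed.

```python
def transmit_body(body, out):
    """Write each block and flush before pulling the next one."""
    for block in body:
        if block:                 # nothing to send for an empty block
            out.write(block)
            out.flush()
```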


Unicode Issues
--------------

HTTP does not directly support Unicode, and neither does this
interface.  All encoding/decoding must be handled by the
**application**; all values passed to or from the server must be of
the Python 3 type ``bytes`` or instances of the Python 2 type ``str``,
not Python 2 ``unicode`` or Python 3 ``str`` objects.

All "bytes instances" referred to in this specification **must**:

- On Python 2, be of type ``str``.

- On Python 3, be of type ``bytes``.

All "bytes instances" **must not**:

- On Python 2, be of type ``unicode``.

- On Python 3, be of type ``str``.

The result of using a textlike object where a byteslike object is
required is undefined.

Values returned from a Web3 app as a status or as response headers
**must** follow RFC 2616 with respect to encoding.  That is, the bytes
returned must contain a character stream of ISO-8859-1 characters, or
the character stream should use RFC 2047 MIME encoding.
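
Since encoding is the application's job, an application that builds
headers from text needs an explicit encoding step.  The sketch below
covers only the ISO-8859-1 case and does not implement RFC 2047;
``encode_header`` is a hypothetical name.

```python
def encode_header(name, value):
    """Encode a text header pair into the bytes pair Web3 expects."""
    # Header field-names are ASCII tokens; values are ISO-8859-1.
    # Either call raises UnicodeEncodeError for unrepresentable text.
    return (name.encode("ascii"), value.encode("iso-8859-1"))
```

Text that cannot be represented in ISO-8859-1 raises
``UnicodeEncodeError`` here rather than being sent in some other
encoding.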

On Python platforms which do not have a native bytes-like type
(e.g. IronPython, etc.), but instead which generally use textlike
strings to represent bytes data, the definition of "bytes instance"
can be changed: their "bytes instances" must be native strings that
contain only code points representable in ISO-8859-1 encoding
(``\u0000`` through ``\u00FF``, inclusive).  It is a fatal error for
an application on such a platform to supply strings containing any
other Unicode character or code point.  Similarly, servers and
gateways on those platforms **must not** supply strings to an
application containing any other Unicode characters.

.. XXX (armin: Jython now has a bytes type, we might remove this
   section after seeing about IronPython)


HTTP 1.1 Expect/Continue
------------------------

Servers and gateways that implement HTTP 1.1 **must** provide
transparent support for HTTP 1.1's "expect/continue" mechanism.  This
may be done in any of several ways:

1. Respond to requests containing an ``Expect: 100-continue`` header
   with an immediate "100 Continue" response, and proceed normally.

2. Proceed with the request normally, but provide the application with
   a ``web3.input`` stream that will send the "100 Continue" response
   if/when the application first attempts to read from the input
   stream.  The read request must then remain blocked until the client
   responds.

3. Wait until the client decides that the server does not support
   expect/continue, and sends the request body on its own.  (This is
   suboptimal, and is not recommended.)

Note that these behavior restrictions do not apply for HTTP 1.0
requests, or for requests that are not directed to an application
object.  For more information on HTTP 1.1 Expect/Continue, see RFC
2616, sections 8.2.3 and 10.1.1.
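
The second option amounts to a thin wrapper around the real input
stream.  The sketch below is one possible shape for it, not a
prescribed implementation: ``ContinueOnRead`` is a hypothetical class,
and ``send_continue`` stands for whatever server-side callback writes
the "100 Continue" response to the client.

```python
class ContinueOnRead(object):
    """Wrap an input stream; send "100 Continue" before the first read."""

    def __init__(self, stream, send_continue):
        self._stream = stream
        self._send_continue = send_continue   # hypothetical callback
        self._sent = False

    def _ensure_continue(self):
        # Send the interim response once, on the first read attempt.
        if not self._sent:
            self._sent = True
            self._send_continue()

    def read(self, size):
        self._ensure_continue()
        return self._stream.read(size)

    def readline(self, size=-1):
        self._ensure_continue()
        return self._stream.readline(size)

    def readlines(self, size=-1):
        self._ensure_continue()
        return self._stream.readlines(size)

    def __iter__(self):
        self._ensure_continue()
        return iter(self._stream)
```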


Other HTTP Features
-------------------

In general, servers and gateways should "play dumb" and allow the
application complete control over its output.  They should only make
changes that do not alter the effective semantics of the application's
response.  It is always possible for the application developer to add
middleware components to supply additional features, so server/gateway
developers should be conservative in their implementation.  In a
sense, a server should consider itself to be like an HTTP "gateway
server", with the application being an HTTP "origin server".  (See RFC
2616, section 1.3, for the definition of these terms.)

However, because Web3 servers and applications do not communicate via
HTTP, what RFC 2616 calls "hop-by-hop" headers do not apply to Web3
internal communications.  Web3 applications **must not** generate any
"hop-by-hop" headers [4]_, attempt to use HTTP features that would
require them to generate such headers, or rely on the content of any
incoming "hop-by-hop" headers in the ``environ`` dictionary.  Web3
servers **must** handle any supported inbound "hop-by-hop" headers on
their own, such as by decoding any inbound ``Transfer-Encoding``,
including chunked encoding if applicable.
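
A server or gateway could enforce the ban on outgoing "hop-by-hop"
headers with a check like the following.  The header names are those
listed in RFC 2616, section 13.5.1; the function name
``reject_hop_by_hop`` and the choice of ``ValueError`` are assumptions
made for the example.

```python
HOP_BY_HOP = frozenset([
    b"connection", b"keep-alive", b"proxy-authenticate",
    b"proxy-authorization", b"te", b"trailers",
    b"transfer-encoding", b"upgrade",
])


def reject_hop_by_hop(headers):
    """Raise ValueError if the application supplied a hop-by-hop header."""
    for name, _value in headers:
        # Header names are case-insensitive, so normalize before lookup.
        if name.lower() in HOP_BY_HOP:
            raise ValueError("hop-by-hop header not allowed: %r" % (name,))
    return headers
```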

Applying these principles to a variety of HTTP features, it should be
clear that a server **may** handle cache validation via the
``If-None-Match`` and ``If-Modified-Since`` request headers and the
``Last-Modified`` and ``ETag`` response headers.  However, it is not
required to do this, and the application **should** perform its own
cache validation if it wants to support that feature, since the
server/gateway is not required to do such validation.

Similarly, a server **may** re-encode or transport-encode an
application's response, but the application **should** use a suitable
content encoding on its own, and **must not** apply a transport
encoding.  A server **may** transmit byte ranges of the application's
response if requested by the client, and the application doesn't
natively support byte ranges.  Again, however, the application
**should** perform this function on its own if desired.

Note that these restrictions on applications do not necessarily mean
that every application must reimplement every HTTP feature; many HTTP
features can be partially or fully implemented by middleware
components, thus freeing both server and application authors from
implementing the same features over and over again.


Thread Support
--------------

Thread support, or lack thereof, is also server-dependent.  Servers
that can run multiple requests in parallel **should** also provide
the option of running an application in a single-threaded fashion, so
that applications or frameworks that are not thread-safe may still be
used with that server.


Implementation/Application Notes
================================

Server Extension APIs
---------------------

Some server authors may wish to expose more advanced APIs, that
application or framework authors can use for specialized purposes.
For example, a gateway based on ``mod_python`` might wish to expose
part of the Apache API as a Web3 extension.

In the simplest case, this requires nothing more than defining an
``environ`` variable, such as ``mod_python.some_api``.  But, in many
cases, the possible presence of middleware can make this difficult.
For example, an API that offers access to the same HTTP headers that
are found in ``environ`` variables, might return different data if
``environ`` has been modified by middleware.

In general, any extension API that duplicates, supplants, or bypasses
some portion of Web3 functionality runs the risk of being incompatible
with middleware components.  Server/gateway developers should *not*
assume that nobody will use middleware, because some framework
developers specifically organize their frameworks to function almost
entirely as middleware of various kinds.

So, to provide maximum compatibility, servers and gateways that
provide extension APIs that replace some Web3 functionality, **must**
design those APIs so that they are invoked using the portion of the
API that they replace.  For example, an extension API to access HTTP
request headers must require the application to pass in its current
``environ``, so that the server/gateway may verify that HTTP headers
accessible via the API have not been altered by middleware.  If the
extension API cannot guarantee that it will always agree with
``environ`` about the contents of HTTP headers, it must refuse service
to the application, e.g. by raising an error, returning ``None``
instead of a header collection, or whatever is appropriate to the API.

These guidelines also apply to middleware that adds information such
as parsed cookies, form variables, sessions, and the like to
``environ``.  Specifically, such middleware should provide these
features as functions which operate on ``environ``, rather than simply
stuffing values into ``environ``.  This helps ensure that information
is calculated from ``environ`` *after* any middleware has done any URL
rewrites or other ``environ`` modifications.
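
As an example of the "function which operates on ``environ``" style,
cookie support might be offered as below.  This is a deliberately
simplistic sketch: ``get_cookies`` is a hypothetical name, and the
parser ignores quoting and other details of real cookie syntax.

```python
def get_cookies(environ):
    """Parse HTTP_COOKIE from the environ as it is at call time."""
    raw = environ.get("HTTP_COOKIE", b"")
    cookies = {}
    for part in raw.split(b";"):
        name, sep, value = part.strip().partition(b"=")
        if sep and name:
            cookies[name] = value
    return cookies
```

Because the value is computed on each call, any change that middleware
makes to ``HTTP_COOKIE`` is reflected in the result.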

It is very important that these "safe extension" rules be followed by
both server/gateway and middleware developers, in order to avoid a
future in which middleware developers are forced to delete any and all
extension APIs from ``environ`` to ensure that their mediation isn't
being bypassed by applications using those extensions!


Application Configuration
-------------------------

This specification does not define how a server selects or obtains an
application to invoke.  These and other configuration options are
highly server-specific matters.  It is expected that server/gateway
authors will document how to configure the server to execute a
particular application object, and with what options (such as
threading options).

Framework authors, on the other hand, should document how to create an
application object that wraps their framework's functionality.  The
user, who has chosen both the server and the application framework,
must connect the two together.  However, since both the framework and
the server have a common interface, this should be merely a mechanical
matter, rather than a significant engineering effort for each new
server/framework pair.

Finally, some applications, frameworks, and middleware may wish to use
the ``environ`` dictionary to receive simple string configuration
options.  Servers and gateways **should** support this by allowing an
application's deployer to specify name-value pairs to be placed in
``environ``.  In the simplest case, this support can consist merely of
copying all operating system-supplied environment variables from
``os.environ`` into the ``environ`` dictionary, since the deployer in
principle can configure these externally to the server, or in the CGI
case they may be able to be set via the server's configuration files.

Applications **should** try to keep such required variables to a
minimum, since not all servers will support easy configuration of
them.  Of course, even in the worst case, persons deploying an
application can create a script to supply the necessary configuration
values::

   from the_app import application

   def new_app(environ):
       environ['the_app.configval1'] = b'something'
       return application(environ)

But, most existing applications and frameworks will probably only need
a single configuration value from ``environ``, to indicate the
location of their application or framework-specific configuration
file(s).  (Of course, applications should cache such configuration, to
avoid having to re-read it upon each invocation.)
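
The caching mentioned above can be as simple as a module-level
dictionary keyed by the configured path.  In this sketch the environ
key ``the_app.config_file``, the function ``load_config``, and the
``reader`` callable are all hypothetical.

```python
_config_cache = {}


def load_config(environ, reader):
    """Return parsed configuration, reading each file at most once."""
    path = environ["the_app.config_file"]   # hypothetical key
    if path not in _config_cache:
        # reader is any callable that parses the file at `path`.
        _config_cache[path] = reader(path)
    return _config_cache[path]
```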


URL Reconstruction
------------------

If an application wishes to reconstruct a request's complete URL (as a
bytes object), it may do so using the following algorithm::

    host = environ.get('HTTP_HOST')

    scheme = environ['web3.url_scheme']
    port = environ['SERVER_PORT']
    query = environ['QUERY_STRING']

    url = scheme + b'://'

    if host:
        url += host
    else:
        url += environ['SERVER_NAME']

        if scheme == b'https':
            if port != b'443':
                url += b':' + port
        else:
            if port != b'80':
                url += b':' + port

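    # The web3.* values are still URL-encoded and are appended as-is;
    # the CGI values are already decoded and must be re-quoted.
    # url_quote is assumed to be an application-supplied helper that
    # takes and returns bytes.
    if 'web3.script_name' in environ:
        url += environ['web3.script_name']
    else:
        url += url_quote(environ['SCRIPT_NAME'])
    if 'web3.path_info' in environ:
        url += environ['web3.path_info']
    else:
        url += url_quote(environ['PATH_INFO'])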
    if query:
        url += b'?' + query

Note that such a reconstructed URL may not be precisely the same URI
as requested by the client.  Server rewrite rules, for example, may
have modified the client's originally requested URL to place it in a
canonical form.
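
The algorithm above relies on a ``url_quote`` helper that this
specification does not define.  One possible bytes-in, bytes-out
implementation, built on the standard library's ``quote`` function, is
sketched below; the name and behavior of the helper are assumptions.

```python
try:
    from urllib.parse import quote      # Python 3
except ImportError:
    from urllib import quote            # Python 2


def url_quote(value):
    """Percent-encode a bytes path and return the result as bytes."""
    quoted = quote(value)
    # On Python 3, quote() returns text even when given bytes.
    if not isinstance(quoted, bytes):
        quoted = quoted.encode("ascii")
    return quoted
```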


Open Questions
==============

- ``file_wrapper`` replacement.  Nothing is specified here yet, but
  the old system of in-band signalling is clearly broken unless it
  gives middleware a way to determine whether the response is a file
  wrapper.


Points of Contention
====================

Outlined below are potential points of contention regarding this
specification.


WSGI 1.0 Compatibility
----------------------

Components written using the WSGI 1.0 specification will not
transparently interoperate with components written using this
specification.  That's because the goals of this proposal and the
goals of WSGI 1.0 are not directly aligned.

WSGI 1.0 is obliged to provide specification-level backwards
compatibility with versions of Python between 2.2 and 2.7.  This
specification, however, ditches Python 2.5 and lower compatibility in
order to provide compatibility between relatively recent versions of
Python 2 (2.6 and 2.7) as well as relatively recent versions of Python
3 (3.1).

It is currently impossible to write components which work reliably
under both Python 2 and Python 3 using the WSGI 1.0 specification,
because the specification implicitly posits that CGI and server
variable values in the environ and values returned via
``start_response`` represent a sequence of bytes that can be addressed
using the Python 2 string API.  It posits such a thing because that
sort of data type was the sensible way to represent bytes in all
Python 2 versions, and WSGI 1.0 was conceived before Python 3 existed.

Python 3's ``str`` type supports the full API provided by the Python 2
``str`` type, but Python 3's ``str`` type does not represent a
sequence of bytes, it instead represents text.  Therefore, using it to
represent environ values also requires that the environ byte sequence
be decoded to text via some encoding.  We cannot decode these bytes to
text (at least in any way where the decoding has any meaning other
than as a tunnelling mechanism) without widening the scope of WSGI to
include server and gateway knowledge of decoding policies and
mechanics.  WSGI 1.0 never concerned itself with encoding and
decoding.  It made statements about allowable transport values, and
suggested that various values might be best decoded as one encoding or
another, but it never required a server to *perform* any decoding
before passing these values to an application.

Python 3 does not have a stringlike type that can be used instead to
represent bytes: it has a ``bytes`` type.  In Python 3.1+, ``bytes``
operates quite a bit like a Python 2 ``str``, but it lacks behavior
equivalent to ``str.__mod__``, and its iteration protocol,
containment, sequence treatment, and equivalence comparisons are
different.

In either case, there is no type in Python 3 that behaves just like
the Python 2 ``str`` type, and a way to create such a type doesn't
exist because there is no such thing as a "String ABC" which would
allow a suitable type to be built.  Due to this design
incompatibility, existing WSGI 1.0 servers, middleware, and
applications will not work under Python 3, even after they are run
through ``2to3``.

Existing Web-SIG discussions about updating the WSGI specification so
that it is possible to write a WSGI application that runs in both
Python 2 and Python 3 tend to revolve around creating a
specification-level equivalence between the Python 2 ``str`` type
(which represents a sequence of bytes) and the Python 3 ``str`` type
(which represents text).  Such an equivalence becomes strained in
various areas, given the different roles of these types.  An arguably
more straightforward equivalence exists between the Python 3 ``bytes``
type API and a subset of the Python 2 ``str`` type API.  This
specification exploits this subset equivalence.

In the meantime, aside from any Python 2 vs. Python 3 compatibility
issue, as various discussions on Web-SIG have pointed out, the WSGI
1.0 specification is too general, providing support (via ``.write``)
for asynchronous applications at the expense of implementation
complexity.  This specification uses the fundamental incompatibility
between WSGI 1.0 and Python 3 as a natural divergence point to create
a specification with reduced complexity by changing specialized
support for asynchronous applications.

To provide backwards compatibility for older WSGI 1.0 applications, so
that they may run on a Web3 stack, it is presumed that Web3 middleware
will be created which can be used "in front" of existing WSGI 1.0
applications, allowing those existing WSGI 1.0 applications to run
under a Web3 stack.  This middleware will require, when under Python
3, an equivalence to be drawn between Python 3 ``str`` types and the
bytes values represented by the HTTP request and all the attendant
encoding-guessing (or configuration) it implies.

.. note::

   Such middleware *might* in the future, instead of drawing an
   equivalence between Python 3 ``str`` and HTTP byte values, make use
   of a yet-to-be-created "ebytes" type (aka "bytes-with-benefits"),
   particularly if a String ABC proposal is accepted into the Python
   core and implemented.

Conversely, it is presumed that WSGI 1.0 middleware will be created
which will allow a Web3 application to run behind a WSGI 1.0 stack on
the Python 2 platform.


Environ and Response Values as Bytes
------------------------------------

Casual middleware and application writers may consider the use of
bytes as environment values and response values inconvenient.  In
particular, they won't be able to use common string formatting
functions such as ``('%s' % bytes_val)`` or
``bytes_val.format('123')`` because bytes don't have the same API as
strings on platforms such as Python 3 where the two types differ.
Likewise, on such platforms, stdlib HTTP-related API support for using
bytes interchangeably with text can be spotty.  In places where bytes
are inconvenient or incompatible with library APIs, middleware and
application writers will have to decode such bytes to text explicitly.
This is particularly inconvenient for middleware writers: to work with
environment values as strings, they'll have to decode them from an
implied encoding and if they need to mutate an environ value, they'll
then need to encode the value into a byte stream before placing it
into the environ.  While the use of bytes by the specification as
environ values might be inconvenient for casual developers, it
provides several benefits.
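
The decode, mutate, and re-encode round trip described above might
look like the following sketch.  The helper name
``prepend_script_name``, the use of native-string environ keys, and
the choice of ``utf-8`` as the implied encoding are all assumptions
made for illustration; the specification does not mandate any of them.

```python
def prepend_script_name(environ, prefix, encoding="utf-8"):
    """Hypothetical middleware helper, not part of any specification.

    Decodes the bytes value of SCRIPT_NAME, edits it as text, and
    stores the re-encoded bytes in a copy of the environ.
    """
    # Step 1: decode from the (assumed) implied encoding.
    text = environ["SCRIPT_NAME"].decode(encoding)

    # Step 2: work with the value as an ordinary string.
    text = prefix.rstrip("/") + text

    # Step 3: encode back to bytes before placing it into the environ.
    new_environ = dict(environ)
    new_environ["SCRIPT_NAME"] = text.encode(encoding)
    return new_environ


environ = {"SCRIPT_NAME": b"/app", "PATH_INFO": b"/caf\xc3\xa9"}
updated = prepend_script_name(environ, "/site/")
print(updated["SCRIPT_NAME"])
```

Values the middleware does not touch (``PATH_INFO`` here) stay as the
bytes the server supplied.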

Using bytes types to represent HTTP and server values to an
application most closely matches reality because HTTP is fundamentally
a bytes-oriented protocol.  If the environ values are mandated to be
strings, each server will need to use heuristics to guess about the
encoding of various values provided by the HTTP environment.  Using
all strings might increase casual middleware writer convenience, but
will also lead to ambiguity and confusion when a value cannot be
decoded to a meaningful non-surrogate string.

Using bytes as environ values means the specification never needs to
mandate that a participating server be informed of encoding
configuration parameters.  If environ values are treated
as strings, and so must be decoded from bytes, configuration
parameters may eventually become necessary as policy clues from the
application deployer.  Such a policy would be used to guess an
appropriate decoding strategy in various circumstances, effectively
placing the burden for enforcing a particular application encoding
policy upon the server.  If the server must serve more than one
application, such configuration would quickly become complex.  Many
policies would also be impossible to express declaratively.

In reality, HTTP is a complicated and legacy-fraught protocol which
requires a complex set of heuristics to make sense of.  It would be
nice if we could allow the Web3 protocol to protect us from this
complexity, but we cannot do so reliably while still providing to
application writers a level of control commensurate with reality.
Python applications must often deal with data embedded in the
environment which not only must be parsed by legacy heuristics, but
*does not conform even to any existing HTTP specification*.  While
these eventualities are unpleasant, they crop up with regularity,
making it impossible and undesirable to hide them from application
developers, as application developers are the only people who are able
to decide upon an appropriate action when an HTTP specification
violation is detected.

Some have argued for mixed use of bytes and string values as environ
*values*.  This proposal avoids that strategy.  Sole use of bytes as
environ values makes it possible to fit this specification entirely in
one's head; you won't need to guess about which values are strings and
which are bytes.

This protocol would also fit in a developer's head if all environ
values were strings, but this specification doesn't use that strategy.
This will likely be the point of greatest contention regarding the use
of bytes.  In defense of bytes: developers often prefer protocols with
consistent contracts, even if the contracts themselves are suboptimal.
If we hide encoding issues from a developer until a value that
contains surrogates causes problems after it has already reached
beyond the I/O boundary of their application, they will need to do a
lot more work to fix assumptions made by their application than if we
were to just present the problem much earlier in terms of "here's some
bytes, you decode them".  This is also a counter-argument to the
"bytes are inconvenient" assumption: while presenting bytes to an
application developer may be inconvenient for a casual application
developer who doesn't care about edge cases, they are extremely
convenient for the application developer who needs to deal with
complex, dirty eventualities, because use of bytes allows him the
appropriate level of control with a clear separation of
responsibility.

If the protocol uses bytes, it is presumed that libraries will be
created to make working with bytes-only in the environ and within
return values more pleasant; for example, analogues of the WSGI 1.0
libraries named "WebOb" and "Werkzeug".  Such libraries will fill the
gap between convenience and control, allowing the spec to remain
simple and regular while still allowing casual authors a convenient
way to create Web3 middleware and application components.  This seems
to be a reasonable alternative to baking encoding policy into the
protocol, because many such libraries can be created independently
from the protocol, and application developers can choose the one that
provides them the appropriate levels of control and convenience for a
particular job.

Here are some alternatives to using all bytes:

- Have the server decode all values representing CGI and server
  environ values into strings using the ``latin-1`` encoding, which is
  lossless.  Smuggle any undecodable bytes within the resulting
  string.

- Decode all CGI and server environ values to strings using the
  ``utf-8`` encoding with the ``surrogateescape`` error handler.  This
  does not work under any existing Python 2.

- Represent some values as bytes and other values as strings, as
  decided by their typical usages.
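
The first two alternatives can be compared with a short Python 3
sketch.  The sample value is made up for illustration: it is a path
whose last byte is valid ``latin-1`` but not valid ``utf-8``.  Both
strategies round-trip the original bytes, but each yields a string
that must be handled with the same codec and error handler on the way
back out.

```python
raw = b"/caf\xe9"  # made-up value: latin-1 bytes, not valid UTF-8

# Alternative 1: latin-1 maps every byte to one code point, so the
# decode never fails and the round trip is lossless.
as_latin1 = raw.decode("latin-1")
latin1_roundtrip = as_latin1.encode("latin-1")

# Alternative 2: utf-8 with surrogateescape smuggles each undecodable
# byte into the string as a lone surrogate code point.
as_utf8 = raw.decode("utf-8", "surrogateescape")
utf8_roundtrip = as_utf8.encode("utf-8", "surrogateescape")

# Encoding the smuggled value without the error handler fails, which
# is how such a value causes trouble later at an I/O boundary.
try:
    as_utf8.encode("utf-8")
    strict_encode_failed = False
except UnicodeEncodeError:
    strict_encode_failed = True

print(repr(as_latin1), repr(as_utf8), strict_encode_failed)
```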


Applications Should be Allowed to Read ``web3.input`` Past ``CONTENT_LENGTH``
-----------------------------------------------------------------------------

At [6]_, Graham Dumpleton makes the assertion that ``wsgi.input``
should be required to return the empty string as a signifier of
out-of-data, and that applications should be allowed to read past the
number of bytes specified in ``CONTENT_LENGTH``, depending only upon
the empty string as an EOF marker.  WSGI relies on an application
"being well behaved and once all data specified by ``CONTENT_LENGTH``
is read, that it processes the data and returns any response. That
same socket connection could then be used for a subsequent request."
Graham would like WSGI adapters to be required to wrap raw socket
connections: "this wrapper object will need to count how much data has
been read, and when the amount of data reaches that as defined by
``CONTENT_LENGTH``, any subsequent reads should return an empty string
instead."  This may be useful to support chunked encoding and input
filters.
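
A minimal sketch of the kind of counting wrapper described in the
quotation follows.  The class name ``LengthLimitedInput`` is
hypothetical, an ``io.BytesIO`` object stands in for the raw socket,
and treating a missing or negative ``size`` as "read whatever remains"
is an assumption of this sketch rather than a requirement stated
above.

```python
import io


class LengthLimitedInput:
    """Hypothetical wrapper that returns b"" once CONTENT_LENGTH
    bytes have been consumed from the underlying stream."""

    def __init__(self, stream, content_length):
        self._stream = stream
        self._remaining = max(content_length, 0)

    def read(self, size=-1):
        # Never hand out more than what is left of the request body.
        if size is None or size < 0 or size > self._remaining:
            size = self._remaining
        if size == 0:
            return b""  # EOF marker for the application
        data = self._stream.read(size)
        self._remaining -= len(data)
        return data


# The stand-in "socket" carries a 10-byte body followed by the start
# of a subsequent request on the same connection.
socket_like = io.BytesIO(b"first-bodyNEXT REQUEST")
body = LengthLimitedInput(socket_like, 10)

chunk = body.read(5)
rest = body.read()
eof = body.read(100)
leftover = socket_like.read()
print(chunk, rest, eof, leftover)
```

Because the wrapper stops at the declared length, the bytes belonging
to the next request are left untouched on the underlying stream.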


``web3.input`` Unknown Length
-----------------------------

There's no documented way to indicate that there is content in
``environ['web3.input']``, but the content length is unknown.


``read()`` of ``web3.input`` Should Support No-Size Calling Convention
----------------------------------------------------------------------

At [6]_, Graham Dumpleton makes the assertion that the ``read()``
method of ``wsgi.input`` should be callable without arguments, and
that the result should be "all available request content".  Needs
discussion.

Comment Armin: I changed the spec to require that from an
implementation.  I had too much pain with that in the past already.
Open for discussions though.


Input Filters should set environ ``CONTENT_LENGTH`` to -1
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

At [6]_, Graham Dumpleton suggests that an input filter might set
``environ['CONTENT_LENGTH']`` to -1 to indicate that it mutated the
input.


``headers`` as Literal List of Two-Tuples
-----------------------------------------

Why do we make applications return a ``headers`` structure that is a
literal list of two-tuples?  I think the iterability of ``headers``
needs to be maintained while it moves up the stack, but I don't think
we need to be able to mutate it in place at all times.  Could we
loosen that requirement?

Comment Armin: Strong yes


Removed Requirement that Middleware Not Block
---------------------------------------------

This requirement was removed: "middleware components **must not**
block iteration waiting for multiple values from an application
iterable.  If the middleware needs to accumulate more data from the
application before it can produce any output, it **must** yield an
empty string."  This requirement existed to support asynchronous
applications and servers (see PEP 333's "Middleware Handling of Block
Boundaries").  Asynchronous applications are now serviced explicitly
by the ``web3.async`` capable protocol (a Web3 application callable
may itself return a callable).
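
The parenthetical above can be sketched as follows.  This is only an
illustration under assumptions: it presumes that a response is a
``(body, status, headers)`` tuple of bytes values and that a true
``environ['web3.async']`` signals server support for a returned
callable.  The ``run_once`` function is a hypothetical stand-in for a
server and is not part of any specification.

```python
def app(environ):
    """Toy application that defers its response when allowed to."""

    def respond():
        body = [b"Hello world!\n"]
        status = b"200 OK"
        headers = [(b"Content-Type", b"text/plain")]
        return body, status, headers

    # Assumption: only return a callable when the server says it
    # supports asynchronous invocation.
    if environ.get("web3.async"):
        return respond
    return respond()


def run_once(application, environ):
    """Hypothetical server stand-in: resolve a deferred response."""
    result = application(environ)
    if callable(result):
        result = result()
    return result


sync_response = run_once(app, {"web3.async": False})
async_response = run_once(app, {"web3.async": True})
print(sync_response == async_response)
```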


``web3.script_name`` and ``web3.path_info``
-------------------------------------------

These values are required to be placed into the environment by an
origin server under this specification.  Unlike ``SCRIPT_NAME`` and
``PATH_INFO``, these must be the original *URL-encoded* variants
derived from the request URI.  We probably need to figure out how
these should be computed originally, and what their values should be
if the server performs URL rewriting.
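
One reason to keep the URL-encoded form is that decoding discards
information.  The following Python 3 sketch, using a made-up request
path, shows that an encoded slash (``%2F``) inside a path segment can
no longer be told apart from a segment separator once the whole path
has been decoded:

```python
from urllib.parse import unquote_to_bytes

raw_path = b"/files/a%2Fb/report"  # made-up URL-encoded path

# Decoding the whole path first (as with PATH_INFO) merges the
# encoded slash with the real separators.
decoded_path = unquote_to_bytes(raw_path)
decoded_segments = decoded_path.split(b"/")

# Splitting the URL-encoded form first keeps the segment intact.
raw_segments = [unquote_to_bytes(s) for s in raw_path.split(b"/")]

print(decoded_segments)
print(raw_segments)
```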


Long Response Headers
---------------------

Bob Brewer notes on Web-SIG [7]_:

    Each header_value must not include any control characters,
    including carriage returns or linefeeds, either embedded or at the
    end.  (These requirements are to minimize the complexity of any
    parsing that must be performed by servers, gateways, and
    intermediate response processors that need to inspect or modify
    response headers.) [1]_

That's understandable, but HTTP headers are defined as (mostly)
\*TEXT, and "words of \*TEXT MAY contain characters from character
sets other than ISO-8859-1 only when encoded according to the rules of
RFC 2047."  [2]_ And RFC 2047 specifies that "an 'encoded-word' may
not be more than 75 characters long...  If it is desirable to encode
more text than will fit in an 'encoded-word' of 75 characters,
multiple 'encoded-word's (separated by CRLF SPACE) may be used." [3]_
This satisfies HTTP header folding rules, as well: "Header fields can
be extended over multiple lines by preceding each extra line with at
least one SP or HT." [1]_

So in my reading of HTTP, some code somewhere should introduce
newlines in longish, encoded response header values.  I see three
options:

1. Keep things as they are and disallow response header values if they
   contain words over 75 chars that are outside the ISO-8859-1
   character set.

2. Allow newline characters in WSGI response headers.

3. Require/strongly suggest WSGI servers to do the encoding and
   folding before sending the value over HTTP.
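
To make the folding issue concrete, the sketch below uses the
standard library ``email.header`` module to RFC 2047-encode a long
non-ASCII value.  The sample text is made up, and the use of
``email.header`` is only one way to perform the encoding; nothing here
is proposed as the mechanism a server would be required to use.

```python
from email.header import Header, decode_header, make_header

# Made-up header value containing characters outside US-ASCII.
value = ("na\u00efve caf\u00e9 " * 10).strip()

# Ask for lines of at most 75 characters, folded with CRLF followed
# by a space, as in the RFC 2047 text quoted above.
folded = Header(value, charset="utf-8", maxlinelen=75).encode(
    linesep="\r\n")
lines = folded.split("\r\n")

# Decoding the folded form recovers the original text.
restored = str(make_header(decode_header(folded)))

for line in lines:
    print(repr(line))
```

The encoded result necessarily contains line breaks, which is exactly
what the current rule on control characters in header values forbids.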


Request Trailers and Chunked Transfer Encoding
----------------------------------------------

When using chunked transfer encoding on request content, the RFCs
allow there to be request trailers.  These are like request headers
but come after the final null data chunk.  These trailers are only
available when the chunked data stream is finite length and when it
has all been read in.  Neither WSGI nor Web3 currently supports them.

.. XXX (armin) yield from application iterator should be specified as
   write plus flush by server.

.. XXX (armin) websocket API.


References
==========

.. [1] PEP 333: Python Web Server Gateway Interface
   (http://www.python.org/dev/peps/pep-0333/)

.. [2] The Common Gateway Interface Specification, v 1.1, 3rd Draft
   (http://cgi-spec.golux.com/draft-coar-cgi-v11-03.txt)

.. [3] "Chunked Transfer Coding" -- HTTP/1.1, section 3.6.1
   (http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.6.1)

.. [4] "End-to-end and Hop-by-hop Headers" -- HTTP/1.1, Section 13.5.1
   (http://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html#sec13.5.1)

.. [5] mod_ssl Reference, "Environment Variables"
   (http://www.modssl.org/docs/2.8/ssl_reference.html#ToC25)

.. [6] Details on WSGI 1.0 amendments/clarifications.
   (http://blog.dscpl.com.au/2009/10/details-on-wsgi-10-amendmentsclarificat.html)

.. [7] [Web-SIG] WSGI and long response header values
   (http://mail.python.org/pipermail/web-sig/2006-September/002244.html)

Copyright
=========

This document has been placed in the public domain.



..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: