Issue #497 resolved

Decode the "chunked" transfer-coding

Robert Brewer
created an issue

From [http://www.faqs.org/rfcs/rfc2616.html RFC 2616]:

{{{ An implementation is not compliant if it fails to satisfy one or more of the MUST or REQUIRED level requirements for the protocols it implements. An implementation that satisfies all the MUST or REQUIRED level and all the SHOULD level requirements for its protocols is said to be "unconditionally compliant"; one that satisfies all the MUST level requirements but not all the SHOULD level requirements for its protocols is said to be "conditionally compliant."

...

All HTTP/1.1 applications MUST be able to receive and decode the "chunked" transfer-coding, and MUST ignore chunk-extension extensions they do not understand. }}}

Comments (7)

  1. Anonymous

    I was reading the specification regarding Chunked encoding and this won't be easy to implement.

    We have two separate issues. Since Transfer-Encoding (TE) is applied to the entity body, it can be initiated by the server or the UA. This means that CherryPy MUST be able to receive and decode "chunked" data (in HTTP/1.1), but it should (note that this is not a SHOULD) also be able to send chunked data to the UA. As I understand it, the goal of splitting an entity-body into chunks is to reduce the load on both sides. It could also be used for streaming or for sending dynamically generated content.

    It seems to me that, given the way CherryPy is designed, receiving chunked data would not really help us: we would have to buffer it on our side, since I assume our processBody() has to read the complete request body before going further. We might change that behavior, but it might mean we need to get rid of FieldStorage (though I might misunderstand FieldStorage here).

    Still, since the RFC does not imply that chunked data has to mean anything to either end of the wire, only that we have to support it in HTTP/1.1, I guess this could be done in _cpgifs.py by simply buffering the request body.

    On the other hand, it would be much more interesting to make our server able to send chunked bodies, as that could ease load constraints for large content. We would need a way to specify what should be chunked, though.
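For reference, the sending side of the "chunked" coding is small. The following is a minimal Python 3 sketch, not CherryPy code: `encode_chunked` is a hypothetical helper name, and it assumes the body is available as an iterable of byte strings.

```python
def encode_chunked(pieces):
    """Yield wire bytes for an iterable of body pieces using chunked coding."""
    for piece in pieces:
        if not piece:
            # A zero-length chunk marks the end of the body, so skip empty pieces.
            continue
        # Each chunk is: size in hexadecimal, CRLF, the data, CRLF.
        yield b"%x\r\n" % len(piece) + piece + b"\r\n"
    # Last chunk (size 0) followed by an empty trailer.
    yield b"0\r\n\r\n"


body = b"".join(encode_chunked([b"Hello, ", b"", b"world!"]))
print(body)
```

Because each piece is framed on its own, the server would not need to know the total Content-Length before it starts sending.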

  2. Robert Brewer reporter

    Note that the WSGI spec expressly requires that the WSGI server do the decoding, so none of this can be done in processBody(). Instead, it would have to be another extension inside _cpwsgi.WSGIServer. We have two options, as I see it:

    1. When a request is chunked, buffer the entire body, read any header lines in the trailer, and then hand a new rfile to the Engine which wraps the buffered body in a StringIO object. Could be a memory-hog (although we should still be able to test against max_request_body as we buffer); a temp file might be better.

    2. When a request is chunked, wrap the rfile in a proxy (which decodes chunked data on the fly) and hand that to the Engine. That would require either ignoring header lines in the trailer (i.e., those declared in a "Trailer" request header), or notifying the Request object that there are headers in the trailer. So we would need to processHeaders, processBody, then processTrailingHeaders or some such. This would force a lot of tools which would have run before_request_body to run before_main instead.
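To make the second option concrete, here is a rough Python 3 sketch of a proxy that decodes on the fly. `ChunkedReader` and its attributes are hypothetical names, not part of _cpwsgi, and the sketch assumes `rfile` is a binary file-like object with `read()` and `readline()`. Trailer lines are only collected as raw bytes; how they would reach the Request object is left open, as in the comment above.

```python
import io


class ChunkedReader(object):
    """Hypothetical file-like proxy that decodes a chunked body as it is read."""

    def __init__(self, rfile):
        self.rfile = rfile
        self.remaining = 0      # undecoded bytes left in the current chunk
        self.finished = False   # True once the last (zero-size) chunk was seen
        self.trailers = []      # raw trailer header lines, left unparsed

    def _next_chunk(self):
        line = self.rfile.readline()
        # Drop any chunk-extension after ";" (unknown extensions are ignored).
        size = int(line.split(b";", 1)[0].strip(), 16)
        if size == 0:
            self.finished = True
            # Collect trailer lines up to the blank line that ends the body.
            while True:
                line = self.rfile.readline()
                if line in (b"\r\n", b"\n", b""):
                    break
                self.trailers.append(line.rstrip(b"\r\n"))
        self.remaining = size

    def read(self, size=-1):
        pieces = []
        while size != 0 and not self.finished:
            if self.remaining == 0:
                self._next_chunk()
                continue
            want = self.remaining if size < 0 else min(size, self.remaining)
            data = self.rfile.read(want)
            if not data:
                raise IOError("connection closed inside a chunk")
            self.remaining -= len(data)
            if size > 0:
                size -= len(data)
            pieces.append(data)
            if self.remaining == 0:
                self.rfile.readline()  # consume the CRLF that ends the chunk data
        return b"".join(pieces)


wire = b"7\r\nHello, \r\n6;ext=1\r\nworld!\r\n0\r\nX-Checksum: abc\r\n\r\n"
reader = ChunkedReader(io.BytesIO(wire))
first = reader.read(3)   # partial read inside the first chunk
rest = reader.read()     # read to the end of the body
```

Note that `trailers` is only populated once the caller has read to the end of the body, which is the ordering problem described above for tools that want all headers before the body is processed.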

  3. Anonymous

    Hard to tell really. I think the first one is the better of the two, even if it could be a memory eater. It sounds simpler to put into practice.

  4. Robert Brewer reporter

    Implemented in [1258]. I went with a StringIO for now (instead of a temp file); _cpwsgi probably needs to check server.max_request_body as we decode.
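The buffering approach with a size check could look roughly like the Python 3 sketch below. This is not the code from [1258]: `decode_chunked` and `BodyTooLarge` are hypothetical names, `io.BytesIO` stands in for StringIO, and treating a limit of 0 as "no limit" is an assumption.

```python
import io


class BodyTooLarge(Exception):
    """Hypothetical error raised when the decoded body exceeds the limit."""


def decode_chunked(rfile, max_request_body=0):
    """Buffer a chunked body; return (file-like body, raw trailer lines)."""
    buf = io.BytesIO()
    total = 0
    while True:
        line = rfile.readline()
        # Ignore any chunk-extension after ";" and parse the hexadecimal size.
        size = int(line.split(b";", 1)[0].strip(), 16)
        if size == 0:
            break
        total += size
        # Check the declared size first, so an oversized chunk is never buffered.
        if max_request_body and total > max_request_body:
            raise BodyTooLarge("body exceeds %d bytes" % max_request_body)
        data = rfile.read(size)
        if len(data) < size:
            raise IOError("connection closed inside a chunk")
        buf.write(data)
        rfile.readline()  # consume the CRLF that ends the chunk data
    trailers = []
    while True:
        line = rfile.readline()
        if line in (b"\r\n", b"\n", b""):
            break
        trailers.append(line.rstrip(b"\r\n"))
    buf.seek(0)  # rewind so the caller can read the body from the start
    return buf, trailers


wire = b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n"
body, trailers = decode_chunked(io.BytesIO(wire))
decoded = body.read()
```

Checking the running total against the limit as each chunk header arrives keeps memory use bounded even though the total length is not known up front.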
