Issue #12 resolved

"Fix" for Issue #6 Broke Chunked Transfer Encoding

jaypipes
created an issue

When an HTTP client does chunked-transfer encoding, it is likely because:

  • a) The client has extremely large files to transfer, and/or
  • b) The client does not know how large the file to transfer is (consider distributed stores, where segments of a file may be stored on separate clusters), and/or
  • c) The client does not want to incur the cost of calculating the file size before sending to the HTTP server

The fix for <<issue 6>> removed the previous ability to properly send the Transfer-Encoding: chunked header without a corresponding Content-Length: X header. Previously, our code that did chunked encoding looked like this:

{{{

!python

c = connection_type(self.host, self.port)

Do a simple request or a chunked request, depending

on whether the body param is a file-like object and

the method is PUT or POST

if hasattr(body, 'read') and method.lower() in ('post', 'put'): # Chunk it, baby... c.putrequest(method, action)

for header, value in headers.items():
    c.putheader(header, value)

c.putheader('Transfer-Encoding', 'chunked')
c.endheaders()

chunk = body.read(self.CHUNKSIZE)
while chunk:
    c.send('%x\r\n%s\r\n' % (len(chunk), chunk))
    chunk = body.read(self.CHUNKSIZE)
    c.send('0\r\n\r\n')

else: # Simple request... c.request(method, action, body, headers) }}}

This was working fine, with the receiving server code was able to use webob.Request.body_file properly after calling webob.Request.make_body_seekable().

However, after this change, webob.Request.body and webob.Request.body_file were empty, resulting in an empty file object being passed on to the storage backends of our server.

We had to put the following hack into our code to get things working again:

{{{

!python

c = connection_type(self.host, self.port)

Do a simple request or a chunked request, depending

on whether the body param is a file-like object and

the method is PUT or POST

if hasattr(body, 'read') and method.lower() in ('post', 'put'): # Chunk it, baby... c.putrequest(method, action)

body.seek(0, os.SEEK_END)
total_size = body.tell()
body.seek(0)

for header, value in headers.items():
    c.putheader(header, value)

c.putheader('Transfer-Encoding', 'chunked')
c.putheader('Content-Length', total_size)
c.endheaders()

chunk = body.read(self.CHUNKSIZE)
while chunk:
    c.send('%x\r\n%s\r\n' % (len(chunk), chunk))
    chunk = body.read(self.CHUNKSIZE)
    c.send('0\r\n\r\n')

else: # Simple request... c.request(method, action, body, headers)

}}}

Clearly, the above is neither necessary, according to the HTTP 1.1 spec, nor desirable, because of points a), b) and c) above.

I'm not sure whether the solution is to undo the fix from <<issue 6>> or some other solution, but the hack above is sub-optimal for users of chunked-transfer encoding.

Please let me know your thoughts on how to proceed.

Cheers, and thanks much in advance! -jay

Comments (16)

  1. Sergey Schetinin

    First off I wanted to mention that the problem can be fixed on the server side by setting req.is_body_readable = True. That will tell webob that the wsgi.input stream is readable even if the CONTENT_LENGTH is not set.

    The problem with assuming it is readable without a special flag is that then we need to rely on the server to return empty strings once the entire body was read. Some servers don't do that and just hang up waiting for data from client when there's none (WSGI spec is not strict enough on this point unfortunately). For example wsgiref does that for GET requests (which obv. don't have content-length).

    One alternative approach to this would be to assume the body is readable for PUT, POST and other requests with entity-body, but given how many of those are in WebDAV and possibly other HTTP extensions I'm not aware of, I'm kinda reluctant.

    Checking for transfer-encoding: chunked could work, but here's what the spec says on that:

    WSGI applications must not generate any "hop-by-hop" headers [4], attempt to use HTTP features that would require them to generate such headers, or rely on the content of any incoming "hop-by-hop" headers in the environ dictionary. WSGI servers must handle any supported inbound "hop-by-hop" headers on their own, such as by decoding any inbound Transfer-Encoding, including chunked encoding if applicable.

    So if we do that we break the spec. Very few WSGI servers support chunked encoding in requests and I wonder if all of them keep the header or if some strip it.

    Another thing I wanted to mention is that when you call make_body_seekable() with an environ that does not have CONTENT_LENGTH the entire body is read into memory, and only then dumped to disk if it's too big. This is something I wanted to address for a while, but I don't have a use for that myself (as my stuff doesn't need to support chunked-encoded uploads), so it's still that way now.

    Right, so I guess we need to assume the input stream is readable for HTTP methods that are mandated to have a body. I can't find a comprehensive list of such methods off-hand though, so I would appreciate any help with that.

  2. Sergey Schetinin

    Another interesting thing, here's a quote from the HTTP spec:

    If the OPTIONS request includes an entity-body (as indicated by the presence of Content-Length or Transfer-Encoding), then the media type MUST be indicated by a Content-Type field.

    So, assuming that valid WSGI app cannot touch Transfer-Encoding request header, how does it detect that OPTIONS request has a (chunked) body? It cannot just attempt to read the body, because the spec is not clear if that's OK or not when there's no content-type..

  3. jaypipes reporter

    Thanks much for your help on this, Sergey!

    I'll see if that fix will work and update you accordingly. BTW, I need to call make_body_seekable() because I'm passing the body_file as-is to the boto.S3.Key object in its set_contents_from_file() method, and that method requires the file supplied to it to have a seek method. Hope that explains that little piece of the code :)

  4. jaypipes reporter

    Hi again,

    OK, so I did what you suggested and just manually called:

    req.is_body_readable = True

    before I pass req.body_file to the backend storage layer.

    Worked like a charm.

    Cheers, jay

  5. Log in to comment