Every webob > 1.0.1 doesn't work correctly with cherrypy

Marcin Lulek created an issue

Every webob version I tried > 1.0.1 throws a timeout exception on request.copy() when I use the cherrypy web server. Reverting to 1.0.1 or older fixes the issue for me.

# this will handle possible URL generation 
>>  GET = dict(self.request.copy().GET) # needs dict() for py2.5 compat
>>  self.make_body_seekable()
.......
>>  return self._sock.recv(size)
timeout: timed out

Comments (21)

  1. Marcin Lulek

    /python/bin/pip install -U hg+https://ergo@bitbucket.org/ianb/webob

    Is there a freenode IRC channel where I can catch you? I sit on #pylons and #pyramid if you need further tests. The issue still exists with trunk; here is the full traceback:

    .... app stack.....
    
      GET = dict(self.request.copy().GET) # needs dict() for py2.5 compat
    File '/home/ergo/python/lib/python2.6/site-packages/webob/request.py', line 625 in copy
      self.make_body_seekable()
    File '/home/ergo/python/lib/python2.6/site-packages/webob/request.py', line 656 in make_body_seekable
      self.copy_body()
    File '/home/ergo/python/lib/python2.6/site-packages/webob/request.py', line 673 in copy_body
      self.body = self.body_file_raw.read(-1)
    File '/home/ergo/python/lib/python2.6/site-packages/paste/script/wsgiserver/__init__.py', line 199 in read
      data = self.rfile.read(size)
    File '/home/ergo/python/lib/python2.6/site-packages/paste/script/wsgiserver/__init__.py', line 767 in read
      data = self.recv(rbufsize)
    File '/home/ergo/python/lib/python2.6/site-packages/paste/script/wsgiserver/__init__.py', line 747 in recv
      return self._sock.recv(size)
    timeout: timed out
    
  2. Sergey Schetinin

    I'm not on IRC. The traceback looks like a bug in cherrypy. The server should not allow reading more data from the input stream than there is.

    If you have a minimal example to reproduce this, that would be welcome.

  3. Sergey Schetinin

    Can you confirm if the CherryPy folks have accepted this as a bug in their server? I'm pretty sure it is, but would like some confirmation before closing this ticket.

  4. Marcin Lulek

    hey Sergey - i got this response on irc:

    <fumanchu> the immediate problem is you can't call read(-1) on that object
    <fumanchu> you have to call read(content length)
    <fumanchu> CherryPy 3.2 supports read() with no argument to mean "read to end of file"
    <fumanchu> webob should change to support WSGI 1.0.1 (specified in PEP 3333), which says
     "A server should allow read() to be called without an argument, and return the remainder
     of the client's input stream."
    <fumanchu> and it should stop passing -1
    
  5. Sergey Schetinin

    PEP-3333: "The server is not required to read past the client's specified Content-Length, and should simulate an end-of-file condition if the application attempts to read past that point. The application should not attempt to read more data than is specified by the CONTENT_LENGTH variable."

    Also elsewhere in the spec:

    wsgi.input -- An input stream (file-like object) from which the HTTP request body bytes can be read. (The server or gateway may perform reads on-demand as requested by the application, or it may pre-read the client's request body and buffer it in-memory or on disk, or use any other technique for providing such an input stream, according to its preference.)

    The closest to a spec of what constitutes a file-like object would be the Python docs, which say the following on the 'read' method: "If the size argument is negative or omitted, read all data until EOF is reached." (http://docs.python.org/library/stdtypes.html#file-objects )
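
The file-object semantics quoted above are easy to see with an in-memory stream. This is a small sketch that uses io.BytesIO as a stand-in for wsgi.input. That is an assumption made for illustration: a server may instead hand over a socket-backed object that blocks at the point where BytesIO returns an empty result.

```python
import io

# Assumption: BytesIO stands in for wsgi.input; it is not what CherryPy provides.
stream = io.BytesIO(b"name=value")

first = stream.read(4)    # bounded read of four bytes
rest = stream.read(-1)    # negative size: read everything up to EOF
at_eof = stream.read()    # omitted size at EOF: empty bytes, returns immediately
```

The empty result from the last call is the end-of-file condition that the rest of this thread is about.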

  6. Sergey Schetinin

    In other words, the spec does not require the app to call .read() with no argument; it just explicitly requires the server to support such a call, so it seems to me fumanchu is wrong.

  7. Ben Bangert

    Graham, whom I would consider a pretty good authority on WSGI servers, has cited this problem in this blog post: http://blog.dscpl.com.au/2009/10/details-on-wsgi-10-amendmentsclarificat.html

    Specifically he'd like the spec to be changed to say:

    2. The 'wsgi.input' must provide an empty string as end of input stream marker.

    This is because right now it's a "should", and not a "must". So of course, we absolutely can't count on there being an end-of-file condition. Ah, the joy of terminology. There are multiple WSGI servers that do not provide an end-of-file condition, and are still in spec since the spec only says they "should" do it to comply, not that they "must" provide it.

    This means the only safe way to read the input is to ensure one never reads past the end of what is in the Content-Length, otherwise depending on the WSGI server, a read() could keep going forever.

  8. Sergey Schetinin

    Thanks for filling that in. I wanted to add that webob only ever calls .read(-1) when there's no CONTENT_LENGTH in the environ. I believe there's also no check for the REQUEST_METHOD, so if one tries to req.copy() (which the reporter seems to be doing) and the request method is GET, webob would still try to copy the input body, and since there's no CONTENT_LENGTH it would result in an attempt to do the read(-1).

    So if there are indeed a lot of servers that don't provide the end-of-file marker and do not populate wsgi.input w/ an empty StringIO or something of that kind for GET requests and similar, then a guard should be added to webob not to ever touch the wsgi.input for such requests.
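
One way such a guard could look from outside webob is a small WSGI middleware, sketched here. The names `guard_wsgi_input` and `BODYLESS_METHODS` are hypothetical, and limiting the check to GET and HEAD is an assumption. This is not the change that was made to webob.

```python
import io

# Assumption: only these methods are treated as body-less when no length is given.
BODYLESS_METHODS = frozenset(["GET", "HEAD"])


def guard_wsgi_input(app):
    """Wrap a WSGI app so body-less requests never read from the raw socket."""

    def wrapped(environ, start_response):
        method = environ.get("REQUEST_METHOD", "GET")
        if method in BODYLESS_METHODS and not environ.get("CONTENT_LENGTH"):
            # Swap the server's stream for an empty in-memory one, so even an
            # unbounded read(-1) returns immediately instead of timing out.
            environ["wsgi.input"] = io.BytesIO()
        return app(environ, start_response)

    return wrapped
```

Requests that declare a CONTENT_LENGTH, and all other methods, pass through unchanged.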

  9. Daniel Holth

    Sergey,

    You will find webob.Request().copy() also exercises this bug after fork(). I have a unit test in my repository that fails in WebOb >= 1.0.1. See my pull request from http://bitbucket.org/dholth/webob . I suspect this is broken in other forking web servers as well.

    My suggestion would be to single-step whatever Request.copy() does in 1.0.1 and use that code.

  10. Sergey Schetinin

    Aha! So here's what changed -- before 1.0.1 webob assumed that if the env['CONTENT_LENGTH'] is not present or is an empty string, then there's no body. For its own purposes it was sometimes setting it to '-1' to re-read the FakeCGIBody correctly at a later time.

    In 1.0.1, a missing or empty CONTENT_LENGTH is no longer taken to mean there's no body; webob tries to read the body instead, and rather than marking FakeCGIBody w/ -1 it can just remove the CONTENT_LENGTH from environ. However, GET requests don't have a CONTENT_LENGTH, but some servers will apparently still pass in the socket file as wsgi.input, which causes the problem.

    I also see no reason why your test would work without the forking part.

    CONTENT_LENGTH is specced as "The contents of any Content-Length fields in the HTTP request. May be empty or absent." so the way webob handles it is quite reasonable, especially combined with the "should provide an empty string as end of input stream marker" from the spec, but that's 'only' a 'should' as pointed out by Ben.

    So does all this mean that we need to forget about chunked-encoded input streams? What do we do with the FakeCGIBody then? The -1 content-length is such a hack. One option I see is to add another flag to the environ 'webob.input-readable' that would mean "wsgi.input is readable even if there's no CONTENT_LENGTH".
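
The flag idea above could be sketched roughly as follows. The key 'webob.input-readable' is only the name proposed in this comment and `input_is_readable` is a hypothetical helper, so none of this should be read as an existing webob API.

```python
# Proposed in this thread only; not a documented environ key.
READABLE_FLAG = "webob.input-readable"


def input_is_readable(environ):
    """Decide whether it is safe to read wsgi.input for this request."""
    content_length = environ.get("CONTENT_LENGTH")
    if content_length:
        try:
            return int(content_length) > 0
        except ValueError:
            return False  # malformed length: do not risk a blocking read
    # No CONTENT_LENGTH: read only if something upstream vouched for the stream,
    # e.g. a chunked body or a FakeCGIBody set up by the framework itself.
    return bool(environ.get(READABLE_FLAG, False))
```

Under this scheme a GET without CONTENT_LENGTH is left alone by default, while code that installs its own input stream can set the flag to opt back in.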

  11. Anonymous

    I imagine on POST the client must send either Content-Length: or Transfer-encoding: chunked?

    Please type 'hg pull http://bitbucket.org/dholth/webob' to include my test. I thought this code worked with a threaded web server but not with a forking server because 'paster serve' is able to call request.copy() but the forking server is not, although it could be the difference between wsgiref.simple_server and cherrypy, and the paste server.
