Issue #594 resolved

Trouble with gzip and staticdir

Anonymous created an issue

When both the gzip and staticdir tools are enabled, static files are served unreliably. I see two behaviours, which alternate in the same order every time I reload the page, and the cycle repeats forever.

The first one returns the file successfully, but without the last 10 bytes.

The other one raises this exception:

    Traceback (most recent call last):
      File "/home2/mariasys/maria2b/cherrypy/_cprequest.py", line 544, in respond
        self.hooks.run('before_finalize')
      File "/home2/mariasys/maria2b/cherrypy/_cprequest.py", line 79, in run
        hook()
      File "/home2/mariasys/maria2b/cherrypy/_cprequest.py", line 44, in call
        return self.callback(**self.kwargs)
      File "/home2/mariasys/maria2b/cherrypy/lib/encoding.py", line 208, in gzip
        ct = response.headers.get('Content-Type').split(';')[0]
    AttributeError: 'NoneType' object has no attribute 'split'
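
The failing line in the traceback calls `.split` on the result of `headers.get('Content-Type')`, which is `None` whenever the header is absent. A minimal sketch of a defensive lookup, assuming a plain dict stands in for the response headers; the helper name `base_content_type` is hypothetical and not part of CherryPy:

```python
def base_content_type(headers):
    """Return the media type without parameters, or None if the header is absent."""
    value = headers.get('Content-Type')
    if value is None:
        # No Content-Type header at all: there is nothing to split.
        return None
    return value.split(';')[0].strip()


print(base_content_type({'Content-Type': 'text/plain; charset=utf-8'}))
print(base_content_type({}))
```

A caller could then skip compression when the helper returns `None` instead of raising.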

Reported by Sheco

Comments (14)

  1. Anonymous

    OK, something I didn't mention: the exception happens on Firefox (currently using 2.0) but not with Internet Explorer. Case 1 still applies there, though: I get the file without the last 10 bytes. I'm testing with small text files.

  2. Robert Brewer

    I'd bet that FF and IE are sending different Accept-Encoding headers. Can you find out what those are and map them to the differing output?

    As always, a test case would be best. :)
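
To compare what two browsers ask for, the Accept-Encoding values can be parsed into content-codings and weights. The sketch below is a simplified standard-library parser, not CherryPy's own header handling; the function names and the sample header strings are assumptions for illustration, not values captured from the browsers in this report.

```python
def parse_accept_encoding(header):
    """Map each content-coding in an Accept-Encoding header to its q-value."""
    codings = {}
    for item in header.split(','):
        item = item.strip()
        if not item:
            continue
        name, _, params = item.partition(';')
        q = 1.0  # default weight when no q parameter is given
        for param in params.split(';'):
            key, _, value = param.strip().partition('=')
            if key.strip().lower() == 'q':
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed weight as "not acceptable"
        codings[name.strip().lower()] = q
    return codings


def accepts_gzip(header):
    """True if gzip is listed (or covered by '*') with a non-zero weight."""
    codings = parse_accept_encoding(header)
    if 'gzip' in codings:
        return codings['gzip'] > 0
    return codings.get('*', 0.0) > 0


# Hypothetical header values, for illustration only.
for sample in ('gzip,deflate', 'gzip;q=0, identity', ''):
    print(repr(sample), accepts_gzip(sample))
```

Logging the raw header next to each response makes it possible to map a given output to the request that produced it.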

  3. michele

    I was going to open a new ticket (or ask fumanchu or IRC) regarding gzipped content, but since there is already one...

    We have noticed that not all versions of IE support it properly.

    You can find relevant resources here:

    http://www.thinkvitamin.com/features/webapps/serving-javascript-fast

    and here:

    http://support.microsoft.com/default.aspx?scid=kb;en-us;823386&Product=ie600

    It would be nice if the gzip tool could check the user agent version and avoid sending gzipped content to these particular versions of IE (or just to IE :-D).

  4. Robert Brewer

    Regarding http://support.microsoft.com/kb/823386: it would be nice but not worth the code overhead, IMO. The issue has a hotfix already, and users who experience the problem should fix their browser or they won't be able to get anywhere meaningful on the 'Net. If you wish to do user-agent sniffing on your own, feel free:

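    import cherrypy
    from cherrypy import Tool
    from cherrypy.lib import encoding

    def mygzip(*args, **kwargs):
        # Skip compression for Internet Explorer user agents.
        ua = cherrypy.request.headers.get("User-Agent", "").lower()
        if 'msie' in ua:
            ... # Additional checks
            return
        return encoding.gzip(*args, **kwargs)
    cherrypy.tools.gzip = Tool('before_finalize', mygzip, priority=80)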
    
  5. Anonymous

    I am testing with this short script, under Fedora Enterprise Linux release 3 and Ubuntu Dapper Drake.

    It is decompressed(?) incorrectly by Firefox and Internet Explorer (for some reason I'm not getting the tracebacks with Firefox now), but Lynx does get it correctly.

    test.py:

    import cherrypy
    import os
    
    class root:
      index = cherrypy.tools.staticfile.handler('test.txt', 
        os.path.dirname(os.path.abspath(__file__)))
    
    cherrypy.config.update({ 'global': { 'tools.gzip.on': True } } )
    cherrypy.quickstart(root(), '/')
    

    test.txt:

    123456789012345
    
  6. Anonymous

    The server output...

    189.150.3.65 - - [27/Oct/2006:17:03:34] "GET / HTTP/1.1" 200 16 "" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
    189.150.3.65 - - [27/Oct/2006:17:03:36] "GET / HTTP/1.1" 304 - "" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1) Gecko/20061010 Firefox/2.0"
    127.0.0.1 - - [27/Oct/2006:17:03:49] "GET / HTTP/1.0" 200 16 "" "Lynx/2.8.5dev.7 libwww-FM/2.14 SSL-MM/1.4.1 OpenSSL/0.9.7a"
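
The `200 16` entries match the size of `test.txt` if it ends with a newline (15 digits plus one byte), so the logged figure looks like the uncompressed length. As a hedged aside, and only an assumption about where to look rather than a diagnosis: a gzip stream for a body this small is always longer than the original, so a client that trusted a stale length would stop reading early. The sketch uses the standard-library `gzip` module and does not reproduce CherryPy's own compression code.

```python
import gzip

body = b"123456789012345\n"  # test.txt contents, assuming a trailing newline
compressed = gzip.compress(body)

print(len(body))        # uncompressed size
print(len(compressed))  # bytes needed on the wire when gzipped

# Simulate a client that reads only as many bytes as the uncompressed length.
truncated = compressed[:len(body)]
try:
    gzip.decompress(truncated)
except (EOFError, OSError) as exc:
    print("truncated stream cannot be fully decoded:", type(exc).__name__)
```

Comparing the Content-Length header with the number of bytes actually sent for each browser would confirm or rule this out.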
    
  7. Robert Brewer

    Bah. I can't reproduce it. I tried (FF 1.5.0.7, IE 6, and Opera 8.02 on Win2k) against (Py 2.4 on Win2k, Py 2.3 on Debian sarge). Neither the initial 200 OK nor the following 304 Not Modified showed any problems on any combination of the above. :/

  8. Anonymous

    It's probably a bug in the ETag and GZip chain.

    If the page is not modified, then ETag raises HTTPRedirect(304) and deletes cherrypy.response.headers['Content-Type']. From this point, GZip fails at:

        ct = response.headers.get('Content-Type').split(';')[0]
    

    Another part that doesn't work is:

        if not response.body:
            # Response body is empty (might be a 304 for instance)
            return
    

    because the response.body value is not None, but a <generator> with

        >>> [chunk for chunk in response.body]
        []
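
The point above can be checked in isolation: a generator object is always truthy, even when it yields nothing, so `if not body` cannot detect an empty generator. A minimal sketch of one way to test for emptiness without losing data; the helper names are hypothetical and this is not how CherryPy itself handles the body:

```python
import itertools


def empty_body():
    """A generator that yields nothing, like an empty wrapped response body."""
    return (chunk for chunk in [])


def is_empty(iterable):
    """Return (empty, iterable) without losing the first chunk.

    Peeks at the first item and, if there is one, chains it back in front
    of the remaining items so the caller can still consume the full body.
    """
    iterator = iter(iterable)
    try:
        first = next(iterator)
    except StopIteration:
        return True, iter(())
    return False, itertools.chain([first], iterator)


body = empty_body()
print(bool(body))  # generators are always truthy
empty, body = is_empty(body)
print(empty)
```

The caller has to keep using the returned iterable, because the original one has already been advanced by the peek.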
    
  9. Anonymous

    Look at:

    Body().__set__(...) in _cprequest.py

    save() hook in lib/sessions.py

    The save() hook wraps every sequence in a generator.

    New page:

      Enter hook init
      Enter hook decode
      Enter hook trailing_slash
      Enter hook save
        request.body before:  [u"<html>...</html"]
        request.body after:  <generator object>
      Enter hook validate_etags
      Enter hook encode
      Exit hook encode
      Enter hook gzip
      Enter hook close
    

    Matched (by ETag) page:

      Enter hook init
      Enter hook decode
      Enter hook trailing_slash
      Enter hook save
        debug: request.body before:  [u"<html>...</html"]
        debug: request.body after:  <generator object>
      Enter hook validate_etags
        debug: raise cherrypy.HTTPRedirect([], 304)
      Enter hook save
        debug: request.body before:  []
        debug: request.body after:  <generator object>
      Enter hook validate_etags
      Enter hook encode
      Enter hook gzip
        debug: AttributeError: 'NoneType' object has no attribute 'split'
    

    BTW: the 'before_finalize' save() and validate_etags() hooks are entered twice.

  10. Robert Brewer

        It's probably a bug in the ETag and GZip chain.

    If so, it's not present in current trunk.

        If the page is not modified, then ETag raises HTTPRedirect(304)
        and deletes cherrypy.response.headers['Content-Type'].

    304 also sets response.body to None.

        From this point, GZip fails at:
            ct = response.headers.get('Content-Type').split(';')[0]

    It won't reach that point because response.body is None.

        Another part that doesn't work is:
            if not response.body:
                # Response body is empty (might be a 304 for instance)
                return

        because the response.body value is not None, but a <generator> with

            >>> [chunk for chunk in response.body]
            []

    Not if 304 was raised. HTTPRedirect(304) sets response.body to None.

        The save() hook wraps every sequence in a generator.

    Now this could be a problem. save() probably shouldn't do that in CP 3; it should set hooks instead. So it's a bug in the session Tool implementation, not ETag or GZip. What we really need is to establish a checklist of "gotchas" for Tools, and make sure all the builtins behave.

    However, given all that, the script that Sheco posted doesn't mention ETags or sessions...
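
The difference between wrapping the body and setting a hook can be illustrated with a toy model. Everything here (`ToyResponse`, `on_end_hooks`, `save_by_wrapping`, `save_by_hook`) is hypothetical and only mimics the idea; it is not CherryPy's request, response, or session code.

```python
class ToyResponse:
    """Hypothetical stand-in for a response object; not CherryPy's class."""

    def __init__(self, body):
        self.body = body
        self.on_end_hooks = []  # callbacks to run once the body has been sent

    def send(self):
        """Consume the body, then run the end-of-response hooks."""
        sent = list(self.body or [])
        for hook in self.on_end_hooks:
            hook()
        return sent


def save_by_wrapping(response, save):
    """Approach described in the thread: replace the body with a generator."""
    original = response.body

    def wrapper():
        for chunk in original:
            yield chunk
        save()

    response.body = wrapper()


def save_by_hook(response, save):
    """Alternative: leave the body alone and register a callback instead."""
    response.on_end_hooks.append(save)


saved = []
wrapped = ToyResponse([])
save_by_wrapping(wrapped, lambda: saved.append('wrapped'))
hooked = ToyResponse([])
save_by_hook(hooked, lambda: saved.append('hooked'))

print(bool(wrapped.body))  # True: the empty list is now hidden in a generator
print(bool(hooked.body))   # False: later checks such as `if not body` still work
wrapped.send()
hooked.send()
print(saved)
```

In both cases the save callback runs after the body has been consumed, but only the hook variant leaves the body in a form that later tools can still test for emptiness.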
