Issue #2 resolved

WebOb 1.0.2 caches bad representation of webob._parsed_post_vars

Chris McDonough
created an issue

The attached application demonstrates a problem with the caching of "post vars" in WebOb 1.0.2. The "good result" shown is correct on the first call to POST.items(); the "bad result" (on the second call to POST.items()) has every key in the form URL-quoted. This is a regression from WebOb 1.0.1.

The problem appears to be related to line 489 of Changing it from:

env['webob._parsed_post_vars'] = (vars, self.body_file_raw)


env['webob._parsed_post_vars'] = (vars, self.body_file)

Produces the correct result. I doubt this is the correct fix though.

Comments (22)

  1. Reed O'Brien

    The error compounds with each access of the POST property:

    (Pdb) self.POST
    UnicodeMultiDict([('With%3A%20funny%20%28chars%29', u'1'), ('white%20space', u'3')])
    (Pdb) self.POST
    UnicodeMultiDict([('With%253A%2520funny%2520%2528chars%2529', u'1'), ('white%2520space', u'3')])
    (Pdb) self.POST
    UnicodeMultiDict([('With%25253A%252520funny%252520%252528chars%252529', u'1'), ('white%252520space', u'3')])
    (Pdb) self.POST
    UnicodeMultiDict([('With%2525253A%25252520funny%25252520%25252528chars%25252529', u'1'), ('white%25252520space', u'3')])
  2. Chris McDonough reporter

    For the sake of posterity, here's a test case that doesn't require a server:

    from webob import Request
    from cStringIO import StringIO
    from webob.request import environ_from_url
    environ = environ_from_url('http://localhost:8080')
    def encode_multipart_formdata(fields):
        BOUNDARY = '----------ThIs_Is_tHe_bouNdaRY_$'
        CRLF = '\r\n'
        L = []
        for (key, value) in fields:
            L.append('--' + BOUNDARY)
            L.append('Content-Disposition: form-data; name="%s"' % key)
        L.append('--' + BOUNDARY + '--')
        body = CRLF.join(L)
        content_type = 'multipart/form-data; boundary=%s' % BOUNDARY
        return content_type, body
    names = ["Molecular Epidemiology: Distribution and Diversity of Retroviruses (excluding HIV drug resistance)", "HIV Immunology"]
    ct, body = encode_multipart_formdata([(name, name) for name in names])
    environ['wsgi.input'] = StringIO(body)
    environ['REQUEST_METHOD'] = 'POST'
    environ['CONTENT_LENGTH'] = str(len(body))
    environ['CONTENT_TYPE'] = ct
    req = Request(environ)
    print req.content_type
    print req.str_POST
    print req.str_POST
  3. Sergey Schetinin

    I came up with this test case myself:

    from webob import Request
    req = Request.blank('/foo', method='POST', content_type='multipart/form-data; boundary=boundary')
    req.body = '''--boundary
    Content-Disposition: form-data; name=" "
    print req.POST.items()
    print req.POST.items()
  4. Sergey Schetinin

    What was happening was due to the change in body_file and addition of body_file_raw. When the POST was accessed, the request body was always reread and reparsed. The fix suggested in the report itself did correct that behavior and was the "fix" that I had in the local repo that was preventing me from reproducing the problem ( ).

    However the only thing it changed was avoid the reparsing. The wsgi input file was replaced w/ webob.request.FakeCGIBody and the repeated escaping was caused by that parsing output of webob.request._encode_multipart with cgi.FieldStorage.

    The bug itself ultimately belongs to the stdlib cgi module which does not unquote the field names for multipart/form-data POSTs. This went unnoticed until now because browsers are very conservative when escaping 'name' parameter of content-disposion for parts in such submissions. So for a field name of, say "x :%" it would produce name="x :%", which is the same even without the escaping.

    WebOb's _encode_multipart when trying to reproduce the request body would escape much more aggresively and produce name="x%20%3A%25" which, when reparsed would produce the reported problem.

    The fix adds the unquoting when building the MultiDict from the FieldStorage which fixes the issue.

  5. Sergey Schetinin

    We should probably submit the cgi bug to the Python's tracker and once it's fixed avoid double-unquoting the field names in from_fieldstorage for versions of Python that have it fixed.

    Also, for older versions of webob (and I suppose any library that depends on cgi module for form parsing) you can trigger the bug by creating a form with a field that has a doublequote in the name (such as <input name='"' type=hidden>), when parsed, such a name would show up as '%22'.

    Now, does this fix deserve a version bump to 1.0.3?

  6. Chris McDonough reporter

    My personal policy about releases is to never release different code under the same version number, and to never "unrelease" something. I just release a new version and call the old one a brownbag. So I'd say yes, it does deserve a version bump.

  7. Sergey Schetinin

    I agree about never "overwriting" releases, I was just thinking about maybe postponing 1.0.3 a bit. Anyway I'll release 1.0.3 in a minute.

    Regarding not "unreleasing" things, do you think hiding the 1.0.2 release was a bad idea? (I initially thought that the bug was much more severe -- involving both keys, values and all POST requests, not just multipart/form-data ones).

  8. Log in to comment