Response.write() is O(n^2)

bquinlan_google avatarbquinlan_google created an issue
from timeit import Timer
import sys

from webob import Response
r = Response()
r.write('hello')
print sys.getrefcount(r._body)

init = '''
from webob import Response
r = Response()
r.body
'''

t = Timer("r.write('hello')", init)
print t.timeit(10000)
print t.timeit(20000)
print t.timeit(50000)
# print t.timeit(100000)

t = Timer("r._body += 'hello'", init)
print t.timeit(100000)

t = Timer("a += 'hello'", "a = ''")
print t.timeit(100000)

output:

4
0.358999967575
0.893000125885
3.66100001335
9.33999991417
0.0920000076294

I'm happy to fix this assuming that I can replace _body with a cStringIO for str and StringIO for unicode.

Comments (15)

  1. Sergey Schetinin

    A better solution would be to do (for bytes):

    if self._app_iter is None:
        self._app_iter = [self._body]
    elif not isinstance(self._app_iter, (list, tuple)):
        self._app_iter = list(self._app_iter)
    self._app_iter.append(value)
    

    If you can produce a patch (don't forget some tests for possible app_iter states / types) I would gladly accept.

  2. bquinlan_google

    Attached is a possible fix. It is designed to be as non-invasive as possible though I deliberately changed Response.write(u'foo') raise a TypeError if charset is not set rather than an AttributeError.

    Performance is improved by about 10x for my application. I have increased the cost of switching from Response.body to Response.app_iter but only slightly and I can't really see a use case for this.

    Let me know what you think!

    from timeit import Timer
    import sys
    
    sys.path.insert(0, sys.argv[1])
    
    init = '''
    from webob import Response
    r = Response()
    '''
    
    test1 = '''
    r.write('hello')
    '''
    
    test2 = '''
    r.write(u'hello')
    '''
    
    test3 = '''
    r.body
    r.body = 'hello'
    r.app_iter
    r.app_iter = ['world!']
    '''
    
    t1 = Timer(test1, init)
    print t1.timeit(100000)
    
    t2 = Timer(test2, init)
    print t2.timeit(100000)
    
    t3 = Timer(test3, init)
    print t3.timeit(100000)
    
    % python webob_timing.py webob-pristine/
    6.37901306152
    120.564186096
    1.67405509949
    % python webob_timing.py webob-patched/
    0.440033912659
    1.56671905518
    1.79112792015
    
  3. Sergey Schetinin

    There are some issues with that diff.

    This part in write is incorrect:

    if self._body is None:
        self.body = text
    

    the _app_iter and _body are alternative representations, so if _body is None that means _app_iter holds something and should not be discarded.

    Also, the .write(..) interface is secondary to the .body / .app_iter interface, and I think it's wrong to pivot the implementation around it.

    However, I just got an idea how to simplify all that code which I'll commit in a minute.

  4. bquinlan_google

    I don't understand your point about this being incorrect:

    if self._body is None:
        self.body = text
    

    In the original code, body is always assigned to as well:

        def write(self, text):
            if isinstance(text, unicode):
                self.text += text
            else:
                self.body += text
    

    The .write(...) interface is secondary in favour of .body? Is the user expected to collect up their modifications before setting .body?

  5. bquinlan_google

    I'm not sure about the desired semantics for Response.app_iter:

        def _app_iter__get(self):
            """..."""
            if self._app_iter is None:
                if self._body is None:
                    raise AttributeError("No body or app_iter has been set")
                return [self._body]
            else:
                return self._app_iter
    

    Depending on the code path taken, the returned list will either be attached to the Response or not. Is that the desired behavior?

  6. Sergey Schetinin

    In the original code self.body is not just assigned, its +='d

    The user can use any of the ways provided to generate the body. One way is to set the body to a string. Another is to use app_iter to generate response body on-demand, the .write one is a valid way to generate the body, but is the least used of them all.

    What part of app_iter semantics seems off? Why would an app_iter need to be attached to the response?

  7. bquinlan_google

    In the original code self.body is not just assigned, its +='d

    And that provokes Response._bodyset, the same as my "self.body = text" code.

    The user can use any of the ways provided to generate the body. One way is to set the body to a string. Another is to use app_iter to generate response body on-demand, the .write one is a valid way to generate the body, but is the least used of them all. What part of app_iter semantics seems off? Why would an app_iter need to be attached to the response?

    I mean that the user probably shouldn't write Response.app_iter.append(...) because, in some cases, the returned list isn't attached to the response. So the use case that you are optimizing for is the user does a single assignment to .body or to .app_iter?

  8. Sergey Schetinin

    And that provokes Response._bodyset, the same as my "self.body = text" code.

    It doesn't just discard whatever was in app_iter though, which is my point.

    The user must never try to alter app_iter, correct. It's not a matter of attached / unattached. app_iter can be any kind of iterable, mutable or not, it should not be attempted to be modified. Please see WSGI spec to understand what app_iter is.

  9. bquinlan_google

    It doesn't just discard whatever was in app_iter though, which is my point.

    How does it not? The last line of (the original version of _body__set is "self._app_iter = None".

    The user must never try to alter app_iter, correct. It's not a matter of attached / unattached. app_iter can be any kind of iterable, mutable or not, it should not be attempted to be modified.

    Right, so essentially the current interface does not provide any reasonable way to accumulate changes (using .app_iter is incorrect and calling .body = /.write(...) is O(n)). My change provides such an interface ;-)

  10. Sergey Schetinin

    self.body += ... first reads .body, which in turn reads app_iter and converts it into a string. Try this code with the original code and with your patched version:

    res = Response(app_iter=['a'])
    res.write('b')
    assert res.body == 'ab'
    
  11. Sergey Schetinin

    Using app_iter in the way you've mentioned is incorrect, that doesn't mean using app_iter as a way to accumulate changes is incorrect. There are more alternatives even. You can collect your response in a temp file and then set res.body_file to that etc.

    Anyway, as I said, I have a different patch that will also make .write(..) more efficient.

  12. bquinlan_google

    No worries, I didn't really care about my patch, I'm just happy that this got fixed!

    I can confirm that this change makes .write() O(1) and it is about twice as slow as my version i.e.

    % python webob_timing.py webob-head/
    0.898704051971
    2.08412981033
    1.0757651329
    % python webob_timing.py webob-faster_write_2
    0.448057889938
    1.56829214096
    1.81236720085
    

    In conclusion, looks great, thanks!

  13. Log in to comment
Tip: Filter by directory path e.g. /media app.js to search for public/media/app.js.
Tip: Use camelCasing e.g. ProjME to search for ProjectModifiedEvent.java.
Tip: Filter by extension type e.g. /repo .js to search for all .js files in the /repo directory.
Tip: Separate your search with spaces e.g. /ssh pom.xml to search for src/ssh/pom.xml.
Tip: Use ↑ and ↓ arrow keys to navigate and return to view the file.
Tip: You can also navigate files with Ctrl+j (next) and Ctrl+k (previous) and view the file with Ctrl+o.
Tip: You can also navigate files with Alt+j (next) and Alt+k (previous) and view the file with Alt+o.