
Issue #76 resolved

speed mako up a lot

guest
created an issue

there are too many context.write() calls in the compiled module; each one stands for context._buffer_stack[-1].data.append or context._buffer_stack[-1].write, which causes a lot of overhead.

making a shortcut for it can reduce the overhead and speed Mako up a lot.
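The idea can be sketched outside Mako's codegen (the names and structure below are illustrative, not Mako's actual generated code): bind the write callable to a local once per render body instead of re-resolving attributes on every call. The sketch also notes the hazard raised in the comments: binding `_buffer_stack[-1].write` directly freezes one buffer, which breaks if the stack changes mid-render.

```python
# Illustrative sketch only (not Mako's actual generated code): the cost
# being discussed is attribute re-resolution on every write call.
import io

class Context:
    """Minimal stand-in for a Mako-like render context."""
    def __init__(self):
        self._buffer_stack = [io.StringIO()]

    def write(self, text):
        # every call does: attribute lookup, index, attribute lookup
        self._buffer_stack[-1].write(text)

def render_plain(context, items):
    for item in items:
        context.write(item)      # method lookup repeated per iteration

def render_shortcut(context, items):
    write = context.write        # resolve the bound method once
    for item in items:
        write(item)
    # caution: binding context._buffer_stack[-1].write instead would
    # freeze one buffer and break if the stack is pushed/popped mid-render

ctx = Context()
render_shortcut(ctx, ["a", "b", "c"])
print(ctx._buffer_stack[-1].getvalue())  # -> abc
```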

Comments (10)

  1. Mike Bayer repo owner

    the patchfile has the same issue as the patch in #78:

    File to patch: lib/mako/codegen.py
    patching file lib/mako/codegen.py
    patch: malformed patch at line 43: @@ -455,9 +470,9 @@

    not sure why that is since the #77 patch works.

    Anyway, I'm concerned about nailing "buffer_stack[-1]" into the local namespace, since it's not apparent to me yet whether that value changes at all (also it's kind of ugly).

    so see if you can get me a patchfile that works at least, and I can play with it.

  2. Mike Bayer repo owner

    thanks. as I expected, because you're nailing to a specific buffer in the context, several unit tests fail:

    z-eeks-Computer:mako classic$ python test/alltests.py
    /Users/classic/dev/mako/test/babelplugin.py:36: RuntimeWarning: babel not installed: skipping babelplugin test
      RuntimeWarning, 1)
    ........................................................................................................................................................F......F......F........................F....................

    FAIL: test_capture_ccall (filters.BufferTest)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/Users/classic/dev/mako/test/filters.py", line 264, in test_capture_ccall
        assert flatten_result(t.render()) == "this is foo. body: ccall body"
    AssertionError

    FAIL: test_def (filters.FilterTest)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/Users/classic/dev/mako/test/filters.py", line 37, in test_def
        assert flatten_result(t.render(x="this is x", myfilter=lambda t: "MYFILTER->%s<-MYFILTER" % t)) == "MYFILTER-> this is foo <-MYFILTER"
    AssertionError

    FAIL: testnonexpression (filters.FilterTest)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/Users/classic/dev/mako/test/filters.py", line 171, in testnonexpression
        assert flatten_result(t.render()) == "this is b this is a"
    AssertionError

    FAIL: test_basic (call.SelfCacheTest)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/Users/classic/dev/mako/test/call.py", line 424, in test_basic
        "this is foo"
    AssertionError

    ----------------------------------------------------------------------
    Ran 212 tests in 3.340s

    FAILED (failures=4)

  3. guest reporter

    a new patch was added; all the tests pass except test_basic (call.SelfCacheTest), which is too magic and should not be recommended for use.

  4. Mike Bayer repo owner

    I would *greatly* prefer that this patch not dig into context like that. While we aim to be fast, when it gets to the point that the internals are being disassembled within compiled templates, it's getting out of hand. There's also a try/except block in it - exception catches are very expensive. And any template that defines a variable or function named "write" will fail. It seems like a lot of complexity for a microscopic speed improvement.

    From my point of view, mako is already running on http://reddit.com and we're plenty fast - bigger speed improvements would probably come from native extensions if anything. There are some speed issues involving the "import='*'" flag in <%namespace> but core Mako performance is excellent (the removal of the unicode() method is more compelling).

    How much does this patch actually speed things up ? Do you have any time-oriented profiling data ?

  5. guest reporter

    I took a real page to profile: http://www.douban.com/

    before applying this patch:

    3436 function calls (3414 primitive calls) in 0.031 CPU seconds

    Ordered by: internal time, call count
    List reduced from 200 to 40 due to restriction <40>

    ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
      1020    0.002    0.000    0.002    0.000  runtime.py:44(write)
        23    0.002    0.000    0.003    0.000  subject.py:700(get_subject)
         3    0.002    0.001    0.002    0.001  subject.py:708(get_subjects)
       352    0.002    0.000    0.002    0.000  cgi.py:1051(escape)
         3    0.001    0.000    0.007    0.002  utils.html.py:354(render_subjects_list)
        50    0.001    0.000    0.001    0.000  user.py:934(get_user)
        10    0.001    0.000    0.002    0.000  subject.py:200(alt_titles)
       352    0.001    0.000    0.003    0.000  filters.py:21(html_escape)
         5    0.001    0.000    0.005    0.001  utils.html.py:1122(renderreview_short)
        29    0.001    0.000    0.001    0.000  subject.py:214(title_by_language)
         3    0.001    0.000    0.002    0.001  chart.py:15(get_popular_subjects)
         1    0.000    0.000    0.001    0.001  miniblog.py:48(get_miniblogs)
         1    0.000    0.000    0.000    0.000  review.py:276(get_reviews)
       252    0.000    0.000    0.000    0.000  subject.py:98(get_attribute)
        29    0.000    0.000    0.000    0.000  template.py:139(init)
        16    0.000    0.000    0.000    0.000  lookup.py:120(check)
        30    0.000    0.000    0.000    0.000  utf_8.py:15(decode)
         8    0.000    0.000    0.000    0.000  site_util.ptl:1787(format_mtext)

    after applying this patch:

    2416 function calls (2394 primitive calls) in 0.026 CPU seconds

    Ordered by: internal time, call count
    List reduced from 199 to 40 due to restriction <40>

    ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
        23    0.002    0.000    0.003    0.000  subject.py:700(get_subject)
       352    0.002    0.000    0.002    0.000  cgi.py:1051(escape)
         3    0.002    0.001    0.002    0.001  subject.py:708(get_subjects)
        50    0.001    0.000    0.001    0.000  user.py:934(get_user)
        10    0.001    0.000    0.002    0.000  subject.py:200(alt_titles)
       352    0.001    0.000    0.003    0.000  filters.py:20(html_escape)
         3    0.001    0.000    0.005    0.002  utils.html.py:361(render_subjects_list)
        29    0.001    0.000    0.001    0.000  subject.py:214(title_by_language)
         5    0.001    0.000    0.004    0.001  utils.html.py:1151(renderreview_short)
         3    0.001    0.000    0.002    0.001  chart.py:15(get_popular_subjects)
         1    0.000    0.000    0.000    0.000  miniblog.py:48(get_miniblogs)
         1    0.000    0.000    0.000    0.000  review.py:276(get_reviews)
       252    0.000    0.000    0.000    0.000  subject.py:98(get_attribute)
        16    0.000    0.000    0.000    0.000  lookup.py:121(check)
         8    0.000    0.000    0.000    0.000  site_util.ptl:1787(format_mtext)
        29    0.000    0.000    0.000    0.000  template.py:138(init)
        30    0.000    0.000    0.000    0.000  utf_8.py:15(decode)
         1    0.000    0.000    0.000    0.000  cursors.py:270(_do_query)
         4    0.000    0.000    0.000    0.000  event.py:505(get_event)
     34/32    0.000    0.000    0.001    0.000  runtime.py:201(getattr)

    As we can see, it speeds up whole-page rendering time by about 15% (31ms to 26ms). Excluding the data-collecting time (outside Mako, about 15ms), it speeds up the template portion by about 30% (16ms to 11ms).

    The more complex the page, the bigger the speedup.

    If we make sure FastEncodingBuffer or cStringIO is used, the "try ... except ..." is not necessary.

    btw, it seems that the "html_escape" filter also needs some optimization.
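As a quick arithmetic check on the figures above (the percentages are derived here only from the reported timings, nothing else is measured):

```python
# Derive the speedup percentages from the timings reported above.
def speedup_pct(before_ms, after_ms):
    return (before_ms - after_ms) / before_ms * 100

print(round(speedup_pct(31, 26)))  # whole page: 16 (reported as "about 15%")
print(round(speedup_pct(16, 11)))  # excluding data collection: 31 ("about 30%")
```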

  6. guest reporter

    It is clear that Mako is pretty fast, and faster than other Python template engines such as Genshi, but it is slower than PTL (Python Template Language), used in Quixote.

    PTL is not a complete template language, but it is super fast. We recently decided to replace PTL with Mako, so we want to make Mako as fast as PTL.

  7. Mike Bayer repo owner

    OK, can you try the attached patch please ? Same idea, but we leave the `Context` in charge of defining what the "writer" function is and abstraction is maintained. `writer` can also become an instance variable on `Context` instead of a method to reduce more method calls. Also the actual name of the writer is given a non-colliding name.
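A rough sketch of that approach (hypothetical names apart from `writer`, which the comment itself mentions; this is not the attached patch): the `Context` hands compiled code a write callable, so templates never touch `_buffer_stack` themselves, and the generated local gets a mangled, non-colliding name.

```python
# Hypothetical sketch of the abstraction described above; __M_writer is
# an assumed non-colliding local name, not necessarily the patch's choice.
import io

class Context:
    def __init__(self):
        self._buffer_stack = [io.StringIO()]

    def push_buffer(self):
        self._buffer_stack.append(io.StringIO())

    def pop_buffer(self):
        return self._buffer_stack.pop()

    def writer(self):
        # the Context, not the compiled template, decides what "write" means
        return self._buffer_stack[-1].write

def render_body(context):
    # compiled code fetches the writer once per body, under a name no
    # template-defined variable (like "write") can collide with
    __M_writer = context.writer()
    __M_writer("hello ")
    __M_writer("world")

ctx = Context()
render_body(ctx)
print(ctx._buffer_stack[-1].getvalue())  # -> hello world
```

Keeping the lookup behind `Context.writer()` preserves the abstraction: if something pushes a new buffer mid-render, compiled code simply asks for a fresh writer rather than poking at `_buffer_stack[-1]` directly.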
