turn off unicode

Issue #77 resolved
created an issue

If the input and output are not unicode, then decoding and encoding add some overhead. Adding an option to turn unicode off could improve performance a bit.

Add an argument to Lookup and Template: ... ,using_unicode = True, ...

When unicode is turned off, the compiled module source is saved with the proper charset, with

  # -*- encoding:charset -*-

added at the head, so escaping is not needed.
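
To illustrate the proposal, here is a minimal sketch (not Mako's code) of how a source-encoding declaration lets a multibyte literal sit unescaped in generated module source. The charset `gbk`, the variable names, and the tiny "module" are assumptions made for illustration. It is written for Python 3, where `compile()` accepts bytes and honors the declaration.

```python
# Sketch only: a generated module saved in a declared charset, with the
# multibyte literal left unescaped. This is not Mako's code generator.
CHARSET = "gbk"  # assumed charset, chosen only for illustration

module_source = (
    "# -*- encoding:%s -*-\n" % CHARSET  # declaration at the head of the file
    + 'text = "我们"\n'                   # literal kept as in the template source
)

# Save the "compiled module" in the declared charset, without \u escapes.
module_bytes = module_source.encode(CHARSET)

# compile() reads the declaration and decodes the bytes accordingly.
namespace = {}
exec(compile(module_bytes, "<generated module>", "exec"), namespace)
```

After running this, `namespace["text"]` holds the original string, and `module_bytes` contains the raw GBK bytes rather than an escaped form.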

Comments (10)

  1. Michael Bayer repo owner

    hi there -

    I'm reviewing your patches, thanks for them! So far this particular one I can't accept:

    - the primary method to turn off the "unicode" conversion step for expressions, which is certainly fairly expensive, is to redefine the default_filters of the template: http://www.makotemplates.org/docs/filtering.html#filtering_expression_defaultfilters

    - the explicit kwargs in Template are to allow checking for valid arguments.

    - the use_unicode flag I don't exactly understand the point of. If it's that you're trying to have a template which contains multibyte characters and you'd like it to go straight through and generate a python file with a "coding" attribute at the top, it's not that simple. See #11 for reference. Example (fails with the patch, as well as without):

    template = Template("""Alors vous imaginez ma surprise, au lever du jour, quand une drôle de petit voix m’a réveillé. Elle disait: « S’il vous plaît… dessine-moi un mouton! »""", input_encoding='utf-8')
    assert template.render() == """Alors vous imaginez ma surprise, au lever du jour, quand une drôle de petit voix m’a réveillé. Elle disait: « S’il vous plaît… dessine-moi un mouton! »"""

  2. guest reporter
    • changed status to open

    Because strings in the compiled source code are unicode literals, like u'\xxxx', just removing "unicode" from default_filters does not work: it causes a DecodeError if the data is a multibyte string.

    So strings must stay as they are in the template source, such as "我们", and "# -*- encoding:utf-8 -*-" must be added to the compiled source code.

    lexer.py tries to decode all source code into Unicode, so we need a parameter to turn that off. Then removing "unicode" from default_filters will not cause a DecodeError.

    Not using Unicode is more complicated, but it speeds things up a bit. I have used it this way and it works fine. If you are interested, I will refine the code and submit it again.
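
The DecodeError described above comes from Python 2 implicitly decoding byte strings as ASCII whenever they meet a unicode string. The sketch below mimics that behavior explicitly in Python 3; the helper `py2_style_coerce` is hypothetical and exists only to make the implicit step visible.

```python
# Sketch only: emulates Python 2's implicit str -> unicode coercion in Python 3.
raw = "我们".encode("utf-8")  # a multibyte byte string, as passed to a template


def py2_style_coerce(data):
    """Hypothetical helper: Python 2 assumed ASCII when mixing str and unicode."""
    return data.decode("ascii")


try:
    py2_style_coerce(raw)
    failed = False
except UnicodeDecodeError:
    # This is the error seen when unicode literals meet multibyte byte strings.
    failed = True

# Decoding with the real encoding works, but it is exactly the per-string
# overhead the reporter wants to avoid.
decoded = raw.decode("utf-8")
```

Keeping every string in the compiled module as raw bytes avoids the coercion entirely, which is why the reporter wants the lexer to skip decoding as well.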

  3. Michael Bayer repo owner

    can you please attach a template file illustrating what you're referring to ? if the idea is just, "unicode is too slow, just pass through utf-8 directly without processing", that historically has not worked with our particular approach (we tried). Like I pointed out in my example, the patch does not work.

  4. guest reporter

    I have updated the patch, and it passes all the test cases, including two Chinese templates: one using unicode, the other using utf-8 directly for better performance.

    If unicode is not necessary, can Mako turn off unicode by default, or not use unicode at all?

  5. Michael Bayer repo owner

    this part of the patch:

    @@ -563,7 +566,7 @@
                     "try:")
             self.write_source_comment(node)
             self.printer.writelines(
    -            "context.write(unicode(%s))" % node.attributes['expr'],
    +            "context.write(%s)" % node.attributes['expr'],
                 "finally:",
                 "context.caller_stack.nextcaller = None",
                 None

    should be calling upon the `default_filters` in the way that `visitExpression` does, since a `%call` approximates saying `${foo()}` - so we wouldn't hardcode `unicode()`, but would instead pull from `default_filters`. It's a bug on my part, can you work that in to the patch ?

  6. Michael Bayer repo owner

    ...which would also replace default filters with `[str()]`. The point of the default filter of `unicode()` or `str()` is so that people can say `${5 + 7}` and it renders. It of course can be cleared entirely for performance reasons.
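
As a rough sketch of the idea in the two comments above, a code generator can build the `context.write(...)` line from a configurable filter list instead of hardcoding `unicode()`. The functions `wrap_with_filters` and `emit_write` are hypothetical names for illustration, not Mako's actual codegen API.

```python
# Sketch only: wrap an expression in whatever default filters are configured.
def wrap_with_filters(expr, filters):
    """Wrap expr in each named filter, so ["str"] turns "x" into "str(x)"."""
    for name in filters:
        expr = "%s(%s)" % (name, expr)
    return expr


def emit_write(expr, default_filters):
    """Return the generated source line for writing an expression."""
    return "context.write(%s)" % wrap_with_filters(expr, default_filters)


with_str = emit_write("5 + 7", ["str"])       # default filter keeps ${5 + 7} renderable
unfiltered = emit_write("caller.body()", [])  # filters cleared for performance
```

With `["str"]` the generated line converts the value before writing it; with an empty list the expression is written as-is, which is only safe when it already evaluates to a string.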

  7. guest reporter

    I have updated the patch:

    add default filters to %call tag.

    rename the flag to "disable_unicode"

    set default_filters as ["str"] while disable_unicode is True.

  8. Michael Bayer repo owner

    thanks. Committed a modified version in d5f83e6918fc188c90afdd01f4adaaa40710a954 which retains identical Mako behavior if the flag is off, which is the default setting for both Template and TemplateLookup. Also added new documentation for this mode. Since not using unicode is against Mako's general philosophy, the docs warn against using this flag unless users are absolutely sure they want it (if anyone reports UnicodeDecode errors with this flag, they're using it wrong and will be urged to stop using it), and it's almost certain that this feature will not be available in the Python 3000 version since Py3K standardizes on unicode strings everywhere.
