Commits

Mike Bayer  committed d5f83e6

- added "bytestring passthru" mode, via `disable_unicode=True`
argument passed to Template or TemplateLookup. All
unicode-awareness and filtering is turned off, and template
modules are generated with the appropriate magic encoding
comment. In this mode, template expressions can only
receive raw bytestrings or Unicode objects which represent
straight ASCII, and render_unicode() may not be used.
[ticket:77] (courtesy anonymous guest)

  • Parent commits 1cb9f21

Files changed (13)

   directly to the %page or %def, i.e.
   <%def name="foo(x)" cached="True" cache_key="${x}"/>
   [ticket:78]
-  
+- added "bytestring passthru" mode, via `disable_unicode=True`
+  argument passed to Template or TemplateLookup.  All
+  unicode-awareness and filtering is turned off, and template 
+  modules are generated with the appropriate magic encoding
+  comment.  In this mode, template expressions can only
+  receive raw bytestrings or Unicode objects which represent
+  straight ASCII, and render_unicode() may not be used. 
+  [ticket:77]  (courtesy anonymous guest)
+
 0.1.10
 - fixed propagation of 'caller' such that nested %def calls
   within a <%call> tag's argument list propagates 'caller'

File doc/build/content/filtering.txt

 
 **New in version 0.1.2**
 
-In addition to the `expression_filter` argument, the `default_filters` argument to both `Template` and `TemplateLookup` can specify filtering for all expression tags at the programmatic level.  This array-based argument defaults to `["unicode"]`:
+In addition to the `expression_filter` argument, the `default_filters` argument to both `Template` and `TemplateLookup` can specify filtering for all expression tags at the programmatic level.  This array-based argument, when given its default argument of `None`, will be internally set to `["unicode"]`, except when `disable_unicode=True` is set in which case it defaults to `["str"]`:
 
     {python}
     t = TemplateLookup(directories=['/tmp'], default_filters=['unicode'])
 
     {python}
     t = TemplateLookup(directories=['/tmp'], default_filters=['decode.utf8'])
+
+To disable `default_filters` entirely, set it to an empty list:
+
+    {python}
+    t = TemplateLookup(directories=['/tmp'], default_filters=[])
     
 Any string name can be added to `default_filters` where it will be added to all expressions as a filter.  The filters are applied from left to right, meaning the leftmost filter is applied first.
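
The left-to-right rule described above can be sketched in plain Python. This is only an illustration, not Mako's code generator: `apply_filters`, `strip` and `shout` are hypothetical names, and the sketch is written for Python 3 while the commit itself targets Python 2.

```python
def apply_filters(value, filters):
    """Apply each filter in list order; the leftmost filter runs first."""
    for f in filters:
        value = f(value)
    return value


def strip(text):
    # Hypothetical filter: remove surrounding whitespace.
    return text.strip()


def shout(text):
    # Hypothetical filter: upper-case the text and append "!".
    return text.upper() + "!"


# 'strip' runs first, so 'shout' sees the already stripped text.
result = apply_filters("  hello  ", [strip, shout])

# An empty filter list leaves the value untouched, which is the effect
# that default_filters=[] asks for.
untouched = apply_filters("  hello  ", [])
```

Swapping the two filters changes the result, which is why the order of names in `default_filters` matters.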
 

File doc/build/content/unicode.txt

 
 There is one operation that Python *can* do with a non-ascii bytestring, and it's a great source of confusion:  it can dump the bytestring straight out to a stream or a file, with nary a care what the encoding is.  To Python, this is pretty much like dumping any other kind of binary data (like an image) to a stream somewhere.  So in a lot of cases, programs that embed all kinds of international characters and encodings into plain byte-strings (i.e. using `"hello world"` style literals) can fly right through their run, sending reams of strings out to wherever they are going, and the programmer, seeing the same output as was expressed in the input, is now under the illusion that his or her program is Unicode-compliant.  In fact, their program has no unicode awareness whatsoever, and similarly has no ability to interact with libraries that *are* unicode aware.
 
-Particularly, some template languages like Cheetah, as well as earlier versions of Myghty, will treat expressions in this manner..they just go right through.  Theres nothing "incorrect" about this, but Mako, since it deals with unicode internally, usually requires explicitness when dealing with non-ascii encodings.  Additionally, if you ever need to handle unicode strings and other kinds of encoding conversions more intelligently, the usage of raw bytestrings quickly becomes a nightmare, since you are sending the Python interpreter collections of bytes for which it can make no intelligent decisions with regards to encoding.
+The "pass through encoded data" scheme is what template languages like Cheetah and earlier versions of Myghty do by default.  Mako as of version 0.2 also supports this mode of operation using the "disable_unicode=True" flag.  However, in its default unicode-aware mode, Mako requires explicitness when dealing with non-ascii encodings.  Additionally, if you ever need to handle unicode strings and other kinds of encoding conversions more intelligently, the usage of raw bytestrings quickly becomes a nightmare, since you are sending the Python interpreter collections of bytes for which it can make no intelligent decisions with regards to encoding.
 
-In Mako, all parsed template constructs and output streams are handled internally as Python `unicode` objects.  It's only at the point of `render()` that this unicode stream is rendered into whatever the desired output encoding is.  The implication here is that the template developer must ensure that the encoding of all non-ascii templates is explicit, that all non-ascii-encoded expressions are in one way or another converted to unicode, and that the output stream of the template is handled as a unicode stream being encoded to some encoding.
+In normal Mako operation, all parsed template constructs and output streams are handled internally as Python `unicode` objects.  It's only at the point of `render()` that this unicode stream is rendered into whatever the desired output encoding is.  The implication here is that the template developer must ensure that the encoding of all non-ascii templates is explicit, that all non-ascii-encoded expressions are in one way or another converted to unicode, and that the output stream of the template is handled as a unicode stream being encoded to some encoding.
 
 ### Specifying the Encoding of a Template File
 
 
 When calling `render()` on a template that does not specify any output encoding (i.e. it's `ascii`), Python's `cStringIO` module, which cannot handle encoding of non-ascii `unicode` objects (even though it can send raw bytestrings through), is used for buffering.  Otherwise, a custom Mako class called `FastEncodingBuffer` is used, which essentially is a super dumbed-down version of `StringIO` that gathers all strings into a list and uses `u''.join(elements)` to produce the final output - it's markedly faster than `StringIO`.
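
The list-and-join buffering idea can be sketched as follows. `ListBuffer` is a hypothetical stand-in written for Python 3, not Mako's actual `FastEncodingBuffer`; it only shows the technique of collecting fragments in a list, joining them once, and encoding at the very end.

```python
class ListBuffer:
    """Hypothetical sketch of a list-based output buffer (not Mako's class)."""

    def __init__(self, encoding=None, errors="strict"):
        self.parts = []          # fragments are appended, never concatenated
        self.encoding = encoding
        self.errors = errors

    def write(self, text):
        self.parts.append(text)

    def getvalue(self):
        # A single join at the end avoids repeated string concatenation.
        joined = "".join(self.parts)
        if self.encoding:
            return joined.encode(self.encoding, self.errors)
        return joined


buf = ListBuffer(encoding="utf-8")
buf.write("dr")
buf.write("ôle")
encoded = buf.getvalue()
```

Without an `encoding`, `getvalue()` in this sketch returns the joined text unchanged.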
 
+### Saying to Heck with it:  Disabling the usage of Unicode entirely
+
+Some segments of Mako's userbase choose to make no usage of Unicode whatsoever, and instead would prefer the "passthru" approach; all string expressions in their templates return encoded bytestrings, and they would like these strings to pass right through.   The generated template module is also in the same encoding as the template and additionally carries Python's "magic encoding comment" at the top.   The only advantage to this approach is that templates need not use `u""` for literal strings; there's an arguable speed improvement as well since raw bytestrings generally perform slightly faster than unicode objects in Python.  These users will have to get used to using Unicode when Python 3000 becomes the standard, but for now they can hit the `disable_unicode=True` flag, introduced in version 0.2 of Mako, like so:
+
+    {python}
+    # -*- encoding:utf-8 -*-
+    from mako.template import Template
+    
+    t = Template("drôle de petit voix m’a réveillé.", disable_unicode=True, input_encoding='utf-8')
+    print t.code
+    
+The generated module source code will contain elements like these:
+
+    {python}
+    # -*- encoding:utf-8 -*-
+    #  ...more generated code ...
+
+
+    def render_body(context,**pageargs):
+        context.caller_stack.push_frame()
+        try:
+            __M_locals = dict(pageargs=pageargs)
+            # SOURCE LINE 1
+            context.write('dr\xc3\xb4le de petit voix m\xe2\x80\x99a r\xc3\xa9veill\xc3\xa9.')
+            return ''
+        finally:
+            context.caller_stack.pop_frame()
+
+Where above you can see that the `encoding` magic source comment is at the top, and the string literal used within `context.write` is a regular bytestring. 
+
+When `disable_unicode=True` is turned on, the `default_filters` argument which normally defaults to `["unicode"]` now defaults to `["str"]` instead.  Setting default_filters to the empty list `[]` can remove the overhead of the `str` call.  Also, in this mode you **cannot** safely call `render_unicode()` - you'll get unicode/decode errors.
+
+**Rules for using disable_unicode=True**
+
+ * don't use this mode unless you really, really want to and you absolutely understand what you're doing
+ * don't use this option just because you don't want to learn to use Unicode properly; we aren't supporting user issues in this mode of operation.  We will however offer generous help for the vast majority of users who stick to the Unicode program.
+ * it's extremely unlikely this mode of operation will be present in the Python 3000 version of Mako since P3K strings are unicode objects by default; bytestrings are relegated to a "bytes" type that is not intended for dealing with text.
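
The difference between passing encoded bytes through and treating them as text can be sketched in Python 3, where `bytes` stands in for the Python 2 bytestring. This is an illustration under that assumption and does not call Mako; `decode_as_ascii` is a hypothetical helper. It suggests why `render_unicode()` is unsafe in this mode: non-ASCII bytes cannot be decoded without knowing their encoding.

```python
import io

# UTF-8 encoded bytes, standing in for a Python 2 bytestring.
encoded = "drôle de petit voix".encode("utf-8")

# Passthru: the bytes are written to the output stream unchanged.
stream = io.BytesIO()
stream.write(encoded)
passed_through = stream.getvalue()


def decode_as_ascii(data):
    """Hypothetical helper: try to treat bytes as ASCII text."""
    try:
        return data.decode("ascii")
    except UnicodeDecodeError:
        # Non-ASCII bytes cannot be interpreted without an explicit encoding.
        return None


failed = decode_as_ascii(encoded)    # contains non-ASCII bytes
plain = decode_as_ascii(b"hello")    # straight ASCII decodes fine
```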
+    
+

File doc/build/genhtml.py

     ]
 
 title='Mako Documentation'
-version = '0.1.10'
+version = '0.2.0'
 
 root = toc.TOCElement('', 'root', '', version=version, doctitle=title)
 

File lib/mako/codegen.py

 MAGIC_NUMBER = 2
 
 
-def compile(node, uri, filename=None, default_filters=None, buffer_filters=None, imports=None, source_encoding=None):
+def compile(node, uri, filename=None, default_filters=None, buffer_filters=None, imports=None, source_encoding=None, generate_unicode=True):
     """generate module source code given a parsetree node, uri, and optional source filename"""
-    buf = util.FastEncodingBuffer()
+    
+    if generate_unicode:
+        buf = util.FastEncodingBuffer()  # creates Unicode
+    else:
+        buf = util.StringIO()  # returns whatever was passed in
+
     printer = PythonPrinter(buf)
-    _GenerateRenderMethod(printer, _CompileContext(uri, filename, default_filters, buffer_filters, imports, source_encoding), node)
+    _GenerateRenderMethod(printer, _CompileContext(uri, filename, default_filters, buffer_filters, imports, source_encoding, generate_unicode), node)
     return buf.getvalue()
 
 class _CompileContext(object):
-    def __init__(self, uri, filename, default_filters, buffer_filters, imports, source_encoding):
+    def __init__(self, uri, filename, default_filters, buffer_filters, imports, source_encoding, generate_unicode):
         self.uri = uri
         self.filename = filename
         self.default_filters = default_filters
         self.buffer_filters = buffer_filters
         self.imports = imports
         self.source_encoding = source_encoding
+        self.generate_unicode = generate_unicode
         
 class _GenerateRenderMethod(object):
     """a template visitor object which generates the full module source for a template."""
         module_identifiers.declared = module_ident
         
         # module-level names, python code
+        if not self.compiler.generate_unicode and self.compiler.source_encoding:
+            self.printer.writeline("# -*- encoding:%s -*-" % self.compiler.source_encoding)
+            
         self.printer.writeline("from mako import runtime, filters, cache")
         self.printer.writeline("UNDEFINED = runtime.UNDEFINED")
         self.printer.writeline("_magic_number = %s" % repr(MAGIC_NUMBER))
         )
         if buffered or filtered or cached:
             self.printer.writeline("context.push_buffer()")
-
+        
         self.identifier_stack.append(self.compiler.identifiers.branch(self.node))
         if not self.in_def and '**pageargs' in args:
             self.identifier_stack[-1].argument_declared.add('pageargs')
         def locate_encode(name):
             if re.match(r'decode\..+', name):
                 return "filters." + name
-            elif name == 'unicode':
-                return 'unicode'
             else:
                 return filters.DEFAULT_ESCAPES.get(name, name)
         
             )
         self.write_variable_declares(body_identifiers)
         self.identifier_stack.append(body_identifiers)
+        
         for n in node.nodes:
             n.accept_visitor(self)
         self.identifier_stack.pop()
             "try:")
         self.write_source_comment(node)
         self.printer.writelines(
-                "context.write(unicode(%s))" % node.attributes['expr'],
+                "context.write(%s)" % self.create_filter_callable([], node.attributes['expr'], True),
             "finally:",
                 "context.caller_stack.nextcaller = None",
             None
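
The effect of the magic encoding comment written by the code generator can be sketched like this. `module_header` is a hypothetical helper that mirrors only the condition added in this diff; the second half relies on standard Python behaviour (PEP 263) and is written for Python 3.

```python
def module_header(source_encoding, generate_unicode):
    """Hypothetical helper: first lines of a generated module."""
    lines = []
    # The comment is only needed when the module source holds raw encoded bytes.
    if not generate_unicode and source_encoding:
        lines.append("# -*- encoding:%s -*-" % source_encoding)
    lines.append("from mako import runtime, filters, cache")
    return lines


passthru_header = module_header("utf-8", generate_unicode=False)
unicode_header = module_header("utf-8", generate_unicode=True)

# The comment tells Python how to read non-ASCII bytes in a source file.
# Here a latin-1 encoded source is compiled from bytes.
source = b"# -*- encoding:latin-1 -*-\ntext = '\xf4'\n"
namespace = {}
exec(compile(source, "<sketch>", "exec"), namespace)
```

In the unicode-generating mode no comment is emitted in this sketch, matching the `not self.compiler.generate_unicode` condition above.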

File lib/mako/filters.py

     'entity':'filters.html_entities_escape',
     'unicode':'unicode',
     'decode':'decode',
+    'str':'str',
     'n':'n'
 }
     

File lib/mako/lexer.py

 _regexp_cache = {}
 
 class Lexer(object):
-    def __init__(self, text, filename=None, input_encoding=None, preprocessor=None):
+    def __init__(self, text, filename=None, disable_unicode=False, input_encoding=None, preprocessor=None):
         self.text = text
         self.filename = filename
         self.template = parsetree.TemplateNode(self.filename)
         self.match_position = 0
         self.tag = []
         self.control_line = []
+        self.disable_unicode = disable_unicode
         self.encoding = input_encoding
         if preprocessor is None:
             self.preprocessor = []
                 raise exceptions.SyntaxException("Keyword '%s' not a legal ternary for keyword '%s'" % (node.keyword, self.control_line[-1].keyword), **self.exception_kwargs)
 
     def escape_code(self, text):
-        if self.encoding:
+        if not self.disable_unicode and self.encoding:
             return text.encode('ascii', 'backslashreplace')
         else:
             return text
             parsed_encoding = self.match_encoding()
         if parsed_encoding:
             self.encoding = parsed_encoding
-        if not isinstance(self.text, unicode):
-
+        if not self.disable_unicode and not isinstance(self.text, unicode):
             if self.encoding:
                 try:
                     self.text = self.text.decode(self.encoding)

File lib/mako/lookup.py

         
 class TemplateLookup(TemplateCollection):
     def __init__(self, directories=None, module_directory=None, filesystem_checks=True, collection_size=-1, format_exceptions=False, 
-    error_handler=None, output_encoding=None, encoding_errors='strict', cache_type=None, cache_dir=None, cache_url=None, 
-    modulename_callable=None, default_filters=['unicode'], buffer_filters=[], imports=None, input_encoding=None, preprocessor=None):
+    error_handler=None, disable_unicode=False, output_encoding=None, encoding_errors='strict', cache_type=None, cache_dir=None, cache_url=None, 
+    modulename_callable=None, default_filters=None, buffer_filters=[], imports=None, input_encoding=None, preprocessor=None):
         if isinstance(directories, basestring):
             directories = [directories]        
         self.directories = [posixpath.normpath(d) for d in directories or []]
         self.modulename_callable = modulename_callable
         self.filesystem_checks = filesystem_checks
         self.collection_size = collection_size
-        self.template_args = {'format_exceptions':format_exceptions, 'error_handler':error_handler, 'output_encoding':output_encoding, 'encoding_errors':encoding_errors, 'input_encoding':input_encoding, 'module_directory':module_directory, 'cache_type':cache_type, 'cache_dir':cache_dir or module_directory, 'cache_url':cache_url, 'default_filters':default_filters, 'buffer_filters':buffer_filters,  'imports':imports, 'preprocessor':preprocessor}
+        self.template_args = {'format_exceptions':format_exceptions, 'error_handler':error_handler, 'disable_unicode':disable_unicode, 'output_encoding':output_encoding, 'encoding_errors':encoding_errors, 'input_encoding':input_encoding, 'module_directory':module_directory, 'cache_type':cache_type, 'cache_dir':cache_dir or module_directory, 'cache_url':cache_url, 'default_filters':default_filters, 'buffer_filters':buffer_filters,  'imports':imports, 'preprocessor':preprocessor}
         if collection_size == -1:
             self.__collection = {}
             self._uri_cache = {}

File lib/mako/runtime.py

     """create a Context and return the string output of the given template and template callable."""
     if as_unicode:
         buf = util.FastEncodingBuffer()
-    elif template.output_encoding:
+    elif not template.disable_unicode and template.output_encoding:
         buf = util.FastEncodingBuffer(template.output_encoding, template.encoding_errors)
     else:
         buf = util.StringIO()
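
The buffer choice in the hunk above reduces to a three-way decision, sketched here as a standalone function. `choose_buffer` and its string labels are hypothetical; only the branch conditions are taken from the diff.

```python
def choose_buffer(as_unicode, disable_unicode, output_encoding):
    """Hypothetical sketch: name the kind of buffer each branch would pick."""
    if as_unicode:
        # render_unicode(): collect text and return it unencoded.
        return "unicode buffer"
    elif not disable_unicode and output_encoding:
        # render() with an output encoding: encode once at the end.
        return "encoding buffer (%s)" % output_encoding
    else:
        # No encoding, or passthru mode: hand strings through as given.
        return "passthru buffer"


normal = choose_buffer(False, False, "utf-8")
passthru = choose_buffer(False, True, "utf-8")
```

With `disable_unicode` set, the output encoding is ignored in this sketch and output falls through to the plain buffer.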

File lib/mako/template.py

     """a compiled template"""
     def __init__(self, text=None, filename=None, uri=None, format_exceptions=False, error_handler=None, 
         lookup=None, output_encoding=None, encoding_errors='strict', module_directory=None, cache_type=None, 
-        cache_dir=None, cache_url=None, module_filename=None, input_encoding=None, default_filters=['unicode'], 
+        cache_dir=None, cache_url=None, module_filename=None, input_encoding=None, disable_unicode=False, default_filters=None, 
         buffer_filters=[], imports=None, preprocessor=None):
         """construct a new Template instance using either literal template text, or a previously loaded template module
         
             self.module_id = "memory:" + hex(id(self))
             self.uri = self.module_id
         
-        self.default_filters = default_filters
-        self.buffer_filters = buffer_filters
         self.input_encoding = input_encoding
+        self.output_encoding = output_encoding
+        self.encoding_errors = encoding_errors
+        self.disable_unicode = disable_unicode
+        if default_filters is None:
+            if self.disable_unicode:
+                self.default_filters = ['str']
+            else:
+                self.default_filters = ['unicode']
+        else:
+            self.default_filters = default_filters
+        self.buffer_filters = buffer_filters
+            
         self.imports = imports
         self.preprocessor = preprocessor
         
         self.format_exceptions = format_exceptions
         self.error_handler = error_handler
         self.lookup = lookup
-        self.output_encoding = output_encoding
-        self.encoding_errors = encoding_errors
         self.cache_type = cache_type
         self.cache_dir = cache_dir
         self.cache_url = cache_url
         self.buffer_filters = parent.buffer_filters
         self.input_encoding = parent.input_encoding
         self.imports = parent.imports
+        self.disable_unicode = parent.disable_unicode
         self.output_encoding = parent.output_encoding
         self.encoding_errors = parent.encoding_errors
         self.format_exceptions = parent.format_exceptions
         
 def _compile_text(template, text, filename):
     identifier = template.module_id
-    lexer = Lexer(text, filename, input_encoding=template.input_encoding, preprocessor=template.preprocessor)
+    lexer = Lexer(text, filename, disable_unicode=template.disable_unicode, input_encoding=template.input_encoding, preprocessor=template.preprocessor)
     node = lexer.parse()
-    source = codegen.compile(node, template.uri, filename, default_filters=template.default_filters, buffer_filters=template.buffer_filters, imports=template.imports, source_encoding=lexer.encoding)
+    source = codegen.compile(node, template.uri, filename, default_filters=template.default_filters, buffer_filters=template.buffer_filters, imports=template.imports, source_encoding=lexer.encoding, generate_unicode=not template.disable_unicode)
     #print source
     cid = identifier
     module = imp.new_module(cid)
 
 def _compile_module_file(template, text, filename, outputpath):
     identifier = template.module_id
-    lexer = Lexer(text, filename, input_encoding=template.input_encoding, preprocessor=template.preprocessor)
+    lexer = Lexer(text, filename, disable_unicode=template.disable_unicode, input_encoding=template.input_encoding, preprocessor=template.preprocessor)
     node = lexer.parse()
-    source = codegen.compile(node, template.uri, filename, default_filters=template.default_filters, buffer_filters=template.buffer_filters, imports=template.imports, source_encoding=lexer.encoding)
+    source = codegen.compile(node, template.uri, filename, default_filters=template.default_filters, buffer_filters=template.buffer_filters, imports=template.imports, source_encoding=lexer.encoding, generate_unicode=not template.disable_unicode)
     (dest, name) = tempfile.mkstemp()
     os.write(dest, source)
     os.close(dest)
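
The new `default_filters=None` handling in `Template.__init__` can be sketched as a small function. `resolve_default_filters` is a hypothetical name; the logic follows the lines added in this diff. Using `None` as the default lets the code tell "not supplied" apart from an explicit empty list.

```python
def resolve_default_filters(default_filters=None, disable_unicode=False):
    """Hypothetical sketch of the default_filters resolution shown above."""
    if default_filters is None:
        # Nothing supplied: pick a default that matches the unicode mode.
        if disable_unicode:
            return ["str"]
        return ["unicode"]
    # Anything supplied explicitly, including [], is used as given.
    return default_filters


unicode_mode = resolve_default_filters()
passthru_mode = resolve_default_filters(disable_unicode=True)
no_filters = resolve_default_filters([], disable_unicode=True)
```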

File test/template.py

         template = Template(filename='./test_htdocs/unicode.html', module_directory='./test_htdocs')
         assert template.render_unicode() == u"""Alors vous imaginez ma surprise, au lever du jour, quand une drôle de petit voix m’a réveillé. Elle disait: « S’il vous plaît… dessine-moi un mouton! »"""
 
+    def test_unicode_file_lookup(self):
+        lookup = TemplateLookup(directories=['./test_htdocs'], output_encoding='utf-8', default_filters=['decode.utf8'])
+        template = lookup.get_template('/chs_unicode.html')
+        assert flatten_result(template.render(name='毛泽东')) == '毛泽东 是 新中国的主席<br/> Welcome 你 to 北京.'
+
     def test_unicode_bom(self):
         template = Template(filename='./test_htdocs/bom.html', module_directory='./test_htdocs')
         assert template.render_unicode() == u"""Alors vous imaginez ma surprise, au lever du jour, quand une drôle de petit voix m’a réveillé. Elle disait: « S’il vous plaît… dessine-moi un mouton! »"""
         val = u"""Alors vous imaginez ma surprise, au lever du jour, quand une drôle de petit voix m’a réveillé. Elle disait: « S’il vous plaît… dessine-moi un mouton! »"""
         val = "## coding: utf-8\n" + val.encode('utf-8')
         template = Template(val)
-        #print template.code
+        assert isinstance(template.code, unicode)
         assert template.render_unicode() == u"""Alors vous imaginez ma surprise, au lever du jour, quand une drôle de petit voix m’a réveillé. Elle disait: « S’il vous plaît… dessine-moi un mouton! »"""
     
     def test_unicode_text(self):
         lookup = TemplateLookup(directories=['./test_htdocs'], filesystem_checks=True, output_encoding='utf-8')
         template = lookup.get_template('/read_unicode.html')
         data = template.render(path=os.path.join('./test_htdocs', 'internationalization.html'))
-        
+
+    def test_bytestring_passthru(self):
+        lookup = TemplateLookup(directories=['./test_htdocs'], default_filters=[], disable_unicode=True)
+        template = lookup.get_template('/chs_utf8.html')
+        self.assertEquals(flatten_result(template.render(name='毛泽东')), '毛泽东 是 新中国的主席<br/> Welcome 你 to 北京.')
+
+        lookup = TemplateLookup(directories=['./test_htdocs'], disable_unicode=True)
+        template = lookup.get_template('/chs_utf8.html')
+        self.assertEquals(flatten_result(template.render(name='毛泽东')), '毛泽东 是 新中国的主席<br/> Welcome 你 to 北京.')
+        self.assertRaises(UnicodeDecodeError, template.render_unicode, name='毛泽东')
+
+        template = Template("""${'Alors vous imaginez ma surprise, au lever du jour, quand une drôle de petit voix m’a réveillé. Elle disait: « S’il vous plaît… dessine-moi un mouton! »'}""", disable_unicode=True, input_encoding='utf-8')
+        assert template.render() == """Alors vous imaginez ma surprise, au lever du jour, quand une drôle de petit voix m’a réveillé. Elle disait: « S’il vous plaît… dessine-moi un mouton! »"""
+        template = Template("""${'Alors vous imaginez ma surprise, au lever du jour, quand une drôle de petit voix m’a réveillé. Elle disait: « S’il vous plaît… dessine-moi un mouton! »'}""", input_encoding='utf8', output_encoding='utf8', disable_unicode=False, default_filters=[])
+        self.assertRaises(UnicodeDecodeError, template.render)  # raises because expression contains an encoded bytestring which cannot be decoded
+
 
 class PageArgsTest(unittest.TestCase):
     def test_basic(self):

File test_htdocs/chs_unicode.html

+## -*- encoding:utf8 -*-
+<%
+ msg = u'新中国的主席'
+%>
+
+<%def name="welcome(who, place=u'北京')">
+Welcome ${who} to ${place}.
+</%def>
+
+${name} 是 ${msg}<br/>
+${welcome(u'你')}

File test_htdocs/chs_utf8.html

+## -*- encoding:utf8 -*-
+<%
+ msg = '新中国的主席'
+%>
+
+<%def name="welcome(who, place='北京')">
+Welcome ${who} to ${place}.
+</%def>
+
+${name} 是 ${msg}<br/>
+${welcome('你')}