.. _parsing:

Parsing Content
===============

.. _html:


django-dbgettext comes with HTML parsing functionality out of the box, allowing translatable strings to be extracted from fields with HTML content. To translate a field containing HTML, include its name as a key in the ``parsed_attributes`` dictionary of the registered ``Options`` (see :ref:`options`), with ``dbgettext.lexicons.html.lexicon`` as its value.

The ``DBGETTEXT_INLINE_HTML_TAGS`` :ref:`setting <settings>` can be used to define which HTML tags are allowed to appear within translatable strings. E.g.::

    This <b>string</b> is <i>translatable</i> by <u>default</u>.

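As a rough illustration, a project that also wants ``em``, ``strong`` and ``span`` kept inline might override the setting in its ``settings.py``. The tuple-of-tag-names format shown here is an assumption and is not confirmed by this page; check :ref:`settings` for the accepted value and the actual default.

```python
# settings.py (sketch)
# Assumption: the setting takes a sequence of lowercase tag names.
# The exact format and default are documented in the settings reference.
DBGETTEXT_INLINE_HTML_TAGS = ('b', 'i', 'u', 'em', 'strong', 'span')
```
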
The ``custom_lexicon_rules`` :ref:`option <options>` allows the HTML parsing algorithm to be customised to suit your needs. For example, the following registration file allows images to appear as moveable placeholders in translatable strings, and links to be translated from the inside::

    from dbgettext.registry import registry, Options
    from dbgettext.parser import Token
    from dbgettext.lexicons import html
    from models import Text
    from django.utils.translation import ugettext as _


    class ImageToken(Token):
        """ Allows inline images to be 'translated' as %(image:...)s """

        def __init__(self, raw, src):
            super(ImageToken, self).__init__('image', raw)
            self.src = src

        def is_translatable(self):
            return Token.MAYBE_TRANSLATE

        def get_key(self):
            # the whole token is shown to translators as this placeholder
            return 'image:%s' % self.src


    class LinkToken(Token):
        """ Allows inline links to be translated as %(link:...)s

        Also demonstrates Token 'inner translation' features using get_raw
        and get_gettext to translate within token itself.
        """

        def __init__(self, raw, href, content):
            super(LinkToken, self).__init__('link', raw)
            self.href = href
            self.content = content

        def is_translatable(self):
            return Token.ALWAYS_TRANSLATE

        def get_raw(self):
            # rebuild the tag with its inner parts already translated
            return '<a href="%s">%s</a>' % (_(self.href), _(self.content))

        def get_gettext(self):
            # strings inside the token that need their own translations
            return [self.href, self.content]

        def get_key(self):
            return 'link:%s' % self.content  # should sanitize content first


    class TextOptions(Options):
        parsed_attributes = {'body': html.lexicon}

        # callbacks receive the scanner and the matched text; the regex
        # groups are available through scanner.match
        def image(scanner, token):
            return ImageToken(token, scanner.match.groups()[0])

        def link(scanner, token):
            return LinkToken(token, scanner.match.groups()[0],
                             scanner.match.groups()[1])

        custom_lexicon_rules = [
            (r'<img[^>]+src="([^"]+)"[^>]*>', image),
            (r'<a[^>]+href="([^"]+)"[^>]*>([^<]+)</a>', link),
        ]


    registry.register(Text, TextOptions)

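To see why such placeholders are "moveable", it helps to look at how Python's named ``%`` formatting treats a key like ``image:...``. The sketch below uses only plain string formatting; the strings, the ``raw_by_key`` mapping and the idea that the final HTML is reassembled this way are illustrative assumptions, not dbgettext's actual implementation.

```python
# What a translator might see: the image is reduced to a named placeholder
# whose name is the value returned by get_key().
msgid = "Click %(image:/static/logo.png)s to go home."

# A hypothetical translation that moves the placeholder to the front.
msgstr = "%(image:/static/logo.png)s: click to go home."

# Hypothetical mapping from placeholder key to the token's raw HTML.
raw_by_key = {
    "image:/static/logo.png": '<img src="/static/logo.png" alt="">',
}

# Named %-formatting substitutes the raw HTML wherever the key appears.
original_html = msgid % raw_by_key
translated_html = msgstr % raw_by_key
```

Because the key is looked up by name rather than by position, the translator is free to place the image wherever the target language needs it.
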
Subclassing Token
-----------------

``get_key``
    provide this method if your entire ``Token`` should be displayed as a placeholder, e.g. ``%(get_key_output_here)s``

``get_raw``
    provide this method if your ``Token`` requires inner translation; it should return ``self.raw`` with any inner translatable parts already gettexted

``get_gettext``
    this method should return a list of any translatable strings within your ``Token`` (again, only required for inner translation)

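The three methods can be tried out in isolation with a small stand-in. In the sketch below, ``Token`` is a hypothetical minimal replacement for ``dbgettext.parser.Token`` (only the ``(type, raw)`` constructor seen in the example above is assumed), ``_`` is a dictionary lookup standing in for gettext, and ``AbbrToken`` is an invented subclass used purely for illustration.

```python
class Token(object):
    """Hypothetical stand-in for dbgettext.parser.Token (illustration only)."""

    def __init__(self, type, raw):
        self.type = type
        self.raw = raw


# Stand-in translation catalogue and gettext function (assumptions).
TRANSLATIONS = {"World Health Organization": "Weltgesundheitsorganisation"}


def _(message):
    """Look the message up, falling back to the original text."""
    return TRANSLATIONS.get(message, message)


class AbbrToken(Token):
    """Invented example: an <abbr> whose title needs inner translation."""

    def __init__(self, raw, title, content):
        super(AbbrToken, self).__init__('abbr', raw)
        self.title = title
        self.content = content

    def get_key(self):
        # The whole tag appears to translators as %(abbr:WHO)s
        return 'abbr:%s' % self.content

    def get_raw(self):
        # self.raw rebuilt with the inner translatable part already translated
        return '<abbr title="%s">%s</abbr>' % (_(self.title), self.content)

    def get_gettext(self):
        # Strings inside the token that need their own catalogue entries
        return [self.title]


token = AbbrToken(
    '<abbr title="World Health Organization">WHO</abbr>',
    'World Health Organization',
    'WHO',
)
```

Here ``get_key`` names the placeholder, ``get_gettext`` lists what must be offered for translation, and ``get_raw`` produces the markup that replaces the placeholder.
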
.. _custom_parsing:

Other Parsing?
--------------

Not using HTML? Want to parse `markdown <http://>`_ or something exotic instead? Simply register your own lexicon function, modelled on the bundled ``dbgettext.lexicons.html.lexicon`` (it is worth reading the parser source as well).

Once you've got something you're happy with, you may wish to consider submitting your file for inclusion in ``dbgettext.lexicons``.
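
As a starting point, the general shape of a lexicon can be explored without dbgettext at all. The sketch below is a minimal, hypothetical one for a tiny Markdown subset: the ``options`` parameter, the rule names and the tuple tokens are all placeholders (a real lexicon would return ``Token`` instances), and driving the rules with the standard library's ``re.Scanner`` is an assumption suggested by the ``(scanner, token)`` callback signature used above.

```python
import re


def markdown_lexicon(options=None):
    """Return (regex, callback) rules for a tiny Markdown subset.

    Hypothetical sketch: tuples stand in for real Token objects.
    """
    def heading_marker(scanner, token):
        return ('marker', token)   # structural, not translatable

    def strong(scanner, token):
        return ('inline', token)   # could stay inside a translatable string

    def text(scanner, token):
        return ('text', token)     # translatable content

    def line_break(scanner, token):
        return ('break', token)    # separates translatable strings

    # Rules are tried in order at each position in the input.
    return [
        (r'#+[ \t]*', heading_marker),
        (r'\*\*[^*\n]+\*\*', strong),
        (r'[^#*\n]+', text),
        (r'\n+', line_break),
    ]


scanner = re.Scanner(markdown_lexicon())
tokens, remainder = scanner.scan("# Title\nSome **bold** text\n")
```

An empty ``remainder`` means every character was claimed by some rule; anything left over points at input the lexicon does not yet cover.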