Source

pygments-main / docs / src / api.txt

.. -*- mode: rst -*-

=====================
The full Pygments API
=====================

This page describes the Pygments API.

High-level API
==============

Functions from the `pygments` module:

def `lex(code, lexer):`
    Lex `code` with the `lexer` (must be a `Lexer` instance)
    and return an iterable of tokens. Currently, this only calls
    `lexer.get_tokens()`.

def `format(tokens, formatter, outfile=None):`
    Format a token stream (iterable of tokens) `tokens` with the
    `formatter` (must be a `Formatter` instance). The result is
    written to `outfile`, or if that is ``None``, returned as a
    string.

def `highlight(code, lexer, formatter, outfile=None):`
    This is the most high-level highlighting function.
    It combines `lex` and `format` in one function.


Functions from `pygments.lexers`:

def `get_lexer_by_name(alias, **options):`
    Return an instance of a `Lexer` subclass that has `alias` in its
    aliases list. The lexer is given the `options` at its
    instantiation.

    Will raise `pygments.util.ClassNotFound` if no lexer with that alias is
    found.

def `get_lexer_for_filename(fn, **options):`
    Return a `Lexer` subclass instance that has a filename pattern
    matching `fn`. The lexer is given the `options` at its
    instantiation.

    Will raise `pygments.util.ClassNotFound` if no lexer for that filename is
    found.

def `get_lexer_for_mimetype(mime, **options):`
    Return a `Lexer` subclass instance that has `mime` in its mimetype
    list. The lexer is given the `options` at its instantiation.

    Will raise `pygments.util.ClassNotFound` if not lexer for that mimetype is
    found.

def `guess_lexer(text, **options):`
    Return a `Lexer` subclass instance that's guessed from the text
    in `text`. For that, the `analyse_text()` method of every known
    lexer class is called with the text as argument, and the lexer
    which returned the highest value will be instantiated and returned.

    `pygments.util.ClassNotFound` is raised if no lexer thinks it can handle the
    content.

def `guess_lexer_for_filename(filename, text, **options):`
    As `guess_lexer()`, but only lexers which have a pattern in `filenames`
    or `alias_filenames` that matches `filename` are taken into consideration.

    `pygments.util.ClassNotFound` is raised if no lexer thinks it can handle the
    content.

def `get_all_lexers():`
    Return an iterable over all registered lexers, yielding tuples in the
    format::

    	(longname, tuple of aliases, tuple of filename patterns, tuple of mimetypes)

    *New in Pygments 0.6.*


Functions from `pygments.formatters`:

def `get_formatter_by_name(alias, **options):`
    Return an instance of a `Formatter` subclass that has `alias` in its
    aliases list. The formatter is given the `options` at its
    instantiation.

    Will raise `pygments.util.ClassNotFound` if no formatter with that alias is
    found.

def `get_formatter_for_filename(fn, **options):`
    Return a `Formatter` subclass instance that has a filename pattern
    matching `fn`. The formatter is given the `options` at its
    instantiation.

    Will raise `pygments.util.ClassNotFound` if no formatter for that filename
    is found.


Functions from `pygments.styles`:

def `get_style_by_name(name):`
    Return a style class by its short name. The names of the builtin styles
    are listed in `pygments.styles.STYLE_MAP`.

    Will raise `pygments.util.ClassNotFound` if no style of that name is found.

def `get_all_styles():`
    Return an iterable over all registered styles, yielding their names.

    *New in Pygments 0.6.*


Lexers
======

A lexer (derived from `pygments.lexer.Lexer`) has the following functions:

def `__init__(self, **options):`
    The constructor. Takes a \*\*keywords dictionary of options.
    Every subclass must first process its own options and then call
    the `Lexer` constructor, since it processes the `stripnl`,
    `stripall` and `tabsize` options.

    An example looks like this:

    .. sourcecode:: python

        def __init__(self, **options):
            self.compress = options.get('compress', '')
            Lexer.__init__(self, **options)

    As these options must all be specifiable as strings (due to the
    command line usage), there are various utility functions
    available to help with that, see `Option processing`_.

def `get_tokens(self, text):`
    This method is the basic interface of a lexer. It is called by
    the `highlight()` function. It must process the text and return an
    iterable of ``(tokentype, value)`` pairs from `text`.

    Normally, you don't need to override this method. The default
    implementation processes the `stripnl`, `stripall` and `tabsize`
    options and then yields all tokens from `get_tokens_unprocessed()`,
    with the ``index`` dropped.

def `get_tokens_unprocessed(self, text):`
    This method should process the text and return an iterable of
    ``(index, tokentype, value)`` tuples where ``index`` is the starting
    position of the token within the input text.

    This method must be overridden by subclasses.

def `analyse_text(text):`
    A static method which is called for lexer guessing. It should analyse
    the text and return a float in the range from ``0.0`` to ``1.0``.
    If it returns ``0.0``, the lexer will not be selected as the most
    probable one, if it returns ``1.0``, it will be selected immediately.

For a list of known tokens have a look at the `Tokens`_ page.

A lexer also can have the following attributes (in fact, they are mandatory
except `alias_filenames`) that are used by the builtin lookup mechanism.

`name`
    Full name for the lexer, in human-readable form.

`aliases`
    A list of short, unique identifiers that can be used to lookup
    the lexer from a list, e.g. using `get_lexer_by_name()`.

`filenames`
    A list of `fnmatch` patterns that match filenames which contain
    content for this lexer. The patterns in this list should be unique among
    all lexers.

`alias_filenames`
    A list of `fnmatch` patterns that match filenames which may or may not
    contain content for this lexer. This list is used by the
    `guess_lexer_for_filename()` function, to determine which lexers are
    then included in guessing the correct one. That means that e.g. every
    lexer for HTML and a template language should include ``\*.html`` in
    this list.

`mimetypes`
    A list of MIME types for content that can be lexed with this
    lexer.


.. _Tokens: tokens.txt


Formatters
==========

A formatter (derived from `pygments.formatter.Formatter`) has the following
functions:

def `__init__(self, **options):`
    As with lexers, this constructor processes options and then must call
    the base class `__init__`.

    The `Formatter` class recognizes the options `style`, `full` and
    `title`. It is up to the formatter class whether it uses them.

def `get_style_defs(self, arg=''):`
    This method must return statements or declarations suitable to define
    the current style for subsequent highlighted text (e.g. CSS classes
    in the `HTMLFormatter`).

    The optional argument `arg` can be used to modify the generation and
    is formatter dependent (it is standardized because it can be given on
    the command line).

    This method is called by the ``-S`` `command-line option`_, the `arg`
    is then given by the ``-a`` option.

def `format(self, tokensource, outfile):`
    This method must format the tokens from the `tokensource` iterable and
    write the formatted version to the file object `outfile`.

    Formatter options can control how exactly the tokens are converted.

.. _command-line option: cmdline.txt

A formatter must have the following attributes that are used by the
builtin lookup mechanism. (*New in Pygments 0.7.*)

`name`
    Full name for the formatter, in human-readable form.

`aliases`
    A list of short, unique identifiers that can be used to lookup
    the formatter from a list, e.g. using `get_formatter_by_name()`.

`filenames`
    A list of `fnmatch` patterns that match filenames for which this formatter
    can produce output. The patterns in this list should be unique among
    all formatters.


Option processing
=================

The `pygments.util` module has some utility functions usable for option
processing:

class `OptionError`
    This exception will be raised by all option processing functions if
    the type or value of the argument is not correct.

def `get_bool_opt(options, optname, default=None):`
    Interpret the key `optname` from the dictionary `options`
    as a boolean and return it. Return `default` if `optname`
    is not in `options`.

    The valid string values for ``True`` are ``1``, ``yes``,
    ``true`` and ``on``, the ones for ``False`` are ``0``,
    ``no``, ``false`` and ``off`` (matched case-insensitively).

def `get_int_opt(options, optname, default=None):`
    As `get_bool_opt`, but interpret the value as an integer.

def `get_list_opt(options, optname, default=None):`
    If the key `optname` from the dictionary `options` is a string,
    split it at whitespace and return it. If it is already a list
    or a tuple, it is returned as a list.

def `get_choice_opt(options, optname, allowed, default=None):`
    If the key `optname` from the dictionary is not in the sequence
    `allowed`, raise an error, otherwise return it. *New in Pygments 0.8.*