Anonymous committed 964a285

[svn] Some fixes, add docs for new features.


Files changed (7)

 before 0.5
 ----------
 
-- add mimetype attributes
 - improve guess_lexer heuristics (esp. for template langs)
 
 - more unit tests
 
-- documentation for new features (guessing)
-
 - goto label HL support for languages that use it
 
-- tell the PHP and DelphiLexer how to differ between Operators and
-  text.
-
 for 0.6
 -------
 
-- allow multiple token types per regex (done, but awkwardly)
 - allow "overlay" token types (e.g. Diff + X) 
   - highlight specials: nth line, a word etc.
   - dhtml: overlays toggleable by javascript
     * tcl
     * (la)tex
 
+- tell the PHP and DelphiLexer how to differ between Operators and
+  text.
+
 - add a `Punctuation` token type for symbols that are not text
   but also not a symbol (blocks in ruby etc)
 
 
 - docstrings?
 
-for 0.7
--------
+for 0.7 / later
+---------------
 
 - moin parser
 
 
     Will raise `ValueError` if no lexer for that filename is found.
 
+def `get_lexer_for_mimetype(mime, **options):`
+    Return a `Lexer` subclass instance that has `mime` in its mimetype
+    list. The lexer is given the `options` at its instantiation.
+
+    Will raise `ValueError` if no lexer for that mimetype is found.
+
+def `guess_lexer(text, **options):`
+    Return a `Lexer` subclass instance that's guessed from the text
+    in `text`. For that, the `analyse_text()` method of every known
+    lexer class is called with the text as argument, and the lexer
+    which returned the highest value will be instantiated and returned.
+
+    `ValueError` is raised if no lexer thinks it can handle the content.
+
+def `guess_lexer_for_filename(text, filename, **options):`
+    As `guess_lexer()`, but only lexers which have a pattern in `filenames`
+    or `alias_filenames` that matches `filename` are taken into consideration.
+    
+    `ValueError` is raised if no lexer thinks it can handle the content.
 
 Functions from `pygments.formatters`:
 
 
     This method must be overridden by subclasses.
 
+def `analyse_text(text):`
+    A static method which is called for lexer guessing. It should analyze
+    the text and return a float in the range ``0.0`` to ``1.0``.
+    If it returns ``0.0``, the lexer will not be selected as the most
+    probable one; if it returns ``1.0``, it will be selected immediately.
+
 For a list of known tokens have a look at the `Tokens`_ page.
 
 The lexer also recognizes the following attributes that are used by the
     the lexer from a list.
 
 `filenames`
-    A list of `fnmatch` patterns that can be used to find a lexer for
-    a given filename.
+    A list of `fnmatch` patterns that match filenames which contain
+    content for this lexer. The patterns in this list should be unique among
+    all lexers.
+
+`alias_filenames`
+    A list of `fnmatch` patterns that match filenames which may or may not
+    contain content for this lexer. This list is used by the
+    `guess_lexer_for_filename()` function, to determine which lexers are
+    then included in guessing the correct one. That means that e.g. every
+    lexer for HTML and a template language should include ``*.html`` in
+    this list.
+
+`mimetypes`
+    A list of MIME types for content that can be lexed with this
+    lexer.
 
 
 .. _Tokens: tokens.txt
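The lookup-and-guess mechanism documented above can be sketched as a small self-contained toy. The class names and heuristics here are hypothetical stand-ins, not the real Pygments lexers; only the overall shape (call every lexer's `analyse_text()`, pick the highest score, `1.0` wins immediately) mirrors the documented behavior:

```python
# Toy sketch of the guessing API described above; FakePythonLexer and
# FakePerlLexer are hypothetical, not Pygments classes.

class FakePythonLexer:
    @staticmethod
    def analyse_text(text):
        # A Python shebang is a certain match.
        return 1.0 if text.startswith('#!/usr/bin/python') else 0.0

class FakePerlLexer:
    @staticmethod
    def analyse_text(text):
        # Perl-style scalar declarations are a weaker hint.
        return 0.5 if 'my $' in text else 0.0

KNOWN_LEXERS = [FakePythonLexer, FakePerlLexer]

def guess_lexer(text):
    """Return the lexer class whose analyse_text() scores highest."""
    best, best_score = None, 0.0
    for lexer in KNOWN_LEXERS:
        score = lexer.analyse_text(text)
        if score == 1.0:            # a perfect score wins immediately
            return lexer
        if score > best_score:
            best, best_score = lexer, score
    if best is None:
        raise ValueError('no lexer thinks it can handle the content')
    return best
```

The real functions additionally instantiate the winning class with the given keyword options; the toy returns the class itself to keep the sketch short.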

docs/src/quickstart.txt

 
 .. sourcecode:: pycon
 
-    >>> from pygments.lexers import get_lexer_by_name, get_lexer_for_filename
+    >>> from pygments.lexers import (get_lexer_by_name,
+    ...     get_lexer_for_filename, get_lexer_for_mimetype)
+
     >>> get_lexer_by_name('python')
-    <pygments.lexers.agile.PythonLexer object at 0xb7bd6d0c>
-    >>> get_lexer_for_filename('spam.py')
-    <pygments.lexers.agile.PythonLexer object at 0xb7bd6b2c>
+    <pygments.lexers.PythonLexer>
 
-The same API is available for formatters: use `get_formatter_by_name` and
-`get_formatter_for_filename` from the `pygments.formatters` module
+    >>> get_lexer_for_filename('spam.rb')
+    <pygments.lexers.RubyLexer>
+
+    >>> get_lexer_for_mimetype('text/x-perl')
+    <pygments.lexers.PerlLexer>
+
+All these functions accept keyword arguments; they will be passed to the lexer
+as options.
+
+A similar API is available for formatters: use `get_formatter_by_name()` and
+`get_formatter_for_filename()` from the `pygments.formatters` module
 for this purpose.
 
 
+Guessing lexers
+===============
+
+If you don't know the content of the file, or you want to highlight a file
+whose extension is ambiguous, such as ``.html`` (which could contain plain HTML
+or some template tags), use these functions:
+
+.. sourcecode:: pycon
+
+    >>> from pygments.lexers import guess_lexer, guess_lexer_for_filename
+
+    >>> guess_lexer('#!/usr/bin/python\nprint "Hello World!"')
+    <pygments.lexers.PythonLexer>
+
+    >>> guess_lexer_for_filename('test.py', 'print "Hello World!"')
+    <pygments.lexers.PythonLexer>
+
+`guess_lexer()` passes the given content to every lexer class's `analyse_text()`
+method and returns the lexer that yields the highest score.
+
+All lexers have two different filename pattern lists: the primary and the
+secondary one. The `get_lexer_for_filename()` function only uses the primary
+list, whose entries are supposed to be unique among all lexers.
+`guess_lexer_for_filename()`, however, first loops through all lexers and
+checks whether the filename matches a primary or secondary pattern.
+If exactly one lexer matches, it is returned; otherwise the guessing mechanism
+of `guess_lexer()` is applied to the matching lexers.
+
+As usual, keyword arguments to these functions are given to the created lexer
+as options.
+
+
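The primary/secondary pattern logic described above can be illustrated with a minimal sketch. The two toy lexers and their scores are invented for illustration; only the selection logic (match against both lists, fall back to content guessing when several lexers match) follows the documented behavior:

```python
# Hypothetical toy lexers illustrating primary vs. secondary filename
# patterns; these are not real Pygments classes.
import fnmatch

class ToyHtmlLexer:
    filenames = ['*.html']          # primary patterns: unique per lexer
    alias_filenames = []
    @staticmethod
    def analyse_text(text):
        return 0.4 if '<html' in text else 0.0

class ToyDjangoLexer:
    filenames = ['*.django']
    alias_filenames = ['*.html']    # may also appear in .html files
    @staticmethod
    def analyse_text(text):
        return 0.8 if '{%' in text else 0.0

LEXERS = [ToyHtmlLexer, ToyDjangoLexer]

def guess_lexer_for_filename(filename, text):
    """Filter by both pattern lists, then guess by content if ambiguous."""
    matching = [lx for lx in LEXERS
                if any(fnmatch.fnmatch(filename, pat)
                       for pat in lx.filenames + lx.alias_filenames)]
    if not matching:
        raise ValueError('no lexer for filename %r found' % filename)
    if len(matching) == 1:
        return matching[0]
    # several candidates: let analyse_text() decide
    return max(matching, key=lambda lx: lx.analyse_text(text))
```

For a `.html` file both lexers are candidates, so the content decides; for a `.django` file only one lexer matches and no guessing is needed.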
 Command line usage
 ==================
 

docs/src/tokens.txt

 of those token aliases, a number of subtypes exists (excluding the special tokens
 `Token.Text`, `Token.Error` and `Token.Other`)
 
+The `is_token_subtype()` function in the `pygments.token` module can be used to
+test if a token type is a subtype of another (such as `Name.Tag` and `Name`).
+
 
 Keyword Tokens
 ==============

pygments/lexer.py

         self.stripall = get_bool_opt(options, 'stripall', False)
         self.tabsize = get_int_opt(options, 'tabsize', 0)
 
+    def __repr__(self):
+        if self.options:
+            return '<pygments.lexers.%s with %r>' % (self.__class__.__name__,
+                                                     self.options)
+        else:
+            return '<pygments.lexers.%s>' % self.__class__.__name__
+
     def analyse_text(text):
         """
         Has to return a float between ``0`` and ``1`` that indicates
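The new `__repr__` added above produces the short form shown in the updated quickstart examples. A minimal stand-alone class (hypothetical, mirroring the hunk) shows both output shapes:

```python
# Toy class reproducing the __repr__ from the hunk above; ToyLexer is a
# hypothetical stand-in for a Lexer subclass.
class ToyLexer:
    def __init__(self, **options):
        self.options = options

    def __repr__(self):
        if self.options:
            return '<pygments.lexers.%s with %r>' % (self.__class__.__name__,
                                                     self.options)
        else:
            return '<pygments.lexers.%s>' % self.__class__.__name__
```

With no options the repr stays compact; with options they are appended, which makes interactive sessions show how a lexer was configured.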

pygments/lexers/__init__.py

         _lexer_cache[cls.name] = cls
 
 
-def _iter_lexers():
-    """
-    Returns an iterator over all lexer classes.
-    """
-    for module_name, name, _, _ in LEXERS.itervalues():
-        if name not in _lexer_cache:
-            _load_lexers(module_name)
-        yield _lexer_cache[name]
-    for lexer in find_plugin_lexers():
-        yield lexer
-
-
 def get_lexer_by_name(_alias, **options):
     """
     Get a lexer by an alias.
     raise ValueError('no lexer for mimetype %r found' % _mime)
 
 
+def _iter_lexerclasses():
+    """
+    Returns an iterator over all lexer classes.
+    """
+    for module_name, name, _, _ in LEXERS.itervalues():
+        if name not in _lexer_cache:
+            _load_lexers(module_name)
+        yield _lexer_cache[name]
+    for lexer in find_plugin_lexers():
+        yield lexer
+
+
 def guess_lexer_for_filename(_fn, _text, **options):
     """
     Lookup all lexers that handle those filenames primary (``filenames``)
     fn = basename(_fn)
     primary = None
     matching_lexers = set()
-    for lexer in _iter_lexers():
+    for lexer in _iter_lexerclasses():
         for filename in lexer.filenames:
             if fnmatch.fnmatch(fn, filename):
                 matching_lexers.add(lexer)
     if not matching_lexers:
         raise ValueError('no lexer for filename %r found' % fn)
     if len(matching_lexers) == 1:
-        return iter(matching_lexers).next()
+        return matching_lexers.pop()(**options)
     result = []
     for lexer in matching_lexers:
         rv = lexer.analyse_text(_text)
     #XXX: i (mitsuhiko) would like to drop this function in favor of the
     #     better guess_lexer_for_filename function.
     best_lexer = [0.0, None]
-    for lexer in _iter_lexers():
+    for lexer in _iter_lexerclasses():
         rv = lexer.analyse_text(text)
         if rv == 1.0:
             return lexer(**options)

pygments/token.py

 Generic   = Token.Generic
 
 
+def is_token_subtype(ttype, other):
+    """Return True if ``ttype`` is a subtype of ``other``."""
+    while ttype is not None:
+        if ttype == other:
+            return True
+        ttype = ttype.parent
+    return False
+
+
 # Map standard token types to short names, used in CSS class naming.
 # If you add a new item, please be sure to run this file to perform
 # a consistency check for duplicate values.
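The `is_token_subtype()` helper above walks a token type's parent chain. A self-contained sketch with a toy `TokenType` (the real token objects are richer, but the parent-chain walk is the same function body as in the hunk) shows the semantics:

```python
# Toy parent-chained token types; TokenType is a hypothetical stand-in
# for the real pygments.token machinery.
class TokenType:
    def __init__(self, parent=None):
        self.parent = parent

Token = TokenType()                 # root type
Name = TokenType(parent=Token)
Name_Tag = TokenType(parent=Name)   # stands in for Name.Tag
Keyword = TokenType(parent=Token)

def is_token_subtype(ttype, other):
    """Return True if ttype is other or one of its descendants."""
    while ttype is not None:
        if ttype == other:
            return True
        ttype = ttype.parent
    return False
```

Note that every type is a subtype of itself and of the root `Token`, while siblings such as `Keyword` and `Name` are unrelated.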