vim modeline support, v1

#97 Declined
Deleted repository
default (cf268444842e)
  1. Michał Górny

(this time on default branch, and with fixed description - today I started to hate mercurial)

  1. added vim_filetypes list to Lexer class, and filled in for most of the lexers (those with corresponding vim syntaxes);

  2. added the filetypes to the tuples in LEXERS. However, I ensure that return value of get_all_lexers() doesn't change so that public API isn't changed;

  3. added a mini-version of vim modeline parser to pygments.modeline;

  4. added get_lexer_by_vim_filetype();

  5. modified guess_lexer() to use the above.

Comments (3)

  1. Tim Hatch

    A few thoughts:

    • I'm not wild about reusing by copy-pasting pymodeline. But at the same time, I don't really want another dependency.
    • I don't think we need full syntax compatibility. Checking for /(?:vi |vim|ex)(?:[<>=]?\d+)?:.*(?:ft|filetype|...)=(\w+)/ should get us most of the way there. The types don't have spaces, so the escaping logic can go away -- we don't care about emulated vim version really either.
    • Most of the vim_filetypes you're setting already exist in the aliases. A couple (gas and pyrex, from a glance back) I think should be incorporated as extra aliases. I haven't checked very thoroughly whether we could do the same for mako and smarty (but vim using those filetypes as html+x would be a gentle indicator that this is what people generally use those engines with).
    • You call splitlines twice, it'd be nice to optimize that a bit more.

    If you're keen with this, I think you can accomplish 95% of the goal with a patch that's only a few tens of lines in a couple of files, which would be much easier to review as well.

    1. Michał Górny author

      Well, the regexp was copy-pasted of laziness. You are correct that we could do with a simpler one, I'm just not sure if it's worth the effort. Especially if it is proven to be wrong someday, then I'll have just to fix one regular expression rather than both.

      Some of the escaping logic still needs to stay, to ensure that some `\:ft=` wouldn't be caught as a real ft. But that's probably a corner case you don't want to care about.

      The filetypes vs aliases, you are probably right. When starting to consider all the types, I wasn't sure how well they fit there. So it's mostly a question whether we want it very cleanse (matching just the right strings), or just working (matching too much).

      And yes, splitlines() is bad there. I guess too much C for me lately.

    2. Michał Górny author

      After reviewing the regexp, I finally got to one, last problem: if someone specifies both 'filetype' and 'syntax', I don't think I can make it prefer 'syntax' over 'filetype'...