Georg Brandl / pygments-main
Issue #444 resolved

haXe Lexer

Anonymous created an issue

I've whipped together a lexer for haxe (http://haxe.org/).

I'm sure there may be some haxe constructs which aren't handled by the lexer, and I'm sure there might be some bugs. The lexer is working for a large majority of the language though.

You can see the lexer in action here: http://www.evaryu.com/hg/hxalgo/rev/381004a0c856

Good samples would be:

- Debug.hx: http://www.evaryu.com/hg/hxalgo/file/381004a0c856/src/util/Debug.hx
- OrderedMap.hx: http://www.evaryu.com/hg/hxalgo/file/381004a0c856/src/util/OrderedMap.hx
- Throwable.hx: http://www.evaryu.com/hg/hxalgo/file/381004a0c856/src/util/Throwable.hx

Reported by yarias

Comments (13)

  1. Anonymous

    haxe.2.diff also fixes lines longer than 80 characters.

    Sorry for all the changes after the issue was filed; I just noticed the recent ones.

  2. thatch

    Integrated in my branch as [073968b39346]. Changes made to haxe.2.diff:

    • PEP-8 changes (whitespace after commas, end-of-line whitespace)
    • Moved it down to the bottom (I know it was alphabetic but I don't want it between `CssLexer` and `HtmlLexer`)
    • Renamed `id` to `ident` since `id` is a builtin
    • Made hex literals accept uppercase (I assume haXe does this)
    • Made regex nongreedy (in case you have two regexes on the same line)
    • I'm wary of the combined state `('#pop', 'newstate')` -- Georg?
    • I don't think it needs any flags (correct me if I'm wrong, DOTALL is only used on the multiline comment regex)
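Two of the changes listed above (uppercase hex digits and the nongreedy regex) can be illustrated with a small sketch. The patterns here are hypothetical stand-ins written for this example; they are not the ones from haxe.2.diff. The sample line only assumes haXe's `~/.../` regex literal syntax.

```python
import re

# Hypothetical patterns for illustration only; not taken from haxe.2.diff.
HEX_LOWER = re.compile(r'0x[0-9a-f]+')      # rejects uppercase digits
HEX_ANY = re.compile(r'0x[0-9a-fA-F]+')     # accepts both cases

GREEDY = re.compile(r'~/.*/')               # runs on to the last slash on the line
NONGREEDY = re.compile(r'~/.*?/')           # stops at the first closing slash

# Two haXe regex literals on the same line.
line = 'var a = ~/foo/; var b = ~/bar/;'

greedy_matches = GREEDY.findall(line)
nongreedy_matches = NONGREEDY.findall(line)

hex_lower = HEX_LOWER.fullmatch('0xFF')
hex_any = HEX_ANY.fullmatch('0xFF')

print(greedy_matches)
print(nongreedy_matches)
print(hex_lower, hex_any)
```

With the greedy pattern, both literals and the code between them are swallowed as one match; the nongreedy pattern yields the two literals separately.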
  3. Anonymous

    I learned about the existence of pygments and wrote the parser all in the same day, so I would trust your judgment on most of the above.

    - haxe does support uppercase hex literals.

    - I originally started from the AS3 lexer, as the haxe and AS3 syntaxes are at least somewhat similar. I believe the DOTALL portion came from the AS3 lexer.

    About ('#pop', 'newstate'): I used this pattern when I wanted to essentially transfer control to another context. It was almost always used to handle situations where there is some common portion (like generic information between <>) that can occur in multiple situations. In some situations, it should #pop, and in other situations it should #pop:2. Rather than write two versions of the generic group, I just had the calling context pop itself off the stack before recursing into the generic portion.

    For example, when something has the following possible BNF:

    <root> ::= <block> | <header> <block>

    Here you can either have a block, or a header followed by a block. In both cases, once the block is done you return to the root (or whatever the calling context was).

    I believe that anywhere there is `('#pop', 'newstate')`, `('newstate', '#pop')` could be substituted. Or at least that is what I found when I was testing on my examples. Is there a cleaner way to pop yourself off the stack and /then/ recurse into another group?
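The stack behaviour in question can be sketched in isolation. This is a stand-alone model, not Pygments code: it assumes that a tuple transition is applied entry by entry, from left to right, which is how `RegexLexer` appears to process it. The helper `apply_transition` and the state names `root`, `header` and `block` are hypothetical, chosen to mirror the BNF above.

```python
def apply_transition(stack, new_state):
    """Return the state stack after one transition.

    Assumption: tuple entries are applied left to right. This is a
    simplified model written for illustration, not Pygments itself.
    """
    stack = list(stack)
    actions = new_state if isinstance(new_state, tuple) else (new_state,)
    for action in actions:
        if action == '#pop':
            if len(stack) > 1:          # keep at least one state on the stack
                stack.pop()
        elif action == '#push':
            stack.append(stack[-1])     # re-enter the current state
        else:
            stack.append(action)        # enter a named state
    return stack


# We are inside 'header', which was entered from 'root'.
in_header = ['root', 'header']

# Pop 'header' first, then enter 'block'.
pop_then_push = apply_transition(in_header, ('#pop', 'block'))

# Enter 'block' first, then pop: the pop removes 'block' again.
push_then_pop = apply_transition(in_header, ('block', '#pop'))

# A plain '#pop' at the end of 'block' now lands back in 'root'.
after_block = apply_transition(pop_then_push, '#pop')

print(pop_then_push)
print(push_then_pop)
print(after_block)
```

Under that left-to-right assumption the two orderings are not interchangeable: `('#pop', 'block')` replaces `header` with `block`, so the shared `block` state only ever needs a single `'#pop'`, while `('block', '#pop')` leaves the stack as it was.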
