#765 Open

  1. Leonardo Giordani

This code adds the Motorola 68k family Assembly language as used, for example, in Amiga code and include files. This is my first PR on this project, and my first with Bitbucket and Mercurial, so I apologize if something is not correct; just let me know.

Comments (17)

  1. Anteru

    Thanks for the PR! Do you have a sample file to test? It also looks like some instructions are missing: https://github.com/aquynh/capstone/blob/master/arch/M68K/M68KInstPrinter.c#L44 and some registers as well: https://github.com/aquynh/capstone/blob/master/arch/M68K/M68KInstPrinter.c#L31 - could you please add them?

    I also learned that M68k has some funky addressing modes, would be nice to see something like move.b ([$7fffffff, a0], d0.w, $12345678), ([$10101010, a0, d0.w], $32323232) covered in a test.

    1. Anteru

      The easiest approach is probably to add a new example file for m68k assembly, and then just run your local copy of pygmentize on that until you’re happy with the highlighting. Does that help?

        1. Anteru

          They should go into the repository and be part of the PR: under tests there's an examplefiles folder. Please put a file with M68K assembly in there. Note that you probably want to provide an analyse_text overload as well, so that your lexer automatically gets chosen over the more generic assembly lexers; look at the Ca65Lexer for inspiration (the easiest solution is to check for an M68K-specific token).
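A minimal sketch of the kind of check an analyse_text overload could perform. It is written as a standalone function using only the standard library; in Pygments the function would be defined on the lexer class and return a score between 0.0 and 1.0. The marker patterns and the weights (`M68K_MNEMONIC`, `M68K_REGISTER`, 0.5 and 0.3) are assumptions chosen for illustration, not the tokens used in this PR.

```python
import re

# Hypothetical markers: size-suffixed mnemonics (move.l, add.w, ...) and
# data/address registers (d0-d7, a0-a7). A real lexer may pick other tokens.
M68K_MNEMONIC = re.compile(r'\b(?:move|add|sub|cmp|lea|clr|tst)\.[bwl]\b',
                           re.IGNORECASE)
M68K_REGISTER = re.compile(r'\b[ad][0-7]\b', re.IGNORECASE)


def analyse_text(text):
    """Return a confidence score in [0.0, 1.0] that text is M68k assembly."""
    score = 0.0
    if M68K_MNEMONIC.search(text):
        score += 0.5  # assumed weight
    if M68K_REGISTER.search(text):
        score += 0.3  # assumed weight
    return score
```

Text containing neither marker scores 0.0, so a more generic assembly lexer would still win for non-M68k input.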

  2. Leonardo Giordani author

    I need help. I have a compiler directive called macro, so I defined something like

        tokens = {
            'root': [
                (r'\b(macro)\b', Operator),

    and it works. I see however that this targets any "macro" string in the text. So if I have ifmacrod MACRO the MACRO shouldn't be tagged with Operator.

    So the questions are:

    1. How can I specify that the directive should be the first thing in a line? I'm a bit confused by how regexes are used here, as something like r'^(\s+)macro' doesn’t seem to work.
    2. How can I "reuse" definitions? I read the documentation, but I'm definitely confused. Let's say that I define (r'#?\$?(0x)?[0-9a-f]+\b', Number.Integer) and then I want to say that ds.s 1234 is ds.s Operator + 1234 Number.Integer, without repeating the previous definition.

    Thanks in advance
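One possible explanation for the first question, sketched with the standard `re` module only: Pygments compiles lexer rules with `re.MULTILINE` by default and tries each rule at the current scan position, and `^` matches only at a true line start, not at whatever position scanning has reached. Whether this is what broke the original attempt is a guess; the sample text, the offsets, and the `re.IGNORECASE` flag are assumptions for illustration.

```python
import re

# Same default flag Pygments applies to lexer rules, plus an assumed
# IGNORECASE so that both 'macro' and 'MACRO' are accepted.
directive = re.compile(r'^(\s*)(macro)\b', re.MULTILINE | re.IGNORECASE)

text = "        macro\nlabel   macro\n"

# At offset 0 the rule matches: indentation and directive are consumed together.
groups_at_line_start = directive.match(text, 0).groups()

# If an earlier whitespace rule already consumed the indentation, scanning
# resumes at offset 8, which is not a line start, so '^' fails there.
after_whitespace = directive.match(text, 8)

# On the second line 'macro' follows a label, so it is not the first thing
# on the line and the anchored rule does not match at the line start.
after_label = directive.match(text, text.index("label"))
```

Under this reading, the anchored rule has to be tried before any rule that consumes leading whitespace, and it has to capture that whitespace itself.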

  3. Leonardo Giordani author

    Just to clarify, I'd like to understand if there is any way to mark something like the address numbers in decompiled code, r'[0-9a-f]+:', which are optional and can appear only at the beginning of the line. So I’d like to write a rule like: “There can be an address at the beginning of the line. If it’s there style it, then go on with the usual stuff”.
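The rule described above can be sketched with a single anchored pattern whose address group is optional. This uses plain `re` rather than the Pygments API; the names `LINE` and `split_address` are hypothetical. In a lexer, the rough equivalent would be a `^`-anchored address rule tried before the other rules.

```python
import re

# Optional hex address with a trailing colon, allowed only at the line start.
LINE = re.compile(r'^(?P<address>[0-9a-f]+:)?(?P<rest>.*)$', re.MULTILINE)


def split_address(line):
    """Return (address or None, remainder) for one line of a disassembly dump."""
    m = LINE.match(line)
    return m.group('address'), m.group('rest')
```

For example, `split_address("00fc00d2: move.l d0,a1")` separates the address from the instruction, while a line without an address returns `None` for the first element. Note that a label made only of hex digits, such as `dead:`, would be taken for an address by this pattern.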

  4. Anteru

    For line matching, you'll probably (I'm not 100% sure on this) need to change the default mode from re.MULTILINE to single-line matching, for instance, through an inline flag. See: http://pygments.org/docs/lexerdevelopment/#regex-flags

    Regarding regex reuse, one possible solution is to define the regex as a static class member and just use that, see for instance graphics.py, the PostScriptLexer. That has a bunch of regex expressions ready for reuse.

    And finally, with your macro stuff -- could this be something where you need to push/pop state, i.e. is the desired behavior something like "enter macro" and "leave macro"? In this case, you can change state by using #push and #pop, see: http://pygments.org/docs/lexerdevelopment/#changing-states -- that should allow you to change the behavior when moving into a new scope (and you can factor out the common stuff into another rule and use include to share it, see: http://pygments.org/docs/lexerdevelopment/#advanced-state-tricks)

    Does that help? I’m not an expert in writing lexers myself, learning “on the job” here 🙂
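The two suggestions above (a shared regex defined once, and a state that is entered and left) can be sketched together as a toy state-stack tokenizer. This is not the Pygments API, only a standard-library imitation of the same idea; the state names, token names, and the `tokenize` helper are hypothetical, and `NUMBER` follows the pattern from the question with the `$` escaped.

```python
import re

# Shared pattern, defined once and reused by rules in both states.
NUMBER = r'#?\$?(?:0x)?[0-9a-f]+\b'

# Rule tables keyed by state: (pattern, token name, action).
# action is None, '#pop', or the name of a state to push.
RULES = {
    'root': [
        (r'\s+', 'Text', None),
        (r'ds\.\w\b', 'Directive', 'operand'),  # enter the operand state
        (NUMBER, 'Number', None),
        (r'\S+', 'Other', None),
    ],
    'operand': [
        (r'[ \t]+', 'Text', None),
        (NUMBER, 'Number', '#pop'),             # operand seen: leave the state
        (r'\n', 'Text', '#pop'),
    ],
}

FLAGS = re.MULTILINE | re.IGNORECASE
COMPILED = {
    state: [(re.compile(p, FLAGS), name, action) for p, name, action in rules]
    for state, rules in RULES.items()
}


def tokenize(text):
    """Yield (token name, value) pairs using a stack of states."""
    stack = ['root']
    pos = 0
    while pos < len(text):
        for regex, name, action in COMPILED[stack[-1]]:
            m = regex.match(text, pos)
            if m and m.end() > pos:
                yield name, m.group()
                pos = m.end()
                if action == '#pop':
                    stack.pop()
                elif action is not None:
                    stack.append(action)
                break
        else:
            # No rule matched here: emit one character as an error and move on.
            yield 'Error', text[pos]
            pos += 1
```

Running `list(tokenize("ds.s 1234\n"))` tags `ds.s` as a directive and `1234` as a number without writing the number pattern twice.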

  5. Leonardo Giordani author

    Thanks for the answer. I found the documentation about states, but I am a bit lost. I’ll give it another try, however. Thanks for the link to the PostScript lexer, it will definitely be useful. I’ll let you know.

  6. Leonardo Giordani author

    OK, here we are, this is the latest version of the whole thing. The example file gets parsed correctly. I couldn’t find a good list of examples of addressing modes, but I think we could merge this, since it works reasonably well. I tested it on the Amiga Kickstart code (both includes and disassembled), and everything was OK. Let me know what you think.

    1. Anteru

      Sure, no worries, wasn’t going to merge until the weekend anyways 🙂 Take your time and thanks again for working on this, appreciate the attention to detail!

  7. Leonardo Giordani author

    Well, this is the latest version. It is far from being perfect, but it’s the best I can do now.

    During the development I understood that Pygments doesn’t really fit Assembly languages, as these are not regular languages (as in regular expressions). For starters, we would need to simply split the text into lines and process each one separately (apart from compiler macros, and here we are, it’s not even coherent 🙂). The compiler expressions are a nightmare, as there are many things that can go in an expression that are also valid elsewhere but with a completely different meaning. One example: in M68k, * can be the multiplication operator or an inline comment. Good luck. Not to mention add, which is both a valid instruction and a valid hexadecimal number (and no, in disassembler dumps hexadecimals are not prefixed with 0x). OK, sorry for the long text, but I wanted to share the stress 😹

    Let me know if everything is OK, and in that case this time I believe you can merge. Thanks!
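The ambiguity described above, where the same text means different things depending on where it appears in a line, can be illustrated with a toy classifier that decides by position. This is a simplified standard-library sketch, not the lexer from this PR: the mnemonic subset, the field splitting, and the rule that a leading `*` starts a comment are all assumptions made for the example.

```python
import re

MNEMONICS = {'add', 'move', 'sub'}            # hypothetical subset
REGISTER = re.compile(r'[ad][0-7]', re.IGNORECASE)
HEX = re.compile(r'[0-9a-f]+', re.IGNORECASE)


def classify_line(line):
    """Classify the fields of one line by their position within the line."""
    stripped = line.strip()
    if stripped.startswith('*'):
        # Assumption: '*' as the first non-blank character starts a comment.
        return [('Comment', stripped)]
    tokens = []
    for index, field in enumerate(re.split(r'[\s,]+', stripped)):
        if not field:
            continue
        base = field.split('.')[0].lower()
        if index == 0 and base in MNEMONICS:
            tokens.append(('Mnemonic', field))  # first field: instruction
        elif REGISTER.fullmatch(field):
            tokens.append(('Register', field))  # d0 is also valid hex
        elif HEX.fullmatch(field):
            tokens.append(('Number', field))    # 'add' as an operand is hex
        else:
            tokens.append(('Other', field))
    return tokens
```

With the input `"add.w add,d0"`, the first `add` is classified as a mnemonic and the second as a number, which is only possible because the decision depends on the field position and not on the text alone.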