This code adds the Motorola 68k family assembly language, as used for example in Amiga code and include files. It’s my first PR on this project, with BitBucket, and with Mercurial, so I beg your pardon if something is not correct; just let me know.
Example files should go into the repository: under tests there's an examplefiles folder, and the file should be part of the PR. Please put a file with M68K assembly in there. Note that you probably want to provide an analyse_text overload as well, so your lexer automatically gets chosen over the more generic assembly lexers; look at the Ca65Lexer for inspiration (the easiest solution is to check for an M68K-specific token).
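A minimal sketch of the kind of check suggested above, written as a plain function so it runs on its own. The function name `m68k_score` and the particular mnemonics and addressing forms are assumptions made for illustration, not the heuristic used in this PR; in a lexer, the same body would sit in `analyse_text(text)`, which is expected to return a value between 0.0 and 1.0.

```python
import re

# Assumption: these mnemonics and addressing forms are treated as 68k-specific
# for the purpose of this sketch; a real lexer may want a different token set.
_M68K_HINT = re.compile(
    r'\b(?:movem|moveq|dbra)(?:\.[bwl])?\b'   # typical 68k mnemonics
    r'|\((?:a[0-7]|sp)\)\+'                    # post-increment, e.g. (a0)+
    r'|-\((?:a[0-7]|sp)\)',                    # pre-decrement, e.g. -(sp)
    re.IGNORECASE,
)


def m68k_score(text):
    """Return a confidence in [0.0, 1.0] that `text` is 68k assembly."""
    return 0.9 if _M68K_HINT.search(text) else 0.0
```

A high score only matters relative to what the other assembly lexers return for the same text, so the value 0.9 here is a placeholder rather than a recommendation.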
and it works. I see, however, that this matches any "macro" string in the text. So if I have ifmacrod MACRO, the MACRO shouldn't be tagged with Operator.
So my questions are:
How can I specify that the directive should be the first thing in a line? I'm a bit confused by how regexes are used here, as something like r'^(\s+)macro' doesn’t seem to work.
How can I "reuse" definitions? I read the documentation, but I'm definitely confused. Let's say that I define (r'#?$?(0x)?[0-9a-f]+\b', Number.Integer) and then I want to say that ds.s 1234 is ds.s Operator + 1234 Number.Integer, without repeating the previous definition.
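On the first question, a small standard-library sketch may help to show how a `^`-anchored rule behaves when a scanner tries it at a given position. It assumes, as RegexLexer does by default, that rules are compiled with `re.MULTILINE` and tried with `pattern.match(text, pos)`. The helper `directive_at` and the sample text are made up for illustration. The pattern uses `[ \t]+` instead of `\s+` so that the indentation cannot run across a newline.

```python
import re

# Assumption: rules are compiled with re.MULTILINE and tried at the current
# scan position with pattern.match(text, pos), as RegexLexer does by default.
DIRECTIVE = re.compile(r'^([ \t]+)(macro)\b', re.MULTILINE | re.IGNORECASE)

text = "\tmacro\n\tifmacrod MACRO\n"


def directive_at(pos):
    """Return the directive matched at `pos`, or None if the rule fails there."""
    m = DIRECTIVE.match(text, pos)
    return m.group(2) if m else None


line2 = text.index("\tifmacrod")   # start of the second line
arg = text.index("MACRO")          # the upper-case argument on that line

print(directive_at(0))        # rule tried at the start of line 1
print(directive_at(1))        # rule tried after the indentation was consumed
print(directive_at(line2))    # line 2 starts with a different word
print(directive_at(arg - 1))  # mid-line whitespace before MACRO
```

Only the first call finds the directive. The second call fails because the position is no longer at the start of a line, so one possible reason for an anchored rule never firing is that an earlier, more general whitespace rule has already consumed the newline or the indentation.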
Just to clarify, I'd like to understand whether there is a way to mark something like address numbers in disassembled code: r'[0-9a-f]+:', optional, and only at the beginning of a line. So I’d like to write a rule like: “There can be an address at the beginning of the line. If it’s there, style it, then go on with the usual stuff”.
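One way to express "optional address first, then the usual stuff" is with lexer states. The following is a hedged sketch, not the lexer from this PR: the class name, the state name `line`, and the simplified per-line rules are all assumptions. It uses `default()` from `pygments.lexer` to move to the per-line state when no address is present.

```python
import re

from pygments.lexer import RegexLexer, bygroups, default
from pygments.token import Comment, Name, Number, Punctuation, Text, Whitespace


class AddressPrefixSketchLexer(RegexLexer):
    """Hypothetical sketch: optional hex address at the start of each line."""

    name = 'AddressPrefixSketch'
    flags = re.IGNORECASE | re.MULTILINE

    tokens = {
        # 'root' is only ever active at the start of a line.
        'root': [
            # Optional address prefix, then hand over to the per-line rules.
            (r'([0-9a-f]+)(:)', bygroups(Number.Hex, Punctuation), 'line'),
            # No address: switch to 'line' without consuming anything.
            default('line'),
        ],
        # Simplified stand-in for "the usual stuff".
        'line': [
            (r'\n', Whitespace, '#pop'),   # end of line: back to 'root'
            (r'[ \t]+', Whitespace),
            (r';.*', Comment.Single),
            (r'\$[0-9a-f]+|\d+', Number),
            (r'[a-z_.][\w.]*', Name),
            (r'.', Text),
        ],
    }


if __name__ == '__main__':
    sample = "00fc00d2: move.l d0,d1\n  add.w #1,d0\n"
    for token_type, value in AddressPrefixSketchLexer().get_tokens(sample):
        print(token_type, repr(value))
```

Because the address rule lives only in `root`, it is never tried in the middle of a line. A label that happens to consist of hex digits (for example `add:`) would still be styled as an address, so this sketch does not remove that ambiguity.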
Regarding regex reuse, one possible solution is to define the regex as a static class member and just use that; see for instance the PostScriptLexer in graphics.py. That has a bunch of regular expressions ready for reuse.
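A minimal sketch of that idea, applied to the ds.s example from the question. The class name and the surrounding rules are hypothetical. Two details differ from the pattern quoted in the question, and both are assumptions about what was intended: the `$` is escaped so that it is a literal character, and `(0x)` is made non-capturing so that it does not shift the group numbers seen by `bygroups`.

```python
import re

from pygments.lexer import RegexLexer, bygroups
from pygments.token import Name, Number, Operator, Text, Whitespace


class ReuseSketchLexer(RegexLexer):
    """Hypothetical sketch: one number pattern shared by several rules."""

    name = 'ReuseSketch'
    flags = re.IGNORECASE | re.MULTILINE

    # Defined once as a class member. Assumptions: '$' is meant literally,
    # and the '0x' group must not capture (bygroups counts capturing groups).
    number = r'#?\$?(?:0x)?[0-9a-f]+\b'

    tokens = {
        'root': [
            # Directive followed by a number, reusing the shared fragment.
            (r'(ds\.[bwls])([ \t]+)(' + number + r')',
             bygroups(Operator, Whitespace, Number.Integer)),
            # The same fragment as a standalone rule.
            (number, Number.Integer),
            (r'\s+', Whitespace),
            (r'[a-z_.][\w.]*', Name),
            (r'.', Text),
        ],
    }


if __name__ == '__main__':
    for token_type, value in ReuseSketchLexer().get_tokens("ds.s 1234\n"):
        print(token_type, repr(value))
```

Another option is to put the shared rules in their own state and pull them in with `include()` from `pygments.lexer`. That reuses whole rules rather than regex fragments.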
Thanks for the answer. I found the documentation about states, but I am a bit lost. I’ll give it another try, however. Thanks for the link to the PostScript lexer, it will definitely be useful. I’ll let you know.
OK, here we are: this is the latest version of the whole thing. The example file gets parsed correctly. I couldn’t find a good list of examples of addressing modes, but I think we might merge this, as it works reasonably well. I tested it on the Amiga Kickstart code (both includes and disassembled), and everything was OK. Let me know what you think.
Well, this is the latest version. It is far from being perfect, but it’s the best I can do now.
During development I understood that Pygments doesn’t really fit assembly languages, as these are not regular languages (as in regular expressions). For starters, we would need to simply split the text into lines and process each one separately (apart from compiler macros, and here we are, it’s not even coherent). The compiler expressions are a nightmare, as there are many things that can go in an expression that are also valid elsewhere but with a completely different meaning. One example: in M68k, * can be multiplication or an inline comment. Good luck. Not to mention add, which is both a valid instruction and a valid hexadecimal number (and no, in disassembler dumps hexadecimals are not prefixed with 0x). OK, sorry for the long text, but I wanted to share the stress.
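The position-dependence described above can be illustrated with a toy, line-based splitter. This is only a sketch under simplifying assumptions: a hypothetical `[label] mnemonic [operands]` layout, `*` treated as a comment only in column 1, and operands taken as a single whitespace-free field. The names `LINE` and `classify` are made up for the example.

```python
import re

# Hypothetical, simplified line layout: [label] mnemonic [operands]
LINE = re.compile(
    r'^(?P<label>[^\s*;]\S*)?'                   # optional label in column 1
    r'[ \t]+(?P<mnemonic>[a-z]+(?:\.[bwls])?)'   # mnemonic with optional size
    r'(?:[ \t]+(?P<operands>\S+))?',             # operands as one raw field
    re.IGNORECASE,
)


def classify(line):
    """Return (kind, value) pairs for one source line, decided by position."""
    if line.startswith('*'):
        # Assumption: '*' in column 1 starts a comment line.
        return [('comment', line)]
    m = LINE.match(line)
    if not m:
        return [('other', line)]
    fields = []
    if m.group('label'):
        fields.append(('label', m.group('label')))
    fields.append(('mnemonic', m.group('mnemonic')))
    if m.group('operands'):
        fields.append(('operands', m.group('operands')))
    return fields


print(classify("* clear the screen"))
print(classify("\tdc.w\tadd*2"))
print(classify("add\tadd.w\tadd,d0"))
```

In the last line the text add lands in three different fields, and in the second line `*` ends up inside the operand field rather than starting a comment. In a RegexLexer, each field could correspond to its own state, so that the same characters get different rules depending on where in the line they appear.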
Let me know if everything is OK, and in that case this time I believe you can merge. Thanks!