This is dcputhings, a collection of assorted tools for DCPU-16 development.
This repository is maintained by Kang Seonghoon and contains
the following software:
- dcpuopt.c, my early attempts to build a DCPU-16 emulator.
- DcpuAsm, an Ocaml DSL for DCPU-16 assembly.
DcpuAsm is a DCPU-16 assembler embedded in the Ocaml syntax. It allows
easier DCPU-16 code generation, but it can also be used as an ordinary macro
assembler if you understand a bit of Ocaml.
Basically, the assembly is represented as an Ocaml list:
[ SET A, 4; SET B, 5; %loop: SET PC, %loop; ]
Each element of the list (SET A, 4; SET B, 5; and %loop: SET PC, %loop)
evaluates to the internal representation via Camlp4. Note that Ocaml allows a
trailing semicolon inside the brackets, so the last ; is just fine.
This list can be converted to the binary via
    let code = ASM [ SET A, 4; SET B, 5; %loop: SET PC, %loop; ];;
    print_string (DcpuAsm.to_binary_le code)
This will resolve all the labels to fixed offsets and convert the static code
into a little-endian byte string. The big-endian counterpart is to_binary_be,
and you can also get an array of words.
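The difference between the two byte orders can be illustrated in plain Ocaml.
This is only a sketch of the idea, not DcpuAsm's implementation: the names
words_to_bytes, words_to_le and words_to_be are made up for this example, and
each word is assumed to be a 16-bit value held in an Ocaml int.

```ocaml
(* Hypothetical helpers, not part of DcpuAsm: serialize 16-bit words
   into a byte string in either byte order. *)
let words_to_bytes ~big_endian words =
  let buf = Buffer.create (2 * Array.length words) in
  Array.iter
    (fun w ->
       let hi = Char.chr ((w lsr 8) land 0xff)
       and lo = Char.chr (w land 0xff) in
       if big_endian then (Buffer.add_char buf hi; Buffer.add_char buf lo)
       else (Buffer.add_char buf lo; Buffer.add_char buf hi))
    words;
  Buffer.contents buf

let words_to_le = words_to_bytes ~big_endian:false
let words_to_be = words_to_bytes ~big_endian:true
```

For the word 0x1234 the little-endian string starts with the byte 0x34, while
the big-endian string starts with 0x12.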
ASM assumes the origin at 0x0000. This can be changed using
~origin:0x1000 [...] syntax; in fact,
ASM is just a shorter alias to
Statements / Instructions
DcpuAsm supports the following instructions (and pseudo-instructions):
- Basic opcodes:
- Special opcodes:
- Raw data:
- Syntactic extensions:
- Empty opcode (i.e. no output at all):
Basic opcodes take two arguments, and special opcodes take one.
Multiple arguments are separated with a comma (,), much like in other
assemblers.
DAT takes one or more arguments. An argument can be a typical immediate (see
below for the syntax), which occupies exactly one word, or a string, which
occupies as many words as it has characters (so that
"ok" equals to
One can also use
_ for placeholders whose value can be ignored; it is mostly
_s at the end of the binary will be ignored.
Arguments can have a TIMES prefix, as in DAT 3 TIMES 0x1234, where the
repeat count can be any expression including labels. Repeating a string is
also possible (3 TIMES "hello?").
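How DAT arguments turn into words can be modelled with a small Ocaml type.
This is a sketch under assumptions, not DcpuAsm's internal representation:
the constructors Imm, Str and Times and the functions expand and dat are
invented here, and a string is assumed to emit one word per character holding
the character code.

```ocaml
(* Hypothetical model of DAT arguments; these names are not DcpuAsm's. *)
type arg =
  | Imm of int            (* an immediate: exactly one word *)
  | Str of string         (* a string: one word per character (assumed) *)
  | Times of int * arg    (* n TIMES arg *)

(* Expand one argument into the list of words it occupies. *)
let rec expand = function
  | Imm v -> [ v land 0xffff ]
  | Str s -> List.init (String.length s) (fun i -> Char.code s.[i])
  | Times (n, a) ->
    if n < 0 then invalid_arg "negative repeat count"
    else List.concat (List.init n (fun _ -> expand a))

(* Expand a whole DAT statement. *)
let dat args = List.concat (List.map expand args)
```

With this model, dat [Times (3, Imm 0x1234)] yields three copies of 0x1234,
and repeating a string repeats all of its words.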
ORG x is equivalent to
DAT (x-%_) TIMES _, and will set the current
assembly position to
x if possible. It will raise an error if it is
x should be a positive integer.
ALIGN x is equivalent to DAT ((x-(%_ MOD x)) MOD x) TIMES _, and will set
the current assembly position to the next multiple of x. (Therefore it will
add at most x-1 words.)
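The padding arithmetic behind ORG and ALIGN can be checked with a few lines of
plain Ocaml. The function names below are hypothetical, pos stands for the
current assembly position (%_), and the error raised when ORG would have to
move backwards is an assumption about what "not possible" means.

```ocaml
(* Number of placeholder words ORG x would emit at position pos.
   Assumption: moving backwards is the error case. *)
let org_padding ~pos x =
  if x < pos then invalid_arg "ORG: target is behind the current position"
  else x - pos

(* Number of placeholder words ALIGN x would emit at position pos,
   following the formula ((x - (pos MOD x)) MOD x). *)
let align_padding ~pos x =
  if x <= 0 then invalid_arg "ALIGN: x should be positive"
  else (x - (pos mod x)) mod x
```

For example, align_padding ~pos:5 4 is 3 (the next multiple of 4 is 8), and a
position that is already aligned needs no padding at all.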
NOP is equivalent to
SET A, A. It does nothing but take one cycle. While
DCPU-16 has lots of nops, this encoding is chosen because of the simplicity of
its binary encoding (0x0001). It may change if 0x0000 also turns out to be a
JMP a sets the PC to a in the fastest or at least shortest way. There are
4 possible encodings for it: SET PC, ..., XOR PC, ..., AND PC, ... and
SUB PC, .... (Among them XOR is fastest but not applicable in all cases.)
Note that a plain SET PC, a won't be optimized; you must use JMP a instead.
PUSH a is equivalent to SET PUSH, a. POP a is equivalent to SET a, POP.
You can also use [SP] instead of PEEK and [SP+...] instead of PICK ....
RET is equivalent to SET PC, POP, and is used for returning from the
subroutine initiated by JSR.
HLT are equivalent to
SUB PC, 1. This forms a simple infinite
loop, and is used as a de facto instruction to terminate the emulator.
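The encodings of the NOP and HLT equivalents above can be reproduced by hand.
The sketch below assumes the instruction word layout of the DCPU-16 1.7
specification (aaaaaabbbbbooooo, with short literals -1 to 30 stored as
0x20 to 0x3f); the helper names are invented for this example and are not
DcpuAsm functions.

```ocaml
(* Pack a basic instruction word, assuming the DCPU-16 1.7 layout:
   6 bits of operand a, 5 bits of operand b, 5 bits of opcode. *)
let encode_basic ~opcode ~b ~a =
  ((a land 0x3f) lsl 10) lor ((b land 0x1f) lsl 5) lor (opcode land 0x1f)

let op_set = 0x01 and op_sub = 0x03   (* opcode numbers from the 1.7 spec *)
let reg_a = 0x00 and reg_pc = 0x1c    (* operand codes for A and PC *)

(* Operand code of a short literal; only valid for -1 <= v <= 30. *)
let short_lit v = 0x21 + v

let nop = encode_basic ~opcode:op_set ~b:reg_a ~a:reg_a            (* SET A, A *)
let hlt = encode_basic ~opcode:op_sub ~b:reg_pc ~a:(short_lit 1)   (* SUB PC, 1 *)
```

Under these assumptions nop is 0x0001, matching the encoding quoted for
SET A, A, and hlt fits in a single word.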
PASS does not emit the binary at all; it can be used as a placeholder.
DcpuAsm does not support an EQU pseudo-instruction or similar; you can use a
let x = ... in ... construct to define constants, however.
DcpuAsm supports labels. A label is a valid Ocaml name (always starting with a
lowercase letter or _) prepended by %. The labels used in this document, such
as %loop, %isqrt and %_next, are valid labels, for example.
There are two ways to use labels:
- It can occur in the expression, and evaluates to the location pointed by
the label. The instruction may contain labels defined after it.
- It can also occur at the front of the instruction (e.g.
%foo: SET A, 3)
to declare the label. The colon (
:) is optional for predefined
instructions, but you are recommended to keep the colon as it allows
multiple label definitions. Skipping a colon may be natural for
You can define labels at the end of the list; the
PASS statement will be
[ JMP %garbage; DAT 1, 2, 3, 4; %garbage: ]
DcpuAsm will automatically resolve labels to the appropriate position. Since
the length of instructions may vary depending on the position of labels,
DcpuAsm runs multiple passes to settle them down. If the layout has not
stabilized after the given number of passes, DcpuAsm gives up. The default
limit is 50, but it
can be configured like
ASM ~maxpass:10 [...].
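The multi-pass idea can be sketched as a fixed-point iteration. This is a toy
model, not DcpuAsm's actual algorithm: the item type, layout and resolve are
hypothetical, and a jump is assumed to take one word when its target address
is at most 30 and two words otherwise.

```ocaml
(* Toy program items; these constructors are invented for the sketch. *)
type item =
  | Label of string   (* defines a label at the current position *)
  | Words of int      (* n words of fixed-size output *)
  | Jump of string    (* 1 word if the target is <= 30, else 2 (assumed) *)

(* Lay out the items using the label table from the previous pass and
   return the label table computed by this pass. *)
let layout prev items =
  let size = function
    | Label _ -> 0
    | Words n -> n
    | Jump l ->
      (match List.assoc_opt l prev with
       | Some addr when addr <= 30 -> 1
       | _ -> 2)  (* unknown or far labels use the long form *)
  in
  let _, labels =
    List.fold_left
      (fun (pos, acc) it ->
         let acc = match it with Label l -> (l, pos) :: acc | _ -> acc in
         (pos + size it, acc))
      (0, []) items
  in
  List.rev labels

(* Repeat passes until the label table stops changing, or give up. *)
let resolve ?(maxpass = 50) items =
  let rec go pass prev =
    if pass > maxpass then None
    else
      let next = layout prev items in
      if next = prev then Some next else go (pass + 1) next
  in
  go 1 []
```

In this model a label can move when an earlier jump shrinks, which is why a
single pass is not enough and why a pass limit is needed as a safety net.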
It is possible to have free (undefined) labels in the assembly. DcpuAsm will
make sure that these labels, while unresolved, will not affect the other parts
of generated code. This is done by forcing all remaining immediates to always
use a longer form.
It is advised to prepend _ to local labels. DcpuAsm has special support
for these local labels (see below).
The special label %_, when used in an expression, resolves to the position
of the current instruction. For example, SET PC, %_ will be the same as
%_temp: SET PC, %_temp. You cannot define a label named %_.
An expression can occur as an instruction's argument. It may contain
registers, numbers, labels, memory references (enclosed in [...]) and
expressions.
DcpuAsm supports all general and special registers:
IA cannot really be used, but it is
there for the better error handling.) It also supports
PICK ...; they cannot be used in the expression.
DcpuAsm supports numbers in base 2 (0b101), base 8 (0o337), base 10 and
base 16 (0x1337), just like Ocaml. Additionally, a character literal ('A')
is equal to its numerical code (i.e. int_of_char 'A'). All numbers are
treated as built-in Ocaml integers (31 or 63 bits long depending on the
platform), so you should be aware of that limit.
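Since the number syntax is the same as Ocaml's, the literals can be checked
directly in an Ocaml toplevel. The to_word helper is a hypothetical addition
showing one way to reduce an Ocaml int to a 16-bit DCPU-16 word; whether
DcpuAsm truncates in exactly this way is an assumption.

```ocaml
(* The same literal forms, evaluated by plain Ocaml. *)
let binary = 0b101
let octal = 0o337
let hex = 0x1337
let letter = int_of_char 'A'

(* Hypothetical helper: keep only the low 16 bits of an Ocaml int. *)
let to_word v = v land 0xffff

let () =
  Printf.printf "%d %d %d %d\n" binary octal hex letter;
  Printf.printf "native ints are %d bits wide here\n" Sys.int_size
```

Negative values wrap around under to_word, so -1 becomes 0xffff.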
DcpuAsm supports all ordinary arithmetic and bitwise operations:
SHR. While arithmetic operations are permitted on registers, the resulting
expression has to be of the form register + other expression or
[register + other expression] due to the constraints of DCPU-16. (The
intermediate expression does not have to be: for example,
[3*A+2*(2*B-A)-(8 DIV 2)*B] will be resolved to [A], which is
perfectly valid in DCPU-16.)
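One way to see why such an expression collapses to a single register is to
reduce it to a linear form: a constant plus a coefficient per register. The
sketch below is a minimal model written for this README, not DcpuAsm's own
canonicalization code; all type and function names are hypothetical, and only
addition, subtraction, multiplication by a constant and constant division are
handled.

```ocaml
type expr =
  | Num of int
  | Reg of string
  | Add of expr * expr
  | Sub of expr * expr
  | Mul of expr * expr
  | Div of expr * expr

(* A linear form: constant term plus (register, coefficient) pairs. *)
type linear = { const : int; regs : (string * int) list }

(* Drop zero coefficients and sort, so equal forms compare equal. *)
let normalize l =
  { l with regs = List.sort compare (List.filter (fun (_, c) -> c <> 0) l.regs) }

let scale k l =
  normalize { const = k * l.const; regs = List.map (fun (r, c) -> (r, k * c)) l.regs }

let add a b =
  let merge acc (r, c) =
    let old = try List.assoc r acc with Not_found -> 0 in
    (r, old + c) :: List.remove_assoc r acc
  in
  normalize { const = a.const + b.const; regs = List.fold_left merge a.regs b.regs }

let rec linearize = function
  | Num n -> { const = n; regs = [] }
  | Reg r -> { const = 0; regs = [ (r, 1) ] }
  | Add (a, b) -> add (linearize a) (linearize b)
  | Sub (a, b) -> add (linearize a) (scale (-1) (linearize b))
  | Mul (a, b) ->
    (match linearize a, linearize b with
     | ({ const = k; regs = [] }, l) | (l, { const = k; regs = [] }) -> scale k l
     | _ -> failwith "non-linear expression")
  | Div (a, b) ->
    (match linearize a, linearize b with
     | { const = x; regs = [] }, { const = y; regs = [] } when y <> 0 ->
       { const = x / y; regs = [] }
     | _ -> failwith "unsupported division")

(* Encodable if it is a constant or exactly one register with coefficient 1. *)
let is_encodable l =
  match l.regs with [] | [ (_, 1) ] -> true | _ -> false
```

Applied to 3*A+2*(2*B-A)-(8 DIV 2)*B, the B terms cancel and A is left with
coefficient 1, so the form is encodable, whereas 2*A on its own is not.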
As mentioned before, labels in the expression evaluate to their positions. You
can do something like this:
    [
      (* Returns sqrt(A) from the precomputed table. A should be less than 16.
       * B will contain an integral part and A will contain a fractional part. *)
      %isqrt:
        IFG A, (%_fpend-%_fpstart) DIV 2 - 1;  (* bound check *)
          HLT;
        MUL A, 2;
        SET B, [%_fpstart+A];
        SET A, [%_fpstart+A+1];
        JMP POP;
      %_fpstart:
        DAT 0x0000, 0x0000;
        DAT 0x0001, 0x0000;
        DAT 0x0001, 0x6a0a;
        DAT 0x0001, 0xbb68;
        DAT 0x0002, 0x0000;
        DAT 0x0002, 0x3c6f;
        DAT 0x0002, 0x7312;
        DAT 0x0002, 0xa550;
        DAT 0x0002, 0xd414;
        DAT 0x0003, 0x0000;
        DAT 0x0003, 0x298b;
        DAT 0x0003, 0x510e;
        DAT 0x0003, 0x76cf;
        DAT 0x0003, 0x9b05;
        DAT 0x0003, 0xbddd;
        DAT 0x0003, 0xdf7c;
      %_fpend:
    ]
IMM (e) (note the parentheses) and PTR [e] are longer forms of e and [e],
respectively, and you should use them outside the assembly instruction.
A normal Ocaml expression can also be used;
VAL e will evaluate
e as an
Ocaml expression and use its value as an immediate. Similarly,
STR e uses
its value as a string (only useful in
DAT arguments). If the Ocaml
expression is simple enough (e.g. a single identifier) then you can omit
VAL entirely. This is very useful for compile-time constants:
    let screen_base = 0x8000 in
    [
      SET [screen_base], 'H';
      SET [screen_base+1], 'e';
      SET [screen_base+2], 'l';
      SET [screen_base+3], 'l';
      SET [screen_base+4], 'o';
      HLT;
    ]
DcpuAsm tries to generate the shortest code for the given assembly, but you
can override this behavior. SHORT e will cause an error if e does not fit in
the range of -1 to 30 or if it is used as the first operand (a short literal
cannot be used there), and LONG e will generate the longer form of the given
immediate (not of the instruction, so you should use it twice for basic
opcodes). This only applies to literal values; it is silently ignored for
other kinds of values.
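The range check behind short literals can be sketched in a few lines. The
sketch assumes the DCPU-16 1.7 operand encoding, where the values -1 to 30
map to the operand codes 0x20 to 0x3f; to_signed and short_literal are
hypothetical names, not DcpuAsm functions.

```ocaml
(* Reduce to a 16-bit word, then view it as a signed value,
   so that 0xffff and -1 are treated alike. *)
let to_signed v =
  let w = v land 0xffff in
  if w >= 0x8000 then w - 0x10000 else w

(* Operand code for the short form, or None if a long literal is needed. *)
let short_literal v =
  let s = to_signed v in
  if s >= -1 && s <= 30 then Some (0x21 + s) else None
```

In this model a SHORT request would correspond to failing when short_literal
returns None, and a LONG request to ignoring a Some result and emitting the
extra word anyway.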
A special value
NEXT can also be used as a part of an expression. It won't
generate the "next words" required for long literals and register-relative
addressing, so whatever the next instruction is it (or its first word) will be
the next word. It can be used for simple self-modifying programs in combination with
SHORT (to ensure that the next instruction is always one word long), for
example. Note that
NEXT is not canonicalized, so
example) is invalid. Only a form of
[NEXT+reg] is valid.
DcpuAsm supports a block as a unit of assembly instructions. They can be used
as a building block:
    (* Warning: not tail-recursive. Illustration purpose only. *)
    let rec copyn src dst n =
      if n = 0 then PASS
      else if n = 1 then SET [dst], [src]
      else BLOCK [ SET [dst], [src];
                   copyn (src+1) (dst+1) (n-1); ]
    in
    [
      (* save and restore the video memory *)
      copyn 0x8000 0x4000 (16*32);
      copyn 0x4000 0x8000 (16*32);
      HLT;
    ]
BLOCK e evaluates
e as an Ocaml expression (which includes,
incidentally, a list containing assembly instructions) and makes an
instruction block out of it. You don't have to use blocks if you have exactly
one instruction, as the case
n = 1 of the above code suggests.
A more interesting use of blocks involves local labels:
    let case k = BLOCK [ (* ... *) ] in
    [
      BLOCK LOCAL [ IFE A, 1; JMP %_next;
                    JMP %_skip;
                    %_next: case 1;
                    %_skip: ];
      BLOCK LOCAL [ IFE A, 2; JMP %_next;
                    JMP %_skip;
                    %_next: case 2;
                    %_skip: ];
      HLT;
    ]
BLOCK LOCAL will make all defined labels starting with
_ local. It is not
possible to access these local labels outside of the block (unless you use a
nasty hack). Non-local blocks (in this case,
case 1 and
case 2) will not
affect this procedure. This is very useful for generated code.
You can manually give a list of local labels using
BLOCK LOCAL %a, %b, %c
syntax; or by an Ocaml list using
BLOCK LOCAL *["a"; "b"; "c"].
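One way to picture what making labels local means is a renaming step that
gives every block its own copy of the names starting with _. The following is
only an illustration of that idea; how DcpuAsm really implements local labels
is not described here, and the counter, the # separator and the localize
function are all hypothetical.

```ocaml
(* Hypothetical per-block counter used to build unique names. *)
let counter = ref 0

(* Rename every label starting with '_' so that it is unique to this
   block; other labels stay global and are left untouched. *)
let localize labels =
  incr counter;
  let id = !counter in
  List.map
    (fun l ->
       if String.length l > 0 && l.[0] = '_'
       then Printf.sprintf "%s#%d" l id
       else l)
    labels
```

Calling localize once per block keeps two blocks that both define _next and
_skip from clashing, while a label such as main remains visible everywhere.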
DcpuAsmExample.ml contains some extreme examples of local blocks.
There is no separate macro feature in DcpuAsm, but you can trivially make a
simple macro with local blocks and Ocaml
DcpuAsm does support some additional features for macros. While VAL and
STR allow the insertion of an arbitrary immediate value or string, you cannot
insert registers or other expressions in this way. Therefore DcpuAsm supports a
#e of an Ocaml expression, which can appear in:
- Expressions (e.g.
3 + #reg). The expression should evaluate to the
internal representation of expression; an immediate should be quoted using
- Labels (e.g.
%#labelname). The expression should evaluate to a string.
While you can use any character in the label name (even an empty string is
permitted), you should restrict yourself to the normal identifier. Local
labels, for example, start with
function can be used to generate unique symbols.
Caveats / To-do List
As always you should expect the following caveats:
- You cannot use code like
[(%label1, ...); (%label2, ...); ...] in
normal Ocaml code because
(% will be treated as one token. (
etc. will work.) The assembly syntax is specially crafted to separate those
two, however. Any suggestions about this problem are welcome.