Commits

Cat's Eye Technologies  committed f3709ca

Convert documentation to Markdown.

  • Participants
  • Parent commits 0296649

Comments (0)

Files changed (3)

File README.markdown

+The Emmental Programming Language
+=================================
+
+November 2007, Chris Pressey, Cat's Eye Technologies
+
+Introduction
+------------
+
+Emmental is a self-modifying programming language. That is not to say
+that is a language in which programs are self-modifying; rather, it is
+the language itself, as defined by a meta-circular interpreter, that can
+be modified during the course of a running program. Indeed, this is how
+Emmental, without conventional conditional and repetition/recursion
+constructs, achieves Turing-completeness.
+
+Meta-circular Interpreters
+--------------------------
+
+One way to attempt to define a language is by giving what's called a
+*meta-circular interpreter* (often shortened to "MCI" in this document.)
+This is an interpreter for some language which is written in that same
+language (or at least, a language which is very close to it.)
+
+Meta-circular interpreters were a popular way to the describe the
+semantics of programming languages, especially LISP-like languages, and
+especially before the advent of denotational semantics. The term
+"meta-circular" was apparently coined by John C. Reynolds in his paper
+"Definitional Interpreters for Higher-Order Programming Languages" (1972
+Proceedings ACM National Conference.)
+
+Of course, in the real world, MCI's are not often used. They certainly
+*can* be used: if you have a working Scheme interpreter that came with
+your computer system, there is nothing stopping you from writing another
+Scheme interpreter in Scheme, and running your programs on your
+interpreter (which is itself running on your system's interpreter.)
+However, this is quite a bit less efficient due to the duplication of
+effort. A somewhat more realistic case might be if your system came
+with, say, a Scheme compiler. You might then feed your Scheme
+interpreter (written in Scheme) through that to make a native Scheme
+interpreter, and use that to interpret your programs. (In this setup,
+the interpreter is usually described as "self-hosting" rather than
+"meta-circular".)
+
+But, as should be obvious, you already need an implementation of Scheme
+for your Scheme interpreter written in Scheme to be of much practical
+use to you. If your meta-circular interpreter is all you have, you won't
+be able to use it to run or understand Scheme programs. Because the MCI
+is defined in terms of itself, you'll need some other source of
+"understanding how it works" to make it complete. This understanding
+might come from an implementation in some other programming language, or
+a specification in some formal language, or a description in some
+natural language, or simply from intuition — but it has to come from
+somewhere.
+
+Assuming that we do have that external source of understanding, the
+meta-circular interpreter can come in quite handy in codifying the
+semantics of the language. And, in Emmental's case, *altering* those
+semantics: Emmental's MCI supports operations which instruct Emmental's
+MCI to modify its behaviour.
+
+Interpreter Structure
+---------------------
+
+To describe the structure of Emmental's MCI, we first examine the
+general structure of interpreters. If you've ever written a virtual
+machine executor in, say, C, you've noticed that it has the general form
+
+        pc = start;
+        while (!done) {
+            switch (instruction[pc]) {
+                case INSTR_ONE:
+                    /* implement semantics of INSTR_ONE */
+                    pc += advance(INSTR_ONE);
+                    break;
+                case INSTR_TWO:
+                    /* implement semantics of INSTR_TWO */
+                    pc += advance(INSTR_TWO);
+                    break;
+                /* ... */
+                case INSTR_HALT:
+                    done = 1;
+                    break;
+                default:
+                    perror("Invalid opcode");
+            }
+        }
+
+Note that `advance()` is some function that computes how far the program
+counter is advanced on that instruction. This value is typically +1 for
+most instructions, but more or less than 1 (and dependent on the state
+of the program) for a handful of "branch" instructions. Note also that
+`advance()` would not typically be implemented in C as a function; I'm
+just showing it like this to emphasize the regular structure.
+
+From this we infer that the basic structure of an interpreter is a
+*dictionary* or *map* that associates program symbols with operations.
+There is some extra housekeeping like the fetch-execute cycle that
+surrounds this dictionary, but this can (hopefully) be handled mostly
+automatically, freeing us to concentrate on *symbols* and *operations*.
+
+The symbols could be taken from any finite alphabet, but in Emmental, to
+keep things relatively simple, we'll just use the ASCII character set.
+(Actually, to be precise, this is the full 8-bit ASCII character set.
+Non-printable control characters are allowed, as are characters between
+128 and 255, and each has a distinct meaning. But their representations
+are not defined.)
+
+The operations can be thought of, abstractly, as functions which
+transform program states. Or they can be thought of, concretely, as
+segments of code — mini-programs which implement these functions. In the
+case of a meta-circular interpreter, these mini-programs would be
+written *in the language being interpreted*.
+
+To extend this idea to a *self-modifying* meta-circular interpreter, the
+operations can be thought of as functions which transform both program
+states *and* interpreter definitions. (Alternatively, the interpreter
+definition could be thought of as part of the program state, although I
+feel that's a bit gorier a way to look at it, and I prefer the other
+view, at least for Emmental.)
+
+In Emmental, most operations leave the interpreter definition unchanged.
+However, there is one operation which alters the interpreter definition,
+and it is this altered definition that is used to interpret the
+remainder of the program.
+
+Emmental Semantics (in Emmental)
+--------------------------------
+
+Emmental is essentially a stack-based language. (There's also a queue,
+but it's not as central.) All operations implicitly get data from, and
+implicitly deposit results back onto, a single stack. For
+orthogonality's sake, this stack may contain only ASCII symbols. (And
+note that trying to pop an empty stack, or dequeue an empty queue, is an
+error that aborts the program.)
+
+Note that because we've established that an interpreter (at least,
+insofar as Emmental ever needs to know) is simply a map that takes
+symbols to operations, and that operations in Emmental are defined
+(meta-circularly) as Emmental programs, we can use the following
+notation to describe interpreters:
+
+    % → XYZ+*!
+    & → 'ap'ag'ag
+
+That is, the symbol `%`, when encountered in an Emmental program,
+indicates an operation that is defined by the Emmental program `XYZ+*!`,
+and so forth.
+
+When a main Emmental program begins execution for the first time, it
+starts with what's called the *initial Emmental interpreter*. (This
+fact, of course, doesn't apply to any further point of execution inside
+an Emmental program, or execution of operations defined in Emmental's
+MCI, since these would be considered subprograms of a sort. These cases
+use whichever interpreter happens to be in force in that point in time.)
+
+The inital Emmental interpreter is defined as follows:
+
+    a → a
+    b → b
+    c → c
+    ...
+
+That is, for every symbol x in the ASCII set, x `→` x.
+
+Doesn't tell us a lot about Emmental's semantics, does it? No. Nothing
+at all, really. But remember what I said about needing an external
+source of understanding, in order to actually get full value out of an
+MCI. And remember the purpose of Emmental's MCI: it is not there so much
+to help us understand Emmental, but to allow us to *change* Emmental,
+from inside an Emmental program.
+
+And, for all that our description of the initial Emmental interpreter is
+unhelpfully tautological, it is not incorrect: the semantics of `a` can
+in fact be thought of as being defined by an Emmental program that
+consists of only one instruction, `a`. This happy state of affairs comes
+about because Emmental is stack-based; the "signature" (the requirements
+for the "before" and "after" stacks) of the symbol `a` is the same as
+the signature of the program containing the single symbol `a`. No extra
+syntax to specify arity and the like is necessary.
+
+Above all, don't panic: we *will* describe what symbols like `a`
+actually mean in Emmental, we'll just need to do it in something besides
+Emmental. In fact, let's do that right now.
+
+Emmental Semantics (in English)
+-------------------------------
+
+### Primitive Arithmetic
+
+`#` pushes the symbol NUL (ASCII 0) onto the stack.
+
+The symbols `0`, `1`, ... `9` pop a symbol from the stack, multiply its
+ASCII value by 10 modulo 256, add the value 0, 1, ... 9 (respectively)
+to that value modulo 256, and push the resulting symbol back onto the
+stack.
+
+The upshot of these 11 operations is that you can push arbitrary symbols
+onto the stack by spelling out their ASCII values in decimal. For
+example, `#64` pushes a `@` onto the stack.
+
+`+` pops two symbols off the stack, adds together their ASCII values
+modulo 256, and pushes the symbol with the resultant ASCII value back
+onto the stack.
+
+`-` pops two symbols off the stack, subtracts the ASCII value of the
+first popped from the ASCII value of the second popped modulo 256, and
+pushes the symbol with the resultant ASCII value back onto the stack.
+
+`~` ("log") pops a symbol from the stack, computes the discrete base-2
+logarithm of the ASCII value of that symbol, and pushes the symbol with
+the resultant ASCII value back onto the stack. The discrete base-2
+logarithm of a number is the floor or integer part of the base-2
+logarithm of that number. Alternately, it is the number of the highest
+bit position (starting with the LSB = bit position 0) with a bit set
+when the number is viewed as binary. Because the base-2 logarithm of the
+number 0 itself is undefined, the number 0 is treated as 256 for this
+operation; its discrete base-2 logarithm is 8.
+
+### Stack and Queue Manipulation
+
+`^` ("enqueue a copy") pops a symbol off the stack, makes a copy of it,
+pushes it back onto the stack, and enqueues the copy onto the queue.
+
+`v` ("dequeue") dequeues a symbol from the queue and pushes it onto the
+stack.
+
+Using these operations in combination, one can form "discard",
+"duplicate", "swap", and other more advanced stack manipulation
+operations. For example, assuming an empty queue and more than two
+elements on the stack, "swap" can be accomplished with the code
+`^v^-+^^v^v^v-+^v-+^v-+vv`.
+
+Despite this fact, the operation `:` duplicates the top value of the
+stack. (Emmental is not an absolutely minimal language; note, for
+instance, that it has all ten decimal digits as operations when these
+could surely have been defined in terms of only one or two operations.
+The reasons for a seperate `:` operation are given below in the section
+on Computational Class.)
+
+### I/O
+
+`.` pops a symbol off the stack and sends it to the standard output as
+an ASCII symbol.
+
+`,` waits for an ASCII symbol to arrive on standard input, and pushes it
+onto the stack.
+
+### Interpreter Modification and Reflection
+
+First let's define what it means to *pop a string* off the stack.
+Symbols are popped off the stack until a `;` symbol is found on the
+stack. The symbols popped off are considered a string in the reverse
+order they were popped; i.e. the last symbol popped is the first symbol
+of the string. The `;` symbol is popped off the stack, but is not made
+part of the string; it is simply discarded.
+
+Since an Emmental program is a string, popping a program is the same as
+popping a string, just that the string is interpreted as a program.
+
+`!` (sometimes called "supplant") pops a symbol, which we call s, off
+the stack. Then it pops a program t. It then inserts the association s
+`→` t into the interpreter definition. This overwrites whatever mapping
+of s might have been in the interpreter definition previously. This new
+interpreter definition is used for all subsequent execution (until it is
+changed again, of course.)
+
+Note that `!` does *early binding*. That is, the meaning of each symbol
+in this program t is the meaning of that symbol *at the time `!` is
+executed*. If some subsequent `!` operation later changes the meaning of
+one of the symbols that occurs in t, the meaning of t is not changed.
+The semantics of t are "captured" or "frozen". This implies, among other
+things, that supplanting some symbol z with itself (a program consisting
+only of the symbol z) is a no-op, because z's meaning, at the time that
+`!` is executed, is invariably z.
+
+`?` (sometimes called "eval") pops a symbol, which we call s, off the
+stack. It then executes that symbol (interprets it as an operation) with
+the interpreter currently in effect.
+
+Note that `?` does *late binding*. That is, in contrast with `!`, `?`
+never "freezes" the semantic definition of the thing that it is
+executing. This is true even when `?` occurs in a operation redefinition
+(i.e. the program that supplanted some symbol's semantics when an `!`
+was executed.) This implies, among other things, that supplanting some
+symbol z with the program that consists of instructions that push the
+ASCII value of z onto the stack, followed by a `?` instruction, creates
+a *cyclic meaning* for z. This is because the z that will be executed by
+the `?` will always be the present z, that is, the z that is executing
+the `?`.
+
+For convenience's sake, `;` pushes the symbol `;` onto the stack.
+
+All other symbols are no-ops.
+
+Computational Class
+-------------------
+
+I believe Emmental is Turing-complete with only the operations that have
+been given so far, but I haven't proved it yet. All the elements are
+there, and although some of them are somewhat "cramped", they look
+viable.
+
+If you want to try thinking about how you could write real programs
+(like a universal Turing-machine simulator) in Emmental, you might want
+to skip this section, since it contains "spoilers".
+
+Repetition can be accomplished by assigning a symbol a cyclic semantics,
+by use of a `?` within a `!`. For example, we can redefine the semantics
+of `0` to be `#48?`. This is simply a program that pushes the symbol `0`
+onto the stack and executes it with the current interpreter, and, since
+`0` has been redefined to mean `#48?` in the current interpreter, this
+will loop forever. The entire program to do this to `0` and run the
+infinite loop is:
+
+    ;#35#52#56#63#48!0
+
+This technique can also be used to "jump" from one definition to
+another, by using `?` to execute some *other* symbol at the end of a
+definition (that is, some symbol other than the symbol being defined.)
+
+Conditionals are a little more restrictive. The trick to them is,
+strangely, the discrete log operator `~` in combination with the eval
+operator `?`. Since `~` maps all symbols into a set of nine symbols, and
+`?` executes the symbol on the stack, `~?` will execute one of the
+symbols from ASCII 0 (NUL) to ASCII 8 (BS). We can then, for instance,
+define NUL to do one thing, define SOH through BEL to do the same as
+NUL, and define BS to do some other thing; this essentially
+distinguishes between 0 (which executes BS) and every other value (which
+executes NUL). Further, we can use this in conjunction with `-` to
+compare two values for equality. So, for example, a program which inputs
+a character, and outputs Y if the character is M and N otherwise, would
+be:
+
+    #59#35#55#56#46#!;##1!;##2!;##3!;##4!;##5!;##6!;##7!#59#35#56#57#46#8!,#77-~?
+
+In case NUL through BS are in use for some reason, we can always add 9
+to the result of the log (`~#9+?`) to map the answer onto HT through
+DC1. Or, of course, any of a great number of other arithmetical mappings
+of our choosing. The most severe constraint is that there be 9 available
+symbols to act as "destinations" for our "branch". Even if we never
+overwrite one "branch" with another (and we can do that in Emmental!)
+and even if we allocate one extra symbol to be the "launch point" of the
+branch, we still have room for 25 branches in the ASCII character set.
+
+So these parts look good. If there's a problem, it's with the queue.
+Specifically, the problem seems to be the need to know the present size
+of the queue in order to do stack housework like "duplicate" and the
+subsequent need for "duplicate" to achieve "discard." (Duplicate can be
+defined as `^v`, but this only works when the queue is empty. Discard
+can be defined as duplicate plus `-+`, but this only works when there
+are other elements below the element being discarded. [This last point
+is not generally a problem since we can push arbitrary values onto the
+stack before any program.])
+
+However, if it turns out that we need "duplicate" or "discard" in order
+to write routines that can handle a variable-sized queue — and that
+strikes me as likely — then it looks like we have a severe problem.
+
+Here's one way I could try to deal with it. I could say that the queue
+is *local* to the operation being defined (or the main program.) Then
+you could define `:` to be `^v`, and inside `:`'s definition, the queue
+would always initially be empty, so the definition would work.
+
+But... we need the queue to store our global data. For example, if we
+are going to simulate a Turing machine, we'd need to use the queue to
+store the tape (perhaps "doubled up", with one symbol of each pair
+telling us "the next symbol is a simulated tape symbol" or "the next
+symbol is some housekeeping value.") We can't store the tape on just one
+stack. And, once you are looping in Emmental, you've left the "main
+program" forever; you're jumping from definition to definition, and each
+has their own queue. At best, you'd need to "dump" the queue onto the
+stack each time you switched definitions, and even then you'd need a
+loop to do that, and to loop you need to switch definitions. It's a
+royal mess.
+
+So here's how I will deal with it. I will add a primitive duplicate
+operation, `:`, to Emmental. Proving that Emmental is Turing-complete is
+still, then, a challenge, although a doable-seeming challenge. I will
+then propose a more formidable challenge: prove that the language formed
+by removing the `:` operation from Emmental is Turing-complete.
+(Equivalently, prove that the set of Emmental programs that begin with
+`;#0#58!` is Turing-complete. The nice thing about Emmental is that you
+can always shoot yourself in the foot — until you erase your pistol,
+that is.)
+
+And if you *really* like a challenge, try proving that Emmental without
+`~` is Turing-complete. I don't think that it is, although it's possible
+for it to compute parity, at least (input a symbol, output E if its
+ASCII value is even, and O if it's odd. To accomplish this, multiply the
+input's ASCII value by 128 by adding 127 copies of it to it; this is
+modulo 256, so the only results can be 0 or 128. Define those operations
+to print out E and O respectively. But that's as far as I've gotten.)
+
+Discussion
+----------
+
+### Design Decisions
+
+I would've liked to have given Emmental a `'` or `"` instruction similar
+to Funge's "quote" and "quote-mode" instructions; instructions that
+treat one or more of the following symbols in the program literally,
+pushing them, as symbols, onto the stack, instead of executing them.
+However, such instructions are somewhat problematic, both theoretically
+and (for the approach I took implementing Emmental) practically. There
+are two ways of thinking about the problems that arise.
+
+One is that the function which implements `'` is given access to the
+program text itself, and possibly the position within the program, and
+it uses these to extract the "immediate mode" symbol past the `'`. This
+information could be available because these pieces of information are
+considered extra arguments to the function, or because they are (gorily)
+considered part of the overall program state. Either way, this operation
+is given a lot of information to work with, and for consistency (since
+we want to be all nice and neat and say that all operations have the
+same signature so that our dictionary has a nice uniform type,) *all*
+operations have access to this information. This is almost too much
+information; that is, it is so much that operations don't really *need*
+the dictionary. We could just say there is *one* operation, defined by a
+function, and that function is given the current symbol and has to
+decide what it means through whatever means it likes.
+
+This approach is very powerful, of course, but it's just not the style
+that Emmental embodies. (In fact, the idea to view interpreters as
+dictionaries was one of the foundational design choices for Emmental, to
+the point where I started constructing a "little theory of interpreters
+as maps." It really wasn't exploited as much as I think it could have
+been. If an interpreter is a map of symbols to strings of symbols, it's
+much more tractable than an opaque function would be; you can define all
+sorts of operations on them, for example concatenating two interpreters
+(for all symbols s in interpreter a and interpreter b, c[s] `→` a[s]b[s]
+— that sort of thing,) computing union or intersection of interpreters,
+Cartesian product, etc.)
+
+The other way of looking at it is to say that there are in fact
+*multiple* meta-circular interpreters available inside Emmental, and
+symbols like `'` switch temporarily to an alternate MCI. This alternate
+MCI interprets every symbol as "push this symbol", then reinstates the
+previous MCI. I like this explication better than the one above — MCIs
+begin to look a bit like continuations! — but to do it justice would
+take some work. I envision a language where the program has fine control
+over which MCI is in effect, possibly by keeping a map from symbols to
+MCIs, or maybe even being able to push MCIs onto the stack. This is a
+wee bit much for Emmental too though, and perhaps I'll explore these
+possibilities in a future language.
+
+### Turing-completeness
+
+You can make the argument that Emmental's way of being Turing-complete
+is really nothing new: when you redefine some symbol, you're really just
+defining a new function, and when you use `?` to execute that symbol
+from within its own definition, you're just making a recursive function
+call.
+
+Well, yes, you can make that argument. But it has to do with how you
+think about "what is a language", I think. Does a Pascal program
+fragment which defines a procedure called `PrintFibonacci` represent
+another programming language, one different from Pascal? You could
+certainly say that it does — it's the language Pascal where the token
+`PrintFibonacci` has some meaning that it doesn't have in Pascal.
+
+In that view, any language where you can define procedures, or
+functions, or standard libraries, or the like, is an extensible
+language. But even languages where you *can't* define new procedures or
+functions is arguably an extensible language. Take some initial
+Brainfuck program fragment, for instance. After it executes, it leaves
+the Brainfuck tape and while-stack in some state that depends on the
+input. Any Brainfuck fragment that executes after that, will execute in
+that environment, and that environment is arguably a version of the
+language Brainfuck, suitably extended.
+
+You don't normally think of it that way, I bet, but you *could* — and
+you would need to, to some degree, to claim that Emmental is "just"
+defining new functions. The reason you don't typically look at languages
+like this (unless you are very strange) is because it's much more useful
+to divide the world into "languages" and "programs." And Emmental *does*
+make this division, it just makes it in a slightly different place than
+usual.
+
+As far as I'm concerned, if I describe what Emmental does as modifying
+the Emmental language via its MCI, and what Emmental actually does is
+consistent with the idea of modifying the Emmental language via its MCI,
+then what Emmental effectively does is modify the Emmental language via
+its MCI. And if it needs to do this in a certain way in order to
+simulate a universal Turing machine, then that difference (however
+slight) sets it apart from systems where this simulation needs to be
+done by defining recursive functions.
+
+Implementation
+--------------
+
+`emmental.hs` is a reference interpreter for Emmental written in
+Haskell. Run the function `emmental` on a string; you can also run
+`debug` on a string to view the state of the program (stack & queue)
+during execution. (Note that `debug` is *not* able to show program
+states that occur internal to an operation.)
+
+Happy interpreter-redefining!  
+Chris Pressey  
+Chicago, IL  
+November 11, 2007

File doc/emmental.html

-<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
-<!-- encoding: UTF-8 -->
-<html xmlns="http://www.w3.org/1999/xhtml" lang="en">
-<head>
-<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
-<title>The Emmental Programming Language</title>
-  <!-- begin html doc dynamic markup -->
-  <script type="text/javascript" src="/contrib/jquery-1.6.4.min.js"></script>
-  <script type="text/javascript" src="/scripts/documentation.js"></script>
-  <!-- end html doc dynamic markup -->
-</head>
-<body>
-
-<h1>The Emmental Programming Language</h1>
-
-<p>November 2007, Chris Pressey, Cat's Eye Technologies</p>
-
-<h2>Introduction</h2>
-
-<p>Emmental is a self-modifying programming language.  That is not to say
-that is a language in which programs are self-modifying; rather, it is
-the language itself, as defined by a meta-circular interpreter, that can be
-modified during the course of a running program.  Indeed, this is how
-Emmental, without conventional conditional and repetition/recursion constructs,
-achieves Turing-completeness.</p>
-
-<h2>Meta-circular Interpreters</h2>
-
-<p>One way to attempt to define a language is by giving what's called
-a <em>meta-circular interpreter</em> (often shortened to "MCI" in this document.)
-This is an interpreter for some language which is written in that same language
-(or at least, a language which is very close to it.)</p>
-
-<p>Meta-circular interpreters were a popular way to the describe the
-semantics of programming languages, especially
-LISP-like languages, and especially before the advent of denotational
-semantics.  The term "meta-circular" was apparently coined by John C. Reynolds
-in his paper "Definitional Interpreters for Higher-Order Programming
-Languages" (1972 Proceedings ACM National Conference.)</p>
-
-<p>Of course, in the real world, MCI's are not often used.
-They certainly <em>can</em> be used: if you have a working Scheme interpreter
-that came with your computer system, there
-is nothing stopping you from writing another Scheme interpreter in Scheme,
-and running your programs on your interpreter (which is itself running on your
-system's interpreter.)  However, this is quite a bit
-less efficient due to the duplication of effort.  A somewhat more realistic
-case might be if your system came with, say, a Scheme compiler.  You might then feed your
-Scheme interpreter (written in Scheme) through that to make a native Scheme
-interpreter, and use that to interpret your programs.
-(In this setup, the interpreter is usually described as "self-hosting"
-rather than "meta-circular".)</p>
-
-<p>But, as should be obvious, you already need an implementation of Scheme
-for your Scheme interpreter written in Scheme to be of much practical
-use to you.  If your meta-circular interpreter is all you have, you won't
-be able to use it to run or understand Scheme programs.  Because the MCI is
-defined in terms of itself, you'll need some other source of "understanding
-how it works" to make it complete.  This understanding might come
-from an implementation in some other programming language, or a specification
-in some formal language, or a description in some natural language, or simply
-from intuition — but it has to come from somewhere.</p>
-
-<p>Assuming that we do have that external source of understanding, the
-meta-circular interpreter can come in quite handy in codifying the semantics
-of the language.  And, in Emmental's case, <em>altering</em> those semantics:
-Emmental's MCI supports operations which instruct Emmental's MCI to modify
-its behaviour.</p>
-
-<h2>Interpreter Structure</h2>
-
-<p>To describe the structure of Emmental's MCI, we first examine the
-general structure of interpreters.  If you've ever written a virtual machine
-executor in, say, C, you've noticed that it has the general form</p>
-
-<pre>
-    pc = start;
-    while (!done) {
-        switch (instruction[pc]) {
-            case INSTR_ONE:
-                /* implement semantics of INSTR_ONE */
-                pc += advance(INSTR_ONE);
-                break;
-            case INSTR_TWO:
-                /* implement semantics of INSTR_TWO */
-                pc += advance(INSTR_TWO);
-                break;
-            /* ... */
-            case INSTR_HALT:
-                done = 1;
-                break;
-            default:
-                perror("Invalid opcode");
-        }
-    }
-</pre>
-
-<p>Note that <code>advance()</code> is some function that computes how far the
-program counter is advanced on that instruction.  This value is typically
-+1 for most instructions, but more or less than 1 (and dependent on the
-state of the program) for a handful of "branch" instructions.
-Note also that <code>advance()</code> would not typically be implemented in C
-as a function; I'm just showing it like this to emphasize the
-regular structure.</p>
-
-<p>From this we infer that the basic structure of an interpreter is
-a <em>dictionary</em> or <em>map</em> that associates program symbols with operations.
-There is some extra housekeeping like the fetch-execute cycle that surrounds
-this dictionary, but this can (hopefully) be handled mostly automatically,
-freeing us to concentrate on <em>symbols</em> and <em>operations</em>.</p>
-
-<p>The symbols could be taken from any finite alphabet, but in Emmental,
-to keep things relatively simple, we'll just use the ASCII character set.
-(Actually, to be precise, this is the full 8-bit ASCII character set.
-Non-printable control characters are allowed, as are characters between
-128 and 255, and each has a distinct meaning.  But their representations
-are not defined.)</p>
-
-<p>The operations can be thought of, abstractly, as functions which
-transform program states.  Or they can be thought of, concretely, as
-segments of code — mini-programs which implement these functions.
-In the case of a meta-circular interpreter, these mini-programs would be written
-<em>in the language being interpreted</em>.</p>
-
-<p>To extend this idea to a <em>self-modifying</em> meta-circular interpreter,
-the operations can be thought of as functions which transform
-both program states <em>and</em> interpreter definitions.
-(Alternatively, the interpreter definition could be thought of as
-part of the program state, although I feel that's a bit gorier a way
-to look at it, and I prefer the other view, at least for Emmental.)</p>
-
-<p>In Emmental, most operations leave the interpreter definition unchanged.
-However, there is one operation which alters the interpreter
-definition, and it is this altered definition that is used to interpret
-the remainder of the program.</p>
-
-<h2>Emmental Semantics (in Emmental)</h2>
-
-<p>Emmental is essentially a stack-based language.  (There's also a queue,
-but it's not as central.)  All operations implicitly
-get data from, and implicitly deposit results back onto, a single
-stack.  For orthogonality's sake, this stack may contain only ASCII symbols.
-(And note that trying to pop an empty stack, or dequeue an empty queue,
-is an error that aborts the program.)</p>
-
-<p>Note that because we've established that an interpreter (at least,
-insofar as Emmental ever needs to know) is simply a map that takes
-symbols to operations, and that operations in Emmental are defined
-(meta-circularly) as Emmental programs, we can use the following notation
-to describe interpreters:</p>
-
-<pre>
-% → XYZ+*!
-&amp; → 'ap'ag'ag
-</pre>
-
-<p>That is, the symbol <code>%</code>, when encountered in an Emmental
-program, indicates an operation that
-is defined by the Emmental program <code>XYZ+*!</code>, and so forth.</p>
-
-<p>When a main Emmental program begins execution for the first time,
-it starts with what's called the <em>initial Emmental interpreter</em>.
-(This fact, of course, doesn't apply to any further point of execution inside
-an Emmental program, or execution of operations defined in
-Emmental's MCI, since these would be considered subprograms of a sort.
-These cases use whichever interpreter happens to be in force in that point in time.)</p>
-
-<p>The inital Emmental interpreter is defined as follows:</p>
-
-<pre>
-a → a
-b → b
-c → c
-...
-</pre>
-
-<p>That is, for every symbol <var>x</var> in the ASCII set,
-<var>x</var> <code>→</code> <var>x</var>.</p>
-
-<p>Doesn't tell us a lot about Emmental's semantics, does it?  No.
-Nothing at all, really.  But remember what I said about needing an external
-source of understanding, in order to actually get full value out of an MCI.
-And remember the purpose of Emmental's MCI: it is
-not there so much to help us understand Emmental, but to allow us to
-<em>change</em> Emmental, from inside an Emmental program.</p>
-
-<p>And, for all that our description of the initial Emmental interpreter
-is unhelpfully tautological, it is not incorrect: the semantics of
-<code>a</code> can in fact be thought of as being defined by an Emmental
-program that consists of only one instruction, <code>a</code>.  This
-happy state of affairs comes about because Emmental is stack-based;
-the "signature" (the requirements for the "before" and "after" stacks)
-of the symbol <code>a</code> is the same as the signature of the program
-containing the single symbol <code>a</code>.  No extra syntax to specify
-arity and the like is necessary.</p>
-
-<p>Above all, don't panic: we <em>will</em> describe what symbols like
-<code>a</code> actually mean in Emmental, we'll just need to do it in something
-besides Emmental.  In fact, let's do that right now.</p>
-
-<h2>Emmental Semantics (in English)</h2>
-
-<h3>Primitive Arithmetic</h3>
-
-<p><code>#</code> pushes the symbol NUL (ASCII 0) onto the stack.</p>
-
-<p>The symbols <code>0</code>, <code>1</code>, ... <code>9</code>
-pop a symbol from the stack, multiply its ASCII value by 10 modulo 256, add
-the value 0, 1, ... 9 (respectively) to that value modulo 256, and
-push the resulting symbol back onto the stack.</p>
-
-<p>The upshot of these 11 operations is that you can push arbitrary
-symbols onto the stack by spelling out their ASCII values in decimal.
-For example, <code>#64</code> pushes a <code>@</code> onto the stack.</p>
-
-<p><code>+</code> pops two symbols off the stack, adds together their
-ASCII values modulo 256, and pushes the symbol with the resultant ASCII value
-back onto the stack.</p>
-
-<p><code>-</code> pops two symbols off the stack, subtracts the ASCII value
-of the first popped from the ASCII value of the second popped modulo 256,
-and pushes the symbol with the resultant ASCII value back onto the stack.</p>
-
-<p><code>~</code> ("log") pops a symbol from the stack, computes the discrete
-base-2 logarithm of the ASCII value of that symbol, 
-and pushes the symbol with the resultant ASCII value back onto the stack.
-The discrete base-2 logarithm of a number is the floor or integer part
-of the base-2 logarithm of that number.  Alternately, it is the number of
-the highest bit position (starting with the LSB = bit position 0)
-with a bit set when the number is viewed as binary.
-Because the base-2 logarithm of the number 0 itself is undefined,
-the number 0 is treated as 256 for this operation; its discrete base-2
-logarithm is 8.</p>
-
-<h3>Stack and Queue Manipulation</h3>
-
-<p><code>^</code> ("enqueue a copy") pops a symbol off the stack, makes a copy of it, pushes
-it back onto the stack, and enqueues the copy onto the queue.</p>
-
-<p><code>v</code> ("dequeue") dequeues a symbol from the queue and pushes it onto the
-stack.</p>
-
-<p>Using these operations in combination, one can form "discard", "duplicate",
-"swap", and other more advanced stack manipulation operations.  For example,
-assuming an empty queue and more than two elements on the stack, "swap" can
-be accomplished with the code <code>^v^-+^^v^v^v-+^v-+^v-+vv</code>.</p>
-
-<p>Despite this fact, the operation <code>:</code> duplicates the top value
-of the stack.  (Emmental is not an absolutely minimal language; note, for
-instance, that it has all ten decimal digits as operations when these
-could surely have been defined in terms of only one or two operations.
-The reasons for a seperate <code>:</code> operation are given below in the
-section on Computational Class.)</p>
-
-<h3>I/O</h3>
-
-<p><code>.</code> pops a symbol off the stack and sends it to the
-standard output as an ASCII symbol.</p>
-
-<p><code>,</code> waits for an ASCII symbol to arrive on standard input,
-and pushes it onto the stack.</p>
-
-<h3>Interpreter Modification and Reflection</h3>
-
-<p>First let's define what it means to <em>pop a string</em> off the stack.
-Symbols are popped off the stack until a <code>;</code> symbol is
-found on the stack.  The symbols popped off are considered a string
-in the reverse order they were popped; i.e. the last symbol popped
-is the first symbol of the string.  The <code>;</code> symbol is
-popped off the stack, but is not made part of the string; it is simply discarded.</p>
-
-<p>Since an Emmental program is a string, popping a program is the same
-as popping a string, just that the string is interpreted as a program.</p>
-
-<p><code>!</code> (sometimes called "supplant")
-pops a symbol, which we call <var>s</var>, off the stack.
-Then it pops a program <var>t</var>.
-It then inserts the association <var>s</var> <code>→</code> <var>t</var> into the
-interpreter definition.  This overwrites whatever mapping of <var>s</var> might have
-been in the interpreter definition previously.  This new interpreter definition
-is used for all subsequent execution (until it is changed again, of course.)</p>
-
-<p>Note that <code>!</code> does <em>early binding</em>.  That is,
-the meaning of each symbol in this program <var>t</var> is
-the meaning of that symbol <em>at the time <code>!</code> is executed</em>.
-If some subsequent <code>!</code> operation later changes the meaning of one
-of the symbols that occurs in <var>t</var>, the meaning of <var>t</var>
-is not changed.  The semantics of <var>t</var> are "captured" or "frozen".
-This implies, among other things, that supplanting some
-symbol <var>z</var> with itself (a program consisting only of the symbol <var>z</var>)
-is a no-op, because <var>z</var>'s meaning, at the time that <code>!</code>
-is executed, is invariably <var>z</var>.</p>
-
-<p><code>?</code> (sometimes called "eval") pops a symbol, which we call <var>s</var>, off the stack.
-It then executes that symbol (interprets it as an operation)
-with the interpreter currently in effect.</p>
-
-<p>Note that <code>?</code> does <em>late binding</em>.  That is,
-in contrast with <code>!</code>, <code>?</code> never "freezes" the semantic
-definition of the thing that it is executing.  This is true even when
-<code>?</code> occurs in a operation redefinition (i.e. the program
-that supplanted some symbol's semantics when an <code>!</code> was executed.)
-This implies, among other things, that supplanting some symbol <var>z</var>
-with the program that consists of instructions that push the ASCII value of
-<var>z</var> onto the stack, followed by a <code>?</code> instruction,
-creates a <em>cyclic meaning</em> for <var>z</var>.
-This is because the <var>z</var> that will be executed by the <code>?</code>
-will always be the present <var>z</var>, that is, the <var>z</var>
-that is executing the <code>?</code>.</p>
-
-<p>For convenience's sake, <code>;</code> pushes the symbol <code>;</code>
-onto the stack.</p>
-
-<p>All other symbols are no-ops.</p>
-
-<h2>Computational Class</h2>
-
-<p>I believe Emmental is Turing-complete with only the operations that
-have been given so far, but I haven't proved it yet.
-All the elements are there, and although some of them are somewhat "cramped",
-they look viable.</p>
-
-<p>If you want to try thinking about how you could write real programs
-(like a universal Turing-machine simulator) in Emmental, you might want
-to skip this section, since it contains "spoilers".</p>
-
-<p>Repetition can be accomplished by assigning a symbol a cyclic semantics,
-by use of a <code>?</code> within a <code>!</code>.  For example, we can
-redefine the semantics of <code>0</code> to be <code>#48?</code>.  This is
-simply a program that pushes the symbol <code>0</code> onto the stack and
-executes it with the current interpreter, and, since <code>0</code> has been
-redefined to mean <code>#48?</code> in the current interpreter, this will
-loop forever.  The entire program to do this to <code>0</code> and run the
-infinite loop is:</p>
-
-<pre>
-;#35#52#56#63#48!0
-</pre>
-
-<p>This technique can also be used to "jump" from one definition to
-another, by using <code>?</code> to execute some <em>other</em> symbol
-at the end of a definition (that is, some symbol other than the symbol
-being defined.)</p>
-
-<p>Conditionals are a little more restrictive.  The trick to them is,
-strangely, the discrete log operator <code>~</code> in combination
-with the eval operator <code>?</code>.  Since <code>~</code> maps
-all symbols into a set of nine symbols, and <code>?</code> executes
-the symbol on the stack, <code>~?</code> will execute one of the
-symbols from ASCII 0 (NUL) to ASCII 8 (BS).  We can then, for instance,
-define NUL to do one thing, define SOH through BEL to do the same
-as NUL, and define BS to do some other thing; this essentially
-distinguishes between 0 (which executes BS) and every other value
-(which executes NUL).  Further, we can use this in conjunction
-with <code>-</code> to compare two values for equality.  So, for
-example, a program which inputs a character, and outputs Y if
-the character is M and N otherwise, would be:</p>
-
-<pre>
-#59#35#55#56#46#!;##1!;##2!;##3!;##4!;##5!;##6!;##7!#59#35#56#57#46#8!,#77-~?
-</pre>
-
-<p>In case NUL through BS are in use for some reason, we can always
-add 9 to the result of the log (<code>~#9+?</code>)
-to map the answer onto HT through DC1.  Or, of course, any of
-a great number of other arithmetical mappings of our choosing.
-The most severe constraint is that there be 9 available symbols
-to act as "destinations" for our "branch".  Even if we never
-overwrite one "branch" with another (and we can do that in Emmental!)
-and even if we allocate one extra symbol to be the "launch point"
-of the branch, we still have room for 25 branches in the ASCII
-character set.</p>
-
-<p>So these parts look good.  If there's a problem, it's
-with the queue.  Specifically, the problem seems to be
-the need to know the present size of the queue in order to do stack housework
-like "duplicate" and the subsequent need for "duplicate" to achieve
-"discard."  (Duplicate can be defined as <code>^v</code>, but this only
-works when the queue is empty.  Discard can be defined as duplicate plus
-<code>-+</code>, but this only works when there are other elements below
-the element being discarded.  [This last point is not generally a problem
-since we can push arbitrary values onto the stack before any
-program.])</p>
-
-<p>However, if it turns out that we need "duplicate" or "discard" in order
-to write routines that can handle a variable-sized queue — and that
-strikes me as likely — then it looks like we have a severe problem.</p>
-
-<p>Here's one way I could try to deal with it.  I could say that the queue
-is <em>local</em> to the operation being defined (or the main program.)
-Then you could define <code>:</code> to be <code>^v</code>, and inside
-<code>:</code>'s definition, the queue would always initially be empty,
-so the definition would work.</p>
-
-<p>But... we need the queue to store our global data.  For example, if
-we are going to simulate a Turing machine, we'd need to use the queue
-to store the tape (perhaps "doubled up", with one symbol of each
-pair telling us "the next symbol is a simulated tape symbol" or
-"the next symbol is some housekeeping value.")  We can't store the
-tape on just one stack.  And, once you are looping in Emmental, you've
-left the "main program" forever; you're jumping from definition to
-definition, and each has their own queue.  At best, you'd need to
-"dump" the queue onto the stack each time you switched definitions,
-and even then you'd need a loop to do that, and to loop you need to
-switch definitions.  It's a royal mess.</p>
-
-<p>So here's how I will deal with it.  I will add a primitive duplicate
-operation, <code>:</code>, to Emmental.  Proving that Emmental is Turing-complete
-is still, then, a challenge, although a doable-seeming challenge.
-I will then propose a more formidable challenge: prove that
-the language formed by removing the <code>:</code> operation from
-Emmental is Turing-complete.  (Equivalently, prove that the set of
-Emmental programs that begin with <code>;#0#58!</code> is Turing-complete.
-The nice thing about Emmental is that you can always shoot yourself in
-the foot — until you erase your pistol, that is.)</p>
-
-<p>And if you <em>really</em> like a challenge, try proving that Emmental
-without <code>~</code> is Turing-complete.  I don't think that it is,
-although it's possible for it to compute parity, at least (input a
-symbol, output E if its ASCII value is even, and O if it's odd.
-To accomplish this, multiply the input's ASCII value by 128 by adding 127
-copies of it to it; this is modulo 256, so the only results can be
-0 or 128.  Define those operations to print out E and O respectively.
-But that's as far as I've gotten.)</p>
-
-<h2>Discussion</h2>
-
-<h3>Design Decisions</h3>
-
-<p>I would've liked to have given Emmental a <code>'</code> or <code>"</code>
-instruction similar to Funge's "quote" and "quote-mode" instructions;
-instructions that treat one or more of the following symbols in the program
-literally, pushing them, as symbols, onto the stack, instead of executing them.
-However, such instructions are somewhat problematic, both theoretically and
-(for the approach I took implementing Emmental) practically.  There are
-two ways of thinking about the problems that arise.</p>
-
-<p>One is that the function which implements <code>'</code> is given
-access to the program text itself, and possibly the position within the program,
-and it uses these to extract the "immediate mode" symbol past the <code>'</code>.  This information
-could be available because these pieces of information are considered extra arguments
-to the function, or because they are (gorily) considered part of the overall
-program state.  Either way, this operation is given a lot of information to work with,
-and for consistency (since we want to be all nice and neat and say that all operations
-have the same signature so that our dictionary has a nice uniform type,) <em>all</em> operations
-have access to this information.  This is almost too much information; that is, it
-is so much that operations don't really <em>need</em> the dictionary.  We could just say
-there is <em>one</em> operation, defined by a function, and that function is given the
-current symbol and has to decide what it means through whatever means it likes.</p>
-
-<p>This approach is very powerful, of course, but it's just not the style that
-Emmental embodies.  (In fact, the idea to view interpreters as dictionaries
-was one of the foundational design choices for Emmental, to the point where I
-started constructing a "little theory of interpreters as maps."  It really
-wasn't exploited as much as I think it could have been.  If an interpreter is
-a map of symbols to strings of symbols, it's much more tractable than an
-opaque function would be; you can define all sorts of operations on them,
-for example concatenating two interpreters (for all symbols <var>s</var>
-in interpreter <var>a</var> and interpreter <var>b</var>,
-<var>c</var>[<var>s</var>] <code>→</code> <var>a</var>[<var>s</var>]<var>b</var>[<var>s</var>] —
-that sort of thing,) computing union or intersection of interpreters,
-Cartesian product, etc.)</p>
-
-<p>The other way of looking at it is to say that there are in fact
-<em>multiple</em> meta-circular interpreters available inside Emmental, and symbols
-like <code>'</code> switch temporarily to an alternate MCI.  This alternate MCI
-interprets every symbol as "push this symbol", then reinstates the previous MCI.
-I like this explication better than the one above — MCIs begin
-to look a bit like continuations! — but to do it justice would take some
-work.  I envision a language where the program has fine control
-over which MCI is in effect, possibly by keeping a map from symbols to MCIs,
-or maybe even being able to push MCIs onto the stack.
-This is a wee bit much for Emmental too though, and
-perhaps I'll explore these possibilities in a future language.</p>
-
-<h3>Turing-completeness</h3>
-
-<p>You can make the argument that Emmental's way of being Turing-complete
-is really nothing new: when you redefine some symbol, you're really just
-defining a new function, and when you use <code>?</code> to execute that
-symbol from within its own definition, you're just making a recursive
-function call.</p>
-
-<p>Well, yes, you can make that argument.  But it has to do with how you think
-about "what is a language", I think.  Does a Pascal program fragment which
-defines a procedure called <code>PrintFibonacci</code> represent another
-programming language, one different from Pascal?  You could certainly
-say that it does — it's
-the language Pascal where the token <code>PrintFibonacci</code> has
-some meaning that it doesn't have in Pascal.</p>
-
-<p>In that view, any language where you can define procedures, or
-functions, or standard libraries, or the like, is an extensible language.
-But even languages where you <em>can't</em> define new procedures or
-functions is arguably an extensible language.  Take some initial Brainfuck
-program fragment, for instance.  After it executes, it leaves the Brainfuck
-tape and while-stack in some state that depends on the input.
-Any Brainfuck fragment that executes after that, will execute in that
-environment, and that environment is arguably a version of the language
-Brainfuck, suitably extended.</p>
-
-<p>You don't normally think of it that way, I bet, but you
-<em>could</em> — and you would need to, to some degree,
-to claim that Emmental is "just" defining new functions.
-The reason you don't typically look at languages like this
-(unless you are very strange) is because it's much more useful to
-divide the world into "languages" and "programs."  And Emmental
-<em>does</em> make this division, it just makes it in a slightly
-different place than usual.</p>
-
-<p>As far as I'm concerned, if I describe what Emmental does as
-modifying the Emmental language via its MCI, and what Emmental actually does is
-consistent with the idea of modifying the Emmental language via its MCI, then
-what Emmental effectively does is modify the Emmental language via its
-MCI.  And if it needs to do this in a certain way in order to simulate
-a universal Turing machine, then that difference (however slight)
-sets it apart from systems where this simulation needs to be done by defining
-recursive functions.</p>
-
-<h2>Implementation</h2>
-
-<p><code>emmental.hs</code> is a reference interpreter for Emmental
-written in Haskell.  Run the function <code>emmental</code> on a
-string; you can also run <code>debug</code> on a string to view
-the state of the program (stack &amp; queue) during execution.
-(Note that <code>debug</code> is <em>not</em> able to show
-program states that occur internal to an operation.)</p>
-
-<p>Happy interpreter-redefining!
-<br/>Chris Pressey
-<br/>Chicago, IL
-<br/>November 11, 2007</p>
-
-</body>
-</html>

File src/emmental.hs

File contents unchanged.