Commits

Serge A. Zaitsev  committed 934e35d

added blog posts about cucu

  • Participants
  • Parent commits 160a320

Comments (0)

Files changed (3)

File input/blog/cucu-part1.mkd

+title: cucu: a compiler you can understand (1/3)
+post: a compiler you can unserstand (1/3)
+description: Compilers is fun. Want to write your own one?
+date: 2012-10-23
+---
+
+cucu: a compiler you can understand (part 1)
+=================================================
+
+Let talk about the compilers. Have you ever thought of writing your own one?
+
+I will try to show you how simple it is. The first part will be pretty much
+theoretical, so keep patience.
+
+what we're going to achieve?
+----------------------------
+
+CUCU is a toy compiler for a toy language. I want it to be as close to ANSI C 
+as possible, so that every valid CUCU program could be compiled with a C
+compiler without any errors. Of course, the support of the whole ANSI C
+standard is very difficult, so I picked a very small C language subset. 
+
+For example, here's a valid CUCU code snippet:
+
+	int cucu_strlen(char *s) {
+		int i = 0;
+		while (s[i]) {
+			i = i + 1;
+		}
+		return i;
+	}
+
+grammar
+-------
+
+We're about to define a grammar for our language. It's an important step, 
+because everything in our compiler design depends on it.
+
+So, let's go from top to bottom. Our source file contains a **program**.
+What is a program? We know it's a list of **variable declarations**, **function
+declarations** and **function definitions**, e.g:
+
+	int func(char *s, int len); /* function declaration */
+	int i;                      /* variable declaration */
+
+	int func(char *s, int len) { /* function definition */
+		...
+	}
+
+Let's try to write it down in [EBNF][1] form (it's absilutely ok, if you don't
+know what EBNF is, it's really intuitive):
+
+	<program> ::= { <var-decl> | <func-decl> | <func-def> } ;
+
+This notation says: "a program is a repeating sequence of variable declarations,
+function declarations and function definitions. But what is all those
+declarations and definitions? Ok, let's go deeper:
+
+	<var-decl> ::= <type> <ident> ";"
+	<func-decl> ::= <type> <ident> "( <func-args> ")" ";"
+	<func-def> ::= <type> <ident> "(" <func-args> ")" <func-body>
+	<func-args> ::= { <type> <ident> "," }
+	<type> ::= "int" | "char *"
+
+So, variable declaration is simple: it's a type name followed by the identifier,
+and followed by a semicolon, like we usually do in C, e.g.:
+
+	int i;
+	char *s;
+
+Function declaration is a bit more complicated. It's a "type + identifier", 
+and an optional list of function arguments `<func-args>` inside the braces.
+
+Function arguments list, in it's turn, is a sequence of comma-separated 
+"type + identifier", like:
+
+	char *s, int from, int to
+
+_Actually, trailing comma in CUCU is still allowed, but not required. It will
+just simplify our parser code._
+
+The supported data types are only `int` and `char *`. Identifier is a sequence
+of letters, digits and an underscore symbol.
+
+The only thing left is `<func-body>`. But let's talk about **statements** and 
+**expressions** first.
+
+*Statement* is a smallest standalone element of the language. Here are valid 
+statements in out language:
+
+	/* These are simple statements */
+	i = 2 + 3; /* assignment statement */
+	my_func(i); /* function call stament */
+	return i; /* return statement */
+
+	/* These are compound statements */
+	if (x > 0) { .. } else { .. }
+	while (x > 0) { .. }
+
+*Expression* is a smaller part of the statement. As opposed to stamenents,
+expressions always return a value.  Usually, it's just the arithmetics. For
+example in the statement `func(x[2], i + j)` the expressions are `x[2]` and
+`i+j`.
+
+So, looking back at `<func-body>`. It's just a valid statement (which is
+usually a block statement).
+
+Here's what we have:
+
+	<func-body> ::= <statement>
+	<statement> ::= "{" { <statement> } "}"                /* block statement */
+	                | [<type>] <ident> [ "=" <expr> ] ";"  /* assignment */
+									| "return" <expr> ";"
+									| "if" "(" <expr> ")" <statement> [ "else" <statement> ]
+									| "while" "(" <expr> ")" <statement>
+									| <expr> ";"
+
+Here are possible expressions in CUCU language:
+
+	<expr> ::= <bitwise-expr> 
+	           | <bitwise-expr> = <expr>
+	<bitwise-expr> ::= <eq-expr>
+	                   | <bitwise-expr> & <eq-expr>
+										 | <bitwise-expr> | <eq-expr>
+	<eq-expr> ::= <rel-expr>
+	              | <eq-expr> == <rel-expr>
+								| <eq-expr> != <rel-expr>
+	<rel-expr> ::= <shift-expr>
+	               | <rel-expr> < <shift-expr>
+	<shift-expr> ::= <add-expr>
+	                 | <shift-expr> << <add-expr>
+									 | <shift-expr> >> <add-expr>
+	<add-expr> ::= <postfix-expr>
+	               | <add-expr> + <postfix-expr>
+								 | <add-expr> - <postfix-expr>
+	<postfix-expr> ::= <prim-expr>
+	                   | <postfix-expr> [ <expr> ]
+										 | <postfix-expr> ( <expr> { "," <expr> } )
+	<prim-expr> := <number> | <ident> | <string> | "(" <expr> ")"
+
+That's it. Did you notice the recursion in the expression notation?  Basically,
+the notation above shows us the precedence of the operators from bottom to top:
+parens and square brackets are evaluated first, and assignment goes the last.
+
+For example, according to this grammar an expression `8>>1+1`
+will be evaluated to 2 (like in `8>>(1+1)`), not to 5 (like in `(8>>1)+1`),
+because `>>` has lower prece than `+`.
+
+lexer
+-----
+
+Now we are done with grammar, and are ready to start. The first thing to do is
+a lexer. Our compiler takes a stream of bytes as an input, and lexer allows to
+split that stream into smaller tokens, that can be processed later. It gives us
+some level of abstraction and simplifies out parser.
+
+For example, a sequance of bytes "int i = 2+31;" will be split into tokens:
+
+	int
+	i
+	=
+	2
+	+
+	31
+	;
+
+_In normal grown-up lexers every lexeme (token) has a type and a value, so 
+instead of the list above we will get a list of pairs: `<TYPE:int>,<ID:i>,
+<ASSIGN:=>,<NUM:2>,<PLUS:+>,<NUM:31>,<SEMI:;>`. We are going to detect lexeme
+type by its value, which is not academical at all!_
+
+The major problem with the lexer is that once a byte is read from the stream -
+it can not be "un-read". And what if we've read a byte that can not be added to
+the current token? Where should we store it until the current token is
+processed?
+
+Almost every lexer need to read ahead. Our grammar is simple enough, so all we 
+need to have is a single byte buffer - `nextc`. It stores a byte, that was read
+from the stream, but has not yet been pushed to the token string.
+
+Also, I need to warn you - I use global variables a lot in CUCU code. I know
+it's a bad style, but if I pass all values as function arguments the compiler
+would loose it's simplicity.
+
+The whole lexer is just a single function `readtok()`. The algorithm is simple:
+
+* skip leading spaces
+* try to read an identifier (a sequence of letters, digits and `_`)
+* if it's not an identifier - try to read a sequence of special operators, like 
+	`&, |, <, >, =, !`.
+* if it's not an operator - try to read a string literal "...." or '....'
+* if failed - maybe it's a comment, like `/* ... */`?
+* if failed again - just read a single byte. It might be another single-byte
+	token, like "(" or "[".
+
+Here's the whole CUCU sources we've got so far:
+
+	#include <stdio.h> /* for vpritnf */
+	#include <stdarg.h> /* for va_list */
+	#include <stdlib.h> /* for exit() */
+	#include <ctype.h> /* for isspace, isalpha... */
+
+	#define MAXTOKSZ 256
+	static FILE *f; /* input file */
+	static char tok[MAXTOKSZ];
+	static int tokpos;
+	static int nextc;
+
+	void readchr() {
+		if (tokpos == MAXTOKSZ - 1) {
+			tok[tokpos] = '\0';
+			fprintf(stderr, "token too long: %s\n", tok);
+			exit(EXIT_FAILURE);
+		}
+		tok[tokpos++] = nextc;
+		nextc = fgetc(f);
+	}
+
+	void readtok() {
+		for (;;) {
+			while (isspace(nextc)) {
+				nextc = fgetc(f);
+			}
+			tokpos = 0;
+			while(isalnum(nextc) || nextc == '_') {
+				readchr();
+			}
+			if (tokpos == 0) {
+				while (nextc == '<' || nextc == '=' || nextc == '>'
+						|| nextc == '!' || nextc == '&' || nextc == '|') {
+					readchr();
+				}
+			}
+			if (tokpos == 0) {
+				if (nextc == '\'' || nextc == '"') {
+					char c = nextc;
+					readchr();
+					while (nextc != c) {
+						readchr();
+					}
+					readchr();
+				} else if (nextc == '/') {
+					readchr();
+					if (nextc == '*') {
+						nextc == fgetc(f);
+						while (nextc != '/') {
+							while (nextc != '*') {
+								nextc = fgetc(f);
+							}
+							nextc = fgetc(f);
+						}
+						nextc = fgetc(f);
+					}
+				} else if (nextc != EOF) {
+					readchr();
+				}
+			}
+			break;
+		}
+		tok[tokpos] = '\0';
+	}
+
+	int main() {
+		f = stdin;
+		nextc = fgetc(f);
+
+		for (;;) {
+			readtok();
+			printf("TOKEN: %s\n", tok);
+			if (tok[0] == '\0') break;
+		}
+		return 0;
+	}
+
+If we put a C file as the compiler input - it will print us a list of tokens, 
+one per line.
+
+Ok, have a cup of coffee, and let's [go further &rarr;][2]
+
+_You can check [part3](/blog/cucu-part3.html) to know how the story ended._
+
+Posted on {{ page.date }} {{ rss }}
+
+{{ disqus }}
+
+[1]: http://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_Form
+[2]: /blog/cucu-part2.html
+

File input/blog/cucu-part2.mkd

+title: cucu: a compiler you can understand (2/3)
+post: a compiler you can unserstand (2/3)
+description: Compilers is fun. Want to write your own one?
+date: 2012-10-24
+---
+
+cucu: a compiler u can understand (part&nbsp;2)
+==============================================
+
+So far, we have defined language grammar and have written a lexer. In this part 
+we will write a parser for our language. Before we start, we need some helper 
+functions:
+
+	int peek(char *s) {
+		return (strcmp(tok, s) == 0);
+	}
+
+	int accept(char *s) {
+		if (peek(s)) {
+			readtok();
+			return 1;
+		}
+		return 0;
+	}
+
+	int expect(char *s) {
+		if (accept(s) == 0) {
+			error("Error: expected '%s'\n", s);
+		}
+	}
+
+`peek()` returns non-zero value if the next token is equal to the given string.
+`accept()` reads the next token, if it's equal to the given string, otherwise it
+ returns 0. And `expect()` helps us to check language syntax.
+
+the harder part
+---------------
+
+As you can see from the language grammar, stataments and various expression 
+types are strongly interconnected. It means we have to write all parser 
+functions at once, keeping in mind the recursion. Let's go again from top
+to bottom. Here's our top-level compiler() functions:
+
+	static int typename();
+	static void statement();
+
+	static void compile() {
+		while (tok[0] != 0) { /* until EOF */
+			if (typename() == 0) {
+				error("Error: type name expected\n");
+			}
+			DEBUG("identifier: %s\n", tok);
+			readtok();
+			if (accept(";")) {
+				DEBUG("variable definition\n");
+				continue;
+			} 
+			expect("(");
+			int argc = 0;
+			for (;;) {
+				argc++;
+				typename();
+				DEBUG("function argument: %s\n", tok);
+				readtok();
+				if (peek(")")) {
+					break;
+				}
+				expect(",");
+			}
+			expect(")");
+			if (accept(";") == 0) {
+				DEBUG("function body\n");
+				statement();
+			}
+		}
+	}
+
+It reads type name, then an indentifier. If it's followed by a semicolon -
+it's a variable declaration. If it's followed by a paren - it's a function.
+Function scans function arguments one by one, and if function is not
+followed by a semicolon - it's a definition (function with a body), otherwise - 
+it's just a declaration (just function name and prototype).
+
+Here, `typename()` is function that just skips the valid type name. We accept
+only `int` and `char` and varoius pointers to them (`char *`):
+
+	static int typename() {
+		if (peek("int") || peek("char")) {
+			readtok();
+			while (accept("*"));
+			return 1;
+		}
+		return 0;
+	}
+
+The most interesting part is the `statement()` function. It parses a single 
+statement, which can be a block, a local variable definition/declaration, 
+a `return` statement etc. Here how it should look like:
+
+	static void statement() {
+		if (accept("{")) {
+			while (accept("}") == 0) {
+				statement();
+			}
+		} else if (typename()) {
+			DEBUG("local variable: %s\n", tok);
+			readtok();
+			if (accept("=")) {
+				expr();
+				DEBUG(" :=\n");
+			}
+			expect(";");
+		} else if (accept("if")) {
+			/* TODO */
+		} else if (accept("while")) {
+			/* TODO */
+		} else if (accept("return")) {
+			if (peek(";") == 0) {
+				expr();
+			}
+			expect(";");
+			DEBUG("RET\n");
+		} else {
+			expr();
+			expect(";");
+		}
+	}
+
+So, if it's a block `{ .. }` - just read statements until end of block is met.
+If it starts with a type name - it's a local variable. Conditional statements 
+("if/then/else") and loops are just stubs for now. Think of how you would
+implement them according to the grammar we use.
+
+Anyway, most of the statement contain expressions inside. So, we need to make a
+function that parses an expression. Expression parser is a recursive descent 
+parser, so it's a number of functions that call each other recursively until 
+primary expression is found. Primary expression as we can see from the grammar
+is a number (constant) or an indentifier (variable or function).
+
+	static void prim_expr() {
+		if (isdigit(tok[0])) {
+			DEBUG(" const-%s ", tok);
+		} else if (isalpha(tok[0])) {
+			DEBUG(" var-%s ", tok);
+		} else if (accept("(")) {
+			expr();
+			expect(")");
+		} else {
+			error("Unexpected primary expression: %s\n", tok);
+		}
+		readtok();
+	}
+
+	static void postfix_expr() {
+		prim_expr();
+		if (accept("[")) {
+			expr();
+			expect("]");
+			DEBUG(" [] ");
+		} else if (accept("(")) {
+			if (accept(")") == 0) {
+				expr();
+				DEBUG(" FUNC-ARG\n");
+				while (accept(",")) {
+					expr();
+					DEBUG(" FUNC-ARG\n");
+				}
+				expect(")");
+			}
+			DEBUG(" FUNC-CALL\n");
+		}
+	}
+
+	static void add_expr() {
+		postfix_expr();
+		while (peek("+") || peek("-")) {
+			if (accept("+")) {
+				postfix_expr();
+				DEBUG(" + ");
+			} else if (accept("-")) {
+				postfix_expr();
+				DEBUG(" - ");
+			}
+		}
+	}
+
+	static void shift_expr() {
+		add_expr();
+		while (peek("<<") || peek(">>")) {
+			if (accept("<<")) {
+				add_expr();
+				DEBUG(" << ");
+			} else if (accept(">>")) {
+				add_expr();
+				DEBUG(" >> ");
+			}
+		}
+	}
+
+	static void rel_expr() {
+		shift_expr();
+		while (peek("<")) {
+			if (accept("<")) {
+				shift_expr();
+				DEBUG(" < ");
+			}
+		}
+	}
+
+	static void eq_expr() {
+		rel_expr();
+		while (peek("==") || peek("!=")) {
+			if (accept("==")) {
+				rel_expr();
+				DEBUG(" == ");
+			} else if (accept("!=")) {
+				rel_expr();
+				DEBUG("!=");
+			}
+		}
+	}
+
+	static void bitwise_expr() {
+		eq_expr();
+		while (peek("|") || peek("&")) {
+			if (accept("|")) {
+				eq_expr();
+				DEBUG(" OR ");
+			} else if (accept("&")) {
+				eq_expr();
+				DEBUG(" AND ");
+			}
+		}
+	}
+
+	static void expr() {
+		bitwise_expr();
+		if (accept("=")) {
+			expr();
+			DEBUG(" := ");
+		}
+	}
+
+It's a big piece of code, but don't be afraid - it's really simple.
+Every function that parses expression type first tries to call a
+more prioritized expression parser. Then, if an expected operator is found - 
+it calls more prioritized expression parser again. Now it has parsed both
+parts of a binary expression (like x+y, or x&y, or x==y), so it can perform
+an operation and return. Some expression can be "chained" (like a+b+c+d), so 
+we parse them with loops.
+
+We put debug output after every expression parser function. This will give us 
+an interesting result. For example, if we parse this piece of code:
+
+	int main(int argc, char **argv) {
+		int i = 2 + 3;
+		char *s;
+		func(i+2, i == 2 + 2, s[i+2]);
+		return i & 34 + 2;
+	}
+
+we will get this ouput:
+
+	identifier: main
+	function argument: argc
+	function argument: argv
+	function body
+	local variable: i
+	 const-2  const-3  +  :=
+	local variable: s
+	 var-func  var-i  const-2  +  FUNC-ARG
+	 var-i  const-2  const-2  +  ==  FUNC-ARG
+	 var-s  var-i  const-2  +  []  FUNC-ARG
+	 FUNC-CALL
+	 var-i  const-34  const-2  +  AND RET
+
+All our expressions are written in a postfix form (instead of `2+3` it's `2 3
++`).  This is a natural form for stack machines, when operands are placed on
+the stack, then a function called pops up the operands, processes them and puts
+the result back on the stack.
+
+Though it might not be an optimal architecture for most modern CPUs, which are
+register-based, it's still very simple and fits our compiler needs.
+
+symbols
+-------
+
+Ok, we are good. We've got a lexer and a parser in less than 300 lines of code.
+What we need to do is to add some functions to work with the symbols (like
+variable names, or functions). A compiler should have a table of symbols to
+quickly find their addresses, so when you write "i = 0" - it means put zero
+into the location at address 0x1234 in RAM (if symbol "i" has address 0x1234 in
+memory).
+Also, when you call "func()" it means - jump to address 0x5678 (if symbol "func"
+has value of 0x5678).
+
+We use the following structure for symbols:
+
+	struct sym {
+		char type;
+		int addr;
+		char name[];
+	};
+
+Here `type` has special meaning. We use a single-letter codes to detect symbol 
+type:
+
+* `L` - is a local variable. `addr` stores variable location on the stack
+* `A` - function argument. `addr` also stores the location on the stack
+* `U` - undefined global variable. `addr` stores absolute address in RAM.
+* `D` - defined global valiable. Same as above.
+
+So far, I've added two functions: `sym_find(char *s)` to find symbol by its 
+name, and `sym_declare()` to add a new symbol.
+
+Now we're are ready to develop [backend architecture &rarr;][1]
+
+_Check [part 1](/blog/cucu-part1.html) if you missed something_
+
+Posted on {{ page.date }} {{ rss }}
+
+{{ disqus }}
+
+[1]: /blog/cucu-part3.html

File input/blog/cucu-part3.mkd

+title: cucu: a compiler you can understand (3/3)
+post: a compiler you can unserstand (3/3)
+description: Compilers is fun. Want to write your own one?
+date: 2012-10-25
+---
+
+Let's talk about compiler backends. C should be a portable language, and there
+is no need to rewrite the whole compiler if you want to port it to the new CPU
+architecture.
+
+Backend is a part of the compiler that generates low-level byte code. Compiler
+itself just calls backend functions. Good backend design makes the compiler
+highly portable.
+
+I wanted CUCU to be a portable compiler (actually, a cross-compiler).
+So, I deciced to move backend code generator to a separate module.
+
+But before we dive into the backend code generation, let's think of how we will
+test it.
+
+minimal target architecture
+---------------------------
+
+Our minimal target has two registers (let's call them A and B), and a stack.
+Register A is an accumulator. We could use fixed-size instruction codes, as
+many RISC CPUs do, but there's not much fun in decoding hexadecimal numbers to
+find out what the code actually does.
+
+I have chosen a simpler way. Every instruction is 8 bytes long (yes, it's huge,
+but who cares - it's a test imaginary architecture). And the first 7 bytes of
+the instrction are just ASCII symbols, and the last one is 0x10 ('\n').
+
+This allows us to use human-readable instruction codes, like `A:=A+B`,
+`A:=1ef8`, or `push A`. These seem to be self-explanatory ("add register B 
+to register A", "put 0x1ef8 to register A" and "push the value of register A
+to the stack").
+
+* `A:=NNNN` - put value of 0xNNNN to register A
+* `A:=m[A]` - put value at address stored in register A to register A (as byte)
+* `A:=M[A]` - put value at address stored in register A to register A (as int)
+* `m[B]:=A` - store the value of A to address stored in B (as byte)
+* `M[B]:=A` - store the value of A to address stored in B (as int)
+* `push A` - push the value of A on the stack
+* `pop B` - pop the valud from the stack to B
+* `A:=B+A` - add A and B
+* `A:=B-A` - subtract A and B
+* `A:=B&A` - bitwise AND operation
+* `A:=B|A` - bitwise OR operation
+* `A:=B!=A` - A is 1 if B!=A, and 0 otherwise
+* `A:=B==A` - A is 1 if B==A, and 0 otherwise
+* `A:=B<<A` - shift left the value of B to A bits
+* `A:=B>>A` - shift right the value of B to A bits
+* `A:=B<A` - A is 1 if B&lt;A, and 0 othersize
+* `popNNNN` - drop NNNN items from the stack
+* `sp@NNNN` - put the value at address NNNN on the stack to the register A
+* `jmpNNNN` - jump to address NNNN
+* `jpzNNNN` - jump to address NNNN if A is zero
+* `call A` - call function at address stored in A
+* `ret`    - return from the function
+
+cucu backend design
+-------------------
+
+We include "gen.c" file, which is actually a backend implementation.
+Let's start with simple functions: `gen_start()` and `gen_finish()`.
+They are used to generate application prologue (like PE header, or ELF header)
+and to post-process byte code.
+
+Compiler provides the function `emit()`, that emits byte code to the `code[]`
+array. At the very end, `code[]` conatins a ready-to-use compiled program.
+
+So, compiler calls backend function, and backend just calls emit() with the
+specific byte-codes, and this is how we get compiled machine code.
+
+Now we need to define what are the most common constructions, that backend 
+should implement. Let's start with this simple program:
+
+	int main() {
+		return 0;
+	}
+
+Let's talk about calling convention. This is how arguments are passed to the 
+function and how the return value is fetched. We stated before, that arguments
+are placed on the stack (the 1st argument is pushed 1st). Let's make a deal, 
+that the value of the register A when function returns is its return value.
+
+Actually, we will store all values to register A. Register B will be used as
+a temporary register.
+
+For the program above I would expect the byte code to be something like:
+	
+	A:=0000
+	ret
+
+So, we need a function to put immediate numeric value to the register A, and a 
+function to do the return. We will call them `gen_const(int)` and `gen_ret()`.
+
+`gen_const` will be called every time the compiler finds a primary expression 
+which is a number, and `gen_ret` is called when the compiler finds a return
+statement. Though, some functions can be `void`, and thus have no explicit
+`return` statement. So, for safety and simplicity we will add an extra
+`get_ret()` at the end of every function, even if there has been an explicit
+return before.
+
+_Our compiler never claimed to be optimal or fast or safe, so double-return is
+fine for us_
+
+maths
+-----
+
+Now let's compile some arithmetic expressions. They are all similar, so I'll
+show how it's done with an example of addition. Remember how parser works? It
+parses (and theoretically, compiles) the left part of the expression, then the
+right part, and then performs an operation.
+
+This is how a typical math expression is compiled (remember a joke about an
+elefant, a giraffe and a fridge?):
+
+	..evaluate left part
+	push A
+	..evaluate right path
+	pop B
+	A:=A+B
+
+After we compiler the left part we need to store the results (register A)
+somewhere. Stack is the most simple way to do it. So, an expression 
+`1+2+3` will be compiled to:
+
+	A:=0001  -+     -+
+	push A    |      |
+	A:=0002   | 1+2  |
+	pop B     |      |
+	A:=A+B   -+      | +3
+	push A           |
+	A:=0003          |
+	pop B            |
+	A:=A+B       ----+
+
+other stuff
+-----------
+
+Work with symbols is easy, too.
+
+To call a function we put its address to register A,
+then to a `gen_call()` which emits `call A`.
+
+Accessing local symbols is done using `gen_stack_addr` which
+return the address of the given item on the stack.
+
+Accessing global symbols is done using `gen_sym_addr()`.
+Also, when a new symbols is created the compiler might need to 
+generate some code (e.g. when generation assembler code).
+`gen_sym` is used for such cases.
+
+`gen_pop` drops N items from the stack (increases stack pointer).
+
+`gen_unref` is used to unreference pointers. Depending on the type (byte or int)
+it generates `A:=m[A]` or `A:=M[A]` code.
+
+`gen_array` puts the constant array on the stack. Or maybe if there is a
+separate segment for data it should store the array there.
+
+Finally, `gen_patch()` is used to patch jump address when compiling if/while
+statement. The reason is that when we meet such statement we must generate a 
+jump instruction, but the address is not known yet - it depends on how many code
+is compiled for the body statetment. So, after the body is compiled
+we patch jump address with the correct value.
+
+We are done. Let's try the following program:
+
+	int main() {
+		int k;
+		k = 3;
+		if (k == 0) {
+			return 1;
+		}
+		return 0;
+	}
+
+	jmp0008 # by gen_start(): jump to main(), which is the next instruction at 0x8
+	push A  # add space for local variable "k"
+	sp@0000 # get the address of the local variable #0 ("k")
+	push A  # store it
+	A:=0003 # put 3 to A
+	pop B   # get the stored earlier address of "k"
+	M[B]:=A # put the value of A to "k" as int
+	sp@0000 # get the address of "k"
+	A:=M[A] # get its value as int
+	push A  # store it
+	A:=0000 # put 0 to A
+	pop B   # get the value of "k" stored earlies
+	A:=B==A # compare A and B ("k" and zero)
+	jmz0090 # if false (A!=B, k!=0) - jump to 0x90
+	A:=0001 # store 1 to A as return value
+	pop0001 # free space allocated for "k" on the stack
+	ret     # and return
+	jmp0090 # "else" branch should be here, instead jump to 0x90 (next instruction)
+	A:=0000 # store 0 to A as return value
+	pop0001 # free space allocated for "k" on the stack
+	ret     # and return
+	ret     # remember about double-return for safery? ;)
+
+Yeah, the code is so dirty and bloated. But it works. And which is more
+important, you know now how compilers work and how create your own one.
+
+But I should warn you...
+
+warning
+-------
+
+Never, please, never do it this way! If you want to write your own compiler -
+use real grown-up tools:
+	
+* flex/lex/jlex/...
+* yacc/bison/cup...
+* ANTLR
+* Ragel
+* and many others
+
+Also, it's worth checking real literature, like [Dragon Book][1]. Maybe the
+courses from [coursera.org][2] will be useful for you, too.
+
+If you need to port existing languages for your own architecture - 
+you'd better learn how to write LLVM backends or GCC backends.
+
+If you want to read more about toy compilers - check [SmallC][3].
+
+If you want to write compiler for a simple language - check PL/0 or Basic or C.
+
+But please, never write compilers manually for real tasks.
+
+final word
+----------
+
+The full sources of the compiler are available on [my bitbucket page][4].  They
+are licensed under MIT, feel free to use and modify.  I'm not sure if there is
+any sense in submitting issues to this project, so do it only if you know how
+to fix them :)
+
+Anyway, compilers is fun. I hope you liked it!
+
+_Check [part 1](/blog/cucu-part1.html) or [part 2](/blog/cucu-part2.html) if
+you missed something_
+
+Posted on {{ page.date }} {{ rss }}
+
+{{ disqus }}
+
+[1]: http://en.wikipedia.org/wiki/Compilers:_Principles,_Techniques,_and_Tools
+[2]: https://www.coursera.org/course/compilers
+[3]: http://en.wikipedia.org/wiki/Small-C
+[4]: https://bitbucket.org/zserge/cucu