Toy C#-ish compiler /

Filename Size Date modified Message
80 B
5.0 KB
12.0 KB
7.0 KB
5.1 KB
3.4 KB
12.3 KB
697 B
2.0 KB
300 B
6.0 KB
1.7 KB
2.8 KB
1.2 KB
1.9 KB
1.0 KB
23 B
34 B
95.2 KB
4.0 KB
         C# (or maybe C++) compiler by Anton Golov (3809277)

Frivolous changes to existing code:
 * Removed unused keywords/types, including try/catch, floating point types, and
   array support in the grammar.

Core features implemented (1-6):
 * Boolean and character constants are handled and have the correct type.
   This makes the latter effectively useless, as they can only be stored and
   printed; arithmetic on chars is not implemented.
 * The operators have the correct priorities.  All operators except the
   assignment and comparison operators are left-associative.  Assignment is
   right-associative, and comparison is non-associative.
 * Static method calls work fine.  At the moment, only static methods of the
   current class can be called.  The caller is responsible for cleanup.
 * The print syntax works.  For efficiency reasons, the arguments are
   evaluated right-to-left.
 * Methods can have a result.  It is returned via the result register.
 * Local variables can be used.

Bonus features implemented (7, 9, 11, 12, 13, 14):
 * Lexer comments of both types are discarded.
 * The for statement is implemented in the parser, and internally treated as
   a while statement.
 * The code generator evaluates both logical operators lazily.
 * Static member functions and variables may be defined and using the C++ syntax
   of ClassName::FunctionName(args) or ClassName::VariableName.  An unqualfied
   function call is assumed to be a static function call to a function of the
   current class.  All other kinds of member access must be qualified.

   Static members are placed near the vtable, and lead to "code modified"
   warnings.  Please ignore.

   Derived classes have their own instances of statics.

   The current implementation of statics is suboptimal as all but the first
   static member requires addition to access.  This could be optimised out by
   generating a label for every static member or allowing the assembler to
   handle statements like LDC label+3.
 * The following error messages are produced:
   * Undeclared local.
   * Undeclared static function.
   * Undeclared static member.
   * Undeclared non-static function.
   * Undeclared non-static member
   * Nonexistent type.
   * Usage of rvalues as lvalues.
   * Type mismatch.
   The error-reporting itself is lacking; lack of position information and
   crude checks means the messages aren't particularly useful.  Oh, and they
   only come one at a time, and stop all further compilation.
 * Objects can be created using the new syntax.  New requires parentheses but
   does not allow for passing arguments (constructors are not implemented).
   Objects are stored on a heap, the size of which can be adjusted with the
   --heapsize flag.  Every allocated object has an overhead of 1 byte. Objects
   can be destroyed using delete, as in C++.  Any kind of silly usage of the
   heap is undefined behaviour, no diagnostic required. :)

Bonus bonus features implemented:
 * Local variables may be initialised at the point of declaration.
 * The type of such variables may be left for the compiler to deduce using
 * A number of optimisations are applied.
 * Classes may inherit from other classes using the standard C# syntax.  The
   derived class receives all members of the base class, and a reference to a
   derived class object can be converted to a reference to a base class object.
   Deletion via base class objects works correctly.

   Virtual function access suffers the same problem as static member access.
 * All non-static functions are by default polymorphic, and can be overriden.
 * C++-style lambda functions are supported.  The syntax is

     [captures](parameters) -> return_type { body }

   Where captures is a comma-separated list of locals to capture.  The return
   type may be omitted, together with the arrow, in which case void is

   All captures happen by-value.  Captures are treated as locals within the
   lambda body; they may be assigned to, but this will not affect the value
   in subsequent calls.

   A lambda expression allocates memory, which can be reclaimed by deleting
   the resulting function reference.
 * Lambdas have the type @rt(args), where rt is the return type, and the
   arguments are not given names.  Lambdas may be passed around as any
 * Conversions of function types take into account conversions of the
   parameter and return types; if D is derived from B, @D(B) will convert
   into a @B(D) just fine (but not the other way around!).
 * Converting member functions to a lambda type will fail because the name
   won't be found. :)  (Ran out of time before I could implement this.)
 * A lambda x can be called using the syntax %x(args).

Some comments on the architecture:
 * Due to the overwhelming complexity, the algebra has been split into three
   * The first phase (CSharpSTBuild) builds the symbol table and does some
     basic sanity checking at class-level.
   * The second phase (CSharpTypeCheck) does type checking and extracts all
     information for generating expressions from the global environment.
   * The first phase generates the actual code.
 * The second algebra transforms all expressions and some statements into a
   slightly different form.  All references to types are removed, and instead
   any global level information is filled directly in.  This allows for a much
   simpler implementation of the code generator.
 * There are a number of optimisers.  These are implemented as parsers that
   run over the SSM code, matching anything they can optimise and replacing
   it with the updated version, while keeping anything not matched the same.
 * The heap is implemented as a circular single linked list.  Only free
   blocks are linked; they contain their size and a pointer to the next
   block.  Allocation involves splitting off the front of a free block and
   removing it from the linked list.  Both allocation and deallocation are
   linear in the size of the list in the worst case.