Exercise 4 (Compulsory)

You objective in this exercise is to work with the code we've built in the lectures, and turn it into a (reasonably) finished implementation of a simple language.

When you're done, you should be comfortable with:

Passing around environments for context information
Building basic evaluators and typecheckers
Program execution with and without a store
The distinction between abstract and concrete syntax

You may also realise the reasoning behind some design choices in languages you know (such as why variables from the surrounding scope must be final in Java lambdas and anonymous classes).

Tasks

This is what you have to do. Read below for helpful information about how to do it. You are free to make your own decisions (including doing parts in another language if you wish), and even simplify the tasks if you need to. Once you're done, you show and explain your solution to Anya or Anna who will let you know whether it's approved or if you need further improvements. Deadline: you should get one us to review your solution before we leave for a conference at the end of the month, i.e., October 27th at the latest.

We may/will expand the hints section as we go along, based on your questions. Please ask at the lectures, by email or via Facebook message. Feel free to discuss with other students as well.

Retrieve the code simple language implementation from the lectures, and check that you can run it.
Write an evaluator for abstract syntax trees of our language (as defined in SimpleAST.rsc). It should follow the basic style of the evaluator from Simple.rsc (i.e., a set of recursive functions with an environment passed around), except it works on the AST datatype instead of concrete syntax. If you want a bigger challenge, write the evaluator for the imperative language (ImperativeAST.rsc) instead, where you also have to deal with a store / memory.
Finish the type checker in ImperativeTypeChecker.rsc. You should finish these features:
- The current typechecker allocates store locations sequentially. This means that recursive invocations of a function will overwrite local variables and parameters. Instead, use the environment to keep track of how many variables you've made so far within the current function, and then make storage locations relative to the stack frame.
- Our language only supports single-argument functions. Make it work also with arbitrary-length argument lists.
- Overloaded functions are nice. Make it so that functions are selected not just based on name, but also based on the types of the arguments. In the AST, you can distinguish the functions based on the store location.
Anonymous functions (lambdas) are all the rage these days – even Java and C++ have them now. Add this to the language, and update the evaluator and typechecker.
You should probably expand the language a little bit. In particular, if is missing, so it's difficult to make interesting programs. You should also consider changing the syntax of Program so that a program can have a list of function definitions before the expression to be evaluated (instead of having to make lots of nested let/let funs).

Detailed hints

1.

You find the code at this Git URL: https://bitbucket.org/sle-uib/inf225.git

To import in Eclipse, choose File -> Import -> Git -> Projects from Git, choose Clone URI and enter the URL. Click next a few times. It'll ask you about which project to import, and the project name is inf225.h16 (is probably preselected). Hit finish, and the code should show up in your project explorer.

For example, these things should work in the Console (provided you start the console in the inf225.h16 project):

rascal>import SimpleTypeChecker;
ok
rascal>import Simple;
ok
rascal>typecheck((Expr)`let x = 5 in x end`, ())
...
rascal>eval((Expr)`let x = 5 in x end`, ())
...

2. Evaluator

Getting Started

Make a new Rascal module (File -> New), name it SimpleEvaluator or something.
You probably need the same imports as in Simple.rsc, plus you need import SimpleAST itself.
Your evaluator functions should look something like this:
```
public Value eval(Plus(e1, e2), Env env) { ... }
```
Copy relevant definitions for Name, Value and Env from Simple.rsc
For testing, you can write programs directly in abstract syntax. For example, Let("x", Int(5), Plus(Var("x"),Var("x"))).

Hints

Note that you're now working in abstract syntax, rather than the concrete syntax in the evaluator we had in the lectures. This means we no longer care about program text; there's no parsing going on, and we don't need a grammar. (We assume that another component in our system has taken care of parsing and converting to an abstract syntax tree.)

Values and patterns look like this: Int(5), Int(a), Times(a, Int(b)) and so on, rather than (Expr)`5`, (Expr)`<NUM a>` and (Expr)`<Expr a>*<NUM b>`.
You don't need to worry about converting numbers from strings to ints, since they are already ints in the AST.
You can also make eval functions with a normal argument list, rather than using pattern matching. E.g., int eval(ExprTree t, Env env), and then use a switch statement to deal with the various cases of AST nodes.
To support other types than int, you must update the return value of your eval functions. You best option is to make a new data type data Value = IntVal(int i) | StrVal(str s) | ...;, and use that as your return values. This makes it easy to distinguish values of different types. For example, if you have two values v1 and v2, you can check if they are both ints: if(<Int(i1), Int(i2)> := <v1, v2>) { return Int(i1+i2); }

You can write any program directly in the abstract syntax; it's just a bit more cumbersome than when you have a nice concrete syntax. You may find it easier to obtain abstract syntax trees if you repurpose the old concrete syntax-based evaluator and make it return ExprTree nodes – i.e., convert from concrete syntax to abstract syntax.

If you want to try and make an imperative (store-oriented, rather than value-oriented) language, there are two approaches:
- For very simple updatable variables, you can get by with each evaluation step returning an updated environment. This is what Rascal does internally, and allows variables to change (i.e., be rebound to different values) without values/objects to change. See this short hint on semantics of updatable variables
- In the more advanced case, you need to have a storage location allocated to each variable. Your store can be a list of values, and whenever you declare a new variable you must find a new storage location for it. If you work with the ImperativeAST output from the typechecker (which you'll need to finish), the work of finding storage locations has already been done.

3.

Getting Started

Read the stack frame chapter hand-out.
The point is that each variable should have a store location on the current function's stack frame. The stack frame has space for all local variables in a function; when a function is called (which would happen in e.g. an evaluator), space on the stack is made available for the stack frame, and local variables are accessed relative to a frame pointer (which points to the stack frame in memory).
Regarding store locations, there are two approaches: for a language with only local variables, you can keep track of the currently allocated store locations in the environment (e.g., just make a fake variable with a special name such as "__store_top", for instance). With global variables (for instance, storing a list of top-level functions), you can use the same scheme as the typechecker already uses (a global counter).
For multi-argument functions, you also need to update the syntax. Remember that you can use {Expr ","}* to indicate a comma-separated list of zero or more arguments. You use the same syntax in patterns for your type-checking functions, e.g., typecheck(`<ID f>(<{Expr ","}* args>`), env). The arguments will then be a list, which you can process with a for-loop or list comprehension ([f(x) | x <- args]).
For overloading, the environment will contain a list or set of functions rather than a single one. You must first determine the argument types, and then go through the list of candidates and pick the one where the argument types match.
If you use abstract syntax instead of concrete syntax, assume that the type of variables are not yet determined. E.g., that variables are Var("x", Unknown()) and you're supposed to resolve them to Var("x", Int()).

Hints

If you have functions nested within function (which the language syntax allows), and the inner function can access the outer function's variables, this exercise becomes a lot harder. You don't have to solve this issue, but at least try to think about it – could this be one of the reasons why Java requires that local variables that are used in an inner class are marked as final?
What happens if you have allocated space for an object on the stack, and the function returns a reference to it?
A function bundled with its environment and variable storage (if any) is known as a closure. Languages such as Scheme allow functions to update non-local variables (e.g., variables from an outer scope). This makes the simple stack scheme we've used here unusable, since the life time of a variable will be dynamic rather than determined simply by the nesting of function calls. This is a useful feature – you can use it, e.g, to implement object orientation, but it can be tricky to implement. This is an example that makes use of closures with updatable variables:
```
let makeCounter() = // makes a function that gives a sequentially increasing number
   let counter = 0 in // a local counter variable
      let f() = {
            counter = counter + 1; 
        return counter;
      } in
         f  // return the inner function
      end
   end
end
```

4.

Getting Started

Note that anonymous functions are essentially equivalent to making a regular function and then returning it.

Wiki

inf225 / 2016 / Exercise 4

Exercise 4 (Compulsory)

Tasks

Detailed hints

1.

2. Evaluator

Getting Started

Hints

3.

Getting Started

Hints

4.

Getting Started

Hints