Wiki
inf225public / Exercise 2
Note: Feel free to skip stuff you don't understand yet; you can catch up with it later. And do also feel free to study the manuals and experiment on your own if you have the time. Please let Anya know if you find any mistakes (yes, something being unclear or incomprehensible is a mistake in the exercise, not in you).
2.1 Introduction
Finish 1.3 and 1.4 of Exercise 1 if you haven't yet done so.
2.2 Grammar ambiguities
The following grammar is motivated by declarations in C:
Declaration ::= Type Declarator ";" Type ::= "int"  "char" Declarator ::= "*" Declarator  Declarator "[" number "]"  Declarator "(" Type ")"  "(" Declarator ")"  name

Prove the syntactic ambiguity of this grammar by finding a string that has more than one parse tree. Draw the parse trees.

The constructs
[
number]
and(
Type)
can be thought of as being postfix operators with a declarator as an operand. Suppose that*
has lower precedence than these operators. White an unambiguous grammar that generates the same strings as this grammar and makes it easy to identify the components of a declarator. 
Suppose that the first production for Declarator is changed to make
*
a postfix operator. Why is the resulting grammar unambiguous?
2.3 The Simple Language
We'll now experiment a bit with grammars and syntax in the Rascal metaprogramming language.

Note: Whenever you edit a Rascal program in Eclipse and save it, it will be automatically reloaded in the console. (Unless it contains errors. Who knows what will happen then.)

Note: Rascal may sometimes pop up an annoying (syntax) error message; in most cases you can just click OK, and continue editing your program, and eventually it'll stop appearing...
Consider again the grammar of the Simple language from Exercise 1:
start syntax Program = Expr; syntax Expr = ID // variables  NUM // integers  Expr "+" Expr // addition  Expr "*" Expr // multiplication  ID "(" Expr ")" // function call  "(" Expr ")" // parentheses ; // identifiers // y !<< x means 'x' must not be preceeded by 'y' // x !>> y means 'x' must not by followed by 'y' // so, this means that an identifier is a sequence of one // or more letters or underscores, with no additional // letters or underscores before or after lexical ID = [azAZ_] !<< [azAZ_]+ !>> [azAZ_]; // numbers lexical NUM = [09] !<< [09]+ !>> [09];
Edit the Simple.rsc
module again, and add the following code:
public str pretty(Program p) { if(Expr e := p) return pretty(e); else return "[unknown program: \"<e>\"]"; } public default str pretty(Expr e) { return "[unknown expr: \"<e>\"]"; } public str pretty((Expr)`<ID i>`) { return "<i>"; } public str pretty((Expr)`<NUM n>`) { return "<n>"; } public str pretty((Expr)`<Expr e1>+<Expr e2>`) { return "(<pretty(e1)>+<pretty(e2)>)"; } public str pretty((Expr)`<Expr e1>*<Expr e2>`) { return "(<pretty(e1)>*<pretty(e2)>)"; } public str pretty((Expr)`(<Expr e>)`) { return "(<pretty(e)>)"; } public str pretty((Expr)`<ID f>(<Expr e>)`) { return "<f>(<pretty(e)>)"; }
This defines a simple prettyprinter for Simple programs. Important things to note:
 The functions are overloaded with pattern matching on the arguments. In most of the cases, we use concrete syntax patterns.
 The pattern
(Expr)`<Expr e1>*<Expr e2>
` denotes something which is anExpr
; the concrete syntax of what we're matching is contained in the backquotes. The subexpressions<Expr e1>
will match anyExpr
and bind it to a metavariablee1
. The*
is the multiplication symbol from ourSimple
grammar.  Inside the strings we return, we use string interpolation. Anything within
<...>
in a string is a Rascal expression which is evaluated and converted to a string. Converting a parse tree (as inID i
) yields its program text; for the other cases we do a recursive call topretty
.  You can try out a concrete syntax pattern using the pattern matching operator
:=
in Rascal:
rascal>(Expr)`<Expr e1>*<Expr e2>` := `x*y` bool: true rascal>if((Expr)`<Expr e1>*<Expr e2>` := `(a+b)*y`) println(e1); (a+b) ok
2.3.1 Fixing Ambiguities
First, try the pretty
function on some programs. Does it always work, or are there things for which you get an error? In particular, try it with expressions such as ``\
a+b*c````.
The errors you (hopefully) get is due to some expressions being ambiguous. You can explore the ambiguities with Dr. Ambiguity:
rascal>import Ambiguity; ok rascal>diagnose(`a+b+c`) list[Message]: [ info("Ambiguity cluster with 2 alternatives",stdin:///(10,5,<1,10>,<1,15>)), info("The alternatives use the same productions",stdin:///(10,5,<1,10>,<1,15>)), info("The alternatives have the same production at the top: Expr = Expr \"+\" Expr ",stdin:///(10,5,<1,10>,<1,15>)), error("This rule [Expr = Expr \"+\" Expr ] may be missing an associativity declaration (left, right, nonassoc)",stdin:///(10,5,<1,10>,<1,15>)) ]
 a) Try it yourself with a few expressions. The problems are due to operator precedence and associativity, as we've discussed in the lectures.
The grammar should have different priorities for addition and multiplication. Priority is expressed by using >
instead of 
when separating alternatives in the grammar. This gives multiplication higher priority (or precedence as it is also called):
...  Expr "*" Expr // multiplication > Expr "+" Expr // addition ...
Consult the Rascal priority rules for more information.
 b) Introduce priorities into the grammar, and try the ambiguity diagnosis again on a few expressions. Is the situation better? Does
`x+y*z`
work? What about`x+y+z`
?
Associativity is expressed using the left
, right
or nonassoc
keywords. You can use it either on a single alternative in a production:
 left Expr "*" Expr
or multiple ones, if you want operators to be mutually associative:
 left (Expr "+" Expr  Expr "" Expr)

c) Make multiplication and addition associative.

d) Use
pretty
to check the nesting of expressions; it will put parentheses around each expression, so you can easily see which ones belong together. Change the associativity toright
and/ornonassoc
and study the effect. 
e) Add alternatives for subtraction and division to the grammar, and make the mutually associative with addition and multiplication.

f) To make
pretty
work for subtraction and division, you'll have to add two more overloaded cases of thepretty
function.
2.3.2 Spaces and layout
You may have noticed that you can't write programs with spaces in them. That's because you haven't defined syntax rules for spaces. You do this with the layout
construct in the Rascal syntax definition.
For example:
// layout is lists of whitespace characters layout MyLayout = [\t\n\ \r\f]*;
Each terminal or nonterminal in the righthand side of a production may be separated by layout. This rule says that layout is zero or more tabs, newlines, normal spaces, carriage returns or form feeds. This is typical in modern programming languages.
Comments are typically also considered layout.
We will look into this more in the future.
Updated