Wiki

Clone wiki

inf225public / Exercise 2

Note: Feel free to skip stuff you don't understand yet; you can catch up with it later. And do also feel free to study the manuals and experiment on your own if you have the time. Please let Anya know if you find any mistakes (yes, something being unclear or incomprehensible is a mistake in the exercise, not in you).

2.1 Introduction

Finish 1.3 and 1.4 of Exercise 1 if you haven't yet done so.

2.2 Grammar ambiguities

The following grammar is motivated by declarations in C:

Declaration ::= Type Declarator ";"
       Type ::= "int" | "char"
 Declarator ::= "*" Declarator
              | Declarator "[" number "]"
              | Declarator "(" Type ")"
              | "(" Declarator ")"
              | name
1. Prove the syntactic ambiguity of this grammar by finding a string that has more than one parse tree. Draw the parse trees.

  1. The constructs [ number ] and ( Type ) can be thought of as being postfix operators with a declarator as an operand. Suppose that * has lower precedence than these operators. White an unambiguous grammar that generates the same strings as this grammar and makes it easy to identify the components of a declarator.

  2. Suppose that the first production for Declarator is changed to make * a postfix operator. Why is the resulting grammar unambiguous?

2.3 The Simple Language

We'll now experiment a bit with grammars and syntax in the Rascal meta-programming language.

  • Note: Whenever you edit a Rascal program in Eclipse and save it, it will be automatically reloaded in the console. (Unless it contains errors. Who knows what will happen then.)

  • Note: Rascal may sometimes pop up an annoying (syntax) error message; in most cases you can just click OK, and continue editing your program, and eventually it'll stop appearing...

Consider again the grammar of the Simple language from Exercise 1:

start syntax Program = Expr;

syntax Expr
    = ID                  // variables
    | NUM                 // integers
    | Expr "+" Expr       // addition
    | Expr "*" Expr       // multiplication
    | ID "(" Expr ")"     // function call
    | "(" Expr ")"        // parentheses
    ;

// identifiers
//    y !<< x means 'x' must not be preceeded by  'y'
//    x !>> y means 'x' must not by followed by 'y'
// so, this means that an identifier is a sequence of one
// or more letters or underscores, with no additional
// letters or underscores before or after
lexical ID = [a-zA-Z_] !<< [a-zA-Z_]+ !>> [a-zA-Z_];

// numbers
lexical NUM = [0-9] !<< [0-9]+ !>> [0-9];

Edit the Simple.rsc module again, and add the following code:

public str pretty(Program p) {
    if(Expr e := p)
        return pretty(e);
    else
        return "[unknown program: \"<e>\"]";
}

public default str pretty(Expr e) {
    return "[unknown expr: \"<e>\"]";
}


public str pretty((Expr)`<ID i>`) {
    return "<i>";
}

public str pretty((Expr)`<NUM n>`) {
    return "<n>";
}

public str pretty((Expr)`<Expr e1>+<Expr e2>`) {
    return "(<pretty(e1)>+<pretty(e2)>)";
}

public str pretty((Expr)`<Expr e1>*<Expr e2>`) {
    return "(<pretty(e1)>*<pretty(e2)>)";
}

public str pretty((Expr)`(<Expr e>)`) {
    return "(<pretty(e)>)";
}

public str pretty((Expr)`<ID f>(<Expr e>)`) {
    return "<f>(<pretty(e)>)";
}

This defines a simple pretty-printer for Simple programs. Important things to note:

  • The functions are overloaded with pattern matching on the arguments. In most of the cases, we use concrete syntax patterns.
  • The pattern (Expr)`<Expr e1>*<Expr e2>` denotes something which is an Expr; the concrete syntax of what we're matching is contained in the back-quotes. The sub-expressions <Expr e1> will match any Expr and bind it to a meta-variable e1. The * is the multiplication symbol from our Simple grammar.
  • Inside the strings we return, we use string interpolation. Anything within <...> in a string is a Rascal expression which is evaluated and converted to a string. Converting a parse tree (as in ID i) yields its program text; for the other cases we do a recursive call to pretty.
  • You can try out a concrete syntax pattern using the pattern matching operator := in Rascal:
rascal>(Expr)`<Expr e1>*<Expr e2>` := `x*y`
bool: true

rascal>if((Expr)`<Expr e1>*<Expr e2>` := `(a+b)*y`) println(e1);
(a+b)
ok

2.3.1 Fixing Ambiguities

First, try the pretty function on some programs. Does it always work, or are there things for which you get an error? In particular, try it with expressions such as ``\a+b*c````.

The errors you (hopefully) get is due to some expressions being ambiguous. You can explore the ambiguities with Dr. Ambiguity:

rascal>import Ambiguity;
ok

rascal>diagnose(`a+b+c`)
list[Message]: [
  info("Ambiguity cluster with 2 alternatives",|stdin:///|(10,5,<1,10>,<1,15>)),
  info("The alternatives use the same productions",|stdin:///|(10,5,<1,10>,<1,15>)),
  info("The alternatives have the same production at the top: Expr =  Expr  \"+\"  Expr ",|stdin:///|(10,5,<1,10>,<1,15>)),
  error("This rule [Expr =  Expr  \"+\"  Expr ] may be missing an associativity declaration  (left, right, non-assoc)",|stdin:///|(10,5,<1,10>,<1,15>))
]

  • a) Try it yourself with a few expressions. The problems are due to operator precedence and associativity, as we've discussed in the lectures.

The grammar should have different priorities for addition and multiplication. Priority is expressed by using > instead of | when separating alternatives in the grammar. This gives multiplication higher priority (or precedence as it is also called):

    ...
    | Expr "*" Expr       // multiplication
    > Expr "+" Expr       // addition
    ...

Consult the Rascal priority rules for more information.

  • b) Introduce priorities into the grammar, and try the ambiguity diagnosis again on a few expressions. Is the situation better? Does `x+y*z` work? What about `x+y+z`?

Associativity is expressed using the left, right or non-assoc keywords. You can use it either on a single alternative in a production:

    | left Expr "*" Expr

or multiple ones, if you want operators to be mutually associative:

    | left (Expr "+" Expr | Expr "-" Expr)

  • c) Make multiplication and addition associative.

  • d) Use pretty to check the nesting of expressions; it will put parentheses around each expression, so you can easily see which ones belong together. Change the associativity to right and/or non-assoc and study the effect.

  • e) Add alternatives for subtraction and division to the grammar, and make the mutually associative with addition and multiplication.

  • f) To make pretty work for subtraction and division, you'll have to add two more overloaded cases of the pretty function.

2.3.2 Spaces and layout

You may have noticed that you can't write programs with spaces in them. That's because you haven't defined syntax rules for spaces. You do this with the layout construct in the Rascal syntax definition.

For example:

// layout is lists of whitespace characters
layout MyLayout = [\t\n\ \r\f]*;
Each terminal or non-terminal in the right-hand side of a production may be separated by layout. This rule says that layout is zero or more tabs, newlines, normal spaces, carriage returns or form feeds. This is typical in modern programming languages.

Comments are typically also considered layout.

We will look into this more in the future.

Updated