Wiki

Clone wiki

inf225 / 2017 / ex / Exercise 3

Exercise 3 – JSON parsing

JSON is a data language, used to transmit data as key-value pairs. You can read more about it and see examples at:

Goals for this exercise

  • Make a useful grammar on your own
  • Learn some useful Rascal features: list pattern matching, comprehensions, built-in data structures
  • Learn how to test your parser and do round-trip testing
  • Use recursive tree traversal to "evaluate" a parse tree

Please complete (as much as possible of) the exercise before Tuesday October 3rd, then we'll work through a solution during the lecture

3.1. Make a Grammar for JSON in Rascal

Make a new Rascal module (call it "Json", for example). Study either examples of JSON or the JSON grammar from the standard, and develop a syntax definition in Rascal.

  • You will need to make both lexical productions (for the syntax of strings, numbers etc) and syntax productions, and also layout productions (for spacing). Name the non-terminal for values JsonValue.
  • Start simple, e.g., with just numbers and strings, and once you know they work OK, add more complicated stuff like arrays and objects.
  • Always test everything. You can try things on the console, e.g. with:

    import ParseTree;
    import Json;

    parse(#JsonValue, "25")
* You can make unit tests by writing boolean functions and marking them with test. If you type :test in the console, all the tests in the current module will be run (there's also a "Run Tests" in the menu). For example:

test bool parseTest1() {
    parse(#JsonValue, "\"abc\""); // remember to import ParseTree for this to work
    return true; // we'll get an exception if the parse fails
}

Tips

  • There's a nice JSON grammar on the web page – your grammar can follow much the same structure (although you'll also need to deal with whitespace/layout). Do note that the grammar notation they've used doesn't have notation for lists, so you'll see things like elements = value | value "," elements instead of the *-list notation available in Rascal. The RFC contains a complete grammar for JSON – have a look at this also, and you'll see that whitespace (ws) is included directly in the productions (you'll also notice that the grammar notation is a bit different than you're used to (the RFC uses an EBNF variant): with / instead of |, *(A / B) instead of (A | B)*, [ A ] instead of A?, and literal characters in ASCII hex notation.)
  • You can make comma-separated lists with a special separated-by syntax: {NonTerminal Separator}*, where NonTerminal is, e.g., JsonValue, and the Separator is a terminal, e.g., ",". Use * or + for repeating 0/1 or more times.
  • When you're making lexical syntax, Rascal has a slightly different syntax for inverting character classes than what is usual in regular expressions. Use ![a] for everything except a, instead of [^a] (which will mean, a hat or an a).
  • For the lexical syntax, it is usually best to force greedy matching, e.g., with [0-9]+ !>> [0-9] (one or more 0-9, not followed by another 0-9).

3.2. Convert ParseTree to Rascal Datastructures

Make recursive functions that convert the parsetree to Rascal data structures, following the same principle as the eval function from Simpl.rsc. For example, if you encounter a number or string, this is your base case and you return it directly – but if you encounter a JSON array, you'd recursively decode each element, and return a list of the decoded values.

Use value as your return type – this includes any Rascal type, similar to how you might use Object in Java for storing objects of any class. You'll need this because you don't know in advance what the type of the object you're reading will be (it could be e.g., just a number, or it could be a list of objects, or anything).

You can call your functions jsonToValue, for example. Make one default function for the case where you find something unexpected in the parse tree:

public default value jsonToValue(JsonValue val) {
   throw "Unexpected parse tree node: <unparse(val)>";
}

Also make a jsonToValue function that takes text and runs it through the parser before calling jsonToValue on the tree:

public value jsonToValue(str text) { return jsonToValue(parse(#JsonValue, text)); }

Useful Rascal data types:

  • value – the mother of all types (i.e., values of any type can be stored in a variable of type value, similar to how Object works in Java)
  • list[value] – list of values. Literal syntax: [a, b, c, d]. To add an element to a list l, do l += newElement; or l = l + newElement; or l = [*l, newElement];
  • map[str,value] – map strings to values; this is how JSON objects work. Literal syntax: ("foo" : 1234). To add a new element to a map, do m[key] = val; or m += (key : val);.
  • str – plain strings. Literal syntax: "a string". Converting strings is a little bothersome; remember that the JSON syntax for strings includes quotes around the string and possibly escapes inside the string. These must be removed/converted also (though it's fine to wait until later to deal with this).
  • num, int, real – Rascal has several types for numbers; num is the supertype for all of them. Rascal doesn't have floats, but uses precise decimal numbers (real). There are also rational numbers (fractions); rat. Import the String module to get access to toInt which converts strings to integers (and toReal which converts to reals).

For example, the following JSON object:

{
  "firstName": "John",
  "lastName": "Smith",
  "isAlive": true,
  "age": 25
}

would correspond to this in Rascal:

map[str,value] obj = (
  "firstName": "John",
  "lastName": "Smith",
  "isAlive": true,
  "age": 25
);

Tips

You'll need a little bit of extra Rascal knowledge to decode trees that contain list of elements (such as the JSON arrays and objects). To pattern-match against a list of something, you include the * or + in the pattern (e.g., (NonTerminal)`{<A* as>}` to match a sequence of As between braces). If you have a separated list (as in JSON), you include the separator syntax: (JsonValue)`[<{JsonValue ","}* vals>]` – this will match a comma-separated list of values between brackets (e.g., [a, b, c]), and bind the list of items to the variable vals. You can then iterate over the list in a for-loop, for example: for(v <- vals).

Better than using a loop is using a comprehension, which will build a list/map/set/relation datastructure on the fly. The syntax for comprehensions is:

  • [ f(element) | element <- collection ] – pick each element from the collection (you can use pattern-matching here), and construct a list (indicated by the outer brackets) where the elements are f(element) for each element.
  • For instance, for decoding JSON arrays, we could use something like:
// first, let vals match the list of elements of a JSON list [..,..,,..]
public value jsonToValue( (JsonValue)`[<{JsonValue ","}* vals>]` ) {
  // construct a list of decoded elements:
  return [jsonToValue(v) | v <- vals];
}
  • Comprehensions also work with maps and other data structures. For example, ( key : val | <key,val> <- listOfPairs) to make a map.
  • You can pattern-match when picking elements from a collection, including with concrete syntax patterns. For example, you might use something similar to this to decode the colon-separated pairs of JSON key-values: ( ... | (JsonPair)`<JsonString key>:<JsonValue val>` <- vals)
  • You can use the visit-statement, together with pattern-matching to decode/remove escapes from strings, since visit also allows you to visit and replace substrings. Use a regular expression to match, and replace by a string:
    s = visit(s) {
      case /\\n/ => "\n"  // replace escaped newlines with actual newlines
      case /\"/ => ""  // note that this will remove *all* quotes, including escaped ones. You should only remove quotes at the beginning and end of the string
    }
    

3.3 Convert Rascal Datastructures back JSON

Make a set of recursive functions public str valueToJson(value val) that converts a Rascal datastructure into a valid JSON string. As before, you can make a default function for cases you're unable to handle.

  • You can make strings using string interpolation. Just put <expr> into your string, and it will be converted to a string and inserted into the string. The expression can be just a variable or could also be a complicated Rascal expression. E.g., "<key> : <valueToJson(val)>".

3.4 More Testing

An important property of parsing and unparsing (converting back to text) is that they are inverses of each other. I.e.,

parse(unparse(x)) == x

unparse(parse(y)) == y

With the caveat that layout/formatting may be different, so that if you parse a JSON file and convert it to Rascal data, and then back to JSON, the string won't be an exact match. What should be true in our case, however, is that:

jsonToValue(valueToJson(val)) == val

jsonToValue(valueToJson(jsonToValue(text))) == jsonToValue(text)

I.e., as long as we compare the structure of the data, it should be preserved through the conversion to/from text.

  • Make a boolean function that encodes these properties, and several test cases that check that the properties hold.

3.5 JSON Data

Example code to make unit tests

Here's a bunch of code to set up tests for the parser with random strings

You'll need these imports (at the top of your Rascal file) to make it work:

import ParseTree;
import String;
import IO;
import util::Math;
import List;

first, a collection of useful basic JSON strings

list[str] jsonIntegers = [
    "0", "1", "2", "3232892", "-43"
];


list[str] jsonNumbers = [
    *jsonIntegers,
    "21.321", ".232", "0.45343e-21",
    "-43", "-.232", "-43222.213e44"
];


list[str] jsonStrings = [
    "\"abc\"", "\"\"", "\"ab\\\"c\"",    
    "\"ab\\nc\"", "\"\\r\"", "\"a\\u1234b\\\"c\""
];

list[str] jsonBasicValues = [*jsonNumbers, *jsonStrings];

Functions to generate random values

str randomJson(int chance) {
    r = arbInt(chance);
    switch(r) {
    case 0:
        return randomJsonList(chance+10);
    case 1:
        return randomJsonObject(chance+10);
    }

    return getOneFrom(jsonIntegers);
}

str randomJsonList(int chance) {
    return "[<intercalate(",", [randomJson(chance) | _ <- [0 .. arbInt(100)]])>]";
}
str randomJsonObject(int chance) {
    return "{}";  // TODO: fill in, similar to randomJsonList
}

Utility function for test-parsing a list of strings

bool testList(list[str] ss) {
    r = true;
    for(s <- ss) {
        if(s != "") {
            try {
                parse(#JsonValue, s);
            }
            catch _: {
                println("Failed to parse: <s>");
                r = false;
            }
        }
    }
    return r;
}

Make your own tests

test bool parseTestIntegers() {
    return testList(jsonIntegers);
}

test bool parseTestNumbers() {
    return testList(jsonNumbers);
}

test bool parseTestStrings() {
    return testList(jsonStrings);
}


test bool parseTestBasics() {
    return testList(jsonBasicValues);
}

Updated