Overview

EDSL For Parsing S-Expressions

This is an experiment. When opening Sexp_parser_edsl you get a Meta_parser module implementing the META_PARSER interface, which provides construction functions for building S-Expression parsing specifications in an embeded DSL.

Example

Demo

Let's say we define an AST for a basic language:

type expr =
  | Int of int
  | Float of float
  | Binop_plus of expr * expr

We can define a parser specification:

open Sexp_parser_edsl
let parser =
  let open Meta_parser in
  let rec expr_gram =
    lazy (Lazy.force (  (* necessary hack to write recursive values *)
        try_in_order ~name:"DSL Expression" [
          apply integer ~f:(fun i ~loc -> `Ok (Int i));
          apply float   ~f:(fun s ~loc -> `Ok (Float s));
          apply (tagged ~tag:"+" (sequence expr_gram))
            ~f:(fun (plusses: expr list) ~loc ->
                match plusses with
                | [] -> `Error (`wrong_plus_expression loc)
                | one :: [] -> `Ok (one) (* We optimize `(+ one)` to `one` *)
                | one :: more ->
                  `Ok (ListLabels.fold_left ~init:one more ~f:(fun prev another ->
                      (Binop_plus (prev, another)))))
        ]
      )) in
  expr_gram

OCaml types this specification as:

val parser :
  (expr,
   _[> `wrong_plus_expression of Sexp_parser_edsl.Meta_parser.location ])
  Sexp_parser_edsl.Meta_parser.t = <lazy>

which means that parser is parser specification (Meta_parser.t) that parses to expr, and the standard parsing errors (see META_PARSER.syntax_error) will be augmented with the wrong_plus_expression case.

Then, we define the parse_expr function using the parser specification:

let parse_expr input =
  Meta_parser.parse_string parser input
     ~syntax_error:(fun loc e -> `Syntax (loc, e))
val parse_expr :
  string ->
  (expr,
   _[> `Syntax of
         Sexp_parser_edsl.Meta_parser.location *
         _[> `no_matching_rule of string
           | `not_a_float of string
           | `not_an_integer of string
           | `nothing_left_to_try of string option * string
           | `sexp of exn ]
     | `wrong_plus_expression of Sexp_parser_edsl.Meta_parser.location ])
  Sexp_parser_edsl.result = <fun>

and we can start testing:

let e1 = parse_expr "(+ 42 (+ 4.2 51) (+ 1))"
val e1 :
  (expr, ...)
  Sexp_parser_edsl.result =
  `Ok
    (Binop_plus (Binop_plus (Int 42, Binop_plus (Float 4.2, Int 51)), Int 1))

we see that the "(+ 1)" has been parsed to Int 1 (see special case above).

Another example, when the parsing fails:

let e2 = parse_expr "(+ 42 (+ 4.2 51) (+) (+ 1))"

the error value gives the location of the whole Try_in_order expression, more refined locations are future work:

val e2 :
  ... Sexp_parser_edsl.result =
  `Error
    (`Syntax
       ({Sexp_parser_edsl.Meta_parser.start = (1, 0);
         Sexp_parser_edsl.Meta_parser.stop = (1, 26)},
        `nothing_left_to_try
          (Some "DSL Expression",
           "[DSL Expression]: try [{int}] then [{float}] then \
           [{{parse [Kwd +] and continue with [sequence [...]]}}]")))

Details

Let's see some details about the previous example:

let parser =
  let open Meta_parser in
  let rec expr_gram =
    lazy (Lazy.force (
  ...

the value describing the grammar in the EDSL is recursive since the grammar is receursive, it uses let rec and hence needs a lazy (Lazy.force ...) hack to be accepted.

Then we use the functions of the Meta_parser module to build the “program”:

# Meta_parser.try_in_order;;
- : ?name:string ->
    ('a, 'b) Sexp_parser_edsl.Meta_parser.t list ->
    ('a, 'b) Sexp_parser_edsl.Meta_parser.t
    = <fun>

The function try_in_order takes a list of grammars and tries them sequentially until one succeeds.

# Meta_parser.apply;;
- : ('a, 'error) Sexp_parser_edsl.Meta_parser.t ->
    f:('a ->
       loc:Sexp_parser_edsl.Meta_parser.location ->
       ('c, 'error) Sexp_parser_edsl.result) ->
    ('c, 'error) Sexp_parser_edsl.Meta_parser.t
= <fun>

To convert a the result of the parsing of a given grammar one must apply a function to it.

# let natural x ~loc = if x >= 0 then `Ok x else `Error (`not_positive (loc, x));;
val natural :
  int -> loc:'a -> [> `Error of [> `not_positive of 'a * int ] | `Ok of int ] =
  <fun>
# let natural_parser = Meta_parser.( apply integer ~f:natural);;
val natural_parser :
  (int, _[> `not_positive of Sexp_parser_edsl.Meta_parser.location * int ])
  Sexp_parser_edsl.Meta_parser.t = <lazy>

hence natural_parser is a parser which tries to parse positive integers and adds not_positive to the set of possible errors.