Available at http://bitbucket.org/camlspotter/orakuda/ .

ORakuda consists of a library, CamlP4 extensions, and an optional tiny patch to CamlP4, which together provide a handy way to write OCaml scripts a la Perl (or other scripting languages). Its main features are:

  • PCRE expressions and matching with Perl-like syntax $/.../:

    str |! $/regexp/ -> ... | _ -> ...
  • Variable and expression references in strings $"...":

    $"You are ${name} and %{age}02d years old."
  • Sub-shell call by back-quotes $`...`:

    let status = $`wc` ~f:handle_output in ...
  • Easy hashtbl access tbl${key}:

    tbl${key} <- value for Hashtbl.replace tbl key value.

Some are reimplementations of existing CamlP4 extensions. ORakuda's main contribution is a more natural lexical interface for users of Perl and other scripting languages.

Name of the project

ORakuda has two meanings in Japanese:

  • 大(O)駱駝(Rakuda): 大(big) 駱駝(dromedary/bactrian camel)
  • おお(Oh)楽だ(Rakuda): "Oh, it's easy!"

A good name for Perlish OCaml, isn't it?

How to build

Requirements: OCaml 3.12.0, findlib, pcre-ocaml, omake, spotlib

  • check out the repo:

    hg clone https://bitbucket.org/camlspotter/orakuda

  • (OPTIONAL) Apply patch/camlp4-lexer-plugin-0.5.patch to the OCaml 3.12.0 source, then compile and install the compiler. The patch should work for versions up to 4.00.0 with trivial fixes.

  • build the library and pa_* extensions: run yes no | omake, then omake install

  • omake top_test launches a toplevel with the extension

How to use

For easier access to functions, users are recommended to declare:

open Rakuda.Std

at the beginning of their source file.

(OPTIONAL) $ Prefix

If you patch your CamlP4, ORakuda extends OCaml's lexer and introduces new lexer rules prefixed by the character $. $ means Perl (and of course money, you know). The character $ can still be used to define operators, as long as they do not conflict with the ORakuda extensions.

If you use the original CamlP4, you cannot use the $ prefix syntax, but you can still use normal CamlP4 quotations to write the same things.

Perl like PCRE $/.../ (or <:m<...>>)


$/regular expression/flags      (with patched P4)
<:m<regular expression>>
<:m<regular expression/flags>>

flags ::= [imsxU8]*    8 is for `UTF8

A $/.../ expression creates a PCRE expression. You can write regexps more naturally than with Pcre.regexp "...", where you have to escape '\' characters: Pcre.regexp "function\\(arg\\)" can be written more simply as $/function\(arg\)/ or <:m<function\(arg\)>>.
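To see the escaping issue concretely, here is a small sketch in plain OCaml, so it runs without the extension. It assumes OCaml 4.02 or later for quoted string literals {|...|}; these are not part of ORakuda and are used here only to show the characters the regexp engine actually receives.

```ocaml
(* An ordinary string literal needs every backslash doubled. *)
let escaped = "function\\(arg\\)"

(* A quoted string literal (OCaml >= 4.02) takes the text verbatim,
   which is the form one writes inside $/.../ or <:m<...>>. *)
let verbatim = {|function\(arg\)|}

let () =
  (* Both literals denote the same pattern text. *)
  assert (escaped = verbatim);
  print_endline escaped
```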

PCRE creation $/.../ or <:m<...>>

The type of a $/.../ expression is not Pcre.regexp but 'a Regexp.t, where the type parameter 'a encodes accessor information for the regexp's groups. See GROUP OBJECT METHODS for details:

# $/(hello)world(?P<name>[a-z]+)/;;
- : < _0 : string; _1 : string; _2 : string; _group : int -> string;
      _groups : string array; _left : string;
      _named_group : string -> string;
      _named_groups : (string * string) list; _right : string;
      _unsafe_group : int -> string; name : string >

Outside the toplevel, a regular expression written as $/.../ is defined just ONCE at the top of the source file, no matter where it appears. Therefore using $/.../ expressions inside frequently called functions incurs NO runtime penalty.
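The effect of defining the regexp once can be pictured in plain OCaml. The sketch below is not what the extension generates: compile is a hypothetical stand-in for an expensive regexp compilation, and compile_count is a hypothetical counter recording how often it runs.

```ocaml
(* Hypothetical stand-in for an expensive regexp compilation. *)
let compile_count = ref 0

let compile (pattern : string) : string =
  incr compile_count;
  pattern

(* Hoisted form: the pattern is compiled once, when the module is loaded. *)
let hoisted_re = compile "[a-z]+"
let uses_hoisted (_ : string) = ignore hoisted_re

(* Naive form: the pattern is recompiled on every call. *)
let uses_inline (_ : string) = ignore (compile "[a-z]+")

let () =
  for _i = 1 to 1000 do uses_hoisted "x" done;
  let after_hoisted = !compile_count in
  for _i = 1 to 1000 do uses_inline "x" done;
  Printf.printf "hoisted: %d compilation(s), inline: %d more\n"
    after_hoisted (!compile_count - after_hoisted)
```

The hoisted version pays the compilation cost once regardless of how many times the function is called, which is the behaviour the paragraph above describes for $/.../.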


Regexp.exec_exn, (=~) and Regexp.exec perform simple regexp matching and return a group object when the match succeeds. The matched groups can be retrieved through its methods:

# let res = "123 + variable12;;" =~ $/([a-z_][_A-Za-z0-9]*)/;;
val res :
< _0 : string; _1 : string; _group : int -> string; _groups : string array;
  _left : string; _named_group : string -> string;
  _named_groups : (string * string) list; _right : string;
  _unsafe_group : int -> string > =


_0 .. _9
Correspond to $0 .. $9 in a Perl regexp match
_left, _right
Correspond to $` and $'
Accessors for named groups are methods named after the group, which is declared with the Python extension syntax for named groups: (?P<name>regexp)
_named_group, _named_groups, _unsafe_group
More primitive group accessor methods
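As a rough illustration of what such a group object looks like, the following plain OCaml sketch builds an object with _0, _left and _right from a subject string and a match span. The function group_object and its ~start and ~len parameters are hypothetical; ORakuda's real objects are produced by the matcher and carry more methods.

```ocaml
(* Build a minimal group object from a subject string and the span of the
   whole match. [start] and [len] are supplied by hand here; a real
   matcher would compute them. *)
let group_object (subject : string) ~(start : int) ~(len : int) =
  object
    method _0 = String.sub subject start len
    method _left = String.sub subject 0 start
    method _right =
      let stop = start + len in
      String.sub subject stop (String.length subject - stop)
  end

let () =
  let subject = "123 + variable12;;" in
  (* Assumed span of the identifier in the subject: offset 6, length 10. *)
  let res = group_object subject ~start:6 ~len:10 in
  Printf.printf "_0=%S _left=%S _right=%S\n" res#_0 res#_left res#_right
```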

Perl like PCRE case match $/.../ -> ...


$/regular expression/flags as var -> e
| $/regular expression/flags as var -> e
| ...
| _ -> e

The PCRE case match expression, of the form $/.../ -> e, lets you write multiple regular expression match cases together.

The case match expression is a |-separated list of cases $/regular expression/flags as var -> e, optionally ended by a default case _ -> e. The whole expression is a function that takes a string and matches it against each regexp from top to bottom. When the regexp of a case $/regexp/flags as var -> e matches the string, the expression e is evaluated with var bound to the group object. If e does not use var, the binding can be omitted: $/regexp/flags -> e. When none of the regexps matches and a default case _ -> e exists, its e is evaluated. If there is no default case, the function raises Not_found.

A PCRE case match expression is best used with the pipe operator (|!) or (|>) (available in Jane Street Core or OCaml Batteries Included):

# "variable123"
  |! $/^[0-9]+$/ as v -> `Int (int_of_string v#_0)
  |  $/^[a-z][A-Za-z0-9_]*$/ as v -> `Variable v#_0
  |  _ -> failwith "parse error";;

- : [> `Int of int | `Variable of string ] = `Variable "variable123"
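The dispatch rules described above (try each case top to bottom, fall back to the default, otherwise raise Not_found) can be sketched in plain OCaml without the extension. Everything here is an illustrative assumption: case_match, int_case, var_case and parse are hypothetical names, and hand-written character tests stand in for the two regexps of the example.

```ocaml
(* Pipe operator, as provided by Core or Batteries. *)
let ( |! ) x f = f x

(* A case is modelled as a function returning [Some result] on a match.
   Cases are tried in order; the optional default runs when none match. *)
let case_match ?default (cases : (string -> 'a option) list) (s : string) : 'a =
  let rec go = function
    | [] ->
      (match default with
       | Some f -> f ()
       | None -> raise Not_found)
    | case :: rest ->
      (match case s with
       | Some r -> r
       | None -> go rest)
  in
  go cases

(* Hand-written stand-ins for ^[0-9]+$ and ^[a-z][A-Za-z0-9_]*$ . *)
let for_all p s =
  let rec go i = i >= String.length s || (p s.[i] && go (i + 1)) in
  go 0

let is_digit c = c >= '0' && c <= '9'
let is_lower c = c >= 'a' && c <= 'z'
let is_ident_char c =
  is_digit c || is_lower c || (c >= 'A' && c <= 'Z') || c = '_'

let int_case s =
  if s <> "" && for_all is_digit s then Some (`Int (int_of_string s)) else None

let var_case s =
  if s <> "" && is_lower s.[0] && for_all is_ident_char s
  then Some (`Variable s)
  else None

let parse s =
  s |! case_match [int_case; var_case]
         ~default:(fun () -> failwith "parse error")

let () =
  match parse "variable123" with
  | `Int n -> Printf.printf "Int %d\n" n
  | `Variable v -> Printf.printf "Variable %s\n" v
```

Leaving out ~default makes the function raise Not_found when no case matches, mirroring the behaviour without a _ -> e case.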

Perl like PCRE substitution $s/.../.../ or <:s<.../...>>


$s/regular expression/template/flags      (with patched P4)
<:s<regular expression/template>>
<:s<regular expression/template/flags>>

Perl like sprintf. $"..." or <:qq<...>>


$"..."         (with patched P4)

Shorthand for Printf.sprintf "..." with variables and expressions embedded inline using $-notation. It runs faster than Printf.sprintf, since the format string is interpreted at compile time.


$"... $foo123 ..."
    Equivalent to Printf.sprintf "... %s ..." foo123

$"... ${Hashtbl.find tbl k} ..."
    Equivalent to Printf.sprintf "... %s ..." (Hashtbl.find tbl k)

    Equivalent to Printf.sprintf "...%02d..." var

    Equivalent to
        fun s -> Printf.sprintf "...%s...%02d..." s var
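The right-hand sides of these equivalences are ordinary Printf.sprintf calls, so they can be tried without the extension. The sketch below uses sample bindings (foo123, var, tbl and the key "k" are illustrative values, not part of ORakuda) to show what each form produces.

```ocaml
(* Sample bindings, chosen only for illustration. *)
let foo123 = "world"
let var = 7
let tbl : (string, string) Hashtbl.t = Hashtbl.create 8
let () = Hashtbl.replace tbl "k" "value"

(* Plain-OCaml sides of the equivalences listed above. *)
let a = Printf.sprintf "... %s ..." foo123
let b = Printf.sprintf "... %s ..." (Hashtbl.find tbl "k")
let c = Printf.sprintf "...%02d..." var

(* A leftover %s in the format makes the whole expression a function. *)
let d = fun s -> Printf.sprintf "...%s...%02d..." s var

let () = List.iter print_endline [a; b; c; d "x"]
```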

Perl like sub-shell call $`...` or <:qx<...>>


$`command line`         (with patched P4)
<:qx<command line>>

Runs command line in a sub-shell and retrieves its stdout/stderr output through a function func. The expression is replaced by a call to a function named command on the string, so you need to provide this command function in the surrounding context. For example, you can use Spotlib.Spot.Unix.shell_command.

Perl like hashtbl access tbl${key}

Macros provide Perl like syntax for Hashtbl.xxx functions.


tbl${key}             Hashtbl.find tbl key
tbl${key} <- data     Hashtbl.replace tbl key data

tbl$+{key}            Hashtbl.find_all tbl key
tbl$+{key} <- data    Hashtbl.add tbl key data

tbl$?{key}            Hashtbl.mem tbl key
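The difference between the <- forms comes from the standard library: Hashtbl.replace overwrites the current binding, while Hashtbl.add shadows it, which is why find_all exists. A short plain-OCaml sketch of the functions these macros expand to (the table and keys are sample values):

```ocaml
let tbl : (string, int) Hashtbl.t = Hashtbl.create 8

let () =
  (* Corresponds to tbl${key} <- data: replace overwrites. *)
  Hashtbl.replace tbl "k" 1;
  Hashtbl.replace tbl "k" 2;
  assert (Hashtbl.find tbl "k" = 2);
  assert (Hashtbl.find_all tbl "k" = [2]);

  (* Corresponds to tbl$+{key} <- data: add shadows the old binding. *)
  Hashtbl.add tbl "k" 3;
  assert (Hashtbl.find tbl "k" = 3);
  assert (Hashtbl.find_all tbl "k" = [3; 2]);

  (* Corresponds to tbl$?{key}: membership test. *)
  assert (Hashtbl.mem tbl "k");
  assert (not (Hashtbl.mem tbl "missing"))
```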