
OCaml-indent: OCaml source code indenter in OCaml

Code available at https://bitbucket.org/camlspotter/ocaml-indent/wiki/Home .

Motivation

Several tools are already available for reindenting OCaml source code in editors:

  • ocaml-mode for Emacs
  • tuareg-mode for Emacs
  • ocaml.vim for Vim
  • omlet.vim for Vim

They are all great, but none of them is written in OCaml, even though they are for OCaml programming.

The drawback is clear: they are hard to customize (for OCaml programmers). Every OCaml user has their own taste in source code indentation. If an existing indenter provides configuration options that match that taste, fine. Otherwise they need to hack the indenter, but it is not written in OCaml... Or they must obey someone else's taste. :-(

For example, I have used tuareg for years and have modified it for my own style, but Elisp hacking is not so easy for me.

So I wanted something more OCaml friendly: an OCaml source code indenter written in OCaml itself. It should be much easier for me to adapt to my style, and probably easier for you to hack, too.

Design - As simple as possible

As an external helper

ocaml-indent is a simple command line tool: it takes OCaml source code text (and some command line options, of course) and prints out the reindented code.

For interactive reindentation, each editor (Emacs, Vim, ...) must communicate with ocaml-indent, so someone has to write an extension for each editor, but the amount of code should be minimal. For example, ocaml-indent.el for Emacs is only around 60 lines of code.
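To make the "external helper" shape concrete, here is a minimal sketch of a filter that reads lines, re-emits each one at a given column, and prints the result. All names below (strip_leading, reindent, run, compute_columns) are hypothetical and chosen for illustration; they are not ocaml-indent's actual functions or options, and the column computation itself is left as a parameter.

```ocaml
(* Sketch of the filter shape only: read lines, reindent, print.
   Every name here is hypothetical, not ocaml-indent's real API. *)

(* Drop leading spaces and tabs from a line. *)
let strip_leading (s : string) : string =
  let n = String.length s in
  let rec first i =
    if i < n && (s.[i] = ' ' || s.[i] = '\t') then first (i + 1) else i
  in
  let i = first 0 in
  String.sub s i (n - i)

(* Re-emit one line at the given column; blank lines stay empty. *)
let reindent_line (column : int) (line : string) : string =
  let body = strip_leading line in
  if body = "" then "" else String.make column ' ' ^ body

(* Apply one column per line.
   Raises Invalid_argument if the two lists differ in length. *)
let reindent (columns : int list) (lines : string list) : string list =
  List.map2 reindent_line columns lines

(* Read every line of a channel until end of file. *)
let read_lines (ic : in_channel) : string list =
  let rec loop acc =
    match input_line ic with
    | line -> loop (line :: acc)
    | exception End_of_file -> List.rev acc
  in
  loop []

(* The whole tool as a filter. [compute_columns] stands in for the
   indentation analysis, which is assumed to exist elsewhere. *)
let run ~(compute_columns : string list -> int list)
    (ic : in_channel) (oc : out_channel) : unit =
  let lines = read_lines ic in
  List.iter
    (fun l -> output_string oc l; output_char oc '\n')
    (reindent (compute_columns lines) lines)
```

With this shape, an editor extension only has to send the buffer text to the process and read the result back, which is why the editor side can stay small.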

Lexer based

ocaml-indent is lexer based, and uses OCaml's lexer.mll.

OCaml is still an evolving language, and its syntax is enriched with each version. CamlP4 modules also extend the OCaml syntax. Therefore ocaml-indent must be robust against these syntax (parser) changes and cannot rely on one specific parser.mly. (parser.mly also drops all parentheses during parsing, which is another drawback.)

On the other hand, the lexer (lexer.mll) is pretty stable. lexer.mll discards comments, but it is very easy to modify it to preserve them.

The indentation analysis of ocaml-indent is a small state machine that observes a stream of lexer tokens and updates its state, which includes the indentation level of each source line. The analysis does not know the complete syntax of the OCaml language, only a small and approximate part of it: for example, with must be paired with match, try, {, type or exception. (The pairing with type and exception is for type-conv P4 macros.) So far, the state update rules look simple enough, and easy to fix, update and customize.
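The following toy sketch illustrates the general idea of such a token-driven state machine. It is not the actual ocaml-indent code: the token type is a drastically simplified stand-in for the tokens of lexer.mll, the pairing rules cover only begin/end, parentheses and match/with, and the function names are invented for this example. Indentation is expressed as a nesting level rather than a column.

```ocaml
(* Hypothetical, simplified tokens; the real tool uses lexer.mll's tokens. *)
type token = BEGIN | END | LPAREN | RPAREN | MATCH | WITH | OTHER

(* Pop the stack up to and including the nearest [opener]. *)
let rec pop_until (opener : token) (stack : token list) : token list =
  match stack with
  | [] -> []
  | t :: rest when t = opener -> rest
  | _ :: rest -> pop_until opener rest

(* State update: the state is the stack of currently open constructs. *)
let update (stack : token list) (tok : token) : token list =
  match tok, stack with
  | (BEGIN | LPAREN | MATCH), _ -> tok :: stack
  | END, _ -> pop_until BEGIN stack
  | RPAREN, _ -> pop_until LPAREN stack
  | WITH, MATCH :: rest -> WITH :: rest  (* "with" pairs with "match" *)
  | (WITH | OTHER), _ -> stack

(* Level of a line, decided by the first token on that line. *)
let indent_for (stack : token list) (tok : token) : int =
  match tok with
  | END | RPAREN -> List.length (update stack tok)  (* align with opener *)
  | WITH -> max 0 (List.length stack - 1)           (* align with "match" *)
  | _ -> List.length stack

(* Walk (line number, token) pairs from the head of the file and
   return (line number, level) for every line that has a token.
   Line numbers are assumed to start at 1. *)
let indent_levels (tokens : (int * token) list) : (int * int) list =
  let _, _, acc =
    List.fold_left
      (fun (stack, last_line, acc) (line, tok) ->
         let acc =
           if line <> last_line then (line, indent_for stack tok) :: acc
           else acc
         in
         (update stack tok, line, acc))
      ([], 0, []) tokens
  in
  List.rev acc
```

Customizing the style in a design like this means editing a few cases of update or indent_for, without touching any grammar.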

No backward parsing

ocaml-indent does not perform backward parsing. It parses (lexes) from the head of the source file.

Source code indenters are usually implemented as backward parsers: they parse the source code backward from the position of the reindentation target until they have enough information to reindent it. This minimizes the amount of parsing.

In my adventure with parser combinators in OCaml ( http://camlspotter.blogspot.com/2011/05/planck-small-parser-combinator-library.html ), I found that OCaml's lexer (ocamllex and lexer.mll) is extremely fast. For example, the OCaml lexer can process all the *.ml and *.mli files in the OCaml source tree in less than 1 second. That is more than 400000 lines/sec. The largest FP source file I have ever seen in production is around 4000 lines (Don't make me tell you where I saw it :-) Of course, I immediately split it into several files.), and even that can be lexed in about 0.01 sec.

Clearly, ocaml-indent has no need for backward parsing (lexing). It can just use the good old lexer.mll.

So, what now?

  • Testing, testing and testing.
  • Better editor interactivity. For example, if no change is required, it should not try to modify the contents being edited.
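
One possible way to approach the second item, sketched under the assumption that both the original and the reindented text are available as lists of lines: report only the lines that actually differ, so that an editor extension can skip the update entirely when the result is empty. The function name changed_lines is hypothetical and not part of ocaml-indent.

```ocaml
(* Return (1-based line number, new text) for each line that differs.
   An empty result means the editor buffer can be left untouched.
   Raises Invalid_argument if the two lists differ in length.
   Requires OCaml 4.08 or later for List.filter_map. *)
let changed_lines (original : string list) (reindented : string list)
  : (int * string) list =
  List.combine original reindented
  |> List.mapi (fun i (o, r) -> (i + 1, o, r))
  |> List.filter_map (fun (n, o, r) ->
         if String.equal o r then None else Some (n, r))
```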

Future work?

I now have a general lexer based reindentation framework, which is easy to modify and customize. This means that I can easily build an indentation mode for my own future language!