camlspotter committed f23bfb7


  • Parent commits eceb3b0

Files changed (3)

+- If no change is required, the editor should be notified, so that it does not modify the buffer unnecessarily.
+- Cursor movement after reindentation.


+OCaml-indent: OCaml source code indenter in OCaml
+Code available at .
+Several tools are already available for reindenting OCaml source code in editors:
+- ocaml-mode for Emacs
+- tuareg-mode for Emacs 
+- ocaml.vim for Vim
+- omlet.vim for Vim
+They are all great, but they are written in languages other than OCaml, even though they are for OCaml programming.
+The drawback is clear: they are hard to customize (for OCaml programmers).
+Every OCaml user has their own taste in OCaml source code indentation.
+If an existing indenter provides configurable options for that taste, fine, but
+otherwise the user needs to hack it, and it is not written in OCaml... Or the user must obey someone else's taste. :-(
+For example, I have used tuareg for years and modified it for my style,
+but Elisp hacking is not so easy for me.
+So, I wanted to have something more OCaml friendly: an OCaml source code indenter written in OCaml itself.
+It should be much easier for me to adjust to my style, and probably easier for you to hack, too.
+As an external helper
+``ocaml-indent`` is a simple command line tool that takes OCaml source code text
+(and some command line options, of course) and prints out the reindented code.
+Each editor (Emacs, Vim, ...) must communicate with ``ocaml-indent``
+for interactive reindentation, and
+of course someone must write an extension for each editor,
+but the coding required should be minimal.
+For example, ``ocaml-indent.el`` for Emacs is only around 60 lines of code.
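The "text in, reindented text out" shape described above can be sketched roughly as follows. This is only an illustration of a filter-style helper, not the actual ``ocaml-indent`` code: ``read_all``, ``toy_reindent`` and ``filter`` are hypothetical names, and ``toy_reindent`` is a placeholder that merely strips leading blanks instead of computing real indentation.

```ocaml
(* Read the whole contents of an input channel into a string. *)
let read_all ic =
  let buf = Buffer.create 4096 in
  let chunk = Bytes.create 4096 in
  let rec loop () =
    let n = input ic chunk 0 (Bytes.length chunk) in
    if n > 0 then begin
      Buffer.add_subbytes buf chunk 0 n;
      loop ()
    end
  in
  loop ();
  Buffer.contents buf

(* Drop leading spaces and tabs from one line. *)
let strip_leading_blanks line =
  let n = String.length line in
  let rec first_non_blank i =
    if i < n && (line.[i] = ' ' || line.[i] = '\t')
    then first_non_blank (i + 1)
    else i
  in
  let i = first_non_blank 0 in
  String.sub line i (n - i)

(* Placeholder "indenter" (an assumption for illustration only):
   it removes all leading blanks rather than computing indentation. *)
let toy_reindent src =
  String.split_on_char '\n' src
  |> List.map strip_leading_blanks
  |> String.concat "\n"

(* The filter itself: read the source text, write the reindented text. *)
let filter reindent ic oc =
  output_string oc (reindent (read_all ic))

(* A command line entry point could then be as small as:
     let () = filter toy_reindent stdin stdout *)
```

With this shape, an editor extension only has to send the buffer text to the helper and replace the buffer with what comes back, which is why the editor side can stay small.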
+Lexer based
+``ocaml-indent`` is lexer based, and uses OCaml's ``lexer.mll``.
+OCaml is still an evolving language. 
+Its syntax is enriched with each version. CamlP4 modules also extend OCaml syntax.
+Therefore, ``ocaml-indent`` must be robust against these syntax (parser) changes,
+and cannot rely on a specific ``parser.mly``.
+(``parser.mly`` also drops all parentheses during parsing, which is another drawback.)
+On the other hand, its lexer (``lexer.mll``) is pretty stable.
+``lexer.mll`` ignores comments, but it is very easy to modify it to preserve them.
+The indent analysis of ``ocaml-indent`` is a small state machine that
+observes a stream of lexer tokens and updates its state, which includes the indentation level of each source line.
+The analysis does not know the complete syntax of the OCaml language, only a small and approximate part of it:
+for example, ``with`` must be paired with ``match``, ``try``, ``{``, ``type`` or ``exception``.
+(The pairing with ``type`` and ``exception`` is for type-conv P4 macros.)
+So far, the state update rules look simple enough, and easy to fix, update, and customize.
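To make the idea of a token-driven state machine concrete, here is a deliberately tiny sketch. It is not the real analysis: the ``token`` type is a hypothetical stand-in for the tokens of ``lexer.mll``, the state only tracks nesting depth, and rules such as the ``with`` pairing are not modelled at all.

```ocaml
(* Toy tokens: a hypothetical stand-in for the tokens produced by lexer.mll. *)
type token =
  | Open of string   (* opens a block, e.g. "begin" or "(" *)
  | Close of string  (* closes a block, e.g. "end" or ")" *)
  | Word of string   (* any other token *)
  | Newline          (* end of a source line *)

type state = {
  depth : int;            (* current nesting depth *)
  at_line_start : bool;   (* true until the first token of a line is seen *)
  levels : int list;      (* indentation level per line, newest first *)
}

let initial = { depth = 0; at_line_start = true; levels = [] }

(* Record [level] for the current line if no token has been seen on it yet. *)
let record st level =
  if st.at_line_start then level :: st.levels else st.levels

(* One state update per token. *)
let step st tok =
  match tok with
  | Newline ->
    (* an empty line is recorded at the current depth *)
    { st with at_line_start = true; levels = record st st.depth }
  | Open _ ->
    { depth = st.depth + 1; at_line_start = false; levels = record st st.depth }
  | Close _ ->
    (* a closer at the start of a line goes back to the enclosing depth *)
    let depth = max 0 (st.depth - 1) in
    { depth; at_line_start = false; levels = record st depth }
  | Word _ ->
    { st with at_line_start = false; levels = record st st.depth }

(* Run the machine over a token stream; one level per source line. *)
let indent_levels tokens =
  List.rev (List.fold_left step initial tokens).levels

(* Tokens for the five lines: begin / foo ( / bar / ) / end *)
let example =
  [ Open "begin"; Newline;
    Word "foo"; Open "("; Newline;
    Word "bar"; Newline;
    Close ")"; Newline;
    Close "end"; Newline ]
```

Here ``indent_levels example`` evaluates to ``[0; 1; 2; 1; 0]``. Customizing the style amounts to editing the cases of ``step``, which is the kind of small, local change the paragraph above has in mind.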
+No backward parsing
+``ocaml-indent`` does not perform backward parsing. It parses (lexes) from the head of the source file.
+Source code indenters are usually implemented as backward parsers:
+they parse the source code backward from the position of the reindentation target
+until enough information for the reindentation is obtained,
+thus minimizing the amount of parsing.
+In my adventures with parser combinators in OCaml
+( ), 
+I found that OCaml's lexer (``ocamllex`` and ``lexer.mll``) is extremely fast.
+For example, the OCaml lexer can lex all the ``*.ml`` and ``*.mli`` files in the OCaml source tree
+in less than 1 second. That is more than 400000 lines/sec.
+The largest FP source file I have ever seen in production was around 4000 lines
+(Don't ask me where I saw it :-) Of course, I immediately split it into several files.),
+and even that can be lexed in about 0.01 sec.
+Clearly, ``ocaml-indent`` has no need for backward parsing (lexing).
+It can just use the good old ``lexer.mll``.
+It can just use the good old ``lexer.mll``.
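As an illustration of the forward approach, the sketch below always starts at the head of the text and scans up to the target line. It is a toy under stated assumptions: ``depth_at_line`` is a hypothetical name, it looks at raw characters rather than ``lexer.mll`` tokens, and it ignores strings and comments. At the quoted rate of 400000 lines/sec, a 4000 line file costs about 4000 / 400000 = 0.01 sec for such a full forward pass.

```ocaml
(* Bracket nesting depth at the beginning of line [target] (0-based),
   computed by scanning [src] forward from its first character.
   Toy version: strings and comments are not recognized. *)
let depth_at_line src target =
  let n = String.length src in
  let depth = ref 0 in
  let line = ref 0 in
  let i = ref 0 in
  while !i < n && !line < target do
    (match src.[!i] with
     | '(' | '[' | '{' -> incr depth
     | ')' | ']' | '}' -> depth := max 0 (!depth - 1)
     | '\n' -> incr line
     | _ -> ());
    incr i
  done;
  !depth

(* Five lines: "let f = (", "  [", "    1", "  ]", ")" *)
let sample = "let f = (\n  [\n    1\n  ]\n)\n"
```

For ``sample``, ``depth_at_line sample 0`` is 0, ``depth_at_line sample 2`` is 2 and ``depth_at_line sample 4`` is 1. No state from a previous call is needed, so there is nothing to keep in sync with the editor buffer.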