mutated_ocaml / README.typeloc

typeloc branch: An attempt to improve unification error report.

This typeloc branch tries to print additional information at type unification errors, 
to show which type constructors actually contradict, with their source code locations of introduction.

The idea of this implementation is given by the implementation of Haskell type checker by Lennart Augustsson, demonstrated at his talk at CUFP 2011.

It should be also noted that the same idea was implemented in Opa langage. 
See .

What does it do?

The modified OCaml compiler prints out additional lines at type error reports.
For example, think about a simple language modeled by the following type t 
and a function names : t -> string list, which should extract tokens used in it::

    type t = 
      | Int of int
      | Null
      | List of t list
      | Tuple of t * t
      | String of string
    let rec names st = function
      | Int n -> n :: st 
      | Null -> st
      | List xs -> List.fold_left names st xs
      | Tuple (x,y) -> List.fold_left names st [x; y]
      | String s -> s :: st

Actually this code is rejected by the compiler::

    $ ocamlc -c
    File "", line 13, characters 16-17:
    Error: This expression has type string but an expression was expected of type

The error is reported at the last line of the code, the use of the variable s. 
It saids s is expected to have type int. But it is sure to be string and 
the programmer is writing a function which returns string... Nothing wrong around the line. Strange!
And why did the compiler confuse with string and int? Where did this int come from?

This example is so simple that it is easy to see that the problem is the first case which stores an integer to the list,
but generally speaking, in more complex function definitions we often face more complex instance of this sort of cryptic type errors,
whose fixes are actually far away from the reported error locations. 

This is since the ML type inference is actually a simple (but also powerful) constraint solver, using type unification.
Each syntax construct may introduce some constraints of types to the system, and once the infernece system finds
the accumulated constraints have no solution, it reports a type error with the source location which added the last constraint. 
But it does not always mean that this position is the place to be fixed. The programmer often feels::

     The type system tells that this expression of this position has this type and that type at the same time, but they conflict. Wow.
     But, first of all, I do not even understand why the system thinks this expression should have those types.

Can we improve this? Pin-pointing where to be fixed is of course impossible. 
The type inference is just a damn mathematical algorithm and cannot understand human intention, 
like "the function has the name "names" so it should return string list rather than int list".
But still there are things we can do . The typeloc branch provides additional information::

    The differing types are:
      string : introduced at File "", line 6, characters 14-20
      int : introduced at File "", line 2, characters 11-14

      #  GeeeZ! It is not a good example!!! The location should be at the use of variants not at the defs of them!!

This tells you that it found a conflict between the two type constructors string and int 
with the locations of syntax constructs which introduced them.
Each syntactic construct introduces very small type constraint and therefore easy to understand:
for example an expression 42 enforces its type to be an int.
Constructor (::) asks its second argument and return type to be some list. The programmer's feeling should now change::

     I understand this part of the code introduces this type constraint.
     I understand that part of the code introduces that type constraint.
     They are clearly incompatible.
     Then, who has introduced unification of them?

Of course this does not pin-point the place to be fixed, but you have more clues which are easier to understand.

Current status 
`Proof of concept' level implementation.

Technical details

In the typeloc branch, each type expression node (type_expr) has location information of where it is introduced.
If two type constructors with valid introduction locations are unified, one of them are chosen randomly for the unified type.
This extension introduces some complexity to the type system, and may impact on its efficiency:
the same type can be introduced many times by different source code locations.
For example, all the int types are all pointer equivalent in the original OCaml type checker, but 
it is no longer true in typeloc, with reflecting the fact that int can be introduced by difference part of source code.