Matching on preexisting syntax properties tends to produce poor error messages

Issue #55 new
Alexis King created an issue

Using the define-typed-syntax clause that matches against a preexisting syntax property on the input syntax (that is, the form that corresponds to “checking” rules rather than “synthesis” rules) tends to produce extremely poor type errors. For example, take this extremely simple language with a canonical definition of λ:

; language.rkt
#lang turnstile

(provide #%module-begin #%top ann λ
         (typed-out [unit Unit]) (type-out -> Unit))

(define-base-type Unit)
(define unit 'unit)

(define-type-constructor -> #:arity = 2)

(define-typed-syntax (ann e:expr τ:type) 
  [ e  e-  τ.norm]
  -------------------------
  [ e-  τ.norm])

(define-typed-syntax λ
  [(_ x:id e:expr)  (~-> τ_in τ_out) 
   [[x  x- : τ_in]  e  e-  τ_out]
   -------------------------
   [ (λ- (x-) e-)]]
  [(_ [x:id {~datum :} τ_in:type] e:expr) 
   [[x  x- : τ_in.norm]  e  e-  τ_out]
   -------------------------
   [ (λ- (x-) e-)  (-> τ_in τ_out)]])

Now, consider this simple sample program:

#lang s-exp "language.rkt"
(ann (λ x unit) Unit)

This produces the following, quite poor, error message:

λ: bad syntax

The reason for this is obvious enough, but it is still unhelpful. The (~-> τ_in τ_out) pattern fails to match, so syntax-parse backtracks, but the second pattern also fails to match. Since syntax-parse can’t determine which pattern made more progress, it produces a simple “bad syntax” error, which is quite misleading for the user.

One way to fix this is to add a ~! to the first pattern somewhere to force syntax-parse to commit to the pattern, like this:

(define-typed-syntax λ
  [(_ x:id e:expr)  {~and ~! (~-> τ_in τ_out)} 
   [[x  x- : τ_in]  e  e-  τ_out]
   -------------------------
   [ (λ- (x-) e-)]]
  [(_ [x:id {~datum :} τ_in:type] e:expr) 
   [[x  x- : τ_in.norm]  e  e-  τ_out]
   -------------------------
   [ (λ- (x-) e-)  (-> τ_in τ_out)]])

This produces a better error message, though still not perfect:

λ: Expected -> type, got: Unit

This error is still pretty confusing, though, since really, the “expected” type is Unit, and the “actual” type is a function. Again, it makes sense when considering how syntax-parse synthesizes an error message, but it’s still poor.

It’s possible to manually produce a type error, of course, which allows the creation of a much nicer message:

(define-typed-syntax λ
  [(_ x:id e:expr)  τ_λ 
   #:fail-unless (->? #'τ_λ) (~a "expected " (type->str #'τ_λ) ", given a function")
   #:with (~-> τ_in τ_out) #'τ_λ
   [[x  x- : τ_in]  e  e-  τ_out]
   -------------------------
   [ (λ- (x-) e-)]]
  [(_ [x:id {~datum :} τ_in:type] e:expr) 
   [[x  x- : τ_in.norm]  e  e-  τ_out]
   -------------------------
   [ (λ- (x-) e-)  (-> τ_in τ_out)]])

…but this feels rather against the spirit of the Turnstile DSL. It would be nice if Turnstile could help to synthesize a better error message itself.

Comments (12)

  1. Alex Knauth

    You don't have to go all the way to #:fail-unless to get a more specific error message, even this does that:

      [(_ x:id e:expr)  τ_λ 
       #:with (~-> τ_in τ_out) #'τ_λ
       ....]
    

    It might look like these should be equivalent, but they aren't in errors because of progress. In various places in Turnstile we've put ~post patterns in to prioritize error messages. If there are two different classes errors that have the same priority, you get bad syntax. The #:with patterns are set up to have more progress than the initial pattern, so a #:with will give it more priority.

    Should I add a ~post to the expected type pattern here to give its errors higher priority? With that change this becomes

    . λ: Expected -> type, got: Unit
    

    While highlighting the Unit in (ann (λ x unit) Unit). Would that be good enough? The other question is whether this would mess up any other error messages.

  2. Alexis King reporter

    Yes, that error is equivalent to my example using a cut, so it’s certainly far better than “bad syntax” without any additional explanation. It’s still considerably worse than the handwritten error message, though, for the same reason I explained above. On the other hand, that error isn’t any worse than this one, for the same reason:

    (define-typed-syntax #%app
      [(_ f:expr e:expr) 
       [ f  f-  (~-> τ_in τ_out)]
       [ e  e-  τ_in]
       -------------------------
       [ (#%app- f- e-)  τ_out]])
    
    (unit unit)
    
    #%app: Expected -> type, got: Unit in: (#%app Unit2)
    

    This error is slightly better in some respects (the expected/actual types are in the expected order), but it also manages to be considerably worse, since it doesn’t actually highlight the right source location!

    I think, for both of these scenarios, it’s reasonable to ask Turnstile to provide more intelligent type errors. I think all the necessary information is available, but it likely requires helping syntax-parse a little more than Turnstile already does (and possibly relying a little bit less on the built-in error message synthesis, cute as it currently is).

  3. Stephen Chang repo owner

    Thanks for the report. The bad syntax is definitely unacceptable. I think it makes sense to add a ~post to the expected type (even though it looks like we already have a few) since the first clause has made more progress than the second one due to the matched expression.

    I also agree about the err msg. It does look like we already try to do the right thing:

    https://bitbucket.org/stchang/macrotypes/src/31c3bba5c976248fc7f19b540cc631aace710e0c/turnstile/turnstile.rkt?at=master&fileviewer=file-view-default#turnstile.rkt-51

    so I will look into why that msg is not firing.

  4. Stephen Chang repo owner

    I'm about to push a fix. One small problem is that Turnstile does not know what the type should be, other than a pattern, which is an implementation detail. Do you think it's ok to expose this pattern to the programmer?

    Here's what the message would say:

    ; language-prog.rkt:3:5: TYPE_ERR: expression cannot have the type expected by its context
    ;   expr: (λ x unit)
    ;   context expects type: Unit
    ;   expr must have type matching pattern: (~-> τ_in τ_out)
    
  5. Alex Knauth

    Should we let the pattern specify what it expects? Currently the ~-> pattern says Expected -> type, but other pattern expanders (for example a more complicated ~>* pattern that handles keywords, optional arguments, and rest arguments) should be free to specify failure messages in different ways.

    Just adding ~post allows that, but at the cost of not knowing that it's a problem with the expected type as opposed to the expression. Would a ~post + a ~describe pattern give the best of both?

    (~post (~parse (~describe "expected-type" τ-pat) #'τ))
    
    .../bad-expected-type.rkt:26.5: λ: Expected -> type, got: Unit
      at: (#%app Unit2)
      in: (λ x unit)
      parsing context: 
       while parsing expected-type
        term: (#%app Unit2)
        location: .../bad-expected-type.rkt:26.16
    

    Another thing to consider, there could be multiple \Leftarrow rules with different expected types, for example one for ~->, one for ~->*, and one for ~case->. In those cases what should the error message be?

  6. Alexis King reporter

    Do you think it's ok to expose this pattern to the programmer?

    I considered that. Personally, I think exposing the pattern to the user isn’t good enough. That pattern could be arbitrarily complex, and I don’t think the two things should be necessarily coupled. The pattern really ought to be an implementation detail.

    Consider syntax/parse itself for a moment. It is an industrial-strength library for parsing macros, and it provides error reporting. It works for everything from the simplest macros to the most complex, so its error reporting must similarly scale. Since it’s clear that error messages cannot always be synthesized without any help, syntax/parse provides lots of ways to help and/or tweak the error generation when the defaults aren’t smart enough. It allows macro authors to annotate patterns and use classes. Importantly, though, these tools for improving error messages can often be added just once, in a syntax class, and it doesn’t really require users to break away from the nice, declarative DSL in order to get good errors.

    Similarly, I think Turnstile should ideally be an industrial-strength library for creating type systems. Obviously, it will have constraints, but it would be nice if it could scale similarly well. It’s fine to request that users add additional annotations in order to get the best error messages. For example, consider syntax classes: in order for syntax/parse to generate good messages, users usually have to provide a #:description clause. Otherwise, syntax/parse just generates best-effort messages.

    Applying that line of thinking to Turnstile, there are a couple of principles that make sense:

    1. Turnstile should synthesize best-effort error messages, but these error messages don’t have to be perfect.
    2. Error messages should be improvable by adding user-provided annotations, where desired.
    3. Annotations should ideally only needed to be added in one place, and they should not compromise the declarative nature of the DSL.

    The obvious place to put this annotation is define-type-constructor, since define-type-constructor has some parallels with define-syntax-class. Consider being able to do this:

    (define-type-constructor ->
      #:arity = 2
      #:description "function type")
    

    Without the #:description annotation, it should probably be inferred, so something like -> type, which is similar to the existing error message but does not reveal the underlying pattern.

  7. Alex Knauth

    Exactly, type patterns should be able to specify their failure messages for Turnstile or syntax-parse to use.

  8. Stephen Chang repo owner

    Actually, adding #:description to define-type-constructor does not solve the problem since the test message will still say that that type is the "expected type", which makes sense from the pattern-expander's perspective.

    Concretely, this example would still error with "expected function type, got Unit". Is this ok?

    In order to produce the better message, the error must be external to the type, ie from the rule. The rule would need a #:expected-type-description or something, which feels less nice.

    Will continue to think about this.

  9. Alex Knauth

    That's where the the extra ~describe pattern on the other end helps a little bit. When Turnstile matches the expected type against the pattern, it currently does this:

    (~parse τ-pat #'τ)
    

    If we change it to this:

    (~post (~parse (~describe "expected-type" τ-pat) #'τ))
    

    Then the error message will include while parsing expected-type. Is that good enough?

  10. Stephen Chang repo owner

    Yes, that's where I currently have the new err msg (in my unpushed commit). But the rule still needs to specify it somewhere that looks nice.

  11. Alexis King reporter

    Just reusing syntax/parse patterns is cute and elegant, but perhaps it’s not necessarily worth tethering yourselves to them? If it isn’t possible to get good enough type errors just from syntax classes and syntax/parse’s default error synthesis, it would be possible to create separate type patterns that attach the necessary static information to generate good errors.

  12. Log in to comment