> The job of a compiler error message is to prove to the user that their code is invalid
The job of the compiler error message is to convey why the compiler couldn’t demonstrate the code is correct. The code may be valid but the compiler just can’t prove it (analyses are sound but not complete unless the language is strongly normalizing)
Practically the difficulty of generating error messages is one reason why "we can't have nice things". I ran into this problem while trying to write an adventure game (like Zork) in Drools. Drools sucks in a lot of stuff from a Java compiler and mashes it into a production rules framework and the result is I was getting error messages that made no sense at all and thus gave up on the project.
One approach to compiler development is one of successive transformations and it is possible to cut compilers into very thin slices like the original FORTRAN for the IBM 1401. Similarly you can go beyond the methods taught in Graham's On Lisp to do rather complex metaprogramming in Lisp or some other language, trouble is that an error manifests at stage N+5 but the information required to explain what went wrong was elided at stage N so it is quite difficult to figure out what's going wrong.
Sure, but the goal when designing a compiler and its error messages is to prove to the user that their code in invalid. Most practical type systems are going to be either unsound or incomplete (or both), but the goal should be to make encountering those limitations rare in practice.
It seems to me that, if the limitations are encountered too often in practice, then people stop using that language to write that kind of code, because it's too frustrating to use that way.
A distinction without a difference. Perhaps "prove" is sounding too mathematical. I could rephrase it as "the goal...is to demonstrate to the user what is invalid about their code."
To prove is to demonstrate, and the way to do it is to display information about what is invalid about the code. So all these ways of phrasing it are synonymous.
One way of formally defining a programming languages is as the set of strings accepted as programs by the compiler, or by the formally defined algorithm that is implemented by the compiler. By such a definition, a given string is not a valid program of the respective programming language if the algorithm rejects it. And it is useful for the error message to show how the input program fails the requirements of the algorithm. The input program may be sound in type-system terms, but still not valid for the respective programming language.
I've been working on a type checker for a language with ad-hoc overloading and what I did was have the checker proceed iteratively, making passes over the set of constraints and applying deduction rules. So it never guesses, branches, or has to backtrack. If it can't make progress because there's too much overloading, it gives up and asks the user to add some type annotations. I suspect this will actually work quite well in practice even if it can't type check some valid programs.
Hot take: the primary role of a compiler is to turn malformed code into human readable diagnostics. They are error reporting tools with a codegen is side-gig.
This can and should be a separate tool, probably called a linter. The compiler dealing with codegen can then just say "nope, that's not a program in this language, try the linter", optionally calling into the linter for you.
This does of course mean that divergent ideas of "correct" between linter and compiler are confusing and really do need to be avoided for happy developer times.
> The job of a compiler error message is to prove to the user that their code is invalid
The job of the compiler error message is to convey why the compiler couldn’t demonstrate the code is correct. The code may be valid but the compiler just can’t prove it (analyses are sound but not complete unless the language is strongly normalizing)
One approach to compiler development is one of successive transformations and it is possible to cut compilers into very thin slices like the original FORTRAN for the IBM 1401. Similarly you can go beyond the methods taught in Graham's On Lisp to do rather complex metaprogramming in Lisp or some other language, trouble is that an error manifests at stage N+5 but the information required to explain what went wrong was elided at stage N so it is quite difficult to figure out what's going wrong.
The goal when designing a compiler and its error messages should be to give the user the information they need to correct their code.
This is closer, yes.
This does of course mean that divergent ideas of "correct" between linter and compiler are confusing and really do need to be avoided for happy developer times.