# thesis / ghc.tex

  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 %------------------------ \chapter{Haskell and GHC} %------------------------ TODO: describe Haskell the language, and GHC the compiler. Also Core, the intermediate representation. \section{Haskell language} %------------------------- \label{sec:haskell} Haskell is a purely functional, lazy programming language. Purely functional means functions cannot have side effects or mutate data. Lazy means evaluation is delayed until required, \eg function arguments are passed unevaluated and then evaluated on demand. Delaying evaluation comes with several costs, \eg being less efficient and making it hard to predict space behavior of programs. Purity is pretty much required for lazy languages, but I/O is very clumsy in pure languages, so monadic I/O was invented for Haskell to solve this problem. A monad is a powerful abstraction that consist of a type constructor and a pair of functions: return and bind. Monads provide syntax to express structured procedures that are not otherwise supported by functional languages. Haskell also introduced type classes to support overloading of built-in numeric operators. Type classes define behavior of types (which behave similar to Java's interfaces) to provide support for ad-hoc polymorphism~\cite{hudak, ghc}. Other notable Haskell features include lazy evaluation, pattern matching, list comprehensions, and type inference. Type inference means type annotations are rarely needed despite Haskell being statically typed. Haskell's list comprehensions have been adopted by both Python and JavaScript~\cite{hudak, haskell98, ghc}. Type classes and Monads are perhaps Haskell's most distinctive design features, which have influenced many other programming languages. For example: Scala, Coq, and Rust have adopted both monads and type classes, while C\#'s \gls{linq} was directly inspired by Haskell's monad comprehensions~\cite{hudak}. Haskell was created to consolidate more than a dozen similar functional languages that was started in late 1970s and early 1980s. A meeting in January 1988 defined the following six goals for Haskell~\cite[p.~4]{hudak}: \begin{enumerate} \item It should be suitable for teaching, research, and applications, including building large systems. \item It should be completely described via the publication of a formal syntax and semantics. \item It should be freely available. Anyone should be permitted to implement the language and distribute it to whomever they please. \item It should be usable as a basis for further language research. \item It should be based on ideas that enjoy a wide consensus. \item It should reduce unnecessary diversity in functional programming languages. \end{enumerate} The second goal was not realized as Haskell's syntax and semantics have never been formally described. The plan for the last goal was to base Haskell on an an existing functional language called OL\@. This plan was abandoned early. Haskell has successfully achieved the majority of the remaining goals, although some features such as type classes were added without regard for goal five~\cite{hudak}. As Haskell is designed by committee, it is a rather big language, and there are usually more than one way to do something in Haskell. As the language grew it also quickly evolved, which was problematic for teaching and application that require stability. The committee therefor defined a stable version of the language, Haskell 98'', that implementations committed to support indefinitely. In 2005 design of Haskell´ (pronounced Haskell Prime) was started, to succeed Haskell 98 and to cover heavily used extensions~\cite{hudak}. Haskell 2010 is the latest stable version of the Haskell programming language~\cite{haskell2010}. \section{The Glasgow Haskell Compiler} %------------------------------------- \label{sec:ghc} \glsreset{ghc} \Gls{ghc} is the most fully featured Haskell compiler today, and has been the main Haskell implementation since the release of Haskell Report 1.0'' in 1990~\cite{hudak}. \citeauthor{ghc} states that today \gls{ghc} releases are downloaded by hundreds of thousands of people, the online repository of Haskell libraries has over 3,000 packages, \gls{ghc} is used to teach Haskell in many undergraduate courses''~\cite[p.~1]{ghc}. \gls{ghc} can be divided into three distinct parts: \begin{itemize} \item The compiler, a Haskell program that converts Haskell source code to machine code. \item The boot libraries that the compiler depend on. \item The runtime system. Large library of C code that handles running the compiled Haskell code, such as garbage collection, threads, and exceptions.~\cite{ghc} \end{itemize} The compiler itself is also divided into three parts: \begin{itemize} \item The compilation manager, which manages compilation of multiple Haskell files, the order of compilation and check if any modules does not require recompilation. \item The \gls{hsc}, which compiles a single Haskell file to machine code, depending on the selected backend. \item The pipeline, which handles Haskell code that interface with or require external programs. For example if a source file require preprocessing with a C preprocessor.~\cite{ghc} \end{itemize} The compiler is also a library, called \gls{ghc} \gls{api}, which provides access to internal parts of \gls{ghc} and allows working with Haskell code. \subsection{Compilation process} Compiling Haskell source files start with parsing with a fully functional parser that produces an \gls{ast} where identifiers are simple strings. The second step is called \textit{renaming}, where identifiers are turned into fully qualified names. This step also spots duplicate declarations, rearrange infix expressions, collect the equations of a functions together, and more. The third step is type checking. Any program that passes the type checker is type-safe and guaranteed to not crash at runtime. \subsection{Optimization's} \subsection{Core intermediate representation} \subsection{External Core representation} Either here or its own chapter?