Source

thesis / ghc.tex

Full commit
%------------------------
\chapter{Haskell and GHC}
%------------------------
TODO: describe Haskell the language, and GHC the compiler.
Also Core, the intermediate representation.


\section{Haskell language}
%-------------------------
\label{sec:haskell}
Haskell is a purely functional, lazy programming language. Purely functional
means functions cannot have side effects or mutate data. Lazy means
evaluation is delayed until required, \eg function arguments are passed
unevaluated and then evaluated on demand. Delaying evaluation comes with
several costs, \eg being less efficient and making it hard to predict space
behavior of programs. Purity is pretty much required for lazy languages, but
I/O is very clumsy in pure languages, so monadic I/O was invented for Haskell
to solve this problem. A monad is a powerful abstraction that consist of a type
constructor and a pair of functions: return and bind. Monads provide syntax to
express structured procedures that are not otherwise supported by functional
languages. Haskell also introduced type classes to support overloading of
built-in numeric operators. Type classes define behavior of types (which behave
similar to Java's interfaces) to provide support for ad-hoc
polymorphism~\cite{hudak, ghc}.

Other notable Haskell features include lazy evaluation, pattern matching, list
comprehensions, and type inference. Type inference means type annotations are
rarely needed despite Haskell being statically typed. Haskell's list
comprehensions have been adopted by both Python and
JavaScript~\cite{hudak, haskell98, ghc}.

Type classes and Monads are perhaps Haskell's most distinctive design features,
which have influenced many other programming languages. For example: Scala,
Coq, and Rust have adopted both monads and type classes, while C\#'s \gls{linq}
was directly inspired by Haskell's monad comprehensions~\cite{hudak}.

Haskell was created to consolidate more than a dozen similar functional
languages that was started in late 1970s and early 1980s. A meeting in January
1988 defined the following six goals for Haskell~\cite[p.~4]{hudak}:
\begin{enumerate}
    \item It should be suitable for teaching, research, and applications,
        including building large systems.
    \item It should be completely described via the publication of a formal
        syntax and semantics.
    \item It should be freely available. Anyone should be permitted to
        implement the language and distribute it to whomever they please.
    \item It should be usable as a basis for further language research.
    \item It should be based on ideas that enjoy a wide consensus.
    \item It should reduce unnecessary diversity in functional programming
        languages.
\end{enumerate}
The second goal was not realized as Haskell's syntax and semantics have never
been formally described. The plan for the last goal was to base Haskell on an
an existing functional language called OL\@. This plan was abandoned early.
Haskell has successfully achieved the majority of the remaining goals, although
some features such as type classes were added without regard for goal
five~\cite{hudak}.

As Haskell is designed by committee, it is a rather big language, and there are
usually more than one way to do something in Haskell. As the language grew it
also quickly evolved, which was problematic for teaching and application that
require stability. The committee therefor defined a stable version of the
language, ``Haskell 98'', that implementations committed to support
indefinitely. In 2005 design of Haskell´ (pronounced Haskell Prime) was
started, to succeed Haskell 98 and to cover heavily used
extensions~\cite{hudak}. Haskell 2010 is the latest stable version of the
Haskell programming language~\cite{haskell2010}.


\section{The Glasgow Haskell Compiler}
%-------------------------------------
\label{sec:ghc}
\glsreset{ghc}
\Gls{ghc} is the most fully featured Haskell compiler today, and has been the
main Haskell implementation since the release of ``Haskell Report 1.0'' in
1990~\cite{hudak}. \citeauthor{ghc} states that today ``\gls{ghc} releases are
downloaded by hundreds of thousands of people, the online repository of Haskell
libraries has over 3,000 packages, \gls{ghc} is used to teach Haskell in many
undergraduate courses''~\cite[p.~1]{ghc}.

\gls{ghc} can be divided into three distinct parts:
\begin{itemize}
	\item The compiler, a Haskell program that converts Haskell source code to
        machine code.
	\item The boot libraries that the compiler depend on.
	\item The runtime system. Large library of C code that handles running the
		compiled Haskell code, such as garbage collection, threads, and
		exceptions.~\cite{ghc}
\end{itemize}
The compiler itself is also divided into three parts:
\begin{itemize}
    \item The compilation manager, which manages compilation of multiple
        Haskell files, the order of compilation and check if any modules
        does not require recompilation.
    \item The \gls{hsc}, which compiles a single Haskell file to machine code,
        depending on the selected backend.
    \item The pipeline, which handles Haskell code that interface with or
        require external programs. For example if a source file require
        preprocessing with a C preprocessor.~\cite{ghc}
\end{itemize}
The compiler is also a library, called \gls{ghc} \gls{api}, which provides
access to internal parts of \gls{ghc} and allows working with Haskell code.

\subsection{Compilation process}
Compiling Haskell source files start with parsing with a fully functional
parser that produces an \gls{ast} where identifiers are simple strings.

The second step is called \textit{renaming}, where identifiers are turned into
fully qualified names. This step also spots duplicate declarations, rearrange
infix expressions, collect the equations of a functions together, and more.

The third step is type checking. Any program that passes the type checker is
type-safe and guaranteed to not crash at runtime.


\subsection{Optimization's}

\subsection{Core intermediate representation}

\subsection{External Core representation}
Either here or its own chapter?