+\DeclareMathAlphabet{\mathsfsl}{OT1}{cmss}{m}{sl}

\hyphenation{ana-ly-ti-cally}

-\DeclareMathAlphabet{\mathsfsl}{OT1}{cmss}{m}{sl}

+\title{Gaussian Process Regression with OCaml\\Version 0.9}

+\author{Markus Mottl\footnote{\mail}}

+This manual documents the implementation and use of the OCaml GPR library for

+Gaussian process regression.

+The OCaml GPR library implements many recent developments in the actively
+researched machine learning area of Gaussian process regression.

+Gaussian processes define probability distributions over functions. This allows

+us to apply probabilistic reasoning to problems where we are dealing with

+\emph{latent functions}, i.e.\ functions that cannot be directly observed or

+known with certainty. By specifying prior knowledge about these distributions

+in the form of a mean function and a \emph{covariance function}\footnote{Sometimes

+also called a \emph{kernel}.} and making use of Bayes' theorem, Gaussian

+processes provide us with a rigorous nonparametric way of computing posterior

+distributions over latent functions given data, e.g.\ to solve regression

+problems\footnote{Gaussian processes can also be used for classification

+purposes. This is by itself a large research area, which is not covered by this

+library.}. As more data becomes available, the Gaussian process framework

+produces an ever more accurate posterior distribution over the latent
+functions that are likely to have generated the observed data.\\
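
+To fix notation for what follows (these symbols are chosen for exposition only
+and are not tied to identifiers in the library), a Gaussian process prior over
+a latent function $f$ together with the usual noisy observation model can be
+written as
+\[
+  f \sim \mathcal{GP}\bigl(m(\mathbf{x}), k(\mathbf{x}, \mathbf{x}')\bigr),
+  \qquad
+  y_i = f(\mathbf{x}_i) + \epsilon_i,
+  \quad
+  \epsilon_i \sim \mathcal{N}(0, \sigma^2),
+\]
+where $m$ is the mean function, $k$ the covariance function, and $\sigma^2$
+the variance of the observation noise.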

+Due to their mathematically elegant nature, Gaussian processes allow for

+analytically tractable calculation of the posterior mean and covariance

+functions. Though it is easy to formulate the required equations, exact GP
+inference comes at a computational price that is usually prohibitive for large
+problems. Typically, only problems of up to a few thousand samples can be
+solved within reasonable time. Efficient approximation methods have been
+developed in recent years to address this shortcoming, and this library makes
+heavy use of them.\\
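
+As a concrete reference point, the exact (non-approximated) posterior
+predictive distribution at a test input $\mathbf{x}_*$ has, in the notation
+introduced above, mean and variance
+\[
+  \mu_* = m(\mathbf{x}_*)
+    + \mathbf{k}_*^\top (K + \sigma^2 I)^{-1} (\mathbf{y} - \mathbf{m}),
+  \qquad
+  \sigma_*^2 = k(\mathbf{x}_*, \mathbf{x}_*)
+    - \mathbf{k}_*^\top (K + \sigma^2 I)^{-1} \mathbf{k}_*,
+\]
+where $K$ is the $n \times n$ covariance matrix of the training inputs,
+$\mathbf{k}_*$ the vector of covariances between $\mathbf{x}_*$ and the
+training inputs, and $\mathbf{m}$ the vector of prior means at the training
+inputs. Storing and factorizing $K + \sigma^2 I$ costs $O(n^2)$ memory and
+$O(n^3)$ time, which is precisely what the sparse approximations used by this
+library are designed to avoid.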

+Gaussian processes are true generalizations of e.g.\ linear regression, ARMA

+processes, single-layer neural networks with an infinite number of hidden units,

+and many other, more widely known modeling techniques. GPs are closely
+related to support vector machines (SVMs) and other kernel machines, but have
+features that may make them a more suitable choice in many situations. For
+example, they offer predictive variances, Bayesian model selection, sampling
+from the posterior distribution, etc.\\

+It would go beyond the scope of this library documentation to provide a

+detailed treatment of Gaussian processes. Hence, readers unfamiliar with this

+approach may want to consult online resources, of which there are plenty. This

+section presents an overview of recommended materials.

+\subsection{Video tutorials}

+Video tutorials are probably best suited for quickly developing an intuition and

+basic formal background of Gaussian processes and perspectives for their

+\item \emph{\footahref{http://videolectures.net/gpip06\_mackay\_gpb}{Gaussian

+Process Basics}}: David MacKay's lecture given at the \emph{Gaussian Processes

+in Practice Workshop} in 2006. This one-hour video tutorial uses numerous

+graphical examples and animations to aid understanding of the basic principles

+behind inference techniques based on Gaussian processes.

+\emph{\footahref{http://videolectures.net/epsrcws08\_rasmussen\_lgp}{Learning

+with Gaussian Processes}}: a slightly longer, two-hour video tutorial series

+presented by Carl Edward Rasmussen at the Sheffield EPSRC Winter School 2008,

+which goes into somewhat more detail.

+\emph{\footahref{http://videolectures.net/mlss07\_rasmussen\_bigp}{Bayesian

+Inference and Gaussian Processes}}: readers interested in a fairly thorough,

+from-the-ground-up treatment of Bayesian inference techniques using Gaussian
+processes may want to watch this five-hour video tutorial series presented by

+Carl Edward Rasmussen at the MLSS 2007 in T\"ubingen.

+\subsection{Books and papers}

+The following texts are intended for people who need a more formal treatment of
+the theory. This is especially recommended if one wants to be able to implement

+Gaussian processes and their approximations efficiently.

+\emph{\footahref{http://www.gatsby.ucl.ac.uk/\home{snelson}/thesis.pdf}{Flexible

+and efficient Gaussian process models for machine learning}}: Edward Lloyd

+Snelson's PhD thesis \cite{SnelsonThesis} offers a particularly readable

+treatment of modern inference and approximation techniques that avoids heavy

+formalism in favor of intuitive notation and clearly presented high-level

+concepts without sacrificing detail needed for implementation. This library

+\item \emph{\footahref{http://www.gaussianprocess.org/gpml}{Gaussian Processes

+for Machine Learning}}: many researchers in this area would call this book,

+which was written by Carl Edward Rasmussen and Christopher K.\ I.\ Williams,

+the ``bible of Gaussian processes''. It presents a rigorous treatment of the

+underlying theory for both regression and classification problems, and more

+general aspects like properties of covariance functions, etc. The authors have

+kindly made the full text and Matlab sources available online. Their

+\footahref{http://www.gaussianprocess.org}{Gaussian process website} also lists

+a great wealth of other resources valuable for both researchers and

+References to research about specific techniques used in the OCaml GPR library

+are provided in the bibliography.

+\section{Features of OCaml GPR}

+Among other things, the OCaml GPR library currently offers:

+\item Sparse Gaussian processes using the FI(T)C\footnote{\emph{Fully

+Independent (Training) Conditional}} approximations for computationally

+tractable learning (see \cite{conf/nips/2005}, \cite{SnelsonThesis}). Unlike

+some other approximations that lead to degeneracy, FI(T)C maintains sane

+posterior variances, at almost no extra computational cost.

+\item Safe and convenient API for computing posterior means, variances,

+covariances, log evidence, for sampling from the posterior distribution,

+calculating statistics of the quality of fit, etc. The OCaml type and module

+system, as used by the API, helps avoid many easily made programming errors
+and guides the user toward efficient use of the library.

+\item Optimization of hyper parameters via evidence maximization\footnote{Also

+known as type II maximum likelihood.}, including optimization of inducing inputs

+(SPGP algorithm\footnote{This library exploits sparse matrix operations to

+achieve optimum big-O complexity when learning inducing inputs with the SPGP

+algorithm, but also for multiscales and other hyper parameters that imply sparse

+derivative matrices for the marginal log likelihood.}).

+\item Supervised dimensionality reduction, and improved predictions under

+heteroskedastic noise conditions (see \cite{conf/uai/SnelsonG06},

+\item Sparse multiscale Gaussian process regression (see

+\cite{conf/icml/WalderKS08}).

+\item Variational improvements to the approximate posterior distribution (see

+\item Numerically stable GP calculations using QR-factorization to avoid the

+more commonly used and numerically unstable solution of normal equations via

+Cholesky factorization (see \cite{Foster2009}).

+\item Consistent use of BLAS/LAPACK and C code throughout key computational

+parts of the library for optimum performance and conciseness.

+\item Functors for plugging arbitrary covariance functions into the framework.

+There is no constraint on the input type of covariance functions, i.e.\ even
+string inputs, graph inputs, etc., could potentially be used with ease given suitable

+covariance functions\footnote{The library is currently only distributed with

+covariance functions that operate on multivariate numerical inputs. Interested

+readers may feel free to contribute others.}.

+\item Rigorous test suite for checking user-provided derivatives of covariance

+functions, which are usually quite hard to implement correctly, and self-test

+code to verify derivatives of marginal log likelihood functions using finite

+\chapter{Using the library}

+\section{Interface documentation}

+The most important file for understanding the API is called \verb=interfaces.ml=

+and is contained in the \verb=lib= directory. It is already heavily documented,
+so we only provide a high-level view here. Please refer to the OCaml file

+for details. Besides defining a few types, e.g.\ representations for sparse

+matrices that users will have to use when communicating covariance matrices to

+the system, the interfaces file contains two important submodules:

+\item \verb=Specs=, which contains signatures that users need to provide for

+specifying covariance functions:

+\item \verb=Kernel=, the signature for accessing the data structure that

+determines a covariance function (= kernel) and its parameters.

+\item \verb=Eval=, the signature of modules users have to implement to evaluate

+\item \verb=Kernel=, a module satisfying the \verb=Kernel= signature above.

+\item \verb=Inducing=, for evaluating covariances among inducing points.

+\item \verb=Input=, for evaluating covariances involving single input points and

+\item \verb=Inputs=, for evaluating covariances involving multiple input

+points\footnote{Required separately besides evaluation of single inputs to force

+the user to think about how to optimize for this case, which is quite

+important.} and inducing inputs.

+\item \verb=Deriv=, the signature of modules users have to implement to compute

+derivatives of covariance functions:

+\item \verb=Eval=, a module satisfying the \verb=Eval= signature above.

+Derivative code without the ability to evaluate functions would be rather

+\item \verb=Hyper=, a module specifying the type of hyper parameters, for which

+derivatives can be computed.

+\item \verb=Inducing= and \verb=Input=, which provide a similar abstraction for

+derivatives as the modules of the same name provide for evaluation functions in
+\verb=Eval=. Note that computations can be shared between covariance
+evaluations and derivatives. This is especially useful and efficient for

+covariance functions that use the exponential function.

+\item \verb=Sigs=, which contains signatures a Gaussian process framework will

+provide once it has been instantiated with a given covariance specification:

+\item \verb=Eval=, which contains modules the user can access to perform

+computations in the Gaussian process framework:

+\item \verb=Spec=, the user-provided specification of the covariance function as

+mentioned further above.

+\item \verb=Inducing=, which contains functions to select and evaluate inducing

+\item \verb=Input=, for dealing with single inputs.

+\item \verb=Inputs=, for dealing with multiple inputs.

+\item \verb=Model=, for dealing with models. A model is specified by its
+inputs and the noise level.

+\item \verb=Trained=, for dealing with trained models. A trained model is a

+model that has been trained on a given target vector.

+\item \verb=Stats=, for computing statistics of the trained model (quality of

+\item \verb=Mean_predictor=, a minimalist data structure for making mean

+\item \verb=Mean=, a posterior mean for a single point.

+\item \verb=Means=, multiple means.

+\item \verb=Co_variance_predictor=, a minimalist data structure for making

+(co-)variance predictions.

+\item \verb=Variance=, a posterior variance for a single point.

+\item \verb=Variances=, multiple posterior variances.

+\item \verb=Covariances=, posterior covariances.

+\item \verb=Sampler=, sampling at a single point.

+\item \verb=Cov_sampler=, sampling at multiple points (accounting for their

+\item \verb=Deriv=, which contains modules the user can access to perform

+derivative computations for marginal log likelihoods within the Gaussian process

+\item \verb=Eval=, module satisfying the \verb=Eval= signature mentioned above.

+\item \verb=Deriv=, module containing all the derivative code:

+\item \verb=Spec=, the user specification for covariance function derivatives.

+\item \verb=Inducing=, \verb=Inputs=, \verb=Model=, and \verb=Trained=, which
+basically mirror the modules of the same name in the \verb=Eval= signature.

+\item \verb=Test= contains functions for testing both derivative code supplied

+by the user and internal code using finite differences.

+\item \verb=Optim= contains submodules for optimizing Gaussian processes,

+currently only the \verb=Gsl= submodule, which uses the GNU scientific library

+\section{Predefined covariance functions}

+The following modules implementing covariance functions already come with the

+\item \verb=Cov_const=: the covariance of a constant function.

+\item \verb=Cov_lin_one=: the covariance of a linear function with a single

+\item \verb=Cov_lin_ard=: the covariance of a linear function with

+\emph{Automatic Relevance Determination} (ARD).

+\item \verb=Cov_se_iso=: isotropic squared exponential covariance with amplitude

+and length scale hyper parameters (see the standalone sketch below).

+\item \verb=Cov_se_fat=: a highly parameterizable (``fat'') squared exponential

+covariance function with amplitude, dimensionality reduction, multiscales, and

+heteroskedastic noise support.
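
+To make the mathematics behind these modules concrete, the following is a
+small, self-contained OCaml sketch of the isotropic squared exponential
+covariance $k(\mathbf{x}, \mathbf{x}') =
+s_f^2 \exp\bigl(-\|\mathbf{x} - \mathbf{x}'\|^2 / (2\ell^2)\bigr)$ on plain
+float arrays. It only illustrates the formula; it is not the implementation of
+\verb=Cov_se_iso= and does not use the library's interfaces.
+\begin{verbatim}
+(* Illustrative only: isotropic squared exponential covariance.
+   [amplitude] plays the role of s_f, [length_scale] the role of l. *)
+let sq_dist x x' =
+  let acc = ref 0. in
+  Array.iteri (fun i xi -> let d = xi -. x'.(i) in acc := !acc +. d *. d) x;
+  !acc
+
+let sq_exp_iso ~amplitude ~length_scale x x' =
+  amplitude *. amplitude
+  *. exp (-. (sq_dist x x') /. (2. *. length_scale *. length_scale))
+
+(* Covariance matrix of a set of inputs, e.g. inducing inputs *)
+let cov_matrix ~amplitude ~length_scale inputs =
+  let n = Array.length inputs in
+  Array.init n (fun i ->
+    Array.init n (fun j ->
+      sq_exp_iso ~amplitude ~length_scale inputs.(i) inputs.(j)))
+\end{verbatim}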

+\chapter{Example applications}

+There are currently three applications that are part of the distribution, two

+for testing the library, and one simple command-line tool for solving regression

+problems presented in comma-separated values (CSV) files.

+\section{Test applications}

+The directory \verb=test= contains two applications.

+\subsection{Derivative testing}

+The application \verb=test_derivatives.opt= generates random data to run the

+internal test suite for checking the correctness of derivatives of the marginal

+log likelihood function for the ``fat'' squared exponential covariance function.

+It prints out the hyper parameters it is currently testing and would fail with

+an appropriate message if there were a problem.
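
+The idea behind such a test is to compare every analytic partial derivative
+against a central finite difference. A minimal, generic sketch of this kind of
+check (ours for illustration; the library's actual test suite and its
+representation of hyper parameters differ) could look as follows:
+\begin{verbatim}
+(* [f] maps a hyper parameter vector to a scalar, e.g. the negative marginal
+   log likelihood; [grad] is the analytic gradient under test.  The result is
+   the largest absolute discrepancy over all hyper parameters. *)
+let check_gradient ?(eps = 1e-6) f grad theta =
+  let g = grad theta in
+  let max_err = ref 0. in
+  Array.iteri (fun i _ ->
+      let tp = Array.copy theta and tm = Array.copy theta in
+      tp.(i) <- theta.(i) +. eps;
+      tm.(i) <- theta.(i) -. eps;
+      let fd = (f tp -. f tm) /. (2. *. eps) in
+      max_err := max !max_err (abs_float (fd -. g.(i))))
+    theta;
+  !max_err
+\end{verbatim}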

+\subsection{Test case for learning}

+The application \verb=save_data.opt= evaluates random inputs on some known,

+nonlinear, one-dimensional function, adds noise, and then trains a Gaussian

+process to learn the function from the data. The application writes out a

+number of results that can subsequently be visualized using

+\footahref{http://www.r-project.org}{R} and cross-verified with a simple

+\footahref{http://www.gnu.org/software/octave}{Octave} script.\\
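
+To give an idea of what such a synthetic data set looks like, here is a small,
+generic OCaml sketch that draws noisy samples of a known nonlinear 1-D
+function (illustrative only: the actual function, noise level, and output
+format used by \verb=save_data.opt= may differ):
+\begin{verbatim}
+(* Generate [n] noisy samples (x, y) of a known nonlinear 1-D function. *)
+let make_data n =
+  let pi = 4. *. atan 1. in
+  Random.self_init ();
+  Array.init n (fun _ ->
+    let x = Random.float 10. -. 5. in
+    (* Box-Muller transform for approximately standard normal noise *)
+    let u1 = 1. -. Random.float 1. and u2 = Random.float 1. in
+    let noise = sqrt (-2. *. log u1) *. cos (2. *. pi *. u2) in
+    let y = sin x +. 0.3 *. x +. 0.1 *. noise in
+    (x, y))
+\end{verbatim}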

+Just run \verb=save_data.opt=. It will print out progress information while

+performing evidence maximization to find suitable hyper parameters and locations

+for inducing inputs. This will not usually take more than a few seconds (often

+just a fraction of a second) unless the randomly chosen initial state leads to a bad local

+optimum that is surrounded by an almost flat surface. Just restart the run in

+this unlikely case. Gaussian processes do not seem overly prone to overfitting,

+but depending on the problem and the chosen covariance function, evidence

+maximization may become trapped in local optima. If such local optima fit
+about equally well, though, they can be interpreted as alternative solutions. Once

+the application finishes, it will store results in the \verb=test/data=

+\subsubsection{Visualization of results}

+We can now execute \verb=R= on the command line to run the visualization script

+for this data by typing:

+This will visualize the data points, true mean and true confidence intervals,

+the inferred mean function and its confidence intervals, samples of candidate

+latent functions from the posterior distribution, locations of inducing inputs,

+\subsubsection{Verification against Octave implementation}

+There is also a small test suite for comparing results of the OCaml library to

+equations written in Octave. It, too, depends on the data saved above by

+\verb=save_data.opt=. The Octave test suite does not use particularly efficient

+ways of computing its results, but is fairly simple and readable. It calls

+\footahref{http://www.gatsby.ucl.ac.uk/\home{}snelson/SPGP\_dist.tgz}{Edward

+Snelson's SPGP implementation} for reference. The user may want to compile the

+more efficient \verb=dist.c= file from within Octave first:

+\noindent Then source the test suite:

+\section{Command-line tool}

+The application \verb=app/ocaml_gpr.opt= employs the OCaml GPR library to
+implement a simple utility to train and evaluate Gaussian process models

+using the ``fat'' squared exponential covariance function and the variational

+improvement \cite{Titsias2009} for model selection. It reads comma-separated

+values from standard input for both training and testing. This application is

+considered to be an example only, but will likely be extended in the future for

+Datasets for regression problems that one may want to try out for testing can be

+downloaded from many sites, one of the most well-known being the

+\footahref{http://archive.ics.uci.edu/ml/index.html}{UCI Machine Learning

+\subsection{Training models}

+Here is an example invocation:

+ ocaml_gpr.opt -verbose -cmd train -model foo.model < data.csv

+It is assumed that the file \verb=data.csv= is comma-separated and that the last

+column contains the target values. The trained model will be stored in file

+\verb=foo.model=. It is generally recommended to use the \verb=-verbose= flag

+for training, which will display various statistics on standard error at most
+once per second during training iterations, e.g.:

+ target variance: 84.41956

+ iter 1: MSLL=18.9074776 SMSE=0.3878503 MAD=3.8968803 MAXAD=32.1662739

+ iter 1: |gradient|=29911.88895

+ iter 171: MSLL=-0.8875856 SMSE=0.2672445 MAD=2.9789019 MAXAD=34.0733409

+ iter 171: |gradient|=57.08112

+The user can interrupt training at any time by pressing \verb=CTRL-C= if the

+result seems good enough. The best model found so far, as determined by the

+mean standardized log loss (MSLL), will then be saved to the specified model

+file. It is also possible to specify a maximum number of iterations using the

+flag \verb=-max-iter=. Otherwise the optimizer parameters (see below) determine

+\subsubsection{Training flags}

+Various flags can be passed to parameterize the learning process:

+\item \verb=-n-inducing= sets the number of inducing inputs. The more points

+are used, the more flexible the function that can be learnt. Note that using as

+many inducing points as there are inputs will not necessarily yield the full

+Gaussian process, because the used approximation methods may also model

+heteroskedastic noise. Furthermore, the computational effort increases as

+$O(M^3)$, $M$ being the number of inducing inputs. The number of inducing

+inputs will by itself not lead to overfitting, i.e.\ more is usually better
+rather than worse. But increasing this number may lead to a larger number of

+local optima and hence not necessarily better results.

+\item \verb=-sigma2= sets the initial noise level hyper parameter.

+\item \verb=-amplitude= sets the initial amplitude hyper parameter.

+\item \verb=-dim-red= allows setting the target dimension for dimensionality

+reduction of the input data. None will be performed otherwise. Note that one

+can also specify the full dimensionality of the original input data, in which

+case it will be subject to a fully general linear transformation, which will be

+learnt in a supervised way in order to reveal useful features.

+\item \verb=-log-het-sked= turns on support for improved learning of

+heteroskedastic noise and sets the initial value for the logarithm of the

+associated hyper parameters. Negative values may often be required to avoid

+getting trapped in bad optima right at the start.

+\item \verb=-multiscale= turns on learning of multiscale parameters.

+\item \verb=-tol=, \verb=-step=, and \verb=-eps= set the line search tolerance,

+the initial step size, and the stopping criterion (gradient norm) for the

+GSL optimizer, respectively.

+It usually requires some experimentation to find out what kinds of parameters

+may be most suitable for a given problem.
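
+For example, a training run that combines several of the flags above, using
+dimensionality reduction, multiscale parameters, and a larger number of
+inducing inputs, could be started as follows (the numeric values are arbitrary
+and only illustrate how the flags are combined):
+\begin{verbatim}
+  ocaml_gpr.opt -verbose -cmd train -model foo.model \
+    -n-inducing 50 -dim-red 5 -multiscale -max-iter 1000 < data.csv
+\end{verbatim}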

+\subsection{Applying models}

+Here is an example of how to apply models to test sets:

+ ocaml_gpr.opt -cmd test -model foo.model < test.csv

+It is assumed that the test set only contains inputs in its columns. The mean

+predictions for each input will be printed in the same order to standard

+By specifying \verb=-with-stddev= on the command line, a second, comma-separated
+column will be printed, containing the uncertainty of each mean prediction
+expressed as a standard deviation. If the flag \verb=-predictive= is

+used, the noise will be included to yield a predictive distribution.
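
+For instance, to obtain predictive means together with their standard
+deviations, one could run:
+\begin{verbatim}
+  ocaml_gpr.opt -cmd test -model foo.model -with-stddev -predictive < test.csv
+\end{verbatim}
+Each output line then has the form \verb=<mean>,<stddev>=, with one line per
+test input, in the same order as the inputs.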

+Besides improving the usability of the example application, a few extensions to

+the library are being considered for the near future:

+\item More flexible covariance functions. Besides adding more such functions

+and more features to e.g.\ the ``fat'' squared exponential covariance function

+and making parameterization simpler, a very interesting approach would be to

+support combining covariance functions. As described in

+\cite{oai:eprints.pascal-network.org:1211}, sums and products of covariance
+functions are themselves covariance functions, which would allow making better use of

+problem-specific background knowledge.

+\item Warping (see \cite{conf/nips/SnelsonRG03}) for nonlinear, nonparametric

+transformations of the target variable.

+\item Sparse convolved GPs (see \cite{DBLP:conf/nips/AlvarezL08}), which would

+support multiple nonlinearly correlated target variables. This would require

+implementing the PI(T)C\footnote{\emph{Partially Independent (Training)

+Conditional}} approximation (see \cite{SnelsonThesis}), which may also be

+beneficial for solving particular problems.

+\item Initialization of inducing inputs using partial Cholesky factorization

+instead of randomly chosen points. This may be especially useful in the future

+for problems that have non-numerical inputs, because these cannot be optimized

+with numerical and even less so with efficient gradient-based methods.

+\item A global optimization framework that uses Gaussian processes to model loss

+functions that are expensive to evaluate would seem like a great application to

+The interested reader may feel free to contribute these or other features.

+\chapter{Implementation details}

\newcommand{\red}{\textcolor{red}}

\newcommand{\blue}{\textcolor{blue}}

\newcommand{\Lamss}{\mat{\Lambda}_{\sigma^2}}

\newcommand{\Lamssi}{\imat{\Lambda_{\sigma^2}}}

-\title{Gaussian Process Regression with OCaml\\Version 0.9}

-\author{Markus Mottl\footnote{\mail}}

-This manual documents the implementation and use of the OCaml GPR library for

-Gaussian Process Regression with OCaml.

-The OCaml GPR library features implementations of many of the latest

-developments in the currently heavily researched machine learning area of

-Gaussian process regression.

-Gaussian processes define probability distributions over functions as prior

-knowledge. Bayesian inference can then be used to compute posterior

-distributions over these functions given data, e.g.\ to solve regression

-problems\footnote{Gaussian processes can also be used for classification

-purposes. This is by itself a large research area, which is not covered by this

-library.}. As more data becomes available, the Gaussian process framework

-learns an ever more accurate distribution of functions that generate the data.\\

-Due to their mathematically elegant nature, Gaussian processes allow for

-analytically tractable calculation of the posterior mean and covariance

-functions. Though it is easy to formulate the required equations, GPs come at a

-usually intractably high computational price for large problems. Efficient

-approximation methods have been developed in the recent past to address this

-shortcoming, and this library makes heavy use of them.\\

-Gaussian processes are true generalizations of e.g.\ linear regression, ARMA

-processes, single-layer neural networks with an infinite number of hidden units,

-and many other more widely known modeling techniques. GPs are closely related

-to support vector- (SVM) and other kernel machines, but have features that may

-make them a more suitable choice in many situations. For example they offer

-predictive variances, Bayesian model selection, sampling from the posterior

-It would go beyond the scope of this library documentation to provide for a

-detailed treatment of Gaussian processes. Hence, readers unfamiliar with this

-approach may want to consult online resources, of which there are plenty. This

-section presents an overview of recommended materials.

-\subsection{Video tutorials}

-Video tutorials are probably best suited for quickly developing an intuition and

-basic formal background of Gaussian processes and perspectives for their

-\item \emph{\footahref{http://videolectures.net/gpip06\_mackay\_gpb}{Gaussian

-Process Basics}}: David MacKay's lecture given at the \emph{Gaussian Processes

-in Practice Workshop} in 2006. This one hour video tutorial uses numerous

-graphical examples and animations to aid understanding of the basic principles

-behind inference techniques based on Gaussian processes.

-\emph{\footahref{http://videolectures.net/epsrcws08\_rasmussen\_lgp}{Learning

-with Gaussian Processes}}: a slightly longer, two hour video tutorial series

-presented by Carl Edward Rasmussen at the Sheffield EPSRC Winter School 2008,

-which goes into somewhat more detail.

-\emph{\footahref{http://videolectures.net/mlss07\_rasmussen\_bigp}{Bayesian

-Inference and Gaussian Processes}}: readers interested in a fairly thorough,

-from the ground up treatment of Bayesian inference techniques using Gaussian

-processes may want to watch this five hour video tutorial series presented by

-Carl Edward Rasmussen at the MLSS 2007 in T\"ubingen.

-\subsection{Books and papers}

-The following texts are intended for people who need a more formal treatment and

-theory. This is especially recommended if you want to be able to implement

-Gaussian processes and their approximations efficiently.

-\emph{\footahref{http://www.gatsby.ucl.ac.uk/\home{snelson}/thesis.pdf}{Flexible

-and efficient Gaussian process models for machine learning}}: Edward Lloyd

-Snelson's PhD thesis (\cite{SnelsonThesis}) offers a particularly readable

-treatment of modern inference and approximation techniques that avoids heavy

-formalism in favor of intuitive notation and clearly presented high-level

-concepts without sacrificing detail needed for implementation. This library

-\item \emph{\footahref{http://www.gaussianprocess.org/gpml}{Gaussian Processes

-for Machine Learning}}: many researchers in this area would call this book,

-which was written by Carl Edward Rasmussen and Christopher K.\ I.\ Williams,

-the ``bible of Gaussian processes''. It presents a rigorous treatment of the

-underlying theory for both regression and classification problems, and more

-general aspects like properties of covariance functions, etc. The authors have

-kindly made the full text and Matlab sources available online. Their

-\footahref{http://www.gaussianprocess.org}{Gaussian process website} also lists

-a great wealth of other resources valuable for both researchers and

-References to research about specific techniques used in the OCaml GPR library

-are provided in the bibliography.

-\section{Features of OCaml GPR}

-Among other things the OCaml GPR library currently offers:

-\item Sparse Gaussian processes using the FI(T)C\footnote{\emph{Fully

-Independent (Training) Conditional}} approximations for computationally

-tractable learning (see \cite{conf/nips/2005}, \cite{SnelsonThesis}). Unlike

-some other approximations that lead to degeneracy, FI(T)C maintains sane

-\item Safe and convenient API for computing posterior means, variances,

-covariances, log evidence, for sampling from the posterior distribution,

-calculating statistics of the quality of fit, etc.

-\item Optimization of hyper parameters via evidence maximization\footnote{Also

-known as type II maximum likelihood.}, including optimization of inducing inputs

-(SPGP algorithm\footnote{This library exploits sparse matrix operations to

-achieve optimum big-O complexity when learning inducing inputs with the SPGP

-algorithm, but also for multiscales and other hyper parameters that imply sparse

-\item Supervised dimensionality reduction, and improved predictions under

-heteroskedastic noise conditions (see \cite{conf/uai/SnelsonG06},

-\item Sparse multiscale Gaussian process regression (see

-\cite{conf/icml/WalderKS08}).

-\item Variational improvements to the approximate posterior distribution (see

-\item Numerically stable GP calculations using QR-factorization to avoid the

-more commonly used and numerically unstable solution of normal equations via

-Cholesky factorization (see \cite{Foster2009}).

-\item Consistent use of BLAS/LAPACK throughout the library for optimum

-\item Functors for plugging arbitrary covariance functions (= kernels) into the

-framework. There is no constraint on the type of covariance functions, i.e.\

-also string inputs, graph inputs, etc., could potentially be used with ease

-given suitable covariance functions\footnote{The library is currently only

-distributed with covariance functions that operate on multivariate numerical

-inputs. Interested readers may feel free to contribute others.}.

-\item Rigorous test suite for checking both user-provided derivatives of

-covariance functions, which are usually quite hard to implement correctly, and

-self-test code to verify derivatives of log likelihood functions using finite

-\chapter{API documentation}

-\chapter{Example application}

-\chapter{FIC computations}

This section consists of equations used for computing the FI(T)C predictive

distribution, and the log likelihood and its derivatives in the OCaml GPR

-\chapter{Notes/reminders for future work}

-Initialize inducing inputs with partial Cholesky factorization

-(better with discrete values)

-\item $\tfrac{\partial f}{\partial \log(x)} = \tfrac{\partial f}{\partial x} x$

-\section{Nonlinear clustering:}

-\item $k(x, y) = \langle \phi(x) | \phi(y) \rangle$

-\item $\|\phi(x) - \phi(y)\|^2 = k(x,x)-2k(x,y)+k(y,y)$

-\item find one inducing point

-\item choose point x farthest away wrt.\ k

-\item choose antipodal point y to x wrt.\ k

-\item determine for all points to which of x or y they are closer

-\item create two clusters

-\item when suitable granularity reached, use PI(T)C

\bibliographystyle{alpha}