Markus Mottl committed e481604

Improved documentation


Files changed (3)

 NAME = gpr
 
+gpr.pdf: gpr.bib
+
 LaTeXDocument($(NAME), gpr)
 
 .DEFAULT: $(NAME).pdf
+@Book{oai:eprints.pascal-network.org:1211,
+  title =       "Gaussian Processes for Machine Learning",
+  author =      "Carl Edward Rasmussen and Christopher K. I. Williams",
+  publisher =   "MIT Press",
+  year =        "2006",
+  abstract =    "Publisher's description: Gaussian processes (GPs)
+                 provide a principled, practical, probabilistic approach
+                 to learning in kernel machines. GPs have received
+                 increased attention in the machine-learning community
+                 over the past decade, and this book provides a
+                 long-needed systematic and unified treatment of
+                 theoretical and practical aspects of GPs in machine
+                 learning. The treatment is comprehensive and
+                 self-contained, targeted at researchers and students in
+                 machine learning and applied statistics. The book deals
+                 with the supervised-learning problem for both
+                 regression and classification, and includes detailed
+                 algorithms. A wide variety of covariance (kernel)
+                 functions are presented and their properties discussed.
+                 Model selection is discussed both from a Bayesian and a
+                 classical perspective. Many connections to other
+                 well-known techniques from machine learning and
+                 statistics are discussed, including support-vector
+                 machines, neural networks, splines, regularization
+                 networks, relevance vector machines and others.
+                 Theoretical issues including learning curves and the
+                 PAC-Bayesian framework are treated, and several
+                 approximation methods for learning with large datasets
+                 are discussed. The book contains illustrative examples
+                 and exercises, and code and datasets are available on
+                 the Web. Appendixes provide mathematical background and
+                 a discussion of Gaussian Markov processes.",
+  bibsource =   "OAI-PMH server at eprints.pascal-network.org",
+  oai =         "oai:eprints.pascal-network.org:1211",
+  subject =     "Learning/Statistics \& Optimisation; Theory \&
+                 Algorithms",
+  type =        "NonPeerReviewed",
+  URL =         "http://eprints.pascal-network.org/archive/00001211/;
+                 http://mitpress.mit.edu/catalog/item/default.asp?ttype=2\&tid=10930",
+}
+
+@PhdThesis{SnelsonThesis,
+  title =       "Flexible and efficient Gaussian process models for
+                 machine learning",
+  author =      "Edward Lloyd Snelson",
+  year =        "2008",
+  month =       feb # "~06",
+  abstract =    "2007 I, Edward Snelson, confirm that the work
+                 presented in this thesis is my own. Where information
+                 has been derived from other sources, I confirm that
+                 this has been indi-cated in the thesis. 2 Gaussian
+                 process (GP) models are widely used to perform Bayesian
+                 nonlinear re-gression and classification --- tasks that
+                 are central to many machine learning prob-lems. A GP is
+                 nonparametric, meaning that the complexity of the model
+                 grows as more data points are received. Another
+                 attractive feature is the behaviour of the error bars.
+                 They naturally grow in regions away from training data
+                 where we have high uncertainty about the interpolating
+                 function. In their standard form GPs have several
+                 limitations, which can be divided into two broad
+                 categories: computational difficulties for large data
+                 sets, and restrictive modelling assumptions for complex
+                 data sets. This thesis addresses various aspects",
+  school = "Gatsby Computational Neuroscience Unit, University College London",
+  bibsource =   "OAI-PMH server at citeseerx.ist.psu.edu",
+  contributor =  "CiteSeerX",
+  language =    "en",
+  oai =         "oai:CiteSeerXPSU:10.1.1.62.4041",
+  relation =    "10.1.1.28.8311",
+  rights =      "Metadata may be used without restrictions as long as
+                 the oai identifier remains attached to it.",
+  URL =         "http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.62.4041;
+                 http://www.gatsby.ucl.ac.uk/~snelson/thesis.pdf",
+}
+
+@InProceedings{conf/nips/2005,
+  title =       "Sparse Gaussian Processes using Pseudo-inputs",
+  author =      "Edward Snelson and Zoubin Ghahramani",
+  year =        "2005",
+  bibdate =     "2006-02-15",
+  bibsource =   "DBLP,
+                 http://dblp.uni-trier.de/db/conf/nips/nips2005.html#SnelsonG05",
+  booktitle =   "NIPS",
+  URL =         "http://books.nips.cc/papers/files/nips18/NIPS2005_0543.pdf",
+}
+
+@InProceedings{conf/uai/SnelsonG06,
+  title =       "Variable Noise and Dimensionality Reduction for Sparse
+                 Gaussian processes",
+  author =      "Edward Snelson and Zoubin Ghahramani",
+  publisher =   "AUAI Press",
+  year =        "2006",
+  bibdate =     "2007-07-26",
+  bibsource =   "DBLP,
+                 http://dblp.uni-trier.de/db/conf/uai/uai2006.html#SnelsonG06",
+  booktitle =   "UAI",
+  ISBN =        "0-9749039-2-2",
+  URL =         "http://uai.sis.pitt.edu/displayArticleDetails.jsp?mmnu=1&smnu=2&article_id=1316&proceeding_id=22",
+}
+
+@InProceedings{conf/nips/SnelsonRG03,
+  title =       "Warped Gaussian Processes",
+  author =      "Edward Snelson and Carl Edward Rasmussen and Zoubin
+                 Ghahramani",
+  publisher =   "MIT Press",
+  year =        "2003",
+  bibdate =     "2004-10-12",
+  bibsource =   "DBLP,
+                 http://dblp.uni-trier.de/db/conf/nips/nips2003.html#SnelsonRG03",
+  booktitle =   "NIPS",
+  editor =      "Sebastian Thrun and Lawrence K. Saul and Bernhard
+                 Sch{\"o}lkopf",
+  ISBN =        "0-262-20152-6",
+  URL =         "http://books.nips.cc/papers/files/nips16/NIPS2003_AA43.pdf",
+}
+
+@InProceedings{conf/icml/WalderKS08,
+  title =       "Sparse multiscale {G}aussian process regression",
+  author =      "Christian Walder and Kwang In Kim and Bernhard
+                 Sch{\"o}lkopf",
+  bibdate =     "2008-08-14",
+  bibsource =   "DBLP,
+                 http://dblp.uni-trier.de/db/conf/icml/icml2008.html#WalderKS08",
+  booktitle =   "Machine Learning, Proceedings of the Twenty-Fifth
+                 International Conference ({ICML} 2008), Helsinki,
+                 Finland, June 5-9, 2008",
+  publisher =   "ACM",
+  year =        "2008",
+  volume =      "307",
+  editor =      "William W. Cohen and Andrew McCallum and Sam T.
+                 Roweis",
+  ISBN =        "978-1-60558-205-4",
+  pages =       "1112--1119",
+  series =      "ACM International Conference Proceeding Series",
+  URL =         "http://doi.acm.org/10.1145/1390156.1390296",
+}
+
+@Article{Foster2009,
+  author =      "Leslie Foster and Alex Waagen and Nabeela Aijaz and
+                 Michael Hurley and Apolonio Luis and Joel Rinsky and
+                 Chandrika Satyavolu and Michael J. Way and Paul Gazis
+                 and Ashok Srivastava",
+  title =       "Stable and Efficient Gaussian Process Calculations",
+  journal =     "Journal of Machine Learning Research",
+  publisher =   "Microtome Publishing",
+  volume =      "10",
+  pages =       "857--882",
+  ISSN =        "1533-7928 (electronic); 1532-4435 (paper)",
+  year =        "2009",
+  month =       apr,
+  abstract =    "The use of Gaussian processes can be an effective
+                 approach to prediction in a supervised learning
+                 environment. For large data sets, the standard Gaussian
+                 process approach requires solving very large systems of
+                 linear equations and approximations are required for
+                 the calculations to be practical. We will focus on the
+                 subset of regressors approximation technique. We will
+                 demonstrate that there can be numerical instabilities
+                 in a well known implementation of the technique. We
+                 discuss alternate implementations that have better
+                 numerical stability properties and can lead to better
+                 predictions. Our results will be illustrated by looking
+                 at an application involving prediction of galaxy
+                 redshift from broadband spectrum data.",
+  URL =         "http://www.jmlr.org/papers/volume10/foster09a/foster09a.pdf",
+}
+
+@techreport{Titsias2009,
+  author = "Michalis K.\ Titsias",
+  title = "Variational Model Selection for Sparse Gaussian Process Regression",
+  institution = "School of Computer Science, University of Manchester, UK",
+  year = "2009",
+  URL = "http://www.cs.manchester.ac.uk/~mtitsias/papers/sparseGPv2.pdf",
+}
 \documentclass[10pt]{article}
 
+% PACKAGES
 \usepackage[usenames]{color}
 
+\newcommand{\mail}{\mailto{markus.mottl@gmail.com}}
+\newcommand{\athome}[2]{\ahref{http://www.ocaml.info/#1}{#2}}
+\newcommand{\www}{\athome{}{Markus Mottl}}
+
+% INCLUDE HEVEA SUPPORT
+\usepackage{hevea}
+
+%BEGIN LATEX
+\usepackage{natbib}
+%END LATEX
+
 \usepackage{amsmath}
 \usepackage{amsfonts}
 \usepackage{amssymb}
 \usepackage{amsbsy}
 \usepackage{accents}
 
+% HTML FOOTER
+\htmlfoot{
+  \rule{\linewidth}{1mm}
+  Copyright \quad \copyright \quad 2009-
+  \quad \www \quad \langle\mail\rangle
+}
+
+% HYPHENATION
+
+\hyphenation{he-te-ro-ske-da-stic}
+\hyphenation{ana-ly-ti-cally}
+\hyphenation{know-ledge}
+
 \DeclareMathAlphabet{\mathsfsl}{OT1}{cmss}{m}{sl}
 
 \newcommand{\red}{\textcolor{red}}
 \newcommand{\Lamss}{\mat{\Lambda}_{\sigma^2}}
 \newcommand{\Lamssi}{\imat{\Lambda_{\sigma^2}}}
 
+% TITLE
+
+\title{Gaussian Process Regression with OCaml\\Version 0.9}
+
+\author{Markus Mottl\footnote{\mail}}
+
+\date{\today}
+
+% DOCUMENT
 \begin{document}
 
+\maketitle
+
+\begin{abstract}
+
+This manual documents the implementation and use of the OCaml GPR
+library for Gaussian process regression.
+
+\end{abstract}
+
+\section{Overview}
+
+The OCaml GPR library implements many recent developments in the
+heavily researched machine-learning area of Gaussian process
+regression.
+
+\subsection{Background}
+
+Gaussian processes encode prior knowledge as probability distributions
+over functions.  Bayesian inference can then be used to compute
+posterior distributions over these functions given data, e.g.\ to
+solve regression problems\footnote{Gaussian processes can also be
+used for classification purposes.  This is by itself a large research
+area, which is not covered by this library.}.  As more data becomes
+available, a Gaussian process framework learns an ever more accurate
+distribution over the functions that may have generated the data.\\
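+
+As a minimal illustration of the textbook setting (see
+e.g.\ \cite{oai:eprints.pascal-network.org:1211}; the notation here is
+generic and differs from that of the implementation sections below),
+consider noisy targets $y_i = f(x_i) + \epsilon_i$ with
+$\epsilon_i \sim \mathcal{N}(0, \sigma^2)$ and a GP prior with
+covariance function $k$.  The predictive distribution at a test input
+$x_*$ is then Gaussian with
+
+\begin{eqnarray*}
+\mu_* & = & \mathbf{k}_*^T \left(K + \sigma^2 I\right)^{-1} \mathbf{y} \\
+\sigma_*^2 & = & k(x_*, x_*) - \mathbf{k}_*^T \left(K + \sigma^2 I\right)^{-1} \mathbf{k}_*
+\end{eqnarray*}
+
+where $K_{ij} = k(x_i, x_j)$ and $(\mathbf{k}_*)_i = k(x_i, x_*)$.\\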
+
+Due to their mathematically elegant nature, Gaussian processes allow
+for analytically tractable calculation of the posterior mean and
+covariance functions.  Though it is easy to formulate the required
+equations, GPs come at a computational price that is usually
+intractable for large problems.  Good approximation methods have been
+developed in recent years to address this shortcoming, and this
+library makes heavy use of them.\\
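+
+For orientation, the asymptotic costs usually quoted in the literature
+(see e.g.\ \cite{conf/nips/2005}) for $N$ training points and
+$M \ll N$ inducing inputs are
+
+\begin{eqnarray*}
+\textrm{exact GP regression:} & & O(N^3) \textrm{ time}, \quad O(N^2) \textrm{ memory} \\
+\textrm{FI(T)C approximation:} & & O(N M^2) \textrm{ time}, \quad O(N M) \textrm{ memory}
+\end{eqnarray*}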
+
+Gaussian processes are true generalizations of e.g.\ linear regression,
+ARMA processes, single-layer neural networks with infinitely many
+hidden units, etc., and are thus capable of replacing numerous less
+general approaches.  They are closely related to support vector
+machines (SVMs) and other modern kernel machines, but have features
+that may make them a more suitable choice in many situations, for
+example predictive variances and Bayesian model selection.\\
+
+It would go beyond the scope of this library documentation to provide
+a detailed treatment of Gaussian processes.  Readers unfamiliar with
+this approach may therefore want to consult the wealth of available
+online resources.  This subsection presents an overview of recommended
+materials.
+
+\subsubsection{Video tutorials}
+
+Video tutorials are probably best suited for quickly developing an
+intuition for Gaussian processes, a basic formal background, and a
+perspective on their practical use.
+
+\begin{itemize}
+
+\item
+\emph{\footahref{http://videolectures.net/gpip06\_mackay\_gpb}{Gaussian
+Process Basics}}: David MacKay's lecture given at the \emph{Gaussian
+Processes in Practice Workshop} in 2006.  This one-hour video
+tutorial uses numerous graphical examples and animations to aid
+understanding of the basic principles behind inference techniques
+based on Gaussian processes.
+
+\item
+\emph{\footahref{http://videolectures.net/epsrcws08\_rasmussen\_lgp}{Learning
+with Gaussian Processes}}: a longer, two-hour video tutorial series
+presented by Carl Edward Rasmussen at the Sheffield EPSRC Winter
+School 2008, which goes into somewhat more detail.
+
+\item
+\emph{\footahref{http://videolectures.net/mlss07\_rasmussen\_bigp}{Bayesian
+Inference and Gaussian Processes}}: readers interested in a fairly
+thorough, from-the-ground-up treatment of Bayesian inference
+techniques using Gaussian processes may want to watch this five-hour
+video tutorial series presented by Carl Edward Rasmussen at the MLSS
+2007 in T\"ubingen.
+
+\end{itemize}
+
+\subsubsection{Books and papers}
+
+The following texts are intended for readers who want a more formal
+treatment of the theory.  This is especially recommended if you want
+to be able to implement Gaussian processes and their approximations
+efficiently.
+
+\begin{itemize}
+
+\item
+\emph{\footahref{http://www.gatsby.ucl.ac.uk/\home{snelson}/thesis.pdf}{Flexible
+and efficient Gaussian process models for machine learning}}: Edward
+Lloyd Snelson's PhD thesis (\cite{SnelsonThesis}) offers a particularly
+readable treatment of modern inference and approximation techniques
+that avoids heavy formalism in favor of intuitive notation and
+clearly presented high-level concepts, without sacrificing the detail
+needed for implementation.  This library owes a lot to his work.
+
+\item \emph{\footahref{http://www.gaussianprocess.org/gpml}{Gaussian
+Processes for Machine Learning}}: many researchers in this area
+would call this book by Carl Edward Rasmussen and Christopher
+K.\ I.\ Williams (\cite{oai:eprints.pascal-network.org:1211}) the
+``bible of Gaussian processes''.  It presents a rigorous treatment
+of the underlying theory for both regression and classification
+problems, as well as more general aspects such as properties of
+covariance functions.  The authors have kindly made the
+full text and Matlab sources available online.  Their
+\footahref{http://www.gaussianprocess.org}{Gaussian process website}
+also lists a great wealth of other resources valuable for both
+researchers and practitioners.
+
+\end{itemize}
+
+References to research about specific techniques used in the OCaml
+GPR library are provided in the bibliography.
+
+\subsection{Features of OCaml GPR}
+
+Among other things, the OCaml GPR library currently offers:
+
+\begin{itemize}
+
+\item Sparse Gaussian processes using the FI(T)C\footnote{\emph{Fully
+Independent (Training) Conditional}} approximation for computationally
+tractable learning (see \cite{conf/nips/2005,SnelsonThesis}).
+Unlike some other approximations that lead to degeneracy, this one
+maintains sane posterior variances.  The effective prior induced by
+this approximation is sketched after this list.
+
+\item Optimization of hyperparameters by evidence
+maximization\footnote{Also known as type II maximum likelihood.},
+including optimization of inducing inputs (SPGP algorithm\footnote{This
+library exploits sparse matrix operations to achieve optimum big-O
+complexity when learning inducing inputs with the SPGP algorithm,
+but also for multiscales and other hyperparameters that imply
+sparse derivative matrices.}).
+
+\item Supervised dimensionality reduction, and improved predictions
+under heteroskedastic noise conditions (see \cite{conf/uai/SnelsonG06},
+\cite{SnelsonThesis}).
+
+\item Sparse multiscale Gaussian process regression (see
+\cite{conf/icml/WalderKS08}).
+
+\item Variational improvements to the approximate posterior
+distribution (see \cite{Titsias2009}).
+
+\item Numerically stable GP calculations using QR-factorization to
+avoid the more commonly used and numerically unstable solution of
+normal equations via Cholesky factorization (see \cite{Foster2009}).
+
+\item Consistent use of BLAS/LAPACK throughout the library for
+optimum performance.
+
+\item Functors for plugging arbitrary covariance functions (=
+kernels) into the framework.  There is no constraint on the input
+type of covariance functions; e.g.\ string inputs, graph inputs,
+etc.\ could also be used with ease given suitable covariance
+functions\footnote{The library is currently only distributed with
+covariance functions that operate on multivariate numerical inputs.
+Feel free to contribute others.}.  A hypothetical sketch of what such
+a covariance module might look like is shown after this list.
+
+\item A rigorous test suite that checks user-provided derivatives of
+covariance functions, which are usually quite hard to implement
+correctly, as well as self-test code that verifies the derivatives of
+the log-likelihood functions using finite differences.
+
+\end{itemize}
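+
+For reference, the effective prior induced by the FI(T)C approximation
+mentioned above can be written, in the generic notation of
+\cite{conf/nips/2005} (which differs from the notation used in the
+implementation sections below), as
+
+\begin{eqnarray*}
+Q_{NN} & = & K_{NM} K_{M}^{-1} K_{MN} \\
+K_{\mathrm{FI(T)C}} & = & Q_{NN} + \mathrm{diag}\left(K_{NN} - Q_{NN}\right)
+\end{eqnarray*}
+
+where $\mathrm{diag}(\cdot)$ zeroes all off-diagonal entries, $K_{NM}$
+contains the covariances between the $N$ training inputs and the $M$
+inducing inputs, and $K_M$ those among the inducing inputs
+themselves.\\
+
+To give a feel for what plugging in a covariance function involves,
+the following is a deliberately simplified, hypothetical OCaml sketch;
+it is \emph{not} the actual signature used by the library, which in
+particular also has to expose derivative information:
+
+\begin{verbatim}
+(* Hypothetical sketch only -- not the actual OCaml GPR interface. *)
+module type COVARIANCE = sig
+  type params                                   (* kernel hyperparameters *)
+  type input                                    (* one input point *)
+  val eval : params -> input -> input -> float  (* k (x, x') *)
+end
+
+(* Example instance: an isotropic squared-exponential kernel
+   on float-array inputs. *)
+module Sq_exp : COVARIANCE
+  with type params = float * float and type input = float array =
+struct
+  type params = float * float  (* (signal variance, length scale) *)
+  type input = float array
+
+  let eval (sf2, ell) x x' =
+    let d2 = ref 0. in
+    Array.iteri (fun i xi ->
+      let d = xi -. x'.(i) in
+      d2 := !d2 +. d *. d) x;
+    sf2 *. exp (-. !d2 /. (2. *. ell *. ell))
+end
+\end{verbatim}
+
+A real covariance module would additionally provide derivatives of $k$
+with respect to its hyperparameters and, where applicable, sparse
+representations of those derivative matrices.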
+
+\section{API documentation}
+
+\section{Example application}
+
+\section{Internals}
+
 \section{FIC computations}
 
-These are the equations used for computing the FIC predictive
-distribution and FIC marginal likelihood and its derivatives in the
-OCaml-implementation.  The implementation factorizes the computations
-in this way for several reasons: to minimize computation time and
-memory usage, and to improve numerical stability by e.g.\ avoiding
-inverses and by using QR factorization to avoid normal equations
+This section presents the equations used for computing the FI(T)C
+predictive distribution as well as the log-likelihood and its
+derivatives in the OCaml GPR library.  The implementation factorizes
+the computations in this way for several reasons: to minimize
+computation time and memory usage, and to improve numerical stability
+by using QR factorization to avoid normal equations and by avoiding
+inverses
 whenever possible without great loss of efficiency.  It otherwise
 aims for ease of implementation, e.g.\ combining derivative terms
 to simplify dealing with sparse matrices.\\
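+
+The numerical argument for the QR route is the standard least-squares
+one and is not specific to this library: if $A$ denotes the stacked
+matrix that is QR-factorized below and $B = A^T A$ the corresponding
+normal-equations matrix, then
+
+\begin{eqnarray*}
+\kappa(B) & = & \kappa(A)^2
+\end{eqnarray*}
+
+in the 2-norm, so working directly with the factor $R$ from $A = QR$
+(for which $\kappa(R) = \kappa(A)$) roughly halves the number of
+significant digits lost compared to factorizing $B$ itself (see also
+\cite{Foster2009}).\\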
 
-Here are a few symbology conventions:
+The presentation and notation here broadly follow
+\cite{SnelsonThesis}.  Interested readers are therefore encouraged to
+read that work first, especially the derivations in its appendix.
+Our presentation deviates in minor ways, but should still be fairly
+easy to compare.  The log-likelihood derivatives, however, have been
+heavily restructured; since the mathematical derivation of this
+restructuring would be extremely tedious, only the final result is
+presented.\\
+
+Here are a few definitions:
 
 \begin{itemize}
-\item $\mathrm{diag_m}$ is the matrix consisting of only the diagonal
-\item Parts in \red{red} represent terms used for Michalis Titsias'
-variational approximation of the posterior marginal likelihood.
+
+\item $\mathrm{diag_m}$ is the function that returns the matrix
+consisting of only the diagonal of a given matrix; $\mathrm{diag_v}$
+returns the diagonal as a vector (a small example follows this list).
+
+\item $\otimes$ represents element-wise multiplication of vectors.
+A vector raised to a power means element-wise application of that
+power.
+
+\item Parts in \red{red} represent terms used for Michalis K.\
+Titsias' variational improvement (see \cite{Titsias2009}) to the
+posterior marginal likelihood.
+
 \item Parts in \blue{blue} provide for an alternative, more compact,
 direct and hence more efficient way of computing some result if the
 required parameters are already available.
+
 \end{itemize}
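+
+For example, for a $2 \times 2$ matrix and for vectors $u$ and $v$:
+
+\begin{eqnarray*}
+\mathrm{diag_m}\begin{pmatrix} a & b \\ c & d \end{pmatrix} & = &
+\begin{pmatrix} a & 0 \\ 0 & d \end{pmatrix}, \hspace{5mm}
+\mathrm{diag_v}\begin{pmatrix} a & b \\ c & d \end{pmatrix} =
+\begin{pmatrix} a \\ d \end{pmatrix} \\
+(u \otimes v)_i & = & u_i v_i, \hspace{5mm} (u^2)_i = u_i^2
+\end{eqnarray*}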
 
 \begin{eqnarray*}
 \vecs & = & \diagv{\Lamss} \\
 \\
 \uKnm & = & \ichol{\Lamss} \Knm \\
-\matQ \matR & = & {\uKnm\choose\cholt{K_M}} \hspace{1cm} \textrm{(QR-factorization)} \\
+\matQ \matR & = & {\uKnm\choose\cholt{K_M}} \hspace{5mm}
+\textrm{(QR-factorization of $\uKnm$ stacked on $\cholt{K_M}$)} \\
 \\
 \matB & = & \Km + \uKmn\uKnm = \transm{R}\transm{Q} \mat{Q} \matR = \transm{R}\matR \\
-\matQn & = & {\lfloor \matQ \rfloor}_{N}\footnotemark \longrightarrow \uKnm = \matQn \matR \\
+\matQn & = & {\lfloor \matQ \rfloor\footnotemark}_{N} \longrightarrow \uKnm = \matQn \matR \\
 \matS & = & \ichol{\Lamss}\matQn\itransm{R} \\
 \\
 l_1 & = & -\onehalf (\log|\matB| - \log|\Km| + \log|\Lamss| + N \log 2\pi) \red{+ -\onehalf\vecis \cdot \vecr} \\
 \item when a suitable granularity is reached, use PI(T)C
 \end{itemize}
 
+% BIBLIOGRAPHY
+\bibliographystyle{alpha}
+\bibliography{gpr}
+ 
 \end{document}