\section confinst Configure the compilation/install

-As we have made to select the install path or to add python bindings,
-CMake allows to configure the compilation using some variables. These
+CMake allows you to configure the compilation using some variables
+(see, for example, how to compile the Python module in Linux). These
variables can be set in Linux/MacOS from the command line with the -D
flag:
\verbatim
@@ -233,9 +233,9 @@
>> cmake -DCMAKE_BUILD_TYPE=Debug .
\endverbatim
-If you use ccmake or CMake for Windows, just modify the value of the
-variable.
-
+If you use ccmake in Linux/MacOS or CMake for Windows, you can browse
+a list of all the variables and their values. Just modify the value
+of the desired variable.

\subsection instshared Compile as shared libraries

diff --git a/doxygen/models.dox b/doxygen/models.dox
--- a/doxygen/models.dox
+++ b/doxygen/models.dox
@@ -1,4 +1,5 @@
/*! \page modelopt Models and functions
+\tableofcontents

This library was originally developed as part of a robotics
research project \cite MartinezCantin09AR \cite MartinezCantin07RSS,
@@ -175,7 +176,7 @@

"cHedge(cSum(cEI,cDistance),cLCB,cPOI,cOptimisticSampling)"

-\subsection learnmod Methods for learning the kernel parameters
+\section learnmod Methods for learning the kernel parameters

As mentioned before, we consider that the prior of the kernel
hyperparameters \f$\theta\f$ --if available-- is independent of other
@@ -218,4 +219,27 @@
assume no prior. Since we assume that the hyperparameters are
independent, we can apply priors selectively only to a small set.

+\section initdes Initial design methods
+
+In order to build a suitable surrogate function, we need a
+preliminary set of samples. In Bayesian optimization this is typically
+performed using alternative experimental design criteria. In this
+first step, the main criterion is usually space filling. 
Thus, we have
+implemented the following designs:
+
+\li Latin hypercube sampling: Each dimension of the space is divided
+into several intervals. Samples are then taken according to a
+generalization of the Latin square
+scheme. http://en.wikipedia.org/wiki/Latin_hypercube_sampling
+
+\li Sobol sequences: A set of quasi-random low-discrepancy
+sequences. Thus, the space is sampled more evenly than with uniform
+sampling. http://en.wikipedia.org/wiki/Sobol_sequence
+
+\li Uniform sampling: The search space is sampled uniformly.
+
+Note: Since we do not assume any structure in the set of discrete
+points during discrete optimization, only uniform sampling of the
+discrete set is available in that case.
+
*/
\ No newline at end of file
diff --git a/doxygen/reference.dox b/doxygen/reference.dox
--- a/doxygen/reference.dox
+++ b/doxygen/reference.dox
@@ -29,7 +29,8 @@
not included by default in the linker, Python or Matlab
paths. This is especially critical when building shared libraries
(mandatory for Python usage). The script \em exportlocalpaths.sh makes
-sure that the folder is included in all the necessary paths.
+sure that the folder with the libraries is included in all the
+necessary paths.

After that, there are 3 steps that should be followed:

\li Define the function to optimize.
@@ -40,10 +41,21 @@

\section params Understanding the parameters

BayesOpt relies on a complex and highly configurable mathematical
-model. Also, the key to nonlinear optimization is to include as much
-knowledge as possible about the target function or about the
-problem. Or, if the knowledge is not available, keep the model as
-general as possible (to avoid bias).
+model. In theory, it should work reasonably well for many problems in
+its default configuration. However, Bayesian optimization shines when
+we can include as much knowledge as possible about the target function
+or about the problem. 
Or, if the knowledge is not available, keep the
+model as general as possible (to avoid bias). Here, some knowledge
+about Gaussian processes or nonparametric models in general might be
+useful.
+
+For example, with the parameters we can select the kind of kernel,
+mean function or surrogate model that we want to use. With the kernel
+we can play with the smoothness of the function and its
+derivatives. The mean function can be used to model the overall trend
+(is it flat? linear?). If we know the overall signal variance, we
+should use a Gaussian process; if we do not, we should use a Student's
+t process instead.

For that reason, the parameters are bundled in a structure or
dictionary, depending on the API that we use. This is a brief

diff --git a/optimization.bib b/optimization.bib
--- a/optimization.bib
+++ b/optimization.bib
@@ -616,6 +616,16 @@
  timestamp = {2013.02.22}
}

+@INPROCEEDINGS{Bergstra2011,
+  author = {James Bergstra and Rémi Bardenet and Yoshua Bengio and Balázs Kégl},
+  title = {Algorithms for Hyper-parameter Optimization},
+  booktitle = {Advances in Neural Information Processing Systems},
+  year = {2011},
+  pages = {2546--2554},
+  owner = {rmcantin},
+  timestamp = {2013.07.24}
+}
+
@ARTICLE{Song2012,
  author = {Le Song and Alex Smola and Arthur Gretton and Justin Bedo and Karsten Borgwardt},
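As a side note on the initial design methods added in models.dox above: the Latin hypercube idea (each dimension split into as many intervals as samples, each interval sampled exactly once, with independently shuffled orderings) can be sketched in a few lines. This is an illustrative sketch in plain Python, not BayesOpt's actual implementation, and the function name and seed handling are our own choices:

```python
import random

def latin_hypercube(n_samples, n_dims, seed=42):
    """Draw n_samples points in [0, 1)^n_dims via Latin hypercube sampling.

    Each dimension is split into n_samples equal intervals; every interval
    receives exactly one sample, and the per-dimension orderings are
    shuffled independently (a generalization of the Latin square scheme).
    """
    rng = random.Random(seed)
    columns = []
    for _ in range(n_dims):
        # One uniformly random point inside each of the n_samples intervals.
        col = [(i + rng.random()) / n_samples for i in range(n_samples)]
        rng.shuffle(col)  # decouple this dimension from the others
        columns.append(col)
    # Transpose so that each row is one sample point.
    return [list(point) for point in zip(*columns)]

points = latin_hypercube(5, 2)
```

Every one-dimensional projection of the result covers all five intervals exactly once, which is the space-filling property that plain uniform sampling does not guarantee.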