Commits

Ruben Martinez-Cantin committed 42463d7

Optimization of kernel parameters is done in the log space.


Files changed (4)

doxygen/reference.dox

 we can include as much knowledge as possible about the target function
 or about the problem. Or, if the knowledge is not available, keep the
 model as general as possible (to avoid bias). In this part, knowledge
-about Gaussian process or nonparametric models in general might be
+about Gaussian processes or nonparametric models in general might be
 useful. 
 
 For example, with the parameters we can select the kind of kernel,
-mean or surrogate model that we want to use. With the kernel we can
-play with the smoothness of the function and it's derivatives. The
-mean function can be use to model the overall trend (is it flat?
-linear?). If we know the overall signal variance we better use a
-Gaussian process, if we don't, we should use a Student's t process
-instead.
+mean function or surrogate model that we want to use. With the kernel
+we can play with the smoothness of the function and its
+derivatives. The mean function can be used to model the overall trend
+(flat, linear, etc.). If we know the overall signal variance, we had
+better use a Gaussian process; if we don't, we should use a Student's t
+process instead.
 
-For that reason, the parameters are bundled in a structure or
-dictionary, depending on the API that we use. This is a brief
-explanation of every parameter
+For that reason, the parameters are bundled in a structure
+(C/C++/Matlab/Octave) or dictionary (Python), depending on the API
+that we use. This is a brief explanation of every parameter:
 
 \subsection budgetpar Budget parameters
 
 \li \b n_inner_iterations: Maximum number of iterations of the inner
 optimization process. Each iteration corresponds to a criterion
 evaluation. The inner optimization results in the "most interesting
-point" to run evaluate the target function. This is also used for the
-kernel hyperparameter computation. In order to scale the process for
+point" to evaluate the target function. To scale the process with the
 dimensionality of the space, the actual number of iterations is this
 number times the number of dimensions. [Default 500]
 \li \b n_init_samples: BayesOpt requires an initial set of samples to
 learn a preliminary model of the target function. Each sample requires
 a target function evaluation. [Default 30]
 \li \b n_iter_relearn: Although most of the parameters of the model
 are updated after every iteration, the kernel parameters cannot be
-updated continuously as it might crash the convergence. This
-represents the number of iterations between recomputing the kernel
-parameters. If it is set to 0, they are only learned after the initial
-set of samples. [Default 0]
+updated continuously, because doing so has a very large computational
+overhead and might introduce bias in the result. This represents the
+number of iterations between recomputations of the kernel parameters.
+If it is set to 0, they are only learned after the initial set of
+samples. [Default 0]
 
 \subsection initpar Initialization parameters
 
 \li \b init_method: (unsigned integer value) For continuous
 optimization, we can choose among different strategies for the initial
-design (1-Latin Hypercube Sampling, 2-Sobol sequences (if available,
-see \ref mininst), Other-Uniform Sampling).
+design (1-Latin Hypercube Sampling (LHS), 2-Sobol sequences (if available,
+see \ref mininst), Other-Uniform Sampling) [Default 1, LHS].
 
 
 \subsection logpar Logging parameters
 
-\li \b verbose_level: (unsigned integer value) Verbose level 0,3 -> warnings,
-1,4 -> general information, 2,5 -> debug information, any other value
--> only errors. Levels < 3 send the messages to stdout. Levels > 4
-send them to a log file. [Default 1].
-\li \b log_filename: Name of the log file (if applicable, verbose >
-4)[Default "bayesopt.log"]
+\li \b verbose_level: (unsigned integer value) Verbose level 0,3 ->
+warnings, 1,4 -> general information, 2,5 -> debug information, any
+other value -> only errors. Levels 0,1,2 -> send messages to
+stdout. Levels 3,4,5 -> send messages to a log file. [Default 1,
+general info->stdout].
+\li \b log_filename: Name of the log file (if applicable,
+verbose_level= 3, 4 or 5) [Default "bayesopt.log"]
 
 \subsection surrpar Surrogate model parameters
 
 \li \b surr_name: Name of the hierarchical surrogate function
 (nonparametric process and the hyperpriors on sigma and w). See
-Section \ref surrmod for a detailed description. [Default "sGaussianProcess"]
+Section \ref surrmod for a detailed description. [Default
+"sGaussianProcess"]
 \li \b sigma_s: Signal variance (if known) [Default 1.0]
-\li \b noise: Observation noise. For computer simulations or
-deterministic functions, it should be close to 0. However, to avoid
-numerical instability due to model inaccuracies, do not make it
-0. [Default 0.0001]
+\li \b noise: Observation noise/signal ratio. For computer simulations
+or deterministic functions, it should be close to 0. However, to avoid
+numerical instability due to model inaccuracies, always keep it
+strictly greater than 0. [Default 0.0001]
 \li \b alpha, \b beta: Inverse-Gamma prior hyperparameters (if
 applicable) [Default 1.0, 1.0]
-\li \b l_type: Learning method for the kernel hyperparameters. See
-section \ref learnmod for a detailed description [Default L_MAP]
+
+\subsection hyperlearn Hyperparameter learning
+
+Although BayesOpt tries to build a full analytical Bayesian model for
+the surrogate function, some hyperparameters cannot be estimated in
+closed form. Currently, the only parameters of BayesOpt models that
+require special treatment are the kernel hyperparameters. See Section
+\ref learnmod for a detailed description.
+
+\li \b l_type: Learning method for the kernel
+hyperparameters. Currently, only L_FIXED and L_EMPIRICAL are
+implemented [Default L_EMPIRICAL]
+\li \b sc_type: Score function for the learning method. [Default
+SC_MAP]
+
 
 \subsection critpar Exploration/exploitation parameters
 
 \li \b epsilon: According to some authors, it is recommendable to
 include an epsilon-greedy strategy to achieve near optimal convergence
 rates. Epsilon is the probability of performing a random (blind)
-evaluation of the target function [Default 0.0 (disabled)]
+evaluation of the target function. Higher values imply forced
+exploration, while lower values rely more on the
+exploration/exploitation policy of the criterion [Default 0.0
+(disabled)]
 \li \b crit_name: Name of the sample selection criterion or a
 combination of them. It is used to select which points to evaluate for
 each iteration of the optimization process. Could be a combination of
 functions like "cHedge(cEI,cLCB,cPOI,cThompsonSampling)". See section
-critmod for the different possibilities. [Default: "cEI]"
+critmod for the different possibilities. [Default: "cEI"]
 \li \b crit_params, \b n_crit_params: Array with the set of parameters
 for the selected criteria. If there is more than one criterion, the
 parameters are split among them according to the number of parameters

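All of the options documented above are bundled in one configuration structure. The following is a minimal sketch of how that structure might be filled in from C++, assuming the bopt_params typedef and the initialize_parameters_to_default() helper declared in include/parameters.h; the concrete values are illustrative only:

#include "parameters.h"

bopt_params make_illustrative_params()
{
  /* Start from the library defaults and override selected fields. */
  bopt_params par = initialize_parameters_to_default();

  /* Budget parameters */
  par.n_iterations   = 200;   /* total target function evaluations */
  par.n_init_samples = 50;    /* initial design, LHS by default */
  par.n_iter_relearn = 20;    /* relearn kernel hyperparameters every 20 iterations */

  /* Kernel hyperparameter learning */
  par.l_type  = L_EMPIRICAL;  /* empirical Bayes learning */
  par.sc_type = SC_MAP;       /* maximum a posteriori score */

  /* Surrogate model and exploration */
  par.noise   = 1e-4;         /* small, strictly positive noise/signal ratio */
  par.epsilon = 0.0;          /* epsilon-greedy exploration disabled */

  /* Logging */
  par.verbose_level = 1;      /* general information to stdout */

  return par;
}

The change to examples/bo_cont.cpp below overrides a similar subset of fields directly on its local parameter structure.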
examples/bo_cont.cpp

   par.sc_type = SC_ML;
   par.n_iterations = 200;       // Number of iterations
   par.n_init_samples = 50;
+  par.n_iter_relearn = 20;
   par.verbose_level = 2;
   /*******************************************/
 

include/kernel_atomic.hpp

 #ifndef  _KERNEL_ATOMIC_HPP_
 #define  _KERNEL_ATOMIC_HPP_
 
+#include <valarray>
+#include <algorithm>   // for std::transform
+#include <cmath>       // for exp() and log()
 #include "kernel_functors.hpp"
 #include "elementwise_ublas.hpp"
 
 	  FILE_LOG(logERROR) << "Wrong number of kernel hyperparameters"; 
 	  throw std::invalid_argument("Wrong number of kernel hyperparameters");
 	}
-      params = theta;
+      params = theta; //TODO: assignment used only to resize params. Make it more efficient.
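+      // The optimizer works in log space (theta = log(params)); store exp(theta)
+      // so the kernel always sees strictly positive hyperparameters.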
+      std::transform(theta.begin(), theta.end(), params.begin(), (double (*)(double)) exp);
+      //      params = exp(theta);
     };
 
-    vectord getHyperParameters() {return params;};
+    vectord getHyperParameters() 
+    { 
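+      // Return the hyperparameters in log space for the optimizer.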
+      vectord theta(params.size());
+      std::transform(params.begin(), params.end(), theta.begin(), (double (*)(double)) log);
+      return theta;
+      //  return log(params);
+    };
     size_t nHyperParameters() {return n_params;};
 
     virtual ~AtomicKernel(){};

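The kernel_atomic.hpp change above is where the commit message takes effect: the inner optimizer now manipulates theta = log(params), so the kernel hyperparameters are always strictly positive once exponentiated and the search space is unconstrained and better scaled. The following self-contained sketch shows the same exp/log round trip with plain std::vector instead of the library's vectord; all names here are illustrative:

#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

/* Wrappers with an unambiguous signature for std::transform. */
static double exp_d(double x) { return std::exp(x); }
static double log_d(double x) { return std::log(x); }

int main()
{
  /* What the optimizer manipulates: unconstrained values in log space. */
  std::vector<double> theta(2);
  theta[0] = std::log(1.0);    /* e.g. log(KERNEL_THETA) */
  theta[1] = std::log(10.0);   /* e.g. log(KERNEL_SIGMA) */

  /* What the kernel stores: strictly positive hyperparameters. */
  std::vector<double> params(theta.size());
  std::transform(theta.begin(), theta.end(), params.begin(), exp_d);

  /* Round trip back to the optimizer: the log of the stored values. */
  std::vector<double> back(params.size());
  std::transform(params.begin(), params.end(), back.begin(), log_d);

  std::printf("params = (%g, %g), recovered theta = (%g, %g)\n",
              params[0], params[1], back[0], back[1]);
  return 0;
}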
include/parameters.h

     size_t n_coef;               /**< Number of mean funct. hyperparameters */
   } mean_parameters;
 
-  /** \brief Configuration parameters */
+  /** \brief Configuration parameters 
+   *  @see \ref reference for a full description of the parameters
+   */
   typedef struct {
     size_t n_iterations;         /**< Maximum BayesOpt evaluations (budget) */
     size_t n_inner_iterations;   /**< Maximum inner optimizer evaluations */
     double noise;                /**< Observation noise (and nugget) */
     double alpha;                /**< Inverse Gamma prior for signal var */
     double beta;                 /**< Inverse Gamma prior for signal var*/
+
     score_type sc_type;          /**< Score type for kernel hyperparameters (ML,MAP,etc) */
     learning_type l_type;        /**< Type of learning for the kernel params*/
+
     double epsilon;              /**< For epsilon-greedy exploration */
 
     kernel_parameters kernel;    /**< Kernel parameters */
 
   /* Nonparametric process "parameters" */
   const double KERNEL_THETA    = 1.0;
-  const double KERNEL_SIGMA    = 100.0;
+  const double KERNEL_SIGMA    = 10.0;
   const double MEAN_MU         = 1.0;
   const double MEAN_SIGMA      = 1000.0;
   const double PRIOR_ALPHA     = 1.0;