# Commits

committed 9a410df

Improving docs about parameters. Fixing issue in verbose parameter.

• Parent commits af50e48

# File doxygen/introduction.dox

-/*! \page bopttheory Bayesian optimization
-\tableofcontents
-
-\section introbopt Introduction to Bayesian Optimization
-
-Many problems in engineering, computer science, economics, etc.,
-require finding the extremum of a real-valued function. These
-functions are typically continuous and sometimes smooth (e.g.,
-Lipschitz continuous). However, they often lack a closed-form
-expression or might be multimodal, where some of the local extrema
-have a bad outcome compared to the global extremum. Evaluating
-these functions might also be costly.
-
-Global optimization is a special case of non-convex optimization where
-we want to find the global extremum of a real valued function, that
-is, the target function. The search is done by some pointwise
-evaluation of the target function.
-
-The objective of a global optimization algorithm is to find the
-sequence of points
-\f[
-x_n \in \mathcal{A} \subset \mathbb{R}^m , \;\;\; n = 1,2,\ldots
-\f]
-which converges to the point \f$x^*\f$, that is, the extremum of the
-target function, when \f$n\f$ is large. The algorithm should be able to
-find that sequence at least for all functions from a given family.
-
-As explained in \cite Mockus94, this search procedure is a sequential
-decision making problem where the point at step \f$n+1\f$ is based on the
-decision \f$d_n\f$, which considers all previous data:
-\f[
-x_{n+1} = d_n(x_{1:n},y_{1:n})
-\f]
-where \f$y_i = f(x_i) + \epsilon_i\f$. For simplicity, many works assume
-\f$\epsilon_i = 0\f$, that is, function evaluations are
-deterministic. However, we can easily extend the description to
-include stochastic functions (e.g., homoscedastic noise \f$\epsilon_i
-\sim \mathcal{N}(0,\sigma)\f$).
-
-The search method is the sequence of decisions \f$d = \{d_0,\ldots,
-d_{n-1}\}\f$, which leads to the final decision \f$x_{n} = x_{n}(d)\f$. In
-most applications, the objective is to optimize the response of the
-final decisions. Then, the criterion relies on the \a optimality
-\a error or \a optimality \a gap, which can be expressed as:
-\f[
-\delta_n(f,d) = f\left(x_n\right) - f(x^*)
-\f]
-In other applications, the objective may require converging to \f$x^*\f$
-in the input space. Then, we can use, for example, the <em>Euclidean
-distance error</em>:
-\f[
-\delta_n(f,d) = \|x_n - x^*\|_2 \label{eq:dist-error}
-\f]
-The previous equations can also be interpreted as variants of the
-\a loss function for the decision at each step. Thus, the optimal
-decision is defined as the function that minimizes the loss function:
-\f[
-d_n = \arg \min_d \delta_n(f,d)
-\f]
-This requires full knowledge of the function \f$f\f$, which is
-unavailable. Instead, let us assume that the target function \f$f = f(x)\f$
-belongs to a family of functions \f$f \in F\f$, e.g., continuous functions
-in \f$\mathbb{R}^m\f$. Let us also assume that the function can be
-represented as a sample from a probability distribution over functions
-\f$f \sim P(f)\f$. Then, the best response case analysis for the search
-process is defined as the decision that optimizes the expectation of
-the loss function:
-\f[
-d^{BR}_n = \arg \min_d \mathbb{E}_{P(f)} \left[
-\delta_n(f,d)\right]= \arg \min_d \int_F \delta_n(f,d) \; dP(f)
-\f]
-where \f$P\f$ is a prior distribution over functions.
-
-However, we can improve the equation by considering that, at decision
-\f$d_n\f$, we have already \a observed the actual response of the
-function at \f$n-1\f$ points, \f$\{x_{1:n-1},y_{1:n-1}\}\f$. Thus, the prior
-information about the function can be updated with the observations and
-Bayes' rule:
-\f[
-  P(f|x_{1:n-1},y_{1:n-1}) = \frac{P(x_{1:n-1},y_{1:n-1}|f) P(f)}{P(x_{1:n-1},y_{1:n-1})}
-\f]
-In fact, we can actually rewrite the equation to represent the updates
-sequentially:
-\f[
-  P(f|x_{1:i},y_{1:i}) = \frac{P(x_{i},y_{i}|f) P(f|x_{1:i-1},y_{1:i-1})}{P(x_{i},y_{i})}, \qquad \forall \; i=1 \ldots n-1
-\f]
-Thus, the previous equation can be rewritten as:
-\f[
-d^{BO}_n = \arg \min_d \mathbb{E}_{P(f|x_{1:n-1},y_{1:n-1})} \left[ \delta_n(f,d)\right] = \arg \min_d \int_F \delta_n(f,d) \; dP(f|x_{1:n-1},y_{1:n-1})
-\f]
-This equation is the root of <em>Bayesian optimization</em>, where the
-Bayesian part comes from the fact that we are computing the
-expectation with respect to the posterior distribution, also called
-\a belief, over functions. Therefore, Bayesian optimization is a
-memory-based optimization algorithm.
-
-Although most of the theory of Bayesian optimization is related to
-deterministic functions, we also consider stochastic functions, that
-is, we assume there might be a random error in the function
-output. In fact, evaluations can produce different outputs if
-repeated. In that case, the target function is the expected
-output. Furthermore, \cite Gramacy2012 has shown that, even for
-deterministic functions, it is better to assume a certain error in the
-observation. The main reason is that, in practice, there might be
-some mismodelling errors which can lead to instability of the
-recursion if neglected.
-
-\section modbopt Bayesian optimization general model
-
-In order to simplify the description, we are going to use a special
-case of the Bayesian optimization model defined previously, which
-corresponds to the most common application. In subsequent sections we
-will introduce some generalizations for different applications.
-
-Without loss of generality, consider the problem of finding the
-minimum of an unknown real valued function \f$f:\mathbb{X} \rightarrow
-\mathbb{R}\f$, where \f$\mathbb{X}\f$ is a compact space, \f$\mathbb{X}
-\subset \mathbb{R}^d, d \geq 1\f$. Let \f$P(f)\f$ be a prior distribution
-over functions represented as a stochastic process, for example, a
-Gaussian process \f$\xi(\cdot)\f$, with inputs \f$x \in \mathbb{X}\f$ and an
-associated kernel or covariance function \f$k(\cdot,\cdot)\f$. Let us also
-assume that the target function is a sample of the stochastic process
-\f$f \sim \xi(\cdot)\f$.
-
-In order to find the minimum, the algorithm has a maximum budget of
-\f$N\f$ evaluations of the target function \f$f\f$. The purpose of the
-algorithm is to find the sequence of decisions that provides the best
-performance at the end of that budget.
-
-One advantage of using Gaussian processes as prior distributions
-over functions is that new observations of the target function
-\f$(x_i,y_i)\f$ can be easily used to update the distribution over
-functions. Furthermore, the posterior distribution is also a Gaussian
-process \f$\xi_i = \left[ \xi(\cdot) | x_{1:i},y_{1:i}
-\right]\f$. Therefore, the posterior can be used as an informative prior
-for the next iteration in a recursive algorithm.
-
-In a more general setting, many authors have suggested modifying the
-standard zero-mean Gaussian process with variations that
-include semi-parametric models \cite Huang06 \cite Handcock1993 \cite Jones:1998 \cite OHagan1992, use of hyperpriors on the parameters
-\cite MartinezCantin09AR \cite Brochu:2010c \cite Hoffman2011, Student's
-t processes \cite Gramacy_Polson_2009 \cite Sacks89SS \cite Williams_Santner_Notz_2000, etc.
-
-We use a generalized linear model of the form:
-\f[
-  f(x) = \phi(\mathbf{x})^T \mathbf{w} + \epsilon(\mathbf{x})
-\f]
-where
-\f[
-  \epsilon(\mathbf{x}) \sim \mathcal{NP} \left( 0, \sigma^2_s (\mathbf{K}(\theta) + \sigma^2_n \mathbf{I}) \right)
-\f]
-The term \f$\mathcal{NP}\f$ denotes a non-parametric process, which can
-refer to a Gaussian process \f$\mathcal{GP}\f$ or a Student's
-t process \f$\mathcal{TP}\f$. In both cases, \f$\sigma^2_n\f$ is the
-observation noise variance, sometimes called nugget, and it is problem
-specific. Many authors decide to fix this value \f$\sigma^2_n = 0\f$
-when the function \f$f(x)\f$ is deterministic, for example, a computer
-simulation. However, as cleverly pointed out in \cite Gramacy2012,
-there might be more reasons to include this term apart from the
-observation noise, for example, to account for model inaccuracies.
-
-This model has been presented in different ways depending on the field
-where it was used:
-\li As a generalized linear model \f$\phi(\mathbf{x})^T\mathbf{w}\f$ with heteroscedastic
-perturbation \f$\epsilon(\mathbf{x})\f$.
-\li As a nonparametric process of the form \f$\mathcal{NP} \left(\phi(\mathbf{x})^T\mathbf{w},
-\sigma^2_s (\mathbf{K}(\theta) + \sigma^2_n \mathbf{I}) \right)\f$.
-\li As a semiparametric model \f$f(\mathbf{x}) = f_{par}(\mathbf{x}) + f_{nonpar}(\mathbf{x}) =
-\phi(\mathbf{x})^T\mathbf{w} + \mathcal{NP}(\cdot)\f$.
-
-
-*/

# File doxygen/models.dox

-/*! \page modelopt Models and functions
+/*! \page bopttheory Bayesian optimization
 \tableofcontents

+\section introbopt Introduction to Bayesian Optimization
+
+Many problems in engineering, computer science, economics, etc.,
+require finding the extremum of a real-valued function. These
+functions are typically continuous and sometimes smooth (e.g.,
+Lipschitz continuous). However, they often lack a closed-form
+expression or might be multimodal, where some of the local extrema
+have a bad outcome compared to the global extremum. Evaluating
+these functions might also be costly.
+
+Global optimization is a special case of non-convex optimization where
+we want to find the global extremum of a real valued function, that
+is, the target function. The search is done by some pointwise
+evaluation of the target function.
+
+The objective of a global optimization algorithm is to find the
+sequence of points
+\f[
+x_n \in \mathcal{A} \subset \mathbb{R}^m , \;\;\; n = 1,2,\ldots
+\f]
+which converges to the point \f$x^*\f$, that is, the extremum of the
+target function, when \f$n\f$ is large. The algorithm should be able to
+find that sequence at least for all functions from a given family.
+
+As explained in \cite Mockus94, this search procedure is a sequential
+decision making problem where the point at step \f$n+1\f$ is based on the
+decision \f$d_n\f$, which considers all previous data:
+\f[
+x_{n+1} = d_n(x_{1:n},y_{1:n})
+\f]
+where \f$y_i = f(x_i) + \epsilon_i\f$. For simplicity, many works assume
+\f$\epsilon_i = 0\f$, that is, function evaluations are
+deterministic. However, we can easily extend the description to
+include stochastic functions (e.g., homoscedastic noise \f$\epsilon_i
+\sim \mathcal{N}(0,\sigma)\f$).
+
+The search method is the sequence of decisions \f$d = \{d_0,\ldots,
+d_{n-1}\}\f$, which leads to the final decision \f$x_{n} = x_{n}(d)\f$. In
+most applications, the objective is to optimize the response of the
+final decisions. Then, the criterion relies on the \a optimality
+\a error or \a optimality \a gap, which can be expressed as:
+\f[
+\delta_n(f,d) = f\left(x_n\right) - f(x^*)
+\f]
+In other applications, the objective may require converging to \f$x^*\f$
+in the input space. Then, we can use, for example, the <em>Euclidean
+distance error</em>:
+\f[
+\delta_n(f,d) = \|x_n - x^*\|_2 \label{eq:dist-error}
+\f]
+The previous equations can also be interpreted as variants of the
+\a loss function for the decision at each step. Thus, the optimal
+decision is defined as the function that minimizes the loss function:
+\f[
+d_n = \arg \min_d \delta_n(f,d)
+\f]
+This requires full knowledge of the function \f$f\f$, which is
+unavailable. Instead, let us assume that the target function \f$f = f(x)\f$
+belongs to a family of functions \f$f \in F\f$, e.g., continuous functions
+in \f$\mathbb{R}^m\f$. Let us also assume that the function can be
+represented as a sample from a probability distribution over functions
+\f$f \sim P(f)\f$. Then, the best response case analysis for the search
+process is defined as the decision that optimizes the expectation of
+the loss function:
+\f[
+d^{BR}_n = \arg \min_d \mathbb{E}_{P(f)} \left[
+\delta_n(f,d)\right]= \arg \min_d \int_F \delta_n(f,d) \; dP(f)
+\f]
+where \f$P\f$ is a prior distribution over functions.
+
+However, we can improve the equation by considering that, at decision
+\f$d_n\f$, we have already \a observed the actual response of the
+function at \f$n-1\f$ points, \f$\{x_{1:n-1},y_{1:n-1}\}\f$. Thus, the prior
+information about the function can be updated with the observations and
+Bayes' rule:
+\f[
+  P(f|x_{1:n-1},y_{1:n-1}) = \frac{P(x_{1:n-1},y_{1:n-1}|f) P(f)}{P(x_{1:n-1},y_{1:n-1})}
+\f]
+In fact, we can actually rewrite the equation to represent the updates
+sequentially:
+\f[
+  P(f|x_{1:i},y_{1:i}) = \frac{P(x_{i},y_{i}|f) P(f|x_{1:i-1},y_{1:i-1})}{P(x_{i},y_{i})}, \qquad \forall \; i=1 \ldots n-1
+\f]
+Thus, the previous equation can be rewritten as:
+\f[
+d^{BO}_n = \arg \min_d \mathbb{E}_{P(f|x_{1:n-1},y_{1:n-1})} \left[ \delta_n(f,d)\right] = \arg \min_d \int_F \delta_n(f,d) \; dP(f|x_{1:n-1},y_{1:n-1})
+\f]
+This equation is the root of <em>Bayesian optimization</em>, where the
+Bayesian part comes from the fact that we are computing the
+expectation with respect to the posterior distribution, also called
+\a belief, over functions. Therefore, Bayesian optimization is a
+memory-based optimization algorithm.
+
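The sequential posterior update described above is what makes the algorithm memory-based: each new observation simply enters the conditioning set of the belief. A minimal numpy sketch with a zero-mean Gaussian process and a squared-exponential kernel (the kernel choice, length-scale, and all constants here are illustrative assumptions, not the library's defaults):

```python
import numpy as np

def sq_exp_kernel(a, b, length=0.3):
    """Squared-exponential kernel k(x, x') = exp(-(x - x')^2 / (2 l^2))."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length) ** 2)

def gp_posterior(x_obs, y_obs, x_query, noise=1e-6):
    """Mean and variance of P(f(x_query) | x_{1:n}, y_{1:n}) for a zero-mean GP."""
    K = sq_exp_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    k_star = sq_exp_kernel(x_obs, x_query)
    mu = k_star.T @ np.linalg.solve(K, y_obs)
    var = sq_exp_kernel(x_query, x_query).diagonal() \
        - np.sum(k_star * np.linalg.solve(K, k_star), axis=0)
    return mu, var

# Conditioning on the data drives the posterior mean towards the
# observed values and shrinks the variance near them.
x_obs = np.array([0.1, 0.5, 0.9])
y_obs = np.array([1.0, -0.3, 0.2])
mu, var = gp_posterior(x_obs, y_obs, np.array([0.5]))
```

Adding a new pair \f$(x_i,y_i)\f$ to `x_obs`/`y_obs` and recomputing is exactly the recursive update of the belief used at each iteration.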
+Although most of the theory of Bayesian optimization is related to
+deterministic functions, we also consider stochastic functions, that
+is, we assume there might be a random error in the function
+output. In fact, evaluations can produce different outputs if
+repeated. In that case, the target function is the expected
+output. Furthermore, \cite Gramacy2012 has shown that, even for
+deterministic functions, it is better to assume a certain error in the
+observation. The main reason is that, in practice, there might be
+some mismodelling errors which can lead to instability of the
+recursion if neglected.
+
+\section modbopt Bayesian optimization general model
+
+In order to simplify the description, we are going to use a special
+case of the Bayesian optimization model defined previously, which
+corresponds to the most common application. In subsequent sections we
+will introduce some generalizations for different applications.
+
+Without loss of generality, consider the problem of finding the
+minimum of an unknown real valued function \f$f:\mathbb{X} \rightarrow
+\mathbb{R}\f$, where \f$\mathbb{X}\f$ is a compact space, \f$\mathbb{X}
+\subset \mathbb{R}^d, d \geq 1\f$. Let \f$P(f)\f$ be a prior distribution
+over functions represented as a stochastic process, for example, a
+Gaussian process \f$\xi(\cdot)\f$, with inputs \f$x \in \mathbb{X}\f$ and an
+associated kernel or covariance function \f$k(\cdot,\cdot)\f$. Let us also
+assume that the target function is a sample of the stochastic process
+\f$f \sim \xi(\cdot)\f$.
+
+In order to find the minimum, the algorithm has a maximum budget of
+\f$N\f$ evaluations of the target function \f$f\f$. The purpose of the
+algorithm is to find the sequence of decisions that provides the best
+performance at the end of that budget.
+
+One advantage of using Gaussian processes as prior distributions
+over functions is that new observations of the target function
+\f$(x_i,y_i)\f$ can be easily used to update the distribution over
+functions. Furthermore, the posterior distribution is also a Gaussian
+process \f$\xi_i = \left[ \xi(\cdot) | x_{1:i},y_{1:i}
+\right]\f$. Therefore, the posterior can be used as an informative prior
+for the next iteration in a recursive algorithm.
+
+In a more general setting, many authors have suggested modifying the
+standard zero-mean Gaussian process with variations that
+include semi-parametric models \cite Huang06 \cite Handcock1993 \cite Jones:1998 \cite OHagan1992, use of hyperpriors on the parameters
+\cite MartinezCantin09AR \cite Brochu:2010c \cite Hoffman2011, Student's
+t processes \cite Gramacy_Polson_2009 \cite Sacks89SS \cite Williams_Santner_Notz_2000, etc.
+
+We use a generalized linear model of the form:
+\f[
+  f(x) = \phi(\mathbf{x})^T \mathbf{w} + \epsilon(\mathbf{x})
+\f]
+where
+\f[
+  \epsilon(\mathbf{x}) \sim \mathcal{NP} \left( 0, \sigma^2_s (\mathbf{K}(\theta) + \sigma^2_n \mathbf{I}) \right)
+\f]
+The term \f$\mathcal{NP}\f$ denotes a non-parametric process, which can
+refer to a Gaussian process \f$\mathcal{GP}\f$ or a Student's
+t process \f$\mathcal{TP}\f$. In both cases, \f$\sigma^2_n\f$ is the
+observation noise variance, sometimes called nugget, and it is problem
+specific. Many authors decide to fix this value \f$\sigma^2_n = 0\f$
+when the function \f$f(x)\f$ is deterministic, for example, a computer
+simulation. However, as cleverly pointed out in \cite Gramacy2012,
+there might be more reasons to include this term apart from the
+observation noise, for example, to account for model inaccuracies.
+
+This model has been presented in different ways depending on the field
+where it was used:
+\li As a generalized linear model \f$\phi(\mathbf{x})^T\mathbf{w}\f$ with heteroscedastic
+perturbation \f$\epsilon(\mathbf{x})\f$.
+\li As a nonparametric process of the form \f$\mathcal{NP} \left(\phi(\mathbf{x})^T\mathbf{w},
+\sigma^2_s (\mathbf{K}(\theta) + \sigma^2_n \mathbf{I}) \right)\f$.
+\li As a semiparametric model \f$f(\mathbf{x}) = f_{par}(\mathbf{x}) + f_{nonpar}(\mathbf{x}) =
+\phi(\mathbf{x})^T\mathbf{w} + \mathcal{NP}(\cdot)\f$.
+
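The three readings above describe the same model. As an illustration, one can draw a sample path from the semiparametric form directly; in this sketch the feature map \f$\phi\f$, the squared-exponential choice for \f$\mathbf{K}(\theta)\f$, and every constant are assumptions made only for the example:

```python
import numpy as np

def phi(x):
    """Illustrative feature map for the parametric part phi(x)^T w."""
    return np.stack([np.ones_like(x), x, x ** 2])

def sample_model(x, w, sigma_s=1.0, sigma_n=0.05, theta=0.3, seed=0):
    """Draw f(x) = phi(x)^T w + eps with
    eps ~ N(0, sigma_s^2 (K(theta) + sigma_n^2 I))."""
    rng = np.random.default_rng(seed)
    d = x[:, None] - x[None, :]
    K = np.exp(-0.5 * (d / theta) ** 2)   # K(theta): squared-exponential here
    cov = sigma_s ** 2 * (K + sigma_n ** 2 * np.eye(len(x)))
    eps = rng.multivariate_normal(np.zeros(len(x)), cov)
    return phi(x).T @ w + eps

x = np.linspace(0.0, 1.0, 50)
f = sample_model(x, w=np.array([0.5, -1.0, 2.0]))  # one sample path f(x)
```

The parametric term fixes the expected trend of the samples, while the nonparametric residual captures the structured deviations around it.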
+\section modelopt Models and functions
+
 This library was originally developed as part of a robotics
 research project \cite MartinezCantin09AR \cite MartinezCantin07RSS,
 where a Gaussian process with hyperpriors on the mean and signal
 bounded optimization, stochastic bandits, active learning for
 regression, etc.

-\section surrmod Surrogate models
+\subsection surrmod Surrogate models

 As seen in Section \ref modbopt this library implements only one
 general regression model. However, we can assign a set of priors on
 Student's t distribution is robust to outliers and heavy tails in the
 data.

-\section kermod Kernel (covariance) models
+\subsection kermod Kernel (covariance) models

 One of the critical components of Gaussian and Student's t processes
 is the definition of the kernel function, which defines the
 relevance of each feature in the input space. In the limit, this
 can be used for feature selection.

-\subsection singker Atomic kernels
+\subsubsection singker Atomic kernels
 \li "kConst": a simple constant function.
 \li "kLinear", "kLinearARD": a linear function.
 \li "kMaternISO1",
 \li "kRQISO": Rational quadratic kernel, also known as Student's t
 kernel.

-\subsection combker Binary kernels
+\subsubsection combker Binary kernels
 These kernels allow combining some of the previous kernels.
 \li "kSum": Sum of kernels.
 \li "kProd": Product of kernels.
 function and 1 for the constant. If the vector of parameters has more
 or fewer than 6 elements, the system complains.

-\section parmod Parametric (mean) functions
+\subsection parmod Parametric (mean) functions

 Although the nonparametric process is able to model a large number of
 functions, we can model the expected value of the nonparametric process
 \li "mLinear": linear function.
 \li "mSum": binary function which can be used to combine other functions.

-\section critmod Selection criteria
+\subsection critmod Selection criteria

 As discussed in \ref introbopt, one of the critical aspects for
 Bayesian optimization is the decision (loss) function. Unfortunately,
 model. However, the library includes all the criteria for both
 distributions, and the system automatically selects the correct one.

-\subsection atomcri Atomic criteria
+\subsubsection atomcri Atomic criteria

 \li "cEI","cBEI","cEIa": The most extended and reliable algorithm is
 the Expected Improvement algorithm \cite Mockus78. In this case we
 applications \cite Marchant2012
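As a reference for the criterion itself, the Expected Improvement of \cite Mockus78 has a closed form under a Gaussian predictive distribution. A self-contained sketch for minimization (this mirrors the textbook formula, not the library's cEI implementation):

```python
import math

def expected_improvement(mu, sigma, y_best):
    """EI(x) = (y_best - mu) * Phi(z) + sigma * phi(z), z = (y_best - mu) / sigma,
    where mu, sigma are the predictive mean/std and y_best the incumbent minimum."""
    if sigma <= 0.0:
        return max(y_best - mu, 0.0)
    z = (y_best - mu) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))      # standard normal CDF
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal PDF
    return (y_best - mu) * cdf + sigma * pdf

# A point predicted well below the incumbent scores high;
# one predicted far above it scores near zero.
high = expected_improvement(mu=-0.5, sigma=0.2, y_best=0.0)
low = expected_improvement(mu=1.0, sigma=0.2, y_best=0.0)
```

The two terms trade off exploitation (a low predicted mean) against exploration (a large predictive uncertainty), which is why EI remains the default choice in most applications.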


-\subsection combcri Combined criteria
+\subsubsection combcri Combined criteria

 \li "cSum","cProd": Sum and product of different criteria functions.
 \li "cHedge", "cHedgeRandom": Bandit based selection of the best

 "cHedge(cSum(cEI,cDistance),cLCB,cPOI,cOptimisticSampling)"

-\section learnmod Methods for learning the kernel parameters
+\subsection learnmod Methods for learning the kernel parameters

 The posterior distribution of the model, which is necessary to compute
 the criterion function, cannot be computed in closed form if the
 hyperparameter. Since we assume that the hyperparameters are
 independent, we can apply priors selectively only to a small set.

-\section initdes Initial design methods
+\subsection initdes Initial design methods

 In order to build a suitable surrogate function, we need a
 preliminary set of samples. In Bayesian optimization this is typically

# File doxygen/reference.dox

 - \subpage usemanual
 - \subpage demos
 - \subpage bopttheory
-- \subpage modelopt
 - \subpage contriblib

 */

# File doxygen/using.dox

+namespace bayesopt
+{
 /*! \page usemanual Using the library
 \tableofcontents

 The library is intended to be both fast and clear for development and
-research. At the same time, it allows great level of costumization and
+research. At the same time, it allows great level of customization and
 guarantees a high level of accuracy and numerical robustness.


 - Define the function to optimize.
 - Modify the parameters of the optimization process. In general, many
   problems can be solved with the default set of parameters, but some
-  of them will require some tunning.
+  of them will require some tuning.
    - The set of parameters and the default set can be found in
      parameters.h.
-   - In general most users will need to modify onlyare described in \ref basicparams.
+   - In general most users will need to modify only the parameters
+     described in \ref basicparams.
    - Advanced users should read \ref params for a full description of the parameters.
 - Set and run the corresponding optimizer (continuous, discrete,
 categorical, etc.). In this step, the corresponding restriction should

 \section basicparams Basic parameter setup

-Many users will only need to change the following parametes. Advanced
+Many users will only need to change the following parameters. Advanced
 users should read \ref params for a full description of the
 parameters.

  parameters. That is, kernel learning occurs once every \em
  n_iter_relearn iterations. Ideally, the best precision is obtained
   when the kernel parameters are learned every iteration
-  (n_iter_relearn = 1). However, this \i learning part is
-  computationaly expensive and implies a higher cost per
-  iteration. [Default 50]
+  (n_iter_relearn=1). However, this \i learning part is
+  computationally expensive and implies a higher cost per
+  iteration. If n_iter_relearn=0, then there is no
+  relearning. [Default 50]

 - \b n_inner_iterations: (only for continuous optimization) Maximum
   number of iterations (per dimension!) to optimize the acquisition

 <HR>

-\section usage Using the library
+\section usage API description

-Here we show a brief summary of the different ways to use the library:
+Here we show a brief summary of the different ways to use the
+library. Basically, there are two ways to use the library based on
+your coding style:

-\subsection cusage C/C++ callback usage
+-Callback: The user sends a function pointer or handler to the
+ optimizer, following a prototype. This method is available for C/C++,
+ Python, Matlab and Octave.
+
-Inheritance: This is a more object oriented method and allows more
 flexibility. The user creates a module with their function, process,
 etc. This module inherits from one of the BayesOpt models, depending on
 whether the optimization is discrete or continuous, and overrides the \em
 evaluateSample method. This method is available only for C++ and
 Python.
+
+\subsection cusage C usage

 This interface is the most standard approach. Due to the broad
 compatibility of C code with other languages, it could also be used
 \code{.c}
 bopt_params initialize_parameters_to_default(void);
 \endcode
-and then, modify the necesary fields. For the non-numeric parameters,
+and then, modify the necessary fields. For the non-numeric parameters,
 there are a set of functions that can help to set the corresponding
 parameters:
 \code{.c}
 -For the continuous case:
 \code{.cpp}
 int bayes_optimization(int nDim, // number of dimensions
-		       eval_func f, // function to optimize
-		       void* f_data, // extra data that is transfered directly to f
-		       const double *lb, const double *ub, // bounds
-		       double *x, // out: minimizer
-		       double *minf, // out: minimum
-		       bopt_params parameters);
+                       eval_func f, // function to optimize
+                       void* f_data, // extra data that is transferred directly to f
+                       const double *lb, const double *ub, // bounds
+                       double *x, // out: minimizer
+                       double *minf, // out: minimum
+                       bopt_params parameters);
 \endcode

 -For the discrete case:
 \code{.cpp}
 int bayes_optimization_disc(int nDim, // number of dimensions
-		            eval_func f, // function to optimize
-			    void* f_data, // extra data that is transfered directly to f
-			    double *valid_x, size_t n_points, // set of discrete points
-			    double *x, // out: minimizer
-			    double *minf, // out: minimum
-			    bopt_params parameters);
+                            eval_func f, // function to optimize
+                            void* f_data, // extra data that is transferred directly to f
+                            double *valid_x, size_t n_points, // set of discrete points
+                            double *x, // out: minimizer
+                            double *minf, // out: minimum
+                            bopt_params parameters);
 \endcode

 -For the categorical case:
 \code{.cpp}
 int bayes_optimization_categorical(int nDim, // number of dimensions
-		 eval_func f, // function to optimize
-		 void* f_data, // extra data that is transfered directly to f
-		 int *categories, // array of size nDim with the number of categories per dim
-		 double *x, // out: minimizer
-		 double *minf, // out: minimum
-		 bopt_params parameters);
+                 eval_func f, // function to optimize
+                 void* f_data, // extra data that is transferred directly to f
+                 int *categories, // array of size nDim with the number of categories per dim
+                 double *x, // out: minimizer
+                 double *minf, // out: minimum
+                 bopt_params parameters);
 \endcode

+This interface catches all the expected exceptions and returns error
+codes for C compatibility.

-\subsection cppusage C++ inheritance usage
+\subsection cppusage C++ usage

-This is the most straighforward and complete method to use the
+Besides being able to use the library with the \ref cusage from C++,
+we can also take advantage of the object oriented properties of the
+language.
+
+This is the most straightforward and complete method to use the
 library. The object that must be optimized must inherit from one of
 the models defined in bayesopt.hpp.

 false and if it is valid, \i true). Note that the latter feature is
 experimental. There are no convergence guarantees if it is used.

-For example, with for a continous problem, we will define out optimizer as:
+For example, for a continuous problem, we will define our optimizer as:
 \code{.cpp}
 class MyOptimization: public ContinuousModel
 {
 MyOptimization optimizer(params);

 //Set the bounds. This is optional. Default is [0,1]
+//Only required because we are doing continuous optimization
 optimizer.setBoundingBox(lowerBounds,upperBounds);

 //Collect the result in bestPoint
 optimizer.optimize(bestPoint);
 \endcode

+For discrete and categorical cases, we just need to inherit from
+\ref DiscreteModel. Depending on the type of input we can use the
+corresponding constructor. In this case, the setBoundingBox
+step should be skipped.
+
 Optionally, we can also choose to run every iteration
 independently. See bayesopt.hpp and bayesoptbase.hpp

-\subsection pyusage Python callback/inheritance usage
+\subsection pyusage Python usage

-The file python/demo_quad.py provides examples of the two Python
-interfaces.
+The file python/demo_quad.py provides simple examples of the different
+ways to use the library from Python.

-\b Parameters: For both interfaces, the parameters are defined as a
-Python dictionary with the same structure as the bopt_params struct in
-the C/C++ interface. The enumerate values are replaced by strings
-without the prefix. For example, the C_EI criteria is replaced by the
-string "EI" and the M_ZERO mean function is replaced by the string
-"ZERO".
+1. \b Parameters: The parameters are defined as a Python dictionary
+with the same structure and names as the bopt_params struct in the
+C/C++ interface, with the exception of \em kernel.* and \em mean.*
+which are replaced by \em kernel_ and \em mean_ respectively. Also, C
+arrays are replaced with numpy arrays, thus there is no need to set
+the number of elements as a separate entry.
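For instance, a partial dictionary could look as follows. The field names beyond \em n_iter_relearn and \em l_type (which appear elsewhere in this manual) follow the naming rule just described but are assumptions for illustration:

```python
import numpy as np

# Any field omitted here keeps its default value.
params = {
    "n_iter_relearn": 1,           # relearn kernel parameters every iteration
    "l_type": "L_MCMC",            # hyperparameter learning method
    "kernel_name": "kMaternISO1",  # 'kernel.*' fields use the 'kernel_' prefix
    "kernel_hp_mean": np.array([1.0]),  # numpy array replaces C array + size
}
```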

-The parameter dictionary can be initialized using
-\code{.py}
-parameters = bayesopt.initialize_params()
-\endcode
+There is no need to fill in all the parameters. If a parameter is not
+included in the dictionary, the default value is used instead.

-however, this is not necesary in general. If any of the parameter is
-not included in the dictionary, the default value is included instead.
-
-\b Callback: The callback interface is just a wrapper of the C
-interface. In this case, the callback function should have the form
+2a. \b Callback: The callback interface is just a wrapper of the C
+interface. In this case, the callback function should have the prototype
 \code{.py}
 def my_function (query):
 \endcode
 where \em query is a numpy array and the function returns a double
 scalar.
-
-The optimization process can be called as
+
+The optimization process for a continuous model can be called as
 \code{.py}
-y_out, x_out, error = bayesopt.optimize(my_function, n_dimensions, lower_bound, upper_bound, parameters)
+y_out, x_out, error = bayesopt.optimize(my_function,
+              n_dimensions,
+              lower_bound,
+              upper_bound,
+              parameters)
 \endcode
 where the result is a tuple with the minimum as a numpy array (x_out),
-the value of the function at the minimum (y_out) and the error code.
+the value of the function at the minimum (y_out) and an error code.

-\b Inheritance: The object oriented construction is similar to the C++ interface.
+Analogously, the function for a discrete model is:
+\code{.py}
+y_out, x_out, error = bo.optimize_discrete(my_function,
+              x_set,
+              parameters)
+\endcode
+where x_set is an array of arrays with the valid inputs.
+
+And for the categorical case:
+\code{.py}
+y_out, x_out, error = bo.optimize_categorical(my_function,
+              categories,
+              parameters)
+\endcode
+where categories is an integer array with the number of categories per dimension.
+
+2b. \b Inheritance: The object oriented methodology is similar to the C++
+interface.

 \code{.py}
-class MyModule(bayesoptmodule.BayesOptModule):
-    def evalfunc(self,query):
-        """ My function """
+from bayesoptmodule import BayesOptContinuous
+
+class MyOptimization(BayesOptContinuous):
+    def __init__(self):
+        BayesOptContinuous.__init__(self, n_dimensions)
+
+    def evaluateSample(self, query):
+        """ My function here """
 \endcode

-The BayesOptModule include atributes for the parameters (\em params),
-number of dimensions (\em n) and bounds (\em lb and \em up).
-
 Then, the optimization process can be called as
 \code{.py}
-my_instance = MyModule()
-# set parameters, bounds and number of dimensions.
+import numpy as np
+
+my_opt = MyOptimization()
+
+# Set non-default parameters
+params = {}
+params["l_type"] = "L_MCMC"
+my_opt.parameters = params
+
+# Set the bounds. This is optional. Default is [0,1]
+# Only required because we are doing continuous optimization
+my_opt.lower_bound = np.zeros((n_dimensions,))  # any numpy array
+my_opt.upper_bound = np.ones((n_dimensions,))   # any numpy array
+
+# Collect the results
 y_out, x_out, error = my_opt.optimize()
 \endcode
-wher the result is a tuple with the minimum as a numpy array (x_out),
-the value of the function at the minimum (y_out) and the error code.
+where the result is a tuple with the minimum as a numpy array (x_out),
+the value of the function at the minimum (y_out) and an error code.

-\subsection matusage Matlab/Octave callback usage
+For discrete and categorical cases, we just need to inherit from
+bayesoptmodule.BayesOptDiscrete or bayesoptmodule.BayesOptCategorical,
+respectively. See bayesoptmodule.py. In these cases, the "set bounds"
+step should be skipped.

-The file matlab/runtest.m provides an example of the Matlab/Octave
-interface.
+Note: For some "expected" error codes, a corresponding Python
+exception is raised. However, the exception is raised once the error
+code reaches the Python environment, so it does not track exceptions
+happening in the C++ part of the code.

-\b Parameters: The parameters are defined as a Matlab struct
-equivalent to bopt_params struct in the C/C++ interface, except for
-the \em theta and \em mu arrays which are replaced by Matlab
-vectors. Thus, the number of elements (\em n_theta and \em n_mu) are not
-needed. The enumerate values are replaced by strings without the
-prefix. For example, the C_EI criteria is replaced by the string "EI"
-and the M_ZERO mean function is replaced by the string "ZERO".
+\subsection matusage Matlab/Octave usage

-If any of the parameter is not included in the Matlab struct, the
-default value is automatically included instead.
+The file matlab/runtest.m provides an example of different ways to use
+BayesOpt from Matlab/Octave.

-\b Callback: The callback interface is just a wrapper of the C
-interface. In this case, the callback function should have the form
+The parameters are defined as a Matlab struct with the same structure
+and names as the bopt_params struct in the C/C++ interface, with the
+exception of \em kernel.* and \em mean.* which are replaced by \em
+kernel_ and \em mean_ respectively. Also, C arrays are replaced by
+Matlab vectors, so there is no need to set the number of elements as a
+separate entry.
+
+There is no need to fill all the parameters. If any of the parameters
+is not included in the Matlab struct, the default value is
+automatically used instead.
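The "missing fields get defaults" behavior can be sketched as a simple dictionary merge. The default values below are illustrative; the authoritative ones live in the C/C++ parameter definitions:

```python
# Illustrative defaults; not the authoritative list.
DEFAULTS = {"n_iterations": 190, "n_iter_relearn": 50, "verbose_level": 1}

def with_defaults(user_params):
    merged = dict(DEFAULTS)      # start from the full default set
    merged.update(user_params)   # user-provided fields override defaults
    return merged

params = with_defaults({"n_iterations": 50})
```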
+
+The Matlab/Octave interface is just a wrapper of the C interface. In
+this case, the callback function should have the form
 \code{.m}
 function y = my_function (query)
 \endcode
-where \em query is a Matlab vector and the function returns a scalar.
+where \em query is a Matlab vector and the function returns a scalar
+value.

-The optimization process can be called (both in Matlab and Octave) as
+The optimization process can be run for continuous variables (both in
+Matlab and Octave) as
 \code{.m}
-[x_out, y_out] = bayesopt('my_function', n_dimensions, parameters, lower_bound, upper_bound)
+[x_out, y_out] = bayesoptcont('my_function',
+                        n_dimensions,
+                        parameters,
+                        lower_bound,
+                        upper_bound);
 \endcode
 where the result is the minimum as a vector (x_out) and the value of
 the function at the minimum (y_out).
+Analogously, the optimization process for discrete variables:
+\code{.m}
+[x_out, y_out] = bayesoptdisc('my_function',
+                              xset,
+                              parameters);
+\endcode
+and for categorical variables:
+\code{.m}
+[x_out, y_out] = bayesoptcat('my_function',
+                             categories,
+                             parameters);
+\endcode

 In Matlab, but not in Octave, the optimization can also be called with
-function handlers
+function handles. For example:
 \code{.m}
-[x_out, y_out] = bayesopt(@my_function, n_dimensions, parameters, lower_bound, upper_bound)
+[x_out, y_out] = bayesoptcont(@my_function,
+                        n_dimensions,
+                        parameters,
+                        lower_bound,
+                        upper_bound)
 \endcode
-
 <HR>

 \section params Understanding the parameters
 or about the problem. Or, if the knowledge is not available, keep the
 model as general as possible (to avoid bias). In this part, knowledge
 about Gaussian processes or nonparametric models in general might be
-useful.
+useful.

-For example, with the parameters we can select the kind of kernel,
-mean function or surrogate model that we want to use. With the kernel
-we can play with the smoothness of the function and it's
-derivatives. The mean function can be use to model the overall trend
-(flat, linear, etc.). If we know the overall signal variance we better
-use a Gaussian process, if we don't, we should use a Student's t
-process instead.
+It is recommended to read the page about \ref bopttheory in advance.

-For that reason, the parameters are bundled in a structure
-(C/C++/Matlab/Octave) or dictionary (Python), depending on the API
-that we use. This is a brief explanation of every parameter
+The parameters are bundled in a structure (C/C++/Matlab/Octave) or
+dictionary (Python), depending on the API that we use. This is a brief
+explanation of every parameter.

 \subsection budgetpar Budget parameters

-\li \b n_iterations: Maximum number of iterations of BayesOpt. Each
-iteration corresponds with a target function evaluation. This is
-related with the budget of the application [Default 300]
-\li \b n_inner_iterations: Maximum number of iterations of the inner
-optimization process. Each iteration corresponds with a criterion
-evaluation. The inner optimization results in the "most interest
-point" to evaluate the target function. In order to scale the process for
-increasing dimensional spaces, the actual number of iterations is this
-number times the number of dimensions. [Default 500]
-\li \b n_init_samples: BayesOpt requires an initial set of samples to
-learn a preliminary model of the target function. Each sample is also
-a target function evaluation. [Default 30]
-\li \b n_iter_relearn: Although most of the parameters of the model
-are updated after every iteration, the kernel parameters cannot be
-updated continuously as it has a very large computational overhead and
-it might introduce bias in the result. This represents the number of
-iterations between recomputing the kernel parameters. If it is set to
-0, they are only learned after the initial set of samples. [Default 0]
+This set of parameters deals with the number of evaluations or
+iterations of each step.
+
+- \b n_iterations: Number of iterations of BayesOpt. Each iteration
+  corresponds with a target function evaluation. In general, more
+  evaluations result in higher precision [Default 190]
+
+- \b n_iter_relearn: Number of iterations between re-learning of the
+  kernel parameters. That is, kernel learning occurs once every \em
+  n_iter_relearn iterations. Ideally, the best precision is obtained
+  when the kernel parameters are learned every iteration
+  (n_iter_relearn=1). However, this \em learning part is
+  computationally expensive and implies a higher cost per
+  iteration. If n_iter_relearn=0, then there is no
+  relearning. [Default 50]
+
+- \b n_inner_iterations: (only for continuous optimization) Maximum
+  number of iterations (per dimension!) to optimize the acquisition
+  function (criteria). That is, each iteration corresponds with a
+  criterion evaluation. If the original problem is high dimensional or
+  the result is needed with high precision, we might need to increase
+  this value.  [Default 500]
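The relearning schedule implied by \em n_iter_relearn can be sketched as follows (illustrative only; the library handles this internally):

```python
# Sketch of when kernel hyperparameters are re-learned.
def relearn_iterations(n_iterations, n_iter_relearn):
    """Iterations at which the kernel hyperparameters are re-learned."""
    if n_iter_relearn == 0:
        return []    # no relearning after the initial design
    return [i for i in range(1, n_iterations + 1)
            if i % n_iter_relearn == 0]

schedule = relearn_iterations(190, 50)  # the default budget settings
```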
+

 \subsection initpar Initialization parameters

-\li \b init_method: (unsigned integer value) For continuous
-optimization, we can choose among different strategies for the initial
-design (1-Latin Hypercube Sampling (LHS), 2-Sobol sequences (if available,
-see \ref mininst), Other-Uniform Sampling) [Default 1, LHS].
-\li \b random_seed: >=0 -> Fixed seed, <0 -> Time based (variable)
-seed. For debugging purposes, it might be useful to freeze the random
-seed. [Default 1, variable seed].
+Sometimes, BayesOpt requires an initial set of samples to learn a
+preliminary model of the target function. These parameters are
+especially important if n_iter_relearn is 0 or very high.
+
+- \b n_init_samples: Number of samples in the initial design. Each
+  sample requires a target function evaluation. [Default 10]
+
+- \b init_method: (for continuous optimization only, unsigned integer)
+  There are different strategies available for the initial design:
+  [Default 1, LHS].
+   1. Latin Hypercube Sampling (LHS)
+   2. Sobol sequences
+   3. Uniform Sampling
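The default strategy, Latin Hypercube Sampling, divides each dimension into as many strata as samples and uses every stratum exactly once. A minimal library-free sketch of the idea in \f$[0,1]^d\f$ (illustrative only; BayesOpt implements this internally):

```python
import random

# Minimal Latin Hypercube Sampling sketch in [0,1]^d.
def latin_hypercube(n_samples, n_dim, seed=0):
    rng = random.Random(seed)
    samples = [[0.0] * n_dim for _ in range(n_samples)]
    for d in range(n_dim):
        strata = list(range(n_samples))
        rng.shuffle(strata)                  # random stratum order per dim
        for i, s in enumerate(strata):
            # one uniform draw inside each stratum
            samples[i][d] = (s + rng.random()) / n_samples
    return samples

pts = latin_hypercube(10, 2)
```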
+
+Random numbers are used frequently, from the initial design to MCMC,
+Thompson sampling, etc. They are generated with the Boost random
+number library.
+
+- \b random_seed: If this value is positive (including 0), then it is
+  used as a fixed seed for the boost random number generator. If the
+  value is negative, a time based (variable) seed is used. For
+  debugging or benchmarking purposes, it might be useful to freeze the
+  random seed. [Default -1, variable seed].
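The seed policy can be sketched as follows (illustrative; the library uses the Boost generator in C++, not Python's):

```python
import time

# Sketch of the random_seed policy: non-negative values freeze the
# generator, negative values fall back to a time-based seed.
def resolve_seed(random_seed):
    if random_seed >= 0:
        return random_seed       # fixed seed: reproducible runs
    return int(time.time())      # variable seed (the default, -1)

seed = resolve_seed(-1)
```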


 \subsection logpar Logging parameters

-\li \b verbose_level: (unsigned integer value) Verbose level 0,3 ->
-warnings, 1,4 -> general information, 2,5 -> debug information, any
-other value -> only errors. Levels 0,1,2 -> send messages to
-stdout. Levels 3,4,5 -> send messages to a log file. [Default 1,
-general info->stdout].
-\li \b log_filename: Name of the log file (if applicable,
-verbose_level= 3, 4 or 5) [Default "bayesopt.log"]
+- \b verbose_level: (integer)
+  - Negative -> Error -> stdout
+  - 0 -> Warning -> stdout
+  - 1 -> Info -> stdout
+  - 2 -> Debug -> stdout
+  - 3 -> Error -> log file
+  - 4 -> Warning -> log file
+  - 5 -> Info -> log file
+  - >5 -> Debug -> log file
+
+- \b log_filename: Name/path of the log file (if applicable,
+  verbose_level>=3) [Default "bayesopt.log"]
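The verbose_level mapping above can be summarized as a small lookup (a sketch of the table, not the library's logging code):

```python
# Sketch of the verbose_level -> (severity, destination) mapping.
def log_config(verbose_level):
    if verbose_level < 3:
        sink = "stdout"
        severity = {0: "warning", 1: "info", 2: "debug"}.get(
            verbose_level, "error")     # any negative value -> error
    else:
        sink = "logfile"
        severity = {3: "error", 4: "warning", 5: "info"}.get(
            verbose_level, "debug")     # any value > 5 -> debug
    return severity, sink

severity, sink = log_config(1)          # the default level
```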
+
+\subsection critpar Exploration/exploitation parameters
+
+This set of parameters drives the sampling procedure either to explore
+unexplored regions or to refine the best current result.
+
+- \b crit_name: Name of the sample selection criterion or a
+  combination of them. It is used to select which points to evaluate
+  for each iteration of the optimization process. Could be a
+  combination of functions like
+  "cHedge(cEI,cLCB,cPOI,cThompsonSampling)". See Section \ref critmod
+  for the different possibilities. [Default: "cEI"]
+
+- \b crit_params, \b n_crit_params: Array with the set of parameters
+  for the selected criteria. If there are more than one criterion, the
+  parameters are split among them according to the number of
+  parameters required for each criterion. If n_crit_params is 0, then
+  the default parameters are selected for each criterion. [Default:
+  n_crit_params = 0] In Matlab and Python, n_crit_params is not used;
+  instead, the default value is an empty array.
+
+- \b epsilon: According to some authors \cite Bull2011, it is
+  recommended to include an epsilon-greedy strategy to achieve near
+  optimal convergence rates. Epsilon is the probability of performing
+  a random (blind) evaluation of the target function. Higher values
+  imply forced exploration, while lower values rely more on the
+  exploration/exploitation policy of the criterion. [Default 0.0
+  (epsilon-greedy disabled)]
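The epsilon-greedy decision can be sketched in a few lines. The function names (criterion_argmax, random_point) are illustrative, not part of the BayesOpt API:

```python
import random

# Epsilon-greedy sketch: with probability epsilon, sample a blind
# random point; otherwise follow the acquisition criterion.
def next_query(epsilon, criterion_argmax, random_point, rng):
    if rng.random() < epsilon:
        return random_point()       # forced (blind) exploration
    return criterion_argmax()       # follow the acquisition criterion

rng = random.Random(0)
picks = [next_query(0.3, lambda: "criterion", lambda: "random", rng)
         for _ in range(1000)]
always = next_query(0.0, lambda: "criterion", lambda: "random", rng)
```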

 \subsection surrpar Surrogate model parameters

-\li \b surr_name: Name of the hierarchical surrogate function
-(nonparametric process and the hyperpriors on sigma and w). See
-Section \ref surrmod for a detailed description. [Default
-"sGaussianProcess"]
-\li \b sigma_s: Signal variance (if known) [Default 1.0]
-\li \b noise: Observation noise/signal ratio. For computer simulations
-or deterministic functions, it should be close to 0. However, to avoid
-numerical instability due to model inaccuracies, make it always <0.
-[Default 0.0001]
-\li \b alpha, \b beta: Inverse-Gamma prior hyperparameters (if
-applicable) [Default 1.0, 1.0]
+The main advantage of Bayesian optimization over other optimization
+methods is the use of a surrogate model. These parameters allow us to
+configure it. See Section \ref surrmod for a detailed description.

-\subsection hyperlearn Hyperparameter learning
+- \b surr_name: Name of the hierarchical surrogate function
+  (nonparametric process and the hyperpriors on sigma and w).
+  [Default "sGaussianProcess"]
+
+- \b noise: Observation noise/signal ratio. [Default 1e-6]
+  - For stochastic functions (if several evaluations of the same point
+  produce different results), it should match as closely as possible
+  the ratio of the noise variance to the signal variance. Too much
+  noise results in slow convergence, while too little noise might
+  result in not converging at all.
+  - For simulations and deterministic functions, it should be close to
+  0. However, to avoid numerical instability due to model inaccuracies,
+  always make it greater than 0, for example, between 1e-14 and 1e-10.
+
+- \b sigma_s: (only used for "sGaussianProcess" and
+  "sGaussianProcessNormal") Known signal variance [Default 1.0]
+
+- \b alpha, \b beta: (only used for "sStudentTProcessNIG")
+  Inverse-Gamma prior hyperparameters (if applicable) [Default 1.0,
+  1.0]
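For stochastic targets, the noise/signal ratio can be estimated from repeated evaluations of the same inputs. A sketch with illustrative numbers (this helper is not part of the BayesOpt API):

```python
from statistics import mean, pvariance

# Estimate the noise/signal variance ratio from replicate evaluations.
def noise_to_signal(replicates):
    """replicates: one list of repeated evaluations per input point."""
    noise_var = mean(pvariance(r) for r in replicates)      # within points
    signal_var = pvariance([mean(r) for r in replicates])   # across points
    return noise_var / signal_var

ratio = noise_to_signal([[1.0, 1.2], [3.0, 2.8], [5.1, 4.9]])
```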
+
+\subsubsection meanpar Mean function parameters
+
+This set of parameters represents the mean function (or trend) of the
+surrogate model.
+
+- \b mean.name: Name of the mean function. Could be a combination of
+  functions like "mSum(mConst, mLinear)". See Section \ref parmod for
+  the different possibilities. [Default: "mConst"]
+
+- \b mean.coef_mean, \b mean.coef_std, \b mean.n_coef: Mean function
+  coefficients. [Default: "1.0, 1000.0, 1"]
+  - If the mean function is assumed to be known (like in
+    "sGaussianProcess"), then coef_mean represents the actual values
+    and coef_std is ignored.
+  - If the mean function has a normal prior on the coefficients (like
+    "sGaussianProcessNormal" or "sStudentTProcessNIG"), then both the
+    mean and std are used. The parameter mean.coef_std is a vector; it
+    does not consider correlations.
+  - For Matlab and Python, the parameters are called mean_coef_mean
+    and mean_coef_std and the number of elements is not needed.
+
+\subsubsection kernelpar Kernel parameters
+
+The kernel of the surrogate model represents the correlation between
+points, which is related to the smoothness of the prediction.
+
+- \b kernel.name: Name of the kernel function. Could be a combination
+  of functions like "kSum(kSEARD,kMaternARD3)". See Section \ref
+  kermod for the different possibilities. [Default: "kMaternARD5"]
+
+- \b kernel.hp_mean, \b kernel.hp_std, \b kernel.n_hp: Kernel
+  hyperparameters normal prior in the log space. That is, if the
+  hyperparameters are \f$\theta\f$, this prior is
+  \f$p(\log(\theta))\f$. Any "illegal" standard deviation (std<=0)
+  results in a flat prior for the corresponding component.
+
+  - If there is more than one kernel (a compound kernel), the
+    parameters are split among them according to the number of
+    parameters required for each kernel. [Default: 1.0, 10.0, 1]
+
+  - ARD kernels require parameters for each dimension. If only one
+    dimension is provided (as in the default), it is copied for every
+    dimension.
+
+  - For Matlab and Python, the parameters are called kernel_hp_mean
+    and kernel_hp_std and the number of elements is not needed.
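The splitting of a flat hyperparameter vector among the members of a compound kernel can be sketched as follows (the per-kernel counts are illustrative; the real ones depend on the kernel type and the dimension):

```python
# Split a flat hyperparameter vector by per-kernel parameter counts.
def split_hyperparams(theta, n_params_per_kernel):
    chunks, i = [], 0
    for n in n_params_per_kernel:
        chunks.append(theta[i:i + n])
        i += n
    return chunks

# e.g. a hypothetical kSum(kSEARD, kMaternARD3) in 2D: 2 + 2 length-scales
parts = split_hyperparams([1.0, 1.5, 0.5, 0.7], [2, 2])
```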
+
+\paragraph hyperlearn Hyperparameter learning

 Although BayesOpt tries to build a full analytic Bayesian model for
-the surrogate function, some hyperparameters cannot be estimated in
-closed form. Currently, the only parameters of BayesOpt models that
-require special treatment are the kernel hyperparameters. See Section
-\ref learnmod for a detailed description
+the surrogate function, the kernel hyperparameters cannot be estimated
+in closed form. See Section \ref learnmod for a detailed description.

-\li \b l_type: Learning method for the kernel
-hyperparameters. Currently, L_FIXED, L_EMPIRICAL and L_MCMC are
-implemented [Default L_EMPIRICAL]
-\li \b sc_type: Score function for the learning method. [Default
-SC_MAP]
+- \b l_type: Learning method for the kernel
+  hyperparameters. Currently, L_FIXED, L_EMPIRICAL and L_MCMC are
+  implemented [Default L_EMPIRICAL]

-
-\subsection critpar Exploration/exploitation parameters
-
-\li \b epsilon: According to some authors, it is recommendable to
-include an epsilon-greedy strategy to achieve near optimal convergence
-rates. Epsilon is the probability of performing a random (blind)
-evaluation of the target function. Higher values implies forced
-exploration while lower values relies more on the
-exploration/exploitation policy of the criterion [Default 0.0
-(disabled)]
-\li \b crit_name: Name of the sample selection criterion or a
-combination of them. It is used to select which points to evaluate for
-each iteration of the optimization process. Could be a combination of
-functions like "cHedge(cEI,cLCB,cPOI,cThompsonSampling)". See section
-critmod for the different possibilities. [Default: "cEI"]
-\li \b crit_params, \b n_crit_params: Array with the set of parameters
-for the selected criteria. If there are more than one criterion, the
-parameters are split among them according to the number of parameters
-required for each criterion. If n_crit_params is 0, then the default
-parameter is selected for each criteria. [Default: n_crit_params = 0]
-
-\subsection kernelpar Kernel parameters
-
-\li \b kernel.name: Name of the kernel function. Could be a
-combination of functions like "kSum(kSEARD,kMaternARD3)". See section
-kermod for the different posibilities. [Default: "kMaternISO3"]
-\li \b kernel.hp_mean, \b kernel.hp_std, \b kernel.n_hp: Kernel
-hyperparameters prior in the log space. That is, if the
-hyperparameters are \f$\theta\f$, this prior is \f$p(\log(\theta))\f$. Any
-"ilegal" standard deviation (std<=0) results in a maximum likelihood
-estimate. Depends on the kernel selected. If there are more than one,
-the parameters are split among them according to the number of
-parameters required for each criterion. [Default: "1.0, 10.0, 1" ]
-
-\subsection meanpar Mean function parameters
-
-\li \b mean.name: Name of the mean function. Could be a combination of
-functions like "mSum(mOne, mLinear)". See Section parmod for the different
-posibilities. [Default: "mOne"]
-\li \b mean.coef_mean, \b kernel.coef_std, \b kernel.n_coef: Mean
-function coefficients. The standard deviation is only used if the
-surrogate model assumes a normal prior. If there are more than one,
-the parameters are split among them according to the number of
-parameters required for each criterion. [Default: "1.0, 30.0, 1" ]
+- \b sc_type: Score function for the learning method. [Default SC_MAP]


 */
+

# File include/parameters.h

     size_t init_method;
     int random_seed;             /**< >=0 -> Fixed seed, <0 -> Time based (variable). */

-    size_t verbose_level;        /**< 1-Error,2-Warning,3-Info. 4-6 log file*/
+    int verbose_level;           /**< Neg-Error,0-Warning,1-Info,2-Debug -> stdout
+				      3-Error,4-Warning,5-Info,>5-Debug -> logfile*/
     char* log_filename;          /**< Log file path (if applicable) */

     size_t load_save_flag;       /**< 1-Load data,2-Save data,

# File matlab/bayesoptextras.h

 static void struct_value(const mxArray *s, const char *name, double *result);
 static void struct_array(const mxArray *s, const char *name, size_t *n, double *result);
 static void struct_size(const mxArray *s, const char *name, size_t *result);
+static void struct_int(const mxArray *s, const char *name, int *result);
 static void struct_string(const mxArray *s, const char *name, char* result);

 static double user_function(unsigned n, const double *x,
       if(!(mxIsNumeric(val) && !mxIsComplex(val)
 	   && mxGetM(val) * mxGetN(val) == 1))
 	{
-	  mexErrMsgTxt("param fields must be real scalars");
+	  mexErrMsgTxt("param fields must be scalar");
 	}
       else
 	{
-	  *result = (size_t) mxGetScalar(val);
+	  *result = (size_t)(mxGetScalar(val));
 	}
     }
   else
   return;
 }

+void struct_int(const mxArray *s, const char *name, int *result)
+{
+  mxArray *val = mxGetField(s, 0, name);
+  if (val)
+    {
+      if(!(mxIsNumeric(val) && !mxIsComplex(val)
+	   && mxGetM(val) * mxGetN(val) == 1))
+	{
+	  mexErrMsgTxt("param fields must be scalar");
+	}
+      else
+	{
+	  *result = (int)(mxGetScalar(val));
+	}
+    }
+  else
+    {
+      mexPrintf("Field %s not found. Default not modified.\n", name);
+    }
+  return;
+}
+
+

 void struct_string(const mxArray *s, const char *name, char* result)
 {
   struct_size(params, "n_iter_relearn", &parameters.n_iter_relearn);

   struct_size(params, "init_method", &parameters.init_method);
-  struct_size(params, "random_seed", &parameters.random_seed);
+  struct_int(params, "random_seed", &parameters.random_seed);

-  struct_size(params, "verbose_level", &parameters.verbose_level);
+  struct_int(params, "verbose_level", &parameters.verbose_level);
   struct_string(params, "log_filename", parameters.log_filename);

   struct_string(params, "surr_name", parameters.surr_name);

# File python/bayesopt.cpp

-/* Generated by Cython 0.19 on Thu May  1 01:12:59 2014 */
+/* Generated by Cython 0.19 on Thu May  8 16:06:36 2014 */

 #define PY_SSIZE_T_CLEAN
 #ifndef CYTHON_USE_PYLONG_INTERNALS

# File python/bayesopt.pyx

         unsigned int n_iter_relearn
         unsigned int init_method
         int random_seed
-        unsigned int verbose_level
+        int verbose_level
         char* log_filename
         unsigned int load_save_flag
         char* load_filename

# File python/bayesoptmodule.py

 ## Python Module for BayesOptContinuous
 #
 # Python module to run the BayesOpt library in an OO pattern.
-# The objective module should inherit this one and override evalfunc.
-class BayesOptContinuous:
+# The objective module should inherit this one and override evaluateSample.
+class BayesOptContinuous(object):

     ## Let's define the parameters.
     #
         ## n dimensions
         self.n_dim = n_dim
         ## Lower bounds
-        self.lower_bound = np.zeros((self.n_dim,))
+        self.lb = np.zeros((self.n_dim,))
         ## Upper bounds
-        self.upper_bound = np.ones((self.n_dim,))
+        self.ub = np.ones((self.n_dim,))
+
+    @property
+    def parameters(self):
+        return self.params
+
+    @parameters.setter
+    def parameters(self,params):
+        self.params = params
+
+    @property
+    def lower_bound(self):
+        return self.lb
+
+    @lower_bound.setter
+    def lower_bound(self,lb):
+        self.lb = lb
+
+    @property
+    def upper_bound(self):
+        return self.ub
+
+    @upper_bound.setter
+    def upper_bound(self,ub):
+        self.ub = ub

     ## Function for testing.
 # It should be overridden.
-    def evalfunc(self, x_in):
+    def evaluateSample(self, x_in):
         raise NotImplementedError("Please Implement this method")

     ## Main function. Starts the optimization process.
     def optimize(self):
-        min_val, x_out, error = bo.optimize(self.evalfunc, self.n_dim,
-                                            self.lower_bound, self.upper_bound,
+        min_val, x_out, error = bo.optimize(self.evaluateSample, self.n_dim,
+                                            self.lb, self.ub,
                                             self.params)

         return min_val, x_out, error
 ## Python Module for BayesOptDiscrete
 #
 # Python module to run the BayesOpt library in an OO pattern.
-# The objective module should inherit this one and override evalfunc.
+# The objective module should inherit this one and override evaluateSample.
 class BayesOptDiscrete:

     ## Let's define the parameters.
                 raise ValueError
             else:
                 self.x_set = np.random.rand(n_samples, n_dim)
+
+    @property
+    def parameters(self):
+        return self.params
+
+    @parameters.setter
+    def parameters(self,params):
+        self.params = params
+

     ## Function for testing.
 # It should be overridden.
-    def evalfunc(self, x_in):
+    def evaluateSample(self, x_in):
         raise NotImplementedError("Please Implement this method")

     ## Main function. Starts the optimization process.
     def optimize(self):
-        min_val, x_out, error = bo.optimize_discrete(self.evalfunc,
+        min_val, x_out, error = bo.optimize_discrete(self.evaluateSample,
                                                     self.x_set,
                                                     self.params)

 ## Python Module for BayesOptCategorical
 #
 # Python module to run the BayesOpt library in an OO pattern.
-# The objective module should inherit this one and override evalfunc.
+# The objective module should inherit this one and override evaluateSample.
 class BayesOptCategorical:

     ## Let's define the parameters.
         ## Library parameters
         self.params = {}
         self.categories = categories
+
+    @property
+    def parameters(self):
+        return self.params
+
+    @parameters.setter
+    def parameters(self,params):
+        self.params = params
+

     ## Function for testing.
     # It should be overriden.
-    def evalfunc(self, x_in):
+    def evaluateSample(self, x_in):
         raise NotImplementedError("Please Implement this method")

     ## Main function. Starts the optimization process.
     def optimize(self):
-        min_val, x_out, error = bo.optimize_categorical(self.evalfunc,
+        min_val, x_out, error = bo.optimize_categorical(self.evaluateSample,
                                                         self.categories,
                                                         self.params)


# File python/demo_distance.py

 # ------------------------------------------------------------------------

 import bayesopt
-import bayesoptmodule
+from bayesoptmodule import BayesOptContinuous
 import numpy as np

 from time import clock
     return total

 # Class for OO testing.
-class BayesOptTest(bayesoptmodule.BayesOptContinuous):
-    def evalfunc(self,Xin):
+class BayesOptTest(BayesOptContinuous):
+    def evaluateSample(self,Xin):
         return testfunc(Xin)


 # For different options: see parameters.h and cpp
 # If a parameter is not defined, it will be automatically set
 # to a default value.
-params = {} #bayesopt.initialize_params()
+params = {}
 params['n_iterations'] = 50
 params['n_init_samples'] = 20
 params['crit_name'] = "cSum(cEI,cDistance)"

 print "OO implementation"
 bo_test = BayesOptTest(n)
-bo_test.params = params
-bo_test.n = n
-bo_test.lb = lb
-bo_test.ub = ub
+bo_test.parameters = params
+bo_test.lower_bound = lb
+bo_test.upper_bound = ub

 start = clock()
 mvalue, x_out, error = bo_test.optimize()

# File python/demo_multiprocess.py


         return

-    def evalfunc(self, x):
+    def evaluateSample(self, x):
         self.pipe.send(x)
         result = self.pipe.recv()
         return result


 if __name__ == '__main__':
+    params = {
+        'n_iterations' : 50,
+        'n_init_samples' : 20,
+        's_name' : "sGaussianProcessNormal",
+        'c_name' : "cHedge(cEI,cLCB,cExpReturn,cOptimisticSampling)"
+    }
+
     pipe_par, pipe_child = Pipe()

     bo = BayesOptProcess(pipe_child,n_dim=5)
-    bo.params['n_iterations'] = 50
-    bo.params['n_init_samples'] = 20
-    bo.params['s_name'] = "sGaussianProcessNormal"
-    bo.params['c_name'] = "cHedge(cEI,cLCB,cExpReturn,cOptimisticSampling)"
+    bo.parameters = params

     p = Process(target=worker, args=(pipe_par,))


 # ------------------------------------------------------------------------

 import bayesopt
-import bayesoptmodule
+from bayesoptmodule import BayesOptContinuous
 import numpy as np

 from time import clock
     return total

 # Class for OO testing.
-class BayesOptTest(bayesoptmodule.BayesOptContinuous):
-    def evalfunc(self,Xin):
+class BayesOptTest(BayesOptContinuous):
+    def evaluateSample(self,Xin):
         return testfunc(Xin)



 print "OO implementation"
 bo_test = BayesOptTest(n)
-bo_test.params = params
+bo_test.parameters = params
 bo_test.lower_bound = lb
 bo_test.upper_bound = ub


# File src/bayesoptbase.cpp


     mModel.reset(PosteriorModel::create(dim,parameters,mEngine));

-    size_t verbose = mParameters.verbose_level;
+    int verbose = mParameters.verbose_level;
     if (verbose>=3)
       {
 	FILE* log_fd = fopen( mParameters.log_filename , "w" );

# File src/gaussian_process_ml.cpp

 				       MeanModel& mean, randEngine& eng):
     HierarchicalGaussianProcess(dim, params, data, mean, eng)
   {
-    mSigma = params.sigma_s;
     d_ = new GaussianDistribution(eng);
   }  // Constructor


# File src/student_t_process_jef.cpp

 						   randEngine& eng):
     HierarchicalGaussianProcess(dim, params, data, mean, eng)
   {
-    mSigma = params.sigma_s;
     d_ = new StudentTDistribution(eng);
   }  // Constructor