# Case study: bioreactor (4D input)

Authors: JuanPi Carbajal

## Introduction

The objective is to emulate a DAE system. Currently we focus on emulation of the algebraic part. For details and discussions see the previous report.

Source code committed on: 7a41f8766bc8

### Model

We considered a simulator that takes a 4-dimensional input and returns the pH value.

The dataset consists of 10'000 points sampled using a Latin hypercube sampler.
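A Latin hypercube design of this size can be sketched in a few lines of NumPy (the function name, the unit-cube domain, and the seed are illustrative assumptions; the actual sampler used to build the dataset is not specified here):

```python
import numpy as np

def latin_hypercube(n, d, seed=0):
    """n stratified samples in [0, 1)^d: along every axis, each of the
    n equal-width bins contains exactly one point (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # one random permutation of bin indices per dimension ...
    bins = np.stack([rng.permutation(n) for _ in range(d)], axis=1)
    # ... plus a uniform jitter inside each bin
    return (bins + rng.random((n, d))) / n

X = latin_hypercube(10_000, 4)   # same size as the dataset in this report
```

The stratification guarantees that every one-dimensional projection of the sample covers its range uniformly, which is what makes the design attractive for emulator training data.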

## Decorrelating inputs

The first step is to check whether the input dimensionality can be reduced. To this end we search for a unitary transformation of the input vector that maximizes the total correlation with the output while yielding a sparse correlation vector.

To achieve this we solve the following minimization problem

\begin{equation*} A = \operatorname*{arg\,min}_{A \in \mathbb{U}^{n\times n}} -(1 - \theta) \Vert A\boldsymbol{r} \Vert_2 + \theta \Vert A\boldsymbol{r} \Vert_1 \end{equation*}

where $$\boldsymbol{r}$$ is the vector of correlation coefficients between inputs and outputs and $$\mathbb{U}^{n\times n}$$ is the set of square unitary matrices. The weight $$\theta$$ balances between maximizing the correlation and the sparseness of the correlation vector.
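A minimal NumPy/SciPy sketch of this minimization is given below. It parametrizes the (real) orthogonal group as the matrix exponential of a skew-symmetric matrix, so the unitarity constraint holds by construction; the value $\theta = 0.5$, the optimizer choice, and the function name are assumptions, and the report's s_data4d.m script may proceed differently:

```python
import numpy as np
from scipy.linalg import expm
from scipy.optimize import minimize

def sparsifying_rotation(r, theta=0.5, seed=0):
    """Orthogonal A minimizing -(1-theta)*||A r||_2 + theta*||A r||_1.

    A = expm(S - S^T) with S strictly upper triangular, so A is always
    orthogonal (the real special case of unitary)."""
    n = len(r)
    iu = np.triu_indices(n, k=1)

    def unpack(p):
        S = np.zeros((n, n))
        S[iu] = p
        return expm(S - S.T)

    def objective(p):
        Ar = unpack(p) @ r
        return -(1 - theta) * np.linalg.norm(Ar) + theta * np.sum(np.abs(Ar))

    p0 = np.random.default_rng(seed).normal(scale=0.1, size=len(iu[0]))
    return unpack(minimize(objective, p0).x)

r = np.array([0.58, -0.59, -0.00, 0.46])  # correlations from this report
A = sparsifying_rotation(r)
```

Note that an orthogonal $A$ leaves $\Vert A\boldsymbol{r} \Vert_2$ unchanged, so the first term is constant over the feasible set and the optimum concentrates the total correlation $\Vert \boldsymbol{r} \Vert_2 \approx 0.95$ on as few transformed variables as possible, consistent with the $\approx 0.93$ leading coefficient reported below.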

See the script `s_data4d.m`.

The correlation in the original variables is $$\begin{bmatrix}0.58 & -0.59 & -0.00 & 0.46\end{bmatrix}^\top$$, and after the optimization the correlation in the transformed variables is $$\begin{bmatrix}0.93 & -0.09 & 0.06 & 0.05\end{bmatrix}^\top$$. These values will change with each execution of the s_data4d.m script. Also, since we use bagging to minimize the number of relevant inputs, you might sometimes get a less sparse correlation vector, meaning that more transformed variables correlate with the output.

The plots of the output vs. the original and the transformed variables are shown in the following figures:

*Figure: outputs vs. original inputs.*

*Figure: outputs vs. transformed inputs.*

The results indicate that a 1D or 2D interpolation is worth trying. Alternatively, a Gaussian Process with an automatic relevance determination covariance function should also perform well.

## GP regression

To train a GP on the data we split it into three randomly sampled subsets of points:

- **Reserved set (rsv)**: 1'000 points that are not used for any adjustment of the GP. They are used only at the very end to estimate the error on truly unseen inputs.
- **Training set (trn)**: at most 500 points used for adapting the hyperparameters of the GP.
- **Test set (tst)**: all the points that are not in the training or reserved set.
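Such a split can be sketched as follows (a NumPy sketch; the function name and seed are illustrative, not taken from the report's scripts):

```python
import numpy as np

def split_indices(n, n_rsv=1000, n_trn=500, seed=0):
    """Shuffle indices 0..n-1 and cut them into reserved/training/test sets."""
    perm = np.random.default_rng(seed).permutation(n)
    rsv = perm[:n_rsv]                 # never touched during training
    trn = perm[n_rsv:n_rsv + n_trn]    # used for hyperparameter adaptation
    tst = perm[n_rsv + n_trn:]         # everything else
    return rsv, trn, tst

rsv, trn, tst = split_indices(10_000)
```

Shuffling once and slicing guarantees the three sets are disjoint and together cover the whole dataset.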

The scripts `s_data4d_GP.m` and `s_data4d_GP_greedy.m` implement the functionality described in this section.

Source code committed on: 0ef36c1c8361

### Covariance structure

The covariance structure of the GP used to regress the data is the following:

\begin{align*}
K(\boldsymbol{x},\boldsymbol{z}, \ldots) &= \sigma^2 \left[ \theta_R \operatorname{R}\left(x_1,z_1, \boldsymbol{\mu}_R\right) + \theta_{SE} \operatorname{SE}\left(\boldsymbol{x},\boldsymbol{z}, \boldsymbol{\mu}_{SE}\right) \right]\\
\operatorname{R}(x,z,\boldsymbol{\mu}) &= \left[ \frac{2 \mu_2}{1 + \left( \frac{x-z}{\mu_1} \right)^2} \right]^{\mu_2}\\
\operatorname{SE}(\boldsymbol{x},\boldsymbol{z},\boldsymbol{\mu}) &= \exp \left[- \left(\boldsymbol{x}-\boldsymbol{z}\right)^\top \operatorname{diag}(\boldsymbol{\mu}^2)^{-1}\left(\boldsymbol{x}-\boldsymbol{z}\right)\right]
\end{align*}

Where $$\operatorname{R}$$ is the rational quadratic covariance function acting on the first input variable and $$\operatorname{SE}$$ is the squared exponential covariance function with automatic relevance determination. The weights $$\theta_R, \theta_{SE}$$ have smooth box prior distributions on $$[0.5, 1]$$ and $$[0, 0.5]$$, respectively. The assumption is that the rational term is the most relevant one, while the squared exponential term is just a correction.
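Read literally, the covariance above can be coded as follows (a NumPy sketch for a single pair of input points; vectorization over the whole dataset is omitted, and all function names are illustrative):

```python
import numpy as np

def rq_cov(x1, z1, mu):
    """R(x,z,mu) = [2*mu_2 / (1 + ((x-z)/mu_1)^2)]^mu_2 on the first input."""
    mu1, mu2 = mu
    return (2.0 * mu2 / (1.0 + ((x1 - z1) / mu1) ** 2)) ** mu2

def se_ard_cov(x, z, mu):
    """Squared exponential with one length scale per input dimension (ARD)."""
    d = (x - z) / mu
    return np.exp(-d @ d)

def K(x, z, sigma2, th_R, th_SE, mu_R, mu_SE):
    """Weighted sum of the two terms, scaled by the signal variance."""
    return sigma2 * (th_R * rq_cov(x[0], z[0], mu_R)
                     + th_SE * se_ard_cov(x, z, mu_SE))
```

One detail worth noting: on the diagonal, $\operatorname{R}(x,x,\boldsymbol{\mu}) = (2\mu_2)^{\mu_2}$, so unlike the standard rational quadratic this term equals 1 at zero distance only when $\mu_2 = 1/2$.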

### Fixed training set

In this approach we train the GP using a fixed training set, obtaining the following results:

```
--- GP training result ---
Log. Marginal Likelihood (N= 498): 222
L2-norm error on test set        : 0.11 (~1.5%)
Max. abs. error on test set      : 0.97@7.7 (~13%)
Runtime test set (N=8502)        : 0.99 s
--------------------------
L2-norm error on reserved set   : 0.11 (~1.5%)
Max. abs. error on reserved set : 0.85@8.3 (~10%)
```

*Figure: results of GP training. Blue: reserved inputs. Red: predicted.*

In this case the optimization of hyperparameters finds $$\theta_R = 0.73$$, $$\theta_{SE} = 0.08$$, which is consistent with the assumption on the covariance structure.

These results vary from evaluation to evaluation, because the training set is created randomly.

They were generated using the following code:

```matlab
clear all;
verbose = true;
s_data4d_GP
```


### Greedy training set

In this approach we train the GP with an adaptive training set: at each iteration, the points in the test set with the highest relative error are moved to the training set. Moving the 5 highest-error points per iteration, we achieve a maximum test error of about 5% with only 340 training points:
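The greedy loop can be sketched as follows. A plain kernel-ridge predictor stands in for the GP posterior mean to keep the sketch self-contained; all names, set sizes, and the stopping rule are illustrative assumptions, not taken from `s_data4d_GP_greedy.m`:

```python
import numpy as np

def fit_predict(Xtr, ytr, Xte, ell=1.0, jitter=1e-6):
    """Kernel ridge regression: a cheap stand-in for the GP posterior mean."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-0.5 * d2 / ell ** 2)
    alpha = np.linalg.solve(k(Xtr, Xtr) + jitter * np.eye(len(Xtr)), ytr)
    return k(Xte, Xtr) @ alpha

def greedy_train(X, y, n_init=20, n_add=5, max_train=100, seed=0):
    """Grow the training set with the worst-predicted test points."""
    rng = np.random.default_rng(seed)
    trn = list(rng.choice(len(X), n_init, replace=False))
    tst = [i for i in range(len(X)) if i not in trn]
    while len(trn) < max_train:
        pred = fit_predict(X[trn], y[trn], X[tst])
        rel_err = np.abs(pred - y[tst]) / (np.abs(y[tst]) + 1e-12)
        worst = [tst[i] for i in np.argsort(rel_err)[-n_add:]]
        trn += worst                               # move the n_add worst points
        tst = [i for i in tst if i not in worst]   # out of the test set
    return np.array(trn), np.array(tst)
```

The loop concentrates training points where the current model is worst, which is why it can match the fixed-set accuracy with fewer points.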

```
--- GP training result ---
Log. Marginal Likelihood (N= 340): -134
L2-norm error on test set        : 0.12 (~1.6%)
Max. abs. error on test set      : 0.49@8.8 (~5.5%)
Runtime test set (N=9660)        : 1.2 s
--------------------------
L2-norm error on reserved set   : 0.12 (~1.6%)
Max. abs. error on reserved set : 0.45@8.8 (~5.1%)
```

*Figure: results of GP training with the greedy algorithm. Blue: reserved inputs. Red: predicted.*

In this case the optimization of hyperparameters finds $$\theta_R = 0.84$$, $$\theta_{SE} = 0.09$$, which is consistent with the assumption on the covariance structure.

The results were generated with the following code:

```matlab
clear all;
s_data4d_GP_greedy
```

