Levenberg-Marquardt optimizer

Issue #134 new
Geng Sun created an issue

Hello, all,

Can AMP directly use the Levenberg-Marquardt algorithm as an optimizer?

According to this paper, http://dx.doi.org/10.1016/j.commatsci.2015.11.047, the LM algorithm seems to perform better than the L-BFGS algorithm and the gradient-descent method. So I plan to use AMP with the LM algorithm from this package: http://cars9.uchicago.edu/software/python/lmfit/intro.html (which seems to supply a more uniform interface to different optimizers).

But LM is a little different from other optimizers: it needs the loss function to return the residual array instead of the sum of squares. So I looked a little more at the AMP code, and it seems that I need to modify the Fortran code to get the residual array.
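
To make this concrete, here is a minimal sketch of the residual-array convention that lmfit's default Levenberg-Marquardt driver expects (a toy linear model; the parameter names and data are invented for illustration):

    import numpy as np
    from lmfit import Parameters, minimize

    def residual(params, x, data):
        """Return the residual array (data - model), not its sum of squares;
        lmfit's default 'leastsq' method (Levenberg-Marquardt) expects this."""
        model = params['a'].value * x + params['b'].value
        return data - model

    params = Parameters()
    params.add('a', value=1.0)
    params.add('b', value=0.0)

    x = np.linspace(0.0, 1.0, 20)
    data = 2.0 * x + 1.0  # synthetic "observations" for the toy model

    result = minimize(residual, params, args=(x, data), method='leastsq')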

Am I right? If the Fortran code needs to be changed, can you give me some guidance about this? I mean, what other parts of the code will be affected?

Thank you very much.

Geng

Comments (3)

  1. andrew_peterson repo owner

    That sounds interesting.

    @muammar and I were just discussing how it would be better if our code had a more unified interface to the optimizers. That is, if you can wrap any optimizer to look like scipy's fmin_bfgs, it will work. (Actually, this is basically the behavior now; it is just not so well documented.) Our current optimizers need the loss function and the partial derivatives of the loss function with respect to the free parameters. What exactly is the residual array?
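
    To illustrate the wrapping idea, here is a rough sketch (the name fmin_like is invented): anything callable with the loss function, an initial parameter vector, and the gradient via fprime, and returning the optimized parameters, could be dropped in.

        from scipy.optimize import minimize

        def fmin_like(f, x0, fprime=None, **kwargs):
            """Expose an arbitrary optimizer behind an fmin_bfgs-style call:
            scalar loss f, initial parameters x0, gradient fprime; return
            the optimized parameter vector."""
            result = minimize(f, x0, jac=fprime, method='L-BFGS-B')
            return result.x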

    Feel free to make a branch to try it out. Our philosophy is python first, then fortran. That is, we implement everything in python, where it's easy to debug and play with. After we're satisfied, we make a fortran version of the compute-heavy routines and try to make the variables look the same in both the python and fortran versions.

    Also, be sure to use pyflakes and pep8 to make sure your code fits the formatting. Let us know if you have questions.

  2. Geng Sun reporter

    Hello, sorry for the late response; I was on holiday recently.

    Firstly, if I understand it correctly, the residual array for the Levenberg-Marquardt method means the array of differences between predicted and observed values. This interface is the same as that of leastsq in scipy.optimize: https://docs.scipy.org/doc/scipy-0.18.1/reference/generated/scipy.optimize.leastsq.html#scipy.optimize.leastsq

     scipy.optimize.leastsq(func, x0, args=(), Dfun=None, full_output=0, col_deriv=0, ftol=1.49012e-08, xtol=1.49012e-08, gtol=0.0, maxfev=0, epsfcn=None, factor=100, diag=None)

    And func is defined as

    func(params) = ydata - f(xdata, params)
    

    so that the objective function is

      min_params  sum((ydata - f(xdata, params))**2, axis=0)

    So the difference between this method and other optimizers is that the residual energy or force of every image should be returned by the get_loss function, instead of only the total loss.
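
    A rough sketch of what that change could look like (the helper names below are hypothetical stand-ins for AMP internals, not the actual code):

        import numpy as np

        # Hypothetical stand-ins, just to show the shape of the interface; the
        # real code would evaluate the model energy of each training image.
        def predicted_energy(image, params):
            return float(np.dot(image, params))

        def reference_energy(image):
            return float(np.sum(image))

        def get_residuals(params, images):
            """Return one energy residual per image (what LM consumes),
            rather than the summed square loss (what get_loss returns now)."""
            return np.array([predicted_energy(img, params) - reference_energy(img)
                             for img in images])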

    For now I can only use this LM method for training on a single core and without forces, so the system has to be quite small. I still need some time to figure out how to use parallelized training with this method.

  3. Alireza Khorshidi

    If you want to feed the residuals of every single image to the optimizer, then I think you need to modify the calculate_loss method here in python and here in fortran.

    As @andrewpeterson suggested, first try to get a working version in pure python that does what you want, and then implement it in fortran.
