Default optimizer: "L-BFGS-B" or "BFGS"?
I do not quite remember why we switched the optimizer from 'BFGS' in v0.4 to 'L-BFGS-B' in the development version.
L-BFGS-B apparently uses less memory, but we should not have a memory issue since we have a limited number of parameters. On the other hand, I would expect BFGS to be more exact (and probably better).
When I try v0.4 with BFGS and the development version with L-BFGS-B, I see that v0.4 reduces the loss function more smoothly than the development version. That might be due to the optimizer, though I am not sure at this point.
Comments (6)
-
reporter -
reporter I am blamed for it in the commit 51b1fd7, but I don't remember the reason!
-
repo owner And you wonder why I nag you to write descriptive commit messages! :)
-
reporter - attached BFGS.txt
- attached L-BFGS-B.txt
:)) Yes, you are right, I should try to write more descriptive messages. Maybe because L-BFGS-B is newer than BFGS, I thought it would be an improvement!
I gave it a shot on a large dataset of about 3500 images and two chemical elements. Starting from the same set of initial parameters, L-BFGS-B reduced the loss function to 70.98, while BFGS reduced it to 1.37, which is much lower. The logs are attached.
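For anyone who wants to try this kind of side-by-side run on a toy problem, here is a minimal sketch assuming SciPy's `scipy.optimize.minimize` is available. The Rosenbrock test function and the starting point `x0` are stand-ins chosen for illustration, not our actual loss function or initial parameters, so the numbers will not match the attached logs.

```python
import numpy as np
from scipy.optimize import minimize, rosen, rosen_der

# Hypothetical starting point; both optimizers start from the same x0.
x0 = np.full(10, 1.5)

results = {}
for method in ("BFGS", "L-BFGS-B"):
    # rosen / rosen_der are SciPy's built-in Rosenbrock function and gradient,
    # used here only as a stand-in for the real loss function.
    res = minimize(rosen, x0, jac=rosen_der, method=method)
    results[method] = res
    print(f"{method:9s} final loss = {res.fun:.3e}, "
          f"iterations = {res.nit}, function evaluations = {res.nfev}")
```

On a small, well-behaved problem like this the two methods may end up very close, so it says nothing about which one is better for our loss function. It only shows how to compare them from identical initial parameters.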
The Wikipedia page says that L-BFGS-B "is particularly suited to problems with very large numbers of variables (e.g., >1000)". That is roughly the parameter count for 4 chemical elements with 10-10-10-10 hidden layers, so we will not reach that regime very often.
In addition, if BFGS hits a memory issue, a message will appear and then we can think about how to reduce the memory use (e.g. use L-BFGS-B instead of BFGS). I wouldn't worry about memory unless it becomes an issue. Having said all that, I vote for switching back to BFGS.
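To put rough numbers on the memory argument, here is a back-of-the-envelope sketch. It assumes double-precision storage, counts only the dominant arrays (a dense n x n inverse-Hessian approximation for BFGS, and 2*m vectors of length n for a limited-memory method with m stored corrections), and ignores any extra workspace a real implementation allocates. The helper names are hypothetical, and m = 10 is used because it is the default `maxcor` in SciPy's L-BFGS-B.

```python
def bfgs_bytes(n, itemsize=8):
    """Approximate storage for a dense n x n inverse-Hessian approximation."""
    return n * n * itemsize


def lbfgs_bytes(n, m=10, itemsize=8):
    """Approximate storage for m correction pairs, i.e. 2*m vectors of length n."""
    return 2 * m * n * itemsize


# n is the number of optimized parameters (hypothetical example sizes).
for n in (100, 1000, 10000):
    print(f"n = {n:6d}: BFGS ~ {bfgs_bytes(n) / 1e6:8.2f} MB, "
          f"L-BFGS ~ {lbfgs_bytes(n) / 1e6:6.2f} MB")
```

Under these assumptions, 1000 parameters cost about 8 MB for BFGS versus about 0.16 MB for the limited-memory variant, so memory should only start to matter at much larger parameter counts.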
-
repo owner Sounds good -- go ahead and switch back.
-
reporter - changed status to resolved
In the end we decided to switch the optimizer back to BFGS. This was done in commit 1f18182.