- changed status to resolved
Loss function issues (in parallel)
The loss function doesn't work, at least in parallel mode. I see at least three issues:
(1) Note the output file tells you where to look for error messages. E.g.,
Establishing worker sessions.
Session 0 (localhost): stderr written to /tmp/tmpHdIjCs.stderr
Session 1 (localhost): stderr written to /tmp/tmpu30vpf.stderr
Session 2 (localhost): stderr written to /tmp/tmpriok2I.stderr
Session 3 (localhost): stderr written to /tmp/tmpqgzQK9.stderr
Look in those files to see why it fails "for some anonymous reason"; the actual error messages are recorded there.
(2) The new keywords ("energy_coefficient", "force_coefficient") need to be added to the parameters dictionary, since that dictionary is what gets passed to the workers; see how convergence is handled for an example. Only parameters that you do not want passed to the workers (e.g., cores) should be stored outside this dictionary. The tmp stderr files above show that this is the immediate cause of the crash.
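A minimal sketch of the pattern described above (the class body and defaults here are hypothetical, not the project's actual code): anything the workers need lives in `self.parameters`, because only that dictionary is shipped to the worker sessions, while master-only settings such as `cores` stay outside it.

```python
class LossFunction:
    """Sketch only: keywords destined for the workers go into
    self.parameters; master-only settings are plain attributes."""

    def __init__(self, energy_coefficient=1.0, force_coefficient=0.04,
                 convergence=None, cores=1):
        p = self.parameters = {}
        p['energy_coefficient'] = energy_coefficient
        p['force_coefficient'] = force_coefficient
        p['convergence'] = convergence
        # 'cores' is deliberately kept out of the dictionary: it controls
        # the master process only and must not be sent to the workers.
        self.cores = cores


lf = LossFunction(energy_coefficient=1.0, force_coefficient=0.04)
print(sorted(lf.parameters))  # only these keys reach the workers
```

With this layout, adding a new worker-visible keyword is a one-line change inside `__init__`, and nothing else needs to know about the serialization path.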
(3) The __call__ method was renamed; the workers call the function through this method, so it cannot be renamed.
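To illustrate why the name matters (the loss computation below is a placeholder, not the real one): Python's `__call__` is what makes an instance invokable as `lossfunction(vector)`, which is exactly how the workers use it. Renaming the method to anything else breaks that call site.

```python
class LossFunction:
    """Sketch only: workers invoke the object directly, so the entry
    point must be named __call__, not e.g. evaluate()."""

    def __call__(self, parametervector):
        # Placeholder loss: sum of squares of the parameter vector.
        return sum(p ** 2 for p in parametervector)


lf = LossFunction()
# Worker-side call site: the object itself is called.
print(lf([3.0, 4.0]))  # prints 25.0
```

If the method were renamed, `lf([3.0, 4.0])` would raise `TypeError: 'LossFunction' object is not callable` on every worker.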
Thanks for the comments. They helped me fix it in commit 9f37f7b.