Checkpoint recovery is nonfunctional in SimFactory 2 (it has broken again since it was last fixed in ticket
Using the attached parameter file, I submit a simulation on Datura:
simfactory2/bin/sim --machine datura --config sim2_datura create-submit parfiles/cptest.par 12 1:00:00
This parameter file terminates the Cactus run after 1 minute and writes a checkpoint file. I then manually remove the output-0000-active symlink. I have to do this by hand because the automatic cleanup in the main() function was cleaning up restarts that were still trying to run, so I have disabled it, and manual cleanup doesn't work (see ticket
I then resubmit the simulation:
simfactory2/bin/sim --machine datura submit parfiles/cptest.par
and observe that the checkpoint files from the first restart are never hardlinked into the output directory of the new restart. The job does not recover from the checkpoint and instead starts from initial data.
Log file is attached.
Looking at the code, it appears that the checkpoint linking is conditional on the from-restart-id option being passed to simfactory, which I think is related to job chaining. I can't find anywhere in the code that sets this option, so this is probably why the linking is not happening.
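One possible direction for a fix is sketched below in Python, the language SimFactory is written in. When from-restart-id is not supplied, the restart falls back to the newest earlier restart that contains checkpoint files and hardlinks those files into its own directory. This is a sketch under stated assumptions, not SimFactory's actual code. The function names, the output-NNNN directory pattern and the *.chkpt.* checkpoint file pattern are all hypothetical illustrations.

```python
import os
import re
from pathlib import Path
from typing import List, Optional

# ASSUMPTION (hypothetical layout): each restart lives in <sim_dir>/output-NNNN/
# and its checkpoint files match "*.chkpt.*" somewhere below that directory.
# Names like "output-0000-active" deliberately do not match this pattern.
RESTART_DIR = re.compile(r"^output-(\d{4})$")
CHECKPOINT_GLOB = "*.chkpt.*"


def restart_ids(sim_dir: Path) -> List[int]:
    """Return the numeric ids of all restart directories, sorted ascending."""
    ids = []
    for entry in sim_dir.iterdir():
        m = RESTART_DIR.match(entry.name)
        if m and entry.is_dir():
            ids.append(int(m.group(1)))
    return sorted(ids)


def choose_source_restart(sim_dir: Path, current_id: int,
                          from_restart_id: Optional[int] = None) -> Optional[int]:
    """Pick the restart whose checkpoints should be linked into current_id.

    An explicit from_restart_id wins (the only case the reported code handles);
    otherwise fall back to the newest earlier restart that has checkpoints.
    Returns None if no earlier restart has any checkpoint files.
    """
    if from_restart_id is not None:
        return from_restart_id
    for rid in reversed(restart_ids(sim_dir)):
        if rid >= current_id:
            continue  # never link a restart's files into itself or an older one
        src = sim_dir / f"output-{rid:04d}"
        if any(src.rglob(CHECKPOINT_GLOB)):
            return rid
    return None


def link_checkpoints(sim_dir: Path, src_id: int, dst_id: int) -> List[Path]:
    """Hardlink every checkpoint file of restart src_id into restart dst_id,
    keeping relative subdirectory paths. Returns the links actually created."""
    src = sim_dir / f"output-{src_id:04d}"
    dst = sim_dir / f"output-{dst_id:04d}"
    created = []
    for f in sorted(src.rglob(CHECKPOINT_GLOB)):
        target = dst / f.relative_to(src)
        target.parent.mkdir(parents=True, exist_ok=True)
        if not target.exists():  # skip files already linked or present
            os.link(f, target)
            created.append(target)
    return created
```

Restricting the fallback to restarts strictly earlier than the current one avoids linking a restart into itself. Because existing targets are skipped, re-running the linking step on the same restart is harmless.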