coam workflow on bluewaters

Issue #8 resolved
Ardita Shkurti created an issue

This is also failing now, although it was not failing when I uploaded it a week ago.

ardita@moriarty 210% python extasy_amber_coco.py --RPconfig bluewaters.rcfg --Kconfig cocoamber.wcfg |& tee extasy.log

================================================================================
 EnsembleMD (0.3.14-27-g65bc062)                                                
================================================================================

Starting Allocation                                                           ok
Verifying pattern                                                             ok
Starting pattern execution                                                    ok
--------------------------------------------------------------------------------
Executing simulation-analysis loop with 2 iterations on 64 allocated core(s) on 'ncsa.bw'

Job waiting on queue...
Job is now running !
Iteration 1: Waiting for 16 simulation tasks: custom.amber to complete
2016-03-17 13:49:00,529: radical.enmd.simulation_analysis_loop.static.default: MainProcess                     : Thread-4       : ERROR   : ComputeUnit error: STDERR: , STDOUT: 
2016-03-17 13:49:00,530: radical.enmd.simulation_analysis_loop.static.default: MainProcess                     : Thread-4       : ERROR   : Pattern execution FAILED.
2016-03-17 13:49:00,530: radical.pilot       : MainProcess                     : Thread-4       : ERROR   : unit manager controller thread caught system exit -- forcing application shutdown
Traceback (most recent call last):
  File "/users/ardita/ExTASY_0.2-tools_9Mar16/lib/python2.7/site-packages/radical/pilot/controller/unit_manager_controller.py", line 262, in run
    self.call_unit_state_callbacks(unit_id, new_state)
  File "/users/ardita/ExTASY_0.2-tools_9Mar16/lib/python2.7/site-packages/radical/pilot/controller/unit_manager_controller.py", line 199, in call_unit_state_callbacks
    cb(self._shared_data[unit_id]['facade_object'], new_state)
  File "/users/ardita/ExTASY_0.2-tools_9Mar16/lib/python2.7/site-packages/radical/ensemblemd/exec_plugins/simulation_analysis_loop/static.py", line 318, in unit_state_cb
    sys.exit(1)
SystemExit: 1
Execution interupted
Traceback (most recent call last):
  File "/users/ardita/ExTASY_0.2-tools_9Mar16/lib/python2.7/site-packages/radical/ensemblemd/exec_plugins/simulation_analysis_loop/static.py", line 496, in execute_pattern
    resource._umgr.wait_units(uids)
  File "/users/ardita/ExTASY_0.2-tools_9Mar16/lib/python2.7/site-packages/radical/pilot/unit_manager.py", line 698, in wait_units
    time.sleep (0.5)
KeyboardInterrupt

Starting Deallocation..
2016-03-17 13:50:01,286: radical.enmd.SingleClusterEnvironment: MainProcess                     : Thread-1       : ERROR   : Resource error: 
2016-03-17 13:50:01,286: radical.enmd.SingleClusterEnvironment: MainProcess                     : Thread-1       : ERROR   : Pattern execution FAILED.
2016-03-17 13:50:01,286: radical.pilot       : MainProcess                     : Thread-1       : ERROR   : sys.exit from callback
Traceback (most recent call last):
  File "/users/ardita/ExTASY_0.2-tools_9Mar16/lib/python2.7/site-packages/radical/pilot/controller/pilot_manager_controller.py", line 258, in call_callbacks
    cb(self._shared_data[pilot_id]['facade_object'](), new_state)
  File "/users/ardita/ExTASY_0.2-tools_9Mar16/lib/python2.7/site-packages/radical/ensemblemd/single_cluster_environment.py", line 168, in pilot_state_cb
    sys.exit(1)
SystemExit: 1
Traceback (most recent call last):
  File "extasy_amber_coco.py", line 221, in <module>
    cluster.deallocate()
  File "/users/ardita/ExTASY_0.2-tools_9Mar16/lib/python2.7/site-packages/radical/ensemblemd/single_cluster_environment.py", line 117, in deallocate
    self._session.close(cleanup=self._cleanup)
  File "/users/ardita/ExTASY_0.2-tools_9Mar16/lib/python2.7/site-packages/radical/pilot/session.py", line 304, in close
    pmgr.close (terminate=terminate)
  File "/users/ardita/ExTASY_0.2-tools_9Mar16/lib/python2.7/site-packages/radical/pilot/pilot_manager.py", line 176, in close
    self.wait_pilots()
  File "/users/ardita/ExTASY_0.2-tools_9Mar16/lib/python2.7/site-packages/radical/pilot/pilot_manager.py", line 532, in wait_pilots
    time.sleep(0.5)
KeyboardInterrupt
^C^Z
Suspended 

Comments (1)

  1. Ardita Shkurti reporter

    The workflow failed to carry out any Amber simulations. @vivek-balasubramanian has recompiled the Amber binaries. After retrying, the workflow now seems to work, without any issues at all regarding CoCo, unlike the gmxcoco workflow!
