ensemblemd generic examples fail on workflows.iu due to SSH
Issue #12
resolved
As discussed, when I run the first generic example workflow, it fails trying to log in to localhost over SSH, when it should be using fork...
(extasy-test)[ibethune@workflow generic]$ RADICAL_PILOT_DBURL=mongodb://extasy:extasyproject@extasy-db.epcc.ed.ac.uk/radicalpilot python multiple_simulations_single_analysis.py
================================================================================
EnsembleMD (0.3.14)
================================================================================
Starting Allocation2016-05-11 16:26:00,651: radical.saga.pty : MainProcess : MainThread : ERROR : prompted for unknown password (ibethune@localhost's password: ) (/home/ibethune/extasy-test/lib/python2.7/site-packages/saga/utils/pty_shell_factory.py +313 (_initialize_pty) : % match))
Traceback (most recent call last):
File "/home/ibethune/extasy-test/lib/python2.7/site-packages/saga/utils/pty_shell_factory.py", line 313, in _initialize_pty
% match)
AuthenticationFailed: prompted for unknown password (ibethune@localhost's password: ) (/home/ibethune/extasy-test/lib/python2.7/site-packages/saga/utils/pty_shell_factory.py +313 (_initialize_pty) : % match))
2016-05-11 16:26:00,652: radical.enmd.SingleClusterEnvironment: MainProcess : MainThread : ERROR : Fatal error during resource allocation: prompted for unknown password (ibethune@localhost's password: ) (/home/ibethune/extasy-test/lib/python2.7/site-packages/saga/utils/pty_shell_factory.py +313 (_initialize_pty) : % match)).
Traceback (most recent call last):
File "/home/ibethune/extasy-test/lib/python2.7/site-packages/radical/ensemblemd/single_cluster_environment.py", line 181, in allocate
self._pilot = pmgr.submit_pilots(pdesc)
File "/home/ibethune/extasy-test/lib/python2.7/site-packages/radical/pilot/pilot_manager.py", line 361, in submit_pilots
resource_config=resource_cfg)
File "/home/ibethune/extasy-test/lib/python2.7/site-packages/radical/pilot/controller/pilot_manager_controller.py", line 433, in register_start_pilot_request
shell = sup.PTYShell(url, self._session)
File "/home/ibethune/extasy-test/lib/python2.7/site-packages/saga/utils/pty_shell.py", line 244, in __init__
posix=self.posix)
File "/home/ibethune/extasy-test/lib/python2.7/site-packages/saga/utils/pty_shell_factory.py", line 196, in initialize
self._initialize_pty (info['pty'], info)
File "/home/ibethune/extasy-test/lib/python2.7/site-packages/saga/utils/pty_shell_factory.py", line 411, in _initialize_pty
raise ptye.translate_exception (e)
AuthenticationFailed: prompted for unknown password (ibethune@localhost's password: ) (/home/ibethune/extasy-test/lib/python2.7/site-packages/saga/utils/pty_shell_factory.py +313 (_initialize_pty) : % match))
Allocation failed: prompted for unknown password (ibethune@localhost's password: ) (/home/ibethune/extasy-test/lib/python2.7/site-packages/saga/utils/pty_shell_factory.py +313 (_initialize_pty) : % match))Traceback (most recent call last):
File "multiple_simulations_single_analysis.py", line 83, in <module>
cluster.allocate()
File "/home/ibethune/extasy-test/lib/python2.7/site-packages/radical/ensemblemd/single_cluster_environment.py", line 181, in allocate
self._pilot = pmgr.submit_pilots(pdesc)
File "/home/ibethune/extasy-test/lib/python2.7/site-packages/radical/pilot/pilot_manager.py", line 361, in submit_pilots
resource_config=resource_cfg)
File "/home/ibethune/extasy-test/lib/python2.7/site-packages/radical/pilot/controller/pilot_manager_controller.py", line 433, in register_start_pilot_request
shell = sup.PTYShell(url, self._session)
File "/home/ibethune/extasy-test/lib/python2.7/site-packages/saga/utils/pty_shell.py", line 244, in __init__
posix=self.posix)
File "/home/ibethune/extasy-test/lib/python2.7/site-packages/saga/utils/pty_shell_factory.py", line 196, in initialize
self._initialize_pty (info['pty'], info)
File "/home/ibethune/extasy-test/lib/python2.7/site-packages/saga/utils/pty_shell_factory.py", line 411, in _initialize_pty
raise ptye.translate_exception (e)
saga.exceptions.AuthenticationFailed: prompted for unknown password (ibethune@localhost's password: ) (/home/ibethune/extasy-test/lib/python2.7/site-packages/saga/utils/pty_shell_factory.py +313 (_initialize_pty) : % match))
Comments (9)
-
reporter -
This is actually better, as it does not use ssh, but... :/ Would you mind running
rm -rf $HOME/.saga
and trying again, please? Thanks, Andre.
-
reporter After removing .saga - ran again, still failing, but a different error:
(test2)[ibethune@workflow generic]$ python multiple_simulations_multiple_analysis.py
================================================================================
EnsembleMD (0.4-RC0)
================================================================================
Starting Allocation ok
Verifying pattern ok
Starting pattern execution ok
--------------------------------------------------------------------------------
Executing simulation-analysis loop with 2 iterations on 1 allocated core(s) on 'local.localhost'
Job waiting on queue...2016-05-13 05:04:44,633: radical.saga : MainProcess : PilotLauncherWorker-1: ERROR : BadParameter: 'JobDescription.Project' (none) is not supported by adaptor saga.adaptor.shell_job
2016-05-13 05:04:44,744: radical.pilot : MainProcess : PilotLauncherWorker-1: ERROR : Using bootstrapper /home/ibethune/test2/lib/python2.7/site-packages/radical/pilot/bootstrapper/bootstrap_1.sh
Copying bootstrapper 'file://localhost/home/ibethune/test2/lib/python2.7/site-packages/radical/pilot/bootstrapper/bootstrap_1.sh' to agent sandbox (<saga.filesystem.directory.Directory object at 0x7f9afc409950>).
Copying sdist 'file://localhost/home/ibethune/test2/lib/python2.7/site-packages/radical/utils/radical.utils-0.40.tar.gz' to sandbox (file://localhost/home/ibethune/radical.pilot.sandbox/rp.session.workflow.iu.xsede.org.ibethune.016934.0003-pilot.0000/).
Copying sdist 'file://localhost/home/ibethune/test2/lib/python2.7/site-packages/saga/saga-python-0.40.2.tar.gz' to sandbox (file://localhost/home/ibethune/radical.pilot.sandbox/rp.session.workflow.iu.xsede.org.ibethune.016934.0003-pilot.0000/).
Copying sdist 'file://localhost/home/ibethune/test2/lib/python2.7/site-packages/radical/pilot/controller/..//radical.pilot-0.40.1.tar.gz' to sandbox (file://localhost/home/ibethune/radical.pilot.sandbox/rp.session.workflow.iu.xsede.org.ibethune.016934.0003-pilot.0000/).
Writing agent configuration to file '/tmp/rp_agent_cfg_dirjhy94L/agent_0.cfg'.
Copying agent configuration file 'file://localhost/tmp/rp_agent_cfg_dirjhy94L/agent_0.cfg' to sandbox (file://localhost/home/ibethune/radical.pilot.sandbox/rp.session.workflow.iu.xsede.org.ibethune.016934.0003-pilot.0000/).
Submitting SAGA job with description: {'Project': 'None', 'Executable': '/bin/bash', 'TotalPhysicalMemory': None, 'WorkingDirectory': '/home/ibethune/radical.pilot.sandbox/rp.session.workflow.iu.xsede.org.ibethune.016934.0003-pilot.0000/', 'Queue': None, 'Environment': {}, 'WallTimeLimit': 15, 'Arguments': ['-l bootstrap_1.sh', " -d 'radical.utils-0.40.tar.gz:saga-python-0.40.2.tar.gz:radical.pilot-0.40.1.tar.gz' -m 'create' -p 'pilot.0000' -r 'debug' -s 'rp.session.workflow.iu.xsede.org.ibethune.016934.0003' -v '/home/ibethune/radical.pilot.sandbox/ve_localhost' -b 'default' -a 'multicore'"], 'ProcessesPerHost': None, 'Error': 'bootstrap_1.err', 'Output': 'bootstrap_1.out', 'CandidateHosts': None, 'TotalCPUCount': 1}
Pilot launching failed! ('JobDescription.Project' (none) is not supported by adaptor saga.adaptor.shell_job (/home/ibethune/test2/lib/python2.7/site-packages/saga/job/service.py +300 (create_job) : raise se.BadParameter._log (self._logger, msg)))
Traceback (most recent call last):
File "/home/ibethune/test2/lib/python2.7/site-packages/radical/pilot/controller/pilot_launcher_worker.py", line 781, in run
pilotjob = js.create_job(jd)
File "/home/ibethune/test2/lib/python2.7/site-packages/saga/job/service.py", line 300, in create_job
raise se.BadParameter._log (self._logger, msg)
BadParameter: 'JobDescription.Project' (none) is not supported by adaptor saga.adaptor.shell_job (/home/ibethune/test2/lib/python2.7/site-packages/saga/job/service.py +300 (create_job) : raise se.BadParameter._log (self._logger, msg))
2016-05-13 05:04:44,756: radical.entk.SingleClusterEnvironment: MainProcess : Thread-1 : ERROR : Resource error:
2016-05-13 05:04:44,756: radical.entk.SingleClusterEnvironment: MainProcess : Thread-1 : ERROR : Pattern execution FAILED.
2016-05-13 05:04:44,756: radical.pilot : MainProcess : Thread-1 : ERROR : sys.exit from callback
Traceback (most recent call last):
File "/home/ibethune/test2/lib/python2.7/site-packages/radical/pilot/controller/pilot_manager_controller.py", line 258, in call_callbacks
cb(self._shared_data[pilot_id]['facade_object'](), new_state)
File "/home/ibethune/test2/lib/python2.7/site-packages/radical/ensemblemd/single_cluster_environment.py", line 168, in pilot_state_cb
sys.exit(1)
SystemExit: 1
2016-05-13 05:04:44,795: radical.entk.SingleClusterEnvironment: MainProcess : MainThread : ERROR : Fatal error during execution: .
Fatal error during execution: .
Starting Deallocation..
2016-05-13 05:04:44,795: radical.entk.SingleClusterEnvironment: MainProcess : MainThread : ERROR : Fatal error during execution: .
Fatal error: .
File "/home/ibethune/test2/lib/python2.7/site-packages/radical/ensemblemd/single_cluster_environment.py", line 312, in run
plugin.execute_pattern(pattern, self)
File "/home/ibethune/test2/lib/python2.7/site-packages/radical/ensemblemd/exec_plugins/simulation_analysis_loop/static.py", line 342, in execute_pattern
resource._pmgr.wait_pilots(resource._pilot.uid,'Active')
File "/home/ibethune/test2/lib/python2.7/site-packages/radical/pilot/pilot_manager.py", line 532, in wait_pilots
time.sleep(0.5)
done
FWIW, it works when running on ARCHER.
-
It seems like 'Project' is set to the literal string 'None' instead of Python's None. Could that be a typo in a config file?
-
reporter Correct, this is a bug in the config.json file. Instead of "None" it should be "" or null, in which case it works fine.
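For anyone hitting the same thing, here is a minimal sketch of why the string matters. The `project` key and the `has_project` helper are hypothetical, since the real layout of config.json is not shown in this thread. JSON `null` parses to Python `None`, whereas `"None"` parses to an ordinary non-empty string, which then looks like a real project name to any code that only checks whether a value is set.

```python
import json

# Hypothetical fragments; the real config.json keys are not shown in this thread.
broken = json.loads('{"project": "None"}')
fixed_null = json.loads('{"project": null}')
fixed_empty = json.loads('{"project": ""}')


def has_project(cfg):
    """Return True if the config carries a non-empty project value."""
    return bool(cfg.get("project"))


print(has_project(broken))       # the string "None" counts as a project name
print(has_project(fixed_null))   # null becomes Python None, so no project
print(has_project(fixed_empty))  # empty string, so no project
```

With `null` or `""` no project is set, which matches the fix described above.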
-
- changed status to resolved
→ <<cset ace07fe0517d>>
-
- changed status to open
-
reporter Still (another) bug in the config.json (a double comma).
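A doubled comma makes the whole file invalid JSON, so a quick parse before committing would catch it. The sketch below uses only the standard `json` module; the `check_json` helper and the sample keys are illustrative assumptions, not part of EnsembleMD or RADICAL-Pilot.

```python
import json


def check_json(text):
    """Return None if text parses as JSON, otherwise the parser's error message."""
    try:
        json.loads(text)
    except ValueError as exc:  # json.JSONDecodeError is a subclass of ValueError
        return str(exc)
    return None


# Hypothetical fragments; the keys are made up for illustration.
bad = '{"project": null,, "queue": null}'
good = '{"project": null, "queue": null}'

print(check_json(bad))   # an error message pointing at the stray comma
print(check_json(good))  # None
```

The same check can be run on a file from the shell with `python -m json.tool config.json`.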
-
- changed status to resolved
fixed
Testing again with EnMD 0.4-RC0, but fails with a different error: