ensemblemd generic examples fail on workflows.iu due to SSH

Issue #12 resolved
Iain Bethune created an issue

As discussed, when I run the first generic example workflow it fails while trying to log in to localhost over SSH, when it should be using fork:

(extasy-test)[ibethune@workflow generic]$ RADICAL_PILOT_DBURL=mongodb://extasy:extasyproject@extasy-db.epcc.ed.ac.uk/radicalpilot python multiple_simulations_single_analysis.py 

================================================================================
 EnsembleMD (0.3.14)                                                            
================================================================================

Starting Allocation2016-05-11 16:26:00,651: radical.saga.pty    : MainProcess                     : MainThread     : ERROR   : prompted for unknown password (ibethune@localhost's password: ) (/home/ibethune/extasy-test/lib/python2.7/site-packages/saga/utils/pty_shell_factory.py +313 (_initialize_pty)  :  % match))
Traceback (most recent call last):
  File "/home/ibethune/extasy-test/lib/python2.7/site-packages/saga/utils/pty_shell_factory.py", line 313, in _initialize_pty
    % match)
AuthenticationFailed: prompted for unknown password (ibethune@localhost's password: ) (/home/ibethune/extasy-test/lib/python2.7/site-packages/saga/utils/pty_shell_factory.py +313 (_initialize_pty)  :  % match))
2016-05-11 16:26:00,652: radical.enmd.SingleClusterEnvironment: MainProcess                     : MainThread     : ERROR   : Fatal error during resource allocation: prompted for unknown password (ibethune@localhost's password: ) (/home/ibethune/extasy-test/lib/python2.7/site-packages/saga/utils/pty_shell_factory.py +313 (_initialize_pty)  :  % match)).
Traceback (most recent call last):
  File "/home/ibethune/extasy-test/lib/python2.7/site-packages/radical/ensemblemd/single_cluster_environment.py", line 181, in allocate
    self._pilot = pmgr.submit_pilots(pdesc)
  File "/home/ibethune/extasy-test/lib/python2.7/site-packages/radical/pilot/pilot_manager.py", line 361, in submit_pilots
    resource_config=resource_cfg)
  File "/home/ibethune/extasy-test/lib/python2.7/site-packages/radical/pilot/controller/pilot_manager_controller.py", line 433, in register_start_pilot_request
    shell = sup.PTYShell(url, self._session)
  File "/home/ibethune/extasy-test/lib/python2.7/site-packages/saga/utils/pty_shell.py", line 244, in __init__
    posix=self.posix)
  File "/home/ibethune/extasy-test/lib/python2.7/site-packages/saga/utils/pty_shell_factory.py", line 196, in initialize
    self._initialize_pty (info['pty'], info)
  File "/home/ibethune/extasy-test/lib/python2.7/site-packages/saga/utils/pty_shell_factory.py", line 411, in _initialize_pty
    raise ptye.translate_exception (e)
AuthenticationFailed: prompted for unknown password (ibethune@localhost's password: ) (/home/ibethune/extasy-test/lib/python2.7/site-packages/saga/utils/pty_shell_factory.py +313 (_initialize_pty)  :  % match))
Allocation failed: prompted for unknown password (ibethune@localhost's password: ) (/home/ibethune/extasy-test/lib/python2.7/site-packages/saga/utils/pty_shell_factory.py +313 (_initialize_pty)  :  % match))Traceback (most recent call last):
  File "multiple_simulations_single_analysis.py", line 83, in <module>
    cluster.allocate()
  File "/home/ibethune/extasy-test/lib/python2.7/site-packages/radical/ensemblemd/single_cluster_environment.py", line 181, in allocate
    self._pilot = pmgr.submit_pilots(pdesc)
  File "/home/ibethune/extasy-test/lib/python2.7/site-packages/radical/pilot/pilot_manager.py", line 361, in submit_pilots
    resource_config=resource_cfg)
  File "/home/ibethune/extasy-test/lib/python2.7/site-packages/radical/pilot/controller/pilot_manager_controller.py", line 433, in register_start_pilot_request
    shell = sup.PTYShell(url, self._session)
  File "/home/ibethune/extasy-test/lib/python2.7/site-packages/saga/utils/pty_shell.py", line 244, in __init__
    posix=self.posix)
  File "/home/ibethune/extasy-test/lib/python2.7/site-packages/saga/utils/pty_shell_factory.py", line 196, in initialize
    self._initialize_pty (info['pty'], info)
  File "/home/ibethune/extasy-test/lib/python2.7/site-packages/saga/utils/pty_shell_factory.py", line 411, in _initialize_pty
    raise ptye.translate_exception (e)
saga.exceptions.AuthenticationFailed: prompted for unknown password (ibethune@localhost's password: ) (/home/ibethune/extasy-test/lib/python2.7/site-packages/saga/utils/pty_shell_factory.py +313 (_initialize_pty)  :  % match))

Comments (9)

  1. Iain Bethune reporter

    Tested again with EnMD 0.4-RC0; it now fails with a different error:

    (test2)[ibethune@workflow generic]$ python multiple_simulations_multiple_analysis.py 
    
    ================================================================================
     EnsembleMD (0.4-RC0)                                                           
    ================================================================================
    
    Starting Allocation                                                           ok
    Verifying pattern                                                             ok
    Starting pattern execution                                                    ok
    --------------------------------------------------------------------------------
    Executing simulation-analysis loop with 2 iterations on 1 allocated core(s) on 'local.localhost'
    
    Job waiting on queue...2016-05-13 04:54:46,455: radical.pilot       : MainProcess                     : PilotLauncherWorker-1: ERROR   : Using bootstrapper /home/ibethune/test2/lib/python2.7/site-packages/radical/pilot/bootstrapper/bootstrap_1.sh
    Copying bootstrapper 'file://localhost/home/ibethune/test2/lib/python2.7/site-packages/radical/pilot/bootstrapper/bootstrap_1.sh' to agent sandbox (<saga.filesystem.directory.Directory object at 0x7fc5604da150>).
    Copying sdist 'file://localhost/home/ibethune/test2/lib/python2.7/site-packages/radical/utils/radical.utils-0.40.tar.gz' to sandbox (file://localhost/home/ibethune/radical.pilot.sandbox/rp.session.workflow.iu.xsede.org.ibethune.016934.0000-pilot.0000/).
    Copying sdist 'file://localhost/home/ibethune/test2/lib/python2.7/site-packages/saga/saga-python-0.40.2.tar.gz' to sandbox (file://localhost/home/ibethune/radical.pilot.sandbox/rp.session.workflow.iu.xsede.org.ibethune.016934.0000-pilot.0000/).
    Copying sdist 'file://localhost/home/ibethune/test2/lib/python2.7/site-packages/radical/pilot/controller/..//radical.pilot-0.40.1.tar.gz' to sandbox (file://localhost/home/ibethune/radical.pilot.sandbox/rp.session.workflow.iu.xsede.org.ibethune.016934.0000-pilot.0000/).
    Writing agent configuration to file '/tmp/rp_agent_cfg_dirAm4N9f/agent_0.cfg'.
    Copying agent configuration file 'file://localhost/tmp/rp_agent_cfg_dirAm4N9f/agent_0.cfg' to sandbox (file://localhost/home/ibethune/radical.pilot.sandbox/rp.session.workflow.iu.xsede.org.ibethune.016934.0000-pilot.0000/).
    Pilot launching failed! (failed to run bootstrap: (127)(/bin/sh: /home/ibethune/.saga/adaptors/shell_job/wrapper.sh: No such file or directory
    ) (/home/ibethune/test2/lib/python2.7/site-packages/saga/adaptors/shell/shell_job.py +605 (initialize)  :  raise saga.NoSuccess ("failed to run bootstrap: (%s)(%s)" % (ret, out))))
    Traceback (most recent call last):
      File "/home/ibethune/test2/lib/python2.7/site-packages/radical/pilot/controller/pilot_launcher_worker.py", line 712, in run
        js = saga.job.Service(js_url, session=self._session)
      File "/home/ibethune/test2/lib/python2.7/site-packages/saga/job/service.py", line 115, in __init__
        url, session, ttype=_ttype)
      File "/home/ibethune/test2/lib/python2.7/site-packages/saga/base.py", line 101, in __init__
        self._init_task = self._adaptor.init_instance (adaptor_state, *args, **kwargs)
      File "/home/ibethune/test2/lib/python2.7/site-packages/saga/adaptors/cpi/decorators.py", line 57, in wrap_function
        return sync_function (self, *args, **kwargs)
      File "/home/ibethune/test2/lib/python2.7/site-packages/saga/adaptors/shell/shell_job.py", line 507, in init_instance
        self.initialize ()
      File "/home/ibethune/test2/lib/python2.7/site-packages/saga/adaptors/shell/shell_job.py", line 605, in initialize
        raise saga.NoSuccess ("failed to run bootstrap: (%s)(%s)" % (ret, out))
    NoSuccess: failed to run bootstrap: (127)(/bin/sh: /home/ibethune/.saga/adaptors/shell_job/wrapper.sh: No such file or directory
    ) (/home/ibethune/test2/lib/python2.7/site-packages/saga/adaptors/shell/shell_job.py +605 (initialize)  :  raise saga.NoSuccess ("failed to run bootstrap: (%s)(%s)" % (ret, out)))
    2016-05-13 04:54:47,301: radical.entk.SingleClusterEnvironment: MainProcess                     : Thread-1       : ERROR   : Resource error: 
    2016-05-13 04:54:47,301: radical.entk.SingleClusterEnvironment: MainProcess                     : Thread-1       : ERROR   : Pattern execution FAILED.
    2016-05-13 04:54:47,301: radical.pilot       : MainProcess                     : Thread-1       : ERROR   : sys.exit from callback
    Traceback (most recent call last):
      File "/home/ibethune/test2/lib/python2.7/site-packages/radical/pilot/controller/pilot_manager_controller.py", line 258, in call_callbacks
        cb(self._shared_data[pilot_id]['facade_object'](), new_state)
      File "/home/ibethune/test2/lib/python2.7/site-packages/radical/ensemblemd/single_cluster_environment.py", line 168, in pilot_state_cb
        sys.exit(1)
    SystemExit: 1
    2016-05-13 04:54:47,683: radical.entk.SingleClusterEnvironment: MainProcess                     : MainThread     : ERROR   : Fatal error during execution: .
    Fatal error during execution: .
    Starting Deallocation..
    2016-05-13 04:54:47,684: radical.entk.SingleClusterEnvironment: MainProcess                     : MainThread     : ERROR   : Fatal error during execution: .
    Fatal error: .  File "/home/ibethune/test2/lib/python2.7/site-packages/radical/ensemblemd/single_cluster_environment.py", line 312, in run
        plugin.execute_pattern(pattern, self)
      File "/home/ibethune/test2/lib/python2.7/site-packages/radical/ensemblemd/exec_plugins/simulation_analysis_loop/static.py", line 342, in execute_pattern
        resource._pmgr.wait_pilots(resource._pilot.uid,'Active')
      File "/home/ibethune/test2/lib/python2.7/site-packages/radical/pilot/pilot_manager.py", line 532, in wait_pilots
        time.sleep(0.5)
                                                                  done 
    
  2. Andre Merzky

    This is actually better, as it does not use ssh, but... :/ Would you mind removing $HOME/.saga (rm -rf $HOME/.saga) and trying again, please?

    Thanks, Andre.

  3. Iain Bethune reporter

    After removing .saga I ran it again. It still fails, but with a different error:

    (test2)[ibethune@workflow generic]$ python multiple_simulations_multiple_analysis.py 
    
    ================================================================================
     EnsembleMD (0.4-RC0)                                                           
    ================================================================================
    
    Starting Allocation                                                           ok
    Verifying pattern                                                             ok
    Starting pattern execution                                                    ok
    --------------------------------------------------------------------------------
    Executing simulation-analysis loop with 2 iterations on 1 allocated core(s) on 'local.localhost'
    
    Job waiting on queue...2016-05-13 05:04:44,633: radical.saga        : MainProcess                     : PilotLauncherWorker-1: ERROR   : BadParameter: 'JobDescription.Project' (none) is not supported by adaptor saga.adaptor.shell_job
    2016-05-13 05:04:44,744: radical.pilot       : MainProcess                     : PilotLauncherWorker-1: ERROR   : Using bootstrapper /home/ibethune/test2/lib/python2.7/site-packages/radical/pilot/bootstrapper/bootstrap_1.sh
    Copying bootstrapper 'file://localhost/home/ibethune/test2/lib/python2.7/site-packages/radical/pilot/bootstrapper/bootstrap_1.sh' to agent sandbox (<saga.filesystem.directory.Directory object at 0x7f9afc409950>).
    Copying sdist 'file://localhost/home/ibethune/test2/lib/python2.7/site-packages/radical/utils/radical.utils-0.40.tar.gz' to sandbox (file://localhost/home/ibethune/radical.pilot.sandbox/rp.session.workflow.iu.xsede.org.ibethune.016934.0003-pilot.0000/).
    Copying sdist 'file://localhost/home/ibethune/test2/lib/python2.7/site-packages/saga/saga-python-0.40.2.tar.gz' to sandbox (file://localhost/home/ibethune/radical.pilot.sandbox/rp.session.workflow.iu.xsede.org.ibethune.016934.0003-pilot.0000/).
    Copying sdist 'file://localhost/home/ibethune/test2/lib/python2.7/site-packages/radical/pilot/controller/..//radical.pilot-0.40.1.tar.gz' to sandbox (file://localhost/home/ibethune/radical.pilot.sandbox/rp.session.workflow.iu.xsede.org.ibethune.016934.0003-pilot.0000/).
    Writing agent configuration to file '/tmp/rp_agent_cfg_dirjhy94L/agent_0.cfg'.
    Copying agent configuration file 'file://localhost/tmp/rp_agent_cfg_dirjhy94L/agent_0.cfg' to sandbox (file://localhost/home/ibethune/radical.pilot.sandbox/rp.session.workflow.iu.xsede.org.ibethune.016934.0003-pilot.0000/).
    Submitting SAGA job with description: {'Project': 'None', 'Executable': '/bin/bash', 'TotalPhysicalMemory': None, 'WorkingDirectory': '/home/ibethune/radical.pilot.sandbox/rp.session.workflow.iu.xsede.org.ibethune.016934.0003-pilot.0000/', 'Queue': None, 'Environment': {}, 'WallTimeLimit': 15, 'Arguments': ['-l bootstrap_1.sh', " -d 'radical.utils-0.40.tar.gz:saga-python-0.40.2.tar.gz:radical.pilot-0.40.1.tar.gz' -m 'create' -p 'pilot.0000' -r 'debug' -s 'rp.session.workflow.iu.xsede.org.ibethune.016934.0003' -v '/home/ibethune/radical.pilot.sandbox/ve_localhost' -b 'default' -a 'multicore'"], 'ProcessesPerHost': None, 'Error': 'bootstrap_1.err', 'Output': 'bootstrap_1.out', 'CandidateHosts': None, 'TotalCPUCount': 1}
    Pilot launching failed! ('JobDescription.Project' (none) is not supported by adaptor saga.adaptor.shell_job (/home/ibethune/test2/lib/python2.7/site-packages/saga/job/service.py +300 (create_job)  :  raise se.BadParameter._log (self._logger, msg)))
    Traceback (most recent call last):
      File "/home/ibethune/test2/lib/python2.7/site-packages/radical/pilot/controller/pilot_launcher_worker.py", line 781, in run
        pilotjob = js.create_job(jd)
      File "/home/ibethune/test2/lib/python2.7/site-packages/saga/job/service.py", line 300, in create_job
        raise se.BadParameter._log (self._logger, msg)
    BadParameter: 'JobDescription.Project' (none) is not supported by adaptor saga.adaptor.shell_job (/home/ibethune/test2/lib/python2.7/site-packages/saga/job/service.py +300 (create_job)  :  raise se.BadParameter._log (self._logger, msg))
    2016-05-13 05:04:44,756: radical.entk.SingleClusterEnvironment: MainProcess                     : Thread-1       : ERROR   : Resource error: 
    2016-05-13 05:04:44,756: radical.entk.SingleClusterEnvironment: MainProcess                     : Thread-1       : ERROR   : Pattern execution FAILED.
    2016-05-13 05:04:44,756: radical.pilot       : MainProcess                     : Thread-1       : ERROR   : sys.exit from callback
    Traceback (most recent call last):
      File "/home/ibethune/test2/lib/python2.7/site-packages/radical/pilot/controller/pilot_manager_controller.py", line 258, in call_callbacks
        cb(self._shared_data[pilot_id]['facade_object'](), new_state)
      File "/home/ibethune/test2/lib/python2.7/site-packages/radical/ensemblemd/single_cluster_environment.py", line 168, in pilot_state_cb
        sys.exit(1)
    SystemExit: 1
    2016-05-13 05:04:44,795: radical.entk.SingleClusterEnvironment: MainProcess                     : MainThread     : ERROR   : Fatal error during execution: .
    Fatal error during execution: .
    Starting Deallocation..
    2016-05-13 05:04:44,795: radical.entk.SingleClusterEnvironment: MainProcess                     : MainThread     : ERROR   : Fatal error during execution: .
    Fatal error: .  File "/home/ibethune/test2/lib/python2.7/site-packages/radical/ensemblemd/single_cluster_environment.py", line 312, in run
        plugin.execute_pattern(pattern, self)
      File "/home/ibethune/test2/lib/python2.7/site-packages/radical/ensemblemd/exec_plugins/simulation_analysis_loop/static.py", line 342, in execute_pattern
        resource._pmgr.wait_pilots(resource._pilot.uid,'Active')
      File "/home/ibethune/test2/lib/python2.7/site-packages/radical/pilot/pilot_manager.py", line 532, in wait_pilots
        time.sleep(0.5)
                                                                  done 
    

    FWIW, it works when submitting to ARCHER.

  4. Andre Merzky

    It seems like 'Project' is set to the literal string 'None' instead of Python's None. Could that be a typo in a config file?

  5. Iain Bethune reporter

    Correct, this is a bug in the config.json file. Instead of "None" it should be "" or null, in which case it works fine.
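
For anyone hitting the same error, here is a minimal sketch of why the quoted value breaks things: in JSON, "None" is an ordinary four-character string, while null parses to Python's None. The key name "project", the sample fragments, and the helper names below are assumptions for illustration only and are not taken from the actual config.json or from the EnsembleMD code. The sketch targets Python 3.

```python
import json

# Hypothetical config fragments; the real config.json layout may differ.
BAD_CONFIG = '{"resource": "local.localhost", "project": "None"}'
GOOD_CONFIG = '{"resource": "local.localhost", "project": null}'


def normalize_project(value):
    """Map empty or placeholder strings to None; keep real project names."""
    if value is None:
        return None
    if isinstance(value, str) and value.strip().lower() in ("", "none", "null"):
        return None
    return value


def load_project(text):
    """Parse a JSON config string and return a cleaned-up project value."""
    return normalize_project(json.loads(text).get("project"))


if __name__ == "__main__":
    # The quoted "None" survives JSON parsing as a string, not as None.
    print(repr(json.loads(BAD_CONFIG)["project"]))
    print(repr(json.loads(GOOD_CONFIG)["project"]))
    # After normalization both configs yield a real None.
    print(repr(load_project(BAD_CONFIG)))
```

A guard like this would only be a defensive measure; the actual fix reported above is to put "" or null in the config file.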
