task is always pending

Issue #46 resolved
Stone sky created an issue

I run the web interface and the databases (MongoDB and Redis) in Docker. The scheduler, the watcher, and Mesos run on my host machine. But the task I submitted stays pending all the time. When I check the scheduler log, it acts as if no task was detected.

And when I use mesos-execute to submit a pure script, it works.

Comments (11)

  1. Stone sky reporter

    the logs of mesos-master:

    Mar 08 14:42:55 tsinghua mesos-master[14809]: I0308 14:42:55.320962 14812 master.cpp:6517] Sending 1 offers to framework 6015b
    Mar 08 14:42:55 tsinghua mesos-master[14809]: I0308 14:42:55.326494 14813 master.cpp:4505] Processing DECLINE call for offers:
    Mar 08 14:43:01 tsinghua mesos-master[14809]: I0308 14:43:01.330230 14817 master.cpp:6517] Sending 1 offers to framework 6015b
    Mar 08 14:43:01 tsinghua mesos-master[14809]: I0308 14:43:01.335831 14814 master.cpp:4505] Processing DECLINE call for offers:
    Mar 08 14:43:02 tsinghua mesos-master[14809]: I0308 14:43:02.045397 14815 http.cpp:391] HTTP GET for /master/state from 127.0.
    Mar 08 14:43:07 tsinghua mesos-master[14809]: I0308 14:43:07.338490 14812 master.cpp:6517] Sending 1 offers to framework 6015b
    Mar 08 14:43:07 tsinghua mesos-master[14809]: I0308 14:43:07.344559 14813 master.cpp:4505] Processing DECLINE call for offers:
    Mar 08 14:43:12 tsinghua mesos-master[14809]: I0308 14:43:12.056957 14816 http.cpp:391] HTTP GET for /master/state from 202.12
    Mar 08 14:43:13 tsinghua mesos-master[14809]: I0308 14:43:13.348469 14814 master.cpp:6517] Sending 1 offers to framework 6015b
    Mar 08 14:43:13 tsinghua mesos-master[14809]: I0308 14:43:13.354161 14817 master.cpp:4505] Processing DECLINE call for offers:
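To see at a glance whether every offer is being declined, the master log above can be tallied with a few lines of Python. This is only a rough sketch: the two regular expressions are assumptions based on the message text visible in this excerpt, and `tally_offers` is a hypothetical helper, not part of Mesos or go-docker.

```python
import re

# Patterns assumed from the log excerpt; other Mesos versions may word these differently.
OFFER_RE = re.compile(r"Sending (\d+) offers to framework")
DECLINE_RE = re.compile(r"Processing DECLINE call for offers")


def tally_offers(lines):
    """Count offers sent and DECLINE calls seen in mesos-master log lines."""
    counts = {"offers_sent": 0, "declines": 0}
    for line in lines:
        match = OFFER_RE.search(line)
        if match:
            counts["offers_sent"] += int(match.group(1))
        elif DECLINE_RE.search(line):
            counts["declines"] += 1
    return counts


sample = [
    "I0308 14:42:55.320962 14812 master.cpp:6517] Sending 1 offers to framework 6015b",
    "I0308 14:42:55.326494 14813 master.cpp:4505] Processing DECLINE call for offers:",
    "I0308 14:43:02.045397 14815 http.cpp:391] HTTP GET for /master/state from 127.0.",
]
print(tally_offers(sample))
```

If the number of declines keeps pace with the number of offers, the framework is connected to the master but is not launching anything, which points at the scheduler side rather than at Mesos itself.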

  2. Olivier Sallou repo owner

    Does your web interface container mount the godshared volume defined in go-d.ini? At job creation, the web interface creates some files in the job directory that the scheduler uses afterwards. According to the logs, offers are indeed being declined, but that can happen for several reasons (config, etc.).

    Please set all log levels to DEBUG in the logs section of go-d.ini and restart the scheduler. Run a job, wait a few minutes, then share the scheduler logs and give me the job ID. No need for the watcher logs for the moment.

    Did you follow the Mesos setup instructions in the documentation at https://godocker.atlassian.net/wiki/display/GOD/GODOCKER ?

  3. Stone sky reporter

    Actually, I used fake authentication. And submitting the job didn't change the /opt/godshared directory; that is, I saw no file changes under it.

  4. Stone sky reporter

    My docker run command:

    docker run \
      --rm \
      --name god-web \
      --link god-mongo:god-mongo  \
      --link god-redis:god-redis  \
      -v /opt/godshared:/opt/godshared \
      -v /opt/go-docker/plugins:/opt/go-docker/plugins \
      -v $HOME/docker/dockerfile/go-d.ini:/opt/go-docker/go-d.ini \
      -v $HOME/docker/dockerfile/production.ini:/opt/go-docker-web/production.ini \
      -p 6543:6543 \
      -e "PYRAMID_ENV=prod" \
      -d \
      osallou/go-docker \
      gunicorn -c /opt/go-docker-web/gunicorn_conf.py -p godweb.pid --log-config=/opt/go-docker-web/production.ini --paste /opt/go-docker-web/production.ini
    
  5. Olivier Sallou repo owner

    OK, what is in scheduler.log (after setting the log levels to DEBUG)? The web UI "creates" the job, but it is submitted by the scheduler. So if the job remains pending, it is a scheduler issue (config, link with Mesos, ...).

    By the way, the scheduler and watcher processes must run as root.

    Using fakeauth may cause issues later on, because user info (uid, gid, ...) will in your case be taken from the Docker container. The user must exist on the systems where the web interface (your container) and the scheduler/watcher are running. Fakeauth simply fakes the password checks and user management. The login user must not be "root".
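The user requirements above can be checked on each machine (inside the web container and on the scheduler/watcher host) with the standard `pwd` module. This is a minimal sketch, not go-docker code: `check_login_user` is a hypothetical helper, and it only checks that the name resolves locally and is not root.

```python
import pwd


def check_login_user(name):
    """Return (ok, detail) describing whether `name` is usable as a login user here."""
    if name == "root":
        return False, "login user must not be root"
    try:
        entry = pwd.getpwnam(name)
    except KeyError:
        return False, "user %r does not exist on this system" % name
    if entry.pw_uid == 0:
        return False, "user %r maps to uid 0" % name
    # Compare these values across machines: they should be identical everywhere.
    return True, "uid=%d gid=%d" % (entry.pw_uid, entry.pw_gid)


if __name__ == "__main__":
    import sys

    for login in sys.argv[1:]:
        print(login, check_login_user(login))
```

Running it with the same login name in the container and on the host shows quickly whether the uid and gid agree on both sides.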

  6. Stone sky reporter

    This error is caused by a permission inconsistency between Docker and the host. The web interface in the Docker container created the command shell scripts as the root user, and the permission mode is group-executable.

    ERROR:root:Failed to create cmd: [Errno 13] Permission denied: '/opt/godshared/tasks/pairtree_root/10/task/cmd.sh'
    Traceback (most recent call last):
      File "/home/slhome/kys10/Workspace/go-docker/godocker/godscheduler.py", line 745, in run_tasks
        self._create_command(task)
      File "/home/slhome/kys10/Workspace/go-docker/godocker/godscheduler.py", line 477, in _create_command
        task_cmd)
      File "/home/slhome/kys10/Workspace/go-docker/godocker/pairtreeStorage.py", line 169, in add_file
        task_obj.add_bytestream(name, content, path=subpath)
      File "/home/slhome/kys10/Workspace/go-docker/godocker/pairtreeStorage.py", line 21, in add_bytestream
        objfile = open(os.path.join(dir_path, name), "w")
    IOError: [Errno 13] Permission denied: '/opt/godshared/tasks/pairtree_root/10/task/cmd.sh'
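A quick way to confirm this kind of mismatch is to print the owner, mode, and effective write access of the job directory from the account that runs the scheduler. The sketch below uses only the standard library; `describe_access` is a hypothetical helper, and the path to inspect (for example the task directory from the traceback above) is passed on the command line.

```python
import os
import stat


def describe_access(path):
    """Report who owns `path`, its permission bits, and whether this process can write to it."""
    st = os.stat(path)
    return {
        "owner_uid": st.st_uid,
        "owner_gid": st.st_gid,
        "mode": oct(stat.S_IMODE(st.st_mode)),
        # os.access checks against the real uid/gid of the current process.
        "writable": os.access(path, os.W_OK),
    }


if __name__ == "__main__":
    import sys

    for target in sys.argv[1:]:
        print(target, describe_access(target))
```

If `owner_uid` is 0 and `writable` is false for the scheduler's user, the files were created by root inside the container and the scheduler cannot overwrite them.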
    
  7. Stone sky reporter

    Solved! I love this framework and I'm going to set up a Mesos cluster on the cluster in my lab. Could you give some more information/docs about the GPU resource scheduler?

  8. Olivier Sallou repo owner

    Well, GPU support is quite experimental, as I could only run a few tests. The idea is to declare the available GPUs as resources on the Mesos slaves; in go-docker, when submitting a job, you can ask for 1 (or more) GPU. The job will then be scheduled where GPUs are available, the selected ones will be marked as "occupied" (so they will not be offered again while the job is running), and the GPUs are mounted in the container. You cannot, however, ask for half a GPU, for example.

    See Resource reservation in README.md
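The bookkeeping described in this comment (whole GPUs only, selected GPUs marked as occupied until the job ends) can be illustrated with a small sketch. This is not the go-docker implementation: `GpuPool` and its methods are hypothetical names, and a real scheduler would derive the free list from Mesos offers rather than from a fixed list.

```python
class GpuPool:
    """Tracks which GPUs on one node are free and which are held by a job."""

    def __init__(self, gpu_ids):
        self.free = list(gpu_ids)
        self.occupied = {}  # job id -> list of GPU ids held by that job

    def allocate(self, job_id, count):
        """Reserve `count` whole GPUs for a job, or return None if too few are free."""
        if not isinstance(count, int) or count < 1:
            raise ValueError("GPUs can only be requested in whole units")
        if count > len(self.free):
            return None
        selected, self.free = self.free[:count], self.free[count:]
        self.occupied[job_id] = selected
        return selected

    def release(self, job_id):
        """Return a finished job's GPUs to the free list."""
        self.free.extend(self.occupied.pop(job_id, []))


pool = GpuPool([0, 1, 2])
print(pool.allocate("job-a", 2))  # two GPUs are now occupied
print(pool.allocate("job-b", 2))  # only one GPU is left, so this request cannot be served
pool.release("job-a")
print(pool.allocate("job-b", 2))  # succeeds once job-a has released its GPUs
```

Rejecting non-integer counts mirrors the rule that a job cannot ask for a fraction of a GPU.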
