Could not start task with gpu resource.

Issue #66 closed
IT Expert created an issue

Hi. It looks like after the last update, the scheduler is unable to start tasks that request a gpu resource. The web interface shows "Invalid task", and the scheduler logs the following:

2018-02-22 15:59:41,230 DEBUG [godocker-scheduler][Thread-1] OFFER RECEIVED: [<mesoshttp.offers.Offer object at 0x7ff3d3c93860>]
2018-02-22 15:59:41,231 DEBUG [godocker-scheduler][Thread-1] Mesos:Offers:Begin
2018-02-22 15:59:41,233 DEBUG [godocker-scheduler][Thread-1] {'hostname': 'host2.surc.local', 'url': {'path': '/slave(1)', 'scheme': 'http', 'address': {'hostname': 'host2.surc.local', 'ip': '106.125.32.162', 'port': 5051}}, 'attributes': [{'name': 'gputype', 'type': 'TEXT', 'text': {'value': 'titanz'}}, {'name': 'rack', 'type': 'TEXT', 'text': {'value': 'gpu'}}], 'allocation_info': {'role': 'god'}, 'framework_id': {'value': '1d2d8f83-e7ff-40ff-bf38-d21248192ca6-0017'}, 'resources': [{'name': 'gpus', 'role': '*', 'type': 'SCALAR', 'allocation_info': {'role': 'god'}, 'scalar': {'value': 2.0}}, {'name': 'ports', 'ranges': {'range': [{'begin': 1025, 'end': 2180}, {'begin': 2182, 'end': 3887}, {'begin': 3889, 'end': 5049}, {'begin': 5052, 'end': 8079}, {'begin': 8082, 'end': 8180}, {'begin': 8182, 'end': 34000}]}, 'role': '*', 'type': 'RANGES', 'allocation_info': {'role': 'god'}}, {'name': 'cpus', 'role': '*', 'type': 'SCALAR', 'allocation_info': {'role': 'god'}, 'scalar': {'value': 8.0}}, {'name': 'mem', 'role': '*', 'type': 'SCALAR', 'allocation_info': {'role': 'god'}, 'scalar': {'value': 14972.0}}, {'name': 'disk', 'role': '*', 'type': 'SCALAR', 'allocation_info': {'role': 'god'}, 'scalar': {'value': 464202.0}}], 'id': {'value': '26fd306b-ffd4-407f-a424-a9b78e1ca54c-O23131'}, 'agent_id': {'value': '1d2d8f83-e7ff-40ff-bf38-d21248192ca6-S112'}}
2018-02-22 15:59:41,234 DEBUG [godocker-scheduler][Thread-1] Mesos:Labels:{'rack': 'gpu', 'gputype': 'titanz'}
2018-02-22 15:59:41,234 DEBUG [godocker-scheduler][Thread-1] Mesos:GetSlaveIdFromMasterInfo:host2.surc.local
2018-02-22 15:59:41,235 DEBUG [godocker-scheduler][Thread-1] Mesos:Received offer 26fd306b-ffd4-407f-a424-a9b78e1ca54c-O23131 with cpus: 8.0 and mem: 14972.0
2018-02-22 15:59:41,235 DEBUG [godocker-scheduler][Thread-1] Try to place task 226
2018-02-22 15:59:41,236 DEBUG [godocker-scheduler][Thread-1] Get resource for roles ['god', '*']
2018-02-22 15:59:41,237 DEBUG [godocker-scheduler][Thread-1] Reserve resource: {'name': 'cpus', 'type': 'SCALAR', 'role': '*', 'allocation_info': {'role': 'god'}, 'scalar': {'value': 4}}
2018-02-22 15:59:41,237 DEBUG [godocker-scheduler][Thread-1] Get resource for roles ['god', '*']
2018-02-22 15:59:41,238 DEBUG [godocker-scheduler][Thread-1] Reserve resource: {'name': 'mem', 'type': 'SCALAR', 'role': '*', 'allocation_info': {'role': 'god'}, 'scalar': {'value': 8000}}
2018-02-22 15:59:41,238 ERROR [godocker-scheduler][Thread-1] Error with task 226: 'gpu'
2018-02-22 15:59:41,256 DEBUG [godocker-scheduler][Thread-1] Mesos:Task:Rejected:226


2018-02-22 16:14:11,285 DEBUG [godocker-scheduler][Thread-1] {'hostname': 'host5.surc.local', 'url': {'path': '/slave(1)', 'scheme': 'http', 'address': {'hostname': 'host5.surc.local', 'ip': '106.125.32.165', 'port': 5051}}, 'attributes': [{'name': 'gputype', 'type': 'TEXT', 'text': {'value': 'titanz'}}, {'name': 'rack', 'type': 'TEXT', 'text': {'value': 'gpu'}}], 'allocation_info': {'role': 'god'}, 'framework_id': {'value': '1d2d8f83-e7ff-40ff-bf38-d21248192ca6-0017'}, 'resources': [{'name': 'gpus', 'role': '*', 'type': 'SCALAR', 'allocation_info': {'role': 'god'}, 'scalar': {'value': 2.0}}, {'name': 'ports', 'ranges': {'range': [{'begin': 1025, 'end': 2180}, {'begin': 2182, 'end': 3887}, {'begin': 3889, 'end': 5049}, {'begin': 5052, 'end': 8079}, {'begin': 8082, 'end': 8180}, {'begin': 8182, 'end': 34000}]}, 'role': '*', 'type': 'RANGES', 'allocation_info': {'role': 'god'}}, {'name': 'cpus', 'role': '*', 'type': 'SCALAR', 'allocation_info': {'role': 'god'}, 'scalar': {'value': 8.0}}, {'name': 'mem', 'role': '*', 'type': 'SCALAR', 'allocation_info': {'role': 'god'}, 'scalar': {'value': 30027.0}}, {'name': 'disk', 'role': '*', 'type': 'SCALAR', 'allocation_info': {'role': 'god'}, 'scalar': {'value': 464202.0}}], 'id': {'value': '26fd306b-ffd4-407f-a424-a9b78e1ca54c-O23233'}, 'agent_id': {'value': '1d2d8f83-e7ff-40ff-bf38-d21248192ca6-S101'}}
2018-02-22 16:14:11,285 DEBUG [godocker-scheduler][Thread-1] Mesos:Labels:{'rack': 'gpu', 'gputype': 'titanz'}
2018-02-22 16:14:11,286 DEBUG [godocker-scheduler][Thread-1] Mesos:GetSlaveIdFromMasterInfo:host5.surc.local
2018-02-22 16:14:11,286 DEBUG [godocker-scheduler][Thread-1] Mesos:Received offer 26fd306b-ffd4-407f-a424-a9b78e1ca54c-O23233 with cpus: 8.0 and mem: 30027.0
2018-02-22 16:14:11,287 DEBUG [godocker-scheduler][Thread-1] Try to place task 226
2018-02-22 16:14:11,288 DEBUG [godocker-scheduler][Thread-1] Get resource for roles ['god', '*']
2018-02-22 16:14:11,288 DEBUG [godocker-scheduler][Thread-1] Reserve resource: {'name': 'cpus', 'type': 'SCALAR', 'role': '*', 'allocation_info': {'role': 'god'}, 'scalar': {'value': 4}}
2018-02-22 16:14:11,288 DEBUG [godocker-scheduler][Thread-1] Get resource for roles ['god', '*']
2018-02-22 16:14:11,289 DEBUG [godocker-scheduler][Thread-1] Reserve resource: {'name': 'mem', 'type': 'SCALAR', 'role': '*', 'allocation_info': {'role': 'god'}, 'scalar': {'value': 8000}}
2018-02-22 16:14:11,289 ERROR [godocker-scheduler][Thread-1] Error with task 226: 'gpu'
2018-02-22 16:14:11,290 DEBUG [godocker-scheduler][Thread-1] Try to place task 227
2018-02-22 16:14:11,291 DEBUG [godocker-scheduler][Thread-1] Get resource for roles ['god', '*']
2018-02-22 16:14:11,291 DEBUG [godocker-scheduler][Thread-1] Reserve resource: {'name': 'cpus', 'type': 'SCALAR', 'role': '*', 'allocation_info': {'role': 'god'}, 'scalar': {'value': 4}}
2018-02-22 16:14:11,292 DEBUG [godocker-scheduler][Thread-1] Get resource for roles ['god', '*']
2018-02-22 16:14:11,292 DEBUG [godocker-scheduler][Thread-1] Reserve resource: {'name': 'mem', 'type': 'SCALAR', 'role': '*', 'allocation_info': {'role': 'god'}, 'scalar': {'value': 8000}}
2018-02-22 16:14:11,293 ERROR [godocker-scheduler][Thread-1] Error with task 227: 'gpu'
2018-02-22 16:14:11,294 DEBUG [godocker-scheduler][Thread-1] Try to place task 229
2018-02-22 16:14:11,295 DEBUG [godocker-scheduler][Thread-1] Get resource for roles ['god', '*']
2018-02-22 16:14:11,295 DEBUG [godocker-scheduler][Thread-1] Reserve resource: {'name': 'cpus', 'type': 'SCALAR', 'role': '*', 'allocation_info': {'role': 'god'}, 'scalar': {'value': 0.0}}
2018-02-22 16:14:11,296 DEBUG [godocker-scheduler][Thread-1] Get resource for roles ['god', '*']
2018-02-22 16:14:11,296 DEBUG [godocker-scheduler][Thread-1] Reserve resource: {'name': 'mem', 'type': 'SCALAR', 'role': '*', 'allocation_info': {'role': 'god'}, 'scalar': {'value': 2000}}
2018-02-22 16:14:11,297 ERROR [godocker-scheduler][Thread-1] Error with task 229: 'gpu'
2018-02-22 16:14:11,304 DEBUG [godocker-scheduler][MainThread] Mesos:WaitForOffer:Wait
2018-02-22 16:14:11,321 DEBUG [godocker-scheduler][Thread-1] Mesos:Task:Rejected:226
2018-02-22 16:14:11,325 DEBUG [godocker-scheduler][Thread-1] Mesos:Task:Rejected:227
2018-02-22 16:14:11,329 DEBUG [godocker-scheduler][Thread-1] Mesos:Task:Rejected:229

Comments (6)

  1. Olivier Sallou repo owner

    Looks related to the modifications for role-based resources. It gets an offer without gpu. I think the code tries to get gpu info from the offer but does not find it. Will add a check.

  2. Olivier Sallou repo owner

    Found the issue: a typo introduced when adding some extra controls on role-based resources. Will patch this evening and rebuild the docker dev image. I do not have gpus available (I need to reserve a server for this), so I could not test it...

  3. Olivier Sallou repo owner

    Fix has been pushed and the docker image is building. Can you confirm afterwards that the fix works? I am sure of the error and the fix, but can't test it.

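For readers hitting a similar error: the log line "Error with task 226: 'gpu'" has the shape of a Python KeyError message, which is what a direct dictionary lookup produces when the key is absent. The sketch below is not the godocker code and is not the actual patch; it only illustrates, under that assumption, the kind of check mentioned in the first comment. The function names scalar_resource and can_place are hypothetical, and the offer is trimmed from the log above. Note that the offers in the log name the resource 'gpus'.

```python
# Hypothetical sketch: read scalar resources from a Mesos offer dict
# without raising KeyError when a resource (e.g. gpus) is missing.

def scalar_resource(offer, name, roles=("god", "*")):
    """Sum the scalar value of resource `name` over the given roles.

    Returns 0.0 when the offer does not carry that resource at all.
    """
    total = 0.0
    for resource in offer.get("resources", []):
        if resource.get("name") != name or resource.get("type") != "SCALAR":
            continue
        if resource.get("role", "*") not in roles:
            continue
        total += resource.get("scalar", {}).get("value", 0.0)
    return total


def can_place(offer, requested):
    """Return (ok, reason) for a dict of requested scalar amounts."""
    for name, amount in requested.items():
        if amount <= 0:
            continue  # nothing requested for this resource
        available = scalar_resource(offer, name)
        if available < amount:
            reason = "not enough %s: requested %s, offered %s" % (
                name, amount, available)
            return False, reason
    return True, ""


# Offer trimmed from the scheduler log above (host2.surc.local).
offer = {
    "hostname": "host2.surc.local",
    "resources": [
        {"name": "gpus", "role": "*", "type": "SCALAR",
         "scalar": {"value": 2.0}},
        {"name": "cpus", "role": "*", "type": "SCALAR",
         "scalar": {"value": 8.0}},
        {"name": "mem", "role": "*", "type": "SCALAR",
         "scalar": {"value": 14972.0}},
    ],
}

print(can_place(offer, {"cpus": 4, "mem": 8000, "gpus": 1}))
```

With a lookup of this kind, an offer that carries no gpus is rejected with an explicit reason instead of an exception, so the task stays pending for a later offer rather than being reported as invalid.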