- changed title to Could not start task with gpu resource.
Could not start task with gpu resource.
Hi. It looks like, after the last update, the scheduler is unable to start tasks with a gpu resource. The web interface shows "Invalid task", and the scheduler logs the following:
2018-02-22 15:59:41,230 DEBUG [godocker-scheduler][Thread-1] OFFER RECEIVED: [<mesoshttp.offers.Offer object at 0x7ff3d3c93860>]
2018-02-22 15:59:41,231 DEBUG [godocker-scheduler][Thread-1] Mesos:Offers:Begin
2018-02-22 15:59:41,233 DEBUG [godocker-scheduler][Thread-1] {'hostname': 'host2.surc.local', 'url': {'path': '/slave(1)', 'scheme': 'http', 'address': {'hostname': 'host2.surc.local', 'ip': '106.125.32.162', 'port': 5051}}, 'attributes': [{'name': 'gputype', 'type': 'TEXT', 'text': {'value': 'titanz'}}, {'name': 'rack', 'type': 'TEXT', 'text': {'value': 'gpu'}}], 'allocation_info': {'role': 'god'}, 'framework_id': {'value': '1d2d8f83-e7ff-40ff-bf38-d21248192ca6-0017'}, 'resources': [{'name': 'gpus', 'role': '*', 'type': 'SCALAR', 'allocation_info': {'role': 'god'}, 'scalar': {'value': 2.0}}, {'name': 'ports', 'ranges': {'range': [{'begin': 1025, 'end': 2180}, {'begin': 2182, 'end': 3887}, {'begin': 3889, 'end': 5049}, {'begin': 5052, 'end': 8079}, {'begin': 8082, 'end': 8180}, {'begin': 8182, 'end': 34000}]}, 'role': '*', 'type': 'RANGES', 'allocation_info': {'role': 'god'}}, {'name': 'cpus', 'role': '*', 'type': 'SCALAR', 'allocation_info': {'role': 'god'}, 'scalar': {'value': 8.0}}, {'name': 'mem', 'role': '*', 'type': 'SCALAR', 'allocation_info': {'role': 'god'}, 'scalar': {'value': 14972.0}}, {'name': 'disk', 'role': '*', 'type': 'SCALAR', 'allocation_info': {'role': 'god'}, 'scalar': {'value': 464202.0}}], 'id': {'value': '26fd306b-ffd4-407f-a424-a9b78e1ca54c-O23131'}, 'agent_id': {'value': '1d2d8f83-e7ff-40ff-bf38-d21248192ca6-S112'}}
2018-02-22 15:59:41,234 DEBUG [godocker-scheduler][Thread-1] Mesos:Labels:{'rack': 'gpu', 'gputype': 'titanz'}
2018-02-22 15:59:41,234 DEBUG [godocker-scheduler][Thread-1] Mesos:GetSlaveIdFromMasterInfo:host2.surc.local
2018-02-22 15:59:41,235 DEBUG [godocker-scheduler][Thread-1] Mesos:Received offer 26fd306b-ffd4-407f-a424-a9b78e1ca54c-O23131 with cpus: 8.0 and mem: 14972.0
2018-02-22 15:59:41,235 DEBUG [godocker-scheduler][Thread-1] Try to place task 226
2018-02-22 15:59:41,236 DEBUG [godocker-scheduler][Thread-1] Get resource for roles ['god', '*']
2018-02-22 15:59:41,237 DEBUG [godocker-scheduler][Thread-1] Reserve resource: {'name': 'cpus', 'type': 'SCALAR', 'role': '*', 'allocation_info': {'role': 'god'}, 'scalar': {'value': 4}}
2018-02-22 15:59:41,237 DEBUG [godocker-scheduler][Thread-1] Get resource for roles ['god', '*']
2018-02-22 15:59:41,238 DEBUG [godocker-scheduler][Thread-1] Reserve resource: {'name': 'mem', 'type': 'SCALAR', 'role': '*', 'allocation_info': {'role': 'god'}, 'scalar': {'value': 8000}}
2018-02-22 15:59:41,238 ERROR [godocker-scheduler][Thread-1] Error with task 226: 'gpu'
2018-02-22 15:59:41,256 DEBUG [godocker-scheduler][Thread-1] Mesos:Task:Rejected:226
2018-02-22 16:14:11,285 DEBUG [godocker-scheduler][Thread-1] {'hostname': 'host5.surc.local', 'url': {'path': '/slave(1)', 'scheme': 'http', 'address': {'hostname': 'host5.surc.local', 'ip': '106.125.32.165', 'port': 5051}}, 'attributes': [{'name': 'gputype', 'type': 'TEXT', 'text': {'value': 'titanz'}}, {'name': 'rack', 'type': 'TEXT', 'text': {'value': 'gpu'}}], 'allocation_info': {'role': 'god'}, 'framework_id': {'value': '1d2d8f83-e7ff-40ff-bf38-d21248192ca6-0017'}, 'resources': [{'name': 'gpus', 'role': '*', 'type': 'SCALAR', 'allocation_info': {'role': 'god'}, 'scalar': {'value': 2.0}}, {'name': 'ports', 'ranges': {'range': [{'begin': 1025, 'end': 2180}, {'begin': 2182, 'end': 3887}, {'begin': 3889, 'end': 5049}, {'begin': 5052, 'end': 8079}, {'begin': 8082, 'end': 8180}, {'begin': 8182, 'end': 34000}]}, 'role': '*', 'type': 'RANGES', 'allocation_info': {'role': 'god'}}, {'name': 'cpus', 'role': '*', 'type': 'SCALAR', 'allocation_info': {'role': 'god'}, 'scalar': {'value': 8.0}}, {'name': 'mem', 'role': '*', 'type': 'SCALAR', 'allocation_info': {'role': 'god'}, 'scalar': {'value': 30027.0}}, {'name': 'disk', 'role': '*', 'type': 'SCALAR', 'allocation_info': {'role': 'god'}, 'scalar': {'value': 464202.0}}], 'id': {'value': '26fd306b-ffd4-407f-a424-a9b78e1ca54c-O23233'}, 'agent_id': {'value': '1d2d8f83-e7ff-40ff-bf38-d21248192ca6-S101'}}
2018-02-22 16:14:11,285 DEBUG [godocker-scheduler][Thread-1] Mesos:Labels:{'rack': 'gpu', 'gputype': 'titanz'}
2018-02-22 16:14:11,286 DEBUG [godocker-scheduler][Thread-1] Mesos:GetSlaveIdFromMasterInfo:host5.surc.local
2018-02-22 16:14:11,286 DEBUG [godocker-scheduler][Thread-1] Mesos:Received offer 26fd306b-ffd4-407f-a424-a9b78e1ca54c-O23233 with cpus: 8.0 and mem: 30027.0
2018-02-22 16:14:11,287 DEBUG [godocker-scheduler][Thread-1] Try to place task 226
2018-02-22 16:14:11,288 DEBUG [godocker-scheduler][Thread-1] Get resource for roles ['god', '*']
2018-02-22 16:14:11,288 DEBUG [godocker-scheduler][Thread-1] Reserve resource: {'name': 'cpus', 'type': 'SCALAR', 'role': '*', 'allocation_info': {'role': 'god'}, 'scalar': {'value': 4}}
2018-02-22 16:14:11,288 DEBUG [godocker-scheduler][Thread-1] Get resource for roles ['god', '*']
2018-02-22 16:14:11,289 DEBUG [godocker-scheduler][Thread-1] Reserve resource: {'name': 'mem', 'type': 'SCALAR', 'role': '*', 'allocation_info': {'role': 'god'}, 'scalar': {'value': 8000}}
2018-02-22 16:14:11,289 ERROR [godocker-scheduler][Thread-1] Error with task 226: 'gpu'
2018-02-22 16:14:11,290 DEBUG [godocker-scheduler][Thread-1] Try to place task 227
2018-02-22 16:14:11,291 DEBUG [godocker-scheduler][Thread-1] Get resource for roles ['god', '*']
2018-02-22 16:14:11,291 DEBUG [godocker-scheduler][Thread-1] Reserve resource: {'name': 'cpus', 'type': 'SCALAR', 'role': '*', 'allocation_info': {'role': 'god'}, 'scalar': {'value': 4}}
2018-02-22 16:14:11,292 DEBUG [godocker-scheduler][Thread-1] Get resource for roles ['god', '*']
2018-02-22 16:14:11,292 DEBUG [godocker-scheduler][Thread-1] Reserve resource: {'name': 'mem', 'type': 'SCALAR', 'role': '*', 'allocation_info': {'role': 'god'}, 'scalar': {'value': 8000}}
2018-02-22 16:14:11,293 ERROR [godocker-scheduler][Thread-1] Error with task 227: 'gpu'
2018-02-22 16:14:11,294 DEBUG [godocker-scheduler][Thread-1] Try to place task 229
2018-02-22 16:14:11,295 DEBUG [godocker-scheduler][Thread-1] Get resource for roles ['god', '*']
2018-02-22 16:14:11,295 DEBUG [godocker-scheduler][Thread-1] Reserve resource: {'name': 'cpus', 'type': 'SCALAR', 'role': '*', 'allocation_info': {'role': 'god'}, 'scalar': {'value': 0.0}}
2018-02-22 16:14:11,296 DEBUG [godocker-scheduler][Thread-1] Get resource for roles ['god', '*']
2018-02-22 16:14:11,296 DEBUG [godocker-scheduler][Thread-1] Reserve resource: {'name': 'mem', 'type': 'SCALAR', 'role': '*', 'allocation_info': {'role': 'god'}, 'scalar': {'value': 2000}}
2018-02-22 16:14:11,297 ERROR [godocker-scheduler][Thread-1] Error with task 229: 'gpu'
2018-02-22 16:14:11,304 DEBUG [godocker-scheduler][MainThread] Mesos:WaitForOffer:Wait
2018-02-22 16:14:11,321 DEBUG [godocker-scheduler][Thread-1] Mesos:Task:Rejected:226
2018-02-22 16:14:11,325 DEBUG [godocker-scheduler][Thread-1] Mesos:Task:Rejected:227
2018-02-22 16:14:11,329 DEBUG [godocker-scheduler][Thread-1] Mesos:Task:Rejected:229
Comments (6)
-
reporter -
repo owner Looks related to the modifications for role-based resources. It gets an offer without gpu. I think the code tries to get gpu info from the offer but does not find it. Will add a check.
-
repo owner Found the issue: a typo introduced when adding some extra controls on role-based resources. Will patch this evening and rebuild the docker dev image. I do not have gpus available (need to reserve a server for this), so I could not test it...
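The logged message `Error with task 226: 'gpu'` has the shape Python produces when a `KeyError` for the key `'gpu'` is converted to a string, while the offer in the log lists the resource under the name `'gpus'`. A minimal sketch of a defensive lookup, assuming the offer is the plain dict shown in the log; the function names are hypothetical, and this is not the actual godocker code or the patch itself:

```python
# Hypothetical sketch; not the actual godocker scheduler code.

def scalar_resources(offer):
    """Map each SCALAR resource name in a Mesos offer dict to its value."""
    return {
        res['name']: res['scalar']['value']
        for res in offer.get('resources', [])
        if res.get('type') == 'SCALAR'
    }


def available_gpus(offer):
    """Return the number of gpus offered, or 0.0 if the offer has none."""
    # .get() avoids the KeyError that a direct lookup raises on a missing name
    return scalar_resources(offer).get('gpus', 0.0)


# Trimmed-down offer with the same resource layout as in the log above
offer = {
    'resources': [
        {'name': 'gpus', 'type': 'SCALAR', 'scalar': {'value': 2.0}},
        {'name': 'cpus', 'type': 'SCALAR', 'scalar': {'value': 8.0}},
        {'name': 'mem', 'type': 'SCALAR', 'scalar': {'value': 14972.0}},
    ]
}

try:
    scalar_resources(offer)['gpu']  # misspelled key, for illustration only
except KeyError as err:
    error_text = str(err)  # "'gpu'", the same shape as the logged error
```

With a lookup like `available_gpus`, an offer without gpus yields 0.0 instead of an exception, so the scheduler could skip that offer for gpu tasks rather than rejecting the task.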
-
reporter Thank you very much for the quick issue resolution!
-
repo owner - changed status to closed
fix typo in resource allocation for gpus, closes #66 → <<cset 38e64969c5a3>>
-
repo owner Fix has been pushed, the docker image is building. Can you confirm afterwards that the fix works? I am sure of the error/fix, but can't test it.