Handling mesos maintenance mode
Hi, Oliver. I have a problem with Mesos maintenance mode. I put a few compute nodes into maintenance mode (http://mesos.apache.org/documentation/latest/maintenance/). I checked these nodes on the Maintenance tab of the Mesos interface, and they show as scheduled for maintenance, so that part is OK.
According to the Mesos docs, this should make the resources on a machine unavailable. However, jobs ended up on these nodes as usual.
Could you please check whether this is a Mesos bug or whether GoDocker doesn't handle it? Thank you in advance.
Comments (8)
-
repo owner -
reporter Moreover, it should send inverse offers. I'm asking just to make sure GoDocker handles them properly.
-
repo owner Godocker does not handle inverse offers. I will check the Mesos docs for maintenance mode.
-
repo owner For jobs that are already scheduled, a node in maintenance may kill running jobs. Godocker will keep them as failed and won't reschedule them.
-
reporter As far as I understand from the docs, there are 3 modes:
- Scheduled window (/maintenance/schedule API endpoint)
In this mode Mesos sends inverse offers and doesn't take new jobs, but jobs that are already running should keep working.
- Node down mode (/machine/down API endpoint)
This mode kills all tasks (sends a TASK_LOST message for them) and doesn't take new jobs.
- Node up (/machine/up API endpoint)
Normal operation.
-
repo owner With scheduled maintenance, Mesos will stop sending offers once the start date is reached, for the defined duration. Before this date it will send inverse offers, but godocker will ignore them, as the job may complete before then.
-
reporter Looks like it's working now.
2018-06-27 06:07:22,347 DEBUG [godocker-scheduler][Thread-1] OFFER RECEIVED: [<mesoshttp.offers.Offer object at 0x7f9c976b79b0>, <mesoshttp.offers.Offer object at 0x7f9c976b7438>]
2018-06-27 06:07:22,348 DEBUG [godocker-scheduler][Thread-1] Mesos:Offers:Begin
2018-06-27 06:07:22,350 DEBUG [godocker-scheduler][Thread-1] {'id': {'value': '97de61d7-475a-4755-9b4f-33b80046f622-O2478995'}, 'resources': [{'scalar': {'value': 1910366.0}, 'role': '*', 'name': 'disk', 'type': 'SCALAR'}, {'role': '*', 'ranges': {'range': [{'end': 2180, 'begin': 1025}, {'end': 3887, 'begin': 2182}, {'end': 5049, 'begin': 3889}, {'end': 8079, 'begin': 5052}, {'end': 8180, 'begin': 8082}, {'end': 34000, 'begin': 8182}]}, 'name': 'ports', 'type': 'RANGES'}, {'scalar': {'value': 3.0}, 'role': '*', 'name': 'cpus', 'type': 'SCALAR'}, {'scalar': {'value': 8671.0}, 'role': '*', 'name': 'mem', 'type': 'SCALAR'}], 'unavailability': {'start': {'nanoseconds': 1530079515000000000}, 'duration': {'nanoseconds': 518400000000000}}, 'attributes': [{'text': {'value': 'GTX1080'}, 'name': 'gputype', 'type': 'TEXT'}, {'text': {'value': 'gpu'}, 'name': 'rack', 'type': 'TEXT'}], 'framework_id': {'value': '6b6a2a5a-47aa-4773-9047-2a53d4e6600c-0002'}, 'url': {'scheme': 'http', 'address': {'ip': '10.0.0.10', 'hostname': 'node15.local', 'port': 5051}, 'path': '/slave(1)'}, 'agent_id': {'value': '6b6a2a5a-47aa-4773-9047-2a53d4e6600c-S24'}, 'hostname': 'node15.local'}
2018-06-27 06:07:22,350 DEBUG [godocker-scheduler][Thread-1] **Node node15.local in planned maintenance, skipping...**
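For anyone reading the offer above: the `unavailability` field is what tells a framework that the agent has a maintenance window. Below is a minimal Python sketch of how such a check could look. It is not godocker's actual code; the function name `in_maintenance_window` is hypothetical, and the thread does not say whether godocker skips only inside the window or as soon as the field is present. The sketch also assumes that a missing `duration` means an open-ended window.

```python
from typing import Any, Dict


def in_maintenance_window(offer: Dict[str, Any], now_ns: int) -> bool:
    """Return True if `now_ns` falls inside the offer's unavailability window.

    `offer` is a dict shaped like the one in the log above.
    Hypothetical helper, not part of godocker or mesoshttp.
    """
    unavailability = offer.get("unavailability")
    if not unavailability:
        return False  # no maintenance scheduled for this agent

    start_ns = unavailability["start"]["nanoseconds"]
    if now_ns < start_ns:
        return False  # window has not started yet

    duration = unavailability.get("duration")
    if duration is None:
        return True  # assumption: no duration means open-ended maintenance

    return now_ns < start_ns + duration["nanoseconds"]


# Values copied from the logged offer; the timestamp assumes the log is in UTC.
offer = {
    "hostname": "node15.local",
    "unavailability": {
        "start": {"nanoseconds": 1530079515000000000},
        "duration": {"nanoseconds": 518400000000000},
    },
}
now_ns = 1530079642 * 10**9  # 2018-06-27 06:07:22 UTC
if in_maintenance_window(offer, now_ns):
    print("Node %s in planned maintenance, skipping..." % offer["hostname"])
```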
Mesos also sends inverse offers:
2018-06-27 06:04:15,145 ERROR [godocker-scheduler][Thread-1] A rescind event have been received for offer: {'offer_id': {'value': '97de61d7-475a-4755-9b4f-33b80046f622-O2478568'}}
WARNING:mesoshttp.client:INVERSE_OFFERS event no yet implemented
WARNING:mesoshttp.client:INVERSE_OFFERS event no yet implemented
WARNING:mesoshttp.client:INVERSE_OFFERS event no yet implemented
WARNING:mesoshttp.client:INVERSE_OFFERS event no yet implemented
I'll leave the correct request for Mesos maintenance mode here (just for Google):
# 144 hours maintenance window
curl -X POST master.mesos:5050/maintenance/schedule --data '{"windows": [{"machine_ids":[{"hostname": "node15.local", "ip":"10.0.0.10"}], "unavailability": {"start": {"nanoseconds": '$(($(date +%s) + 60))'000000000}, "duration": {"nanoseconds": '$((144 * 3600000000000))'}}}]}'
Both hostname and ip are required.
Thank you for investigating the issue!
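The same request can be built from Python, which makes the nanosecond arithmetic easier to check. This is only a sketch mirroring the curl call above: the helper names `build_schedule` and `post_schedule` are made up for illustration, and sending a `Content-Type: application/json` header is an assumption (the curl command does not set one).

```python
import json
import time
import urllib.request

NS_PER_SECOND = 10**9


def build_schedule(hostname, ip, hours, delay_s=60, now_s=None):
    """Build the JSON body for /maintenance/schedule (hypothetical helper).

    The window starts `delay_s` seconds from now and lasts `hours` hours,
    matching the shell arithmetic in the curl command above.
    """
    if now_s is None:
        now_s = int(time.time())
    return {
        "windows": [
            {
                # Both hostname and ip are required for each machine.
                "machine_ids": [{"hostname": hostname, "ip": ip}],
                "unavailability": {
                    "start": {"nanoseconds": (now_s + delay_s) * NS_PER_SECOND},
                    "duration": {"nanoseconds": hours * 3600 * NS_PER_SECOND},
                },
            }
        ]
    }


def post_schedule(master, payload):
    """POST the schedule to the master, e.g. master='master.mesos:5050'."""
    request = urllib.request.Request(
        "http://%s/maintenance/schedule" % master,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumption, see above
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


payload = build_schedule("node15.local", "10.0.0.10", hours=144)
print(json.dumps(payload))
# post_schedule("master.mesos:5050", payload)  # uncomment to actually send it
```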
-
repo owner - changed status to resolved
Godocker schedules jobs on Mesos offers. If Mesos continues to send offers for those nodes, then godocker uses them. So in your case it looks like offers are still being sent.