Wrong Container ID

Issue #74 closed
IT Expert created an issue

Hi, Olivier.

I have found a situation where jobs that were started one after another and ended up on the same compute node have the same container id. As a result, we can't match them properly to get job metrics from cAdvisor.

Comments (12)

  1. IT Expert reporter
     db.jobs.find({"container.id":"dc379355-0d68-49c2-9736-8dee4984ed2a"}).pretty()
    {
            "_id" : ObjectId("5b166987a422210009ca80ca"),
            "id" : 4354,
            "notify" : {
                    "email" : false
            },
            "container" : {
                    "network" : true,
                    "id" : "dc379355-0d68-49c2-9736-8dee4984ed2a",
                    "status" : "ready",
                    "volumes" : [
                            {
                                    "name" : "data1",
                                    "mount" : "/data",
                                    "acl" : "ro",
                                    "path" : "хххххххххххххххххх"
                            },
                            {
                                    "name" : "go-docker",
                                    "mount" : "/mnt/go-docker",
                                    "acl" : "rw",
                                    "path" : "/data/godshared/tasks/pairtree_root/43/54/task"
                            }
                    ],
                    "ports" : [ ],
                    "meta" : {
                            "offer" : "6b6a2a5a-47aa-4773-9047-2a53d4e6600c-O2362466",
                            "mesos-id" : "4354-0",
                            "Node" : {
                                    "slave" : "6b6a2a5a-47aa-4773-9047-2a53d4e6600c-S10",
                                    "Name" : "node7"
                            }
                    },
                    "image" : "хххххххххххххххххх",
                    "image_url" : "хххххххххххххххххх",
                    "stats" : null,
                    "root" : true,
                    "port_mapping" : [ ],
                    "ip_address" : "node7"
            },
            "status" : {
                    "date_over" : null,
                    "secondary" : null,
                    "reason" : "",
                    "primary" : "running",
                    "exitcode" : null,
                    "date_running" : 1528195466
    
    [skipped]               
    
    
            "_id" : ObjectId("5b166c59a422210009ca80d9"),
            "id" : 4380,
            "notify" : {
                    "email" : false
            },
            "container" : {
                    "network" : true,
                    "id" : "dc379355-0d68-49c2-9736-8dee4984ed2a",
                    "status" : "ready",
                    "volumes" : [
                            {
                                    "name" : "data2",
                                    "mount" : "/_bin",
                                    "acl" : "ro",
                                    "path" : "хххххххххххххххххх"
                            },
                            {
                                    "name" : "Sadsf",
                                    "mount" : "/_bin2",
                                    "acl" : "ro",
                                    "path" : "хххххххххххххххххх"
                            },
                            {
                                    "name" : "go-docker",
                                    "mount" : "/mnt/go-docker",
                                    "acl" : "rw",
                                    "path" : "/data/godshared/tasks/pairtree_root/43/80/task"
                            }
                    ],
                    "ports" : [ ],
                    "meta" : {
                            "offer" : "6b6a2a5a-47aa-4773-9047-2a53d4e6600c-O2364770",
                            "mesos-id" : "4380-0",
                            "Node" : {
                                    "slave" : "6b6a2a5a-47aa-4773-9047-2a53d4e6600c-S10",
                                    "Name" : "node7"
                            }
                    },
                    "image" : "хххххххххххххххххх",
                    "image_url" : "хххххххххххххххххх",
                    "stats" : null,
                    "root" : true,
                    "port_mapping" : [ ],
                    "ip_address" : "node7"
            },
            "status" : {
                    "date_over" : null,
                    "secondary" : null,
                    "reason" : "",
                    "primary" : "running",
                    "exitcode" : null,
                    "date_running" : 1528196203
            },
    
  2. Olivier Sallou repo owner

    I will have a look, but I don't see how the container id can be the same. Do you use the Docker containerizer or the unified containerizer?

  3. IT Expert reporter

    Did you look at the cAdvisor web UI directly to see the running containers?

    yes

    There are 5 jobs running on the same node.

    container.id:"dc379355-0d68-49c2-9736-8dee4984ed2a" is correct for godocker job "id" : 4354,

    but it is not correct for godocker job "id" : 4380.

    The correct container id for godocker job "id" : 4380 is 22527b04-5623-424e-b775-4017125760db on Mesos (taken from the Mesos UI), but that one is not found in MongoDB at all:

    db.jobs.find({"container.id":"22527b04-5623-424e-b775-4017125760db"}).pretty()
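
A mismatch like the one above can also be spotted by grouping job documents by `container.id`. The following is a minimal sketch, assuming the job documents have already been loaded as plain dictionaries shaped like the `db.jobs` output in this issue; the function name `find_shared_container_ids` is hypothetical and not part of godocker.

```python
from collections import defaultdict


def find_shared_container_ids(jobs):
    """Return {container_id: [job ids]} for container ids used by more than one job.

    `jobs` is assumed to be an iterable of dicts shaped like the db.jobs
    documents shown in this issue (an "id" key and a "container" sub-document).
    """
    by_container = defaultdict(list)
    for job in jobs:
        # "container" may be missing or null for jobs that never started.
        container_id = (job.get("container") or {}).get("id")
        if container_id:
            by_container[container_id].append(job["id"])
    return {cid: ids for cid, ids in by_container.items() if len(ids) > 1}


# Trimmed-down versions of the two documents from this issue.
jobs = [
    {"id": 4354, "container": {"id": "dc379355-0d68-49c2-9736-8dee4984ed2a"}},
    {"id": 4380, "container": {"id": "dc379355-0d68-49c2-9736-8dee4984ed2a"}},
]
shared = find_shared_container_ids(jobs)
```

Any entry in the result points at jobs whose stored container id cannot be trusted for matching cAdvisor metrics.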

  4. Olivier Sallou repo owner

    Are you "replaying" jobs, or is each job created independently?

    If not yet defined, the container id is fetched from the Mesos slave (godocker asks the slave for the container id of the Mesos task). In the case of a job replay, I wonder whether an older id could have been kept (which would be a bug).

  5. Olivier Sallou repo owner

    Hi, I found the issue: a strange behavior of the Mesos API, which returns all containers on the slave even when querying a specific executor. Anyway, I pushed the fix to develop. Jobs that are already running won't be fixed (they will keep the invalid container id); the fix applies only to new jobs.
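
The kind of guard described above can be sketched as follows. This is not the actual godocker fix, only an illustration under stated assumptions: the slave's container listing is taken to be a JSON list of objects with `executor_id` and `container_id` fields, the executor id is assumed to match the job's `mesos-id` (for example `4380-0`), and the helper name `container_id_for_executor` is hypothetical.

```python
def container_id_for_executor(containers, executor_id):
    """Pick the container id that belongs to one executor.

    `containers` is assumed to be the decoded JSON list returned by the
    slave, which may include every container on the node even when a
    single executor was requested, so the list is filtered explicitly
    instead of taking the first entry.
    """
    for entry in containers:
        if entry.get("executor_id") == executor_id:
            return entry.get("container_id")
    return None  # no container reported for this executor


# Hypothetical response listing two containers on the same node.
containers = [
    {"executor_id": "4354-0", "container_id": "dc379355-0d68-49c2-9736-8dee4984ed2a"},
    {"executor_id": "4380-0", "container_id": "22527b04-5623-424e-b775-4017125760db"},
]
```

Taking the first element of such a list would give job 4380 the container id of job 4354, which matches the symptom reported here; filtering on the executor id avoids that.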
