vidgrind: vidgrind processes terminated when broadcast check times out.

Issue #250 new
Trek Hopton created an issue

Sometimes a stream doesn't come on automatically when using the repeat functionality. Looking in the logs, we observe the vidgrind processes being terminated and restarted. For example, for the Stony Point test stream, which we're currently testing on unlisted, the repeat schedule time was 7:50 am and the start time was 8:00 am. In the logs we can see the following (see attached file for full logs):

ERROR 2023-06-22T22:30:00.431695Z [protoPayload.method: GET] [protoPayload.status: 500] [protoPayload.latency: 611.579 s] [protoPayload.userAgent: AppEngine-Google; (+http://code.google.com/appengine)] /broadcastcheck
  0.1.0.2 - - [22/Jun/2023:15:30:00 -0700] GET /broadcastcheck HTTP/1.1 500 - - "AppEngine-Google; (+http://code.google.com/appengine)" "vidgrind.ts.r.appspot.com" ms=611579 cpu_ms=51 cpm_usd=0 loading_request=0 instance=00c61b117c1c6865453608c293e076925aeaaf282081e628039f29bde1f1736046d057c0572e0eadb02f4bd89b49cb18d96cc1791d68d5cb8905ee app_engine_release=1.9.71 trace_id=ca7c64054b490dc3a465f04d79f37353
  {
    "protoPayload": {
      "@type": "type.googleapis.com/google.appengine.logging.v1.RequestLog",
      "appId": "f~vidgrind",
      "versionId": "7",
      "requestId": "6494cb6800ff0695aeb53761ef0001667e7669646772696e64000137000100",
      "ip": "0.1.0.2",
      "startTime": "2023-06-22T22:30:00.431695Z",
      "endTime": "2023-06-22T22:40:12.010807Z",
      "latency": "611.579112s",
      "megaCycles": "51",
      "method": "GET",
      "resource": "/broadcastcheck",
      "httpVersion": "HTTP/1.1",
      "status": 500,
      "userAgent": "AppEngine-Google; (+http://code.google.com/appengine)",
      "urlMapEntry": "auto",
      "host": "vidgrind.ts.r.appspot.com",
      "taskQueueName": "__cron",
      "taskName": "59935039591201681161",
      "instanceIndex": -1,
      "finished": true,
      "instanceId": "00c61b117c1c6865453608c293e076925aeaaf282081e628039f29bde1f1736046d057c0572e0eadb02f4bd89b49cb18d96cc1791d68d5cb8905ee",
      "line": [
        {
          "time": "2023-06-22T22:40:12.010685Z",
          "severity": "ERROR",
          "logMessage": "Process terminated because the request deadline was exceeded. Please ensure that your HTTP server is listening for requests on 0.0.0.0 and on the port defined by the PORT environment variable. (Error code 123)"
        }
      ],
      "appEngineRelease": "1.9.71",
      "traceId": "ca7c64054b490dc3a465f04d79f37353",
      "first": true,
      "spanId": "13678098746368304055"
    },
    "insertId": "6494cdcc00002ad675c46cb5",
    "httpRequest": {
      "status": 500
    },
    "resource": {
      "type": "gae_app",
      "labels": {
        "project_id": "vidgrind",
        "module_id": "default",
        "zone": "australia-southeast1-3",
        "version_id": "7"
      }
    },
    "timestamp": "2023-06-22T22:30:00.431695Z",
    "severity": "ERROR",
    "labels": {
      "clone_id": "00c61b117c1c6865453608c293e076925aeaaf282081e628039f29bde1f1736046d057c0572e0eadb02f4bd89b49cb18d96cc1791d68d5cb8905ee"
    },
    "logName": "projects/vidgrind/logs/appengine.googleapis.com%2Frequest_log",
    "operation": {
      "id": "6494cb6800ff0695aeb53761ef0001667e7669646772696e64000137000100",
      "producer": "appengine.googleapis.com/request_id",
      "first": true,
      "last": true
    },
    "trace": "projects/vidgrind/traces/ca7c64054b490dc3a465f04d79f37353",
    "receiveTimestamp": "2023-06-22T22:40:12.015456953Z",
    "spanId": "13678098746368304055"
  }
> ERROR 2023-06-22T22:40:12.010685Z Process terminated because the request deadline was exceeded. Please ensure that your HTTP server is listening for requests on 0.0.0.0 and on the port defined by the PORT environment variable. (Error code 123)

Then, at 8:10, we see:

INFO 2023-06-22T22:40:12.011715Z [pid1] Received terminated signal [session:PDNRCGG]
INFO 2023-06-22T22:40:12.012038Z [pid1] Sending terminated signal to processes [session:PDNRCGG]
INFO 2023-06-22T22:40:12.012049Z [pid1] Waiting 2s to allow shutdown processing [session:PDNRCGG]
INFO 2023-06-22T22:40:12.012057Z [pid1-nginx] Propagating terminated signal to nginx (pid 21) [session:PDNRCGG]
INFO 2023-06-22T22:40:12.012064Z [pid1-app] Propagating terminated signal to app (pid 11) [session:PDNRCGG]
WARNING 2023-06-22T22:40:12.015193Z [pid1-app] Exit -1 app (pid 11): /layers/google.go.build/bin/main [session:PDNRCGG]
INFO 2023-06-22T22:40:12.019840Z [pid1-nginx] Exit 0 nginx (pid 21): /usr/sbin/nginx -c /tmp/nginxconf-2303604805/nginx.conf [session:PDNRCGG]
INFO 2023-06-22T22:40:14.013Z [pid1] Stopping all processes [session:PDNRCGG]
INFO 2023-06-22T22:40:14.013340Z [pid1] Done [session:PDNRCGG]
INFO 2023-06-22T22:40:14.013361Z [pid1] shutdown [session:PDNRCGG]

The hypothesis is that the broadcast check handler exceeds the request deadline, causing App Engine to terminate the process. The logged latency of ~611 s is consistent with App Engine's 10-minute request deadline for cron-triggered requests on the standard environment.

This may be happening because the camera is not outputting data, which may itself be a separate issue. Even so, we should handle that case gracefully so the process doesn't get terminated.

This requires a bit more investigation before a solution can be proposed, but it seems we need to bound how long our broadcast checks run, so that the handler returns a response before the request deadline is reached.
