search: searching for large audio data causes VidGrind to crash

Issue #274 resolved
Trek Hopton created an issue

Sometimes when a user searches for audio data and the request covers more than about 3 to 5 minutes of audio, VidGrind crashes. This is the error:

ERROR 2023-08-04T04:08:54.039569Z [protoPayload.method: POST] [protoPayload.status: 500] [protoPayload.latency: 20.989 s] [protoPayload.userAgent: Chrome 114.0.0.0] /search
  206.83.113.112 - - [03/Aug/2023:21:08:54 -0700] POST /search HTTP/1.1 500 - https://vidgrind.ausocean.org/search "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36" "vidgrind.ausocean.org" ms=20989 cpu_ms=0 cpm_usd=0 loading_request=0 instance=- app_engine_release=1.9.71 trace_id=77c5ba8367a37d9bb1cd56a2d7bc04cd
  {
    "protoPayload": {
      "@type": "type.googleapis.com/google.appengine.logging.v1.RequestLog",
      "appId": "f~vidgrind",
      "versionId": "7",
      "requestId": "64cc79d500ff0f0262a0684ab40001667e7669646772696e6400013700010101",
      "ip": "206.83.113.112",
      "startTime": "2023-08-04T04:08:54.039569Z",
      "endTime": "2023-08-04T04:09:15.029548Z",
      "latency": "20.989979s",
      "method": "POST",
      "resource": "/search",
      "httpVersion": "HTTP/1.1",
      "status": 500,
      "referrer": "https://vidgrind.ausocean.org/search",
      "userAgent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36",
      "urlMapEntry": "auto",
      "host": "vidgrind.ausocean.org",
      "pendingTime": "10.336928390s",
      "instanceIndex": -1,
      "finished": true,
      "line": [
        {
          "time": "2023-08-04T04:09:15.029539Z",
          "severity": "ERROR",
          "logMessage": "Request was aborted after waiting too long to attempt to service your request."
        }
      ],
      "appEngineRelease": "1.9.71",
      "traceId": "77c5ba8367a37d9bb1cd56a2d7bc04cd",
      "first": true,
      "spanId": "16818262233046634674"
    },
    "insertId": "64cc79eb00007413e51a9e0d",
    "httpRequest": {
      "status": 500
    },
    "resource": {
      "type": "gae_app",
      "labels": {
        "project_id": "vidgrind",
        "version_id": "7",
        "zone": "australia-southeast1-3",
        "module_id": "default"
      }
    },
    "timestamp": "2023-08-04T04:08:54.039569Z",
    "severity": "ERROR",
    "labels": {
      "clone_id": ""
    },
    "logName": "projects/vidgrind/logs/appengine.googleapis.com%2Frequest_log",
    "operation": {
      "id": "64cc79d500ff0f0262a0684ab40001667e7669646772696e6400013700010101",
      "producer": "appengine.googleapis.com/request_id",
      "first": true,
      "last": true
    },
    "trace": "projects/vidgrind/traces/77c5ba8367a37d9bb1cd56a2d7bc04cd",
    "receiveTimestamp": "2023-08-04T04:09:15.031783175Z",
    "spanId": "16818262233046634674"
  }
> ERROR 2023-08-04T04:09:15.029539Z Request was aborted after waiting too long to attempt to service your request.

Originally I thought we might be requesting too much data at once, but upon revisiting this problem I found that I can now search larger amounts without trouble. Only when I start searching for hours of audio at a time does VidGrind crash, and then I get this different error:

INFO 2023-08-16T10:19:33.579798Z Exceeded hard memory limit of 384 MiB with 387 MiB after servicing 870 requests total. Consider setting a larger instance class in app.yaml.

In both of these cases VidGrind should not crash. It should instead report the error to the user and cancel the request.

I’m not sure how to recreate the former error but let’s at least handle the latter error.

Comments (4)

  1. Alan Noble

    We could increase the instance_class in app.yaml (as suggested by the error message), but we’ll just end up paying more $$$. The root cause is that we’re using too much memory, which is in turn because we’re requesting too much data from the datastore, no doubt because the query time range is simply too large.

    To reliably serve large query results, I think we need to do things differently.

    For example, we could first execute a ‘quick’ query, which returns just the keys. Then, using those keys (which could be cached), we could retrieve the corresponding media spread across multiple requests, vs. one big request.

    We could also start caching frequently accessed media too.
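The keys-first idea above could be sketched roughly as follows. This is an illustration under assumptions, not VidGrind's implementation: `Store`, `memStore` and `fetchPage` are hypothetical names, keys are simplified to `int64` timestamps, and the in-memory store merely stands in for the datastore (where the first step would presumably be a keys-only query).

```go
package main

import (
	"fmt"
	"sort"
)

// Store is a hypothetical abstraction over the datastore.
type Store interface {
	// Keys returns the keys of media in [from, to) without loading payloads.
	Keys(from, to int64) ([]int64, error)
	// Get loads the payloads for the given keys.
	Get(keys []int64) ([][]byte, error)
}

// fetchPage loads at most size clips starting at keys[offset] and returns
// the offset at which the next request should resume. Each call is meant
// to be served by its own request, so memory use is bounded by size.
func fetchPage(s Store, keys []int64, offset, size int) ([][]byte, int, error) {
	if size <= 0 {
		return nil, offset, fmt.Errorf("invalid page size %d", size)
	}
	if offset < 0 || offset > len(keys) {
		return nil, offset, fmt.Errorf("offset %d out of range", offset)
	}
	end := offset + size
	if end > len(keys) {
		end = len(keys)
	}
	clips, err := s.Get(keys[offset:end])
	if err != nil {
		return nil, offset, err
	}
	return clips, end, nil
}

// memStore is an in-memory stand-in for the datastore, keyed by timestamp.
type memStore map[int64][]byte

// Keys returns the matching keys in ascending order.
func (m memStore) Keys(from, to int64) ([]int64, error) {
	var keys []int64
	for k := range m {
		if k >= from && k < to {
			keys = append(keys, k)
		}
	}
	sort.Slice(keys, func(i, j int) bool { return keys[i] < keys[j] })
	return keys, nil
}

// Get returns the payloads for keys, failing on an unknown key.
func (m memStore) Get(keys []int64) ([][]byte, error) {
	clips := make([][]byte, 0, len(keys))
	for _, k := range keys {
		c, ok := m[k]
		if !ok {
			return nil, fmt.Errorf("no media for key %d", k)
		}
		clips = append(clips, c)
	}
	return clips, nil
}
```

The client would call the quick `Keys` query once, then request pages until the returned offset equals the number of keys. The key list is small compared with the media itself, so it is the natural thing to cache between requests.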
