Medium to large data exports exceed web server timeout defaults

Issue #19 resolved
Ed McDonagh
created an issue

Task should be carried out in the background rather than as a template response to a server request - I don't know how the finished object is then passed to the user. Should probably use Celery.

Comments (66)

  1. Ed McDonagh reporter

    As a workaround for my gunicorn/nginx setup, I have increased the gunicorn timeout to 15 minutes by editing /var/conquest/openrem/bin/gunicorn_start:

    exec bin/gunicorn ${DJANGO_WSGI_MODULE}:application \
      --name $NAME \
      --workers $NUM_WORKERS \
      --user=$USER --group=$GROUP \
      --log-level=debug \
      --timeout 900

    And I have used the same value in the nginx definition /etc/nginx/sites-available/openremTCP

    location / {
        proxy_pass          http://app_server;
        proxy_redirect      off;
        proxy_set_header    Host            $host;
        proxy_set_header    X-Real-IP       $remote_addr;
        proxy_set_header    X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_connect_timeout   900s;
        proxy_read_timeout      900s;
    }
  2. Ed McDonagh reporter

    Celery with RabbitMQ seems too big for the task, and RabbitMQ needs a lot of setup and configuration.

    Alternatives are Huey with Redis or Celery with Redis.

    Going with Celery for now, as I can't see any examples of Huey offering status updates other than finished or not finished.

  3. Ed McDonagh reporter

    Now established that Redis needs the server installed too, not just the PyPI package. Redis is also not officially supported on Windows, though it can be made to work.

    Back to RabbitMQ, which has installation instructions for the major operating systems at least, and it looks like it might just work once installed.
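
    Once RabbitMQ is installed with its defaults, pointing Celery at it should only need the broker URL in the Django settings. A sketch of that fragment, assuming the out-of-the-box guest account and default vhost (Celery 3.x-era setting name):

```python
# Hypothetical Django settings fragment: Celery broker pointing at a
# local RabbitMQ using its default guest account and default vhost.
BROKER_URL = 'amqp://guest:guest@localhost:5672//'
```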

  4. Ed McDonagh reporter

    Moved the __future__ import to the top, as it has to be there. For some reason I also had to import local_settings as openrem.local_settings - need to ensure that this still works with a pip-installed version. Refs #19.

    → <<cset 24b827753974>>

  5. Ed McDonagh reporter

    Progress bar task now works - the Celery server must be running (celery -A openrem worker -l info), and CELERY_TASK_SERIALIZER = 'json' and CELERY_RESULT_SERIALIZER = 'json' had to be added to settings to force the use of JSON instead of pickle, which was being blocked on security grounds. Refs #19.

    → <<cset 8c4c8f469f4d>>

  6. Ed McDonagh reporter

    Added the path to MEDIA_ROOT in local_settings. Modified exportCT2excel to save the csv file to disk. The Celery export from the web page now correctly launches the celery task, and the file is created and saved. Next: status updates and serving the file URL back to the web page. Refs #19.

    → <<cset 2c3f92d38b65>>

  7. Ed McDonagh reporter

    Tidied up ctfiltered slightly; no functional change. Added a new URL of ct to see if that was the issue - it wasn't. Added do_ct_csv without the ct prefix for the adapted 1000-objects test page. Refs #19.

    → <<cset 12fa713e17a0>>

  8. Ed McDonagh reporter

    Changed the result backend to django in anticipation of storing the resulting file locations for display in a different template; not sure this is necessary at the moment though. Removed django_extensions, which had been commented out since the early days of the project. Refs #19.

    → <<cset f7f2bb119888>>

  9. Ed McDonagh reporter

    Added code to parse the string sent in the ajax request into a dictionary, which means getQueryFilters is no longer required for this workflow. The next task is to change the fields in the export csv code to use the database field names used in the querystring. Refs #19.

    → <<cset d96256780489>>
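
    The string-to-dictionary parsing can be sketched with the standard library; note that the values come back as single-value lists, which the next comment deals with (Python 3 shown - at the time the project would have used urlparse on Python 2):

```python
from urllib.parse import parse_qs

# The ajax request sends the filter querystring as one string;
# parse_qs turns it into a dict of single-value lists.
filters = parse_qs('study_date__gt=2014-01-01&station_name=CT1')
# filters == {'study_date__gt': ['2014-01-01'], 'station_name': ['CT1']}
```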

  10. Ed McDonagh reporter

    Modified the names of the keys to match those in the query string, and took account of the fact that the values are sent as single-value lists. The CT CSV is now correctly created with the filtered range of studies. Refs #19.

    → <<cset 6202ad6abce6>>

  11. Ed McDonagh reporter

    Added some print statements that appear in the celery shell. The original 'all study data written' status update was at the wrong indent, which is why it reached that status so quickly; it now only appears at the end. Added an index to the exam for loop to enable an 'x of y' status update and print statement for each study. Refs #19.

    → <<cset f2083a85dab7>>
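
    The x-of-y reporting can be sketched as a loop that updates after each study, with the final status deliberately outside the loop; a plain callback stands in for Celery's current_task.update_state here, and the names are hypothetical:

```python
def export_studies(studies, report):
    """Process each study, reporting 'x of y' progress as it goes; the
    final status fires only once, after the loop completes."""
    total = len(studies)
    for i, study in enumerate(studies, start=1):
        # ... write this study's rows to the export file ...
        report('{0} of {1} studies written'.format(i, total))
    report('All study data written')  # correct indent: outside the loop
```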

  12. Ed McDonagh reporter

    Added the initial view handler that was missing from my version of the example. The text link from the ctfiltered page now launches this view, which brings up the new exports page in the interface. A further launch of the request is then required. Refs #19.

    → <<cset 592ed088abbc>>

  13. Ed McDonagh reporter

    Correct URL is now launched from the javascript ajax request and it nominally works - task is started and there is some sort of feedback. The task reference and progress bar are repeated twice at the moment... Refs #19.

    → <<cset 878e0b63c293>>

  14. Ed McDonagh reporter

    Removed the status bar code from the export html template and added some task status javascript. It didn't work until I realised that the poll_state URL wasn't resolving properly, so it was serving up the original page again instead of polling the state. Now fixed. Refs #19.

    → <<cset a4365b95aca2>>

  15. Ed McDonagh reporter

    Added a modality code and an export id to the GET parameters, which then dictate which job is run. Works, but I'm currently struggling to remove those parameters (or all the GET request parameters) before moving on, so that the job isn't launched each time the view is loaded. Refs #19.

    → <<cset 435c1abeb553>>
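
    In Django, request.GET is an immutable QueryDict, so removing the launch parameters means working on a copy; a plain-dict sketch of that idea (key names taken from the comment above, helper name hypothetical):

```python
def strip_launch_params(params, keys=('modality', 'export_id')):
    """Return a copy of the GET parameters with the job-launch keys
    removed, so reloading the exports view doesn't relaunch the task."""
    cleaned = dict(params)
    for key in keys:
        cleaned.pop(key, None)
    return cleaned
```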

  16. Ed McDonagh reporter

    Abbreviated all the long CT query filter variables to make working with them easier. Changed getting the query variables and then filtering by them into one move, with a test that each variable exists and has content before it is used. This allows for no filters, some filters and empty filters. Changed the CT link from the home page accordingly for station name, and removed the unused accession_number element. Refs #19.

    → <<cset 5769d07abf38>>
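
    The exists-and-has-content test folded into one move can be sketched like this (hypothetical helper; the values arrive as single-value lists, as noted in comment 10):

```python
def apply_filters(queryset_filter, params):
    """Apply only the query filters that are both present and non-empty,
    so no filters, some filters and empty filters all work."""
    applied = {}
    for field, values in params.items():
        if values and values[0]:  # present and has content
            applied[field] = values[0]  # unwrap the single-value list
    return queryset_filter(**applied)
```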

  17. Ed McDonagh reporter

    Created two new query objects of current and complete and replaced the single table in the exports template with a permanent table for the completed tasks and a table for the current tasks if there are any. Refs #19.

    → <<cset 83c62104e1d3>>

  18. Ed McDonagh reporter

    Applied the asynchronous task format to the CT xlsx export and removed the dummy tasks. The status is bouncing between the all-data and protocol sheets, but otherwise it works nicely. The Mammo and Fluoro csv exports remain. Refs #19.

    → <<cset 86096af34b58>>
