Medium to large data exports exceed web server timeout defaults

Issue #19 resolved
Ed McDonagh created an issue

Task should be carried out in the background rather than as a template response to a server request - I don't know how the finished object is then passed to the user. Should probably use Celery.

Comments (66)

  1. Ed McDonagh reporter

    As a workaround for my gunicorn/nginx setup, I have increased the gunicorn timeout to 15 minutes by editing /var/conquest/openrem/bin/gunicorn_start:

    exec bin/gunicorn ${DJANGO_WSGI_MODULE}:application \
      --name $NAME \
      --workers $NUM_WORKERS \
      --user=$USER --group=$GROUP \
      --log-level=debug \
      --timeout 900
    

    And I have used the same value in the nginx definition /etc/nginx/sites-available/openremTCP

    location / {
        proxy_pass          http://app_server;
        proxy_redirect      off;
        proxy_set_header    Host            $host;
        proxy_set_header    X-Real-IP       $remote_addr;
        proxy_set_header    X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_connect_timeout   900s;
        proxy_read_timeout      900s;
    }
    
  2. Ed McDonagh reporter

    Celery with RabbitMQ seems to be too big for the task, and RabbitMQ needs a lot of setup and configuration.

    Alternatives are Huey with Redis or Celery with Redis.

    Go with Celery for now, as I can't see any examples of Huey with status updates other than finished or not finished.

  3. Ed McDonagh reporter

    Now established that Redis needs to have the server installed too, not just the PyPI package. Redis is also not officially supported on Windows, though it can work.

    Back to RabbitMQ, which has installation instructions for the major operating systems at least, and it looks like it might just work once installed.

  4. Ed McDonagh reporter

    Added celery.py, edited openrem/__init__.py and openrem/settings as per the Celery first-steps and Django tutorials. Haven't tested yet (or added any tasks). Refs #19.

    → <<cset 704caa538463>>
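
    The celery.py referred to above is not shown in the thread; a minimal sketch following the Celery "First steps with Django" tutorial of the time might look like the following (the `openrem` module names are taken from the comment, everything else is assumed):

    ```python
    # openrem/celery.py -- sketch only; follows the Celery "First steps
    # with Django" tutorial, not the actual OpenREM commit.
    from __future__ import absolute_import

    import os

    from celery import Celery

    from django.conf import settings

    # Point Celery at the Django settings module before creating the app.
    os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'openrem.settings')

    app = Celery('openrem')

    # Read CELERY_* options from Django settings and discover tasks.py
    # modules in each installed app.
    app.config_from_object('django.conf:settings')
    app.autodiscover_tasks(lambda: settings.INSTALLED_APPS)
    ```

    The tutorial also has `openrem/__init__.py` import this module so the app is loaded when Django starts.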

  5. Ed McDonagh reporter

    Moved the __future__ import to the top as it has to be first. For some reason, I also had to import local_settings as openrem.local_settings... need to ensure that this still works with a pip installed version. Refs #19.

    → <<cset 24b827753974>>

  6. Ed McDonagh reporter

    Progress bar task now works - the Celery worker must be running (celery -A openrem worker -l info), and CELERY_TASK_SERIALIZER = 'json' and CELERY_RESULT_SERIALIZER = 'json' had to be added to settings to force the use of json instead of pickle, which was being blocked by the security settings. Refs #19.

    → <<cset 8c4c8f469f4d>>
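
    The settings additions described above would look roughly like this (the `CELERY_ACCEPT_CONTENT` line is an assumption beyond what the comment lists, included because it is the companion setting that rejects pickled messages):

    ```python
    # settings.py additions -- sketch of the serializer settings
    # described above; forces Celery to use JSON rather than pickle.
    CELERY_TASK_SERIALIZER = 'json'
    CELERY_RESULT_SERIALIZER = 'json'
    CELERY_ACCEPT_CONTENT = ['json']  # assumption: refuse pickle content
    ```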

  7. Ed McDonagh reporter

    Attempt to get ct csv export running using celery. Task is imported correctly and listed by celery, but the request isn't being passed correctly. Propose refactor csv export routine. Refs #19

    → <<cset 74fb3c08418c>>

  8. Ed McDonagh reporter

    Moved the view/task launcher to the csv export file. Moved the url accordingly and tidied up the csv export url section. Added link to the celery export to the html template. refs #19.

    → <<cset 9a5900950cec>>

  9. Ed McDonagh reporter

    Adding json and csrf_exempt imports. Now in the same state as prior to moving everything to the exportcsv file. I.e. doesn't work :-) Refs #19.

    → <<cset 590807d2a9b1>>

  10. Ed McDonagh reporter

    Added path to MEDIA_ROOT in local_settings. Modified exportCT2excel to save the csv file to disk. Celery export from the web page now correctly launches the celery task and the file is created and saved. Now need to do status updates and serve the file url back to the web page. Refs #19.

    → <<cset 2c3f92d38b65>>
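
    The modified export described above writes the CSV under MEDIA_ROOT rather than streaming it in the response. A minimal sketch of that idea, with hypothetical names (the real task also reports progress via Celery):

    ```python
    import csv
    import datetime
    import os

    def write_export_csv(rows, media_root):
        """Write export rows to a dated CSV file under media_root/exports.

        Hypothetical helper: `rows` is an iterable of lists; returns the
        path of the file written so it can later be served as a URL.
        """
        export_dir = os.path.join(media_root, 'exports')
        if not os.path.exists(export_dir):
            os.makedirs(export_dir)
        filename = 'ctexport{0}.csv'.format(
            datetime.datetime.now().strftime('%Y%m%d-%H%M%S'))
        path = os.path.join(export_dir, filename)
        with open(path, 'w', newline='') as f:
            writer = csv.writer(f)
            for row in rows:
                writer.writerow(row)
        return path
    ```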

  11. Ed McDonagh reporter

    Changed exports urls to allow for some exports to be in different files within the exports folder - initially the new file ajaxviews for poll_state. Refs #19.

    → <<cset d8d40d48a348>>

  12. Ed McDonagh reporter

    Tidied up ctfiltered slightly, no functional change. Added in a new URL of ct to see if that was the issue, it wasn't. Added the do_ct_csv without the ct prefix for the adapted 1000 objects test page. Refs #19.

    → <<cset 12fa713e17a0>>

  13. Ed McDonagh reporter

    Added a new ct_csv routine so I can fiddle it whilst the original is still working, sort of. Commit prior to changing results to use django database. Refs #19

    → <<cset 791093db1357>>

  14. Ed McDonagh reporter

    Changed result backend to django in anticipation of storing resulting file locations for display in different template. Not sure this is necessary at the moment though. Refs #19. Removed django_extensions that has been hashed out since the early days of the project.

    → <<cset f7f2bb119888>>

  15. Ed McDonagh reporter

    Added in code to parse the string that is sent in the ajax request into a dictionary. This means getQueryFilters is now not required for this workflow. Next task is to change the fields in the export csv code to use the database field names used in the querystring. Refs #19.

    → <<cset d96256780489>>

  16. Ed McDonagh reporter

    Modified the names of the keys to match those in the query string, and took account of the fact the values are being sent as single value lists. CT CSV is now correctly created with the filtered range of studies. Refs #19.

    → <<cset 6202ad6abce6>>
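
    The "single value lists" issue above is what the standard querystring parser produces: every value arrives as a list even when there is only one. A sketch of the flattening described (field names are illustrative, and this uses the Python 3 stdlib rather than the project's actual code):

    ```python
    from urllib.parse import parse_qs

    def filters_from_querystring(querystring):
        """Turn a filter querystring into a plain dict of filter kwargs.

        parse_qs returns every value as a one-element list, so unwrap
        the lists and drop filters with no content.
        """
        raw = parse_qs(querystring)
        return {key: values[0] for key, values in raw.items()
                if values and values[0]}
    ```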

  17. Ed McDonagh reporter

    Added some print statements that appear in the celery shell. The original 'all study data written' status update was at the wrong indent, which is why it reached that status so quickly; it now only appears at the end. Added an index to the exam for loop to enable an 'x of y' status update and a print statement for each study. Refs #19.

    → <<cset f2083a85dab7>>
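
    The indexed loop with an 'x of y' status update can be sketched as follows. The `update_state` callback stands in for Celery's `self.update_state` on a bound task (where it would be called with `state='PROGRESS'` and a `meta` dict); everything else is hypothetical:

    ```python
    def export_with_progress(studies, update_state):
        """Iterate over studies, reporting an 'x of y' status for each.

        Sketch only: in the real task, update_state would be Celery's
        self.update_state and the loop body would write CSV rows.
        """
        total = len(studies)
        for index, study in enumerate(studies, start=1):
            # ... write this study's rows to the export file here ...
            update_state('Writing study {0} of {1}'.format(index, total))
        # Only reached after the loop -- the status the comment above
        # describes as having been at the wrong indent.
        update_state('All study data written')
    ```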

  18. Ed McDonagh reporter

    Added the initial view handler that was missing from my version of the example. The text link from the ctfiltered page now launches this view, which throws up the new exports page in the interface. A further launch of the request is then required. Refs #19.

    → <<cset 592ed088abbc>>

  19. Ed McDonagh reporter

    Correct URL is now launched from the javascript ajax request and it nominally works - task is started and there is some sort of feedback. The task reference and progress bar are repeated twice at the moment... Refs #19.

    → <<cset 878e0b63c293>>

  20. Ed McDonagh reporter

    Revert customisation to ajaxviews, not sure why it had changed. Attempting to pull some progress indicators into exports.html - without success so far. Refs #19.

    → <<cset 164b2e2eb47c>>

  21. Ed McDonagh reporter

    Removed status bar code from export html template and added some task status javascript. Didn't work until realised that the poll_state url wasn't resolving properly so was serving up the original page again instead of polling the state. Now fixed. Refs #19.

    → <<cset a4365b95aca2>>
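
    The poll_state endpoint's job is to report a task's state as JSON for the page's javascript to consume. A sketch of the payload-building part only (the real view would look the task up with `AsyncResult(task_id)` and wrap this in an `HttpResponse` with `content_type='application/json'`; the `statupdate` key comes from the status updates mentioned in these comments):

    ```python
    import json

    def task_state_payload(state, result):
        """Build the JSON body a poll_state view would return.

        Sketch only: while the task is in PROGRESS, pass task.result
        through so the status text (e.g. a 'statupdate' value) reaches
        the page; otherwise just report the state.
        """
        if state == 'PROGRESS':
            data = result
        else:
            data = {'state': state}
        return json.dumps(data)
    ```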

  22. Ed McDonagh reporter

    Minor change to formatting in exports.html. Now only passing task.result in the data from poll_state so that the statupdate value remains available when the task is finished. Refs #19.

    → <<cset 89a1a31e9b47>>

  23. Ed McDonagh reporter

    Changed filename to incorporate current date and time successfully, and attempted to feed this into the web page, unsuccessfully. Refs #19

    → <<cset 3f9d07c1be0b>>

  24. Ed McDonagh reporter

    Modified the media URL in settings. Corrected mistake and completed preliminary table of tasks in the export list, including non-functioning link to the file to download. Refs #19.

    → <<cset f7f2da2fdd0c>>

  25. Ed McDonagh reporter

    Job is now successfully started on export click from filter page. AJAX on export page currently creating a new job each time it is refreshed. Refs #19.

    → <<cset 7a1a2f127060>>

  26. Ed McDonagh reporter

    Added in a modality code and an export id to the GET parameters, which then dictate which job is run. Works, but currently struggling to then remove those queries, or all the get request parameters before moving on so that the job isn't launched each time the view is loaded. Refs #19.

    → <<cset 435c1abeb553>>

  27. Ed McDonagh reporter

    New approach - specific export link view just launches job and sets job.id into session, though we might not need this, then redirects to the generic results page. Refs #19.

    → <<cset a1f18314411b>>

  28. Ed McDonagh reporter

    Abbreviated all the long CT query filter variables to make working with them easier. Changed getting the query variables and then filtering by them into one move, with a test that the variable exists and has content before it is used. This allows for no filters, some filters and empty filters. Changed the CT link from the home page accordingly for station name and removed the unused accession_number element. Refs #19.

    → <<cset 5769d07abf38>>

  29. Ed McDonagh reporter

    Created two new query objects of current and complete and replaced the single table in the exports template with a permanent table for the completed tasks and a table for the current tasks if there are any. Refs #19.

    → <<cset 83c62104e1d3>>

  30. Ed McDonagh reporter

    Exports page now has an http-equiv refresh and no javascript. It refreshes every second whilst there is an active export in progress. Refs #19.

    → <<cset 3237ce1d5c95>>

  31. Ed McDonagh reporter

    Now gets the filters from the CTSummaryListFilter instead of explicitly listing them - more DRY :-) The trick to passing keyword arguments held as strings is the ** unpacking. Refs #19.

    → <<cset 762196824980>>
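
    The `**` unpacking mentioned above turns a dict of string field names into keyword arguments. With a Django queryset that is `queryset.filter(**filters)`; a minimal sketch of the same idea without Django (names hypothetical):

    ```python
    def apply_filters(records, filters):
        """Filter a list of dicts using field names held as strings.

        Sketch only: with Django, the whole dict would be applied in
        one move as queryset.filter(**filters) -- the ** unpacks the
        string keys into keyword arguments.
        """
        for field, value in filters.items():
            records = [r for r in records if r.get(field) == value]
        return records
    ```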

  32. Ed McDonagh reporter

    Applied the asynchronous task format to the CT xlsx export. Removed dummy tasks. Status is bouncing between the all-data and protocol sheets, but otherwise it works nicely. Mammo and fluoro csv exports remain. Refs #19.

    → <<cset 86096af34b58>>
