Medium to large data exports exceed web server timeout defaults

Issue #19 resolved
Ed McDonagh created an issue

Task should be carried out in the background rather than as a template response to a server request - I don't know how the finished object is then passed to the user. Should probably use Celery.

Comments (66)

  1. Ed McDonagh reporter

    As a workaround for my gunicorn/nginx setup, I have increased the gunicorn timeout to 15 minutes by editing /var/conquest/openrem/bin/gunicorn_start:

    exec bin/gunicorn ${DJANGO_WSGI_MODULE}:application \
      --name $NAME \
      --workers $NUM_WORKERS \
      --user=$USER --group=$GROUP \
      --log-level=debug \
      --timeout 900
    

    And I have used the same value in the nginx definition /etc/nginx/sites-available/openremTCP

    location / {
        proxy_pass          http://app_server;
        proxy_redirect      off;
        proxy_set_header    Host            $host;
        proxy_set_header    X-Real-IP       $remote_addr;
        proxy_set_header    X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_connect_timeout   900s;
        proxy_read_timeout      900s;
    }
    
  2. Ed McDonagh reporter

    Celery with RabbitMQ seems to be too big for the task, and RabbitMQ needs a lot of setup and configuration.

    Alternatives are Huey with Redis or Celery with Redis.

    Go with Celery for now, as I can't see any examples of Huey with status updates other than finished or not finished.

  3. Ed McDonagh reporter

    Now established that Redis needs to have the server installed too, not just the PyPI package. Redis is also not officially supported on Windows, though it can work.

    Back to RabbitMQ, which has installation instructions for the major operating systems at least, and it looks like it might just work once installed.

  4. Ed McDonagh reporter

    Added celery.py, edited openrem/__init__.py and openrem/settings as per the Celery first-steps and Django tutorials. Haven't tested yet (or added any tasks). Refs #19.

    → <<cset 704caa538463>>
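
    The celery.py referred to above is not shown in the thread; a minimal sketch following the Celery "First steps with Django" tutorial of the time might look like the following (the `openrem` module names are taken from the comment, everything else is assumed):

    ```python
    # openrem/celery.py -- sketch only; follows the Celery "First steps
    # with Django" tutorial, not the actual OpenREM commit.
    from __future__ import absolute_import

    import os

    from celery import Celery

    from django.conf import settings

    # Point Celery at the Django settings module before creating the app.
    os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'openrem.settings')

    app = Celery('openrem')

    # Read CELERY_* options from Django settings and discover tasks.py
    # modules in each installed app.
    app.config_from_object('django.conf:settings')
    app.autodiscover_tasks(lambda: settings.INSTALLED_APPS)
    ```

    The tutorial also has `openrem/__init__.py` import this module so the app is loaded when Django starts.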

  5. Ed McDonagh reporter

    Moved the __future__ import to the top as it has to be first. For some reason, I also had to import local_settings as openrem.local_settings... need to ensure that this still works with a pip installed version. Refs #19.

    → <<cset 24b827753974>>

  6. Ed McDonagh reporter

    Progress bar task now works - the Celery worker must be running (celery -A openrem worker -l info), and CELERY_TASK_SERIALIZER = 'json' and CELERY_RESULT_SERIALIZER = 'json' had to be added to settings to force the use of json instead of pickle, which was being blocked by the security settings. Refs #19.

    → <<cset 8c4c8f469f4d>>
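
    The settings additions described above would look roughly like this (the `CELERY_ACCEPT_CONTENT` line is an assumption beyond what the comment lists, included because it is the companion setting that rejects pickled messages):

    ```python
    # settings.py additions -- sketch of the serializer settings
    # described above; forces Celery to use JSON rather than pickle.
    CELERY_TASK_SERIALIZER = 'json'
    CELERY_RESULT_SERIALIZER = 'json'
    CELERY_ACCEPT_CONTENT = ['json']  # assumption: refuse pickle content
    ```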

  7. Ed McDonagh reporter

    Attempt to get ct csv export running using celery. Task is imported correctly and listed by celery, but the request isn't being passed correctly. Propose refactor csv export routine. Refs #19

    → <<cset 74fb3c08418c>>

  8. Ed McDonagh reporter

    Moved the view/task launcher to the csv export file. Moved the url accordingly and tidied up the csv export url section. Added link to the celery export to the html template. refs #19.

    → <<cset 9a5900950cec>>

  9. Ed McDonagh reporter

    Adding json and csrf_exempt imports. Now in the same state as prior to moving everything to the exportcsv file. I.e. doesn't work :-) Refs #19.

    → <<cset 590807d2a9b1>>

  10. Ed McDonagh reporter

    Added path to MEDIA_ROOT in local_settings. Modified exportCT2excel to save the csv file to disk. Celery export from the web page now correctly launches the celery task and the file is created and saved. Now need to do status updates and serve the file url back to the web page. Refs #19.

    → <<cset 2c3f92d38b65>>
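
    The modified export described above writes the CSV under MEDIA_ROOT rather than streaming it in the response. A minimal sketch of that idea, with hypothetical names (the real task also reports progress via Celery):

    ```python
    import csv
    import datetime
    import os

    def write_export_csv(rows, media_root):
        """Write export rows to a dated CSV file under media_root/exports.

        Hypothetical helper: `rows` is an iterable of lists; returns the
        path of the file written so it can later be served as a URL.
        """
        export_dir = os.path.join(media_root, 'exports')
        if not os.path.exists(export_dir):
            os.makedirs(export_dir)
        filename = 'ctexport{0}.csv'.format(
            datetime.datetime.now().strftime('%Y%m%d-%H%M%S'))
        path = os.path.join(export_dir, filename)
        with open(path, 'w', newline='') as f:
            writer = csv.writer(f)
            for row in rows:
                writer.writerow(row)
        return path
    ```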

  11. Ed McDonagh reporter

    Changed exports urls to allow for some exports to be in different files within the exports folder - initially the new file ajaxviews for poll_state. Refs #19.

    → <<cset d8d40d48a348>>

  12. Ed McDonagh reporter

    Tidied up ctfiltered slightly, no functional change. Added in a new URL of ct to see if that was the issue, it wasn't. Added the do_ct_csv without the ct prefix for the adapted 1000 objects test page. Refs #19.

    → <<cset 12fa713e17a0>>

  13. Ed McDonagh reporter

    Added a new ct_csv routine so I can fiddle it whilst the original is still working, sort of. Commit prior to changing results to use django database. Refs #19

    → <<cset 791093db1357>>

  14. Ed McDonagh reporter

    Changed result backend to django in anticipation of storing resulting file locations for display in different template. Not sure this is necessary at the moment though. Refs #19. Removed django_extensions that has been hashed out since the early days of the project.

    → <<cset f7f2bb119888>>

  15. Ed McDonagh reporter

    Added in code to parse the string that is sent in the ajax request into a dictionary. This means getQueryFilters is now not required for this workflow. Next task is to change the fields in the export csv code to use the database field names used in the querystring. Refs #19.

    → <<cset d96256780489>>

  16. Ed McDonagh reporter

    Modified the names of the keys to match those in the query string, and took account of the fact the values are being sent as single value lists. CT CSV is now correctly created with the filtered range of studies. Refs #19.

    → <<cset 6202ad6abce6>>
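
    The "single value lists" issue above is what the standard querystring parser produces: every value arrives as a list even when there is only one. A sketch of the flattening described (field names are illustrative, and this uses the Python 3 stdlib rather than the project's actual code):

    ```python
    from urllib.parse import parse_qs

    def filters_from_querystring(querystring):
        """Turn a filter querystring into a plain dict of filter kwargs.

        parse_qs returns every value as a one-element list, so unwrap
        the lists and drop filters with no content.
        """
        raw = parse_qs(querystring)
        return {key: values[0] for key, values in raw.items()
                if values and values[0]}
    ```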

  17. Ed McDonagh reporter

    Added some print statements that appear in the celery shell. The original 'all study data written' status update was at the wrong indent, which is why it reached that status so quickly; it now only appears at the end. Added an index to the exam for loop to enable an 'x of y' status update and a print statement for each study. Refs #19.

    → <<cset f2083a85dab7>>
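
    The indexed loop with an 'x of y' status update can be sketched as follows. The `update_state` callback stands in for Celery's `self.update_state` on a bound task (where it would be called with `state='PROGRESS'` and a `meta` dict); everything else is hypothetical:

    ```python
    def export_with_progress(studies, update_state):
        """Iterate over studies, reporting an 'x of y' status for each.

        Sketch only: in the real task, update_state would be Celery's
        self.update_state and the loop body would write CSV rows.
        """
        total = len(studies)
        for index, study in enumerate(studies, start=1):
            # ... write this study's rows to the export file here ...
            update_state('Writing study {0} of {1}'.format(index, total))
        # Only reached after the loop -- the status the comment above
        # describes as having been at the wrong indent.
        update_state('All study data written')
    ```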

  18. Ed McDonagh reporter

    Added the initial view handler that was missing from my version of the example. The text link from the ctfiltered page now launches this view, which throws up the new exports page in the interface. A further launch of the request is then required. Refs #19.

    → <<cset 592ed088abbc>>

  19. Ed McDonagh reporter

    Correct URL is now launched from the javascript ajax request and it nominally works - task is started and there is some sort of feedback. The task reference and progress bar are repeated twice at the moment... Refs #19.

    → <<cset 878e0b63c293>>

  20. Ed McDonagh reporter

    Revert customisation to ajaxviews, not sure why it had changed. Attempting to pull some progress indicators into exports.html - without success so far. Refs #19.

    → <<cset 164b2e2eb47c>>

  21. Ed McDonagh reporter

    Removed status bar code from export html template and added some task status javascript. Didn't work until realised that the poll_state url wasn't resolving properly so was serving up the original page again instead of polling the state. Now fixed. Refs #19.

    → <<cset a4365b95aca2>>
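
    The poll_state endpoint's job is to report a task's state as JSON for the page's javascript to consume. A sketch of the payload-building part only (the real view would look the task up with `AsyncResult(task_id)` and wrap this in an `HttpResponse` with `content_type='application/json'`; the `statupdate` key comes from the status updates mentioned in these comments):

    ```python
    import json

    def task_state_payload(state, result):
        """Build the JSON body a poll_state view would return.

        Sketch only: while the task is in PROGRESS, pass task.result
        through so the status text (e.g. a 'statupdate' value) reaches
        the page; otherwise just report the state.
        """
        if state == 'PROGRESS':
            data = result
        else:
            data = {'state': state}
        return json.dumps(data)
    ```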

  22. Ed McDonagh reporter

    Minor change to formatting in exports.html. Now only passing task.result in the data from poll_state so that the statupdate value remains available when the task is finished. Refs #19.

    → <<cset 89a1a31e9b47>>

  23. Ed McDonagh reporter

    Changed filename to incorporate current date and time successfully, and attempted to feed this into the web page, unsuccessfully. Refs #19

    → <<cset 3f9d07c1be0b>>

  24. Ed McDonagh reporter

    Modified the media URL in settings. Corrected mistake and completed preliminary table of tasks in the export list, including non-functioning link to the file to download. Refs #19.

    → <<cset f7f2da2fdd0c>>

  25. Ed McDonagh reporter

    Job is now successfully started on export click from filter page. AJAX on export page currently creating a new job each time it is refreshed. Refs #19.

    → <<cset 7a1a2f127060>>

  26. Ed McDonagh reporter

    Added in a modality code and an export id to the GET parameters, which then dictate which job is run. Works, but currently struggling to then remove those queries, or all the get request parameters before moving on so that the job isn't launched each time the view is loaded. Refs #19.

    → <<cset 435c1abeb553>>

  27. Ed McDonagh reporter

    New approach - specific export link view just launches job and sets job.id into session, though we might not need this, then redirects to the generic results page. Refs #19.

    → <<cset a1f18314411b>>

  28. Ed McDonagh reporter

    Abbreviated all the long CT query filter variables to make working with them easier. Changed getting the query variables and then filtering by them into one move, with a test that the variable exists and has content before it is used. This allows for no filters, some filters and empty filters. Changed the CT link from the home page accordingly for station name and removed the unused accession_number element. Refs #19.

    → <<cset 5769d07abf38>>

  29. Ed McDonagh reporter

    Created two new query objects of current and complete and replaced the single table in the exports template with a permanent table for the completed tasks and a table for the current tasks if there are any. Refs #19.

    → <<cset 83c62104e1d3>>

  30. Ed McDonagh reporter

    Exports page now has an http-equiv refresh and no javascript. It refreshes every second whilst there is an active export in progress. Refs #19.

    → <<cset 3237ce1d5c95>>

  31. Ed McDonagh reporter

    Now gets the filters from the CTSummaryListFilter instead of explicitly listing them - more DRY :-) The trick to passing keyword arguments held as strings is the ** unpacking. Refs #19.

    → <<cset 762196824980>>
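
    The `**` unpacking mentioned above turns a dict of string field names into keyword arguments. With a Django queryset that is `queryset.filter(**filters)`; a minimal sketch of the same idea without Django (names hypothetical):

    ```python
    def apply_filters(records, filters):
        """Filter a list of dicts using field names held as strings.

        Sketch only: with Django, the whole dict would be applied in
        one move as queryset.filter(**filters) -- the ** unpacks the
        string keys into keyword arguments.
        """
        for field, value in filters.items():
            records = [r for r in records if r.get(field) == value]
        return records
    ```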

  32. Ed McDonagh reporter

    Applied the asynchronous task format to the CT xlsx export. Removed dummy tasks. Status is bouncing between the all-data and protocol sheets, but otherwise it works nicely. Mammo and fluoro csv exports remain. Refs #19.

    → <<cset 86096af34b58>>
