Medium to large data exports exceed web server timeout defaults
The task should be carried out in the background rather than as a template response to a server request; I don't know how the finished object is then passed to the user. Should probably use Celery.
Comments (66)
-
reporter -
reporter - changed component to Interface
-
reporter Added celery to the requires list. Refs
#19.→ <<cset 25b4d332e766>>
-
reporter Celery with RabbitMQ seems to be too big for the task, and RabbitMQ needs a lot of setup and configuration.
Alternatives are Huey with Redis or Celery with Redis.
Go with Celery for now, as I can't see any examples of Huey with status updates other than finished or not finished.
-
reporter Now established that Redis needs the server installed too, not just the PyPI package. Redis is also not supported on Windows, though it can be made to work.
Back to RabbitMQ, which has installation instructions for the major operating systems at least, and it looks like it might just work once installed.
-
reporter Added celery.py, edited openrem/__init__.py and openrem/settings as per the first steps tutorial and django tutorial. Haven't tested yet (or added any tasks). Refs
#19.→ <<cset 704caa538463>>
-
reporter Moved the __future__ import to the top as it has to be there. For some reason, I also had to import local_settings as openrem.local_settings... need to ensure that this still works with a pip installed version. Refs
#19.→ <<cset 24b827753974>>
-
reporter Added some remapp tasks and templates from the getting started with celery documentation and from http://iambusychangingtheworld.blogspot.co.uk/2013/07/django-celery-display-progress-bar-of.html. However, the task doesn't seem to start as the status is always pending. Refs
#19.→ <<cset 26e36ee16cc4>>
-
reporter Progress bar task now works - Celery server must be running (celery -A openrem worker -l info) and CELERY_TASK_SERIALIZER = 'json' and CELERY_RESULT_SERIALIZER = 'json' had to be added to settings to force use of json instead of pickle, which was being blocked by the security. Refs
#19.→ <<cset 8c4c8f469f4d>>
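The serializer change described above can be sketched as a fragment of openrem/settings.py; only the two settings named in the comment are from the source, the rest is context:

```python
# openrem/settings.py -- force JSON instead of pickle, which the
# security settings were blocking (sketch of the settings named above)
CELERY_TASK_SERIALIZER = 'json'
CELERY_RESULT_SERIALIZER = 'json'

# A worker must be running for tasks to execute:
#   celery -A openrem worker -l info
```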
-
reporter Attempt to get ct csv export running using celery. The task is imported correctly and listed by celery, but the request isn't being passed correctly. Propose refactoring the csv export routine. Refs
#19→ <<cset 74fb3c08418c>>
-
reporter Moved the view/task launcher to the csv export file. Moved the url accordingly and tidied up the csv export url section. Added link to the celery export to the html template. refs
#19.→ <<cset 9a5900950cec>>
-
reporter Adding json and csrf_exempt imports. Now in the same state as prior to moving everything to the exportcsv file. I.e. doesn't work :-) Refs
#19.→ <<cset 590807d2a9b1>>
-
reporter Factored out getting the query filters from the string. Now the task gets launched before crashing. Refs
#19.→ <<cset 9730eb391d7b>>
-
reporter - changed status to open
-
reporter Added path to MEDIA_ROOT in local_settings. Modified exportCT2excel to save the csv file to disk. Celery export from the web page now correctly launches the celery task, and the file is created and saved. Now need to do status updates and serve the file URL back to the web page. Refs
#19.→ <<cset 2c3f92d38b65>>
-
reporter Added current task status updates throughout CT csv export. Refs
#19.→ <<cset c1f74e54a9cf>>
-
reporter Modified the task that is called from the demonstration to the ct csv export for testing - correct task is completed. Refs
#19.→ <<cset e88d99f376eb>>
-
reporter Added just the button and the task launcher javascript from the example to launch the ct csv export with no arguments. Refs
#19.→ <<cset 705c37ea1996>>
-
reporter Modifying the url pattern to make the javascript work. Refs
#19.→ <<cset 2752241ef3fe>>
-
reporter Added in the code from the example to set the session task_id. Not yet working. Refs
#19.→ <<cset 756ff4632a50>>
-
reporter Added progress container and javascript from the example. Doesn't yet work. Refs
#19.→ <<cset 8a6c452ba173>>
-
reporter Changed exports urls to allow for some exports to be in different files within the exports folder - initially the new file ajaxviews for poll_state. Refs
#19.→ <<cset d8d40d48a348>>
-
reporter New ajaxviews file for the poll_state code. Refs
#19→ <<cset a646b5dcf511>>
-
reporter Moved do_ct_csv from exportcsv to ajaxviews to make it easier to work with. Refs
#19.→ <<cset 80a800cfa5ff>>
-
reporter Modified task to be more like the original example, hasn't made a difference. Refs
#19.→ <<cset 3c13901e83d9>>
-
reporter Tidied up ctfiltered slightly, no functional change. Added in a new URL of ct to see if that was the issue, it wasn't. Added the do_ct_csv without the ct prefix for the adapted 1000 objects test page. Refs
#19.→ <<cset 12fa713e17a0>>
-
reporter Added a new ct_csv routine so I can fiddle it whilst the original is still working, sort of. Commit prior to changing results to use django database. Refs
#19→ <<cset 791093db1357>>
-
reporter Changed result backend to django in anticipation of storing resulting file locations for display in different template. Not sure this is necessary at the moment though. Refs
#19. Removed django_extensions that has been hashed out since the early days of the project.→ <<cset f7f2bb119888>>
-
reporter Various experiments to pass the data to the backend; it is currently doing so. Refs
#19.→ <<cset 7618b7aee35b>>
-
reporter Added in code to parse the string that is sent in the ajax request into a dictionary. This means getQueryFilters is now not required for this workflow. Next task is to change the fields in the export csv code to use the database field names used in the querystring. Refs
#19.→ <<cset d96256780489>>
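The parsing step can be sketched with the standard library; the helper name is invented, and it assumes the ajax request sends an ordinary querystring:

```python
from urllib.parse import parse_qs

def filters_from_querystring(querystring):
    """Turn e.g. 'station_name=CT1&study_date=' into a plain dict (sketch).

    parse_qs returns each value as a single-item list, so unwrap it;
    blank parameters are discarded so they don't become empty filters.
    """
    parsed = parse_qs(querystring)  # empty values are dropped by default
    return {key: values[0] for key, values in parsed.items() if values}
```

For example, `filters_from_querystring('station_name=CT1&study_date=')` returns `{'station_name': 'CT1'}`.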
-
reporter Changed the meta status to statupdate to avoid potential clashes. Refs
#19.→ <<cset 4844918f5405>>
-
reporter Modified the names of the keys to match those in the query string, and took account of the fact the values are being sent as single value lists. CT CSV is now correctly created with the filtered range of studies. Refs
#19.→ <<cset 6202ad6abce6>>
-
reporter Added some print statements that appear in the celery shell. The original 'all study data written' status update was at the wrong indent, which is why it reached that status so quickly. It now only appears at the end. Added an index to the exam for loop to enable an 'x of y' status update and a print statement for each study. Refs
#19.→ <<cset f2083a85dab7>>
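The indent fix and the 'x of y' update can be sketched as below; `update_state` stands in for the Celery task's status-update call, and the function name is made up:

```python
def write_studies(studies, update_state):
    """Write each study out, emitting an 'x of y' progress update (sketch)."""
    total = len(studies)
    for index, study in enumerate(studies, start=1):
        # ... write this study's row to the csv file ...
        update_state("{0} of {1} studies exported".format(index, total))
    # Correct indent: this fires once, after the loop, not per study.
    update_state("All study data written")
```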
-
reporter Added in missing question mark. Refs
#19.→ <<cset d2986c0a38d2>>
-
reporter Added the initial view handler that was missing from my version of the example. The text link from the ctfiltered page now launches this view, which throws up the new exports page in the interface. A further launch of the request is then required. Refs
#19.→ <<cset 592ed088abbc>>
-
reporter Correct URL is now launched from the javascript ajax request and it nominally works - the task is started and there is some sort of feedback. The task reference and progress bar are currently displayed twice... Refs
#19.→ <<cset 878e0b63c293>>
-
reporter Reverted customisation to ajaxviews, not sure why it had changed. Attempting to pull some progress indicators into exports.html - without success so far. Refs
#19.→ <<cset 164b2e2eb47c>>
-
reporter Removed status bar code from export html template and added some task status javascript. Didn't work until realised that the poll_state url wasn't resolving properly so was serving up the original page again instead of polling the state. Now fixed. Refs
#19.→ <<cset a4365b95aca2>>
-
reporter Added test to prevent infinite loop of polling. Refs
#19.→ <<cset e2c5e8752b63>>
-
reporter Removed the javascript related code that now takes place in exports.html. Refs
#19→ <<cset 987a24f7d09d>>
-
reporter Minor change to formatting in exports.html. Now only passing task.result in data from poll_state so that the statupdate value remains available when task is finished. Refs
#19.→ <<cset 89a1a31e9b47>>
-
reporter Changed filename to incorporate current date and time successfully, and attempted to feed this into the web page, unsuccessfully. Refs
#19→ <<cset 3f9d07c1be0b>>
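Incorporating the current date and time into the filename can be sketched with the standard library; the exact name pattern here is an assumption, not the one used in the commit:

```python
from datetime import datetime

def export_filename(modality):
    """Return a name such as ctexport-20140101-120000.csv (pattern assumed)."""
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    return "{0}export-{1}.csv".format(modality.lower(), stamp)
```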
-
reporter Added a new table to the database to hold export information. Refs
#19.→ <<cset 07e017eacd0d>>
-
reporter A new task is created in the database each time this is run, and the task ID, status and filename are successfully stored. Refs
#19.→ <<cset bddbfefbf25e>>
-
reporter Attempt to pass the exports objects into the template and display them - not currently working. Refs
#19.→ <<cset 8b2b389e7a28>>
-
reporter Modified the media URL in settings. Corrected mistake and completed preliminary table of tasks in the export list, including non-functioning link to the file to download. Refs
#19.→ <<cset f7f2da2fdd0c>>
-
reporter Download link now has a match in urls.py which loads a download view. Works. Refs
#19.→ <<cset 41c4343ef265>>
-
reporter Added database status updates. Refs
#19.→ <<cset 703fcd621af2>>
-
reporter Added in modality, export date/time, export type description and number of records. Refs
#19.→ <<cset 6774ca7a8bde>>
-
reporter Calling export ct csv job from initial view on export click. Reverted exportCT2excel to look for filters in the request. Refs
#19.→ <<cset 2a0605949177>>
-
reporter Job is now successfully started on export click from filter page. AJAX on export page currently creating a new job each time it is refreshed. Refs
#19.→ <<cset 7a1a2f127060>>
-
reporter Added a modality code and an export id to the GET parameters, which then dictate which job is run. Works, but currently struggling to remove those queries, or all the GET request parameters, before moving on so that the job isn't launched each time the view is loaded. Refs
#19.→ <<cset 435c1abeb553>>
-
reporter New approach - specific export link view just launches job and sets job.id into session, though we might not need this, then redirects to the generic results page. Refs
#19.→ <<cset a1f18314411b>>
-
reporter Removed setting of task_id in session - not currently being used. Refs
#19.→ <<cset 62fa953de26f>>
-
reporter Abbreviated all the long CT query filter variables to make working with them easier. Changed getting the query variables and filtering by them into one step, with a test that each variable exists and has content before using it. This allows for no filters, some filters and empty filters. Changed the CT link from the home page accordingly for station name and removed the unused accession_number element. Refs
#19.→ <<cset 5769d07abf38>>
-
reporter Switched order of exports to put the most recent at the top. Refs
#19.→ <<cset 1bff21d9974f>>
-
reporter Replaced the status field with progress, prior to adding an 'in progress' and 'complete' status function. Refs
#19.→ <<cset 7cda613a27ad>>
-
reporter Created two new query objects of current and complete and replaced the single table in the exports template with a permanent table for the completed tasks and a table for the current tasks if there are any. Refs
#19.→ <<cset 83c62104e1d3>>
-
reporter Exports page now has an http-equiv refresh and no javascript. Refreshes every second whilst there is an active export in progress. Refs
#19.→ <<cset 3237ce1d5c95>>
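The javascript-free refresh can be done with a meta tag in the template head, conditioned on there being an active export; a minimal sketch, where the context variable name is assumed:

```
{% if current %}<meta http-equiv="refresh" content="1">{% endif %}
```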
-
reporter Now gets the filters from the CTSummaryListFilter instead of explicitly listing them - more DRY :-) The trick to having keyword arguments as strings is the ** thing. Refs
#19.→ <<cset 762196824980>>
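The '** thing' - building keyword arguments from strings - can be sketched against plain dicts; `apply_filters` is an invented stand-in for what `queryset.filter(**active)` does on a real Django queryset:

```python
def apply_filters(rows, params):
    """Filter a list of dicts using string-keyed params (sketch).

    Empty values are skipped first, mirroring the exists-and-has-content
    test described above; with a real queryset this is simply
    queryset.filter(**active).
    """
    active = {key: value for key, value in params.items() if value}
    return [row for row in rows
            if all(row.get(k) == v for k, v in active.items())]
```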
-
reporter Applied the asynchronous task format to the CT xlsx export. Removed dummy tasks. Status is bouncing between the all-data and protocol sheets, but otherwise works nicely. Mammo and Fluoro csv exports remain. Refs
#19.→ <<cset 86096af34b58>>
-
reporter Fixed xlsx status update messages. Refs
#19.→ <<cset 40ac637609a4>>
-
reporter Experimenting with less AJAX was successful and the main task has been completed so now tidying up. Refs
#19.→ <<cset 3b2aa58ae64f>>
-
reporter - changed status to resolved
Celery has now been implemented successfully for all existing exports. Some work remains. Fixes
#19.→ <<cset d379da015e71>>
-
reporter - changed milestone to 0.4.3
-
reporter Added the new Exports table and Size_upload table to the admin interface. Refs
#19 and #21.→ <<cset 8ee2510a7b7d>>
-
As a workaround for my gunicorn/nginx setup, I have increased the gunicorn timeout to 15 minutes by editing
/var/conquest/openrem/bin/gunicorn_start
and I have used the same value in the nginx definition
/etc/nginx/sites-available/openremTCP
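For reference, gunicorn's worker timeout is set with its --timeout option (in seconds); a sketch of the relevant line in gunicorn_start, with the other arguments elided and the WSGI module name assumed:

```shell
# 15 minutes = 900 seconds; other gunicorn arguments omitted
exec gunicorn openrem.wsgi:application --timeout 900
```

On the nginx side the matching directive is `proxy_read_timeout 900;` in the relevant location block.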