Optimise queries, particularly exports and detail display of fluoro.
Start by adding https://django-debug-toolbar.readthedocs.io for developer use.
Comments (57)
-
reporter -
reporter Configured to work with the debug toolbar. However, I'm not sure whether the change to STATICFILES_DIRS will have consequences for existing installations. Refs
#484→ <<cset 1ea5a7aee61e>>
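For anyone following along, enabling django-debug-toolbar is mostly a settings change. A minimal sketch of the standard pattern from the toolbar's documentation, assuming a development-only settings file (this is not necessarily the exact diff in the commit above):

```python
# settings.py -- development only. Sketch of the standard
# django-debug-toolbar enablement, not the exact OpenREM diff.
DEBUG = True

INSTALLED_APPS += ["debug_toolbar"]

MIDDLEWARE += ["debug_toolbar.middleware.DebugToolbarMiddleware"]

# The toolbar only renders for requests from these client addresses.
INTERNAL_IPS = ["127.0.0.1"]
```

The SQL panel is the relevant one for this issue: it lists every query a page makes, with timings and tracebacks.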
-
reporter Fleshed out the testing docs a bit (not tested!) and added a section on enabling the debug toolbar. Refs
#484→ <<cset c8e5f258868b>>
-
reporter Adding django-debug-toolbar to requirements refs
#484→ <<cset 013a9f927566>>
-
reporter Factored out get_series_data. Now writing all_data and series data based on xlsx, but better :-) Refs
#488,#484→ <<cset 534e7ea83266>>
-
reporter Removed commented out code. Modified xlsx as per dx_export. Possibly half the number of queries. Refs
#488,#484→ <<cset 5f869a84218c>>
-
reporter Unused imports and removing string_to_float calls Refs
#488,#484→ <<cset 8d258149b93c>>
-
reporter Set csv to use the same export code as xlsx for DX. Factored out series header generation. Refs
#488,#484→ <<cset f134057482fb>>
-
reporter Writing series data during group sorting. Refs
#488,#484→ <<cset 8ec782435afe>>
-
reporter Removed functions no longer used by xlsx. Will break csv export for now. Refs
#488,#484→ <<cset 5f010a331427>>
-
reporter Factored out get_xray_filterinfo from dx and rf exports. Refs
#488,#484→ <<cset 8e7e6b0ff689>>
-
reporter Brought csv up to date with xlsx for RF. Refs
#488,#484→ <<cset 64df88feb4e3>>
-
reporter Added refs
#488, #484 to changes. → <<cset 66b0cb20c64d>>
-
reporter Merged in Issue488MostExportsDontIncludeStudyTime (pull request #125)
→ <<cset e1dc2ae8c2a8>>
-
reporter Using with in template to reduce queries to the database. Refs
#484→ <<cset 374866417823>>
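For anyone unfamiliar with the tag: `{% with %}` stores the result of an expensive lookup in a variable, so the ORM query behind it runs once instead of once per use in the template. A generic sketch (the model and field names here are invented for illustration, not the actual OpenREM template variables):

```django
{% with dose=study.dose_set.first %}
  <td>{{ dose.total_dap }}</td>
  <td>{{ dose.total_duration }}</td>
{% endwith %}
```

Without the `with`, each `{{ study.dose_set.first.… }}` lookup would hit the database again.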
-
reporter Making good progress with with. Much more to do. Refs
#484→ <<cset 21dbe3157ce6>>
-
reporter Completely unscientific single request for the RF detail view of a study with 316 events, in debug mode (so slower, but I can see the number of database queries and CPU time):
branch      | No. queries | time taken (ms) | CPU time (ms)
develop     | 7,597       | 15,231          | 414,500
issue484... | 4,109       | 2,821           | 53,436
-
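Editor's aside: the query-count drop above is mostly from eliminating per-row lookups. A self-contained toy demonstration of the difference between the N+1 pattern and a single JOIN, using plain sqlite3 rather than Django (schema and numbers are invented for illustration):

```python
import sqlite3

# Toy schema: studies with child events, not the OpenREM tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE study (id INTEGER PRIMARY KEY);
    CREATE TABLE event (id INTEGER PRIMARY KEY, study_id INTEGER, dap REAL);
""")
conn.executemany("INSERT INTO study (id) VALUES (?)", [(i,) for i in range(10)])
conn.executemany(
    "INSERT INTO event (study_id, dap) VALUES (?, ?)",
    [(i, 1.0) for i in range(10) for _ in range(5)],
)

queries = 0

def run(sql, *args):
    """Execute one statement and count the round trip."""
    global queries
    queries += 1
    return conn.execute(sql, args).fetchall()

# N+1 pattern: one query for the studies, then one per study for its events.
queries = 0
for (study_id,) in run("SELECT id FROM study"):
    run("SELECT dap FROM event WHERE study_id = ?", study_id)
n_plus_one = queries  # 1 + 10 = 11 queries

# JOIN: everything in one round trip.
queries = 0
rows = run("SELECT s.id, e.dap FROM study s JOIN event e ON e.study_id = s.id")
joined = queries  # 1 query

print(n_plus_one, joined, len(rows))  # -> 11 1 50
```

Per-query overhead (planning, network, Python round trip) is what makes the N+1 version slow at scale, which is what the template `with` and `select_related` changes are attacking.
-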
reporter Replaced some template lookups with same lookup done in view. Refs
#484→ <<cset aa6fb7bab19b>>
-
reporter More withs Refs
#484→ <<cset af77a84fe2e1>>
-
reporter Added in some select_related. Refs
#484
316 event RF now:
branch                 | No. queries | time taken (ms) | CPU time (ms)
issue484optimisefluoro | 2,527       | 2,151           | 38,116
→ <<cset a128133af8ac>>
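select_related follows a foreign key in the same SQL statement via a JOIN, so the related object arrives populated instead of being fetched lazily per row. A sketch with hypothetical models (an Event with a ForeignKey to Study), not the actual OpenREM queryset:

```python
# Hypothetical models for illustration only.
# Without select_related: 1 query for events + 1 lazy query per e.study.
# With it: a single JOINed query.
events = Event.objects.select_related("study")
for e in events:
    print(e.study.study_date)  # served from the row already fetched
```

For one-to-many relations the batched equivalent is prefetch_related.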
-
reporter Some with optimisations on the rf filter page. Refs
#484 → <<cset b76852567549>>
-
reporter Comparison for the listing of a 4 study RF filtered list:
branch                 | No. queries | time taken (ms) | CPU time (ms)
develop                | 76          | 52              | 836
issue484optimisefluoro | 32          | 25              | 364
-
reporter Attempting the same with CT detail page - 9 series CT down from 121 database hits to 41 :-) Refs
#484→ <<cset f4b6e32b4d75>>
-
reporter CT filtered page with seven studies: 71 database hits down to 36. Refs
#484→ <<cset 48730656b814>>
-
reporter MG filtered page with four studies: 55 database hits down to 39. Refs
#484→ <<cset 0adc63f387be>>
-
These improvements look great - they should result in a significant increase in the system responsiveness.
-
reporter Exciting isn't it! Well, I'm finding it exciting!
I think they could be optimised further if I understood django better, but I'm going for any low hanging fruit I can find and it is making a significant difference I reckon. I tried to improve the table that summarises the dose and times for the fluoro detail page for example, but I got stuck so reverted.
-
reporter MG detail page with two exposures: 48 database hits down to 26. Refs
#484→ <<cset 18454e7e2a4c>>
-
reporter DX filtered page with five studies: 77 database hits down to 34. Refs
#484→ <<cset af5475262675>>
-
reporter DX detail page with five events: 120 database hits down to 48. Refs
#484→ <<cset 125129bdc1e8>>
-
reporter Merged in issue484optimisefluoro (pull request #136). Refs
#484
Stats from two identical servers with the same PostgreSQL database, both running the built-in webserver. Times are wait times in seconds, recorded by the browser, and are the mean of three refreshes.
Page loaded                                         | v0.7.4 | issue484 branch
CT Filtered, 150,962 studies, first page of 6,039   | 6.51   | 3.39
CT Detail, study with four events                   | 1.26   | 1.16
Fluoro Filtered, 1,335 studies, first page of 67    | 8.31   | 4.40
Fluoro Detail, study with 146 events                | 55.3   | 30.6
Mammo Filtered, 66,051 studies, first page of 1,322 | 17.4   | 12.2
Mammo Detail, study with four events                | 2.02   | 1.28
DX Filtered, 32,462 studies, first page of 650      | 11.24  | 4.37
DX Detail, study with 14 events                     | 8.25   | 4.17
DX Detail, study with one event                     | 1.44   | 1.05
→ <<cset 3885a37f4b61>>
-
reporter Improving, down to 40 from 94. refs
#484→ <<cset 194d4247c3b9>>
-
reporter I'm confused. I've exhausted all query optimisations I can think of, and now have 58 queries for my current dev home page. But the last commit was 40... Also, accidentally did a code reformat, which is unfortunate as it makes comparisons hard. refs
#484→ <<cset 61bfd336f98f>>
-
reporter Back down to 39 (from 94 to start with, 58 at last commit). refs
#484→ <<cset 5b702ddff3de>>
-
reporter Update on progress (or lack of) for home page rendition. Unfortunately, despite reducing the number of queries from 94 down to 39 on my development environment laptop, when testing with a real (large) database the version with fewer queries takes longer :-(
Using the database used for the tests of the filtered and detail pages above, I get the following approximate stats. The first three are from the Django Debug Toolbar, the last from Chrome's tools.
Branch     | Number of queries | SQL time (ms) | CPU time (ms) | Total? (ms) | Chrome wait (s)
develop    | 151               | ~ 6400        | ~ 2000        | ~ 8000      | ~ 8
484 branch | 58                | ~ 5500        | ~ 7000        | ~ 11,500    | ~ 11.7
I have been experimenting with adding indexes, and whilst I can shave a second or two off the 484 branch time, I can't get down to the develop times (which are the same as the 0.7.4 times). It seems that in this case doing more queries is cheaper than doing fewer!
SQL explained
Executed SQL:
SELECT ••• FROM "remapp_generalstudymoduleattr"
INNER JOIN "remapp_generalequipmentmoduleattr"
    ON ("remapp_generalstudymoduleattr"."id" = "remapp_generalequipmentmoduleattr"."general_study_module_attributes_id")
INNER JOIN "remapp_uniqueequipmentnames"
    ON ("remapp_generalequipmentmoduleattr"."unique_equipment_name_id" = "remapp_uniqueequipmentnames"."id")
WHERE ("remapp_generalstudymoduleattr"."modality_type" = 'MG'
    AND "remapp_uniqueequipmentnames"."display_name" = 'RMH Sutton RDAC 3')
ORDER BY "remapp_generalstudymoduleattr"."study_date" DESC,
    "remapp_generalstudymoduleattr"."study_time" DESC
Time: 165.947914124 ms
Database: default
QUERY PLAN:
Sort (cost=14567.08..14572.80 rows=2287 width=360) (actual time=160.022..160.716 rows=11805 loops=1)
  Sort Key: remapp_generalstudymoduleattr.study_date DESC, remapp_generalstudymoduleattr.study_time DESC
  Sort Method: quicksort  Memory: 3468kB
  -> Nested Loop (cost=25.12..14439.47 rows=2287 width=360) (actual time=0.354..145.173 rows=11805 loops=1)
     -> Hash Join (cost=24.70..9438.28 rows=8780 width=4) (actual time=0.327..100.635 rows=11805 loops=1)
        Hash Cond: (remapp_generalequipmentmoduleattr.unique_equipment_name_id = remapp_uniqueequipmentnames.id)
        -> Seq Scan on remapp_generalequipmentmoduleattr (cost=0.00..8384.11 rows=251111 width=8) (actual time=0.027..39.684 rows=251111 loops=1)
        -> Hash (cost=24.57..24.57 rows=10 width=4) (actual time=0.265..0.265 rows=10 loops=1)
           Buckets: 1024  Batches: 1  Memory Usage: 9kB
           -> Seq Scan on remapp_uniqueequipmentnames (cost=0.00..24.57 rows=10 width=4) (actual time=0.034..0.256 rows=10 loops=1)
              Filter: (display_name = 'RMH Sutton RDAC 3'::text)
              Rows Removed by Filter: 276
     -> Index Scan using remapp_general_study_module_attributes_pkey on remapp_generalstudymoduleattr (cost=0.42..0.56 rows=1 width=360) (actual time=0.003..0.003 rows=1 loops=11805)
        Index Cond: (id = remapp_generalequipmentmoduleattr.general_study_module_attributes_id)
        Filter: ((modality_type)::text = 'MG'::text)
Planning time: 2.037 ms
Execution time: 161.459 ms
There are obviously some database optimisations to be made, and whilst my knowledge of the database queries is improving, so far I haven't been able to get past the fact that we need to do the hash join between the GeneralEquipmentModuleAttr table and the UniqueEquipmentNames table, and then do a sequential scan on the result. With 11,805 rows and 251,111 rows in this example for the join and the scan respectively, that takes 100 ms and 40 ms to plan and execute, and that is for just one query of the 58, many of which are very similar to this one.
If anyone understands this stuff better than I do, please shout!
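For the record, the composite-index experiments mentioned above can be declared on the model in Django 1.11 and later; a hypothetical sketch matching the WHERE and ORDER BY columns in the plan above (and, as noted, indexes did not close the gap here):

```python
class Meta:
    indexes = [
        # Hypothetical: cover the modality filter plus the date/time sort.
        models.Index(fields=["modality_type", "study_date", "study_time"]),
    ]
```

Older Django versions would use db_index=True on the individual fields instead.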
-
reporter Starting again from the faster dev version. Reduced queries checking group existence. 147 down to 142. Refs
#484→ <<cset a5111cf9aae4>>
-
reporter Removed check for non-admin groups. 142 down to 131. Refs
#484→ <<cset 574d238619e6>>
-
reporter Copying the other 484 branch for the admin questions. Refs
#484→ <<cset b40c97338b8e>>
-
reporter Made start on reducing queries to the database on export. Refs
#484→ <<cset 71995afd6020>>
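The usual levers for export loops are prefetch_related, which batches each child-table fetch into a single query, and values_list, which skips model instantiation altogether. A sketch with hypothetical names, not the actual export code:

```python
# Hypothetical sketch: two queries total instead of one per study.
studies = Study.objects.filter(modality_type="CT").prefetch_related("event_set")
for study in studies:
    for event in study.event_set.all():  # served from the prefetch cache
        write_row(study, event)          # hypothetical row-writing helper
```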
-
reporter Added pid group information to latest completed template. Changed hiding of pid exports to adding message about privileges in place of download link. PyCharm code reformat. Refs
#484→ <<cset ba85cc87790d>>
-
reporter Reorganised the ct source parameters section Refs
#484→ <<cset b79d477e31aa>>
-
reporter Quick export test with latest code.
Branch                  | Number of CT studies               | Export time
0.7.4                   | 41, variety of Philips and Siemens | 4 minutes and 10 seconds
develop                 | 41, same studies as above          | 1 minute and 39 seconds
issue484optimiseexports | 41, same studies as above          | 49.5 seconds
Wonderful :-)
-
reporter Reduced queries for DX exports Refs
#484→ <<cset 04b59f8d0bb9>>
-
reporter Again for DX.
Branch                  | Number of DX studies | Export time
0.7.4                   | 59, various          | 4 minutes and 14 seconds
develop                 | 59, as above         | 2 minutes and 3 seconds
issue484optimiseexports | 59, as above         | 1 minute and 10 seconds
-
reporter Reduced queries for MG exports Refs
#484→ <<cset 86bd9ff7525a>>
-
reporter Reduced queries for RF exports. Changed order of similar event filtering. Refs
#484→ <<cset fe85fb085cfe>>
-
reporter Passing filter details to series from group sorting to reduce DB queries. Refs
#484→ <<cset 1f01dd81864c>>
-
reporter Adding information regarding progress within a study to report to the interface. Refs
#484→ <<cset a50ebcb0bb9b>>
-
reporter Abbreviating the message slightly Refs
#484→ <<cset 18fb9cbaf115>>
-
reporter Again for RF:
Branch                  | Format | Number of studies | Export time
0.7.4                   | XLSX   | 7, various        | 11 minutes and 47 seconds
develop                 | XLSX   | 7, as above       | 9 minutes and 28 seconds
issue484optimiseexports | XLSX   | 7, as above       | 4 minutes and 12 seconds
0.7.4                   | CSV    | 56, various       | 48.3 seconds
develop                 | CSV    | 56, as above      | 42.2 seconds
issue484optimiseexports | CSV    | 56, as above      | 29.9 seconds
-
reporter And finally some Mammo stats:
Branch                  | Format | Number of studies | Export time
0.7.4                   | CSV    | 94                | 10 minutes and 6 seconds
develop                 | CSV    | 94, as above      | 5 minutes and 38 seconds
issue484optimiseexports | CSV    | 94, as above      | 2 minutes and 48 seconds
issue484optimiseexports | XLSX   | 94, as above      | 2 minutes and 57 seconds
-
reporter Removed some unused imports. Updated changes and added to release notes. Refs
#484→ <<cset 1837fc1662e4>>
-
reporter Removing defunct private function. Refs
#484→ <<cset 1880ea2bee59>>
-
reporter - changed status to resolved
-
reporter Issue
#555 was marked as a duplicate of this issue.