Home page is slow to populate

Issue #984 resolved
David Platten created an issue

The code that populates each modality table on the home page has a for loop which runs a query for each display name. If we had the study date and time available in a single datetime field then this loop would be unnecessary, and the loop could be replaced with a single query using annotations.

Comments (20)

  1. Ed McDonagh

    I’ve long thought that we probably aren’t doing the right thing regarding time and date fields.

  2. Ed McDonagh

    And I doubt we deal with time zone information properly, though I don’t know how many modalities use the time zone functionality in DICOM.

  3. David Platten reporter

    In models.py we can add this new class which makes a calculated field called study_date_time available in QuerySets from GeneralStudyModuleAttr:

    class GeneralStudyModuleAttrManager(models.Manager):
        def get_queryset(self):
            qs = super(GeneralStudyModuleAttrManager, self).get_queryset().annotate(
                study_date_time=models.ExpressionWrapper(
                    models.F("study_date") + models.F("study_time"),
                    output_field=models.DateTimeField()
                )
            )
            return qs
    

    We also need to add the following line to the end of the GeneralStudyModuleAttr definition in models.py to make the above manager get used:

    objects = GeneralStudyModuleAttrManager()
    

    You can then use study_date_time as a field in GeneralStudyModuleAttr QuerySets, so the for loop in update_latest_studies within views.py can be replaced with this single query:

    test_data_new = studies.values("generalequipmentmoduleattr__unique_equipment_name__display_name").annotate(
        num_studies=Count("pk"),
        latest_study_date_time=Max("study_date_time"),
        timedelta=ExpressionWrapper(now - F("latest_entry_date_time"), output_field=DurationField()),
    ).order_by("-latest_entry_date_time")
    

    test_data_new can be passed to the template, where num_studies, latest_study_date_time and timedelta can all be accessed. What is missing with the method above is the pk of the display name, which is present in the current code, and I assume is needed when displaying study data for the two time deltas, which is added using Ajax. The solution to that is probably to calculate the studies within the time deltas in the above code, if needed, and dispense with the Ajax call.

  4. David Platten reporter

    I’ll check on my slow laptop.

    Just worked out that we don’t need to use the pk of the display name for HTML id: can use the display name itself with the whitespace stripped.

  5. David Platten reporter

    Trying new method of populating home page that avoids looping through each display name. Note: I've removed a chunk of the template file, and don't understand what the bit that I have removed does. Refs issue #984

    → <<cset 8e87d4be75e0>>

  6. David Platten reporter

    Some timings on my laptop (almost 2,000,000 studies in total). Having done this I have no concern about using the new datetime field in GeneralStudyModuleAttr:

    Develop (s) This branch (s)
    Home page (no time deltas) 62 3 (!)
    Home page (inc time deltas) 119 49 (but broken…)

    Looks like I have somehow broken the Ajax updating of studies within the time deltas.

    I will aim to calculate the studies within the time deltas without the Ajax calls.

    Leaving this now for the time being.

  7. David Platten reporter

    Some timings from my laptop (almost 2,000,000 studies) after my changes to remove Ajax calls for workload since delta a and delta b:

    Develop (s) This branch (s)
    Home page (no time deltas) 62 3
    Home page (inc time deltas) 119 3 (no longer broken)

    That’s very satisfying.

  8. David Platten reporter

    Fixed table sorting on home page. I don't know how long this has been broken for, but it doesn't work on demo.openrem.org. A side-effect of this fix is that I no longer need the timedelta annotation. However, I've not deleted it, just commented it out, because it took me a while to work out how to do it, and it may be useful in the future. Refs issue #984

    → <<cset 890f7a11ac54>>

  9. David Platten reporter

    For date_a and date_b comparing with today.date() rather than a datetime so that the results exactly match the develop branch. Refs issue #984

    → <<cset b96aa4f186c3>>

  10. David Platten reporter

    Removed unnecessary Ajax update of total number of studies and number of studies in each modality - this data was already being populated when the home page is first loaded. Also replaced multiple calls to .count() with a single aggregate query. I am definitely done with updating the home page code now. Refs issue #984

    → <<cset 8ee483afe7c1>>

  11. David Platten reporter

    What I’ve learned from doing this is that querying the database in a loop is a bad idea which results in many more hits to the database than necessary. I've also learned that using annotate and aggregate on Django QuerySets are good for reducing the number of database queries required.

    For a visit to the homepage on my laptop (1.9 million studies) the develop branch makes 722 database queries in a total query time of 240.8 seconds. This #984 branch code makes 66 database queries in a total time of 6.6 seconds.

  12. Ed McDonagh

    Timings comparing fresh server installs of the b1 release (without this branch) against b2 release (includes this branch). Both servers have a migrated database of 650,000 studies. Three Shift-Ctrl-R refreshes, timed using the Chrome dev tools Network tab using the “Finish” value at the bottom.

    1.0.0b1 1.0.0b2
    Home page, deltas on, logged in 23.85 s 23.29 s 23.41 s 2.11 s 2.17 s 2.04 s

  13. Log in to comment