Separate study and series level queries?

Issue #472 resolved
Tim de Wit created an issue

Would it make sense to separate study and series level queries (in qrscu.py) to achieve better QR-performance? Suggestion:

study-level QR
study-level filtering on studydesc, stationname, modality (e.g. ignore RF/XA without SR)
series-level QR
series-level filtering on stationname, modality (e.g. delete all series except SR for FL)

Comments (139)

  1. Ed McDonagh

    Modality isn't a study level attribute. PACS systems can choose to populate and return 'ModalitiesInStudy', which is an optional study level tag.

    So as it stands, if modalities in study is not returned in the study level query, it is generated by the code at the series level query.

    I guess there might be scope to not move onto the series level query if we can satisfy ourselves at the study level that we are not interested in the study?

  2. Tim de Wit reporter

    Do you know a good reference, listing the tags at series/study level, specifying whether they are optional or not?

    Perhaps make this configurable? A local PACS system either does or does not support ModalitiesInStudy, but won't change it's behavior for each query I guess?

  3. Ed McDonagh

    https://www.medicalconnections.co.uk/kb/Query_Parameters is very useful in this regard.

    One of the other tags I have just noticed that is also optional but would be very useful if it is returned would be SOP Classes in Study - that would help massively with distinguishing general SR from Radiation Dose SR. Although GE's insistence on using Enhanced SR instead of Radiation Dose SR would still need to be managed.

    As for making it configurable, I'd be nervous of that as whilst it is true you wouldn't expect it to change, it isn't something that most users would know about. Potentially you could have it as an advanced option, if there were a real advantage to setting it one way or another.

  4. Tim de Wit reporter

    What about the above proposal, with the addition that studies with empty ModalitiesInStudy won't be filtered out? Same with possible filters on SOPClassUID's, etc. If present in response, perform filtering, if not then skip the filter. Study-level filtering should then be moved from the qrscu() function to _query_study(), which is basically what you said in your first response.

  5. Tim de Wit reporter

    Can i branch from issue468_exclude_stationnames_v2 as there's some overlap, or do you prefer a clean branch from develop?

  6. Ed McDonagh

    Yes, that would be fine as there is overlap, as you said. I prefer to get PRs into new branches so I can play with them before merging into develop. Annoyingly as the recipient of the PR I can't change the target branch!

    Although at the end of the day I can pull from your fork to have a look at the code so it's not too important.

    What I do like to avoid though is lots of changes in the same PR, particularly if there are functional changes (like you are making) wrapped up with non-function changing edits to improve layout or format - that makes it really difficult to work out what is going on!

  7. Tim de Wit reporter

    Before writing code, I think its best to agree on the approach. Please have a look at the following pseudo-code (first quick draft, needs improvement and more detail):

    loop over requested modalities
    for each modality perform:
    
       C-FIND STUDY
         StudyInstanceUID
         StudyDescription
         StationName       (optional)
         ModalitiesInStudy (optional, even if returned possibly not filterable)
         SOPClassesInStudy (optional)
    
       If SOPClassesInStudy Null in any of the responses set SOPClassesInStudy_whitelist=False
       If modality not in ModalitiesInStudy set modality_matching=False (not filterable). Exit loop.
    
    Checking for duplicate studies (vs DB)
    
    Study_filter:
      whitelist for StudyDesc (only allow specific terms; delete all others)
      whitelist for SOPClassesInStudy (apply only when SOPClassesInStudy_whitelist is True: only allow specific uids; delete all others)
      blacklist for StudyDesc (disallow specific terms)
      blacklist for StationName (apply only when study.station_name is not None: disallow specific terms, otherwise skip filter)
      blacklist for SOPClassesInStudy (apply only when study.sop_classes_in_study is not None: disallow specific uids, otherwise skip filter)
    
    C-FIND SERIES for remaining studies
      SeriesInstanceUID
      SeriesDescription
      StationName                    (optional)
      Modality
      NumberOfSeriesRelatedInstances (optional)
    
      if modality_matching=False build set_modalities_in_study from series_rsp
    
    Series filter:
      blacklist for StationName (apply only when series.station_name is not None: disallow specific terms, otherwise skip filter)
      whitelist for Modality
    
    C-FIND INSTANCE for selected series
    Delete these series from series_rsp
    Delete unwanted instances from instances_rsp
    
    use movescu at series level for remaining series
    use movescu at instance level for remaining instances
    
  8. Ed McDonagh

    Great start. I think I'll suspend my modifications of qrscu.py as this looks like a significant rewrite of the logic and what I have done might be a waste of effort! You might like to base any changes on issue473testqueries branch though rather than issue468_exclude_stationnames_v2.

    Commenting on the draft above:

    1. If ModalitiesInStudy is not returned populated, then splitting off the series query till later means you can't establish if the responses include modalities other than the one you asked for, indicating that there is no point looping over to the next modality because you will already have the data. However, if ModatiesInStudy is not populated, then we can presumably assume that no modality filtering has taken place, and therefore exit the query loop. We can then continue, and whitelist on Modality at a series level. Getting extra series information shouldn't be a big issue in most circumstances. Or we could blindly carry on asking for each modality on the basis that most PACS will probably not get to this point, and if they do, an extra query isn't too heavy. We'd just need to add a check that we only had one copy of each study level response, probably just before doing the duplicate study check vs the db.
    2. I think we should be able to whitelist against StationName as well as blacklist. I can see usecases for that.
    3. I wonder if you might get SOPClassUID at a series level? It might only be provided at an image level, but worth an experiment as it would be very useful for situations where studies have Enhanced SR and or Radiation Dose SR series (typically containing the radiologist's report). Otherwise we can get these at Instance level.
    4. Again, I'd like to be able to whitelist StationName at series level.
    5. I like that you have an instance level query - this would only be necessary in certain circumstances, but could be very useful.

    i hope that might be useful feedback? If you go ahead and write this code, please try and think about how you might test it, and preferably write the tests too!

    Have you done any testing? If not, you can simply run python manage.py test remapp and Django will run through all the tests in openrem/remapp/tests/ and tell you if any fail.

    Thanks for your contributions Tim!

  9. Tim de Wit reporter
    1. "If modality not in ModalitiesInStudy" already deals with ModalitiesInStudy not being populated I guess? So the loop is indeed exited. But why skip the study_filter in that case? This might stil save on a bunch of series-level queries, since StudyDesc, StationName and SOPClassesInStudy could stil be present.
      Also, doesn't the QuerySet already contain unique entries (check perhaps not needed)?
    2. agreed
    3. as unlikely as being supported on study level but we could certainly include it
    4. agreed

    I don't have any experience with unit testing but I'll have a look at it; sounds useful.

  10. Tim de Wit reporter

    (3) According to section C.6, part 4 of the dicom standard, SOPClassUID is only supported at Instance level apparently (which makes sense). SOPClassesinSeries doesn't seem to exist, so I wonder if it might be better to skip this one at series level?

    Some additional logic could be:

    If study contains SR:
       perform instance level query to check whether they are dose reports
       If not:
           if modality = dx/mg:  remove the SR's from the study and keep the others
           if modality = ct:         only keep seriesdesc="dose info" or number_of_series_related_instances<=5
           if modality = fl:          delete study
       If so:
           remove all non-SR series
    

    The most logical place to put this, would be after the series_filter, I guess.

  11. Ed McDonagh
    1. Oops, missed that. I didn't mean to suggest skipping the study_filter.
    2. Lovely
    3. I agree that there is no field SOPClassesInSeries - I checked when I was writing the comment above earlier. Just though it might worth experimenting with the PACS available to us to make sure, as I would expect to only find one SOP Class UID per series? Maybe not? Thinking about it, if it isn't returned at series level we'll have to do instance level queries to establish which SOP Class UID belongs to which objects anyway!
    4. Hoorah.

    Unit testing is awesome once you get into it! Problem is having to write them all in the first place 😄

  12. Ed McDonagh

    I forgot to comment on the additional logic. Generally looks good, some suggested modifications:

    If study contains SR:
       perform instance level query to check whether they are dose reports
       If not:
           if modality = dx/mg:  remove the SR's from the study and keep the others
           if modality = ct:         only keep seriesdesc="dose info" or (number_of_series_related_instances<=5 and no series descriptions returned), keep ESR if GE and no RDSR
           if modality = fl:          delete study
       If so:
           remove all non-RDSR
    
  13. Tim de Wit reporter

    New version with suggested modifications and also:
    - removed SOPClassesInStudy_whitelist flag, since checking for each study is sufficient
    - removed instance level movescu

    loop over requested modalities
    for each modality perform:
    
       C-FIND STUDY
         StudyInstanceUID
         StudyDescription
         StationName       (optional)
         ModalitiesInStudy (optional, even if returned possibly not filterable)
         SOPClassesInStudy (optional)
    
       If modality not in ModalitiesInStudy set modality_matching=False (not filterable). Exit loop.
    
    Checking for duplicate studies (vs DB)
    
    Study_filter:
      whitelist for StudyDesc (only allow specific terms; delete all others)
      whitelist for StationName (apply only when study.station_name is not None: only allow specific terms, otherwise skip filter)
      whitelist for SOPClassesInStudy (apply only when study.SOPClassesInStudy is not None: only allow specific uids; delete all others)
      blacklist for StudyDesc (disallow specific terms)
      blacklist for StationName (apply only when study.station_name is not None: disallow specific terms, otherwise skip filter)
      blacklist for SOPClassesInStudy (apply only when study.sop_classes_in_study is not None: disallow specific uids, otherwise skip filter)
    
    C-FIND SERIES for remaining studies
      SeriesInstanceUID
      SeriesDescription
      StationName                    (optional)
      Modality
      NumberOfSeriesRelatedInstances (optional)
    
      if modality_matching=False build set_modalities_in_study from series_rsp
    
    Series filter:
      whitelist for Modality
      whitelist for StationName (apply only when series.station_name is not None: only allow specific terms, otherwise skip filter)
      blacklist for StationName (apply only when series.station_name is not None: disallow specific terms, otherwise skip filter)
    
    Loop over all studies
    If study contains SR:
       perform C-FIND IMAGE to check whether they are dose reports
       If not:
           if modality = dx/mg:  remove the SR's from the study and keep the others
           if modality = ct:     only keep seriesdesc="dose info" or (number_of_series_related_instances<=5 and no series descriptions returned), keep ESR if GE and no RDSR
           if modality = fl:     delete study
       If so:
           remove all non-RDSR
    
    use movescu at series level for remaining series
    
  14. Tim de Wit reporter

    One thing I still don't understand: RDSR and ESR both are of modality=SR, but have different SOPClassUID, right? If so, why "keep ESR if GE and no RDSR"? This would already be checked for at step:
    "perform C-FIND IMAGE to check whether they are dose reports"

  15. Ed McDonagh

    I haven't reviewed the modified logic as I'm just looking at this on my phone. But the ESR thing is because GE tend to use the ESR SOPClassUID with the RDSR template. But you won't know that till you get the instance and start integrating the content.

  16. Tim de Wit reporter
    loop over requested modalities
    for each modality perform:
    
       C-FIND STUDY
         StudyInstanceUID
         StudyDescription
         StationName       (optional)
         ModalitiesInStudy (optional, even if returned possibly not filterable)
         SOPClassesInStudy (optional)
    
       If modality not in ModalitiesInStudy set modality_matching=False (not filterable). Exit loop.
    
    Checking for duplicate studies (vs DB)
    
    Study_filter:
      whitelist for StudyDesc (only allow specific terms; delete all others)
      whitelist for StationName (apply only when study.station_name is not None: only allow specific terms, otherwise skip filter)
      whitelist for SOPClassesInStudy (apply only when study.SOPClassesInStudy is not None: only allow specific uids; delete all others)
      blacklist for StudyDesc (disallow specific terms)
      blacklist for StationName (apply only when study.station_name is not None: disallow specific terms, otherwise skip filter)
      blacklist for SOPClassesInStudy (apply only when study.sop_classes_in_study is not None: disallow specific uids, otherwise skip filter)
    
    C-FIND SERIES for remaining studies
      SeriesInstanceUID
      SeriesDescription
      StationName                    (optional)
      Modality
      NumberOfSeriesRelatedInstances (optional)
    
      if modality_matching=False build set_modalities_in_study from series_rsp
    
    Series filter:
      whitelist for Modality
      whitelist for StationName (apply only when series.station_name is not None: only allow specific terms, otherwise skip filter)
      blacklist for StationName (apply only when series.station_name is not None: disallow specific terms, otherwise skip filter)
    
    Loop over all studies
    If study contains SR:
       perform C-FIND IMAGE to check whether they are dose reports (SOPClassUID=RDSR)
       If not:
           if modality = dx/mg:  remove the SR's from the study and keep the others
           if modality = ct:
              if series_descriptions not empty:
                  if 'dose info' in series_descriptions:
                       delete all other series
                  elif SOPClassUID of SR = ESR:
                       delete all other series
                  else:
                       delete study
              else:
                   remove all series with number_of_series_related_instances>5
           if modality = fl:     delete study
       If so:
           remove all non-RDSR
    
    use movescu at series level for remaining series
    
  17. Tim de Wit reporter

    As for the sr_inc option (perhaps rename to sr_only?), perhaps that part of code could be placed at the very beginning? Since it would be incompatible with modality options, and we're only interested in SR series, why not immediately perform a C-FIND at series level with Modality=SR? Modality is a required key at series level, so this would prevent unnecessary retrieves in case ModalitiesInStudy is not supported by PACS.

  18. Ed McDonagh

    I understand that StudyInstanceUID is a required key for the SCU to provide when performing a Series level C-Find.

  19. Tim de Wit reporter

    Here's the pseudocode, also including the sr_only option (not compatible with ct/mg/fl/dx).
    Suggestion: add SR to the all_mods dictionary.

    loop over requested modalities (if sr_only then only all_mods['SR']['inc'] = True)
    for each modality perform:
    
       C-FIND STUDY
         StudyInstanceUID
         StudyDescription
         StationName       (optional)
         ModalitiesInStudy (optional, even if returned possibly not filterable)
         SOPClassesInStudy (optional)
    
       If modality not in ModalitiesInStudy set modality_matching=False (not filterable). Exit loop.
    
    If sr_only and modality_matching: remove studies where ModalitiesInStudy != ['SR']
    
    Checking for duplicate studies (vs DB)
    
    Study_filter:
      whitelist for StudyDesc (only allow specific terms; delete all others)
      whitelist for StationName (apply only when study.station_name is not None: only allow specific terms, otherwise skip filter)
      whitelist for SOPClassesInStudy (apply only when study.SOPClassesInStudy is not None: only allow specific uids; delete all others)
      blacklist for StudyDesc (disallow specific terms)
      blacklist for StationName (apply only when study.station_name is not None: disallow specific terms, otherwise skip filter)
      blacklist for SOPClassesInStudy (apply only when study.sop_classes_in_study is not None: disallow specific uids, otherwise skip filter)
    
    C-FIND SERIES for remaining studies
      SeriesInstanceUID
      SeriesDescription
      StationName                    (optional)
      Modality                       (=SR if sr_only)
      NumberOfSeriesRelatedInstances (optional)
    
      if modality_matching=False build set_modalities_in_study from series_rsp
    
    Series filter:
      whitelist for Modality
      whitelist for StationName (apply only when series.station_name is not None: only allow specific terms, otherwise skip filter)
      blacklist for StationName (apply only when series.station_name is not None: disallow specific terms, otherwise skip filter)
    
    Loop over all studies
    If study contains SR:
       perform C-FIND IMAGE to check whether they are dose reports (SOPClassUID=RDSR)
       If not:
           if modality = dx/mg:  remove the SR's from the study and keep the others
           if modality = ct:
              if series_descriptions not empty:
                if 'dose info' in series_descriptions:
                   delete all other series
                elif SOPClassUID of SR = ESR:
                   delete all other series
                else
                   delete study
              else remove all series with number_of_series_related_instances>5
           if modality = fl:     delete study
           if modality = sr:     delete study
       If so:
           if not sr_only remove all non-RDSR
    
    use movescu at series level for remaining series
    
  20. Ed McDonagh

    The sr_only option should only need to consider cases where ModalitiesInStudy (returned or generated) is only SR. Otherwise the SRs will be caught by the normal methods.

    Can you remind me what ModalitiesInStudy (optional, even if returned possibly not filterable) is referring to? If it is returned (populated), why couldn't you use it?

  21. Tim de Wit reporter

    It's referring to ModalitiesInStudy being queried. When returned populated it's indeed used...

    Perhaps add an additional check before the line "Checking for duplicate studies (vs DB)"?

    loop over all studies:
    If sr_only and modality_matching and ModalitiesInStudy != ['SR'] delete study

  22. Tim de Wit reporter

    Btw I just came accross the following issue (at least at my end):

    rsp.set_modalities_in_study(ss[1].ModalitiesInStudy.split(','))

    This line will fail when ModalitiesInStudy consists of multiple modalities, since the returned object will be of type dicom.multival.MultiValue instead of str, for which split() will raise an exception.

    Could you check if this is happening for you as well?

  23. Ed McDonagh

    Interesting. You are right, I would expect that to be of type MultiValue. But the fact I have that code there suggests I must have developed it against a system that returns a comma separated string rather than a backslash separated MultiValue. I assume.

    But because there is a wide open Except, you wouldn't normally notice as it would just go and generate its own one assuming one wasn't provided. I guess that's why you aren't supposed to have catch all Except statements!

  24. Tim de Wit reporter

    The exception shouldn't be necessary I guess... alternative suggestion:

    if isinstance(ss[1].ModalitiesInStudy, str):   # if single modality, then type = string ('XA')
        rsp.set_modalities_in_study(ss[1].ModalitiesInStudy.split(','))
    else:   # if multiple modalities, type = MultiValue (['XA', 'RF'])
        rsp.set_modalities_in_study(ss[1].ModalitiesInStudy)
    

    This covers the situation as well where multiple modalities are returned as a comma separated string.

  25. Ed McDonagh

    We'd need to refer to the particular error the split generated for the except.

    And we need to work out why I've stored it as json in the database model...

  26. Tim de Wit reporter
    rsp.set_modalities_in_study(ss[1].ModalitiesInStudy.split(','))
    AttributeError: 'MultiValue' object has no attribute 'split'
    

    I guess a json charfield can easily be mapped onto a python list on retrieval from the DB? Makes usage pretty easy.

  27. Tim de Wit reporter

    Added line for deleting study for XA/RF when no SR present, based on (populated) ModalitiesInStudy
    Added line to filter out unneeded studies when modality_matching=False, after C-FIND SERIES
    Implementation is almost ready btw...

    loop over requested modalities (if sr_only then only all_mods['SR']['inc'] = True)
    for each modality perform:
    
       C-FIND STUDY
         StudyInstanceUID
         StudyDescription
         StationName       (optional)
         ModalitiesInStudy (optional, even if returned possibly not filterable)
         SOPClassesInStudy (optional)
    
       If modality not in ModalitiesInStudy set modality_matching=False (not filterable). Exit loop.
    
    if modality_matching:
        If sr_only and ModalitiesInStudy != ['SR'] : delete study
        If ('RF' or 'XA') in ModalitiesInStudy and 'SR' not in ModalitiesInStudy : delete study
    
    Checking for duplicate studies (vs DB)
    
    Study_filter:
      whitelist for StudyDesc (only allow specific terms; delete all others)
      whitelist for StationName (apply only when study.station_name is not None: only allow specific terms, otherwise skip filter)
      whitelist for SOPClassesInStudy (apply only when study.SOPClassesInStudy is not None: only allow specific uids; delete all others)
      blacklist for StudyDesc (disallow specific terms)
      blacklist for StationName (apply only when study.station_name is not None: disallow specific terms, otherwise skip filter)
      blacklist for SOPClassesInStudy (apply only when study.sop_classes_in_study is not None: disallow specific uids, otherwise skip filter)
    
    C-FIND SERIES for remaining studies
      SeriesInstanceUID
      SeriesDescription
      StationName                    (optional)
      Modality                       (=SR if sr_only)
      NumberOfSeriesRelatedInstances (optional)
    
      if modality_matching=False build modalities_in_study from series_rsp
    
    if modality_matching=False: filter out unneeded studies, based on modalities_in_study
    
    Series filter:
      whitelist for Modality
      whitelist for StationName (apply only when series.station_name is not None: only allow specific terms, otherwise skip filter)
      blacklist for StationName (apply only when series.station_name is not None: disallow specific terms, otherwise skip filter)
    
    Loop over all studies
    If study contains SR:
       perform C-FIND IMAGE to check whether they are dose reports (SOPClassUID=RDSR)
       If not:
           if modality = dx/mg:  remove the SR's from the study and keep the others
           if modality = ct:
              if series_descriptions not empty:
                if 'dose info' in series_descriptions:
                   delete all other series
                elif SOPClassUID of SR = ESR:
                   delete all other series
                else
                   delete study
              else remove all series with number_of_series_related_instances>5
           if modality = fl:     delete study
           if modality = sr:     delete study
       If so:
           if not sr_only remove all non-RDSR
    
    use movescu at series level for remaining series
    
  28. Tim de Wit reporter

    Actually, whitelist might not be the best naming... depending on what we want. Whitelist doesn't mean "only allow items from whitelist; delete the rest", so it might be best to use "include" and "exclude" lists and document it as such?
    include-list: only allow these items; delete the rest
    exclude-list: delete matching items

  29. Tim de Wit reporter

    A whitelist is making sure that items are not excluded; an email whitelist for example. It doesn't mean that all other email-addresses are discarded?

  30. Ed McDonagh

    I see what you mean. You can have a whitelist that is exclusive or not. You might have a whitelist of applications that are allowed on a system - nothing else is allowed, or you might have an email whitelist that indicates you never do bad things to those email addresses.

    I think what we have here is an exclusive whitelist. However, I am comfortable with either naming scheme, I think both are clear.

  31. Ed McDonagh

    Added test with modality matching. Also extended non-modality matching to make response more realistic. All relies on iter and pop working on lists in the same direction everytime - I don't know if this is safe. Refs #472 testing.

    → <<cset 772d316c2722>>

  32. Ed McDonagh

    At last! I think I've worked my way through the code for this. Just need to test it against real PACS systems now...!

  33. Ed McDonagh

    Distinct doesn't work on SQLite3 databases. Now works, but need to add SR image level filtering to _prune_series and enable SR only from web interface. Also errors in web interface don't work. Refs #472

    → <<cset d1f5f759d8bc>>

  34. Ed McDonagh

    Added validation to clear modality selections if inc_sr is selected. Works. Validation error if no modalities are selected and not inc_sr works but error messages are lost. Added default date from of today. Refs #472

    → <<cset ce7d3a0bbbfe>>

  35. Ed McDonagh

    Added SpecificCharacterSet to the C-Find, added decode() to the response. Might fix encoding issue, which means I need to change it everywhere else... Refs #472

    → <<cset 9c0b7cf19d02>>

  36. Ed McDonagh

    Added test with modality matching. Also extended non-modality matching to make response more realistic. All relies on iter and pop working on lists in the same direction everytime - I don't know if this is safe. Refs #472 testing.

    → <<cset 772d316c2722>>

  37. Ed McDonagh

    Distinct doesn't work on SQLite3 databases. Now works, but need to add SR image level filtering to _prune_series and enable SR only from web interface. Also errors in web interface don't work. Refs #472

    → <<cset d1f5f759d8bc>>

  38. Ed McDonagh

    Added validation to clear modality selections if inc_sr is selected. Works. Validation error if no modalities are selected and not inc_sr works but error messages are lost. Added default date from of today. Refs #472

    → <<cset ce7d3a0bbbfe>>

  39. Ed McDonagh

    Added SpecificCharacterSet to the C-Find, added decode() to the response. Might fix encoding issue, which means I need to change it everywhere else... Refs #472

    → <<cset 9c0b7cf19d02>>

  40. Ed McDonagh

    Added explanatory comment to MG, DX and FL series pruning. Removed code to delete 'other' SR types as this will have already been done. FL retains ESR even though we can't currently process them as we will need this for Siemens Arcardis when ref #433 is dealt with. Will help make ref #506 easier to write! Refs #472.

    → <<cset b6255d658431>>

  41. Log in to comment