If two objects from the same study are imported in parallel, they might create two studies

Issue #262 resolved
Ed McDonagh created an issue

Problem with Celery and multiple workers: when the first two objects from a study are imported simultaneously, each worker checks for the study before the other has created it, so two studies are created.

A third image from the same study then crashes out of the import due to multiple responses to the query against the study UID.
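
The race can be reproduced without Celery. The sketch below is an illustration only, not the project's import code: `studies` is a hypothetical stand-in for the study table, `import_object` is a hypothetical name, and a `threading.Barrier` is used to force both workers to finish the existence check before either one creates the study.

```python
import threading

studies = []                      # hypothetical stand-in for the study table
barrier = threading.Barrier(2)    # makes the unlucky interleaving deterministic


def import_object(study_uid):
    """Check-then-create, as two parallel workers would do it."""
    exists = study_uid in studies     # step 1: does the study exist yet?
    barrier.wait()                    # both workers have now completed step 1
    if not exists:
        studies.append(study_uid)     # step 2: create the study


workers = [threading.Thread(target=import_object, args=("1.2.3",)) for _ in range(2)]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()

# Neither worker saw the other's study, so the same UID was stored twice.
duplicate_count = studies.count("1.2.3")
```

Any later lookup that expects exactly one study for that UID then gets two results, which matches the failure seen on the third image.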

Comments (19)

  1. Ed McDonagh reporter

    Set the store scp task to use a particular queue, added instructions to the docs so I remember how to create the queues, and added the name of the default queue to settings. I'm not sure this is required, but the default queue is otherwise called celery. This seems to work, though some imports are still failing on the retrieve. However, that might be unrelated (see ref #262). Refs #260

    → <<cset 2766c79e73fd>>

  2. Ed McDonagh reporter

    Instead of returning 1 if the uid exists, it now returns the number of responses. This shouldn't break existing uses, but will allow us to check for duplicates. Refs #262

    → <<cset 6d1d0fae4360>>
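
A minimal sketch of the idea in the comment above, using the standard-library `sqlite3` module rather than the project's Django models. The function name `count_study_uid` and the `study` table are hypothetical. It assumes that existing callers only test the return value for truthiness, which would explain why returning a count does not break them.

```python
import sqlite3


def count_study_uid(conn, study_uid):
    """Return how many study rows carry this UID (0 if there are none)."""
    row = conn.execute(
        "SELECT COUNT(*) FROM study WHERE study_instance_uid = ?", (study_uid,)
    ).fetchone()
    return row[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE study (id INTEGER PRIMARY KEY, study_instance_uid TEXT)")
conn.executemany(
    "INSERT INTO study (study_instance_uid) VALUES (?)",
    [("1.2.3",), ("1.2.3",), ("4.5.6",)],
)

matches = count_study_uid(conn, "1.2.3")
if matches > 1:
    status = "duplicate"   # new information: more than one study has this UID
elif matches:
    status = "exists"      # an old-style truthiness check behaves as before
else:
    status = "new"
```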

  3. Ed McDonagh reporter

    Implemented the duplicate study uid check changes to mammo and tested. Didn't quite work. I have a GeneralStudyModuleAttr object with just the Study UID field completed. Refs #262

    → <<cset 6013f42680b0>>

  4. Ed McDonagh reporter

    To find duplicates:

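    from django.db.models import Count
    from remapp.models import GeneralStudyModuleAttr

    # Group rows by study UID (empty order_by() clears any default ordering) and keep UIDs seen more than once.
    GeneralStudyModuleAttr.objects.values('study_instance_uid').annotate(Count('id')).order_by().filter(id__count__gt=1)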
    
  5. Ed McDonagh reporter

    Attempt to get around the fact that study_instance_uid should have been a unique field in the database from the start. Tested, results not yet conclusive. Refs #262

    → <<cset e777c66756e1>>
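
For comparison, this is roughly what a unique field would give. The sketch uses the standard-library `sqlite3` module, not the project's schema; the `study` table and the `get_or_create_study` helper are hypothetical. With a `UNIQUE` constraint the database rejects the second insert, so the losing worker can fall back to the row the other worker created.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE study (id INTEGER PRIMARY KEY, study_instance_uid TEXT UNIQUE)"
)


def get_or_create_study(conn, study_uid):
    """Return (row id, created) and never insert the same UID twice."""
    try:
        with conn:  # commits on success, rolls back if the insert fails
            cursor = conn.execute(
                "INSERT INTO study (study_instance_uid) VALUES (?)", (study_uid,)
            )
        return cursor.lastrowid, True
    except sqlite3.IntegrityError:
        # The UID is already present: reuse the existing study.
        row = conn.execute(
            "SELECT id FROM study WHERE study_instance_uid = ?", (study_uid,)
        ).fetchone()
        return row[0], False


first = get_or_create_study(conn, "1.2.3")
second = get_or_create_study(conn, "1.2.3")
```

Both calls return the same row id, and only the first reports that it created the study.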

  6. Ed McDonagh reporter

    Switched print statements for logging to help find why some objects aren't being imported. The statements are only printed in the celery log if celery is run with -l info. Need to repeat in mg.py and the other extractors. Refs #262

    → <<cset bacd722f2298>>

  7. Ed McDonagh reporter

    Converted all the print statements to logging.info to reduce the chattiness in the celery log file when logging is set to warning. The messages could be more informative, but taking them out makes it easier to see where the extractor import errors are occurring. Refs #262

    → <<cset 193b180cb254>>
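
The effect of moving from print to logging.info can be shown with the standard `logging` module alone. In this sketch the logger name and the messages are made up, and the output is captured in a `StringIO` instead of the celery log: with the level at WARNING the info message is dropped and only the warning is written.

```python
import io
import logging

stream = io.StringIO()                        # stands in for the celery log file
logger = logging.getLogger("extractor_sketch")  # hypothetical logger name
logger.setLevel(logging.WARNING)              # the quieter setting
logger.propagate = False                      # keep the root logger out of it
logger.addHandler(logging.StreamHandler(stream))

logger.info("Study UID already present, adding event")  # suppressed at WARNING
logger.warning("Import failed")                         # still written

captured = stream.getvalue()
```

Setting the level to `logging.INFO` instead would write both messages, which is the behaviour wanted while hunting for failed imports.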

  8. Ed McDonagh reporter

    Added a lot of messages at logging.debug level to try to see where the errors are occurring. Added try/except to the 'for processing' / 'for presentation' duplicate check so that a/ we can see where the error is occurring and b/ the import can proceed. This works, but I haven't yet worked out what causes the error. Refs #262

    → <<cset a89c660e0023>>

  9. Ed McDonagh reporter

    Added a 2 second delay to the creation of additional events, to reduce the chance that the initial study has not got far enough through when the next event is added, causing the process to fail. The error with the AnatomicRegionSequence was due to duplicates in the ContextID table. Presumably this is the same error that led to the try/except sequence in the first place. Needs to be fixed by saving earlier, and by choosing the first response if there are several. Ideally, the entries in the table should be unique. Refs #262

    → <<cset d990b674e83a>>
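
A sketch of "choosing the first response if there are several", again with the standard-library `sqlite3` module and a made-up `context_id` table. The column names and values are placeholders, not the project's schema. Ordering by id and taking one row means duplicates no longer cause a failure, though it does not remove them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE context_id (id INTEGER PRIMARY KEY, code_value TEXT, code_meaning TEXT)"
)
# Two identical entries, as left behind by two parallel imports.
conn.executemany(
    "INSERT INTO context_id (code_value, code_meaning) VALUES (?, ?)",
    [("CODE-A", "example region"), ("CODE-A", "example region")],
)


def first_context_id(conn, code_value):
    """Return the lowest id matching code_value, or None if there is no match."""
    row = conn.execute(
        "SELECT id FROM context_id WHERE code_value = ? ORDER BY id LIMIT 1",
        (code_value,),
    ).fetchone()
    return row[0] if row else None


chosen = first_context_id(conn, "CODE-A")
missing = first_context_id(conn, "CODE-B")
```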
