If two objects from the same study are imported in parallel, they might create two studies
Problem with Celery and multiple workers: when the first two objects from a study are imported simultaneously, neither import can see the study being created by the other, so two studies are created.
A third image from the same study then crashes out of the import because the query against the study UID returns multiple responses.
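A minimal sketch of the check-then-create race described above, using threads in place of Celery workers and a plain list in place of the study table. The names `studies` and `import_object` are illustrative only and are not taken from the code base; the barrier is there purely to force the unlucky timing every time.

```python
import threading

studies = []                      # stands in for the study table
barrier = threading.Barrier(2)    # forces both workers to finish checking first


def import_object(study_uid):
    # Step 1: check whether the study already exists.
    exists = study_uid in studies
    # Both workers reach this point before either has created anything.
    barrier.wait()
    # Step 2: create the study if the earlier check found nothing.
    if not exists:
        studies.append(study_uid)


workers = [threading.Thread(target=import_object, args=("1.2.3",)) for _ in range(2)]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()

print(len(studies))  # 2: the same study UID has been created twice
```

Any later lookup by study UID that expects a single result now finds two, which matches the failure seen with the third image.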
Comments (19)
-
reporter Instead of returning 1 if uid exists, it now returns the number of responses. This shouldn't break existing uses, but will allow us to check for duplicates. Refs
#262→ <<cset 6d1d0fae4360>>
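As a rough illustration of why returning a count rather than 1 should not break existing callers: anything that only tests the result for truthiness behaves the same, while new code can test for more than one match. The function name `count_study_uid` and the list of UIDs are hypothetical stand-ins, not the real function or database query.

```python
def count_study_uid(stored_uids, study_uid):
    """Return how many stored studies carry study_uid (0 if there are none)."""
    return sum(1 for uid in stored_uids if uid == study_uid)


stored = ["1.2.3", "1.2.3", "4.5.6"]

# Existing style of use: a simple truthiness test still works.
already_imported = bool(count_study_uid(stored, "4.5.6"))

# New style of use: more than one response means duplicate studies.
has_duplicates = count_study_uid(stored, "1.2.3") > 1
```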
-
reporter Added logic to check very carefully that the studies aren't created more than once. Just in DX so far. Refs
#262→ <<cset 14c2e8c125d0>>
-
reporter Implemented the duplicate study UID check changes in mammo and tested. Didn't quite work: I have a GeneralStudyModuleAttr object with just the Study UID field completed. Refs
#262→ <<cset 6013f42680b0>>
-
reporter To find duplicates:
from remapp.models import GeneralStudyModuleAttr
from django.db.models import Count
GeneralStudyModuleAttr.objects.values('study_instance_uid').annotate(Count('id')).order_by().filter(id__count__gt=1)
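For anyone without a Django shell to hand, the query above is roughly a GROUP BY on the study UID that keeps only groups with more than one row. The sketch below shows the same idea against an in-memory SQLite database; the table name `study` and the sample UIDs are made up for the example and are not the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE study (id INTEGER PRIMARY KEY, study_instance_uid TEXT)")
conn.executemany(
    "INSERT INTO study (study_instance_uid) VALUES (?)",
    [("1.2.3",), ("1.2.3",), ("4.5.6",)],
)

# Group by UID and keep only the UIDs that occur more than once.
duplicates = conn.execute(
    "SELECT study_instance_uid, COUNT(id) FROM study "
    "GROUP BY study_instance_uid HAVING COUNT(id) > 1"
).fetchall()

print(duplicates)  # [('1.2.3', 2)]
```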
-
reporter Attempt to get around the fact that study_instance_uid should have been a unique field in the database from the start. Tested, results not yet conclusive. Refs
#262→ <<cset e777c66756e1>>
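For comparison, this is a sketch of what a unique study UID column would give us: the database itself rejects the second insert, and the losing worker can fall back to fetching the existing row. It uses SQLite and a hypothetical `get_or_create_study` helper rather than the real models, so treat it as an illustration of the idea only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE study (id INTEGER PRIMARY KEY, study_instance_uid TEXT UNIQUE)"
)


def get_or_create_study(conn, study_uid):
    """Return (row id, created) and rely on the UNIQUE constraint to stop duplicates."""
    try:
        cursor = conn.execute(
            "INSERT INTO study (study_instance_uid) VALUES (?)", (study_uid,)
        )
        return cursor.lastrowid, True
    except sqlite3.IntegrityError:
        # Another import got there first, so reuse its row.
        row = conn.execute(
            "SELECT id FROM study WHERE study_instance_uid = ?", (study_uid,)
        ).fetchone()
        return row[0], False


first = get_or_create_study(conn, "1.2.3")
second = get_or_create_study(conn, "1.2.3")
```

Here `first` reports a newly created row and `second` reports the same row id without creating anything.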
-
reporter Switched print statements for logging to help find why some objects aren't being imported. Statements only printed in celery log if celery is set to -l info. Need to repeat in mg.py and the other extractors. Refs
#262→ <<cset bacd722f2298>>
-
reporter Converted all the print statements to logging.info to reduce the chattiness in the celery log file if logging is set to warning. Messages could be more informative, but taking them out makes it easier to see where the extractor import errors are occurring, and so refs
#262→ <<cset 193b180cb254>>
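A small standard-library sketch of the effect described here: with the level set to warning, `logging.info` messages are dropped and only warnings and above reach the log. The logger name and the messages are invented for the example, and a `StringIO` stands in for the celery log file.

```python
import io
import logging

log_output = io.StringIO()  # stands in for the celery log file

logger = logging.getLogger("example.extractor")  # hypothetical logger name
logger.addHandler(logging.StreamHandler(log_output))
logger.propagate = False
logger.setLevel(logging.WARNING)

logger.info("Study UID already in database")   # suppressed at warning level
logger.warning("Import failed for object")      # still written to the log

captured = log_output.getvalue()
```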
-
reporter Added in a lot of messages at logging.debug level to try and see where the errors are occurring. Added try/except to the 'for processing' 'for presentation' duplicate check so that a/ we can see where the error is occurring and b/ the import can proceed. This works, but I haven't yet worked out what the error is caused by. Refs
#262→ <<cset a89c660e0023>>
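The try/except approach can be sketched as below. This is only an outline of the pattern, not the actual extractor code: `check_duplicate_safely` and `broken_check` are hypothetical names, and treating a failed check as "not a duplicate" is an assumption made so that the example import can carry on.

```python
import logging

logger = logging.getLogger(__name__)


def check_duplicate_safely(check, *args):
    """Run a duplicate check; if it raises, log where and let the import proceed."""
    try:
        return check(*args)
    except Exception:
        # exc_info=True records the traceback so the failing line can be found.
        logger.debug("Duplicate check failed", exc_info=True)
        return False


def broken_check(uid):
    # Simulates the unexplained error inside the duplicate check.
    raise ValueError("more than one result for %s" % uid)


result = check_duplicate_safely(broken_check, "1.2.3")
```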
-
reporter Added a 2 second delay to the creation of additional events, to reduce the chance that the initial study has not got far enough through before the next event is added and the process fails. The error with the AnatomicRegionSequence was due to duplicates in the ContextID table. Presumably this is the same error that led to the try/except in the first place. Needs to be fixed by saving earlier, and by choosing the first response if there are several. Ideally, the table should be unique. Refs
#262→ <<cset d990b674e83a>>
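The "choose the first response if there are several" part might look something like this. It is a plain-Python sketch over a list of dictionaries; the `context_id` key, the sample rows and the `first_match` helper are assumptions for illustration rather than the real ContextID model.

```python
def first_match(rows, context_id):
    """Return the first row with the given context_id, or None if there is none."""
    matches = [row for row in rows if row["context_id"] == context_id]
    return matches[0] if matches else None


rows = [
    {"id": 1, "context_id": 4031},
    {"id": 2, "context_id": 4031},  # duplicate entry in the table
    {"id": 3, "context_id": 4009},
]

chosen = first_match(rows, 4031)
```

This tolerates the duplicates instead of failing on them, but it does not remove them; making the table unique would still be the cleaner fix.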
-
reporter - changed status to resolved
Fixed by merge c7084d7
-
Set the store scp task to use a particular queue, added an instruction to the docs so I remember how to create the queues, and added the name of the default queue to settings. Not sure the last part is required, but otherwise the default queue is called celery. This seems to work, though some imports are still failing on the retrieve. However, that might be unrelated (see ref
#262). Refs#260→ <<cset 2766c79e73fd>>
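A sketch of what the queue settings might look like, assuming the older upper-case Celery setting names. The queue names and the task path `example.tasks.store_scp` are placeholders, not the values actually committed.

```python
# Name for the default queue; without this setting Celery calls it "celery".
CELERY_DEFAULT_QUEUE = "default"

# Send the store SCP task to its own queue (hypothetical task path and queue name).
CELERY_ROUTES = {
    "example.tasks.store_scp": {"queue": "stores"},
}
```

A worker then has to be started for that queue, for example with the worker's `-Q stores` option, or tasks routed there will sit unprocessed.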