If two objects from the same study are imported in parallel, they might create two studies

Issue #262 resolved

Ed McDonagh created an issue 2015-08-24

Problem with Celery and multiple workers: when the first two objects from a study are imported simultaneously, neither worker sees the other's study record yet, so two studies are created.

A third image from the same study then crashes out of the import due to multiple responses to the query against the study UID.

Comments (19)

Ed McDonagh reporter
Set the store scp task to use a particular queue, added instructions to the docs so I remember how to create the queues, and added the name of the default queue to settings. Not sure the last of these is required, but the default queue is otherwise called celery. This seems to work, though some imports are still failing on the retrieve. However, that might be unrelated (see ref ~~#262~~). Refs ~~#260~~

→ <<cset 2766c79e73fd>>
- 2015-08-24T15:23:18+00:00
Ed McDonagh reporter
Instead of returning 1 if uid exists, it now returns the number of responses. This shouldn't break existing uses, but will allow us to check for duplicates. Refs ~~#262~~

→ <<cset 6d1d0fae4360>>
- 2015-08-27T12:35:55+00:00
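
A minimal sketch of why returning a count stays compatible with callers that only test for existence. The function name `count_matching_uids` and the plain list of UIDs are hypothetical stand-ins, not the project's actual code:

```python
from collections import Counter


def count_matching_uids(existing_uids, uid):
    """Return how many stored studies carry this Study Instance UID (hypothetical helper)."""
    return Counter(existing_uids)[uid]


stored = ["1.2.3", "1.2.3", "4.5.6"]  # illustrative UIDs only

matches = count_matching_uids(stored, "1.2.3")
if not matches:
    status = "new"        # old-style truthiness check still works for 0
elif matches == 1:
    status = "existing"   # 1 is truthy, exactly as before
else:
    status = "duplicate"  # new information: more than one study has this UID
```

Callers that wrote `if uid_exists:` behave as before, because any non-zero count is truthy; only callers that compared the result to exactly 1 would need a second look.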
Ed McDonagh reporter
Added logic to check very carefully that the studies aren't created more than once. Just in DX so far. Refs ~~#262~~

→ <<cset 14c2e8c125d0>>
- 2015-08-27T21:25:43+00:00
Ed McDonagh reporter
Implemented the duplicate study uid check changes to mammo and tested. Didn't quite work. I have a GeneralStudyModuleAttr object with just the Study UID field completed. Refs ~~#262~~

→ <<cset 6013f42680b0>>
- 2015-08-27T22:04:57+00:00

Ed McDonagh reporter

To find duplicates:

from remapp.models import GeneralStudyModuleAttr
from django.db.models import Count

GeneralStudyModuleAttr.objects.values('study_instance_uid').annotate(Count('id')).order_by().filter(id__count__gt=1)

2015-08-29T21:52:14+00:00
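
For anyone who would rather check at the database level, the ORM query above should correspond roughly to a `GROUP BY ... HAVING COUNT(...) > 1` statement. The sketch below runs that against an in-memory SQLite table; the table name `study` is a stand-in and is an assumption, not the real table name:

```python
import sqlite3

# In-memory stand-in for the study table (table name is hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE study (id INTEGER PRIMARY KEY, study_instance_uid TEXT)"
)
conn.executemany(
    "INSERT INTO study (study_instance_uid) VALUES (?)",
    [("1.2.3",), ("1.2.3",), ("4.5.6",)],  # illustrative UIDs only
)

# One row per UID that occurs more than once, with its occurrence count.
duplicates = conn.execute(
    "SELECT study_instance_uid, COUNT(id) "
    "FROM study "
    "GROUP BY study_instance_uid "
    "HAVING COUNT(id) > 1"
).fetchall()
```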

Ed McDonagh reporter
Attempt to get around the fact that study_instance_uid should have been a unique field in the database from the start. Tested, results not yet conclusive. Refs ~~#262~~

→ <<cset e777c66756e1>>
- 2015-08-29T22:00:21+00:00
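
A sketch of what a unique field would buy us, assuming the constraint could be added: the database itself rejects the second insert, so two workers cannot both create the study. This uses the standard-library `sqlite3` module with a stand-in table; `get_or_create_study` is a hypothetical helper, not code from the changeset:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# UNIQUE makes the database, not the application, the arbiter of duplicates.
conn.execute(
    "CREATE TABLE study ("
    "id INTEGER PRIMARY KEY, "
    "study_instance_uid TEXT UNIQUE)"
)


def get_or_create_study(conn, uid):
    """Return (row_id, created); never leaves two rows with the same UID."""
    try:
        conn.execute("INSERT INTO study (study_instance_uid) VALUES (?)", (uid,))
        created = True
    except sqlite3.IntegrityError:
        # Another import got there first; fall through and reuse its row.
        created = False
    row = conn.execute(
        "SELECT id FROM study WHERE study_instance_uid = ?", (uid,)
    ).fetchone()
    return row[0], created


first = get_or_create_study(conn, "1.2.3")   # creates the study
second = get_or_create_study(conn, "1.2.3")  # finds the existing study
```

The application-side checks are still useful for reporting, but with the constraint in place a lost race ends in a caught error rather than a second study.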
Ed McDonagh reporter
Switched print statements for logging to help find why some objects aren't being imported. Statements only printed in celery log if celery is set to -l info. Need to repeat in mg.py and the other extractors. Refs ~~#262~~

→ <<cset bacd722f2298>>
- 2015-08-30T21:26:02+00:00
Ed McDonagh reporter
Converted all the print statements to logging.info to reduce the chattiness in the celery log file if logging is set to warning. Messages could be more informative, but taking them out makes it easier to see where the extractor import errors are occurring, and so refs ~~#262~~

→ <<cset 193b180cb254>>
- 2015-08-31T21:24:09+00:00
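
A small illustration of the effect described above, using only the standard `logging` module. The logger name and message text are made up for the example; the output is captured in a string buffer rather than the celery log:

```python
import io
import logging

stream = io.StringIO()  # stands in for the celery log file
logger = logging.getLogger("extractor_sketch")  # hypothetical logger name
logger.addHandler(logging.StreamHandler(stream))
logger.propagate = False
logger.setLevel(logging.WARNING)  # comparable to running the worker at warning level

# Former print statement: dropped at WARNING, shown at INFO.
logger.info("Study UID %s already exists, adding event", "1.2.3")
# Genuine problems still get through.
logger.warning("Import failed for study UID %s", "1.2.3")

captured = stream.getvalue()
```

With the level at WARNING only the failure message is captured; lowering the level to INFO brings the progress message back without touching the extractor code.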
Ed McDonagh reporter
Added in a lot of messages at logging.debug level to try and see where the errors are occurring. Added try/except to the 'for processing' 'for presentation' duplicate check so that a/ we can see where the error is occurring and b/ the import can proceed. This works, but I haven't yet worked out what the error is caused by. Refs ~~#262~~

→ <<cset a89c660e0023>>
- 2015-08-31T21:24:09+00:00
Ed McDonagh reporter
Adding a 2 second delay to creation of additional events to reduce the chance of the initial study not getting far enough through before the next event is added and the process failing. The error with the AnatomicRegionSequence was due to duplicates in the ContextID table. Presumably this is the same error that led to the try/except sequence in the first place. Needs to be fixed by saving earlier, and choosing the first response if there are several. Ideally, the table should be unique. Refs ~~#262~~

→ <<cset d990b674e83a>>
- 2015-09-01T20:55:06+00:00
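
A rough sketch of the "choose the first response if there are several" idea, written against plain dictionaries rather than the real ContextID model. The field names and values are placeholders and are assumptions. In Django terms this would be along the lines of using `.filter(...).first()` where `.get(...)` would raise `MultipleObjectsReturned`:

```python
def first_matching_row(rows, code_value):
    """Return the first row whose code_value matches, or None if there is none."""
    for row in rows:
        if row["code_value"] == code_value:
            return row
    return None


# Two duplicate entries, as left behind by two imports racing each other.
context_rows = [
    {"id": 1, "code_value": "CODE-1", "code_meaning": "Region A"},
    {"id": 2, "code_value": "CODE-1", "code_meaning": "Region A"},
]

chosen = first_matching_row(context_rows, "CODE-1")   # picks id 1, ignores id 2
missing = first_matching_row(context_rows, "CODE-2")  # no match, returns None
```

This tolerates existing duplicates so the import can proceed; it does not remove them, which is why a unique table remains the better long-term fix.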
Ed McDonagh reporter
- changed status to resolved
Fixed by merge c7084d7
- 2015-09-02T11:01:27+00:00

Assignee: Ed McDonagh

Type: bug

Priority: critical

Status: resolved

Component: Import: All

Milestone: 0.7.0

Votes: 0

Watchers: 1