Remove unwanted entries from DB

Issue #479 resolved
Tim de Wit
created an issue

Due to regular crashes of qrscu.py (issues #476, #478) some data is only partially present in the DB (e.g. with empty station name, empty study description), resulting in entries "Error has occurred - import probably unsuccessful" in the summary table.
When trying to remove these from the remapp_generalstudymoduleattr table (using phppgadmin), I'm running against foreign key constraints. Apparenty Django doesn't support "delete on cascade", but instead recursively deletes all related entries itself (for instance when deleting data via the web interface).
After fixing a particular bug I would like to reprocess the missing data, but I can't remove it, and also can't reprocess because the studyinstanceuid's are already present.
There's no way of deleting this data from the web interface as filtering on displayname=None doesn't work. Is there any other way?

Comments (43)

  1. Ed McDonagh

    For you, yes, for others, not so much.

    phppgadmin should have worked - I've successfully done this with pgAdmin before - I just had to make sure that delete on cascade was enabled.

    I usually use a variation on the instructions to delete all the records, but do some filtering first. Always remember to backup the database before doing this!.

    I find it helpful to assign one of the records to a variable in the shell to enable 'dot-tab' completion (b=a[0]). Also there are a couple of out-of-date png files in the stuff folder that show the database with and without CT. The with CT one is horrible to review!

    We should make it easier to filter for these studies though, and to delete them from the web interface.

  2. Tim de Wit reporter

    phppgadmin: did you do anything special for that? Django by default doesn't create tables/triggers with delete on cascade enabled...
    Thanks for the links; I'll have a look at it.

  3. Ed McDonagh

    It was pgAdmin III, which was a client GUI that used to be shipped with postgres when installed on Windows or Ubuntu desktop. There was a menu item 'drop cascaded', and I think I remember it working! However, I could be wrong. And I can't remember if I ever used it to delete individual records.

    I see that pgAdmin III has now been replaced by a web service installed pgAdmin 4.

  4. Tim de Wit reporter

    pgadmin3 results in the same error as phppgadmin, so I think I'll have to use Django. Never used Django and/or filtering though, so I'll have to brush up on that first :)

    ERROR: update or delete on table "remapp_generalstudymoduleattr" violates foreign key constraint "ce1c45a50ae89eb00ce7e9096fdae11e" on table "remapp_projectionxrayradiationdose"
    SQL state: 23503
    Detail: Key (id)=(14880) is still referenced from table "remapp_projectionxrayradiationdose".
    
  5. Tim de Wit reporter

    Works like a charm!

    python manage.py shell
    
    from remapp.models import GeneralStudyModuleAttr, GeneralEquipmentModuleAttr
    
    # delete entries with study instance uid = NULL
    a = GeneralStudyModuleAttr.objects.filter(study_instance_uid=None)
    a.count()
    a.delete()
    a.count()
    
    # delete entries without corresponding entry in the GeneralEquipmentModuleAttr table
    a = GeneralStudyModuleAttr.objects.exclude(id__in = GeneralEquipmentModuleAttr.objects.values_list('general_study_module_attributes_id', flat=True))
    a.count()
    a.delete()
    a.count()
    exit()
    
  6. Ed McDonagh

    You might do something like this. Make sure you are in your virtualenv if you are using one, and navigate to the openrem folder that contains manage.py.

    python manage.py shell

    >>> from remapp.models import GeneralStudyModuleAttr
    >>> a = GeneralStudyModuleAttr.objects.all()
    >>> a.count()
    >>> b = a[123]
    >>> b.  # press tab
    b.DoesNotExist(                          b._perform_date_checks(
    b.MultipleObjectsReturned(               b._perform_unique_checks(
    b.__class__(                             b._save_parents(
    b.__delattr__(                           b._save_table(
    b.__dict__                               b._set_pk_val(
    b.__doc__                                b._state
    b.__eq__(                                b.accession_hashed
    b.__format__(                            b.accession_number
    b.__getattribute__(                      b.check(
    b.__hash__(                              b.clean(
    b.__init__(                              b.clean_fields(
    b.__module__                             b.ctradiationdose_set(
    b.__ne__(                                b.date_error_message(
    b.__new__(                               b.delete(
    b.__reduce__(                            b.from_db(
    b.__reduce_ex__(                         b.full_clean(
    b.__repr__(                              b.generalequipmentmoduleattr_set(
    b.__setattr__(                           b.get_deferred_fields(
    b.__setstate__(                          b.id
    b.__sizeof__(                            b.modality_type
    b.__str__(                               b.name_of_physician_reading_study
    b.__subclasshook__(                      b.operator_name
    b.__unicode__(                           b.patientmoduleattr_set(
    b.__weakref__                            b.patientstudymoduleattr_set(
    b._base_manager                          b.performing_physician_name
    b._check_column_name_clashes(            b.physician_of_record
    b._check_field_name_clashes(             b.pk
    b._check_fields(                         b.prepare_database_save(
    b._check_id_field(                       b.procedure_code_meaning
    b._check_index_together(                 b.procedure_code_value
    b._check_local_fields(                   b.projectionxrayradiationdose_set(
    b._check_long_column_names(              b.referring_physician_identification
    b._check_m2m_through_same_relationship(  b.referring_physician_name
    b._check_managers(                       b.refresh_from_db(
    b._check_model(                          b.requested_procedure_code_meaning
    b._check_ordering(                       b.requested_procedure_code_value
    b._check_swappable(                      b.save(
    b._check_unique_together(                b.save_base(
    b._default_manager                       b.serializable_value(
    b._deferred                              b.study_date
    b._do_insert(                            b.study_description
    b._do_update(                            b.study_id
    b._get_FIELD_display(                    b.study_instance_uid
    b._get_next_or_previous_by_FIELD(        b.study_time
    b._get_next_or_previous_in_order(        b.study_workload_chart_time
    b._get_pk_val(                           b.unique_error_message(
    b._get_unique_checks(                    b.validate_unique(
    b._meta
    

    Now if you have an accession number or study instance uid or date or modality that you want to delete, it's easy!

    >>> c = a.filter(study_instance_uid__exact = '1.3.51.0.1.1.192.134.90.77.1000234153.16601234')
    >>> c.count()
    1
    >>> c.delete()
    >>> c.count()
    0
    

    If you want to filter on field in other tables, then it gets more interesting.

    You can see what fields are available using the same trick as before:

    >>> c = b.generalequipmentmoduleattr_set.get()
    >>> c. # press tab
    c.DoesNotExist(                           c._get_next_or_previous_in_order(
    c.MultipleObjectsReturned(                c._get_pk_val(
    c.__class__(                              c._get_unique_checks(
    c.__delattr__(                            c._meta
    c.__dict__                                c._perform_date_checks(
    c.__doc__                                 c._perform_unique_checks(
    c.__eq__(                                 c._save_parents(
    c.__format__(                             c._save_table(
    c.__getattribute__(                       c._set_pk_val(
    c.__hash__(                               c._state
    c.__init__(                               c._unique_equipment_name_cache
    c.__module__                              c.check(
    c.__ne__(                                 c.clean(
    c.__new__(                                c.clean_fields(
    c.__reduce__(                             c.date_error_message(
    c.__reduce_ex__(                          c.date_of_last_calibration
    c.__repr__(                               c.delete(
    c.__setattr__(                            c.device_serial_number
    c.__setstate__(                           c.from_db(
    c.__sizeof__(                             c.full_clean(
    c.__str__(                                c.gantry_id
    c.__subclasshook__(                       c.general_study_module_attributes
    c.__unicode__(                            c.general_study_module_attributes_id
    c.__weakref__                             c.get_deferred_fields(
    c._base_manager                           c.id
    c._check_column_name_clashes(             c.institution_address
    c._check_field_name_clashes(              c.institution_name
    c._check_fields(                          c.institutional_department_name
    c._check_id_field(                        c.manufacturer
    c._check_index_together(                  c.manufacturer_model_name
    c._check_local_fields(                    c.pk
    c._check_long_column_names(               c.prepare_database_save(
    c._check_m2m_through_same_relationship(   c.refresh_from_db(
    c._check_managers(                        c.save(
    c._check_model(                           c.save_base(
    c._check_ordering(                        c.serializable_value(
    c._check_swappable(                       c.software_versions
    c._check_unique_together(                 c.spatial_resolution
    c._default_manager                        c.station_name
    c._deferred                               c.time_of_last_calibration
    c._do_insert(                             c.unique_equipment_name
    c._do_update(                             c.unique_equipment_name_id
    c._general_study_module_attributes_cache  c.unique_error_message(
    c._get_FIELD_display(                     c.validate_unique(
    c._get_next_or_previous_by_FIELD(
    >>> d = a.filter(generalequipmentmoduleattr__station_name__exact = 'CTAWP34582')
    >>> d.count()
    2342
    >>> d.delete()
    >>> d.count()
    0
    

    You can see that to get between tables, you use a __ (double underscore), and then you use one of the filtering terms, such as exact, contains, icontains for a case insensitive version etc. See the Django docs for more.

  7. Ed McDonagh

    Gah! You got there first whilst I was writing my response!

    Looks like a very useful query you have formed there though. I'll hang on to that one! We should probably incorporate them into the interface if we thing they are safe...

  8. Ed McDonagh

    The studies that can't be managed in the interface currently are those with no GeneralEquipmentModuleAttr table, which leads to the 'import probably unsuccessful' message originally commented on in this issue.

    In my database, with hundreds of thousands of studies, the queries proposed by TIm do not return (at least they don't return in a timeframe I am willing to wait!).

    However, the same can be achieved in no time at all with, for example:

    >>> from remapp.models import GeneralStudyModuleAttr
    >>> studies = GeneralStudyModuleAttr.objects.filter(modality_type__exact='CT')
    >>> studies.count()
    153537
    >>> broken = studies.filter(generalequipmentmoduleattr__unique_equipment_name__display_name__isnull=True)
    >>> broken.count()
    4
    >>> broken.delete()  # or whatever you want to do with them
    

    At this stage, I'm going to put management of this to a later version.

  9. Log in to comment