Duplicate rows in xlsx radiography export when a study contains multiple acquisitions
For example, if a study contains two Abdo AP acquisitions then the "all data" worksheet will list the complete study twice. The "abdo_ap" worksheet lists every acquisition twice.
If there were three acquisitions in each study, then you get three identical rows in "all data", and every acquisition three times in the study-specific sheet.
The csv export works correctly.
Comments (13)
-
reporter Commented out xlsx export for radiographic studies due to presence of duplicate data in the file that is created. See issue #186. → <<cset c9b63d69396d>>
-
reporter This duplicate data issue also affects the csv export of radiographic data. I would appreciate some help in sorting this out.
-
reporter I've investigated this a bit further. The duplicates only appear in the csv output if the displayed data has been filtered using 'Acquisition protocol'. If this filter is blank then the csv output is correct, with no duplicates present.
-
Is that true for the xlsx export too?
-
reporter Hi Ed,
I've disabled the xlsx export, but I would think it is the same issue.
Do you have any DX or CR data in your system? If so, do you also have this problem?
It is the filtering of the data in dx_export.py that is the problem. The following lines return too many results when text in the "Acquisition protocol" is used. However, I can't debug this routine in PyCharm - it doesn't respect my breakpoints for some reason:
    f = DXSummaryListFilter.base_filters
    for filt in f:
        if filt in filterdict and filterdict[filt]:
            # One Windows user found filterdict[filt] was a list. See https://bitbucket.org/openrem/openrem/issue/123/
            if isinstance(filterdict[filt], basestring):
                filterstring = filterdict[filt]
            else:
                filterstring = (filterdict[filt])[0]
            if filterstring != '':
                e = e.filter(**{f[filt].name + '__' + f[filt].lookup_type : filterstring})
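Since breakpoints are not being hit, one way to inspect what the loop above passes to filter() is to pull the keyword-building step into a plain function that can be run outside Django. This is only a sketch in Python 3: build_lookups, first_value and the (field name, lookup type) tuples are hypothetical stand-ins for the django-filter objects, not OpenREM code.

```python
def first_value(raw):
    """Return raw if it is a string, else the first item of the sequence (or '')."""
    if isinstance(raw, str):
        return raw
    return raw[0] if raw else ""


def build_lookups(filterdict, base_filters):
    """Build the keyword arguments that would be passed to queryset.filter().

    base_filters maps a form field name to a (model field name, lookup type)
    tuple; this is a hypothetical stand-in for DXSummaryListFilter.base_filters.
    """
    lookups = {}
    for key, (field_name, lookup_type) in base_filters.items():
        value = first_value(filterdict.get(key, ""))
        if value != "":
            lookups[field_name + "__" + lookup_type] = value
    return lookups


# Hypothetical field names, for illustration only.
base_filters = {
    "acquisition_protocol": ("event__acquisition_protocol", "icontains"),
    "study_description": ("study_description", "icontains"),
}
filterdict = {"acquisition_protocol": ["Abdo AP"], "study_description": ""}

lookups = build_lookups(filterdict, base_filters)
print(lookups)
```

Printing the resulting dictionary shows which lookups span the event-level relation, which are the ones that can multiply rows.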
-
Acquisition protocol is introduced in the charts branch, so I wouldn't expect to have this problem in any of my installs.
I notice that acquisition protocol (and min/max DAP) are the only filters that are at the level of the irradiation event - all the previously available filtering fields were common to the whole study, hence why this hasn't been seen before.
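To illustrate why an event-level filter behaves differently from a study-level one, here is a minimal sketch using Python's sqlite3 module. The study and event tables are hypothetical and much simpler than OpenREM's schema; the point is only that joining a study to its events and filtering on an event column returns one row per matching event, not one row per study.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with two studies and three events."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE study (id INTEGER PRIMARY KEY, uid TEXT);
        CREATE TABLE event (
            id INTEGER PRIMARY KEY,
            study_id INTEGER REFERENCES study(id),
            protocol TEXT
        );
        INSERT INTO study (id, uid) VALUES (1, '1.2.3'), (2, '4.5.6');
        INSERT INTO event (study_id, protocol)
            VALUES (1, 'Abdo AP'), (1, 'Abdo AP'), (2, 'Chest PA');
        """
    )
    return conn


def studies_matching_protocol(conn, protocol):
    """Return study uids selected through a join on the event table."""
    rows = conn.execute(
        "SELECT study.uid FROM study "
        "JOIN event ON event.study_id = study.id "
        "WHERE event.protocol = ? ORDER BY study.id",
        (protocol,),
    ).fetchall()
    return [row[0] for row in rows]


conn = build_demo_db()
duplicated = studies_matching_protocol(conn, "Abdo AP")
print(duplicated)  # study 1.2.3 appears once per matching event
```

With two Abdo AP events in the first study, that study's uid comes back twice, which matches the pattern described in the issue.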
I have seen something similar though when I've been writing the RF xlsx exports and I've ended up with too much data. My solution there was to get the data a second time at the IrradEventXRayData level after I had established which studies I wanted - the study uid was used as the match I think.
Not sure how or if that can be applied here? I didn't find .unique worked for me.
-
- changed component to Export: Radiography
-
reporter The behaviour is exactly the same for the xlsx files: if no "Acquisition protocol" filtering is done, then all is well; with "Acquisition protocol" filtering, duplicates are present in the resulting file.
@edmcdonagh, I'll have a look into your suggestion - thanks.
-
For example:
    # Get the study instance uid for all the studies in e
    expInclude = [o.study_instance_uid for o in e]

    # some other stuff

    # Now start again, getting all the objects in IrradEventXrayData, filtering by the acquisition protocol of interest,
    # and then filtering by only the study instance uid's that were collected together earlier.
    # Result should be just the instances you are after!
    p_events = IrradEventXRayData.objects.filter(
        acquisition_protocol__exact = protocol
    ).filter(
        projection_xray_radiation_dose__general_study_module_attributes__study_instance_uid__in = expInclude
    )
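The same two-pass idea can be tried out without a database. The sketch below is plain Python 3 with a hypothetical Event record and made-up values; it only shows that a membership test against the collected uids is unaffected by a uid appearing more than once in that list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical stand-in for one irradiation event row."""
    study_uid: str
    protocol: str
    dap: float


def events_for_export(events, wanted_study_uids, protocol):
    """Second pass: keep events with the given protocol from the wanted studies."""
    wanted = set(wanted_study_uids)  # a set, so repeated uids have no effect
    return [
        ev for ev in events
        if ev.protocol == protocol and ev.study_uid in wanted
    ]


events = [
    Event("1.2.3", "Abdo AP", 1.1),
    Event("1.2.3", "Abdo AP", 1.3),
    Event("1.2.3", "Chest PA", 0.2),
    Event("4.5.6", "Abdo AP", 0.9),
]

# First pass result, deliberately containing the same uid twice.
exp_include = ["1.2.3", "1.2.3"]

selected = events_for_export(events, exp_include, "Abdo AP")
print(len(selected))
```

Each matching event is returned once, even though the first pass listed its study twice.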
-
reporter - changed status to resolved
Added an additional filter to the code for the csv and xlsx export to remove duplicate rows. The additional filter ensures that there is only one instance of each study_instance_uid in the results by applying the distinct() operation to these values. This fixes issue #186.
    e = e.filter(projection_xray_radiation_dose__general_study_module_attributes__study_instance_uid__isnull = False).distinct()
→ <<cset d4cf2afeac28>>
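For anyone wanting to see the effect in isolation: Django's distinct() adds SELECT DISTINCT to the generated query. The sketch below uses sqlite3 with hypothetical study and event tables (not OpenREM's schema) to compare the same join with and without DISTINCT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE study (id INTEGER PRIMARY KEY, uid TEXT);
    CREATE TABLE event (
        id INTEGER PRIMARY KEY,
        study_id INTEGER REFERENCES study(id),
        protocol TEXT
    );
    INSERT INTO study (id, uid) VALUES (1, '1.2.3');
    INSERT INTO event (study_id, protocol)
        VALUES (1, 'Abdo AP'), (1, 'Abdo AP');
    """
)

JOIN_AND_FILTER = (
    "FROM study JOIN event ON event.study_id = study.id "
    "WHERE event.protocol = ?"
)

# Without DISTINCT: one row per matching event.
without_distinct = [
    row[0]
    for row in conn.execute("SELECT study.uid " + JOIN_AND_FILTER, ("Abdo AP",))
]

# With DISTINCT: identical result rows are collapsed.
with_distinct = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT study.uid " + JOIN_AND_FILTER, ("Abdo AP",)
    )
]

print(without_distinct, with_distinct)
```

Note that DISTINCT compares whole result rows, so it only collapses rows that are identical in every selected column.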
-
This hadn't been updated for the database name changes in 0.5.1. Refs #186 → <<cset 4dca3ee68237>>
-
I've had a brief look at this. The problem must be contained within the dxxlsx routine in the dx_export.py file. However, I can't see where the problem is. I think that for the time being xlsx exports of radiographic data should be disabled.