Duplicate rows in xlsx radiography export when a study contains multiple acquisitions

Issue #186 resolved
David Platten created an issue

For example, if a study contains two Abdo AP acquisitions, then the "all data" worksheet lists the complete study twice, and the "abdo_ap" worksheet lists every acquisition twice.

If there were three acquisitions in each study, then you get three identical rows in "all data", and every acquisition three times in the study-specific sheet.

The csv export works correctly.


  1. David Platten reporter

    I've had a brief look at this. The problem must be contained within the dxxlsx routine in the dx_export.py file. However, I can't see where the problem is. I think that for the time being xlsx exports of radiographic data should be disabled.

  2. David Platten reporter

    This duplicate data issue also affects the csv export of radiographic data. I would appreciate some help in sorting this out.

  3. David Platten reporter

    I've investigated this a bit further. The duplicates only appear in the csv output if the displayed data has been filtered using 'Acquisition protocol'. If this filter is blank then the csv output is correct, with no duplicates present.

  4. David Platten reporter

    Hi Ed,

    I've disabled the xlsx export, but I would think it is the same issue.

    Do you have any DX or CR data in your system? If so, do you also have this problem?

    It is the filtering of the data in dx_export.py that causes the problem. I can't debug this routine in PyCharm, as it doesn't respect my breakpoints for some reason, but the following lines return too many results when text is entered in the "Acquisition protocol" filter:

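    # base_filters maps each filter name to its django-filter field definition
    f = DXSummaryListFilter.base_filters

    for filt in f:
        if filt in filterdict and filterdict[filt]:
            # One Windows user found filterdict[filt] was a list. See https://bitbucket.org/openrem/openrem/issue/123/
            if isinstance(filterdict[filt], basestring):  # Python 2 string check
                filterstring = filterdict[filt]
            else:
                filterstring = (filterdict[filt])[0]
            if filterstring != '':
                # Build a lookup of the form '<field path>__<lookup type>' and apply it;
                # each filter is applied in its own .filter() call.
                e = e.filter(**{f[filt].name + '__' + f[filt].lookup_type: filterstring})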
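
The loop above turns each submitted filter into a Django lookup keyword of the form <field path>__<lookup type>. A minimal, Django-free sketch of that step is below, assuming Python 3 (str in place of basestring). FilterSpec, build_filter_kwargs and the field paths in the example are hypothetical names for illustration, not OpenREM or django-filter API.

```python
from typing import Dict, List, Mapping, NamedTuple, Union


class FilterSpec(NamedTuple):
    """Hypothetical stand-in for one entry of DXSummaryListFilter.base_filters."""
    name: str         # model field path, e.g. 'study_description'
    lookup_type: str  # Django lookup, e.g. 'icontains'


def build_filter_kwargs(
    base_filters: Mapping[str, FilterSpec],
    filterdict: Dict[str, Union[str, List[str]]],
) -> Dict[str, str]:
    """Collect the non-empty submitted filters as Django-style lookup kwargs.

    A value may arrive as a plain string or as a list (as reported in
    issue 123), in which case only its first element is used.
    """
    kwargs: Dict[str, str] = {}
    for key, spec in base_filters.items():
        value = filterdict.get(key)
        if not value:
            continue  # filter not submitted, or submitted empty
        filterstring = value if isinstance(value, str) else value[0]
        if filterstring != '':
            kwargs[spec.name + '__' + spec.lookup_type] = filterstring
    return kwargs


if __name__ == '__main__':
    # Hypothetical field paths, for illustration only.
    specs = {
        'study_description': FilterSpec('study_description', 'icontains'),
        'acquisition_protocol': FilterSpec('event__acquisition_protocol', 'icontains'),
    }
    print(build_filter_kwargs(specs, {'study_description': '',
                                      'acquisition_protocol': ['abdo']}))
    # {'event__acquisition_protocol__icontains': 'abdo'}
```

Note that the original applies each lookup in a separate .filter() call. Django documents that, for lookups spanning multi-valued relationships, chained filter() calls may each match a different related row (and add a join of their own), whereas a single filter(**kwargs) call requires one related row to satisfy every condition. Passing the collected kwargs in one call is therefore not necessarily equivalent.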
    
  5. Ed McDonagh

    Acquisition protocol is introduced in the charts branch, so I wouldn't expect to have this problem in any of my installs.

    I notice that acquisition protocol (and min/max DAP) are the only filters that act at the level of the irradiation event; all the previously available filtering fields were common to the whole study, which is why this hasn't been seen before.
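
The point about event-level filters can be illustrated outside Django with plain SQL. Filtering on a field of a multi-valued related table (here, events per study) needs a join, and a join returns one row per matching related row. The sketch below uses Python's built-in sqlite3 module and a hypothetical two-table schema, not OpenREM's real models, to reproduce the symptom from the original report: a study with two Abdo AP acquisitions comes back twice.

```python
import sqlite3

# Hypothetical toy schema, far simpler than OpenREM's: one row per study and
# one row per irradiation event, each event pointing at its study.
conn = sqlite3.connect(':memory:')
conn.executescript("""
    CREATE TABLE study (id INTEGER PRIMARY KEY, study_instance_uid TEXT);
    CREATE TABLE event (
        id INTEGER PRIMARY KEY,
        study_id INTEGER REFERENCES study(id),
        acquisition_protocol TEXT
    );
    INSERT INTO study VALUES (1, '1.2.3.1'), (2, '1.2.3.2');
    -- Study 1 has two Abdo AP acquisitions, study 2 has one.
    INSERT INTO event VALUES (1, 1, 'Abdo AP'), (2, 1, 'Abdo AP'), (3, 2, 'Abdo AP');
""")

# A study-level query touches only the study table: one row per study.
study_level = conn.execute(
    'SELECT study_instance_uid FROM study ORDER BY id'
).fetchall()

# An event-level filter needs a join, which yields one row per matching
# event, so study 1 is returned once for each of its two acquisitions.
event_level = conn.execute("""
    SELECT s.study_instance_uid
    FROM study s JOIN event e ON e.study_id = s.id
    WHERE e.acquisition_protocol = 'Abdo AP'
    ORDER BY s.id, e.id
""").fetchall()

print(study_level)  # [('1.2.3.1',), ('1.2.3.2',)]
print(event_level)  # [('1.2.3.1',), ('1.2.3.1',), ('1.2.3.2',)]
```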

    I have seen something similar, though, when writing the RF xlsx exports: I ended up with too much data. My solution there was to get the data a second time at the IrradEventXRayData level, after I had established which studies I wanted; I think the study UID was used as the match.

    I'm not sure whether, or how, that can be applied here. I didn't find that .unique worked for me.

  6. David Platten reporter

    The behaviour is exactly the same for the xlsx files: if no "Acquisition protocol" filtering is done, then all is well; with "Acquisition protocol" filtering, duplicates are present in the resulting file.

    @edmcdonagh, I'll have a look into your suggestion - thanks.

  7. Ed McDonagh

    For example:

        # Get the study instance UID for all the studies in e
        expInclude = [o.study_instance_uid for o in e]

        # some other stuff

        # Now start again, getting all the objects in IrradEventXRayData, filtering by the acquisition
        # protocol of interest, and then filtering by only the study instance UIDs that were collected
        # together earlier. The result should be just the instances you are after!
        p_events = IrradEventXRayData.objects.filter(
            acquisition_protocol__exact=protocol
        ).filter(
            projection_xray_radiation_dose__general_study_module_attributes__study_instance_uid__in=expInclude
        )
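
The two-stage approach above can be sketched without a database. In this minimal sketch, Event is a hypothetical flattened stand-in for an IrradEventXRayData row, and events_for_selected_studies is a hypothetical helper playing the role of the second query. The stage-one UID list may itself contain duplicates (it is built from the duplicated e), but a membership test, like SQL IN, still returns each event at most once.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Event:
    """Hypothetical, flattened stand-in for an IrradEventXRayData row."""
    event_id: int
    study_instance_uid: str
    acquisition_protocol: str


def events_for_selected_studies(events: List[Event], selected_uids: List[str],
                                protocol: str) -> List[Event]:
    """Second stage: go back to the event level and keep each event whose
    protocol matches and whose study was selected in stage one."""
    wanted = set(selected_uids)  # duplicate UIDs from stage one are harmless
    return [ev for ev in events
            if ev.acquisition_protocol == protocol
            and ev.study_instance_uid in wanted]


if __name__ == '__main__':
    events = [
        Event(1, '1.2.3.1', 'Abdo AP'),
        Event(2, '1.2.3.1', 'Abdo AP'),
        Event(3, '1.2.3.2', 'Abdo AP'),
        Event(4, '1.2.3.2', 'Chest PA'),
    ]
    # Stage one produced a duplicated UID list, as the joined query would.
    stage_one = ['1.2.3.1', '1.2.3.1', '1.2.3.2']
    print([ev.event_id for ev in
           events_for_selected_studies(events, stage_one, 'Abdo AP')])
    # [1, 2, 3]
```

Each matching acquisition appears exactly once, which is what a per-protocol worksheet needs.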
    
  8. David Platten reporter

    Added an additional filter to the code for the csv and xlsx exports to remove duplicate rows. The additional filter applies distinct() to the queryset, so that each study (each study_instance_uid) appears only once in the results. This fixes issue #186.

    e = e.filter(projection_xray_radiation_dose__general_study_module_attributes__study_instance_uid__isnull=False).distinct()
    

    → changeset d4cf2afeac28
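
A minimal sketch of why the distinct() fix works, using sqlite3 and a hypothetical toy schema rather than the real OpenREM tables: SELECT DISTINCT collapses the identical study rows produced by the join. For comparison it also shows a semi-join (IN with a subquery), which never duplicates study rows in the first place; that alternative is only an illustration, not what the changeset does.

```python
import sqlite3

# Hypothetical toy schema: study 1 has two matching events, study 2 has one.
conn = sqlite3.connect(':memory:')
conn.executescript("""
    CREATE TABLE study (id INTEGER PRIMARY KEY, study_instance_uid TEXT);
    CREATE TABLE event (id INTEGER PRIMARY KEY, study_id INTEGER,
                        acquisition_protocol TEXT);
    INSERT INTO study VALUES (1, '1.2.3.1'), (2, '1.2.3.2');
    INSERT INTO event VALUES (1, 1, 'Abdo AP'), (2, 1, 'Abdo AP'), (3, 2, 'Abdo AP');
""")

# Join + DISTINCT: the join still yields one row per matching event, but
# DISTINCT collapses the identical study rows into one.
with_distinct = conn.execute("""
    SELECT DISTINCT s.id, s.study_instance_uid
    FROM study s JOIN event e ON e.study_id = s.id
    WHERE e.acquisition_protocol = 'Abdo AP'
    ORDER BY s.id
""").fetchall()

# Alternative semi-join: IN (subquery) only tests membership, so each
# study row is returned at most once and no DISTINCT is needed.
with_subquery = conn.execute("""
    SELECT s.id, s.study_instance_uid
    FROM study s
    WHERE s.id IN (SELECT study_id FROM event
                   WHERE acquisition_protocol = 'Abdo AP')
    ORDER BY s.id
""").fetchall()

print(with_distinct)  # [(1, '1.2.3.1'), (2, '1.2.3.2')]
print(with_subquery)  # [(1, '1.2.3.1'), (2, '1.2.3.2')]
```

One caveat from Django's documentation for distinct(): fields named in order_by() are added to the SELECT, so ordering by a field of a related model can make otherwise duplicate rows look distinct again.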
