AttributeError: 'numpy.ndarray' object has no attribute 'keys' when thresholding a dataset with only one region

Issue #71 resolved
Thomas Gilgenast created an issue

This happens because of the following code in lib5c.algorithms.thresholding.two_way_thresholding():

    # cluster size filtering
    if size_threshold > 1:
        print 'size thresholding'
        if not concordant:
            d.add_columns_from_counts_superdict(
                {cond: size_filter(
                    d.counts(name='significant_unfiltered', rep=cond),
                    size_threshold)
                 for cond in d.conditions},
                'significant',
                rep_order=d.conditions
            )
        else:
            d.add_columns_from_counts_superdict(
                {rep: size_filter(
                    d.counts(name='rep_significant_unfiltered', rep=rep),
                    size_threshold)
                 for rep in d.reps},
                'rep_significant',
                rep_order=d.reps
            )

Both branches of this conditional call Dataset.counts(), which ends with the following code (in lib5c.structures.dataset.Dataset.counts()):

        # squeeze trivial levels of the returned object
        if len(region_order) == 1:
            return counts[region_order[0]]
        return counts

This squeeze is intended to return a dict of arrays keyed by rep name (a “regional counts superdict”) when the region kwarg is passed. Here, however, it is applied even though region=None, simply because len(region_order) == 1 (there is only one region). As a result, the data structure passed on is not a “counts superdict” (which must be a dict of dicts of arrays), so the call to Dataset.add_columns_from_counts_superdict() crashes.
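A minimal, self-contained reproduction of the failure mode may help. This sketch is not lib5c code. counts_old() and count_regions() are hypothetical stand-ins. The {region: {rep: array}} nesting is an assumption inferred from counts[region_order[0]] above. With a single region, requesting one rep at a time returns bare arrays. Any consumer that expects dicts then fails with the AttributeError from the title.

```python
import numpy as np

# Hypothetical toy data, assumed nested as {region: {rep: array}}, with ONE region.
superdict = {
    'region_1': {'rep_A': np.zeros((3, 3)), 'rep_B': np.ones((3, 3))},
}


def counts_old(superdict, region=None, rep=None):
    """Hypothetical stand-in for the current squeeze logic (not lib5c code)."""
    region_order = list(superdict) if region is None else [region]
    if rep is None:
        counts = {r: superdict[r] for r in region_order}
    else:
        counts = {r: superdict[r][rep] for r in region_order}
    # squeeze trivial levels: fires whenever there is exactly one region,
    # even though the caller never passed region
    if len(region_order) == 1:
        return counts[region_order[0]]
    return counts


def count_regions(per_rep):
    """Hypothetical consumer that expects every value to be a dict keyed by region."""
    return {rep: len(per_region.keys()) for rep, per_region in per_rep.items()}


# Mirrors the dict comprehension in two_way_thresholding(): one call per rep.
per_rep = {rep: counts_old(superdict, rep=rep) for rep in ['rep_A', 'rep_B']}
try:
    count_regions(per_rep)
except AttributeError as e:
    print(e)  # 'numpy.ndarray' object has no attribute 'keys'
```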

A proposed solution is to change this conditional in Dataset.counts() so that it checks whether the region kwarg was passed. If region=None, the user most likely did not want the squeeze to occur. The docstring does not state this explicitly, but it seems like a fair assumption. The new contract would be:

  • when region is None and rep is None, always get a counts superdict (even if there is only one region)
  • when region is not None and rep is None, get a regional counts superdict
  • when region is None and rep is not None, get a standard counts dict
  • when region is not None and rep is not None, get an array
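The proposed contract can be sketched with a toy function. counts_new() is hypothetical and not part of lib5c. It assumes data nested as {region: {rep: array}}, matching how counts[region_order[0]] indexes the region level. The only change from the current logic is the squeeze condition. It now depends on whether region was passed, not on len(region_order).

```python
import numpy as np


def counts_new(superdict, region=None, rep=None):
    """Hypothetical sketch of the proposed contract (not lib5c code).

    superdict is assumed to be nested as {region: {rep: array}}.
    """
    region_order = list(superdict) if region is None else [region]
    if rep is None:
        counts = {r: superdict[r] for r in region_order}
    else:
        counts = {r: superdict[r][rep] for r in region_order}
    # squeeze the region level only when the caller explicitly asked for a region
    if region is not None:
        return counts[region]
    return counts


# Toy data with a single region: region=None no longer triggers the squeeze.
superdict = {'region_1': {'rep_A': np.zeros((2, 2)), 'rep_B': np.ones((2, 2))}}
full = counts_new(superdict)                                    # {region: {rep: array}}
regional = counts_new(superdict, region='region_1')             # {rep: array}
per_region = counts_new(superdict, rep='rep_A')                 # {region: array}
single = counts_new(superdict, region='region_1', rep='rep_A')  # array
```

Each of the four calls lines up with one bullet of the contract above.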

This seems reasonable and intuitive at first glance.
