Number of studies shown when a histogram bar is clicked on is sometimes more than the number represented by the bar

The histogram data is calculated using numpy's histogram function from the individual DAP or DLP values. Numpy treats the last bin of the histogram differently from all the rest: all bins except the last (righthand-most) are half-open. In other words, if bins is [1, 2, 3, 4] then the first bin is [1, 2) (including 1, but excluding 2) and the second is [2, 3). The last bin, however, is [3, 4], which includes 4.
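A small illustration of this binning behaviour, using made-up values rather than real DAP or DLP data:

```python
import numpy as np

# Made-up values chosen so that some fall exactly on bin edges
values = np.array([1.0, 2.0, 2.5, 3.0, 4.0])

counts, edges = np.histogram(values, bins=[1, 2, 3, 4])

# [1, 2) holds 1.0
# [2, 3) holds 2.0 and 2.5 (2.0 is excluded from the first bin)
# [3, 4] holds 3.0 and 4.0 (the last bin includes its upper edge)
print(counts.tolist())  # [1, 2, 2]
```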

I think it is quite likely that OpenREM users will often click on the righthand-most bin to see which studies are contained there, as this bin may correspond to high-dose outliers. To ensure that all studies are shown when this bin is clicked on, I have set the dose range criteria for every bin to be greater-than-or-equal-to the supplied min value and less-than-or-equal-to the supplied max value (see the definition of 'f' in dx_histogram_list_filter in view.py and you'll see I've used gte and lte). Really I should be using gte and lt for all bins but the last one, and gte and lte for the last one only. Unfortunately I'm stuck with having to choose one or the other, and have chosen to make it correct for the righthand-most bin.
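For reference, the ideal behaviour described above could be sketched as follows. This is not the code in view.py: the helper name, the field name and the is_last_bin flag are all hypothetical, and the sketch assumes the filter could somehow be told which bin was clicked, which is exactly what is not available at the moment.

```python
def bin_lookups(field, bin_min, bin_max, is_last_bin):
    """Build Django-style lookup keyword arguments for one histogram bin.

    Hypothetical helper: uses gte/lt for every bin except the last,
    and gte/lte for the last, mirroring numpy's bin edges.
    """
    upper = "lte" if is_last_bin else "lt"
    return {
        f"{field}__gte": bin_min,
        f"{field}__{upper}": bin_max,
    }


# "dose" is a placeholder field name, not the real model field
print(bin_lookups("dose", 1, 2, is_last_bin=False))  # {'dose__gte': 1, 'dose__lt': 2}
print(bin_lookups("dose", 3, 4, is_last_bin=True))   # {'dose__gte': 3, 'dose__lte': 4}
```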

A side-effect of my decision is that, for all the other bins, a study whose dose falls exactly on a bin's upper boundary matches the filter for two bins: that bin and the next one up. The numpy-calculated histogram counts such a study once, in the higher bin, so the bar heights are correct. However, when the user clicks on the lower bar, the summary data will show that study in addition to the ones the bar represents.
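The mismatch can be reproduced outside OpenREM with a few made-up doses. The inclusive range test below only imitates a gte / lte filter; it is not the actual query:

```python
import numpy as np

# Made-up doses; 2.0 and 3.0 sit exactly on interior bin edges
doses = np.array([1.0, 1.5, 2.0, 2.5, 3.0, 4.0])
edges = [1, 2, 3, 4]

# What the histogram bars show
numpy_counts, _ = np.histogram(doses, bins=edges)

# What an inclusive (gte and lte) range test returns for each bin
inclusive_counts = [
    int(np.sum((doses >= lo) & (doses <= hi)))
    for lo, hi in zip(edges[:-1], edges[1:])
]

print(numpy_counts.tolist())  # [2, 2, 2]
print(inclusive_counts)       # [3, 3, 2]
```

Only the last bin agrees; each of the other bins picks up the study sitting on its upper edge.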

I can live with this for the sake of not omitting any studies when the highest bin is displayed.
