Add charts that calculate a user-defined percentile of the data

Issue #918 new
David Platten created an issue

Like the charts that calculate the median, but using a user-defined percentile of the data. I need to calculate 75th percentiles and to be able to export them from the charts. The box plots in the develop code include 75th percentile values, but I don’t think these statistics can easily be exported from the charts.

Comments (20)

  1. David Platten reporter

    In the create_dataframe_aggregates method of chart_functions.py, a user-defined percentile can be calculated using a lambda function within the data frame aggregation, like this:

    df.groupby(groupby_cols).agg({df_agg_col: ['median', 'count', lambda x: np.percentile(x, 75)]})
    

    The resulting column is called “<lambda_0>” on my system, which isn’t very user-friendly. You can control what the column is called like this:

    df.groupby(groupby_cols).agg({df_agg_col: ['median', 'count', ('percentile', lambda x: np.percentile(x, 75))]})
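
    A minimal runnable sketch of the same idea, assuming pandas 0.25 or later (for named aggregation). The helper name aggregate_with_percentile and the example data are hypothetical, not OpenREM's create_dataframe_aggregates. Each keyword argument becomes an output column name, so the percentile column gets a readable label such as p75 instead of <lambda_0>. The percentile is given on the 0-100 scale used by np.percentile and passed to Series.quantile as a fraction; both use linear interpolation by default.

```python
import pandas as pd


def aggregate_with_percentile(df, groupby_cols, agg_col, percentile):
    """Hypothetical helper: median, count and a user-defined percentile per group.

    `percentile` is on a 0-100 scale (e.g. 75), as with np.percentile.
    """
    # Named aggregation: each keyword becomes a column name, so the
    # percentile column is labelled e.g. "p75" rather than "<lambda_0>".
    return df.groupby(groupby_cols).agg(
        median=(agg_col, "median"),
        count=(agg_col, "count"),
        **{f"p{percentile:g}": (agg_col, lambda x: x.quantile(percentile / 100))},
    )


# Made-up example data.
data = pd.DataFrame(
    {
        "procedure": ["Head", "Head", "Head", "Head", "Chest", "Chest"],
        "dlp": [100.0, 200.0, 300.0, 400.0, 50.0, 150.0],
    }
)
result = aggregate_with_percentile(data, ["procedure"], "dlp", 75)
print(result)
```

    With this example data, result has one row per procedure: Chest gives median 100.0, count 2, p75 125.0; Head gives median 250.0, count 4, p75 325.0.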
    

  2. David Platten reporter

    See this discussion for a neater way of calculating the percentile:

    https://stackoverflow.com/questions/47637774/pandas-groupby-quantile-values/58535752

    The code I like is:

    def rename(newname):
        def decorator(f):
            f.__name__ = newname
            return f
        return decorator
    
    def q_at(y):
        @rename(f'q{y:0.2f}')
        def q(x):
            return x.quantile(y)
        return q
    
    f = {'number': ['median', 'std', q_at(0.25), q_at(0.75)]}
    df1 = df.groupby('x').agg(f)
    df1
    
    Out[]:
    number                            
      median           std  q0.25  q0.75
    x                                   
    0  52500  17969.882211  40000  61250
    1  43000  16337.584481  35750  55000
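
    A sketch of why a factory such as q_at helps when the percentiles come from user input: each call binds its own y, whereas lambdas created in a loop all see the loop variable's final value. The data frame and the user_quantiles variable are made up for illustration, and the name is set directly on q rather than through the rename decorator, which has the same effect because agg() labels list entries by their __name__.

```python
import numpy as np
import pandas as pd


def q_at(y):
    """Return an aggregation function for quantile y (0-1), labelled e.g. 'q0.75'."""
    def q(x):
        return x.quantile(y)

    # Equivalent to the rename decorator: agg() labels list entries by __name__.
    q.__name__ = f"q{y:0.2f}"
    return q


# Made-up example data.
df = pd.DataFrame({"x": [0, 0, 0, 0, 1, 1], "number": [10, 20, 30, 40, 5, 15]})

# Quantiles chosen at run time (hypothetical, e.g. from a form field).
user_quantiles = (0.25, 0.75)
df1 = df.groupby("x").agg({"number": ["median"] + [q_at(y) for y in user_quantiles]})

# Pitfall the factory avoids: these lambdas share the comprehension variable y,
# so after the loop finishes both of them compute the 0.75 quantile.
late = [lambda s: s.quantile(y) for y in user_quantiles]
sample = pd.Series([10, 20, 30, 40])

# Series.quantile and np.percentile both interpolate linearly by default,
# so the q0.75 column matches np.percentile(..., 75) for each group.
check = df.groupby("x")["number"].apply(lambda s: np.percentile(s, 75))
```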
    

  3. David Platten reporter

    Added the percentile value to the CT chart options form and humanized it. So far this is only implemented for the CT requested procedure DLP chart. Refs issue #918

    → <<cset 738a8f211fef>>
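
    The commit does not say what "humanized" means; assuming it means showing the number as an ordinal label such as "75th percentile", a stdlib-only sketch is below. The function names ordinal and percentile_label are hypothetical. Django's django.contrib.humanize app also provides an ordinal template filter for the same job.

```python
def ordinal(n: int) -> str:
    """Return n with its English ordinal suffix, e.g. 75 -> '75th'."""
    # 11th to 13th (and 111th to 113th, ...) take "th" despite ending in 1, 2 or 3.
    if 11 <= n % 100 <= 13:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


def percentile_label(percentile: int) -> str:
    """Hypothetical label for a chart title or CSV column header."""
    return f"{ordinal(percentile)} percentile"


print(percentile_label(75))
```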

  4. David Platten reporter

    Added percentile charts for the other CT charts. My custom template code has broken the average choice box, which still needs fixing: if a user submits two options, only one is retained when the page reloads with the charts. Refs issue #918

    → <<cset 001235278c1f>>
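
    One common cause of "only one of two submitted options survives" (not confirmed as the cause here) is reading a multi-valued form field as a single value. A stdlib sketch, with a hypothetical field name average_choice: in Django the same distinction is QueryDict.get(), which returns the last value for a key, versus QueryDict.getlist(), which returns all of them.

```python
from urllib.parse import parse_qs, urlencode

# Hypothetical query string from a form where the user ticked two averages.
query = urlencode([("average_choice", "median"), ("average_choice", "percentile")])

params = parse_qs(query)  # each key maps to a list of every submitted value

# Reading a single value (like Django's QueryDict.get) keeps only one choice.
only_one = params["average_choice"][-1]

# Reading the whole list (like QueryDict.getlist) keeps every choice.
all_choices = params["average_choice"]
```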

  5. David Platten reporter

    Replaced the common template block with an import statement and a new template file. Chart average choices now work both on a new page visit and on form submission. Refs issue #918

    → <<cset 3f2ffd4d28b5>>

  6. David Platten reporter

    Corrected the bar chart CSV data routine so that frequency data is included, rather than a duplicate of the average value (I think this changed with a Plotly update). Refs issue #918

    → <<cset d837f5bdc8e7>>
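
    A generic sketch (not OpenREM's bar chart CSV routine) of exporting per-group statistics so that the frequency sits in its own column next to the average and the percentile. The stats values and column names are made up for illustration.

```python
import io

import pandas as pd

# Made-up per-group statistics, e.g. the result of a groupby aggregation.
stats = pd.DataFrame(
    {"median": [100.0, 250.0], "count": [2, 4], "p75": [125.0, 325.0]},
    index=pd.Index(["Chest", "Head"], name="procedure"),
)

# Frequency stays in its own "count" column instead of repeating an average.
buffer = io.StringIO()
stats.to_csv(buffer)
rows = buffer.getvalue().splitlines()
```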

  7. Ed McDonagh

    Hi @David Platten - do you remember how far along this one was? I think most of the merge conflicts are standard name ones that don’t look too difficult to tease out…

  8. David Platten reporter

    Hi @Ed McDonagh - my colleagues and I no longer need the functionality that this branch brings. Do you think you would make use of it?

  9. Ed McDonagh

    Hi @David Platten - no, we wouldn’t use it. It would make the “must have 75th centile for DRLs” crowd happy, but I generally don’t want to feed that desire… And you can get the data from the box plots.

    I think in general it would be a good addition, but if you aren’t using it in your team then it is probably better to drop it for now and concentrate on other things.
