pca tool's -s/--separate_colors should be extracted as a re-usable helper

this helper would be nice to have on other diagnostic plotting tools such as the distribution tool

it would also be nice to have for scripting whenever a counts_superdict with reps from different conditions has been loaded

for reference, on the tool side this is exposed as a command-line parameter:

pca_parser.add_argument(
    '-s', '--separate_colors',
    type=str,
    help='''Specify a shell-quoted, comma-separated list of class names
    (which must be substrings of the replicate names) to color-code the
    output with. For example, 'ES,NPC'.''')

on the API side, the color-coding relies on two kwargs passed to the plotting function: labels, the name of each rep (equivalent to rep_order, and only needed when the reps are passed in an unlabeled data structure), and levels, a parallel list storing the condition name for each replicate in the same order as labels

under this API spec, the levels can be computed with the following code block:

# determine levels
levels = None
if args.separate_colors is not None:
    classes = args.separate_colors.split(',')
    levels = []
    for rep_name in rep_names:
        target_class = None
        for c in classes:
            if c in rep_name:
                print('assigning rep %s to class %s' % (rep_name, c))
                target_class = c
                break
        if target_class is None:
            raise ValueError('could not assign replicate %s to any of the '
                             'color-coding classes %s'
                             % (rep_name, classes))
        else:
            levels.append(target_class)
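
as a rough sketch of what the extracted helper could look like for the list-based API: the function name compute_levels and its signature are hypothetical (nothing with this name exists in lib5c as far as this issue is concerned), and the progress print from the tool code is dropped so the helper stays quiet when used from scripts

```python
def compute_levels(rep_names, classes):
    """Hypothetical helper: return one class name per replicate name.

    The result is a list parallel to ``rep_names``. A replicate is assigned
    to the first class (in the order given) that is a substring of its name.
    """
    levels = []
    for rep_name in rep_names:
        # collect every class whose name appears inside the replicate name
        matches = [c for c in classes if c in rep_name]
        if not matches:
            raise ValueError('could not assign replicate %s to any of the '
                             'color-coding classes %s' % (rep_name, classes))
        # first match wins, same as the break in the original loop
        levels.append(matches[0])
    return levels


# example replicate names (made up for illustration)
rep_names = ['ES_rep1', 'ES_rep2', 'NPC_rep1']
levels = compute_levels(rep_names, 'ES,NPC'.split(','))
print(levels)  # ['ES', 'ES', 'NPC']
```

because the first matching class wins, the order of the class names matters whenever a replicate name contains more than one of them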

an alternate API is also possible: if the plotting function accepts a labeled data structure such as a counts_superdict, then labels can be a dict mapping the keys of the counts_superdict to short replicate names suitable for plotting (defaulting to the identity map), and levels can be a dict mapping the same keys to condition names

currently lib5c.plotters.distribution.plot_global_distributions() and lib5c.plotters.distribution.plot_regional_distributions() use half of this second API: they take labels as a dict

probably both approaches should be supported by the extracted helper - the code block above can be easily modified to return a dict instead of a list if needed
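
one possible shape for a helper that supports both approaches is sketched below; the name assign_levels and the as_dict flag are hypothetical, and the only assumption made about a counts_superdict is that it is a dict keyed by replicate name (its values are never inspected)

```python
def assign_levels(reps, classes, as_dict=False):
    """Hypothetical helper covering both the list and the dict API.

    ``reps`` may be a list of replicate names or any dict keyed by replicate
    name (for example a counts_superdict). Returns a list parallel to
    ``reps`` by default, or a dict from replicate name to class name when
    ``as_dict`` is True.
    """
    mapping = {}
    for rep_name in reps:  # iterating over a dict yields its keys
        # first class that is a substring of the replicate name, or None
        target = next((c for c in classes if c in rep_name), None)
        if target is None:
            raise ValueError('could not assign replicate %s to any of the '
                             'color-coding classes %s' % (rep_name, classes))
        mapping[rep_name] = target
    if as_dict:
        return mapping
    return [mapping[rep_name] for rep_name in reps]


# stand-in for a loaded counts_superdict; real values would be counts dicts
counts_superdict = {'ES_rep1': {}, 'NPC_rep1': {}}
classes = ['ES', 'NPC']

print(assign_levels(list(counts_superdict), classes))
# ['ES', 'NPC']
print(assign_levels(counts_superdict, classes, as_dict=True))
# {'ES_rep1': 'ES', 'NPC_rep1': 'NPC'}
```

building the dict first and deriving the list from it keeps the matching logic in one place, so the two return forms cannot drift apart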
