reduce boilerplate re-use in toolbox
in particular, the following sequence is reused way too often:
    # expand infiles
    expanded_infiles = []
    for infile in args.countsfiles:
        expanded_infiles.extend(glob.glob(infile))
    # resolve level
    primermap, resolved_level, resolved_primerfile = resolve_level(
        expanded_infiles[0], primerfile=args.primerfile, level=args.level)
    # load counts
    print('loading counts')
    if resolved_level == 'fragment':
        counts_superdict = {infile: load_primer_counts(infile, primermap)
                            for infile in expanded_infiles}
    elif resolved_level == 'bin':
        counts_superdict = {infile: load_counts(infile)
                            for infile in expanded_infiles}
    else:
        raise ValueError('invalid level')
though of course there may be other repeated sequences like this one
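A minimal sketch of how the expand/load part of this sequence could be pulled into helpers, assuming the per-level loaders are passed in as callables rather than hard-coded. The names expand_infiles, load_counts_superdict and the loaders mapping are hypothetical illustrations, not existing toolbox functions.

```python
import glob


def expand_infiles(patterns):
    """Expand each glob pattern (e.g. args.countsfiles) into matching paths.

    Mirrors the boilerplate: patterns are handled in the order given and all
    matches are concatenated into one flat list. A pattern with no matches
    contributes nothing.
    """
    expanded_infiles = []
    for pattern in patterns:
        expanded_infiles.extend(glob.glob(pattern))
    return expanded_infiles


def load_counts_superdict(infiles, level, loaders):
    """Load every file with the loader registered for `level`.

    Hypothetical helper: `loaders` maps a level name ('fragment', 'bin') to a
    one-argument callable that takes a path and returns that file's counts.
    """
    if level not in loaders:
        raise ValueError('invalid level: %r' % (level,))
    loader = loaders[level]
    print('loading counts')
    return {infile: loader(infile) for infile in infiles}
```

In a tool, the fragment loader still needs the primermap bound in, for example loaders = {'fragment': lambda f: load_primer_counts(f, primermap), 'bin': load_counts}, with primermap and the level still coming from resolve_level() as in the excerpt. Passing the loaders in keeps the helper free of any dependency on the level-resolution logic.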
in general, a *_tool() function should do the following steps:
- imports
- calls to helpers (parallelization, loading from disk, discerning labels)
- custom code to convert the exposed API (command line flags) to the upstream API (kwargs on the high-level scripting function)
- calls to high-level scripting functions (may be nested in logic ladders that depend on the command line flags)
- write plots to disk (if this is a plotting tool that follows the new-style plotting API, in which high-level plotting functions return axes rather than saving the figure to disk themselves)
if any other kind of logic is being performed in a *_tool() function, it should either be extracted as a helper, or the high-level scripting function should be refactored to simplify its API
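A toy sketch of a tool laid out according to the steps above. Everything in it (summarize_tool, summarize_counts, load_fake_counts, build_parser and the --mode flag) is hypothetical and only illustrates the split between translating command line flags and calling the high-level scripting function; it is not an existing toolbox tool. Since it is not a plotting tool, it prints its result instead of writing plots.

```python
import argparse


def summarize_counts(counts_superdict, normalize=False):
    """Hypothetical high-level scripting function.

    Takes plain Python objects and kwargs; knows nothing about argparse.
    """
    totals = {name: sum(values) for name, values in counts_superdict.items()}
    if normalize:
        # `or 1` avoids dividing by zero when every count is zero
        grand_total = sum(totals.values()) or 1
        totals = {name: total / grand_total for name, total in totals.items()}
    return totals


def load_fake_counts(infiles):
    """Hypothetical loading helper standing in for a real loader from disk."""
    return {infile: [1, 2, 3] for infile in infiles}


def summarize_tool(args):
    # step 1: imports, placed inside the tool function
    import json

    # step 2: calls to helpers (loading, parallelization, label discovery)
    counts_superdict = load_fake_counts(args.countsfiles)

    # step 3: translate the exposed API (flags) into the upstream API (kwargs)
    kwargs = {'normalize': args.mode == 'fraction'}

    # step 4: call the high-level scripting function
    result = summarize_counts(counts_superdict, **kwargs)

    # step 5 would write plots to disk; this non-plotting tool prints instead
    print(json.dumps(result, sort_keys=True))
    return result


def build_parser():
    """Hypothetical command line wiring for the tool above."""
    parser = argparse.ArgumentParser(description='summarize counts files')
    parser.add_argument('countsfiles', nargs='+')
    parser.add_argument('--mode', choices=['raw', 'fraction'], default='raw')
    return parser
```

For example, summarize_tool(build_parser().parse_args(['a.counts', 'b.counts', '--mode', 'fraction'])) prints {"a.counts": 0.5, "b.counts": 0.5}. Because all flag handling lives in step 3, a script can call summarize_counts(..., normalize=True) directly without faking command line flags.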
Comments (5)
- reporter - marking this down in priority because it doesn't feel particularly "broken"
- reporter - marked as trivial
- reporter - changed status to on hold
if the changes proposed in #21 are accepted and resolve_level() is dropped, i think the boilerplate will be reduced to just those steps that a client script would be expected to take anyway (reading input files from command line input, and loading counts by simple iteration over this list of input files). putting this on hold until #21 is tackled.
- reporter -
another way to clean this up could be to leverage the new universal primerfile and countsfile parsers, removing resolve_level() and -l/--level from all tools except those whose activity is dependent on the level of the data (we could still allow -l auto to be the default on these, just checking 'BIN' in primermap[primermap.keys()[0]][0]['name'] to automatically guess the level)
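A sketch of the -l auto guess described above, assuming (as the snippet implies) that primermap maps region names to lists of dicts with a 'name' key, and that every region in one file shares the same level. guess_level and resolve_level_flag are hypothetical names. Note that primermap.keys()[0] only works on Python 2, where keys() returns a list; next(iter(primermap)) picks a region on both Python 2 and 3.

```python
def guess_level(primermap):
    """Guess the level of the data from the first primer/bin name.

    Assumes primermap maps region names to lists of dicts that each have a
    'name' key, and that bin names contain 'BIN' (fragment names do not).
    """
    first_region = next(iter(primermap))  # any region; assumed to share one level
    first_name = primermap[first_region][0]['name']
    return 'bin' if 'BIN' in first_name else 'fragment'


def resolve_level_flag(level, primermap):
    """Turn an -l/--level value into 'bin' or 'fragment'; 'auto' guesses."""
    if level == 'auto':
        return guess_level(primermap)
    if level in ('bin', 'fragment'):
        return level
    raise ValueError('invalid level: %r' % (level,))
```

Keeping the guess in its own function means only the level-dependent tools would need to expose -l at all, which matches the proposal of dropping the flag everywhere else.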