reduce boilerplate re-use in toolbox

Issue #17 on hold
Thomas Gilgenast created an issue

in particular, the following sequence is reused way too often

# expand infiles
expanded_infiles = []
for infile in args.countsfiles:
    expanded_infiles.extend(glob.glob(infile))

# resolve level
primermap, resolved_level, resolved_primerfile = resolve_level(
    expanded_infiles[0], primerfile=args.primerfile, level=args.level)

# load counts
print('loading counts')
if resolved_level == 'fragment':
    counts_superdict = {infile: load_primer_counts(infile, primermap)
                        for infile in expanded_infiles}
elif resolved_level == 'bin':
    counts_superdict = {infile: load_counts(infile)
                        for infile in expanded_infiles}
else:
    raise ValueError('invalid level')

though of course there may be others
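
one way the sequence above could be pulled out of the individual tools is sketched below. this is only a sketch under stated assumptions: expand_infiles() and load_counts_superdict() are hypothetical names that do not exist in the toolbox, and the two loaders are passed in as arguments so that the example stays self-contained (in the toolbox they would presumably be the existing load_primer_counts() and load_counts())

```python
import glob


def expand_infiles(patterns):
    """Expand a list of glob patterns into one flat list of matching paths.

    Hypothetical helper; mirrors the "expand infiles" step quoted above.
    """
    expanded = []
    for pattern in patterns:
        expanded.extend(glob.glob(pattern))
    return expanded


def load_counts_superdict(infiles, resolved_level, primermap,
                          load_primer_counts, load_counts):
    """Load one counts structure per input file, keyed by filename.

    Hypothetical helper; mirrors the "load counts" step quoted above. The
    loaders are injected as arguments only to keep this sketch runnable
    without the toolbox installed.
    """
    if resolved_level == 'fragment':
        return {infile: load_primer_counts(infile, primermap)
                for infile in infiles}
    elif resolved_level == 'bin':
        return {infile: load_counts(infile) for infile in infiles}
    raise ValueError('invalid level')
```

with helpers like these, each tool would need only two calls (plus the existing resolve_level() call between them) instead of repeating the whole block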

in general, a *_tool() function should do the following steps:

  • imports
  • calls to helpers (parallelization, loading from disk, discerning labels)
  • custom code to convert the exposed API (command line flags) to the upstream API (kwargs on the high-level scripting function)
  • calls to high-level scripting functions (may be nested in logic ladders that depend on the command line flags)
  • write plots to disk (if this is a plotting tool and it is obeying the new-style plotting API, where high-level plotting functions return axes and do not actually save the figure to disk)

if any other kind of logic is being performed in a *_tool() function, it should either be extracted as a helper, or the high-level scripting function should be refactored to simplify its API
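
as an illustration of the shape described above, here is a minimal sketch of a tool that contains nothing but helper calls, flag-to-kwarg conversion, and a call to a high-level scripting function. every name in it (summarize_tool(), summarize_counts(), load_counts_stub(), the -t/--threshold and --raw flags) is made up for this example and is not part of the toolbox; the real *_tool() signature may also differ

```python
import argparse


def summarize_counts(counts, threshold=0.0, normalize=False):
    """Hypothetical high-level scripting function with a kwargs-style API."""
    kept = [c for c in counts if c >= threshold]
    total = float(sum(kept))
    if normalize and total > 0:
        kept = [c / total for c in kept]
    return kept


def load_counts_stub(infile):
    """Hypothetical helper standing in for loading counts from disk."""
    return [1.0, 2.0, 5.0]


def summarize_tool(args):
    """Hypothetical *_tool() function limited to the steps listed above."""
    # calls to helpers (here: loading from disk)
    counts_superdict = {infile: load_counts_stub(infile)
                        for infile in args.countsfiles}

    # convert the exposed API (command line flags) to the upstream API (kwargs)
    kwargs = {'threshold': args.threshold, 'normalize': not args.raw}

    # call the high-level scripting function once per input file
    return {infile: summarize_counts(counts, **kwargs)
            for infile, counts in counts_superdict.items()}


def make_parser():
    """Build the (hypothetical) command line interface for the tool."""
    parser = argparse.ArgumentParser()
    parser.add_argument('countsfiles', nargs='+')
    parser.add_argument('-t', '--threshold', type=float, default=0.0)
    parser.add_argument('--raw', action='store_true')
    return parser
```

the only logic left in summarize_tool() that is specific to this tool is the kwargs dict, which is the "custom code" step from the list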

Comments (5)

  1. Thomas Gilgenast reporter

    another way to clean this up could be to leverage the new universal primerfile and countsfile parsers, removing resolve_level() and -l/--level from all tools except those whose behavior depends on the level of the data (on these we could still allow -l auto to be the default, automatically guessing the level by just checking 'BIN' in primermap[primermap.keys()[0]][0]['name'])
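
the level check mentioned in this comment could be wrapped up roughly as follows. this is a sketch, not existing toolbox code: guess_level() is a hypothetical name, the primermap layout (region name mapped to a list of dicts with a 'name' key) is inferred from the expression in the comment, and next(iter(...)) is used in place of primermap.keys()[0] so that it also runs on Python 3. the primer names in the usage lines are invented for illustration

```python
def guess_level(primermap):
    """Guess 'bin' or 'fragment' from the first primer name in a primermap.

    Assumes primermap maps region names to non-empty lists of dicts, each
    with a 'name' key, and that bin-level names contain the substring 'BIN'.
    """
    first_region = next(iter(primermap))
    first_name = primermap[first_region][0]['name']
    return 'bin' if 'BIN' in first_name else 'fragment'


# usage with made-up primer names
bin_map = {'Sox2': [{'name': 'Sox2_BIN_000'}]}
fragment_map = {'Sox2': [{'name': 'Sox2_FOR_2'}]}
bin_guess = guess_level(bin_map)
fragment_guess = guess_level(fragment_map)
```

only the first entry of the first region is inspected, so a primermap that mixes levels would be classified by whichever entry happens to come first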

  2. Thomas Gilgenast reporter

    if the changes proposed in #21 are accepted and resolve_level() is dropped, i think the boilerplate will be reduced to just those steps which a client script would be expected to take anyway (reading the input files from the command line, then loading counts with simple iteration over this list of input files)

    putting this on hold until #21 is tackled
