- changed status to on hold
refactor pipeline visualizations to be more modular
the current pipeline visualization API offers only one real option (heatmaps) and is built on a verbose if/else ladder with no modularization
ideally, any visualization-type subcommand from the toolbox could simply be added as a post-execution hook
one UX option would be to have the user select the desired visualizations by filling in a list parameter in the config
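A minimal sketch of that UX, assuming a hypothetical registry that maps names from a `visualizations` list in the config to hook functions. `VISUALIZATIONS`, `register_visualization`, `run_visualization_hooks` and the hook signature are illustrative only, not existing pipeline API. Adding a visualization becomes one decorated function instead of another branch in the logic ladder.

```python
from typing import Callable, Dict, List

# Hypothetical hook signature: takes a finished step's output directory and
# returns a description of what was drawn.
Hook = Callable[[str], str]

# Hypothetical registry mapping config names to visualization hooks.
VISUALIZATIONS: Dict[str, Hook] = {}


def register_visualization(name: str) -> Callable[[Hook], Hook]:
    """Decorator that registers a visualization hook under `name`."""
    def decorator(func: Hook) -> Hook:
        VISUALIZATIONS[name] = func
        return func
    return decorator


@register_visualization("heatmap")
def draw_heatmap(step_dir: str) -> str:
    # Placeholder: a real hook would invoke the toolbox's heatmap subcommand.
    return f"heatmap drawn for {step_dir}"


@register_visualization("histogram")
def draw_histogram(step_dir: str) -> str:
    # Placeholder for any other visualization-type subcommand.
    return f"histogram drawn for {step_dir}"


def run_visualization_hooks(step_dir: str, config: Dict[str, List[str]]) -> List[str]:
    """Run, in order, the hooks named in config['visualizations']."""
    results = []
    for name in config.get("visualizations", []):
        if name not in VISUALIZATIONS:
            raise KeyError(f"unknown visualization: {name!r}")
        results.append(VISUALIZATIONS[name](step_dir))
    return results


# Usage: the pipeline would call this after every step it executes.
config = {"visualizations": ["heatmap", "histogram"]}
for step_dir in ["raw", "balanced"]:
    print(run_visualization_hooks(step_dir, config))
```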
Comments (8)
-
reporter - changed status to open
-
reporter an alternative to this proposal is to make pipeline visualizations first-class citizens on the graph
it would make checking completion and redrawing easier
it would make it easier to add custom visualizations
it would remove the artificial distinction between visualizations (a kind of summary report) and other computations
it would replace the current class-decorator approach, which is awkward and hard to extend; adding visualization tasks to the dependency graph could be automated by the pipeline-building infrastructure
it is technically challenging because it requires a Task to know about multiple outputs; whether this is feasible should be debated
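To make the "multiple outputs" requirement concrete, here is a sketch of a hypothetical `Task` base class whose completion check covers every declared output, so a partially drawn set of heatmaps is redrawn. `Task`, `HeatmapTask`, the `heatmaps/<region>.png` layout and the touch-a-file `run` are all assumptions for illustration, not the pipeline's real classes.

```python
import os
import tempfile
from typing import List


class Task:
    """Hypothetical base class for a graph node that declares several outputs."""

    def outputs(self) -> List[str]:
        raise NotImplementedError

    def complete(self) -> bool:
        # Complete only when every declared output exists, so a partially
        # drawn set of heatmaps is treated as incomplete and gets redrawn.
        return all(os.path.exists(path) for path in self.outputs())


class HeatmapTask(Task):
    """Hypothetical visualization task that draws one heatmap per region."""

    def __init__(self, step_dir: str, regions: List[str]) -> None:
        self.step_dir = step_dir
        self.regions = regions

    def outputs(self) -> List[str]:
        return [os.path.join(self.step_dir, "heatmaps", f"{region}.png")
                for region in self.regions]

    def run(self) -> None:
        # Placeholder drawing step: just create any missing output files.
        os.makedirs(os.path.join(self.step_dir, "heatmaps"), exist_ok=True)
        for path in self.outputs():
            if not os.path.exists(path):
                with open(path, "w"):
                    pass


with tempfile.TemporaryDirectory() as tmp:
    task = HeatmapTask(tmp, ["region1", "region2"])
    print(task.complete())  # False: nothing drawn yet
    task.run()
    print(task.complete())  # True: all outputs exist
```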
-
reporter just to keep the discussion rolling: adding a wrapper task should allow parallelization across replicates, since the replicates are explicitly passed around everywhere in the pipeline state; however, to know which regions to parallelize over (for visualizations drawn per region, e.g. heatmaps), the pipeline initializer would need either to peek at a bedfile to read the list of regions or to have the list of regions hard-coded in the config
if either of these is done and the region list is stored on the pipeline state, then these wrapper tasks should be equivalent to e.g. MakeJointExpress, except that they would inherit from a version of JointTask that can be parameterized by both rep and region
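A sketch of the "peek at a bedfile" option followed by a rep × region fan-out. It assumes, hypothetically, that the 4th (name) column of each BED line holds the region name; `read_regions`, `RepRegionTask` and `fan_out` are illustrative names, not part of the existing pipeline or of JointTask.

```python
import itertools
from dataclasses import dataclass
from typing import Iterable, List


def read_regions(bed_lines: Iterable[str]) -> List[str]:
    """Return unique region names from BED lines, in order of appearance.

    Assumption: the 4th (name) column of each BED line holds the region name.
    """
    names = []
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank, comment and header lines
        names.append(line.split()[3])
    return list(dict.fromkeys(names))  # de-duplicate, preserving order


@dataclass(frozen=True)
class RepRegionTask:
    """Hypothetical wrapper task parameterized by both replicate and region."""
    step: str
    rep: str
    region: str


def fan_out(step: str, reps: List[str], regions: List[str]) -> List[RepRegionTask]:
    """Create one independent, and therefore parallelizable, task per (rep, region)."""
    return [RepRegionTask(step, rep, region)
            for rep, region in itertools.product(reps, regions)]


bed = ["chr1\t100\t200\tregion1", "chr1\t300\t400\tregion1", "chr2\t50\t150\tregion2"]
regions = read_regions(bed)
tasks = fan_out("balanced", ["rep1", "rep2"], regions)
print(regions)     # ['region1', 'region2']
print(len(tasks))  # 4
```

Storing `regions` on the pipeline state once at initialization means every per-region task can be built without re-reading the bedfile.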
-
reporter as a counterpoint to the above, keeping the visualizations as second-class citizens allows them to be easily added as post-run hooks on all steps of the pipeline without explicitly requesting each visualization at each stage of each alternate path
from this perspective, the visualizations are in the pipeline only to provide "quick-and-dirty" diagnostic output, so that the user doesn't have to run the heatmap tool in a giant for loop over every folder the pipeline creates
another reading of this issue could be "add the other visualizations as post-run hooks and refactor whatever doesn't make sense"
-
reporter - marked as trivial
-
reporter - removed responsible
-
reporter this might be best addressed via #80: if all our functions can also be tasks in the task graph at no additional cost, then requesting a visualization for a step by adding another task to the graph is cheap from the client's perspective
the counterargument is that if the client wants, for example, a heatmap drawn at every stage, then making the pipeline visualizations tasks requires the client to write many more "desired outputs"
the alternative perspective is that being explicit about which outputs you want the pipeline to produce is a good thing: it reduces the cognitive overhead of understanding which outputs are requested and how (everything in the output specification versus some complex combination of parameters and post-run hooks)
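One possible middle ground between these two views, sketched under assumptions: keep a single explicit output specification, but provide a small expansion helper so that "a heatmap at every stage" stays a one-liner. `DesiredOutput`, `at_every_stage` and the stage names are hypothetical, not an existing output-specification format.

```python
from typing import List, NamedTuple


class DesiredOutput(NamedTuple):
    """Hypothetical entry in an explicit output specification."""
    step: str
    kind: str  # "data" or a visualization name such as "heatmap"


def at_every_stage(kind: str, steps: List[str]) -> List[DesiredOutput]:
    """Expand one request into an explicit entry for each stage."""
    return [DesiredOutput(step, kind) for step in steps]


steps = ["raw", "qnormed", "balanced"]  # hypothetical pipeline stages
spec = [DesiredOutput("balanced", "data")] + at_every_stage("heatmap", steps)
for entry in spec:
    print(entry.step, entry.kind)
```

Every requested output still appears in `spec`, so nothing is hidden in hooks, while the client writes one line instead of one entry per stage.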