refactor pipeline visualizations to be more modular

Thomas Gilgenast reporter

changed status to on hold

2016-06-22T13:16:52+00:00

Thomas Gilgenast reporter

changed status to open

2016-06-22T13:18:02+00:00

Thomas Gilgenast reporter

an alternative to this proposal is to make pipeline visualizations first-class citizens in the task dependency graph

it would make checking completion and redrawing easier

it would make it easier to add custom visualizations

it would remove the artificial distinction between visualizations (a kind of summary report) and other computations

it would clean up the current class decorator approach, which is ugly and hard to extend; the pipeline-building infrastructure could add visualization tasks to the dependency graph automatically

it is technically challenging because it requires a Task to know about multiple outputs; whether this is feasible should be debated
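a minimal sketch of what "visualizations as first-class tasks" could look like, using a toy stdlib-only task class rather than the pipeline's real Task base class (Task, heatmap_for, add_visualizations, and pending are all hypothetical names). a task with several outputs counts as complete only when all of them exist, and visualization tasks go through the same completion check as any other computation, so redrawing falls out of normal rescheduling:

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class Task:
    """Toy node in a dependency graph; a task may declare several outputs."""
    name: str
    requires: List["Task"] = field(default_factory=list)
    outputs: List[str] = field(default_factory=list)

    def complete(self, existing: Set[str]) -> bool:
        # a multi-output task is complete only if every one of its outputs exists
        return all(path in existing for path in self.outputs)


def heatmap_for(task: Task) -> Task:
    # hypothetical visualization task that depends on the step it draws
    return Task(name=f"{task.name}_heatmap", requires=[task],
                outputs=[f"{task.name}/heatmap.png"])


def add_visualizations(tasks: List[Task], make_vis: Callable[[Task], Task]) -> List[Task]:
    # done by the pipeline builder, so the client never wires these up by hand
    return tasks + [make_vis(t) for t in tasks]


def pending(tasks: List[Task], existing: Set[str]) -> List[str]:
    # visualizations get the same completion check as any other computation
    return [t.name for t in tasks if not t.complete(existing)]


raw = Task("raw", outputs=["raw/counts.txt"])
balanced = Task("balanced", requires=[raw],
                outputs=["balanced/counts.txt", "balanced/bias.txt"])
graph = add_visualizations([raw, balanced], heatmap_for)
print(pending(graph, existing={"raw/counts.txt", "raw/heatmap.png", "balanced/counts.txt"}))
# → ['balanced', 'balanced_heatmap']
```

here "balanced" is pending because only one of its two outputs exists, and its heatmap is pending because it was never drawn; no decorator is involved.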

2016-10-19T21:36:15+00:00

Thomas Gilgenast reporter

just to keep the discussion rolling: it seems like adding a wrapper task should allow parallelization across replicates, since the replicates are explicitly passed around everywhere in the pipeline state. however, to know which regions to parallelize over (for visualizations that should be drawn per region, e.g. heatmaps), either the pipeline initializer would need to peek at a bedfile to read the list of regions, or the list of regions would need to be hard-coded in the config

if either of these is done and the region list is stored on the pipeline state, then these wrapper tasks should be equivalent to e.g. MakeJointExpress, except that they would inherit from a version of JointTask that can be parameterized by rep and by region
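a rough sketch of the "peek at a bedfile" option, assuming (hypothetically) a BED-like file whose fourth (name) column holds the region name; read_regions and wrapper_task_params are illustrative names, not existing pipeline functions. the resulting list of (rep, region) pairs is what a rep- and region-parameterized JointTask variant would be instantiated over:

```python
import io
import itertools
from typing import Iterable, List, Tuple


def read_regions(bedfile: Iterable[str]) -> List[str]:
    """Collect region names, in first-seen order, from a BED-like file.

    Assumes (hypothetically) that column 4 of each line holds the region name;
    blank, comment, track, and browser lines are skipped.
    """
    names = []
    for line in bedfile:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        names.append(line.rstrip("\n").split("\t")[3])
    return list(dict.fromkeys(names))  # drop duplicates, keep order


def wrapper_task_params(reps: List[str], regions: List[str]) -> List[Tuple[str, str]]:
    # one (rep, region) parameter set per wrapper task, e.g. one heatmap each
    return list(itertools.product(reps, regions))


# made-up example input; in the pipeline this would be the real bedfile
bed = io.StringIO(
    "chr1\t100\t200\tRegion1\n"
    "chr1\t200\t300\tRegion1\n"
    "chr3\t500\t600\tRegion2\n"
)
regions = read_regions(bed)  # alternatively hard-coded in the config
print(wrapper_task_params(["rep1", "rep2"], regions))
# → [('rep1', 'Region1'), ('rep1', 'Region2'), ('rep2', 'Region1'), ('rep2', 'Region2')]
```

the hard-coded alternative would simply replace `read_regions(bed)` with a list taken from the config; everything downstream stays the same.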

2017-05-17T18:36:19+00:00

Thomas Gilgenast reporter

as a counterpoint to the above, keeping the visualizations as second-class citizens allows them to be easily added as post-run hooks on all steps of the pipeline without explicitly requesting each visualization at each stage of each alternate path

from this perspective, the visualizations are only in the pipeline to provide "quick-and-dirty" diagnostic output, without making the user run the heatmap tool in a giant for loop over every single folder created by the pipeline

another reading of this issue could be "add the other visualizations as post-run hooks and refactor whatever doesn't make sense"
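a minimal sketch of the post-run hook reading, using a toy Pipeline class with hypothetical add_hook and run_step methods (not the pipeline's actual API). registering one hook once is enough to get a heatmap after every step, which is the convenience this comment is defending:

```python
from typing import Callable, Dict, List

Hook = Callable[[str, str], str]  # (step name, output dir) -> path written


class Pipeline:
    """Toy pipeline in which every step fires all registered post-run hooks."""

    def __init__(self) -> None:
        self.hooks: List[Hook] = []
        self.produced: Dict[str, List[str]] = {}

    def add_hook(self, hook: Hook) -> None:
        self.hooks.append(hook)

    def run_step(self, step: str, outdir: str) -> None:
        outputs = [f"{outdir}/counts.txt"]  # stand-in for the real computation
        # hooks are second-class: they run after every step without being requested per step
        for hook in self.hooks:
            outputs.append(hook(step, outdir))
        self.produced[step] = outputs


def heatmap_hook(step: str, outdir: str) -> str:
    # a real hook would call the heatmap plotting code; this one only reports its path
    return f"{outdir}/heatmap.png"


pipe = Pipeline()
pipe.add_hook(heatmap_hook)
for step in ["raw", "balanced", "expected"]:
    pipe.run_step(step, outdir=step)
print(pipe.produced["expected"])
# → ['expected/counts.txt', 'expected/heatmap.png']
```

the trade-off is visible here: the hook never appears in the dependency graph, so completion checking and redrawing are not handled by the task machinery.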

2017-05-17T19:51:20+00:00

Thomas Gilgenast reporter

marked as trivial

2017-05-17T19:51:44+00:00

Thomas Gilgenast reporter

removed responsible

2020-02-10T20:58:53+00:00

Thomas Gilgenast reporter

this might be best addressed via #80: if all our functions can also be tasks in the task graph at no additional cost, then requesting a visualization for a step by adding another task to the graph feels very inexpensive from the client's perspective

the counterargument is that if the client wants a heatmap drawn at every stage (as an example), then making the pipeline visualizations into tasks requires the client to write many extra "desired outputs"

the alternative perspective is that being explicit about which outputs you want the pipeline to produce is a good thing: it reduces the cognitive overhead of understanding which outputs are requested and how (everything listed in the output specification versus some complex combination of parameters and post-run hooks)
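a small sketch contrasting the two styles, with hypothetical stage names and a hypothetical every_stage helper. it suggests that "a heatmap at every stage" need not cost the client much typing in the explicit style, and that the hook-style config has to be expanded before anyone can tell what it actually produces:

```python
from typing import List, Tuple

Output = Tuple[str, str]  # (artifact kind, pipeline stage)

STAGES = ["raw", "balanced", "expected", "pvalues"]  # hypothetical stage names


def every_stage(kind: str, stages: List[str] = STAGES) -> List[Output]:
    """Hypothetical helper: request one `kind` artifact per stage."""
    return [(kind, stage) for stage in stages]


# explicit style: the output specification alone says what will be produced
desired_outputs: List[Output] = [("counts", "pvalues")] + every_stage("heatmap")

# hook style: a parameter implies extra outputs that the spec does not list
hook_config = {"final_output": ("counts", "pvalues"), "post_run_hooks": ["heatmap"]}


def implied_outputs(config) -> List[Output]:
    # reconstruct everything the hook-style config would actually produce
    return [config["final_output"]] + [
        (hook, stage) for hook in config["post_run_hooks"] for stage in STAGES
    ]


print(len(desired_outputs))  # 5
print(sorted(desired_outputs) == sorted(implied_outputs(hook_config)))  # True
```

both configurations request the same five artifacts; the difference is only whether that list is written down or inferred.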

2020-05-16T19:47:59+00:00
