refactor pipeline visualizations to be more modular

Issue #6 open
Thomas Gilgenast created an issue

the current pipeline visualization API includes only one real option (heatmaps) and rests on a verbose logic ladder with no modularization

ideally, it should be simple to add any visualization-type subcommand from the toolbox as a post-execution hook

one UX option would be to have the user select the desired visualizations by filling in a list parameter in the config
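
a minimal sketch of what the list-parameter idea could look like, assuming a registry keyed by the names the user writes in the config. every name here (`VISUALIZATIONS`, `register_visualization`, `run_post_execution_hooks`, the `"distribution"` entry) is hypothetical and not part of the existing API, and the drawing functions are placeholders that only return the path they would write to

```python
from typing import Callable, Dict, List

# hypothetical registry: config-facing name -> drawing callable
VISUALIZATIONS: Dict[str, Callable[[str, str], str]] = {}


def register_visualization(name: str):
    """Decorator that files a callable in the registry under `name`."""
    def decorator(func: Callable[[str, str], str]) -> Callable[[str, str], str]:
        VISUALIZATIONS[name] = func
        return func
    return decorator


@register_visualization("heatmap")
def draw_heatmap(infile: str, outdir: str) -> str:
    # placeholder: a real hook would invoke the toolbox's heatmap subcommand
    return f"{outdir}/heatmap.png"


@register_visualization("distribution")
def draw_distribution(infile: str, outdir: str) -> str:
    # placeholder for a second, hypothetical visualization type
    return f"{outdir}/distribution.png"


def run_post_execution_hooks(requested: List[str], infile: str, outdir: str) -> List[str]:
    """Run every visualization named in the config list after a step finishes."""
    unknown = [name for name in requested if name not in VISUALIZATIONS]
    if unknown:
        raise ValueError(f"unknown visualizations requested: {unknown}")
    return [VISUALIZATIONS[name](infile, outdir) for name in requested]
```

with this shape, adding a new visualization is one decorated function, and the logic ladder collapses into a dictionary lookup over whatever list the config supplies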

Comments (8)

  1. Thomas Gilgenast reporter

    an alternative to this proposal is to make pipeline visualizations first-class citizens on the graph

    it would make checking completion and redrawing easier

    it would make it easier to add custom visualizations

    it would remove the artificial distinction between visualizations (a kind of summary report) and other computations

    it would clean up the ugliness of the current class decorator approach, which is hard to extend. addition of visualization tasks to the dependency graph can be automated by the pipeline-building infrastructure

    it is technically challenging because it requires a Task to know about multiple outputs; whether or not this is feasible should be debated
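
    to make the multiple-outputs question concrete, here is a rough sketch of a task whose completion check covers a list of output paths. `MultiOutputTask` and its methods are hypothetical and do not describe the current Task class; the only assumption is that "complete" means every declared output exists on disk

```python
import os
from dataclasses import dataclass, field
from typing import List


@dataclass
class MultiOutputTask:
    """Hypothetical task that declares several output files."""
    name: str
    outputs: List[str] = field(default_factory=list)

    def missing_outputs(self) -> List[str]:
        # outputs that still need to be (re)drawn
        return [path for path in self.outputs if not os.path.exists(path)]

    def is_complete(self) -> bool:
        # complete only when every declared output exists;
        # a task with no declared outputs counts as complete
        return not self.missing_outputs()
```

    under this sketch, redrawing reduces to rerunning the task for whatever `missing_outputs()` returns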

  2. Thomas Gilgenast reporter

    just to keep the discussion rolling: it seems like adding a wrapper task should allow parallelization across replicates, since the replicates are explicitly passed around everywhere in the pipeline state. for visualizations that should be drawn per-region (e.g. heatmaps), however, the pipeline initializer also needs to know which regions to parallelize over. it would need to either peek at a bedfile to read the list of regions, or the list of regions would need to be hard-coded in the config

    if one of these two is done and the region list is stored on the pipeline state, then these wrapper tasks should be equivalent to e.g. MakeJointExpress, just inheriting from a version of JointTask that can be parameterized by rep and by region
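
    a sketch of the "peek at a bedfile" option, followed by expanding one wrapper task per (rep, region) pair. the names `peek_regions`, `expand_tasks`, and `VisualizationTask` are hypothetical, and the position of the region name in the bedfile (`region_column`, defaulting to the 4th tab-separated column) is an assumption that would need to match the real file layout

```python
from itertools import product
from typing import Iterable, List, NamedTuple, TextIO


class VisualizationTask(NamedTuple):
    """Hypothetical parameters for one per-replicate, per-region drawing task."""
    rep: str
    region: str


def peek_regions(bedfile: TextIO, region_column: int = 3) -> List[str]:
    """Collect unique region names in order of first appearance.

    Assumes a tab-separated file with the region name at `region_column`
    (0-based); comment and header lines are skipped.
    """
    regions: List[str] = []
    seen = set()
    for line in bedfile:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= region_column:
            continue  # line has no region column; ignore it
        region = fields[region_column]
        if region not in seen:
            seen.add(region)
            regions.append(region)
    return regions


def expand_tasks(reps: Iterable[str], regions: Iterable[str]) -> List[VisualizationTask]:
    """One task per (rep, region) pair, so each can be scheduled independently."""
    return [VisualizationTask(rep, region) for rep, region in product(reps, regions)]
```

    the region list returned by `peek_regions` is what would be stored on the pipeline state; the hard-coded alternative would skip that function and pass the config's list straight to `expand_tasks`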

  3. Thomas Gilgenast reporter

    as a counterpoint to the above, keeping the visualizations as second-class citizens allows them to be easily added as post-run hooks on all steps of the pipeline without explicitly requesting each visualization at each stage of each alternate path

    from this perspective, the visualizations are only in the pipeline to provide "quick-and-dirty" diagnostic output, without making the user run the heatmap tool in a giant for loop over every single folder created by the pipeline

    another reading of this issue could be "add the other visualizations as post-run hooks and refactor whatever doesn't make sense"

  4. Thomas Gilgenast reporter

    this might be best addressed via #80 - if all our functions can also be tasks in the task graph at no additional cost, then requesting a visualization for a step by adding another task to the graph feels very inexpensive from the client perspective

    the counterargument is that if the client wants a heatmap drawn at every stage (as an example), then making the pipeline visualizations tasks requires the client to write many more "desired outputs"

    the alternative perspective is that being explicit about what outputs you want the pipeline to produce is a good thing and reduces the cognitive overhead of understanding which outputs are requested in which way (asking for everything in the output specification versus some complex combination of parameters and post-run hooks)
