pipeline runs one ThresholdTask per replicate (only one is needed)

Issue #30 resolved
Thomas Gilgenast created an issue

could this be addressed by making ThresholdTask extend a new CombiningTask class (itself extending luigi.WrapperTask) that wraps an inner task class which is not parametrized by rep (so the inner task will only be scheduled once)?

Comments (2)

  1. Thomas Gilgenast reporter

    MakeThreshold extends JointInnerMixin, so as long as it gets scheduled only once, it will trigger scheduling of all upstream tasks on a per-replicate basis

    in PipelineTask.requires(), a conditional checks whether the task key ends in the substring 'threshold'; if it does, only one instance of that task is scheduled (instead of one per replicate)

    this bug therefore reduces to the failure of that conditional to check the value of the task (the task the key refers to) - the decision should not depend on the key, since the key is user-controlled
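
The difference between the two checks can be sketched in plain Python. This is only an illustration, not the lib5c code: the `TASKS` mapping, `MakeExpected`, the `my_cutoff` key, and the `schedule` helper are all hypothetical stand-ins, and no luigi machinery is involved.

```python
# Hypothetical sketch: none of these names come from lib5c except
# 'MakeThreshold', which is used here only as a stand-in class.


class MakeThreshold:
    """Stand-in for the joint task that should be scheduled only once."""


class MakeExpected:
    """Stand-in for an ordinary task scheduled once per replicate."""


# User-controlled mapping from task key to task class (assumed shape).
TASKS = {
    'threshold': MakeThreshold,
    'my_cutoff': MakeThreshold,  # same task, but the key lacks the suffix
    'expected': MakeExpected,
}


def is_joint_by_key(key):
    """Key-based check: depends on how the user happened to name the key."""
    return key.endswith('threshold')


def is_joint_by_class(key, tasks=TASKS):
    """Value-based check: resolve the key to a class, compare its name."""
    return tasks[key].__name__ == 'MakeThreshold'


def schedule(key, reps, is_joint):
    """Return the (key, rep) pairs that would be scheduled for this key."""
    if is_joint(key):
        # Joint task: schedule a single instance, not tied to a replicate.
        return [(key, None)]
    # Ordinary task: one instance per replicate.
    return [(key, rep) for rep in reps]
```

With a key such as `my_cutoff`, the key-based check schedules one task per replicate, while the value-based check schedules a single task regardless of how the key is spelled.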

  2. Thomas Gilgenast reporter

    multiple changes

    fixes #30 by changing the logic used to decide whether or not a task should be parallelized over replicates: previously, the string key of the task was tested to see if it ended in 'threshold'; now, the top-level task is resolved to a task class name, which is checked for equality with 'MakeThreshold'

    change in Docker image version management: previously, lib5c inside the Docker image would always report version "unknown", because the .dockerignore excludes the .git folder from the image build and versioneer therefore cannot determine the version information; now, the output of "git describe" can be passed to the Dockerfile at build time and is passed through via a folder naming trick

    bumped package versions in requirements (notably python-daemon may now be installed at its latest version, previously 2.1.1 was being forced by setup.py)

    changed version reporting style in versioneer to match "git describe" output for simplicity

    → <<cset 44104d138f6b>>
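
One way a folder naming trick of this kind can work is to read the version from the name of the directory that contains the source, when no .git folder is available. The sketch below is an assumption about the general idea, not the code in this changeset: the function name, the `lib5c-` prefix, and the example path are hypothetical, and the version string in the usage note is made up.

```python
import os


def version_from_parentdir(root, prefix='lib5c-'):
    """Infer a version string from the name of the source directory.

    Hypothetical helper: if the directory is named '<prefix><version>',
    return the '<version>' part; otherwise fall back to 'unknown'.
    """
    # normpath drops any trailing slash so basename returns the folder name.
    name = os.path.basename(os.path.normpath(root))
    if name.startswith(prefix):
        return name[len(prefix):]
    return 'unknown'
```

For example, if the build copied the source into a folder named after a made-up "git describe" output, `version_from_parentdir('/build/lib5c-1.2.3-4-gabc1234')` would return `'1.2.3-4-gabc1234'`, while a folder without the prefix would still yield `'unknown'`.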
