Allow Background and Plugin Driven Scheduling of Workflows

#561 Merged
galaxy-central-fork-1 (branch default) into galaxy-central (branch default)
Author: John Chilton
Description

Models:

Workflow invocations now track significantly more state: inputs, parameters, and runtime step state. Workflow invocations also have a state that can change over time, and the UUIDs generated for workflow invocations in pull request #465 are now persisted so they can be reused when scheduling new jobs for the workflow invocation. Workflow invocation steps now have an action parameter for persisting state provided by users during the execution of the workflow (see the forthcoming PauseModule for further details). Some initial elements of these model changes were based on model changes in Kyle Ellrott's Galaxy farm work (https://bitbucket.org/kellrott/galaxy-farm/branch/workflow_migrate). I heavily modified the model to enforce referential integrity on parameter-to-workflow-step mappings and made cosmetic changes to various other details.
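As a rough illustration of the extra state described above, the sketch below models an invocation that carries its inputs, parameters, a persisted UUID, a mutable state, and per-step user actions. The class names, field names, and state values are hypothetical stand-ins chosen for this example; they are not Galaxy's actual SQLAlchemy model.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional
from uuid import uuid4

# Hypothetical state values; the real model may use different names.
NEW, SCHEDULED, CANCELLED = "new", "scheduled", "cancelled"


@dataclass
class WorkflowInvocationStep:
    workflow_step_id: int
    # User-provided state captured during execution (e.g. continue/cancel for a pause step).
    action: Optional[Any] = None


@dataclass
class WorkflowInvocation:
    workflow_id: int
    inputs: Dict[str, Any] = field(default_factory=dict)
    parameters: Dict[str, Any] = field(default_factory=dict)
    # Generated once and persisted so later jobs for this invocation reuse it.
    invocation_uuid: str = field(default_factory=lambda: str(uuid4()))
    state: str = NEW
    steps: List[WorkflowInvocationStep] = field(default_factory=list)

    def cancel(self) -> None:
        self.state = CANCELLED

    def mark_scheduled(self) -> None:
        # A cancelled invocation must not be resurrected by a later scheduling pass.
        if self.state == CANCELLED:
            raise ValueError("cannot schedule a cancelled invocation")
        self.state = SCHEDULED
```

The point of persisting the UUID, rather than regenerating it per scheduling pass, is that jobs created across several background passes can all be tied back to the same invocation.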

Scheduling Plugins:

I used the pattern set up by the dependency resolvers and job metrics to build a dynamic plugin infrastructure for defining workflow schedulers. I hesitate to call anything with only one implementation a plugin infrastructure, but I am confident that a persisted workflow request combined with a scheduler tag could be used to build a galaxy-farm plugin that would wait for another Galaxy instance to become available and it would pull the workflow down and
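The plugin pattern mentioned above (one class per scheduler type, instantiated from configuration) can be sketched roughly as follows. This is a minimal sketch under stated assumptions: the base class, the `plugin_type` attribute, the `core` type name, and the list-of-dicts configuration (standing in for whatever parsed XML the real loader consumes) are all hypothetical.

```python
from typing import Dict, Iterable, List, Type


class WorkflowSchedulingPlugin:
    """Hypothetical base class; each plugin declares a type name used in config."""

    plugin_type = "base"

    def __init__(self, **options):
        self.options = options

    def schedule(self, invocation_id: int) -> str:
        raise NotImplementedError


class CoreSchedulingPlugin(WorkflowSchedulingPlugin):
    # Hypothetical single built-in implementation.
    plugin_type = "core"

    def schedule(self, invocation_id: int) -> str:
        return f"core scheduled invocation {invocation_id}"


def build_registry(classes: Iterable[Type[WorkflowSchedulingPlugin]]) -> Dict[str, Type[WorkflowSchedulingPlugin]]:
    # Map each declared type name to its implementing class.
    return {cls.plugin_type: cls for cls in classes}


def load_plugins(config_entries: List[dict], registry) -> Dict[str, WorkflowSchedulingPlugin]:
    # Each entry names a plugin type, an optional id, and free-form options.
    plugins = {}
    for entry in config_entries:
        options = dict(entry)
        plugin_type = options.pop("type")
        plugin_id = options.pop("id", plugin_type)
        plugins[plugin_id] = registry[plugin_type](**options)
    return plugins
```

Keying instances by an id separate from the type is what would let a scheduler tag on a persisted request select a specific configured plugin later.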

This work piggybacks on Galaxy job handlers to schedule workflows in the background: during submission, each workflow scheduled in the background is assigned a single job handler, and only that job handler's thread will process the workflow. It should be fairly easy to allow the definition of a new kind of handler (a workflow handler instead of a job handler) if that is of interest.
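The handler assignment described above can be illustrated with a small sketch: a handler is picked once at submission time and stored on the invocation, and each handler's polling loop only selects invocations carrying its own id. Choosing the handler at random is an assumption made for this example, as are the `Invocation` class and function names.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Invocation:
    id: int
    handler: Optional[str] = None
    scheduled: bool = False


def assign_handler(invocation: Invocation, handler_ids: List[str], rng=random) -> None:
    # Assigned once at submission time and persisted with the invocation.
    invocation.handler = rng.choice(handler_ids)


def invocations_for_handler(invocations: List[Invocation], handler_id: str) -> List[Invocation]:
    # Each handler thread only sees its own pending invocations,
    # so no two handlers ever process the same workflow.
    return [inv for inv in invocations if inv.handler == handler_id and not inv.scheduled]
```

Because ownership is decided once and persisted, the handlers need no locking between themselves to avoid double-scheduling.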

I will probably move much of what currently happens in workflow/scheduling_manager.py into the scheduler itself, so that it is more configurable and closer to a true plugin.

A new API has been added for submitting workflows this way, but the traditional workflow submission routes (the UI and the existing API) can also be made to submit very large workflows this way via two options: force_beta_workflow_scheduled_min_steps (defaults to 250; workflows with more steps than this are scheduled in the background) and force_beta_workflow_scheduled_for_collections (defaults to False; when enabled, workflows involving collections are scheduled in the background). Finally, submitting a workflow with a pause step (described below) always causes it to be scheduled this way.
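The routing rule described above can be summarized as a small predicate. The function name and argument names are hypothetical; only the two option names, their defaults, and the pause-step rule come from the description.

```python
def use_background_scheduling(
    num_steps: int,
    uses_collections: bool,
    has_pause_step: bool,
    min_steps: int = 250,           # force_beta_workflow_scheduled_min_steps
    for_collections: bool = False,  # force_beta_workflow_scheduled_for_collections
) -> bool:
    # A pause step can only be resumed later, so it always forces background scheduling.
    if has_pause_step:
        return True
    # "More steps than" the threshold, i.e. strictly greater.
    if num_steps > min_steps:
        return True
    # Collections only force background scheduling when the option is switched on.
    return for_collections and uses_collections
```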

API:

There are a number of new API endpoints here for fleshing out the handling of workflow invocations (called usages in existing parlance).

POST /api/workflows/{encoded_workflow_id}/usage

Schedule a workflow to be run in the background and return just the workflow invocation information.

RESTfully speaking this should be plural, but the matching GET endpoint is likewise usage and not usages, so I am favoring consistency over RESTful correctness here. Likewise, creating a 'usage' feels odd; I would like to make all of the usage endpoints aliases of more RESTfully correct invocations endpoints.
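A client call to the new POST endpoint might be assembled as below, using only the standard library. The request is built but not sent. Passing the API key as a `key` query parameter and the shape of the JSON payload are assumptions for this sketch; the accepted payload fields are not specified here.

```python
import json
import urllib.request
from urllib.parse import urlencode


def build_schedule_request(galaxy_url: str, api_key: str, workflow_id: str, payload: dict) -> urllib.request.Request:
    # POST /api/workflows/{encoded_workflow_id}/usage schedules the workflow in the background.
    url = f"{galaxy_url.rstrip('/')}/api/workflows/{workflow_id}/usage?" + urlencode({"key": api_key})
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )
```

Sending it with `urllib.request.urlopen(req)` would return the workflow invocation information rather than the list of outputs returned by the traditional run endpoints.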

The existing workflow run API endpoints still work as they always have, but their output now includes all of the workflow invocation to_dict information in addition to the list of outputs they originally returned. Once everything is scheduled this way, that list of outputs will have to disappear, but hopefully people can start using the invocation information now to ease the transition.

DELETE /api/workflows/{workflow_id}/usage/{usage_id}

Cancel a scheduled workflow invocation.

GET /api/workflows/{workflow_id}/usage/{usage_id}/steps/{step_id}

Get information about a workflow invocation step.

PUT /api/workflows/{workflow_id}/usage/{usage_id}/steps/{step_id}

Update a workflow invocation step (for steps with modifiable state). An extension point has been added to workflow modules to support this, but it is unused by all existing workflow modules. A subsequent PauseModule will use it to either continue or cancel a workflow invocation at a particular step.
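The cancel, step-show, and step-update endpoints listed above can be assembled the same way. As before, the requests are only built, the `key` query parameter is an assumption, and the `{"action": ...}` body for the PUT is a hypothetical payload shape suggested by the step action parameter, not a documented contract.

```python
import json
import urllib.request
from urllib.parse import urlencode


def _url(galaxy_url: str, api_key: str, *parts) -> str:
    # Joins /api/workflows/<parts...> and appends the API key as a query parameter.
    path = "/".join(str(p) for p in parts)
    return f"{galaxy_url.rstrip('/')}/api/workflows/{path}?" + urlencode({"key": api_key})


def cancel_invocation_request(galaxy_url, api_key, workflow_id, usage_id):
    # DELETE /api/workflows/{workflow_id}/usage/{usage_id}
    return urllib.request.Request(_url(galaxy_url, api_key, workflow_id, "usage", usage_id), method="DELETE")


def show_step_request(galaxy_url, api_key, workflow_id, usage_id, step_id):
    # GET /api/workflows/{workflow_id}/usage/{usage_id}/steps/{step_id}
    url = _url(galaxy_url, api_key, workflow_id, "usage", usage_id, "steps", step_id)
    return urllib.request.Request(url, method="GET")


def update_step_request(galaxy_url, api_key, workflow_id, usage_id, step_id, action):
    # PUT /api/workflows/{workflow_id}/usage/{usage_id}/steps/{step_id}
    url = _url(galaxy_url, api_key, workflow_id, "usage", usage_id, "steps", step_id)
    body = json.dumps({"action": action}).encode("utf-8")  # hypothetical payload shape
    return urllib.request.Request(url, data=body, method="PUT", headers={"Content-Type": "application/json"})
```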

Modules:

Workflow modules can now define new methods for dealing with recovering state and interacting with user requests.
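The new module hooks can be pictured as optional methods on a base class with conservative defaults that existing modules simply inherit. The class and method names below are hypothetical illustrations of the idea, not the actual Galaxy module interface.

```python
from typing import Any, Dict


class WorkflowModule:
    """Hypothetical base class illustrating optional extension hooks."""

    def recover_runtime_state(self, persisted_state: Dict[str, Any]) -> Dict[str, Any]:
        # Default: restore the persisted runtime state unchanged.
        return dict(persisted_state)

    def do_invocation_step_action(self, step: Any, action: Any) -> Any:
        # Default: most modules have no user-modifiable state, so reject any action.
        raise ValueError(f"{type(self).__name__} does not accept invocation step actions")


class ToolModule(WorkflowModule):
    # Hypothetical existing module: inherits the defaults and never overrides the action hook.
    pass
```

Rejecting actions by default keeps the PUT step endpoint from silently accepting updates for steps that have nothing to update.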

New Module: Pause

A new pause workflow module type, meant to stop workflow scheduling and wait for a user to review datasets, has been added, but it is disabled in the workflow editor by default since there is no UI for "un-pausing" a paused step. There is an API, and API tests verify the correctness of the pause module.
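The pause semantics can be simulated with a simplified scheduler that walks the steps in order: a pause step with no recorded action halts scheduling, a truthy action continues past it, and a falsy action cancels the invocation. Treating the workflow as a linear list of step types (instead of a graph) and the returned state strings are simplifications assumed for this sketch.

```python
from typing import Dict, List, Tuple


def schedule_invocation(steps: List[str], actions: Dict[int, bool]) -> Tuple[str, List[int]]:
    """Return (state, indices of steps scheduled so far) for a linear list of step types."""
    scheduled: List[int] = []
    for index, step_type in enumerate(steps):
        if step_type == "pause":
            action = actions.get(index)
            if action is None:
                # No user decision yet: stop here and retry on a later scheduling pass.
                return "waiting", scheduled
            if not action:
                # The user chose to cancel the invocation at this step.
                return "cancelled", scheduled
        scheduled.append(index)
    return "scheduled", scheduled
```

Because a paused invocation has to be revisited on later passes until an action arrives, this is also why workflows containing a pause step are always scheduled in the background.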

The short-term configuration option enable_beta_workflow_modules can be enabled to show these steps in the workflow editor.
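For reference, the three beta options mentioned in this description could be read from an INI-style Galaxy config with an `[app:main]` section, as sketched below. Using `configparser` here is purely illustrative (Galaxy's own config loading may differ); the option names and the defaults used as fallbacks are taken from the text above, and the sample values are invented.

```python
import configparser

# Invented sample config; only the option names come from the description.
SAMPLE = """
[app:main]
enable_beta_workflow_modules = True
force_beta_workflow_scheduled_min_steps = 100
"""


def read_beta_workflow_options(text: str) -> dict:
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["app:main"]
    return {
        # Disabled by default: pause steps stay hidden in the workflow editor.
        "enable_beta_workflow_modules": section.getboolean("enable_beta_workflow_modules", fallback=False),
        "force_beta_workflow_scheduled_min_steps": section.getint("force_beta_workflow_scheduled_min_steps", fallback=250),
        "force_beta_workflow_scheduled_for_collections": section.getboolean(
            "force_beta_workflow_scheduled_for_collections", fallback=False
        ),
    }
```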

Testing:

One can test various aspects of the new APIs using the API tests added with this.

./run_tests.sh -api test/api/test_workflows.py:WorkflowsApiTestCase.NAME_OF_TEST
