Model submission pipeline

Overview

In addition to the batch uploading facility, Jummp provides wizard-style functionality for creating and updating models, accessed through the web interface. Users must be logged in to submit models, although this constraint may be relaxed to allow anonymous submissions. To edit a model, users must have edit rights to the model or admin-level access. As the create and update workflows are similar, one design goal has been to reuse code components during implementation.

Architecture

Grails provides the webflow plugin for creating wizards. This allows the specification of each step in the model creation pipeline as a state machine, within a Controller action. The processing is delegated to a service, the submission service, which maintains state and interacts with other services such as ModelService and ModelFileFormatService. The architecture is shown below.

Architecture of Submission Pipeline in Jummp

Design

The submission pipeline is designed to have the states shown in the diagram below. Each state is either an action state or a view state: action states perform processing, while view states have a corresponding user interface. In addition to the states shown in the diagram, an error during the submission process leads to an error page. Users can also abort the submission at any step.

Submission Pipeline State Diagram
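As a rough illustration of the distinction between view and action states, and of the abort and error transitions, the following is a minimal sketch in plain Java. It is not the Grails webflow definition used in Jummp. The state names and event names are hypothetical and simplified, loosely based on the steps described later on this page.

```java
/** Hypothetical sketch of a wizard state machine; names are illustrative only. */
final class FlowSketch {
    enum Kind { VIEW, ACTION, END }

    // Simplified, assumed set of states; the real diagram may differ.
    enum State {
        UPLOAD_FILES(Kind.VIEW),
        VALIDATE_FILES(Kind.ACTION),
        PUBLICATION_LINK(Kind.VIEW),
        SUMMARY(Kind.VIEW),
        SUBMIT(Kind.ACTION),
        DONE(Kind.END),
        ABORTED(Kind.END),
        ERROR(Kind.END);

        final Kind kind;

        State(Kind kind) { this.kind = kind; }
    }

    /** Returns the state reached from 'current' when 'event' is triggered. */
    static State next(State current, String event) {
        if (current.kind == Kind.END) {
            return current; // terminal states end the flow
        }
        if ("abort".equals(event)) {
            return State.ABORTED; // users may abort at any step
        }
        if ("error".equals(event)) {
            return State.ERROR; // any failure leads to the error page
        }
        switch (current) {
            case UPLOAD_FILES: return State.VALIDATE_FILES;
            case VALIDATE_FILES:
                // Invalid files send the user back to the upload view.
                return "invalid".equals(event) ? State.UPLOAD_FILES : State.PUBLICATION_LINK;
            case PUBLICATION_LINK: return State.SUMMARY;
            case SUMMARY: return State.SUBMIT;
            case SUBMIT: return State.DONE;
            default: return current;
        }
    }
}
```

For example, FlowSketch.next(FlowSketch.State.SUMMARY, "abort") yields ABORTED, whatever view the user was on.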

Implementation

Implementation of the submission pipeline is broken down into three parts:

  • Model Controller - Contains the implementation of the state diagram, and input validation
  • SubmissionService - Processing related to action states
  • Views - UI corresponding to view states

Data is passed between components of the pipeline in a workingMemory data structure, a map whose values are stored under specific string keys.
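The idea of a string-keyed working memory can be sketched as follows. This is an illustration in plain Java rather than the Groovy map used in Jummp. The key names are the labels quoted on this page, while the class, the helper methods and the use of plain file names as values are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for the workingMemory map; not Jummp's actual class. */
final class WorkingMemorySketch {
    // Keys taken from the labels named on this page.
    static final String REPOSITORY_FILES = "repository_files";
    static final String IS_UPDATE = "isUpdateOnExistingModel";

    /** Builds a fresh working memory for a create (false) or update (true) flow. */
    static Map<String, Object> newWorkingMemory(boolean isUpdate) {
        Map<String, Object> workingMemory = new HashMap<>();
        workingMemory.put(IS_UPDATE, isUpdate);
        // Plain file names stand in for the transport command objects (assumption).
        workingMemory.put(REPOSITORY_FILES, new ArrayList<String>());
        return workingMemory;
    }

    /** Reads the file list back; the cast reflects the untyped nature of the map. */
    @SuppressWarnings("unchecked")
    static List<String> repositoryFiles(Map<String, Object> workingMemory) {
        return (List<String>) workingMemory.get(REPOSITORY_FILES);
    }
}
```

Because the map is untyped, each component has to agree on both the key and the type of the value stored under it, which is why the required and updated keys of each step are listed explicitly further down.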

Implementation in Model Controller

The core of the submission pipeline is implemented in the upload subflow, accessed by the create and update flows with an initialisation parameter specifying the type of flow. The create and update flows have their own cancellation and confirmation pages (i.e. terminal states that end the flow). The upload flow implements the remaining states.

The main complexity of the upload flow lies in the upload files state. Here the controller validates the files using a custom command object named UploadFilesCommand. If the files do not validate (for instance, if no main file is provided), the user is redirected to the upload files page with an error message. Validation is more complicated if the user is editing an existing model, or has already uploaded files and has come back to this step (to verify the list of files, for instance), because the existing files need to be considered as well. The update workflow therefore extracts the files associated with the current revision of the model and places them in memory.

Once validation has been performed, the uploaded files are placed in an intermediate location and passed on to the submission service. To better structure the code, the handling of uploaded files was split into two states: one responsible for validating the files, the other responsible for placing them in an intermediate location and calling the submission service.
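The need to take existing files into account during validation can be illustrated with a minimal sketch. This is not the UploadFilesCommand implementation. The method name, its parameters and the error message are hypothetical, and only the "at least one main file" rule mentioned above is checked.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the main-file check; not UploadFilesCommand itself. */
final class UploadValidationSketch {
    /**
     * Returns an error message, or null when the combined file set is acceptable.
     *
     * @param existingMains  main files already held (previous revision or an earlier visit)
     * @param submittedMains main files uploaded in the current request
     * @param deletedNames   names of files the user asked to remove
     */
    static String validate(List<String> existingMains,
                           List<String> submittedMains,
                           List<String> deletedNames) {
        List<String> remaining = new ArrayList<>(existingMains);
        remaining.removeAll(deletedNames);  // drop files marked for deletion
        remaining.addAll(submittedMains);   // then add the newly uploaded ones
        if (remaining.isEmpty()) {
            return "Please provide at least one main file.";
        }
        return null;
    }
}
```

With this check, a user who returns to the upload step without uploading anything new still passes validation as long as a previously stored main file remains.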

Implementation in Submission Service

The submission service implementation relies on the strategy pattern, allowing specification of abstract methods that are implemented for the update and create functions in different ways. The figure below shows the UML class diagram for the service.

Submission Service Class Diagram

The Submission Service relies on a function getStrategyFromContext returning a state machine strategy object, depending on the isUpdateOnExistingModel parameter in working memory. Correct implementation of the strategy pattern requires specific parameters of the working memory to be processed and updated as the flow moves forward. These are defined below:

initialise

  • Requires: Nothing

  • Updates: Does nothing for new models. For existing models, the files currently associated with the model are extracted from the repository as a list of RepositoryFilesTransportCommand objects and placed in working memory with the repository_files label. A copy is stored under the label existing in working memory for use in later processing.

handleFileUpload

  • Requires: A list of files in the working memory, with the label submitted_mains. A map of files to descriptions in the working memory, with the label submitted_additionals. A list of file names, with the label deleted_filenames, identifying the files to be removed.
  • Updates: Appends to the list of files stored in working memory under repository_files (creating it if it does not exist). Files corresponding to deleted_filenames are removed from repository_files and checked against the files from the previous revision (stored in existing); any that exist there are marked for removal from the disk. There are additional checks, such as detecting that the same file is being both deleted and added (in either order), in which case the later operation takes effect and the earlier one is ignored.
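The rule that the later of two conflicting operations wins can be sketched by replaying the user's operations in order. This is an illustration only, not the handleFileUpload code. The Op and Result classes are hypothetical, and plain file names stand in for the transport command objects.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of the add/delete bookkeeping; not Jummp's actual code. */
final class FileUploadSketch {
    /** One user action on a file name, in the order it was performed. */
    static final class Op {
        final boolean isAdd;
        final String name;

        Op(boolean isAdd, String name) {
            this.isAdd = isAdd;
            this.name = name;
        }
    }

    static final class Result {
        final Set<String> repositoryFiles = new LinkedHashSet<>();
        final Set<String> markedForDiskRemoval = new LinkedHashSet<>();
    }

    /**
     * Applies the operations in order, so a later add cancels an earlier
     * delete of the same file and vice versa.
     */
    static Result apply(List<String> repositoryFiles, List<String> existing, List<Op> operations) {
        Result result = new Result();
        result.repositoryFiles.addAll(repositoryFiles);
        for (Op op : operations) {
            if (op.isAdd) {
                result.repositoryFiles.add(op.name);
                result.markedForDiskRemoval.remove(op.name);
            } else {
                result.repositoryFiles.remove(op.name);
                // Only files from the previous revision are marked for removal.
                if (existing.contains(op.name)) {
                    result.markedForDiskRemoval.add(op.name);
                }
            }
        }
        return result;
    }
}
```

Replaying operations in order keeps the conflict handling implicit: no separate check is needed for "deleted then added" versus "added then deleted".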

inferModelFormatType

  • Requires: A list of files in working memory under repository_files with at least one main file.
  • Updates: Uses the ModelFileFormatService to infer the model format type, and stores the identifier for the format in working memory under model_type.

performValidation

  • Requires: A list of files in working memory under repository_files with at least one main file, and the identifier for the model format in working memory under model_type.
  • Updates: Performs validation, storing the validation result in model_validation_result in working memory. If there is an error in validation it is stored in validation_error.

inferModelInfo

  • Requires: A list of files in working memory under repository_files.
  • Updates: Creates a RevisionTransportCommand object (if one does not exist) and updates it with the name, description and validation info. The object is stored under the label RevisionTC.

refineModelInfo

This function is currently not used, as we do not allow modification of the model name/description.

  • Requires: The RevisionTransportCommand object with label RevisionTC. The new information (name and description) provided by the user.
  • Updates: The RevisionTransportCommand object is to be updated with the new information.

updatePublicationLink

  • Requires: The ModelTransportCommand object with label ModelTC. A map of modifications containing the publication link type and the link, with keys PubLinkProvider and PubLink respectively.
  • Updates: The model publication object is updated with the values supplied (if different from the existing values). If the publication exists in the database it is retrieved. If the publication link type is pubmed, the details of the publication are retrieved. The ModelTransportCommand object is to be updated.

updateRevisionComments

  • Requires: The RevisionTransportCommand object with label RevisionTC.
  • Updates: The RevisionTransportCommand object is to be updated with the commit message.

handleSubmission

  • Requires: The RevisionTransportCommand object with label RevisionTC, a list of files in working memory under repository_files with at least one main file, the identifier for the model format in working memory under model_type.
  • Updates: Uses the model service to process the submission. If the submission is successful, the model_id parameter is set in working memory with the id of the model created/updated. Otherwise an exception is thrown.
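The strategy selection described at the start of this section can be sketched as follows. This is a plain Java illustration, not the Groovy service. Only getStrategyFromContext, initialise and the working memory labels come from this page. The interface and class names, and the describe method, are hypothetical, and only one of the steps listed above is shown.

```java
import java.util.ArrayList;
import java.util.Map;

/** Hypothetical sketch of strategy selection; class names are invented. */
final class StrategySketch {
    /** Steps that differ between the create and update flows (subset only). */
    interface StateMachineStrategy {
        void initialise(Map<String, Object> workingMemory);

        String describe();
    }

    static final class NewModelStrategy implements StateMachineStrategy {
        public void initialise(Map<String, Object> workingMemory) {
            // Nothing to load for a new model.
        }

        public String describe() {
            return "create";
        }
    }

    static final class UpdateModelStrategy implements StateMachineStrategy {
        public void initialise(Map<String, Object> workingMemory) {
            // The real service would extract the current revision's files from
            // the repository here; this sketch only creates an empty placeholder.
            workingMemory.putIfAbsent("repository_files", new ArrayList<String>());
        }

        public String describe() {
            return "update";
        }
    }

    /** Chooses the strategy from the isUpdateOnExistingModel flag in working memory. */
    static StateMachineStrategy getStrategyFromContext(Map<String, Object> workingMemory) {
        boolean isUpdate = Boolean.TRUE.equals(workingMemory.get("isUpdateOnExistingModel"));
        return isUpdate ? new UpdateModelStrategy() : new NewModelStrategy();
    }
}
```

A caller would obtain the strategy once per step, for example StrategySketch.getStrategyFromContext(workingMemory).initialise(workingMemory), so the calling code stays identical for both flows.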

Implementation in Views

View states are implemented using standard GSP. The GSP page with noteworthy complexity is the uploadFiles view. It relies on JavaScript to let users upload as many additional files as they wish, adding or removing upload widgets as necessary, as shown in the figure below. If there is a file validation error, it is shown to users in a flash message, as shown below. If the files are valid but the model is not, the page shows a dialog box giving the user the option to proceed with the submission regardless. The GSP page shows this dialog box when the flash memory contains a showProceedWithoutValidationDialog parameter set to true.

Upload files

A further complexity of the upload files view is the need to show files that have already been uploaded. These are shown as a list of files (currently not editable) above the file upload box. The list is populated from the repository_files parameter in working memory.

Back

In the next step the user is asked to provide a publication link. This can be skipped by leaving the boxes blank. A number of different link types can be provided (such as pubmed, doi, isbn etc.). Each of these known types is validated against a pattern stored for its links. If an invalid link is provided, an error message is shown and the user is asked to supply a correct link.

Model publication link
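Pattern-based validation of a publication link can be sketched as below. The regular expressions are illustrative assumptions, not the patterns stored in Jummp, and the class and method names are hypothetical. How unknown link types are handled is not stated on this page, so this sketch simply rejects them.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical sketch of link validation; the patterns are illustrative only. */
final class PublicationLinkSketch {
    private static final Map<String, Pattern> PATTERNS = new LinkedHashMap<>();

    static {
        PATTERNS.put("pubmed", Pattern.compile("\\d+"));                  // numeric identifier
        PATTERNS.put("doi", Pattern.compile("10\\.\\d{4,9}/\\S+"));       // prefix/suffix form
        PATTERNS.put("isbn", Pattern.compile("\\d{9}[\\dXx]|\\d{13}"));   // 10 or 13 characters, no hyphens
    }

    /** A blank link skips the step; otherwise the link must match its type's pattern. */
    static boolean isAcceptable(String linkType, String link) {
        if (link == null || link.trim().isEmpty()) {
            return true;
        }
        String key = linkType == null ? "" : linkType.toLowerCase(Locale.ROOT);
        Pattern pattern = PATTERNS.get(key);
        // Unknown link types are rejected here (an assumption of this sketch).
        return pattern != null && pattern.matcher(link.trim()).matches();
    }
}
```

Keeping the patterns in a map keyed by link type means a new link type can be supported by adding one entry rather than changing the validation logic.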

If the user supplies a pubmed link, the publication is retrieved using the Europe PMC web service. If an existing publication link is provided, the details are retrieved from the database. The user is then directed to a form that allows editing of any data retrieved based on the link.

Publication details page

Finally, before submission, a submission summary is shown to the user. Here we include a collapsible, div-based display of the publication, which includes a link to the publication. The publication is shown as an identifiers.org URL, enabling users to click on the link for link types that would otherwise be difficult to access directly (e.g. DOI, ISBN etc.).

Submission Summary
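Building a clickable link from a link type and an identifier might look like the sketch below. It assumes the path-style form http://identifiers.org/<type>/<id>. The exact form Jummp emits is not stated on this page, and identifiers.org also accepts a compact prefix:id form. The class and method names are hypothetical.

```java
import java.util.Locale;

/** Hypothetical sketch of building an identifiers.org link; the URL form is assumed. */
final class IdentifiersOrgSketch {
    /** Joins the link type and identifier into a path-style identifiers.org URL. */
    static String toUrl(String linkType, String id) {
        return "http://identifiers.org/" + linkType.toLowerCase(Locale.ROOT) + "/" + id;
    }
}
```

For instance, a DOI of 10.1038/nbt1156 would be rendered as http://identifiers.org/doi/10.1038/nbt1156 under this assumption.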

Testing

The submission flow has been tested using the WebFlowTestCase class provided by Grails, which allows the flow to be simulated by triggering events. While these tests can verify that the flows operate correctly, they do not test the flow from the user's perspective. That would require external tools such as Selenium or HttpUnit, and remains on our to-do list.
