Version Control System

Overview

Jummp has an abstraction layer allowing support for multiple Version Control Systems (VCS). This facility was demonstrated by plugins for Subversion and Git. Subsequent development of the system has focussed on the Git plugin, while the Subversion plugin is not actively maintained.

Interaction with the VCS

Model Repository

Separate repositories are maintained for each model.

Earlier versions of Jummp required a single working directory to be specified, in which all models were versioned. This is no longer the case. Each model's files are kept in a separate folder with its own repository. The location of these folders is generated by the File System Service, which uses the working directory parameter as the base for model directories. Because model directories are created dynamically, the VCS Manager expects implementations to initialise a model's repository on first use. The 'working copy' parameter is therefore no longer relevant to the VCS Manager and the Git Manager. References to it remain in the Subversion plugin, which needs updating to match the current design.
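The following is a minimal sketch of the "initialise on first use" behaviour, assuming a Java implementation. The class ModelRepositories, its method repositoryFor and the initialiser callback are all hypothetical. In a Git-based manager, the callback is where the repository itself would be created (the equivalent of git init). The base directory stands in for the working directory parameter.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Consumer;

/**
 * Hypothetical sketch: resolves one directory per model under a base directory
 * and initialises a repository there the first time the model is used.
 */
class ModelRepositories {
    private final Path baseDirectory;                   // e.g. the configured working directory
    private final Consumer<Path> repositoryInitialiser; // e.g. creates a Git repository (assumption)
    private final ConcurrentMap<String, Path> known = new ConcurrentHashMap<>();

    ModelRepositories(Path baseDirectory, Consumer<Path> repositoryInitialiser) {
        this.baseDirectory = baseDirectory;
        this.repositoryInitialiser = repositoryInitialiser;
    }

    /** Returns the model's directory, creating and initialising it if it did not exist yet. */
    Path repositoryFor(String modelId) {
        // computeIfAbsent runs the mapping at most once per key in this JVM
        return known.computeIfAbsent(modelId, id -> {
            Path dir = baseDirectory.resolve(id);
            try {
                boolean isNew = Files.notExists(dir);
                Files.createDirectories(dir);
                if (isNew) {
                    repositoryInitialiser.accept(dir); // only brand-new directories are initialised
                }
                return dir;
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
    }
}
```

Keeping the initialisation behind a single lookup method means callers never need to know whether a model's repository already exists.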

The model repository contains a complete copy of every file under version control. This allows the latest revision to be retrieved without any interaction with the actual Version Control System. Nevertheless, it is not safe for the web application to use File handles to files in the working copy directly. The main reason is that a file's content might change at any time, either when a new revision is uploaded or when an older revision is retrieved. In Git, for example, retrieving an older revision requires changing every file in the repository.

Exchange Directory

The exchange directory is used for temporary storage of model files during submission and for any later access to the model. This avoids using the working directory directly, whose contents may be updated by other users. The location of the exchange directory can be configured, but this is optional. If it is not configured, a directory inside the extracted WAR archive is created and used.

Whenever a file needs to be retrieved from the VCS, the implementation copies it into the exchange directory. Each call to retrieveFiles creates a separate, uniquely named directory into which files are checked out from the model repository. This lets multiple concurrent requests access the same files. It also preserves the correspondence between the file names stored in the database and those copied into the exchange directory. For the HEAD revision, retrieval is a simple copy to that location. For a previous revision, the files first need to be retrieved from the repository and then copied to the exchange location.
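The per-call staging step might look roughly like the sketch below. It covers only the HEAD case, where files are copied straight from the model directory. For an older revision, the source files would first have to be extracted from the repository. ExchangeArea and stage are hypothetical names, not part of Jummp's API.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: every retrieve call gets its own uniquely named
 * sub-directory of the exchange directory, but file names are kept unchanged
 * so they still match the names recorded in the database.
 */
class ExchangeArea {
    private final Path exchangeRoot;

    ExchangeArea(Path exchangeRoot) throws IOException {
        this.exchangeRoot = Files.createDirectories(exchangeRoot);
    }

    /** Copies the given files into a fresh directory and returns the copies. */
    List<Path> stage(List<Path> sourceFiles) throws IOException {
        // createTempDirectory picks a name that does not collide with existing entries
        Path callDir = Files.createTempDirectory(exchangeRoot, "retrieve-");
        List<Path> staged = new ArrayList<>();
        for (Path source : sourceFiles) {
            Path target = callDir.resolve(source.getFileName()); // original file name preserved
            Files.copy(source, target);
            staged.add(target);
        }
        return staged;
    }
}
```

Two concurrent requests for the same revision therefore receive independent copies, so neither can disturb the other's files.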

We have partly optimised this process with a lazy loading mechanism in the revision transport command object. Files are only retrieved from the repository when the files field of the revision transport command is accessed directly. In addition, when files are retrieved from the repository, a weak reference to the revision transport command object is registered. This lets Jummp rely on the garbage collector to track references to the revision. Once the revision transport command is no longer strongly referenced and has been collected, its directory is removed from the exchange. While this generally keeps the exchange directory clean, it depends on the behaviour of the garbage collector, which varies with the collection policy. As a failsafe, a Quartz job runs periodically and removes directories that were last modified more than six hours ago.
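Both cleanup paths can be sketched with the standard java.lang.ref classes, assuming each staged directory is tied to the object that owns it. ExchangeCleaner, track, purgeCollected and sweep are hypothetical names. In this sketch, purgeCollected and the age-based sweep would be invoked periodically, for example from a scheduled job. Neither the scheduling nor the lazy loading of the files field is shown.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.time.Instant;
import java.util.Comparator;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Stream;

/** Hypothetical sketch: deletes exchange directories once their owners are garbage collected. */
class ExchangeCleaner {
    /** Weak reference that remembers which directory belonged to its referent. */
    private static final class DirectoryRef extends WeakReference<Object> {
        final Path directory;
        DirectoryRef(Object owner, Path directory, ReferenceQueue<Object> queue) {
            super(owner, queue);
            this.directory = directory;
        }
    }

    private final ReferenceQueue<Object> queue = new ReferenceQueue<>();
    // The references themselves must stay strongly reachable, or they may never be enqueued.
    private final Set<DirectoryRef> live = ConcurrentHashMap.newKeySet();

    /** Records that 'owner' (e.g. a revision transport object) uses 'directory'. */
    void track(Object owner, Path directory) {
        live.add(new DirectoryRef(owner, directory, queue));
    }

    /** Deletes directories whose owners have been collected; returns how many were removed. */
    int purgeCollected() {
        int removed = 0;
        Reference<?> ref;
        while ((ref = queue.poll()) != null) {
            DirectoryRef dirRef = (DirectoryRef) ref;
            live.remove(dirRef);
            deleteRecursively(dirRef.directory);
            removed++;
        }
        return removed;
    }

    /** Failsafe: removes sub-directories of the exchange not modified within maxAge. */
    static void sweep(Path exchangeRoot, Duration maxAge) throws IOException {
        Instant cutoff = Instant.now().minus(maxAge);
        try (Stream<Path> children = Files.list(exchangeRoot)) {
            for (Path child : (Iterable<Path>) children::iterator) {
                if (Files.isDirectory(child)
                        && Files.getLastModifiedTime(child).toInstant().isBefore(cutoff)) {
                    deleteRecursively(child);
                }
            }
        }
    }

    static void deleteRecursively(Path root) {
        if (Files.notExists(root)) return;
        try (Stream<Path> walk = Files.walk(root)) {
            // Reverse order deletes files before the directories that contain them.
            walk.sorted(Comparator.reverseOrder()).forEach(p -> {
                try { Files.delete(p); } catch (IOException e) { throw new UncheckedIOException(e); }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The age-based sweep does not depend on when, or whether, the garbage collector runs. That independence is why it works as a safety net for the reference-based path.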

Concurrent Access

The previous, single-repository implementation required that no two threads modify the VCS concurrently. The problem can be illustrated with Git. Suppose a previous version of a file needs to be retrieved. The clone then has to be set to that revision, either by creating a branch or by checking out the revision and entering detached HEAD mode. If another thread wants to update a file at the same time, one of two things happens. Either it commits to a branch, which cannot be pushed to a remote repository because there is no tracking branch, or the commit fails completely because the repository is in detached HEAD state.

Having a repository per model considerably mitigates this problem, because different models can be accessed concurrently. However, threads accessing the same model still need to be synchronised. The current implementation keeps locks in memory for models that are currently being written or read. Any user of the (Git implementation of the) VcsManager must block until it acquires the model's lock, if one exists. If no lock exists, one is created and stored in memory. When the read or write completes, the lock is released. If no other threads are waiting for it, it is also removed from memory. This keeps the model repository in a consistent state.
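A minimal sketch of such a lock table follows, assuming one ReentrantLock per model plus a count of the threads that hold or are waiting for it. ModelLocks and its methods are hypothetical, and the actual Git manager may be structured differently.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

/**
 * Hypothetical sketch of per-model in-memory locking: a lock exists only while
 * some thread holds or waits for it, and is discarded afterwards.
 */
class ModelLocks {
    private static final class Entry {
        final ReentrantLock lock = new ReentrantLock();
        int users; // threads holding or waiting for this lock; guarded by 'entries'
    }

    private final Map<String, Entry> entries = new HashMap<>();

    void acquire(String modelId) {
        Entry entry;
        synchronized (entries) {
            entry = entries.computeIfAbsent(modelId, id -> new Entry());
            entry.users++;
        }
        // Block outside the map's monitor so threads working on other models are not held up.
        entry.lock.lock();
    }

    void release(String modelId) {
        synchronized (entries) {
            Entry entry = entries.get(modelId);
            if (entry == null || !entry.lock.isHeldByCurrentThread()) {
                throw new IllegalMonitorStateException("lock for " + modelId + " not held");
            }
            entry.lock.unlock();
            if (--entry.users == 0) {
                entries.remove(modelId); // nobody holds or waits for it any more
            }
        }
    }

    /** Number of models that currently have a lock in memory. */
    int activeLocks() {
        synchronized (entries) {
            return entries.size();
        }
    }
}
```

Callers would pair the calls as acquire(id), then the repository work inside try, with release(id) in finally. This pairing ensures the lock is released and removed even if the work throws.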

To cater for access to the repository from multiple instances of the VcsManager, an additional file-based locking mechanism has been implemented. Once the in-memory lock has been acquired, excluding other threads from accessing the directory, a lock file is created in the repository's Git DirCache. In theory, this excludes any other process from accessing the file. In practice, Java file locks are only advisory on most platforms, so we rely on every other client of the repository observing the same locking protocol. Provided they do, multiple instances of Jummp (for example, running on different servers for scalability) can safely access repositories concurrently on a shared file system.
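The cross-process step can be sketched with java.nio's FileChannel.lock. That call blocks until no other process holds an exclusive lock on the same file. RepositoryFileLock is a hypothetical name, and the lock-file path used in the test is an assumption rather than the file Jummp creates. Taking the in-memory lock first also matters inside a single JVM. FileChannel.lock throws OverlappingFileLockException if another thread of the same JVM already holds a lock on that file.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/**
 * Hypothetical sketch of the file-based part of the protocol. Only call this
 * while holding the model's in-memory lock. The lock is advisory on most
 * platforms, so it only excludes processes that use the same protocol.
 */
final class RepositoryFileLock implements AutoCloseable {
    private final FileChannel channel;
    private final FileLock lock;

    private RepositoryFileLock(FileChannel channel, FileLock lock) {
        this.channel = channel;
        this.lock = lock;
    }

    /** Blocks until no other cooperating process holds the lock on lockFile. */
    static RepositoryFileLock acquire(Path lockFile) throws IOException {
        Files.createDirectories(lockFile.getParent());
        // An exclusive lock requires a channel opened for writing.
        FileChannel channel = FileChannel.open(lockFile,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        try {
            return new RepositoryFileLock(channel, channel.lock());
        } catch (IOException | RuntimeException e) {
            channel.close(); // do not leak the channel if locking fails
            throw e;
        }
    }

    boolean isValid() {
        return lock.isValid();
    }

    @Override
    public void close() throws IOException {
        try {
            lock.release();
        } finally {
            channel.close();
        }
    }
}
```

Locks would be taken in-memory first, then on the file, and released in the reverse order. A try-with-resources block around the repository work handles the file side.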

If a repository is changed externally, it is important to synchronise the web application's checkout. Otherwise, the VCS may end up in an inconsistent state, as described above for concurrent access. The API provides means to update the working copy, so an administration interface for synchronising the copy can be implemented.

Implementation

VcsManager

The VcsManager is an interface describing the interaction with the concrete VCS. It is part of the Core-API plugin, and a plugin providing VCS functionality needs to implement it. The VcsManager requires the exchange directory, into which it copies retrieved files. It only handles files and knows nothing about Models or Model Revisions. Because the VcsManager does not apply any security checks, it is important not to interact with it directly. Files in the repository should only be accessed through the higher-level APIs, which enforce security. An implementing plugin should provide a bean named vcsManager that implements the interface. This bean should only be created if the plugin is selected in the configuration.
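To illustrate the shape of this contract, here is a simplified sketch of a files-only interface. It also shows a selector that instantiates only the configured implementation, mirroring the "only create the bean if selected" rule. The method names commit and retrieveFiles, their signatures, and VcsManagerSelector are hypothetical simplifications rather than the real Core-API interface. In the application itself the selection is done through bean configuration, not a factory class.

```java
import java.nio.file.Path;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

/**
 * Hypothetical, simplified VCS abstraction: it deals only in directories,
 * files and revision identifiers, never in models, users or permissions.
 */
interface VcsManager {
    /** Commits files to the repository in modelDirectory and returns the new revision id. */
    String commit(Path modelDirectory, List<Path> files, String message);

    /** Copies the files of a revision into the exchange directory and returns the copies. */
    List<Path> retrieveFiles(Path modelDirectory, String revision);
}

/** Creates only the implementation named in the configuration. */
final class VcsManagerSelector {
    private VcsManagerSelector() {}

    static VcsManager create(String configuredPlugin, Map<String, Supplier<VcsManager>> plugins) {
        Supplier<VcsManager> factory = plugins.get(configuredPlugin);
        if (factory == null) {
            throw new IllegalStateException("No VCS plugin named '" + configuredPlugin + "'");
        }
        return factory.get(); // factories of unselected plugins are never invoked
    }
}
```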

VcsService

The VcsService is the core web application's entry point to the VCS. It holds a reference to a VcsManager obtained through dependency injection and provides convenience methods for all the functionality offered by the VcsManager. Any other part of the core application that needs direct access to the VCS should use this service rather than having vcsManager injected. Where possible, however, the Model Service should be used instead. It builds on the VcsService, but additionally performs consistency checks on the model and generates events where appropriate.
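The layering can be sketched as follows, assuming constructor injection. The manager interface is reduced to one hypothetical commit method, and the "at least one file" check and the event string format are invented examples of a consistency check and an event. The point is the direction of the calls: model-level code goes through ModelService, which goes through VcsService, which is the only component touching the VcsManager.

```java
import java.nio.file.Path;
import java.util.List;
import java.util.Objects;
import java.util.function.Consumer;

/** Reduced to a single hypothetical method for this sketch. */
interface VcsManager {
    String commit(Path modelDirectory, List<Path> files, String message);
}

/** Thin entry point: receives the manager by injection and delegates to it. */
class VcsService {
    private final VcsManager vcsManager;

    VcsService(VcsManager vcsManager) {
        this.vcsManager = Objects.requireNonNull(vcsManager, "vcsManager");
    }

    String commit(Path modelDirectory, List<Path> files, String message) {
        return vcsManager.commit(modelDirectory, files, message);
    }
}

/** Preferred caller: validates the request, commits, then publishes an event. */
class ModelService {
    private final VcsService vcsService;
    private final Consumer<String> eventPublisher; // hypothetical event hook

    ModelService(VcsService vcsService, Consumer<String> eventPublisher) {
        this.vcsService = vcsService;
        this.eventPublisher = eventPublisher;
    }

    String addRevision(Path modelDirectory, List<Path> files, String comment) {
        // Example consistency check (assumption): a revision must contain files.
        if (files == null || files.isEmpty()) {
            throw new IllegalArgumentException("A revision needs at least one file");
        }
        String revision = vcsService.commit(modelDirectory, files, comment);
        eventPublisher.accept("revision-added:" + revision); // only fired after a successful commit
        return revision;
    }
}
```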
