ieeg / ObjectStorage

We need to allow an arbitrary number of objects to be associated with Studies/Experiments (recordings).

Uploading and Downloading Objects

Summary

  1. Non-admin users will be able to create empty recordings on which they will have ACL editing permission.
  2. Our recordings are grouped at the top level by organization. There will be a way to specify users as organization-level owners. An org owner will be able to create recordings under the specified org. An org owner will also be able to delete any recording under their org regardless of the ACL. An org owner will not have any other permissions on a recording except for those allowed by the recording's ACL. A user can be an org owner for more than one org.
  3. The objects associated with recordings will be files that users can upload and download.
  4. Files will be uploaded using the ieeg command line program which will be extended to include this ability. Anyone with write access to a recording will be able to upload a file for that recording.
  5. Files may also be uploaded through the Portal.
  6. The Internet media type, MD5 hash, size, creating user, and creation date for each file will be stored. The uploader will also have the option to specify a description. In addition, a JSON string will be associated with each object, which can be used to store additional information about the object.
  7. The files will be accessible in the Portal's search results tree under their parent recording to anyone with read access to the recording. PDF files will be viewable in the Portal. All other file types will be downloadable.
  8. The ieeg program will be used to delete recordings and objects. These operations will require write permission on the recording.
  9. In addition to downloads from the Portal, one will also be able to download files using the ieeg program.
  10. Objects are not backed up.
  11. Uploaded objects will be added to a "queue" for possible batch processing. See Processing Objects.
  12. The existing DICOM zips and documentation PDFs should be incorporated into any solution. This will require further db, code, and S3 changes.
    • These may not have a pre-calculated MD5 hash in S3.
    • These are of course not indexed.

Components (more or less front to back)

Command Line

New sub-commands for recording creation and deletion will be added to ieeg, along with sub-commands for object upload, download, and deletion. See the documentation for the sub-commands for more detail. There will be no modification sub-commands. The sub-commands will rely on new RESTful web services. The sub-commands for object upload and download will retry on errors and support resuming interrupted transfers.
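
As a rough illustration of the retry-and-resume behavior, a client-side loop might look something like the sketch below. The endpoint, the X-Bytes-Received header, and the class names are hypothetical, not the actual ieeg interfaces.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.RandomAccessFile;
import java.net.HttpURLConnection;
import java.net.URL;

/**
 * Illustrative retry-and-resume upload loop for the ieeg upload sub-command.
 * The endpoint, header name, and error handling are assumptions, not the
 * actual ieeg web service interface.
 */
public class ResumableUploadSketch {
    private static final int MAX_RETRIES = 5;

    public static void upload(URL objectUrl, RandomAccessFile file)
            throws IOException, InterruptedException {
        long length = file.length();
        for (int attempt = 1; attempt <= MAX_RETRIES; attempt++) {
            try {
                long offset = bytesAlreadyReceived(objectUrl); // ask the server how far we got
                if (offset >= length) {
                    return; // nothing left to send
                }
                HttpURLConnection conn = (HttpURLConnection) objectUrl.openConnection();
                conn.setDoOutput(true);
                conn.setRequestMethod("PUT");
                // Content-Range tells the server which part of the file this request carries.
                conn.setRequestProperty("Content-Range",
                        "bytes " + offset + "-" + (length - 1) + "/" + length);
                file.seek(offset);
                try (OutputStream out = conn.getOutputStream()) {
                    byte[] buf = new byte[64 * 1024];
                    int n;
                    while ((n = file.read(buf)) > 0) {
                        out.write(buf, 0, n);
                    }
                }
                if (conn.getResponseCode() == HttpURLConnection.HTTP_OK) {
                    return; // upload complete
                }
            } catch (IOException e) {
                // fall through and retry from wherever the server says we left off
            }
            Thread.sleep(1000L * attempt); // simple backoff before retrying
        }
        throw new IOException("upload failed after " + MAX_RETRIES + " attempts");
    }

    /** Hypothetical HEAD request returning the server's current byte count for the object. */
    private static long bytesAlreadyReceived(URL objectUrl) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) objectUrl.openConnection();
        conn.setRequestMethod("HEAD");
        String received = conn.getHeaderField("X-Bytes-Received");
        return received == null ? 0L : Long.parseLong(received);
    }
}
```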

Portal

The Portal will display a recording's objects in the search results tree under the recording. Clicking on an object in tree view will replace the Dataset Details pane with a pane displaying the object's metadata. It will be possible to upload new objects and download existing objects in the Portal. It will not be possible to modify an object's metadata in the Portal. PDF files should open in the Portal as they do now.

Metadata-entry app

The metadata-entry app will have to be modified to allow adding time series to existing recordings. It can do this now via the original screens, but not through the S3 file key import. This will be necessary since we are only providing services to upload objects, not time series. Expected initial workflow: a user creates a recording and uploads objects, possibly including time series data. A Portal admin downloads the time series files and converts them to MEF. The Portal admin then uploads the MEF to the appropriate place and uses the new metadata-app functionality to import the time series information.

Web Services

We will add new RESTful web services for recording creation and deletion and for object upload, download, and deletion. The upload and download services should support resuming interrupted transfers. For now, the only client for the creation and deletion services will be the expanded ieeg program, but the upload and download services will be used by both the Portal and ieeg, and so need to work with both SessionTokens and signatures. We already have examples of this.
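
For illustration only, a resumable download service could be expressed as a JAX-RS resource along these lines; whether the new services use JAX-RS, and the paths, auth handling, and storage access shown here, are all assumptions.

```java
import javax.ws.rs.GET;
import javax.ws.rs.HeaderParam;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.core.Response;

/**
 * Illustrative JAX-RS resource for object download with resume support via the
 * standard Range header. Paths, auth, and storage access are placeholders.
 */
@Path("/objects/{objectId}")
public class ObjectDownloadResource {

    @GET
    public Response download(@PathParam("objectId") long objectId,
                             @HeaderParam("Range") String range,
                             @HeaderParam("Authorization") String credentials) {
        // 1. Authenticate: accept either a SessionToken (Portal) or a request
        //    signature (ieeg command line). Details omitted.
        // 2. Check read permission on the parent recording's ACL.
        // 3. Look up the object's metadata and its S3 location.
        byte[] bytes = loadObjectBytes(objectId); // placeholder; a real service would stream from S3

        if (range == null) {
            return Response.ok(bytes)
                    .header("Content-Length", bytes.length)
                    .build();
        }
        // Resume: serve from the requested start offset, e.g. "bytes=1048576-".
        long start = Long.parseLong(range.substring("bytes=".length(), range.indexOf('-')));
        byte[] rest = java.util.Arrays.copyOfRange(bytes, (int) start, bytes.length);
        return Response.status(Response.Status.PARTIAL_CONTENT)
                .header("Content-Range",
                        "bytes " + start + "-" + (bytes.length - 1) + "/" + bytes.length)
                .entity(rest)
                .build();
    }

    private byte[] loadObjectBytes(long objectId) {
        throw new UnsupportedOperationException("storage lookup not shown");
    }
}
```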

EEGServiceImpl

We'll need to get object metadata to the Portal through EEGServiceImpl. We'll also need a service for object deletion.

Note: the snapshot-related services are now in SnapshotServiceImpl.
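
A sketch of what those service methods might look like; the interface name, method signatures, and DTO are illustrative, not the actual EEGServiceImpl/SnapshotServiceImpl API.

```java
import java.util.Date;
import java.util.List;

/**
 * Illustrative service methods for surfacing object metadata in the Portal and
 * deleting objects. Names, parameters, and the DTO are assumptions only.
 */
public interface RecordingObjectService {

    /** Metadata for one uploaded object, mirroring the recording_object table. */
    class RecordingObjectInfo implements java.io.Serializable {
        public long recordingObjectId;
        public String fileName;
        public String internetMediaType;
        public String md5Hash;
        public long sizeBytes;
        public String creator;
        public Date createTime;
        public String description;
        public String json;
    }

    /** Returns the objects under a recording for display in the search results tree. */
    List<RecordingObjectInfo> getRecordingObjects(String sessionToken, String recordingId);

    /** Deletes an object; requires write permission on the parent recording. */
    void deleteRecordingObject(String sessionToken, long recordingObjectId);
}
```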

IDataSnapshotServer

As usual, the Portal and Web Services will use IDataSnapshotServer to talk to the database.

Tables

New tables:

For an object's metadata:

recording_object

  1. recording_object_id: key
  2. obj_version: for optimistic locking
  3. recording_id: parent recording
  4. file_key: nullable S3 key to the file
  5. file_localname: [reserved] nullable string field for local filesystem copy of the file
  6. creator: the User who uploaded the object
  7. create_time: the timestamp of the upload
  8. internet_media_type: e.g. application/octet-stream
  9. md5_hash: MD5 hash of the file for integrity check
  10. size_bytes: size of the file in bytes
  11. description: a user supplied description
  12. json: a user supplied JSON string for any additional object metadata

description and json are optional. Everything else is required.
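
If the persistence layer maps tables with JPA-style annotations, recording_object might look roughly like the sketch below; the class name and mapping details are illustrative.

```java
import java.util.Date;
import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.Table;
import javax.persistence.Temporal;
import javax.persistence.TemporalType;
import javax.persistence.Version;

/**
 * Illustrative mapping of the recording_object table, assuming a JPA-style
 * persistence layer; names and mapping details are a sketch only.
 */
@Entity
@Table(name = "recording_object")
public class RecordingObject {
    @Id
    @GeneratedValue
    @Column(name = "recording_object_id")
    private Long recordingObjectId;

    @Version
    @Column(name = "obj_version")
    private int objVersion;                 // optimistic locking

    @Column(name = "recording_id", nullable = false)
    private Long recordingId;               // parent recording

    @Column(name = "file_key")
    private String fileKey;                 // nullable S3 key to the file

    @Column(name = "file_localname")
    private String fileLocalname;           // [reserved] local filesystem copy

    @Column(name = "creator", nullable = false)
    private String creator;                 // uploading user

    @Temporal(TemporalType.TIMESTAMP)
    @Column(name = "create_time", nullable = false)
    private Date createTime;

    @Column(name = "internet_media_type", nullable = false)
    private String internetMediaType;

    @Column(name = "md5_hash", nullable = false)
    private String md5Hash;

    @Column(name = "size_bytes", nullable = false)
    private long sizeBytes;

    @Column(name = "description")
    private String description;             // optional

    @Column(name = "json")
    private String json;                    // optional extra metadata
}
```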

We'll need a table for the processing "queue". An entry in recording_object_task represents an object being in the processing pipeline. There is at most one entry per object.

recording_object_task

  1. recording_object_task_id: primary key
  2. obj_version: for optimistic locking
  3. recording_object_id: foreign key to recording_object
  4. status: a status string

recording_object_task_metadata

  1. recording_object_task_metadata_id: primary key
  2. obj_version: for optimistic locking
  3. recording_object_task_id: foreign key to recording_object_task
  4. value: additional JSON metadata field for capturing any additional pipeline processing details, e.g., next stage, what has been processed
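
Under the same assumption, the task table could be mapped along these lines, with a unique constraint capturing the at-most-one-entry-per-object rule; the status values are placeholders.

```java
import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.Table;
import javax.persistence.UniqueConstraint;
import javax.persistence.Version;

/**
 * Illustrative mapping of recording_object_task, again assuming a JPA-style
 * layer. The unique constraint enforces "at most one entry per object".
 */
@Entity
@Table(name = "recording_object_task",
       uniqueConstraints = @UniqueConstraint(columnNames = "recording_object_id"))
public class RecordingObjectTask {
    @Id
    @GeneratedValue
    @Column(name = "recording_object_task_id")
    private Long recordingObjectTaskId;

    @Version
    @Column(name = "obj_version")
    private int objVersion;

    @Column(name = "recording_object_id", nullable = false)
    private Long recordingObjectId;      // foreign key to recording_object

    @Column(name = "status", nullable = false)
    private String status;               // e.g. PENDING, RUNNING, ERROR (illustrative values)
}
```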

Modifications to existing tables:

  • recording will have to lose the columns for the zip and pdf files, but this can happen in a later release.

S3

The object files will be kept in a directory called 'objects' in the parent recording's directory. This will keep them separate from the MEF files.
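
For example, assuming the AWS SDK for Java and placeholder bucket and directory names, an upload into that layout might look like:

```java
import java.io.File;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.PutObjectRequest;

/**
 * Illustrative upload of an object file into the parent recording's
 * "objects" directory in S3. Bucket and directory names are placeholders.
 */
public class ObjectKeySketch {
    public static void putObjectFile(String bucket, String recordingDir, File file) {
        // Keep object files separate from the MEF files under the recording.
        String key = recordingDir + "/objects/" + file.getName();
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
        s3.putObject(new PutObjectRequest(bucket, key, file));
    }
}
```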

Migration

  • The zip and pdf files for existing recordings will have to be moved by a migration program, both in S3 and in the database. The MD5 hashes and sizes will need to be calculated.
  • Project names should be made unique. There are currently duplicates which will have to be dealt with.
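
A minimal sketch of the hash and size calculation the migration program will need, assuming it works from local copies of the files (the existing zips and PDFs may not have a pre-calculated MD5 in S3):

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/**
 * Illustrative calculation of the MD5 hash and size that the migration
 * program will need to fill in for existing zip and pdf files.
 */
public class FileChecksumSketch {
    public static String md5Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[64 * 1024];
            int n;
            while ((n = in.read(buf)) != -1) {
                md5.update(buf, 0, n);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md5.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static long sizeBytes(Path file) throws IOException {
        return Files.size(file);
    }
}
```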

Processing Objects

There will be a system which will allow processing tasks to be run on the objects.

Examples of such tasks:

  1. Extraction of additional metadata contained in MEF headers for entry into the database.
  2. Extraction of text from the objects for search.
  3. Conversion of EDF to MEF.
  4. Running an annotation-creating detector over the MEF.

  • Note that some core functionality, namely a separate server container and time series processing components, already exists in UploadPipeline. This currently works on the command line and will need to be extended to run as a background daemon thread (see the sketch below).

Requirements:

  1. There needs to be a queue of objects waiting to be processed.
  2. The task processing system needs to handle errors.
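
A minimal sketch of what the background daemon could look like, assuming a simple polling loop over recording_object_task; the status values and helper methods are placeholders, not the actual UploadPipeline interfaces.

```java
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/**
 * Illustrative background daemon that polls the recording_object_task "queue"
 * and runs processing tasks. The task lookup, processing step, and status
 * values are placeholders, not the actual UploadPipeline interfaces.
 */
public class ObjectTaskPoller {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(runnable -> {
                Thread t = new Thread(runnable, "object-task-poller");
                t.setDaemon(true); // run as a background daemon thread
                return t;
            });

    public void start() {
        scheduler.scheduleWithFixedDelay(this::pollOnce, 0, 30, TimeUnit.SECONDS);
    }

    private void pollOnce() {
        for (long taskId : findPendingTaskIds()) {
            try {
                process(taskId);                  // e.g. extract MEF metadata, convert EDF, run a detector
                updateStatus(taskId, "DONE");
            } catch (Exception e) {
                // Requirement: the task processing system needs to handle errors.
                updateStatus(taskId, "ERROR");
            }
        }
    }

    // Placeholders for database access and the actual processing work.
    private List<Long> findPendingTaskIds() { return java.util.Collections.emptyList(); }
    private void process(long taskId) { }
    private void updateStatus(long taskId, String status) { }
}
```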
