
ieeg upload

Example:

ieeg upload --controlFile controlFiles/myDatasetControl.json *.edf ~/Documents/image.zip /some/directory/report.pdf

will upload all the named files, including the control file, to an S3 bucket and register the control file in our DB.

The --controlFile parameter is required. If no other files are listed then nothing is uploaded.

If the optional parameter --collectionName "my dataset" is used then the name will form part of the file keys used to store the files in S3.

If the optional parameter --bucket my-bucket is used then the named bucket will be used to store the files. See below for more information about how a bucket and AWS credentials are selected.

The control file will contain information about the other named files which will allow a dataset to be created from the files by the processing pipeline.

The control file will not contain the bucket or full file keys of the other files since these will not be known by the writer of the file or the user running the upload. We will store the bucket and a file key prefix in the DB table where the control file is registered. This means all the files in the control file must appear in the same bucket and with the same file key prefix.
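
The exact schema of the control file is not specified here. As a purely hypothetical illustration (every field name below is an assumption), a control file for the example command above might look something like this. Note that it lists only base names, not buckets or full file keys:

    {
      "files": [
        { "name": "recording1.edf", "type": "timeseries" },
        { "name": "image.zip", "type": "images" },
        { "name": "report.pdf", "type": "report" }
      ]
    }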

By default the files are uploaded to a bucket owned by the portal. If a user wants to use his or her own bucket for uploads then either a bucket name and AWS access key id need to be part of his or her profile in Drupal or the --bucket option should be used on the command line. If an AWS access key id is in the user's profile then the user is required to have the secret key corresponding to the access key id in the ieeg.properties file. See cli. It is up to the user to make sure the credentials allow write access to the bucket. We will work with the user to make sure that the portal will have read access to the files.
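
For illustration only, the secret key entry in ieeg.properties might look something like the following. The property name shown is an assumption; the real keys are documented on the cli page:

    # Hypothetical property name; see the cli page for the real one.
    # Secret key matching the AWS access key id stored in the Drupal profile.
    aws.secret.key=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY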

The command will first call an IEEG web service which returns upload location and credential information. It always returns an AWS access key id, a bucket name, and a unique file key prefix.

What happens next depends on which bucket is used and on whether the user's Drupal profile contains an AWS access key ID:

  • Default bucket, no AWS access key ID in the Drupal profile: the first web service call generates temporary credentials, so in addition to the items mentioned above it also returns a secret key and a session token. When it is time to upload, the files are uploaded to a default inbox bucket owned by the portal using the temporary credentials. The temporary credentials have write access only to the file key prefix in the default bucket, and the prefix is usable only for this run of the program.
  • Default bucket, AWS access key ID in the Drupal profile: user-supplied credentials will not have write access to the default bucket, so this is an error.
  • User-supplied bucket, no AWS access key ID in the Drupal profile: the IAM user that issues our temporary credentials will not have write access to a non-default bucket, so this is an error.
  • User-supplied bucket, AWS access key ID in the Drupal profile: nothing additional is returned by the first web service call; the secret key will be obtained from the ieeg.properties file.
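
As a sketch only (the field names here are assumptions, not the actual wire format), the response in the default-bucket, temporary-credential case might look like:

    {
      "accessKeyId": "ASIAEXAMPLE",
      "secretKey": "...",
      "sessionToken": "...",
      "bucket": "portal-inbox",
      "prefix": "uploads/3f2a9c"
    }

In the user-supplied-bucket case, only accessKeyId, bucket, and prefix would be present.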

Now it is time to upload. The files are uploaded directly to S3 using the AWS credentials described earlier. When constructing the S3 file key for each file, we look only at the base name of the path specified on the command line. For example, if ~/Documents/image.zip was on the command line, the S3 file key will look like

<file key prefix returned from first web service call>[/<optional collectionName from command line>]/image.zip
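
A minimal sketch of this key construction in Python (function and variable names are illustrative, not part of the tool):

    import os

    def s3_file_key(prefix, local_path, collection_name=None):
        # Only the base name of the command line path is used;
        # directories like ~/Documents are dropped.
        parts = [prefix.rstrip("/")]
        if collection_name:
            parts.append(collection_name)
        parts.append(os.path.basename(local_path))
        return "/".join(parts)

    # s3_file_key("uploads/3f2a9c", "~/Documents/image.zip", "my dataset")
    # -> "uploads/3f2a9c/my dataset/image.zip"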

The files will be uploaded using the S3 reduced redundancy storage class.
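
The upload step could be sketched in Python with boto3 as follows. This is a sketch only, assuming the credentials dict uses the field names from the hypothetical response above; the production client may differ:

    import boto3

    def upload_files(creds, bucket, keys_and_paths):
        # Temporary credentials include a session token; user-supplied
        # credentials from ieeg.properties do not.
        s3 = boto3.client(
            "s3",
            aws_access_key_id=creds["accessKeyId"],
            aws_secret_access_key=creds["secretKey"],
            aws_session_token=creds.get("sessionToken"),
        )
        for key, path in keys_and_paths:
            s3.upload_file(path, bucket, key,
                           ExtraArgs={"StorageClass": "REDUCED_REDUNDANCY"})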

If the control file is uploaded successfully then it is registered via the final IEEG web service call. This creates an entry in the new control_file table, which will have these columns:

  • control_file_id
  • obj_version
  • bucket
  • prefix: this will be <file key prefix returned from first web service call>[/<optional collectionName from command line>]
  • name: this will be the control file's base name, myDatasetControl.json in the example above
  • creator_id: FK to the user who ran the program
  • create_time
  • status: A string the pipeline can use to indicate status
  • metadata: JSON metadata managed by the pipeline

Since recording_objects don't have tasks, we'll delete the recording_object_task and recording_object_task_metadata tables.

Problems and missing pieces in this design

  • There is no way for a user to fix mistakes in the object files or the control file.
  • There is no way to split the upload of files for a single control file into multiple runs of the program. In particular, no way to fix things when some uploads fail.

ieeg register

ieeg register --bucket my-bucket --prefix myobjects/dataset1 --name control.json

will register a control file that is already in S3 without uploading anything: it creates a control_file entry with the given bucket, prefix, and name.
