ieeg upload-directory tutorial

Uploading Data for Import

The ieeg upload-directory command uploads EDF time series data and other files so that they can be imported into the IEEG portal as a dataset. If you have non-EDF data, contact us first to see if we can import it. The examples below assume EDF.

To upload the EDF and any other files that should be part of the dataset you will need the IEEG CLI tool. The latest version can be found on the Downloads page at the ieeg-cli-<version>-dist.zip link, where '<version>' is replaced by the current version. Please read the Installation Steps section of the general ieeg-cli documentation, which covers the basic software requirements, installation, and the ieeg.properties configuration file. You will need to create this ieeg.properties file in your home directory as a one-time setup step before you can upload any files. A simple test of your ieeg-cli and ieeg.properties setup is to ask for the size of a dataset you have read access to, for example:

$ ./ieeg-cli-<version>/ieeg size 'Study 005'
3.438GB

Study 005 is world readable in the portal so this should work if your setup is correct.
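The setup test above can be wrapped in a small shell function. This is only a sketch: IEEG_CMD is a hypothetical variable introduced here for illustration, and the function assumes that the CLI exits with a non-zero status or prints nothing when something is wrong, which this tutorial does not state.

```shell
# Hypothetical variable: path to the ieeg script. Replace <version>
# with the version you actually unpacked.
IEEG_CMD="${IEEG_CMD:-./ieeg-cli-<version>/ieeg}"

# check_ieeg_setup
# Returns 1 if ieeg.properties is missing from the home directory,
# 2 if the size query fails or prints nothing, and 0 otherwise.
check_ieeg_setup() {
    if [ ! -f "$HOME/ieeg.properties" ]; then
        echo "missing $HOME/ieeg.properties" >&2
        return 1
    fi
    # 'Study 005' is world readable, so a size should come back.
    cis_out=$("$IEEG_CMD" size 'Study 005') || cis_out=
    if [ -z "$cis_out" ]; then
        echo "ieeg size returned nothing; check ieeg.properties" >&2
        return 2
    fi
    echo "$cis_out"
}
```

Calling check_ieeg_setup once before a large upload catches a missing or misconfigured ieeg.properties early.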

The easiest way to upload to the portal is to have all (and only) the files for a single dataset in a directory and use the CLI to upload the contents. Once it is uploaded the EDF will be converted so that the data can be accessed in the online viewer and through the Matlab toolbox. Text and PDF files will be made available for viewing. ZIP files will be downloadable.

For human datasets do not upload any files which include patient identifying information.
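One place where identifying information commonly hides is the EDF header itself. The EDF format stores an 80-byte "local patient identification" field at byte offset 8 of the file. The sketch below prints that field so you can inspect it before uploading. The function name is made up for this example, and it looks at that one field only: the recording field, annotations, and any accompanying files still need to be reviewed by hand.

```shell
# edf_patient_field FILE
# Prints the 80-byte patient identification field of an EDF header
# (bytes 8 to 87), with trailing padding spaces removed.
edf_patient_field() {
    dd if="$1" bs=1 skip=8 count=80 2>/dev/null | sed 's/ *$//'
}
```

For example, edf_patient_field myData/tsData.edf shows what the importer would receive in that field.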

So if I want to create a dataset from tsData.edf and images.zip I place both files in a directory, say myData.
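Since the upload takes everything in the directory, it can help to build that directory from scratch. The following is a minimal sketch with a made-up helper name, stage_dataset, which copies the named files into a new directory and refuses to reuse an existing one so that no stray files come along.

```shell
# stage_dataset DIR FILE...
# Copies FILE... into a new directory DIR so that DIR holds all (and
# only) the files for one dataset, then lists the result.
stage_dataset() {
    sd_dir=$1
    shift
    if [ -e "$sd_dir" ]; then
        echo "refusing to reuse existing path: $sd_dir" >&2
        return 1
    fi
    mkdir -p "$sd_dir" || return 1
    for sd_file in "$@"; do
        cp -- "$sd_file" "$sd_dir/" || return 1
    done
    ls -1 "$sd_dir"
}
```

With the files from the example, stage_dataset myData tsData.edf images.zip would create myData and list its two files.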

ieeg upload-directory

The ieeg sub-command we will run is upload-directory. Aside from the directory to upload, its only other argument is -n, an instruction string that tells the import process what to do with the data. For human data this argument has the form:

-n 'Human_Data/<Organization Name>/<Dataset Name>'

For non-human data it has the form:

-n 'Animal_Data/<Organization Name>/<Dataset Name>'

The Dataset Name needs to be unique. The Organization Name can be re-used to group together datasets from the same organization. The quotes are only necessary if one of the names has spaces but they are always safe to include.
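If you script your uploads, the -n string can be assembled from its parts. The helper below is a hypothetical sketch, not part of the CLI. Rejecting names that contain a '/' is an assumption made here because the slash is the separator in the instruction string. The tutorial itself does not say how such names are handled.

```shell
# build_name_arg KIND ORG DATASET
# KIND is "human" or "animal". Prints the string to pass to -n.
build_name_arg() {
    case ${1:-} in
        human)  bna_prefix=Human_Data ;;
        animal) bna_prefix=Animal_Data ;;
        *) echo "kind must be 'human' or 'animal'" >&2; return 1 ;;
    esac
    if [ -z "${2:-}" ] || [ -z "${3:-}" ]; then
        echo "organization and dataset names are required" >&2
        return 1
    fi
    # Assumption: '/' separates the parts, so it cannot appear in a name.
    case "$2$3" in
        */*) echo "names must not contain '/'" >&2; return 1 ;;
    esac
    printf '%s/%s/%s\n' "$bna_prefix" "$2" "$3"
}
```

The result would then be passed in double quotes, as in -n "$(build_name_arg human 'University of X' userName-myData-test-01)", so that spaces in the names survive.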

Say your organization is University of X. Then, to create a dataset from the directory described above, the upload run would look like this:

$ ./ieeg-cli-<version>/ieeg upload-directory -n 'Human_Data/University of X/userName-myData-test-01' myData
Starting upload to bucket [org-ieeg-inbox] with prefix [3696/Human_Data/University of X/userName-myData-test-01]
Scheduling upload of /Users/userName/myData/tsData.edf
Scheduling upload of /Users/userName/myData/images.zip
Upload of tsData.edf 0% done
Upload of images.zip 0% done
Uploading finished: 
Successful: tsData.edf
Successful: images.zip
Registered directory [3696/Human_Data/University of X/userName-myData-test-01/.] in bucket [org-ieeg-inbox]

This will create a new dataset called userName-myData-test-01 from the uploaded files.

It is important to realize that when this command finishes it only means that the files have been uploaded for import. The import itself is another process that will be kicked off automatically on our server. Depending on the size of the data and the number of uploads from other users it may take some time for your upload to be imported and made available in the portal. You can check if your dataset has been created yet by using the ieeg size sub-command:

$ ./ieeg-cli-<version>/ieeg size userName-myData-test-01
17.079MB

If you get a size back then your import has finished and you should now be able to see your dataset in the portal.
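Because the import runs separately, you may want to poll instead of re-running the command by hand. Below is a rough sketch of such a loop. IEEG_CMD and wait_for_dataset are names invented for this example, and the check that the output looks like a size (a number followed by a unit ending in B, as in 17.079MB) is an assumption based only on the sample output shown here.

```shell
# Hypothetical variable: path to the ieeg script. Replace <version>.
IEEG_CMD="${IEEG_CMD:-./ieeg-cli-<version>/ieeg}"

# wait_for_dataset NAME [INTERVAL_SECONDS] [MAX_TRIES]
# Polls 'ieeg size NAME' until it prints something that looks like a
# size, then prints it and returns 0. Returns 1 after MAX_TRIES.
wait_for_dataset() {
    wfd_name=$1
    wfd_interval=${2:-300}
    wfd_tries=${3:-48}
    wfd_i=0
    while [ "$wfd_i" -lt "$wfd_tries" ]; do
        wfd_out=$("$IEEG_CMD" size "$wfd_name" 2>/dev/null) || wfd_out=
        case $wfd_out in
            [0-9]*B) echo "$wfd_out"; return 0 ;;
        esac
        wfd_i=$((wfd_i + 1))
        if [ "$wfd_i" -lt "$wfd_tries" ]; then
            sleep "$wfd_interval"
        fi
    done
    return 1
}
```

With the defaults this checks every five minutes for up to four hours. Adjust both to the size of your upload.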

If the size of the files is large and I don’t want to tie up my terminal or worry about getting disconnected before the upload completes I would run the upload command this way:

$ nohup ./ieeg-cli-<version>/ieeg upload-directory -n 'Human_Data/University of X/userName-myData-test-01' myData&
[1] 5200
$ appending output to nohup.out
$

So I get my prompt back right away and the upload will continue even if I log off. I can check the progress by reading the nohup.out file.
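Reading nohup.out can also be scripted. The sketch below assumes the log looks like the sample output shown earlier, with "Successful:" lines followed by a final "Registered directory" line. Other versions of the CLI may print something different, so treat the patterns as assumptions. The function name is made up for this example.

```shell
# upload_status [LOGFILE]
# Reports whether the upload log contains the final "Registered
# directory" line. Returns 0 if it does, 1 if the upload is still
# running (or stopped before registering).
upload_status() {
    us_log=${1:-nohup.out}
    if grep -q '^Registered directory ' "$us_log"; then
        # grep -c exits non-zero when the count is 0, hence "|| true".
        us_ok=$(grep -c '^Successful: ' "$us_log" || true)
        echo "finished: $us_ok file(s) uploaded"
    else
        echo "still running; last line: $(tail -n 1 "$us_log")"
        return 1
    fi
}
```

Remember that "finished" here refers to the upload only. The import on the server still has to complete before the dataset appears in the portal.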

Projects and Permissions

When a dataset is created through the upload/import process the user running ieeg upload-directory is always made an owner of the dataset. If, as above, a -n argument with no project name is used then no other user has access.

However, by modifying our -n argument we can automatically give access to a group of users by using projects. Here is the basic info on projects and permissions:

Projects

A project is a collection of users and datasets. The users in a project are broken up into two groups: admins and team.

A member of a project's admins group can add and remove users and datasets from the project.

A member of a project's team group can see the project.

If a user is in neither group then he or she will not see the project.

Membership in a project group does not by itself give a user any access to datasets in the project. This is one of the most important things to keep in mind when using projects to control access to datasets:

Adding a dataset to a project in the browser UI does not automatically give project members access to the dataset.

In order for project members to access a dataset you will need to add the project user groups to that dataset's access control list. This is the other most important thing to keep in mind:

Adding a project's user groups to a dataset's access control list in the browser UI does not automatically add the dataset to the project.

Dataset Permissions

  • Read permission: A user with read permission on a dataset can search for and open the dataset, read its data through the Matlab toolbox, and download any file associated with the dataset.

  • Edit permission: A user with edit permission on a dataset has read permission plus can add and remove annotations from the dataset. He or she can also add or remove files associated with the dataset.

  • Owner permission: A user with owner permission on a dataset has edit permission and can also modify the dataset's access control list. He or she can also delete the dataset.
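The three levels nest: edit includes read, and owner includes edit. As an illustration only (these functions are invented here and are not part of the portal or the CLI), the ordering can be written as:

```shell
# perm_rank PERM
# Maps a permission name to its position in the read < edit < owner order.
perm_rank() {
    case ${1:-} in
        read)  echo 1 ;;
        edit)  echo 2 ;;
        owner) echo 3 ;;
        *) return 1 ;;
    esac
}

# perm_allows GRANTED NEEDED
# Succeeds if a user holding GRANTED can do what NEEDED requires.
# Returns 2 if either name is not read, edit, or owner.
perm_allows() {
    pa_have=$(perm_rank "${1:-}") || return 2
    pa_need=$(perm_rank "${2:-}") || return 2
    [ "$pa_have" -ge "$pa_need" ]
}
```

So perm_allows owner edit succeeds, while perm_allows read edit does not.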

Specifying a Project

If you want to use a project to share your dataset, then first you need to have a project which includes you in the admins group. Here are the steps to create a new project. As the project creator you will automatically be added to both the admins and team groups.

  1. Select the Data tab and click the "Create Project" button. (You may not see that button if a dataset is selected in the tree viewer. Select the root level 'Datasets' item and the button should appear.)

  2. In the small dialog that pops up enter a name for your project and click OK.

  3. Your project should now appear under the Projects item in the tree viewer. Select the project and click the "Open Project" button.

  4. A new tab opens up representing the project. Here is where you can add users and datasets. Let's add the users.

  5. Select the 'PROJECT ADMINS' item in the left-hand side of the project's tab. You should see a list of all users. Check off the ones you want to add to the project's admins group. The UI has room for improvement here but if you use your browser's built-in search capability you should be able to find the users you want.

  6. Once you have added your project's admins you can add the team members in the same way by selecting the 'PROJECT TEAM' item and checking off the desired users there.

Using the -n argument

Assume I have a project called 'Project 123' and I want to create my new dataset in this project. Then instead of running the ieeg upload-directory command given above, I run it with an addition to the -n argument: I add the project name between the organization and dataset names.

$ ./ieeg-cli-<version>/ieeg upload-directory -n 'Human_Data/University of X/Project 123/userName-myData-test-01' myData
Starting upload to bucket [org-ieeg-inbox] with prefix [3696/Human_Data/University of X/Project 123/userName-myData-test-01]
Scheduling upload of /Users/userName/myData/tsData.edf
Scheduling upload of /Users/userName/myData/images.zip
Upload of tsData.edf 0% done
Upload of images.zip 0% done
Uploading finished: 
Successful: tsData.edf
Successful: images.zip
Registered directory [3696/Human_Data/University of X/Project 123/userName-myData-test-01/.] in bucket [org-ieeg-inbox]

If you do this (and you are an admin in 'Project 123') then dataset userName-myData-test-01 will be added to 'Project 123'. The project's admins will have owner permission and the project's team will have read only permission.

Using an ieeg-dataset.ini file

To have more control over access to your dataset you can use an ieeg-dataset.ini file. Say you still want to add your new dataset to Project 123 but you want the project's admins to have only read access, just like the project's team. Then, before uploading, add a file called ieeg-dataset.ini to your myData directory and give it the following contents:

[Project.Project 123]

admins=read

Then when you run ieeg upload-directory you can use either form of the -n argument. The Project. section header in the ini file tells the importer to add the new dataset to the project named after the dot. The admins=read line overrides the default owner permission for the admins of 'Project 123' and gives them read permission instead. There is no need to mention the project's team, since their default read permission is what we want.

Or say you are also an admin in a second project, 'Trial Z', and you want this new dataset added to that project as well as 'Project 123'. You would then use the following in your ieeg-dataset.ini file:

[Project.Project 123]

admins=read

[Project.Trial Z]

admins=edit
team=edit

Then, when your dataset is created it will be added to both projects. The 'Project 123' admins and team will have read access. The 'Trial Z' admins and team will have edit access.
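If you prepare many uploads, the project sections can be generated instead of typed. Here is a small sketch with a hypothetical helper, add_project_section, that appends one section per call. Leaving a permission empty simply omits the key, so the importer's defaults described in this tutorial apply.

```shell
# add_project_section FILE PROJECT [ADMINS_PERM] [TEAM_PERM]
# Appends a [Project.<name>] section to FILE. An omitted or empty
# permission writes no key for that group.
add_project_section() {
    {
        printf '[Project.%s]\n' "$2"
        if [ -n "${3:-}" ]; then printf 'admins=%s\n' "$3"; fi
        if [ -n "${4:-}" ]; then printf 'team=%s\n' "$4"; fi
        printf '\n'
    } >> "$1"
}
```

Calling it as add_project_section myData/ieeg-dataset.ini 'Project 123' read and then as add_project_section myData/ieeg-dataset.ini 'Trial Z' edit edit would produce the two-project file shown above.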

Another use of the ieeg-dataset.ini file is to control world permission. By default a new dataset has no world permission. But if you want to share your new dataset with all registered IEEG users then you would include an ieeg-dataset.ini file in your uploaded directory with the following contents.

world=read

Then any registered user will have read-only access. You can also use world=edit but world=owner is not allowed.

ieeg-dataset.ini format

In the global section (the lines that come before any section header) the lines world=read or world=edit are allowed and give the respective permission to any registered IEEG user.

The global section can also contain the property channels, whose value is a comma-separated list of channel names. When importing an EDF file this list will be consulted and only the listed channels will be imported into the dataset. For example:

channels=Fp1,Fp2,F7,F3,Fz,F4,F8,A1,A2,C3,Cz,C4,EKG

The channels property can also be used to rename channels when importing an EDF file. For example, if channels looks like

channels=Fp1,Fp2,F7,F3,Fz,F4,F8,A1,T3:T7,C3,Cz,C4,T4:T8,A2,T5:P7,P3,Pz,P4,T6:P8,O1,O2,EKG:ECG

this creates a mapping from EDF to portal labels with T3, T4, T5, T6, and EKG mapped to T7, T8, P7, P8, and ECG respectively. The other channel names will be imported as is. The : separates the EDF name on the left from the portal name on the right in each comma-separated value. This can be useful if you need to match channel names with an already existing montage in the portal.
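To double-check a channels value before uploading, you can expand it into one line per channel. This sketch follows the syntax as described here and uses an invented function name. It is not how the importer itself is implemented.

```shell
# print_channel_map CHANNELS
# Prints one "EDF_NAME -> PORTAL_NAME" line per comma-separated entry.
# Entries without a ':' keep their name unchanged.
print_channel_map() {
    printf '%s\n' "$1" | tr ',' '\n' | while IFS= read -r pcm_entry; do
        case $pcm_entry in
            *:*) printf '%s -> %s\n' "${pcm_entry%%:*}" "${pcm_entry#*:}" ;;
            *)   printf '%s -> %s\n' "$pcm_entry" "$pcm_entry" ;;
        esac
    done
}
```

Running print_channel_map 'Fp1,T3:T7,EKG:ECG' lists Fp1 unchanged and shows T3 and EKG mapped to T7 and ECG.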

The section names have the form Project.<project name> where project name is an existing project which contains the uploader in its admins group.

In a project section the allowed keys are admins and team. The allowed values are read, edit, and owner. If no keys are present under a project's section then that project's admins are given owner permission and team is given read permission.

Lines starting with a # are comments and ignored by the importer.
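The format rules above are simple enough to check before uploading. The following is a rough validator written with awk, under the assumptions that keys and values are written without spaces around the = sign (as in all the examples here) and that blank lines are ignored. It cannot check that a project exists or that you are one of its admins. The function name is hypothetical.

```shell
# validate_dataset_ini FILE
# Prints a message for each line that breaks the format rules and
# returns non-zero if any were found.
validate_dataset_ini() {
    awk '
        /^[ \t]*$/ || /^#/ { next }        # blank lines and comments
        /^\[.*\]$/ {                       # section header
            if ($0 !~ /^\[Project\..+\]$/) bad("unknown section")
            in_project = 1
            next
        }
        {
            eq = index($0, "=")
            if (eq == 0) { bad("expected key=value"); next }
            key = substr($0, 1, eq - 1)
            val = substr($0, eq + 1)
            if (!in_project) {
                if (key == "world") {
                    if (val != "read" && val != "edit")
                        bad("world must be read or edit")
                } else if (key != "channels") {
                    bad("unknown global key")
                }
            } else if (key != "admins" && key != "team") {
                bad("unknown project key")
            } else if (val != "read" && val != "edit" && val != "owner") {
                bad("permission must be read, edit, or owner")
            }
        }
        function bad(msg) {
            printf "line %d: %s: %s\n", NR, msg, $0
            errors++
        }
        END { exit (errors > 0) }
    ' "$1"
}
```

A file containing world=owner, for instance, is reported as an error, matching the rule that owner is not allowed as a world permission.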

Manual permissions

If you need to provide access after the dataset has been created, here is how to manually use projects to grant a group of users permissions to see your dataset:

  1. If your project is open, close it. Datasets generally don't appear in the project's 'PROJECT DATASETS' list unless you have interacted with them in the current login session prior to opening the project. So closing the project before opening any datasets you wish to add to the project is the best bet.

  2. Open a dataset you want to add to the project. Once it loads click the 'Share snapshot' button that appears in the button bar by your username.

  3. A new dialog appears that allows you to modify the dataset's access control list. Click the 'Add Project Group' button. Scroll through to find your project's groups. You'll notice that the groups are listed in order of project name and that there are always two groups per project: '<project name> admins' and '<project name> team'.

  4. Check your project's two groups and click OK. You'll see that they are automatically added to the dataset's access control list with read permission. You can change this to either owner or edit. Or if you made a mistake you can click remove to remove the group entirely. See Dataset Permissions for an explanation of dataset permissions.

  5. Click OK to close the Permissions dialog.

  6. Repeat steps 2-5 for each dataset you have added or need to add to the project.

  7. Open your project again. You should now see all of the datasets you have opened listed under 'PROJECT DATASETS'. If any still need to be added to the project check their boxes. When done you can close the project.

Now when you import a new dataset and want to add it to an existing project you can just repeat steps 2-7 for the new dataset.
