Formatting your data for diXa using ISAcreator


You will need the following:

  1. ISAcreator

    a. Visit the ISA tools page, scroll down and click on your operating system icon under ISAcreator
    b. Download and unzip/install the application

  2. The diXa configurations

    a. Download the latest available version here
    b. Unzip this archive (it contains xml files with the configurations for each assay type)
    c. Move the unzipped folder into the Configurations folder of your ISAcreator unzip/install directory (from step 1b). You should now have a path like ISAcreator-1.x/Configurations/isaconfig-diXa2.x/

Note: never add any other files to the configuration folder, otherwise ISAcreator will not recognize it as a valid configuration.
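The note above can be checked before starting ISAcreator. The following is a minimal sketch, assuming that a valid configuration folder holds nothing but .xml files (the source only says the archive contains xml files and that no other files may be added). The function name is hypothetical and is not part of ISAcreator.

```python
from pathlib import Path


def unexpected_config_files(config_dir):
    """Return the names of entries in config_dir that are not .xml files.

    Assumption: anything other than a regular .xml file (including
    subfolders and hidden files) counts as unexpected.
    """
    return sorted(
        entry.name
        for entry in Path(config_dir).iterdir()
        if not (entry.is_file() and entry.suffix.lower() == ".xml")
    )
```

Calling it on the configuration folder from step 2c should return an empty list. Hidden files created by the operating system or by an unzip tool would show up here too.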

Mapping your existing annotation to ISA-tab

You need to use the "normal" version of ISAcreator (middle option) for this exercise. After you select a configuration (use the diXa-specific configuration – see requirements above) and log in, select "create new experiment description" and then "map from existing file".

Select your spreadsheet file. Supported formats are txt, csv and xls. If your tabular file has a different extension, just rename it to txt.

Make sure your file does not contain blank columns or special characters (such as μ).
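A quick pre-check of a text spreadsheet can catch both problems before mapping. This is only a sketch under two assumptions that the source does not spell out: a "blank column" is taken to mean a column in which every cell, header included, is empty, and a "special character" is taken to mean any non-ASCII character. The function name is hypothetical.

```python
import csv
import io


def find_problems(text, delimiter="\t"):
    """Report blank columns and non-ASCII characters in delimited text."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    problems = []
    if not rows:
        return problems

    # A column is blank when every cell in it (header included) is empty.
    width = max(len(row) for row in rows)
    for col in range(width):
        cells = [row[col] if col < len(row) else "" for row in rows]
        if all(cell.strip() == "" for cell in cells):
            problems.append(f"column {col + 1} is blank")

    # Flag every cell that contains characters outside the ASCII range.
    for line_no, row in enumerate(rows, start=1):
        for col, cell in enumerate(row, start=1):
            bad = sorted({ch for ch in cell if ord(ch) > 127})
            if bad:
                problems.append(
                    f"line {line_no}, column {col}: non-ASCII {''.join(bad)!r}"
                )
    return problems
```

Read the file contents as text and pass them in, with delimiter="," for csv files. An empty result means neither problem was found under the assumptions above.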

Select the kind of assay performed on your data and click "+add assay". Here we have selected "transcription profiling using DNA microarray on Affymetrix".

Now you need to map columns in your spreadsheet to their counterparts in the ISA-tab configuration you have selected, starting with the sample annotations. You can map an ISA-tab column to a spreadsheet column, to a literal (a fixed string), or to a combination of the two. Required columns are shown in red. In the example below we have mapped the ISA-tab Subject ID column to a concatenation of GROUP_ID, a dash (-), and INDIVIDUAL_ID from our spreadsheet.
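To make the idea of a combined mapping concrete, here is a small illustration of what the Subject ID mapping produces for one spreadsheet row. It is not how ISAcreator stores mappings internally; the ("column", ...) and ("literal", ...) tuples, the function name and the sample values are all hypothetical.

```python
def apply_mapping(row, parts):
    """Build one target value from spreadsheet columns and fixed strings."""
    pieces = []
    for kind, value in parts:
        if kind == "column":
            pieces.append(str(row[value]))  # value taken from the spreadsheet
        elif kind == "literal":
            pieces.append(value)  # fixed string, the same for every row
        else:
            raise ValueError(f"unknown mapping part: {kind!r}")
    return "".join(pieces)


# Subject ID = GROUP_ID + "-" + INDIVIDUAL_ID, as in the example above.
SUBJECT_ID = [
    ("column", "GROUP_ID"),
    ("literal", "-"),
    ("column", "INDIVIDUAL_ID"),
]

row = {"GROUP_ID": "G1", "INDIVIDUAL_ID": "007"}  # made-up values
subject_id = apply_mapping(row, SUBJECT_ID)
```

With these made-up values, subject_id is "G1-007".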

TIP: You can use the magnifying glass icon to peek into your spreadsheet.

Do the same for the assay annotations.

Once you are finished mapping, fill out any other useful information about the Investigation you have just mapped. By default, the Study Identifier field will read Mapped Study. You may want to replace that with something more descriptive.

Once you are done, you can save your ISA-tab investigation as a zip archive. Your data files need to be located in the study folder created by ISAcreator (by default in ISAcreator/isatab files/"investigation name"). The filenames must match the names displayed in the field Array Data File.

Also, make sure you use the same Sample Names in ISAcreator as the ones in your data files! Otherwise there is no way to link your annotations to the actual data.
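The file name requirement can be verified with a short script before creating the archive. This sketch assumes the assay table in the study folder is a tab-delimited text file with an Array Data File header, as is usual for ISA-tab assay files. The function name is hypothetical, and the path to your assay file is something you need to look up yourself.

```python
import csv
from pathlib import Path


def missing_data_files(assay_file, study_dir, column="Array Data File"):
    """Return file names listed under `column` that are absent from study_dir."""
    present = {p.name for p in Path(study_dir).iterdir() if p.is_file()}
    with open(assay_file, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        if column not in (reader.fieldnames or []):
            raise KeyError(f"no {column!r} column in {assay_file}")
        listed = {row[column].strip() for row in reader if row[column]}
    # Names are compared exactly, so differences in case count as a mismatch.
    return sorted(listed - present)
```

An empty list means every file named in the assay table was found in the study folder.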

For a list of supported data formats, refer to the end of the document.

Go to File -> Create ISArchive. Your investigation will validate against your configuration, and when it does, it will be saved as a zip archive in the location you have selected. This archive contains your annotations in ISA-tab format, as well as any data files you have linked to your study (such as tables or cel files).

Starting from scratch

Start the "normal" version of ISAcreator (middle option), select a configuration (use the diXa-specific configuration – see requirements section above) and login. Select "create new experiment description", then "create manually".

First, create your first study and name it. Later, you will be able to create and add more studies to your investigation.

Next you need to register the main information about the study: an ID, title, description, submission and public release dates. Then create as many array types as were used on your study’s samples. In the screen below, we have set up a study with only one assay: a microarray transcription profiling. You can add a new array, or delete an existing one, at any time.

You will then have to register all the required information about the study itself (containing information describing the samples used) and the arrays (containing the metadata describing all the different factors applied to the study samples). Whenever possible, use the recommended ontology terms to describe the factors of your experiments, as described in the following example for tagging Homo sapiens:

To specify factor levels, you get a screen where you can enter all the levels used for this factor and their units (here mapped to OBI:microgram). If no units are required, uncheck the box "use unit?" at the top.

To fill out the different fields, you can either enter data manually (in the case of small studies) or copy/paste from a spreadsheet program (like Excel).

Make sure you use the same Sample Names in ISAcreator as the ones in your data files! Otherwise there is no way to link your annotations to the actual data.

Now you can create an ISArchive as before (File -> Create ISArchive).

Using the Excel templates

You will need the ISA-config-diXa2.x-template.xlsx file downloadable here.

This is an export of the diXa ISA configuration in Excel. It follows the same principles for data collection (i.e. describing your Investigation, Study and Assay parameters) as the methods above. If you are having major difficulties with ISAcreator, pick this method.

You’ll notice that headers contain a red triangle in the top right corner. Hover over them, and you’ll see an explanation of what you need to enter in that column.

Make sure you fill out the investigation AND studySample tabs AND at least one assay tab (for example, transcription_micro for transcriptomics microarray experiments) for each experiment you wish to submit.

When you’re done, please place your Excel sheet in a folder with your data and zip everything up. Name the zip archive the same as your experiment ID. This is the archive you must submit.
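The packaging step can also be scripted. This is a minimal sketch using Python's standard shutil.make_archive. It places the folder's contents at the top level of the archive, which is an assumption, since the source does not say whether the files should sit at the top level or inside a subfolder. The function name and the example experiment ID are hypothetical.

```python
import shutil
from pathlib import Path


def make_submission_archive(folder, experiment_id, out_dir="."):
    """Zip the contents of `folder` into <out_dir>/<experiment_id>.zip.

    Returns the path of the archive that was written. Keep out_dir outside
    of `folder` so the archive does not end up inside itself.
    """
    base = Path(out_dir) / experiment_id  # make_archive appends ".zip"
    return shutil.make_archive(str(base), "zip", root_dir=folder)
```

For example, make_submission_archive("my_experiment", "EXP-001") would write EXP-001.zip to the current directory, where both names are placeholders for your own folder and experiment ID.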

Make sure you use the same Sample Names in your Excel sheet as the ones in your data files! Otherwise there is no way to link your annotations to the actual data.

Supported raw data formats

NGS data

  • Illumina QSEQ
  • BAM
  • Bowtie
  • Eland
  • SAM
  • BED
  • Matrix
  • Methyl-C
  • Wiggle

MS data

  • AB Sciex Wiff
  • AB Sciex t2d
  • Agilent
  • Agilent QTOF
  • Bruker MALDI
  • Bruker YEP
  • Ciphergen XML
  • Leco
  • Shimadzu
  • Thermo
  • Waters
  • MzXML
  • NetCDF
  • Spectrometry Binary Format (.sbf)
  • Spectrometry Text Format (.txt, .stf)
  • Two Column Files (.two)
  • MSP files

Affymetrix data

  • CEL Files

Numerical data

(i.e. derived/condensed data with or without auxiliary data, e.g., p-values)

  • Affymetrix CHP
  • Agilent
  • Combimatrix
  • Fluidigm
  • Illumina
  • NanoString
  • NimbleGen
  • TaqMan
  • GEO SOFT
  • ABS (Genedata format)
  • GDA (Genedata format)