
Main development repository for Galaxy. Active development happens here, and this repository is thus intended for those working on Galaxy development. See http://bitbucket.org/galaxy/galaxy-dist/ for a more stable repository intended for end-users.

Clone this repository:
$ hg clone http://bitbucket.org/galaxy/galaxy-central/

NOTE: The ENCODE dataset tool is deprecated. Its datasets come from the ENCODE pilot project and are now outdated. The tool will be replaced with a new "Data Library" containing ENCODE data.

Adding Additional ENCODE Datasets

To add datasets to the ENCODE import tool, edit the file /cache/encode_datasets/encode_datasets.loc on g2.bx.psu.edu.

Currently, only files adhering to the Browser Extensible Data (BED) format are allowed.
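Before registering a file, it can help to confirm that it at least looks like BED. The sketch below is a rough, hypothetical check (`looks_like_bed` is not part of Galaxy). It assumes tab-separated columns and only tests the three required BED columns (chrom, chromStart, chromEnd), skipping comment, `track`, and `browser` header lines. It is not a full BED validator.

```python
from typing import Iterable


def looks_like_bed(lines: Iterable[str], max_lines: int = 1000) -> bool:
    """Hypothetical sanity check: do the first data lines look like BED?

    Assumes tab-separated columns; only the first three BED columns
    (chrom, chromStart, chromEnd) are checked.
    """
    data_lines = 0
    for i, line in enumerate(lines):
        if i >= max_lines:
            break
        line = line.rstrip("\r\n")
        # Skip blank lines, comments, and UCSC track/browser header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        cols = line.split("\t")
        if len(cols) < 3:
            return False
        try:
            start, end = int(cols[1]), int(cols[2])
        except ValueError:
            return False
        # BED coordinates are non-negative and start must not exceed end.
        if start < 0 or end < start:
            return False
        data_lines += 1
    # An empty file (no data lines) is not treated as valid BED.
    return data_lines > 0
```

Usage: `with open("/cache/encode_datasets/encodeData1.bed") as f: print(looks_like_bed(f))`.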

After adding your datasets, reset the Galaxy server so that it picks up the changes.

Format of encode_datasets.loc

  • Tab-delimited file
  • Each line has 5 required fields, in the order described below
  • Lines beginning with # are ignored
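The format rules above can be sketched as a small reader. This is a hypothetical illustration, not Galaxy's own loader: `EncodeEntry` and `parse_loc` are made-up names, and ignoring blank lines and rejecting any line that does not have exactly five tab-separated fields are assumptions.

```python
from typing import Iterable, List, NamedTuple


class EncodeEntry(NamedTuple):
    group: str        # field 1: ENCODE group abbreviation, e.g. "CC"
    build: str        # field 2: database build, e.g. "hg17"
    description: str  # field 3: shown in the tool's select page and the history
    dataset_id: str   # field 4: unique ID
    path: str         # field 5: full path to the BED file


def parse_loc(lines: Iterable[str]) -> List[EncodeEntry]:
    """Hypothetical parser for encode_datasets.loc."""
    entries = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        # Lines beginning with "#" are ignored; skipping blank lines is an assumption.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 5:
            raise ValueError(
                f"line {lineno}: expected 5 tab-separated fields, got {len(fields)}"
            )
        entries.append(EncodeEntry(*fields))
    return entries
```

A space-separated line produces a single field and is rejected, which matches the common mistake of not using tabs.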

Description of Fields

First Field

  • Abbreviation of the ENCODE group to which the data belongs
  • Valid abbreviations are as follows:
      • CC = Chromatin and Chromosomes
      • GT = Genes and Transcripts
      • MSA = Multi-species Sequence Analysis
      • TR = Transcription Regulation
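The valid abbreviations can be kept as a lookup table so that a typo in the first field is caught before the entry is saved. `ENCODE_GROUPS` and `group_name` are hypothetical names, and treating the abbreviations as case-sensitive is an assumption.

```python
# Valid values for the first field, taken from the list above.
ENCODE_GROUPS = {
    "CC": "Chromatin and Chromosomes",
    "GT": "Genes and Transcripts",
    "MSA": "Multi-species Sequence Analysis",
    "TR": "Transcription Regulation",
}


def group_name(abbrev: str) -> str:
    """Return the full group name, or raise ValueError for an unknown abbreviation."""
    try:
        return ENCODE_GROUPS[abbrev]
    except KeyError:
        raise ValueError(
            f"unknown ENCODE group {abbrev!r}; expected one of {sorted(ENCODE_GROUPS)}"
        ) from None
```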

Second Field

  • Database build for which the data is valid
  • Examples:
      • hg17
      • hg16

Third Field

  • Description of the dataset
  • Displayed on the tool's selection page and in the history

Fourth Field

  • A unique ID for the dataset
  • Any combination of letters and/or numbers is acceptable, except the keyword None; if you use None, your data won't be accessible
  • The ID must be different from every other ID in the file
      • If it is not, one of the datasets will be unknown to the tool
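The ID rules above can be checked mechanically across all entries. In this hypothetical sketch (`check_ids` is not a Galaxy function), restricting IDs to ASCII letters and digits is a conservative reading of "letters and/or numbers", not a confirmed Galaxy rule.

```python
import re
from typing import Iterable, List

# Assumption: "letters and/or numbers" means ASCII letters and digits only.
_ID_PATTERN = re.compile(r"[A-Za-z0-9]+")


def check_ids(ids: Iterable[str]) -> List[str]:
    """Hypothetical check of field 4 values; returns a list of problems."""
    problems = []
    seen = set()
    for dataset_id in ids:
        if dataset_id == "None":
            # The keyword None makes the dataset inaccessible.
            problems.append("ID 'None' is reserved and makes the dataset inaccessible")
        elif not _ID_PATTERN.fullmatch(dataset_id):
            problems.append(f"ID {dataset_id!r} should contain only letters and/or numbers")
        if dataset_id in seen:
            # Duplicate IDs leave one of the datasets unknown to the tool.
            problems.append(f"ID {dataset_id!r} is used more than once")
        seen.add(dataset_id)
    return problems
```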

Fifth Field

  • The full path, including the file name, of the dataset you are adding
  • This file must be accessible to the Galaxy server
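A quick way to confirm the fifth field is to test the path with Python's standard `os` module. `check_dataset_path` is a hypothetical helper. Because `os.access` checks the permissions of the user running the script, run it as the same user the Galaxy server runs as.

```python
import os
from typing import List


def check_dataset_path(path: str) -> List[str]:
    """Hypothetical check of field 5; returns a list of problems."""
    problems = []
    if not os.path.isabs(path):
        problems.append(f"{path!r} is not a full (absolute) path")
    if not os.path.exists(path):
        problems.append(f"{path!r} does not exist (check the spelling)")
    elif not os.path.isfile(path):
        problems.append(f"{path!r} is not a regular file")
    elif not os.access(path, os.R_OK):
        # Readability is checked for the current user only.
        problems.append(f"{path!r} is not readable by this user (check permissions)")
    return problems
```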

An Example Entry

You want to add a dataset with the following characteristics:

  • Belongs in the Chromatin and Chromosomes group
  • Is based on the hg17 build
  • Has the description "Some really cool data"
  • Is located at /cache/encode_datasets/encodeData1.bed, a path accessible to the Galaxy server
  • Uses the ID encodeCCReallyCoolData, which you have checked, and double checked, is not already taken

The entry would look like this:
CC	hg17	Some really cool data	encodeCCReallyCoolData	/cache/encode_datasets/encodeData1.bed
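Because the fields must be separated by real tab characters, generating the line in code avoids accidentally typing spaces. `format_entry` is a hypothetical helper; rejecting fields that contain tabs or newlines is an assumption that follows from the file being tab-delimited with one entry per line.

```python
def format_entry(group: str, build: str, description: str,
                 dataset_id: str, path: str) -> str:
    """Hypothetical helper: join the five fields with tabs."""
    fields = [group, build, description, dataset_id, path]
    for value in fields:
        # A tab or newline inside a field would break the tab-delimited layout.
        if "\t" in value or "\n" in value:
            raise ValueError(f"field {value!r} must not contain tabs or newlines")
    return "\t".join(fields)


line = format_entry("CC", "hg17", "Some really cool data",
                    "encodeCCReallyCoolData",
                    "/cache/encode_datasets/encodeData1.bed")
```

The resulting line can then be appended to encode_datasets.loc (followed by a newline), after which the Galaxy server still needs to be reset.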

Some Questions/Answers

Why doesn't my dataset appear?

  • You didn't reset the server
      • The server must be reset before the tool is aware of the new dataset
  • You did not include all the required fields
      • Fields must be delimited by tabs
  • The file you specified isn't accessible to the Galaxy server
      • Check permissions
  • The file you specified doesn't exist
      • Check your spelling
  • You used an ID (field 4) that matches another dataset's ID
      • Or someone else reused your ID

This revision is from 2009-11-18 18:15