enterobase-web / EnteroBase Backend Pipeline: RCatch

RCatch

Overview

RCatch implements the automated downloading of SRAs.

RCatch provides a standalone RESTful service that responds to HTTP requests to download SRAs from NCBI. It also uploads SRAs to the S3 interface in CLIMB.

RCatch is written in Python and uses Flask to offer web-based APIs, PostgreSQL to store information, and ZeroMQ as a broker between them.

Because communication occasionally breaks down, RCatch tries five protocols in succession when downloading short reads before giving up (with an ERROR message). These protocols are:

  1. FASTQ/GZIP file from ENA/EBI, using Aspera.
  2. FASTQ/BZIP2 file from DRA/DDBJ, using Aspera.
  3. FASTQ/GZIP file from ENA/EBI, using FTP.
  4. SRA file from SRA/NCBI, using Aspera.
  5. SRA file from SRA/NCBI, using FTP.

RCatch automatically reformats FASTQ/BZIP2 files or SRA files into FASTQ/GZIP files after downloading.
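The fallback behaviour described above can be sketched as a loop over the protocols in order. This is an illustration only, not RCatch's actual code: the function name download_with_fallback, the fetchers mapping and the shape of the error message are hypothetical, and the site labels are the ones used by the source method.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Site labels in the default order documented for RCatch.
DEFAULT_ORDER: Tuple[str, ...] = ("ENA", "DRA", "ENA-FTP", "SRA", "SRA-FTP")


def download_with_fallback(
    run: str,
    fetchers: Dict[str, Callable[[str], str]],  # hypothetical: site label -> download function
    order: Sequence[str] = DEFAULT_ORDER,
) -> Tuple[str, str]:
    """Try each site in turn and return (site, local_path) for the first success."""
    failures: List[str] = []
    for site in order:
        fetch = fetchers.get(site)
        if fetch is None:
            failures.append(f"{site}: no fetcher configured")
            continue
        try:
            return site, fetch(run)
        except Exception as exc:  # any failure moves on to the next protocol
            failures.append(f"{site}: {exc}")
    # Every protocol failed: give up with an ERROR message.
    raise RuntimeError(f"ERROR: could not download {run} ({'; '.join(failures)})")
```

Each fetcher is assumed to return the path of the downloaded file and to raise an exception on failure, so a broken connection to one archive simply moves the loop on to the next one.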

API

RCatch URI

The RCatch URI depends on the configuration of the system on which RCatch runs. In the examples below, <RCatch Host> stands for the host of that system.

Downloading short reads

The get method is used to download short reads. The example below downloads the short reads with accession codes ERR036000 and ERR036001:

http://<RCatch Host>/ET/RCatch/get?run=ERR036000,ERR036001

The optional priority parameter sets the priority of the download, as in the example below:

http://<RCatch Host>/ET/RCatch/get?run=ERR036002,ERR036003,ERR036004&priority=-1

The lower the number, the higher the priority. By default priority=0.
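A client can assemble these request URLs programmatically. The sketch below is a minimal example using only the Python standard library. The helper name build_get_url and the host rcatch.example.org are placeholders, and the code only builds the URL without sending it.

```python
from typing import Iterable, Optional
from urllib.parse import urlencode


def build_get_url(host: str, runs: Iterable[str], priority: Optional[int] = None) -> str:
    """Build a URL for the get method (hypothetical helper, not part of RCatch)."""
    params = {"run": ",".join(runs)}
    if priority is not None:
        # Omit the parameter to let RCatch apply its default (priority=0).
        params["priority"] = str(priority)
    # safe="," keeps the comma separators readable instead of encoding them as %2C.
    return f"http://{host}/ET/RCatch/get?" + urlencode(params, safe=",")


url = build_get_url("rcatch.example.org", ["ERR036002", "ERR036003", "ERR036004"], priority=-1)
```

The resulting URL can then be requested with any HTTP client. The format of the response is not described here, so the sketch stops at building the request.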

Delete downloaded short reads

The del method deletes downloaded short reads. The example below deletes the short reads with accession codes ERR036002, ERR036003 and ERR036004:

http://<RCatch Host>/ET/RCatch/del?run=ERR036002,ERR036003,ERR036004

Priority of tasks

The priority method changes the priority of an existing task. The example below sets the priority of the task for the short read with accession code ERR036000 to priority=2:

http://<RCatch Host>/ET/RCatch/priority?run=ERR036000&priority=2
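To make the effect of a priority change concrete, the sketch below orders a set of pending tasks by priority, lowest number first. It only illustrates the documented rule and is not RCatch's scheduler. The function processing_order is hypothetical, and breaking ties by submission order is an assumption.

```python
from typing import Dict, List


def processing_order(priorities: Dict[str, int]) -> List[str]:
    """Return accession codes sorted so that the lowest priority number comes first."""
    # sorted() is stable, so tasks with equal priority keep their insertion order
    # (assumed here to be the submission order).
    return sorted(priorities, key=lambda run: priorities[run])


# Three tasks: two at the default priority 0 and one submitted with priority=-1.
tasks = {"ERR036000": 0, "ERR036001": 0, "ERR036002": -1}
before = processing_order(tasks)

# Equivalent of calling priority?run=ERR036000&priority=2
tasks["ERR036000"] = 2
after = processing_order(tasks)
```

Raising the number for ERR036000 to 2 moves it behind the tasks that have lower numbers.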

Controlling the choice of downloading protocols

The source method controls which downloading protocols are used and in what order. The example below changes the downloading protocols so that RCatch does not download from DRA and tries SRA before ENA:

http://<RCatch Host>/ET/RCatch/source?sites=SRA,SRA-FTP,ENA,ENA-FTP

The default order of downloading protocols is: ENA,DRA,ENA-FTP,SRA,SRA-FTP.
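Because a mistyped site name would change which archives are tried, a client may want to check the list before sending it. The sketch below validates a proposed order against the five documented site labels and builds the corresponding URL. The helper build_source_url and the host name are placeholders, and how RCatch itself reacts to unknown or repeated labels is not documented here.

```python
from typing import Sequence

# The five site labels from the default order.
KNOWN_SITES = ("ENA", "DRA", "ENA-FTP", "SRA", "SRA-FTP")


def build_source_url(host: str, sites: Sequence[str]) -> str:
    """Build a URL for the source method after checking the site labels (hypothetical helper)."""
    if not sites:
        raise ValueError("at least one site is required")
    unknown = [s for s in sites if s not in KNOWN_SITES]
    if unknown:
        raise ValueError(f"unknown site(s): {', '.join(unknown)}")
    if len(set(sites)) != len(sites):
        raise ValueError("each site may be listed only once")
    # The labels contain only letters and hyphens, so no URL encoding is needed.
    return f"http://{host}/ET/RCatch/source?sites=" + ",".join(sites)


# Skip DRA and try SRA before ENA, as in the example above.
source_url = build_source_url("rcatch.example.org", ["SRA", "SRA-FTP", "ENA", "ENA-FTP"])
```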
