RCatch
Overview
RCatch implements the automated downloading of SRAs.
In order to accomplish this task, RCatch provides a standalone RESTful service which responds to HTTP requests for downloading SRAs from NCBI. RCatch also implements uploading SRAs to the S3 interface in CLIMB.
RCatch is written in Python and uses Flask to offer web-based APIs, PostgreSQL to store information, and ZeroMQ as a broker between them.
Because communication occasionally breaks down, RCatch tries five protocols in succession when downloading short reads, and gives up with an ERROR message only if all of them fail. These protocols are:
- FASTQ/GZIP file from ENA/EBI, using Aspera.
- FASTQ/BZIP2 file from DRA/DDBJ, using Aspera.
- FASTQ/GZIP file from ENA/EBI, using FTP.
- SRA file from SRA/NCBI, using Aspera.
- SRA file from SRA/NCBI, using FTP.
RCatch automatically reformats FASTQ/BZIP2 files or SRA files into FASTQ/GZIP files after downloading.
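The fallback behaviour described above can be sketched as a simple loop. This is only an illustration, not RCatch's actual code: the function download_with_fallback and the fetchers mapping are hypothetical, and the short protocol names (ENA, DRA, ENA-FTP, SRA, SRA-FTP) are assumed to correspond, in order, to the five protocols listed above.

```python
from typing import Callable, Dict, List, Tuple

# Protocol names in the default order (assumed to match the list above).
DEFAULT_PROTOCOLS = ["ENA", "DRA", "ENA-FTP", "SRA", "SRA-FTP"]


def download_with_fallback(
    run: str,
    fetchers: Dict[str, Callable[[str], str]],
    protocols: List[str] = DEFAULT_PROTOCOLS,
) -> Tuple[str, str]:
    """Try each protocol in order; return (protocol, path) for the first success.

    `fetchers` is a hypothetical mapping from protocol name to a function
    that downloads one run and returns the local file path.
    """
    errors = []
    for name in protocols:
        try:
            return name, fetchers[name](run)
        except Exception as exc:  # any failure moves on to the next protocol
            errors.append(f"{name}: {exc}")
    # Every protocol failed: give up with an ERROR message.
    raise RuntimeError(f"ERROR: all protocols failed for {run}: " + "; ".join(errors))
```

A caller would supply one real download function per protocol; the loop itself only encodes the order and the final error.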
API
RCatch URI
The RCatch URI used in the examples below depends on the configuration of the system on which RCatch runs.
Downloading short reads
The get method is used to download short reads.
The example below downloads the short reads with accession codes ERR036000 and ERR036001:
http://<RCatch Host>/ET/RCatch/get?run=ERR036000,ERR036001
Another example of downloading short reads that also controls the priority is provided below:
http://<RCatch Host>/ET/RCatch/get?run=ERR036002,ERR036003,ERR036004&priority=-1
The lower the number, the higher the priority. By default priority=0.
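To illustrate the ordering rule, the sketch below models a task queue in which lower priority numbers are served first. It is an assumption-laden toy, not a description of RCatch internals: the functions submit and next_run are hypothetical, and the first-in-first-out handling of equal priorities is a guess.

```python
import heapq
from itertools import count

# Tie-breaker so that tasks with equal priority keep their submission order.
# (Assumption: the documentation does not say how RCatch resolves ties.)
_counter = count()


def submit(queue, run, priority=0):
    """Add a run to the queue; priority defaults to 0 as in the get method."""
    heapq.heappush(queue, (priority, next(_counter), run))


def next_run(queue):
    """Remove and return the run with the lowest priority number."""
    return heapq.heappop(queue)[2]


queue = []
submit(queue, "ERR036000")
submit(queue, "ERR036001")
submit(queue, "ERR036002", priority=-1)
order = [next_run(queue) for _ in range(3)]
```

Here ERR036002 is served first even though it was submitted last, because -1 is lower than the default of 0.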
Deleting downloaded short reads
An example of deleting downloaded short reads is provided below which deletes short reads with accession codes ERR036002, ERR036003 and ERR036004:
http://<RCatch Host>/ET/RCatch/del?run=ERR036002,ERR036003,ERR036004
Priority of tasks
The example below changes the priority of an existing task, for the short read with accession code ERR036000, to priority=2:
http://<RCatch Host>/ET/RCatch/priority?run=ERR036000&priority=2
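The request URLs shown so far share one pattern, so a client could assemble them with a small helper. This is a minimal sketch: rcatch_url is a hypothetical function, and rcatch.example.org is a placeholder for the configuration-dependent RCatch host.

```python
from urllib.parse import urlencode


def rcatch_url(host: str, method: str, **params) -> str:
    """Build the URL for one RCatch method, e.g. get, del or priority.

    Commas are kept literal so that lists of accession codes look the
    same as in the examples in this page.
    """
    return f"http://{host}/ET/RCatch/{method}?" + urlencode(params, safe=",")


HOST = "rcatch.example.org"  # placeholder, not a real RCatch host

get_url = rcatch_url(HOST, "get", run="ERR036002,ERR036003,ERR036004", priority=-1)
delete_url = rcatch_url(HOST, "del", run="ERR036002,ERR036003,ERR036004")
priority_url = rcatch_url(HOST, "priority", run="ERR036000", priority=2)
```

The resulting strings can be passed to any HTTP client; the format of the response is not described here, so the sketch stops at building the URL.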
Controlling the choice of downloading protocols
The source method controls which downloading protocols are used and the order in which they are tried.
The example below changes the downloading protocols so that DRA is skipped and SRA is tried before ENA:
http://<RCatch Host>/ET/RCatch/source?sites=SRA,SRA-FTP,ENA,ENA-FTP
The default order of downloading protocols is: ENA,DRA,ENA-FTP,SRA,SRA-FTP.
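A client might want to check a protocol list before sending it to the source method. The sketch below does that under stated assumptions: build_source_url is a hypothetical helper, rcatch.example.org is a placeholder host, and the set of valid names is taken from the default order given above. How RCatch itself reacts to unknown names is not documented here.

```python
from typing import Sequence

# Protocol names taken from the default order in this page.
KNOWN_SITES = ("ENA", "DRA", "ENA-FTP", "SRA", "SRA-FTP")


def build_source_url(host: str, sites: Sequence[str]) -> str:
    """Return the source URL for the given protocol order.

    Raises ValueError for an empty list, unknown names or duplicates,
    so that mistakes are caught before the request is sent.
    """
    if not sites:
        raise ValueError("at least one site is required")
    unknown = [s for s in sites if s not in KNOWN_SITES]
    if unknown:
        raise ValueError(f"unknown sites: {unknown}")
    if len(set(sites)) != len(sites):
        raise ValueError("duplicate sites in list")
    return f"http://{host}/ET/RCatch/source?sites=" + ",".join(sites)


url = build_source_url("rcatch.example.org", ["SRA", "SRA-FTP", "ENA", "ENA-FTP"])
```

Leaving a name out of the list, as with DRA here, is how a protocol is skipped.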