Analyzing SRA data with Lazypipe
In this example we will:
- set up the NCBI SRA Toolkit
- download a SARS-CoV-2 SRA library
- run the default Lazypipe analysis with sbatch-lazypipe
Prerequisites:
- access to a CSC Puhti account
- Lazypipe 2.1 CSC module
- basic knowledge of Unix command line
Download data
In the following example we assume that the $data variable points to your data directory and $lazypipe to your Lazypipe installation/working directory. Defining these variables lets you copy and paste the example code into your terminal without editing. To set the variables, type:
data=/my/data/path
lazypipe=/my/lazypipe/installation/directory
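Before continuing, it can help to confirm that both variables point to existing directories. The following is a minimal sketch; `require_dir` is a hypothetical helper name introduced here for illustration and is not part of Lazypipe or the CSC modules.

```shell
# Hypothetical helper (not part of Lazypipe): succeed only if the
# argument is non-empty and names an existing directory.
require_dir() {
    if [ -z "$1" ] || [ ! -d "$1" ]; then
        echo "Not a directory: '$1'" >&2
        return 1
    fi
}
```

For example, `require_dir "$data" && require_dir "$lazypipe"` prints an error for the first variable that is unset or points to a missing path.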
Start by configuring the SRA Toolkit with the vdb-config utility (included in the toolkit). Set the SRA Toolkit download directory to $data/sra or any other convenient location:
module load biokit
vdb-config -i
Download any SRA library from project PRJNA605983 (Illumina HiSeq/MiSeq libraries sequenced from five patients at the early stage of the SARS-CoV-2 outbreak in Wuhan, China). In the following example we download SRR11092062 to $data/sra:
mkdir -p $data/sra
prefetch SRR11092062
fasterq-dump --split-files --outdir $data/sra SRR11092062
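A quick sanity check after the download is to count the reads in the resulting FASTQ file. The sketch below assumes an uncompressed FASTQ with the standard four lines per record; `count_fastq_reads` is a hypothetical helper name, not an SRA Toolkit command.

```shell
# Hypothetical helper: print the number of records in an uncompressed
# FASTQ file, assuming exactly four lines per record.
count_fastq_reads() {
    lines=$(wc -l < "$1") || return 1
    echo $((lines / 4))
}
```

For example, `count_fastq_reads $data/sra/SRR11092062_1.fastq` prints the read count for the file used later in this example. A count of zero suggests the download or conversion did not complete.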
Download the human reference genome for host read filtering and save it to $data/hostgen:
mkdir -p $data/hostgen
wget https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/GCA_000001405.15_GRCh38_genomic.fna.gz -P $data/hostgen
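Large downloads are occasionally truncated, so it may be worth testing the archive before starting a long run. This is a minimal sketch built on `gzip -t`, which checks a compressed file without extracting it; `check_gzip` is a hypothetical helper name.

```shell
# Hypothetical helper: report whether a .gz file passes gzip's
# built-in integrity test (gzip -t reads the file without extracting it).
check_gzip() {
    if gzip -t "$1" 2>/dev/null; then
        echo "OK: $1"
    else
        echo "Corrupt or incomplete: $1" >&2
        return 1
    fi
}
```

For example, `check_gzip $data/hostgen/GCA_000001405.15_GRCh38_genomic.fna.gz`. If the check fails, repeating the wget command is the simplest fix.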
Load the required modules and run the main analysis steps (pre,ass,rea,ann,rep,stats,pack,clean) with default settings using sbatch-lazypipe. When prompted, set the run time to 6 h (6:0:0), memory to 120 GB and cores to 30.
cd $lazypipe
mkdir -p $data/results
module load r-env-singularity
module load biokit
module load lazypipe
sbatch-lazypipe -1 $data/sra/SRR11092062_1.fastq --pipe main -r $data/results -t 16 --hostgen $data/hostgen/GCA_000001405.15_GRCh38_genomic.fna.gz -v
Check that the job is queued or running:
sacct
After your job completes, check your results in $data/results:
ls -l $data/results
This directory should contain the SRR11092062 results directory and the packed results file SRR11092062.tar.gz.
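To peek inside the packed results without extracting them, the archive's top-level entries can be listed. This is a sketch only: `tar_top_level` is a hypothetical helper name, and the internal layout of the Lazypipe archive is not described here, so the output is for inspection rather than validation.

```shell
# Hypothetical helper: print the unique top-level entries of a
# gzip-compressed tar archive without extracting it.
tar_top_level() {
    tar -tzf "$1" | sed 's|^\./||' | cut -d/ -f1 | sort -u
}
```

For example, `tar_top_level $data/results/SRR11092062.tar.gz`.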
End notes
This completes our example on analyzing SRA data with Lazypipe.
For more information, see the Lazypipe User Guides.