Analyzing SRA data with Lazypipe

In this example we will:

  • set up the NCBI SRA Toolkit
  • download SARS-CoV-2 SRA libraries
  • run a default Lazypipe analysis with sbatch-lazypipe

Prerequisites:

  • access to a CSC Puhti account
  • the Lazypipe 2.1 CSC module
  • basic knowledge of the Unix command line

Download data

In the following example we assume that the shell variable $data points to your data directory and $lazypipe to your Lazypipe installation/working directory. Defining these variables lets you copy and paste the example code into your terminal without editing. To set the variables, type:

data=/my/data/path
lazypipe=/my/lazypipe/installation/directory
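Every later command expands $data and $lazypipe, so an empty or misspelled variable silently turns a path such as $data/sra into /sra. A minimal sketch of a guard, assuming bash; check_paths is a hypothetical helper, not part of Lazypipe or the SRA Toolkit:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Lazypipe): verify that $data and
# $lazypipe are set and non-empty, then create the data directory.
check_paths() {
    local name
    for name in data lazypipe; do
        # ${!name:-} reads the variable whose name is stored in $name.
        if [ -z "${!name:-}" ]; then
            echo "error: \$$name is not set" >&2
            return 1
        fi
    done
    mkdir -p "$data" || return 1
    echo "data=$data lazypipe=$lazypipe"
}
```

Run check_paths right after setting the variables; it prints both paths and returns a non-zero status if either one is missing.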

Start by configuring the SRA Toolkit with the vdb-config utility (included in the toolkit). In the interactive configuration, set the SRA Toolkit download directory to $data/sra or any other convenient location:

module load biokit
vdb-config -i

Download any SRA library from project PRJNA605983 (Illumina HiSeq/MiSeq libraries sequenced from five patients in the early stage of the SARS-CoV-2 outbreak in Wuhan, China). In the following example we download SRR11092062 to $data/sra:

mkdir -p $data/sra
prefetch SRR11092062
fasterq-dump --split-files --outdir $data/sra SRR11092062
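Before submitting a long job it is worth confirming that the FASTQ files were written and, for paired-end data, that both files hold the same number of reads. A minimal sketch, assuming uncompressed four-line FASTQ records and the <accession>_1.fastq / <accession>_2.fastq naming used by the sbatch-lazypipe command in this example; count_reads and check_pairs are hypothetical helper names:

```shell
#!/usr/bin/env bash
# Hypothetical helpers: count reads in FASTQ files written by fasterq-dump
# and check that the _1 and _2 files agree. Assumes 4-line FASTQ records.
count_reads() {
    # Each FASTQ record spans exactly four lines.
    echo $(( $(wc -l < "$1") / 4 ))
}

check_pairs() {
    local dir=$1 acc=$2
    local r1="$dir/${acc}_1.fastq" r2="$dir/${acc}_2.fastq"
    local n1 n2
    if [ ! -s "$r1" ]; then
        echo "error: $r1 is missing or empty" >&2
        return 1
    fi
    n1=$(count_reads "$r1")
    if [ -s "$r2" ]; then
        n2=$(count_reads "$r2")
        if [ "$n1" -ne "$n2" ]; then
            echo "error: $n1 reads in _1 but $n2 in _2" >&2
            return 1
        fi
        echo "$acc: $n1 read pairs"
    else
        echo "$acc: $n1 reads (_1 file only)"
    fi
}
```

For this example: check_pairs "$data/sra" SRR11092062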

Download the human reference genome (GRCh38) for host read filtering into $data/hostgen:

mkdir -p $data/hostgen
wget https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/GCA_000001405.15_GRCh38_genomic.fna.gz -P $data/hostgen
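The genome file is large, and an interrupted wget leaves a truncated file that only fails later, inside the running job. A minimal sketch of an integrity check using gzip -t, which decompresses without writing output and exits non-zero on truncated or corrupt input; check_gzip is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: confirm that a downloaded .gz file is complete and
# uncorrupted before passing it to --hostgen.
check_gzip() {
    local f=$1
    if [ ! -s "$f" ]; then
        echo "error: $f is missing or empty" >&2
        return 1
    fi
    # gzip -t tests the compressed stream and fails on truncation/corruption.
    if gzip -t "$f" 2>/dev/null; then
        echo "ok: $f"
    else
        echo "error: $f failed the gzip integrity test; download it again" >&2
        return 1
    fi
}
```

For this example: check_gzip "$data/hostgen/GCA_000001405.15_GRCh38_genomic.fna.gz"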

Load the required modules and run the main analysis steps (pre,ass,rea,ann,rep,stats,pack,clean) with default settings using sbatch-lazypipe. When prompted, set the run time to 6 h (6:0:0), memory to 120 GB and the number of cores to 30.

cd $lazypipe
mkdir -p $data/results
module load r-env-singularity
module load biokit
module load lazypipe
sbatch-lazypipe -1 $data/sra/SRR11092062_1.fastq --pipe main -r $data/results -t 16 --hostgen $data/hostgen/GCA_000001405.15_GRCh38_genomic.fna.gz -v

Check that the job is queued or running:

sacct
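Instead of re-running sacct by hand, you can poll a single job until Slurm reports a final state. A minimal sketch, assuming you know the job ID (for example from the JobID column of sacct); is_final_state and wait_for_job are hypothetical helpers, and the 5-minute polling interval is an arbitrary choice:

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of Lazypipe): classify Slurm job states
# and poll sacct until the job has finished.
is_final_state() {
    # Terminal Slurm states; PENDING, RUNNING etc. fall through to "not final".
    case $1 in
        COMPLETED|FAILED|CANCELLED*|TIMEOUT|OUT_OF_MEMORY|NODE_FAIL|PREEMPTED|BOOT_FAIL|DEADLINE)
            return 0 ;;
        *)
            return 1 ;;
    esac
}

wait_for_job() {
    local jobid=$1 state
    while :; do
        # -X: allocation line only, -n: no header, State%30: wide column
        state=$(sacct -j "$jobid" -X -n -o State%30 | head -n 1 | tr -d ' ')
        echo "job $jobid: ${state:-UNKNOWN}"
        is_final_state "$state" && break
        sleep 300
    done
}
```

Usage: wait_for_job 1234567 (replace the ID with your own). A final state other than COMPLETED means you should inspect the job's log files before looking for results.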

After your job completes, check your results in $data/results:

ls -l $data/results

This directory should contain an SRR11092062 subdirectory and the packed result archive SRR11092062.tar.gz.
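The .tar.gz archive is a convenient single file to copy off Puhti. A minimal sketch for checking that it is readable and unpacking a copy into a separate directory, assuming a standard gzip-compressed tar archive; inspect_results and the destination directory are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify that a results archive can be read, report
# how many entries it holds, and extract it into a separate directory.
inspect_results() {
    local archive=$1 dest=$2
    # tar -tzf lists members; a non-zero exit means the archive is unreadable.
    if ! tar -tzf "$archive" > /dev/null; then
        echo "error: cannot read $archive" >&2
        return 1
    fi
    echo "$(tar -tzf "$archive" | wc -l | tr -d ' ') entries in $archive"
    mkdir -p "$dest"
    tar -xzf "$archive" -C "$dest"
}
```

For example: inspect_results "$data/results/SRR11092062.tar.gz" "$HOME/lazypipe_results_copy"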

End notes

This completes our example on Analyzing SRA with Lazypipe.

For more information, see the Lazypipe User Guides.
