Running Lazypipe on Puhti
Welcome to the Running Lazypipe on Puhti module. This module is intended for practicing basic NGS analysis with Lazypipe 2.1 on the CSC Puhti supercomputer. In this module you will learn to:
- set up working environment on CSC Puhti
- run Lazypipe analysis with lazypipe.pl
- run Lazypipe analysis with sbatch-lazypipe
- save/share your results with Fairdata IDA
Prerequisites:
- account on CSC Puhti
- Lazypipe 2.1 CSC module
- no experience with Unix command line or NGS analysis is required
For more information please refer to these guides:
Exercise 1: setting up working environment
In this exercise you will set up a working environment for running Lazypipe on CSC Puhti.
Connecting to CSC Puhti server
Users new to Unix/CSC working environment:
Both MacOS and Windows users can access Puhti via Puhti web-interface. We recommend this option for all users that are new to Unix/CSC working environment:
- Login to Puhti web-interface by following the link: Puhti web-interface
- From the main Dashboard click on "Login node shell" to open the terminal
Experienced Unix/CSC users working on MacOS:
MacOS users can connect to Puhti with ssh client from Terminal.
- start by opening Terminal utility: From Finder menu select Go and Applications. From Utilities select Terminal
- From Terminal select Shell, New Window and Basic (black on white layout) or Homebrew (white on black layout).
- In the terminal type (change username to your username):
ssh -X username@puhti.csc.fi
Experienced Unix/CSC users working on Windows:
Download and install the PuTTY SSH client for Windows from https://www.putty.org
Start Putty. You will see a window with connection settings. In the “Host Name (or IP address)” field, type:
puhti.csc.fi
Setting up working environment
After you have logged in to Puhti continue working in your terminal. Work through the exercises by copy-pasting or typing commands to your terminal and hitting enter.
Start by checking which projects you have access to:
csc-workspaces
As an example we will use project project_2002989. However, you can use any project you have access to.
CSC supercomputers have three main disk areas: home, projappl and scratch.
For a short intro see CSC Disk Areas.
We will create directories for data in the scratch disk area and one directory for the Lazypipe application in the projappl disk area.
In the following examples we will use the variable $USER, which is automatically substituted with your username. Thus, you can copy-paste the example commands to your terminal without editing.
Create a data directory named $USER in the project's scratch disk area. Create subdirectories data and results:
mkdir /scratch/project_2002989/$USER/
mkdir /scratch/project_2002989/$USER/data
mkdir /scratch/project_2002989/$USER/results
Create an application directory named $USER in the project's projappl disk area. Create a subdirectory named "lazypipe":
mkdir /projappl/project_2002989/$USER/
mkdir /projappl/project_2002989/$USER/lazypipe
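The mkdir calls above can also be collapsed with mkdir -p, which creates missing parent directories and does not complain if a directory already exists. The following is a minimal sketch, not part of the original exercise: demo_root is a hypothetical stand-in for the real /scratch/project_2002989 and /projappl/project_2002989 prefixes, so the commands can be tried outside Puhti.

```shell
# demo_root is a hypothetical stand-in for the real project prefixes on Puhti;
# here it is a throwaway temporary directory.
demo_root=$(mktemp -d)
user="${USER:-demo_user}"   # fall back to a placeholder if $USER is unset

# -p creates parent directories as needed and is safe to re-run
mkdir -p "$demo_root/scratch/$user/data" "$demo_root/scratch/$user/results"
mkdir -p "$demo_root/projappl/$user/lazypipe"

# List what was created
find "$demo_root" -type d | sort
```

On Puhti the same pattern would use the real prefixes in place of $demo_root/scratch and $demo_root/projappl.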
It is convenient to define environment variables referring to your directories. To do this you will need to edit the .bashrc file in your home directory.
In the Puhti web-interface navigate to your "Home Directory". Click the "Show Dotfiles" checkbox at the top of your file list. Locate the .bashrc file and start editing by clicking on the menu next to the file name and selecting Edit.
In the .bashrc file add the following two lines and save the file by clicking the Save button at the top left.
export data=/scratch/project_2002989/$USER
export lazypipe=/projappl/project_2002989/$USER/lazypipe
Now open the same file in the terminal with unix less. To navigate less use the up/down arrows; to exit less type q. You should see the added lines in the .bashrc file.
less ~/.bashrc
Load your variables (will autoload on the next login):
source ~/.bashrc
You should now have the variables $data and $lazypipe available on the command line. Check that these variables exist and point to the right directories by using echo:
echo $data
echo $lazypipe
These should print full paths to your data and application directories:
/scratch/project_2002989/username
/projappl/project_2002989/username/lazypipe
Now check that the directories exist by listing directory content with ls (note that $lazypipe is still empty at this point):
ls $data
ls $lazypipe
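The echo and ls checks above can be combined into one small helper. This is only a sketch: check_dir is a hypothetical function name, not something provided by CSC or Lazypipe. It reports whether a value is non-empty and names an existing directory.

```shell
# check_dir LABEL PATH: report whether PATH is non-empty and an existing directory.
check_dir() {
    if [ -z "$2" ]; then
        echo "$1: variable is not set"
        return 1
    elif [ -d "$2" ]; then
        echo "$1: OK ($2)"
    else
        echo "$1: directory does not exist ($2)"
        return 1
    fi
}

# "|| true" keeps the shell going even when a check fails
check_dir data "${data:-}" || true
check_dir lazypipe "${lazypipe:-}" || true
```

If either line reports a problem, re-check the export lines in ~/.bashrc and run source ~/.bashrc again.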
Loading modules and creating config.yaml
Go to your Lazypipe application directory and load the required modules:
cd $lazypipe
module load r-env-singularity
module load biokit
module load lazypipe
Copy the default config.yaml file to your application directory. Then set the tmpdir Lazypipe variable to point to your application directory and set the taxonomy variable to point to the taxonomy subdirectory:
cd $lazypipe
cp /appl/soft/bio/lazypipe/2.1/lazypipe/config.yaml config.yaml
echo tmpdir: "$lazypipe" >> config.yaml
echo taxonomy: "$lazypipe/taxonomy" >> config.yaml
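Because the echo commands above append with >>, running them a second time would add the tmpdir and taxonomy lines again. The sketch below shows one way to guard against that. set_once is a hypothetical helper (not part of Lazypipe), and the demo writes to a throwaway file rather than your real config.yaml.

```shell
# set_once FILE KEY VALUE: append "KEY: VALUE" unless FILE already has a line starting with "KEY:".
set_once() {
    if grep -q "^$2:" "$1" 2>/dev/null; then
        echo "$2 already set in $1"
    else
        echo "$2: $3" >> "$1"
    fi
}

# Demo on a throwaway file; demo_dir stands in for $lazypipe
demo_dir=$(mktemp -d)
set_once "$demo_dir/config.yaml" tmpdir "$demo_dir"
set_once "$demo_dir/config.yaml" taxonomy "$demo_dir/taxonomy"
set_once "$demo_dir/config.yaml" tmpdir "$demo_dir"   # second call adds nothing

# Show the lines that were written
grep -E '^(tmpdir|taxonomy):' "$demo_dir/config.yaml"
```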
Test-run lazypipe.pl; the command should print the command-line user manual:
lazypipe.pl -h
Exercise 2: Running Lazypipe with lazypipe.pl
In this exercise you will get familiar with basic Lazypipe commands.
According to CSC user policy: “The login nodes can be used for light pre- and postprocessing, compiling applications and moving data. All other tasks are to be done on the compute nodes using the batch job system.”
We will run this example on the login node because it is small scale.
Start by copying the sample paired-end (PE) data to your $data/data directory:
cp /appl/soft/bio/lazypipe/2.1/lazypipe/data/samples/M15small_R* $data/data/
Run read preprocessing:
cd $lazypipe
lazypipe.pl -1 $data/data/M15small_R1.fastq --pipe pre -t 4 -v
Run assembly:
lazypipe.pl -1 $data/data/M15small_R1.fastq --pipe ass -t 4 -v
Run read realignment to the created assembly:
lazypipe.pl -1 $data/data/M15small_R1.fastq --pipe rea -t 4 -v
Run 1st round annotation with SANSparallel against UniProt TrEMBL:
lazypipe.pl -1 $data/data/M15small_R1.fastq --pipe ann --ann sans -t 4 -v
Generate reports (must be done before 2nd round annotation):
lazypipe.pl -1 $data/data/M15small_R1.fastq --pipe rep -t 4 -v
Run 2nd round annotation with Blastn against GenBank virus genomes:
lazypipe.pl -1 $data/data/M15small_R1.fastq -p blastv -t 4 -v
Generate assembly stats, pack for sharing and clean up temporary files:
lazypipe.pl -1 $data/data/M15small_R1.fastq -p stats,pack,clean -t 4 -v
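The sequence of calls above can be wrapped in a loop that stops at the first failing step. The following is a sketch only: run, run_steps and the DRY_RUN variable are hypothetical names introduced here, and by default the commands are printed rather than executed, so the sketch can be tried without Lazypipe installed.

```shell
# run CMD...: print the command when DRY_RUN=1 (the default here), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# run_steps READS: go through the Exercise 2 steps in order, stopping at the first failure.
run_steps() {
    reads=$1
    for step_args in "--pipe pre" "--pipe ass" "--pipe rea" \
                     "--pipe ann --ann sans" "--pipe rep" \
                     "-p blastv" "-p stats,pack,clean"; do
        # $step_args is deliberately unquoted so it splits into separate options
        run lazypipe.pl -1 "$reads" $step_args -t 4 -v || return 1
    done
}

# Dry run: single quotes keep $data literal, so the printed commands match the exercise
run_steps '$data/data/M15small_R1.fastq'
```

For a real run you would pass the path in double quotes and set DRY_RUN=0. Keep in mind the CSC policy quoted above: anything beyond this small sample belongs in a batch job.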
Your results are output to $res/$sample, where $res is the root result directory and $sample is the input sample name. By default results are output to results/read1-filename.
Check the content of your result directory:
ls -l results
ls -l results/M15small*
Exercise 3: Running Lazypipe with sbatch-lazypipe
sbatch-lazypipe is a helper tool that automatically generates a configuration file and a batch job file for a Lazypipe run and submits the job to the batch job system of Puhti. The command uses the same command line options as the lazypipe.pl command. In addition, sbatch-lazypipe asks the user to define batch job resources (account, run time, memory, number of cores). The required memory and time will depend on the size of your input library. As a rule of thumb we recommend using 5 GB of memory per core (e.g. 80 GB for 16 cores).
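The rule of thumb is plain arithmetic and can be worked out in the shell before answering the sbatch-lazypipe prompts. A minimal sketch, with cores and mem_gb as illustrative variable names:

```shell
# Rule of thumb from the text: about 5 GB of memory per core
cores=16
mem_gb=$((cores * 5))
echo "${cores} cores -> ${mem_gb} GB"   # prints: 16 cores -> 80 GB
```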
Run the default analysis for the M15small_R1.fastq sample and output results to $data/results/M15_ex3. Note that in the following call the main pipeline steps (pre,ass,rea,ann,rep,stats,pack,clean) are referred to using the main tag. When prompted, set run-time to 5 min (0:5:0), memory to default (~32 GB) and cores to 8.
sbatch-lazypipe -1 $data/data/M15small_R1.fastq --pipe main -r $data/results -s M15_ex3 -v
Check that the job is queued or running:
sacct
After your job completes run 2nd round annotation with blastn. Redo reporting and repack results. Make sure you specify the same --res dir and --sample dir. When prompted, set run-time to 5 min (0:5:0), memory to default (~32 GB) and cores to 8.
sbatch-lazypipe -1 $data/data/M15small_R1.fastq --pipe blastv,rep,pack -r $data/results -s M15_ex3 -v
While your 2nd round annotation is running start another job that will output results to a different location. Use Minimap2 for 1st round annotation and blastn against GenBank viruses for the 2nd round annotation. For this job we recommend setting run-time to 1 h (1:00:0), memory to 120 GB and number of cores to 32.
sbatch-lazypipe -1 $data/data/M15small_R1.fastq --pipe main,blastv --ann minimap -r $data/results -s M15_ex3.2 -v
While this analysis is running you can move on to the next exercise.
Exercise 4: saving/sharing results on ida.fairdata.fi
Set up ida to connect to your designated project by editing the .ida-config file in your home directory.
Log in to the Puhti web-interface by following the link: Puhti web-interface. Navigate to your "Home Directory". Click the "Show Dotfiles" checkbox at the top of your file list. Locate the .ida-config file and start editing by clicking on the menu next to the file name and selecting Edit. If you don't have an .ida-config file, create it by clicking "New File" at the top panel.
In the .ida-config file add the following two lines. In this exercise we use project 2002989 but you can use any CSC project you have access to. Save the file by clicking the Save button at the top left.
IDA_PROJECT="2002989"
IDA_HOST="https://ida.fairdata.fi"
Now open the same file in the terminal with unix less. You should see the added lines in the .ida-config file. Exit less by typing q:
less ~/.ida-config
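The same two lines can also be written from the terminal. The following is a minimal sketch under the assumption that .ida-config consists of plain shell variable assignments, as the two lines in this exercise suggest. It writes to a throwaway file (demo_cfg, a hypothetical name) so that an existing ~/.ida-config is not overwritten.

```shell
demo_cfg="$(mktemp -d)/ida-config"   # replace with ~/.ida-config for real use

# Write the two settings; ">" overwrites the file, so any earlier content is lost
printf '%s\n' 'IDA_PROJECT="2002989"' 'IDA_HOST="https://ida.fairdata.fi"' > "$demo_cfg"

# Read the file back as shell assignments and show the values
. "$demo_cfg"
echo "project: $IDA_PROJECT"
echo "host:    $IDA_HOST"
```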
Now you can upload your results to ida.fairdata.fi with the ida module. Move to your result directory and check that you have the M15_ex3.tar.gz (or similar) file ready for saving/sharing.
cd $data/results
ls -l
Load the ida module and start uploading (change my_dir to the name of the subdirectory you wish to create on Fairdata IDA and upload your data to):
module load ida
ida upload my_dir/M15_ex3.tar.gz M15_ex3.tar.gz
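Before uploading, it can be worth confirming that the archive is readable. The sketch below uses tar -tzf, which lists the contents of a gzip-compressed archive without extracting it. The demo builds a tiny stand-in archive (the file names inside it are made up) so it can be tried anywhere; on Puhti you would point tar at your own M15_ex3.tar.gz instead.

```shell
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/M15_ex3"
echo "placeholder" > "$demo_dir/M15_ex3/example.txt"   # made-up content
tar -czf "$demo_dir/M15_ex3.tar.gz" -C "$demo_dir" M15_ex3

# tar exits non-zero if the archive is truncated or corrupt
if tar -tzf "$demo_dir/M15_ex3.tar.gz" > "$demo_dir/listing.txt"; then
    echo "archive OK, $(wc -l < "$demo_dir/listing.txt" | tr -d ' ') entries"
else
    echo "archive is damaged, do not upload"
fi
```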
When the upload completes you should see the uploaded file appear under 2002989+/my_dir/.
You can also save results to IDA by first downloading them to your computer:
- Login to Puhti web-interface by following the link: Puhti web-interface
- Navigate to your result directory /scratch/project_2002989/username/results. Locate your result file (e.g. M15_ex3.tar.gz) click on the menu and select Download
- Open Fairdata IDA in your web-browser and login.
- Navigate to the Staging 2002989 project (or your designated project) and your subdirectory. Upload the results by clicking the "+" sign at the top panel and selecting the downloaded M15_ex3.tar.gz file.
End notes
This completes the Running Lazypipe on Puhti module.
For more information see Lazypipe User Guides