Lazypipe User Guide

Running Lazypipe v3.0 on a Linux cluster

Table of Contents

About Lazypipe
Running on CSC
Installing
Running Lazypipe
command-line options
config.yaml options
Citing Lazypipe
Contact

About Lazypipe

Lazypipe is a bioinformatics pipeline for analyzing viral and bacterial metagenomics from NGS data.

Figure 1. Lazypipe workflow

Lazypipe supports:

fastq preprocessing
de novo assembly
taxonomic binning
taxonomic profiling
reporting
- mapped contigs sorted by taxa
- virus contigs
- unmapped contigs
- contig annotations (tsv and excel)
- taxon abundances (tsv and excel)
quality control plots

Running Lazypipe on CSC

Lazypipe can be quickly assessed using a preinstalled module at the Finnish Center of Scientific Computing.

Installing Lazypipe

Setting up directories

Create root directory $data and subdirectories for storing reference databases, NCBI taxonomy, host genomes and Lazypipe results (change /my/data/path/ according to your preferences):

data=/my/data/path
mkdir -p $data $data/databases $data/taxonomy $data/hostgen $data/results

For convenience, add an environment variable $data pointing to your root directory. To do so, open the .bashrc file in your home directory and add this line:

export data=/my/data/path
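
The directory setup above can be wrapped in a small helper. This is a minimal sketch and not part of Lazypipe: the function name setup_lazypipe_dirs and its optional rc-file argument are assumptions made for illustration.

```shell
# Hypothetical helper (not shipped with Lazypipe): create the directory layout
# described above and, optionally, persist $data in a shell rc file.
setup_lazypipe_dirs() {
    local root="$1"          # root directory, e.g. /my/data/path
    local rcfile="${2:-}"    # optional rc file, e.g. ~/.bashrc
    local sub line
    for sub in databases taxonomy hostgen results; do
        mkdir -p "$root/$sub" || return 1
    done
    if [ -n "$rcfile" ]; then
        line="export data=$root"
        # Append the export line only if it is not already present.
        grep -qxF "$line" "$rcfile" 2>/dev/null || echo "$line" >> "$rcfile"
    fi
}
```

For example, `setup_lazypipe_dirs /my/data/path ~/.bashrc` creates the four subdirectories and adds the export line once, even if the command is repeated.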

Cloning the repository

git clone https://plyusnin@bitbucket.org/plyusnin/lazypipe.git
cd lazypipe

Installing dependencies

Installing dependencies with Conda

We recommend installing BLAST under a separate Conda environment labeled blast:

conda create -n blast -c bioconda blast

All other dependencies can be installed under an environment labeled lazypipe:

conda create -n lazypipe -c bioconda -c eclarke bwa csvtk fastp krona megahit mga minimap2 samtools seqkit spades taxonkit trimmomatic numpy scipy requests

Mac users installing to M1/M2 ARM64 architecture: Prior to installing bio-packages configure Conda with conda config --add subdirs osx-64. You may also need to install MGA binary manually (see Table 1).

To activate all installed dependencies type:

conda activate blast
conda activate --stack lazypipe

Set taxonomy database location for KronaGraph:

 rm -rf $CONDA_PREFIX/opt/krona/taxonomy
 ln -s $data/taxonomy $CONDA_PREFIX/opt/krona/taxonomy

Set env variable $TM to point to trimmomatic directory:

 export TM=$CONDA_PREFIX/share/trimmomatic

Download PANNZER (version 02/2022 or later) and make runsanspanz.py available as an executable on your path:

wget http://ekhidna2.biocenter.helsinki.fi/sanspanz/SANSPANZ.3.tar.gz
tar -zxvf SANSPANZ.3.tar.gz
echo '#!'$(which python) 1> SANSPANZ.3/runsanspanz.ex.py
cat SANSPANZ.3/runsanspanz.py >> SANSPANZ.3/runsanspanz.ex.py
chmod 755 SANSPANZ.3/runsanspanz.ex.py
ln -sf $(pwd)/SANSPANZ.3/runsanspanz.ex.py ~/bin/runsanspanz.py
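
Once both environments are activated, it can help to confirm that the expected executables are actually on your PATH. The sketch below is illustrative only: check_tools is a hypothetical helper, and the executable names you pass to it are your own assumptions about what each package installs.

```shell
# Hypothetical helper: report which of the named executables are missing from PATH.
# Returns 0 when every tool is found and 1 otherwise.
check_tools() {
    local tool missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "ok       $tool"
        else
            echo "MISSING  $tool"
            missing=1
        fi
    done
    return "$missing"
}
```

A possible call is `check_tools blastn blastp bwa csvtk fastp megahit minimap2 samtools seqkit taxonkit runsanspanz.py`; extend the list with whatever optional tools you installed.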

Installing dependencies manually

Download and unpack dependencies listed in Table 1. Then copy or link these executables to your ~/bin folder. For example:

wget https://github.com/lh3/minimap2/releases/download/v2.24/minimap2-2.24_x64-linux.tar.bz2
tar -xjvf minimap2-2.24_x64-linux.tar.bz2
cp minimap2-2.24_x64-linux/minimap2 ~/bin/

Tool	Website	Download binaries	Original article
blast	https://blast.ncbi.nlm.nih.gov/	blast+/LATEST/	https://doi.org/10.1186/1471-2105-10-421
bwa-mem	https://github.com/lh3/bwa	bio-bwa/files	https://arxiv.org/abs/1303.3997
csvtk	https://bioinf.shenwei.me/csvtk/	csvtk/download
fastp	https://github.com/OpenGene/fastp	http://opengene.org/fastp/fastp	https://doi.org/10.1093/bioinformatics/bty560
KronaTools	https://github.com/marbl/Krona/wiki/KronaTools	NA	https://doi.org/10.1186/1471-2105-12-385
MEGAHIT	https://github.com/voutcn/megahit/	MEGAHIT-1.2.9-Linux-x86_64-static.tar.gz	https://doi.org/10.1016/j.ymeth.2016.02.020
MGA	http://metagene.nig.ac.jp/metagene/	http://metagene.nig.ac.jp/metagene/download_mga.html	https://doi.org/10.1093/nar/gkl723
minimap2	https://github.com/lh3/minimap2	minimap2-2.24_x64-linux.tar.bz2	https://doi.org/10.1093/bioinformatics/bty191
PANNZER/SANS	http://ekhidna2.biocenter.helsinki.fi/sanspanz/	SANSPANZ.3.tar.gz	https://doi.org/10.1002/pro.4193
TaxonKit	https://bioinf.shenwei.me/taxonkit/	taxonkit/releases/tag/v0.9.0	https://doi.org/10.1016/j.jgg.2021.03.006
[Trimmomatic]	https://github.com/usadellab/Trimmomatic	v0.39.tar.gz	https://doi.org/10.1093/bioinformatics/btu170
Samtools	http://www.htslib.org/	samtools-1.14.tar.bz2	https://doi.org/10.1093/gigascience/giab008
SeqKit	https://bioinf.shenwei.me/seqkit/	seqkit_linux_amd64.tar.gz	https://doi.org/10.1371/journal.pone.0163962
[SPAdes]	https://github.com/ablab/spades	SPAdes-3.15.3-Linux.tar.gz	https://doi.org/10.1002/cpbi.102

Table 1: Lazypipe dependencies. Tools in square brackets are not required for basic Lazypipe runs; when installed, they provide additional options and functionality.

Installing Perl modules

Install the modules to the local library ~/perl5:

cpan --local-lib=~/perl5 File::Basename File::Temp Getopt::Long YAML::Tiny
export PERL5LIB=~/perl5/lib/perl5:${PERL5LIB}

Installing R libraries

Open an R console and type:

install.packages( c("reshape","openxlsx") );

Installing reference databases

Install NCBI Taxonomy to default location ($data/taxonomy) by running:

perl perl/install_db.pl --db taxonomy

Download and unpack reference databases for 1st and 2nd round annotations. You can choose to install RefSeq/UniRef100 databases (Table 2), NT/UniRef100 databases (Table 3) or Viral databases (Table 4). RefSeq/UniRef100 databases are suited for annotating established taxa with small disk and time overhead. NT/UniRef100 databases have better coverage for novel taxa and may produce more accurate annotations, but their disk and time overhead is higher. Viral databases are small databases intended for annotating only viral taxa with minimal disk and time overhead.

Use the install_db.pl script to install databases from URLs listed in config.yaml. To install RefSeq/UniRef100 databases to the default path ($data/databases) call:

perl perl/install_db.pl --db minimap.refseq.abv -v
perl perl/install_db.pl --db minimap.refseq.vi -v
perl perl/install_db.pl --db blastn.refseq.ab -v
perl perl/install_db.pl --db blastn.refseq.vi -v
perl perl/install_db.pl --db blastp.vi -v
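
The install commands above differ only in the database key, so they can also be expressed as a loop. The sketch below is illustrative: install_dbs, DRY_RUN and REFSEQ_DBS are names introduced here, while the --db and -v options are the ones used in the commands above.

```shell
# Sketch: install several database keys in one loop.
# DRY_RUN=1 (the default here) only prints the commands; set DRY_RUN=0 to run
# them from the root of the cloned lazypipe repository.
install_dbs() {
    local key
    for key in "$@"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "perl perl/install_db.pl --db $key -v"
        else
            perl perl/install_db.pl --db "$key" -v || return 1
        fi
    done
}

# Keys for the RefSeq/UniRef100 set, copied from the commands above.
REFSEQ_DBS="minimap.refseq.abv minimap.refseq.vi blastn.refseq.ab blastn.refseq.vi blastp.vi"
```

Calling `install_dbs $REFSEQ_DBS` (unquoted, so the list is split into separate keys) prints the five commands for review; rerun with `DRY_RUN=0` to execute them.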

URL	Size *.gz (GB)	Description
minimap.refseq.abv.release221.tar.gz	5.7	Minimap2 index for RefSeq archaea, bacteria and viruses
minimap.refseq.vi.release221.tar.gz	0.16	Minimap2 index for RefSeq viruses
blastn.refseq.ab.release221.tar.gz	4.4	BLASTN index for RefSeq archaea and bacteria
blastn.refseq.vi.release221.tar.gz	0.14	BLASTN index for RefSeq viruses
blastp.uniref100.vi.2024_01_24.tar.gz	0.48	BLASTP index for UniRef100 viruses

Table 2: RefSeq/UniRef100 databases

URL	Size *.gz (GB)	Description
minimap.nt.abv.tar.gz	77	Minimap2 index for NCBI NT archaea, bacteria and viruses
blastn.nt.ab.tar.gz	44	BLASTN index for NCBI NT archaea and bacteria
blastn.nt.vi.tar.gz	9.7	BLASTN index for NCBI NT viruses
blastp.uniref100.ab.tar.gz	33	BLASTP index for UniRef100 archaea and bacteria
blastp.uniref100.vi.release.tar.gz	0.48	BLASTP index for UniRef100 viruses

Table 3: NT/UniRef100 databases

URL	Size *.gz	Description
minimap.refseq.vi.tar.gz	160 MB	Minimap2 index for RefSeq viruses
blastn.refseq.vi.tar.gz	130 MB	BLASTN index for RefSeq viruses
blastp.uniref100.vi.tar.gz	480 MB	BLASTP index for UniRef100 viruses

Table 4: Viral databases

Open config.yaml and check that database paths match the location and version of the installed databases. Edit these lines in config.yaml:

ann1.databases:
   minimap.nt:        "$data/databases/nt.abv.2024_01_01.fa"
   minimap.refseq:    "$data/databases/refseq.abv.release221.fa"
   ..

ann2.databases:
   blastn.ab.nt:      "$data/databases/blastn.nt.ab.2024_01_01"
   blastn.vi.nt:      "$data/databases/blastn.nt.vi.2024_01_01"
   ..
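
A quick way to catch path or version mismatches is to test every database path in the config against the file system. The helper below is a rough sketch under stated assumptions: check_db_paths is a hypothetical name, entries are assumed to look like the excerpt above (an indented key followed by a path starting with $data), and trailing comments or other YAML features are not handled.

```shell
# Hypothetical helper: report database paths from a config file that are absent.
# Assumes entries look like   key:   "$data/databases/name"   and that $data is
# exported. BLAST databases are referenced by a file prefix, so a path also
# counts as present when files named "<path>.*" exist.
check_db_paths() {
    grep -E '^[[:space:]]+[A-Za-z0-9_.]+:[[:space:]]+"?\$data/' "$1" |
    tr -d '"' |
    {
        status=0
        while read -r key path; do
            path="${data:-}${path#\$data}"   # expand the literal $data prefix
            set -- "$path".*                 # glob for index files sharing the prefix
            if [ -e "$path" ] || [ -e "$1" ]; then
                echo "found    ${key%:} -> $path"
            else
                echo "missing  ${key%:} -> $path"
                status=1
            fi
        done
        [ "$status" -eq 0 ]
    }
}
```

Running `check_db_paths config.yaml` lists each entry as found or missing and returns a non-zero status if any listed path is missing.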

If you wish to annotate bacteriophages, specify minimap.ph, blastn.ph and blastp.ph in your config.yaml. Use blastn/blastp virus databases or your custom bacteriophage databases:

ann1.databases:
   minimap.ph: $data/databases/minimap.GPD.ph.fasta
   blastn.ph:  $data/databases/blastn.nt.vi.2024_01_01
   blastp.ph:  $data/databases/blastp.uniref100.vi.2024_01_24
ann2.databases:
   blastn.ph:  $data/databases/blastn.nt.vi.2024_01_01
   blastp.ph:  $data/databases/blastp.uniref100.vi.2024_01_24

Naming convention for reference databases

Reference sequence databases are defined in config.yaml as key-value pairs under ann1.databases and ann2.databases. Each key is a string referring to the SearchTool and TargetTaxa, and each value is the path to the database. For the 1st round, annotation keys are named SearchTool[.dbid], and for the 2nd round SearchTool.TargetTaxa[.dbid]. In both rounds SearchTool can be sans, minimap, blastn or blastp. TargetTaxa can be abv (i.e. Archaea, Bacteria and Viruses), ab (i.e. Archaea and Bacteria), vi (Viruses), ph (Bacteriophages) or un (unmapped). You can use an optional dbid string to differentiate between similar databases. For any annotation step you can use any database, default or custom. For BLASTN/BLASTP use BLAST indices. For minimap2 you can use .fasta* or .mmi files; note that these must have an accompanying .acc2taxid tsv-map (see the default minimap2 databases for an example).
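
To make the 2nd-round key format concrete, the sketch below splits a key into its parts and checks them against the values listed above. parse_ann2_key is a hypothetical helper written for this guide; it does not reproduce Lazypipe's own validation.

```shell
# Hypothetical helper: split a 2nd-round key of the form
# SearchTool.TargetTaxa[.dbid] and check both parts against the listed values.
parse_ann2_key() {
    local key="$1" search target dbid rest
    search="${key%%.*}"                 # text before the first dot
    rest="${key#*.}"                    # text after the first dot
    [ "$rest" != "$key" ] || { echo "invalid key: $key" >&2; return 1; }
    target="${rest%%.*}"
    dbid=""
    [ "$rest" = "$target" ] || dbid="${rest#*.}"   # optional dbid, may contain dots
    case "$search" in
        sans|minimap|blastn|blastp) ;;
        *) echo "unknown search tool: $search" >&2; return 1 ;;
    esac
    case "$target" in
        abv|ab|vi|ph|un) ;;
        *) echo "unknown target: $target" >&2; return 1 ;;
    esac
    echo "search=$search target=$target dbid=$dbid"
}
```

For example, `parse_ann2_key blastn.vi.refseq` prints `search=blastn target=vi dbid=refseq`, while a key without a target, such as `blastn`, is rejected.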

Running Lazypipe

Example 1

In this example we will use a sample paired-end (PE) library that is included with the repository (data/samples/M15small_R*.fastq).

Preprocess reads with fastp:

perl lazypipe.pl -1 data/samples/M15small_R1.fastq --pipe pre -t 8 -v

Download the Neovison vison genome and use it to filter host reads. Note that the first host-filtering run with a newly downloaded genome takes extra time, because the genome must be indexed:

mkdir -p $data/hostgen
wget https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/900/108/605/GCA_900108605.1_NNQGG.v01/GCA_900108605.1_NNQGG.v01_genomic.fna.gz -P $data/hostgen/
perl lazypipe.pl -1 data/samples/M15small_R1.fastq --pipe flt --hostgen $data/hostgen/GCA_900108605.1_NNQGG.v01_genomic.fna.gz -t 8 -v

Run assembly with MEGAHIT and realign reads to the assembly:

perl lazypipe.pl -1 data/samples/M15small_R1.fastq -p ass,rea --ass megahit -t 8 -v

Run 1st round annotation with Minimap2 against your local minimap.refseq database:

perl lazypipe.pl -1 data/samples/M15small_R1.fastq -p ann1 --ann1 minimap.refseq -t 8 -v

Run 1st round annotation with SANSparallel against UniProt TrEMBL. Note that SANSparallel runs on a remote server and requires an internet connection. Append results to the Minimap2 annotations from the previous step:

perl lazypipe.pl -1 data/samples/M15small_R1.fastq -p ann1 --ann1 sans --append -t 8 -v

Now run a more complex 1st round annotation. Start by mapping contigs with Minimap2, then map the remaining unmapped contigs with SANSparallel, and finally map contigs that are still unmapped with BLASTN against the blastn.vi database. Note that without the --append flag this will overwrite existing 1st round annotations:

perl lazypipe.pl -1 data/samples/M15small_R1.fastq -p ann1 --ann1 minimap.refseq,sans,blastn.vi -t 8 -v

Run 2nd round annotation. In the second round you can target archaeal+bacterial (=ab), bacteriophage (=ph), viral (=vi) and unmapped (=un) contigs, based on labeling from the 1st round. Local databases for the 2nd round annotations are defined in ann2.databases section of the config.yaml. For example, to map viral contigs with BLASTN and BLASTP against local viral databases type:

perl lazypipe.pl -1 data/samples/M15small_R1.fastq --pipe ann2 --ann2 blastn.vi.refseq,blastp.vi -t 8 -v

Run 2nd round annotation for bacteria with BLASTN. Append results to BLASTN and BLASTP annotations from the previous step:

perl lazypipe.pl -1 data/samples/M15small_R1.fastq --pipe ann2 --ann2 blastn.ab.refseq --append -t 8 -v

You can also combine these runs in any order. For example:

perl lazypipe.pl -1 data/samples/M15small_R1.fastq --pipe ann2 --ann2 blastn.ab.refseq,blastn.vi.refseq,blastp.vi -t 8 -v

The most common combinations of 1st and 2nd round annotations can be saved to config.yaml in the ann.strategies section. Each annotation strategy is saved as a key-value pair. Several annotation strategies are predefined:

abv.fast -- run only the 1st round with Minimap2 against RefSeq.abv
abv.nt -- 1st round: Minimap2 against NT.abv, 2nd round: BLASTN viral reads against NT.vi and archaeal+bacterial reads against NT.ab
abv.refseq -- 1st round: Minimap2 against RefSeq.abv, 2nd round: BLASTN viral reads against RefSeq.vi and archaeal+bacterial reads against RefSeq.ab
abv.extend -- 1st round: Minimap2 against NT.abv + SANSparallel unmapped reads against TrEMBL, 2nd round: BLASTN viral reads against NT.vi and archaeal+bacterial reads against NT.ab, additionally BLASTP viral reads against UniRef100.vi and archaeal+bacterial reads against UniRef100.ab
vi.nt -- 1st round: Minimap2 against NT.vi, 2nd round: BLASTN viral reads against NT.vi
vi.refseq -- 1st round: Minimap2 against RefSeq.vi, 2nd round: BLASTN viral reads against RefSeq.vi

Generate reports based on created annotations:

perl lazypipe.pl -1 data/samples/M15small_R1.fastq --pipe rep -t 8 -v

Generate assembly stats, pack for sharing and remove temporary files:

perl lazypipe.pl -1 data/samples/M15small_R1.fastq -p stats,pack,clean -t 8 -v

For convenience, routine analysis steps (pre,flt,ass,rea,ann1,ann2,rep,sta,pack,clean) can be called with the main tag. To run the main analysis with the abv.refseq annotation strategy type:

perl lazypipe.pl -1 data/samples/M15small_R1.fastq -p main --anns abv.refseq -t 8 -v
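
When several libraries need the same treatment, the main run can be repeated for each forward-read file. The loop below is a sketch rather than a feature of Lazypipe: run_main_batch and DRY_RUN are names introduced here, and the strategy and thread count are simply copied from the command above.

```shell
# Sketch: run the main pipeline over several forward-read files.
# Prints the commands by default; set DRY_RUN=0 to execute them from the
# repository root. Files that do not exist are skipped with a warning.
run_main_batch() {
    local r1
    for r1 in "$@"; do
        if [ ! -f "$r1" ]; then
            echo "skipping $r1: file not found" >&2
            continue
        fi
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "perl lazypipe.pl -1 $r1 -p main --anns abv.refseq -t 8 -v"
        else
            perl lazypipe.pl -1 "$r1" -p main --anns abv.refseq -t 8 -v || return 1
        fi
    done
}
```

A possible call is `run_main_batch data/samples/*_R1.fastq`, which prints one command per library so the batch can be reviewed before it is run with `DRY_RUN=0`.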

Example 1: generated reports

Results are output to $res/$sample. Default value for $res is set in config.yaml and default value for $sample is created from the name of the input reads. These can be changed during runtime with --res mydir --sample mysample.

In example 1 results were output to $data/results/M15small.
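
The default sample name can be approximated as follows. This is only a sketch of the naming behaviour described in this guide, assuming the trimm_sample_name option is enabled; default_sample_name is a hypothetical helper, not Lazypipe's own code.

```shell
# Sketch of the default sample naming, assuming trimm_sample_name is enabled:
# take the file name of the forward reads and cut it at the first "_".
default_sample_name() {
    local name
    name=$(basename "$1")
    echo "${name%%_*}"
}
```

For the example library, `default_sample_name data/samples/M15small_R1.fastq` prints `M15small`, which matches the result directory named above.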

Assembled contigs and predicted ORFs

File or Directory	Description
contigs	contigs sorted by taxa
contigs.fa	contigs in a single fasta file
contigs.ann1.ab.fa	archaeal+bacterial contigs (based on 1st round annotation)
contigs.ann1.ph.fa	bacteriophage contigs (1st round)
contigs.ann1.vi.fa	viral contigs (1st round)
contigs.ann1.un.fa	unmapped contigs (1st round)
contigs.ann2.ab.fa	archaeal+bacterial contigs (2nd round)
contigs.ann2.ph.fa	bacteriophage contigs (2nd round)
contigs.ann2.vi.fa	viral contigs (2nd round)
contigs.ann2.un.fa	unmapped contigs (2nd round)
contigs.orfs.aa.fa	predicted ORFs as aa sequences
contigs.orfs.nt.fa	predicted ORFs as nt sequences
scaffolds.fa	scaffolds, if available

Table 5: Lazypipe results: contigs and ORFs.

Abundance tables

Figure 2. abund_table.xlsx

Spreadsheets with taxon abundances are printed to abund_table.xlsx. Abundances are displayed in separate tables for viruses (excluding bacteriophages), bacteria, bacteriophages and eukaryotes. For each domain, abundances are displayed at three taxonomic levels: species, genus and family.

For raw abundance data see abund_table.tsv.

column	description
readn	read pairs assigned to this taxon
readn_pc	percentage of read pairs assigned to this taxon
csum	cumulative read distribution score (percentage of reads mapped to this taxon and more abundant taxa)
csumq	confidence score based on csum (1 ~ reliable, 2 ~ intermediate, 3 ~ unreliable)
contign	contigs assigned to this taxon
species	species name (NCBI taxonomy)
species_id	species taxid (NCBI taxonomy)
genus	genus name
genus_id	genus taxid
family	family name
family_id	family taxid

Table 6: Columns in abund_table.xlsx
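
The csum definition can be illustrated with a small worked computation: sort taxa by read count, convert counts to percentages, and accumulate them from the most abundant taxon downwards. The sketch below uses an assumed toy input (taxon and read count per line, no header), not the actual layout of abund_table.tsv, and csum_table is a hypothetical name; how Lazypipe groups taxa before computing csum is not covered here.

```shell
# Sketch: recompute a csum-like column from taxon read counts.
# Input: tab-separated "taxon<TAB>readn" lines without a header (toy format).
# Output columns: taxon, readn, readn_pc, csum.
csum_table() {
    sort -t "$(printf '\t')" -k2,2nr "$1" |
    awk -F '\t' '
        { taxon[NR] = $1; readn[NR] = $2; total += $2 }
        END {
            if (total == 0) exit          # nothing to report for empty input
            for (i = 1; i <= NR; i++) {
                pc = 100 * readn[i] / total
                csum += pc                # running total over more abundant taxa
                printf "%s\t%d\t%.1f\t%.1f\n", taxon[i], readn[i], pc, csum
            }
        }'
}
```

For three taxa with 50, 30 and 20 reads, the csum values are 50.0, 80.0 and 100.0, so the least abundant taxon always ends at 100.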

Annotation tables

Figure 3. annot_table.xlsx

Spreadsheets with contig annotations are printed to contig_annot.xlsx. Spreadsheets are displayed separately for viruses (excluding bacteriophages), bacteria, bacteriophages and eukaryotes.

For raw annotation data see contigs_annot.tsv.

column	description
search	applied database search (e.g. blastn)
db	applied database (e.g. UniRef100.vi)
dbtype	nucl for nucleotide and prot for protein databases
contig	contig id
orf	orf description in start-end:strand format
clen	contig length
sseqid	subject sequence id
bitscore	alignment score
alen	alignment length
pident	percent identity
qlen	query sequence length
qcov	query coverage
slen	subject sequence length
scov	subject coverage
staxid	subject sequence taxid
sname	subject sequence name
bphage	yes for bacteriophage staxids
species	assigned species
genus	assigned genus
family	assigned family
order	assigned order
class	assigned class

Table 7: Columns in contigs_annot.xlsx
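
Because a contig can have several annotation rows, a common first step with the raw table is to keep only the strongest hit per contig. The sketch below assumes the tsv file starts with a header row that uses the column names from Table 7; best_hit_per_contig is a hypothetical helper, not part of Lazypipe.

```shell
# Sketch: keep the highest-bitscore row for each contig in a tab-separated
# annotation table. Columns are located by header name, so column order does
# not matter; the names "contig" and "bitscore" are taken from Table 7.
best_hit_per_contig() {
    awk -F '\t' '
        NR == 1 {
            for (i = 1; i <= NF; i++) col[$i] = i
            if (!("contig" in col) || !("bitscore" in col)) {
                print "missing contig or bitscore column" > "/dev/stderr"
                exit 1
            }
            print                          # pass the header through
            next
        }
        {
            id = $(col["contig"])
            score = $(col["bitscore"]) + 0
            if (!(id in best) || score > best[id]) {
                if (!(id in best)) order[++n] = id   # remember first-seen order
                best[id] = score
                row[id] = $0
            }
        }
        END { for (i = 1; i <= n; i++) print row[order[i]] }
    ' "$1"
}
```

A possible call is `best_hit_per_contig results/M15small/contigs_annot.tsv`; the output keeps the header and one row per contig, in the order contigs first appear in the file.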

Quality control plots

Figure 5. Quality control plots for a number of samples

Quality Control (QC) plots include length histograms for reads and contigs, and survival plots. The survival plots track retained reads after each pipeline step.

file	description
qc.read1.jpeg	length hist for forward reads
qc.read2.jpeg	length hist for reverse reads
qc.contigs.jpeg	length hist for contigs
qc.readsurv.jpeg	read survival plots

Table 8: Quality Control plots

Retrieving reads for a contig or taxid

Start by unzipping your source fastq files:

gunzip -k results/M15small/read*.trim.fq.gz

To retrieve all reads mapped to contig k99.17 type:

bin/retrieve_reads -r results/M15small -v -c k99.17

To retrieve all reads mapped to Circovirus mink use the following command. Note that the exact species name may change with taxonomy updates.

bin/retrieve_reads -r results/M15small -v -s "Circovirus mink"

To retrieve all reads mapped to staxid 1239574 (Mamastrovirus) type:

bin/retrieve_reads -r results/M15small -v -t 1239574

Command line options

Short	Long	Value	Default	Description
INPUT:
-1	`--read1`	file		PE reads, fastq with forward reads (can be gzipped)
-2	`--read2`	file	guess from `--read1`	PE reads, fastq with reverse reads (can be gzipped)
	`--se`		false	Input reads are SE-reads. Any --read2 file will be ignored
	`--hostgen`	file		`*.fna` file containing host genome. To filter host reads use `--hostgen file -p flt`
	`--hgtaxid`	taxid		Map host reads to this taxid
	`--config`	file	`config.yaml`	Configuration file with default options
OUTPUT:
	`--logs`	dir	logs	Logs will be printed to `$logs/$sample/`
-r	`--res`	dir	results	Results will be printed to `$res/$sample/`
-s	`--sample`	str	`--read1` prefix	Results will be printed to `$res/$sample/`
PARAMETERS:
-p	`--pipe`	str	main	Comma-separated list of steps to perform, e.g. `--pipe pre,flt,ass,rea,ann1,ann2,sta,pack`
		pre/preprocess		Preprocess reads, i.e. filter low quality reads
		flt/filter		Filter reads mapping to host genome using --hostgen file
		ass/assemble		Assemble reads to contigs
		rea/realign		Realign reads to contigs
		ann1/annot1		Run 1st round annotation
		ann2/annot2		Run 2nd round annotation
		rep/report		Create reports
		sta/stats		Create assembly stats + QC plots
		pack		Pack results into a `*tar.gz` in the root result directory
		clean		Remove all intermediate/temporary files
		main		Run main steps: `pre,flt,ass,rea,ann1,ann2,rep,sta,pack,clean`
	`--ann1`	key	minimap,sans	List of keys defining 1st round annotation
				MUST be in format: `$search[.$dbid]`, where:
				`$search` is a valid database search (blastn,blastp,minimap or sans)
				`$dbid` is a reference database id (optional)
				For each key there MUST be a database defined in `config.yaml`
	`--ann2`	key	blastn.vi,blastp.vi	List of keys defining 2nd round annotations
				MUST be in format: `$search.$target[.$dbid]`, where:
				`$search` is a valid database search (blastn,blastp,minimap or sans)
				`$target` is a valid target (ab = Archaea+Bacteria, ph = Bacteriophages, vi = Viruses, un = Unmapped)
				`$dbid` is a reference database id (optional)
				For each key there MUST be a database defined in `config.yaml`
	`--anns`	key		Apply annotation-strategy defined in `config.yaml` under the supplied key. Overrides any `--ann1/ann2` options
	`--ass`	str	megahit	Assembler: megahit/spades
	`--gen`	str	mga	Gene prediction: mga/prod
	`--pre`	str	pre	Use fastp/trimm/none to preprocess reads
	`--clean`		false	Delete intermediate files after each step
-t	`--numth`	int	8	Number of threads
-w	`--wmodel`	str	bitscore	Weighting model for abundance estimation: taxacount/bitscore/bitscore2
-v			false	Verbose mode

Table 9: Lazypipe command line options.

Default options and additional settings are defined in config.yaml file. Note that command line options take precedence over options in config.yaml file.

Additional options in `config.yaml`:

Option	Value	Description
GENERAL PARAMETERS
`R_call`	str	Rscript or similar for calling R
`min_read2hostgen_score`	num	Minimum alignment score for read mapping to hostgen
`min_orf_length`	num	Minimum ORF sequence length for reporting/mapping
`min_sans_bits`	num	Minimum alignment score for mapping with SANSparallel
`min_blastp_bits`	num	Minimum alignment score for mapping with BLASTP
`min_blastn_bits`	num	Minimum alignment score for mapping with BLASTN
`min_minimap_DPpeak_score`	num	Minimum alignment score for contig mapping with minimap2
`min_read2contig_score`	num	Minimum alignment score for read mapping to contigs
`fastp_par`	str	Fastp parameters
`trimm_par`	str	Trimmomatic parameters. NOTE: please ensure that the `$TM` environment variable points to the Trimmomatic installation root
`tail`	percent	Remove taxa that correspond to this percentile in abundance estimation. Set to zero to keep all predictions
`tail_contig`	percent	Remove taxa from contig that correspond to this percentile. Reduces noise in abundance estimation.
`trimm_sample_name`	0/1	When setting sample-name from read1-name, trim read1-name at the first occurrence of "_"
DEFAULT COMMAND LINE OPTIONS
See Command Line Options
DATABASES
`ann1.databases:`		Reference databases for the 1st round annotations
`minimap`	path	Local Minimap2 database. Specify path to .fasta or .fasta.mmi file. This MUST be accompanied by an .acc2taxid tsv-file (see default Minimap2* databases for example)
`blastn[.dbid]`	path	Local blastn database. To specify several blastn databases use optional `dbid` (eg `blastn.abv`)
`blastp[.dbid]`	path	Local blastp database. To specify several blastp databases use optional `dbid` (eg `blastp.viruses`)
`ann2.databases:`		Reference databases for the 2nd round annotations
`$search.$target[.dbid]`	path	Generally use any valid dbsearch (minimap/blastn/blastp) and any valid target (ab/ph/vi/un) to specify databases for the 2nd round annotations
`blastn.vi[.dbid]`	path	Local blastn database targeting viral sequences. To specify several databases for the same target use `dbid`
`blastp.vi[.dbid]`	path	Local blastp database targeting viral sequences. To specify several databases for a target use `dbid`
`taxonomy`	dir	Path to local NCBI taxonomy database. Database will be installed on demand
`taxonomy_update`	0/1	Set to 1 to update NCBI taxonomy db
`taxonomy_update_time`	num	NCBI taxonomy update frequency in days
`urls:`		Urls for retrieving databases
`taxonomy`	str	URL to NCBI taxonomy (taxdump.tar.gz). This MUST be defined

Table 10: Default options in config.yaml

Citing Lazypipe

Plyusnin Ilya, Olli Vapalahti, Tarja Sironen, Ravi Kant, and Teemu Smura. “Enhanced Viral Metagenomics with Lazypipe 2.” Viruses 15, no. 2 (February 4, 2023): 431. https://doi.org/10.3390/v15020431
Ilya Plyusnin, Ravi Kant, Anne J. Jaaskelainen, Tarja Sironen, Liisa Holm, Olli Vapalahti, Teemu Smura. (2020) Novel NGS Pipeline for Virus Discovery from a Wide Spectrum of Hosts and Sample Types. Virus Evolution, veaa091, https://doi.org/10.1093/ve/veaa091

Contact

Project website: https://www.helsinki.fi/en/projects/lazypipe

Contact email: grp-lazypipe@helsinki.fi