- edited description
Function plotNovel() found nv NA/NaN in Introductory Webinar and Jupyter Notebook
Hello! I am trying to learn the Immcantation Tutorials, and I found the Introductory Webinar and Jupyter Notebook and the example input file (input.fasta) . However, I got KeyError: 'V Frame shift'
, so I installed the development version of changeo according to the Issue. And I installed other R packages by install.packages(c("alakazam", "shazam", "tigger", "scoper"))
.
Everything was OK until I test the Identify potentially novel V gene alleles part in the Introductory Webinar and Jupyter Notebook. I found the nv contains no Novel allele
. Thus when I run plotNovel() I got another error. Here are the codes:
suppressPackageStartupMessages(library(alakazam))
suppressPackageStartupMessages(library(tigger))
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(ggplot2))
suppressPackageStartupMessages(library(airr))dir.create("results/tigger")
db <- read_airr("results/changeo/data_ph_parse-select.tsv")
|===================================================================| 100% 92 MB
colnames(db)
[1] "sequence_id" "sequence" "rev_comp"
[4] "productive" "v_call" "d_call"
[7] "j_call" "sequence_alignment" "germline_alignment"
[10] "junction" "junction_aa" "v_cigar"
[13] "d_cigar" "j_cigar" "stop_codon"
[16] "vj_in_frame" "locus" "junction_length"
[19] "np1_length" "np2_length" "v_sequence_start"
[22] "v_sequence_end" "v_germline_start" "v_germline_end"
[25] "d_sequence_start" "d_sequence_end" "d_germline_start"
[28] "d_germline_end" "j_sequence_start" "j_sequence_end"
[31] "j_germline_start" "j_germline_end" "isotype"ighv <- readIgFasta("/hw5/data/imgt/human/vdj/imgt_human_IGHV.fasta")
ighv[1]
IGHV1-18*01
"CAGGTTCAGCTGGTGCAGTCTGGAGCT...GAGGTGAAGAAGCCTGGGGCCTCAGTGAAGGTCTCCTGCAAGGCTTCTGGTTACACCTTT............ACCAGCTATGGTATCAGCTGGGTGCGACAGGCCCCTGGACAAGGGCTTGAGTGGATGGGATGGATCAGCGCTTAC......AATGGTAACACAAACTATGCACAGAAGCTCCAG...GGCAGAGTCACCATGACCACAGACACATCCACGAGCACAGCCTACATGGAGCTGAGGAGCCTGAGATCTGACGACACGGCCGTGTATTACTGTGCGAGAGA"
nv <- findNovelAlleles(db, germline_db=ighv, nproc=8)
selectNovel(nv)
[1] germline_call note
[3] polymorphism_call nt_substitutions
[5] novel_imgt novel_imgt_count
[7] novel_imgt_unique_j novel_imgt_unique_cdr3
[9] perfect_match_count perfect_match_freq
[11] germline_call_count germline_call_freq
[13] mut_min mut_max
[15] mut_pass_count germline_imgt
[17] germline_imgt_count pos_min
[19] pos_max y_intercept
[21] y_intercept_pass snp_pass
[23] unmutated_count unmutated_freq
[25] unmutated_snp_j_gene_length_count snp_min_seqs_j_max_pass
[27] alpha min_seqs
[29] j_max min_frac
<0 rows> (or 0-length row.names)pdf("results/tigger/plot1.pdf")
plotNovel(db, selectNovel(nv)[1,])
Error in novel_row$pos_min:novel_row$pos_max : NA/NaN argument
dev.off()head(nv)
germline_call
1 IGHV1-2*02
2 IGHV1-2*04
3 IGHV1-3*01
4 IGHV1-3*01
5 IGHV1-8*01
6 IGHV1-8*03note
1 No positions pass y-intercept test.
2 No positions pass y-intercept test.
3 No positions pass y-intercept test.
4 Position(s) passed y-intercept (9,60,86,117,144,173,190,191,196,232,255,259,263,270,303) but no unmutated versions of novel allele found.
5 No positions pass y-intercept test.
6 No positions pass y-intercept test.
polymorphism_call nt_substitutions novel_imgt novel_imgt_count
1 NA NA NA NA
2 NA NA NA NA
3 NA NA NA NA
4 NA NA NA NA
5 NA NA NA NA
6 NA NA NA NA
novel_imgt_unique_j novel_imgt_unique_cdr3 perfect_match_count
1 NA NA NA
2 NA NA NA
3 NA NA NA
4 NA NA NA
5 NA NA NA
6 NA NA NAperfect_match_freq germline_call_count germline_call_freq mut_min mut_max
1 NA 2485 0.038 1 10
2 NA 758 0.012 1 10
3 NA 703 0.011 1 10
4 NA 703 0.011 29 38
5 NA 1083 0.017 1 10
6 NA 664 0.010 1 10
mut_pass_count
1 336
2 77
3 67
4 123
5 79
6 61
germline_imgt
1 CAGGTGCAGCTGGTGCAGTCTGGGGCT...GAGGTGAAGAAGCCTGGGGCCTCAGTGAAGGTCTCCTGCAAGGCTTCTGGATACACCTTC............ACCGGCTACTATATGCACTGGGTGCGACAGGCCCCTGGACAAGGGCTTGAGTGGATGGGATGGATCAACCCTAAC......AGTGGTGGCACAAACTATGCACAGAAGTTTCAG...GGCAGGGTCACCATGACCAGGGACACGTCCATCAGCACAGCCTACATGGAGCTGAGCAGGCTGAGATCTGACGACACGGCCGTGTATTACTGTGCGAGAGA
2 CAGGTGCAGCTGGTGCAGTCTGGGGCT...GAGGTGAAGAAGCCTGGGGCCTCAGTGAAGGTCTCCTGCAAGGCTTCTGGATACACCTTC............ACCGGCTACTATATGCACTGGGTGCGACAGGCCCCTGGACAAGGGCTTGAGTGGATGGGATGGATCAACCCTAAC......AGTGGTGGCACAAACTATGCACAGAAGTTTCAG...GGCTGGGTCACCATGACCAGGGACACGTCCATCAGCACAGCCTACATGGAGCTGAGCAGGCTGAGATCTGACGACACGGCCGTGTATTACTGTGCGAGAGA
3 CAGGTCCAGCTTGTGCAGTCTGGGGCT...GAGGTGAAGAAGCCTGGGGCCTCAGTGAAGGTTTCCTGCAAGGCTTCTGGATACACCTTC............ACTAGCTATGCTATGCATTGGGTGCGCCAGGCCCCCGGACAAAGGCTTGAGTGGATGGGATGGATCAACGCTGGC......AATGGTAACACAAAATATTCACAGAAGTTCCAG...GGCAGAGTCACCATTACCAGGGACACATCCGCGAGCACAGCCTACATGGAGCTGAGCAGCCTGAGATCTGAAGACACGGCTGTGTATTACTGTGCGAGAGA
4 CAGGTCCAGCTTGTGCAGTCTGGGGCT...GAGGTGAAGAAGCCTGGGGCCTCAGTGAAGGTTTCCTGCAAGGCTTCTGGATACACCTTC............ACTAGCTATGCTATGCATTGGGTGCGCCAGGCCCCCGGACAAAGGCTTGAGTGGATGGGATGGATCAACGCTGGC......AATGGTAACACAAAATATTCACAGAAGTTCCAG...GGCAGAGTCACCATTACCAGGGACACATCCGCGAGCACAGCCTACATGGAGCTGAGCAGCCTGAGATCTGAAGACACGGCTGTGTATTACTGTGCGAGAGA
5 CAGGTGCAGCTGGTGCAGTCTGGGGCT...GAGGTGAAGAAGCCTGGGGCCTCAGTGAAGGTCTCCTGCAAGGCTTCTGGATACACCTTC............ACCAGTTATGATATCAACTGGGTGCGACAGGCCACTGGACAAGGGCTTGAGTGGATGGGATGGATGAACCCTAAC......AGTGGTAACACAGGCTATGCACAGAAGTTCCAG...GGCAGAGTCACCATGACCAGGAACACCTCCATAAGCACAGCCTACATGGAGCTGAGCAGCCTGAGATCTGAGGACACGGCCGTGTATTACTGTGCGAGAGG
6 CAGGTGCAGCTGGTGCAGTCTGGGGCT...GAGGTGAAGAAGCCTGGGGCCTCAGTGAAGGTCTCCTGCAAGGCTTCTGGATACACCTTC............ACCAGCTATGATATCAACTGGGTGCGACAGGCCACTGGACAAGGGCTTGAGTGGATGGGATGGATGAACCCTAAC......AGTGGTAACACAGGCTATGCACAGAAGTTCCAG...GGCAGAGTCACCATTACCAGGAACACCTCCATAAGCACAGCCTACATGGAGCTGAGCAGCCTGAGATCTGAGGACACGGCCGTGTATTACTGTGCGAGAGG
germline_imgt_count pos_min pos_max y_intercept y_intercept_pass snp_pass
1 1161 1 312 0.125 0 NA
2 512 1 312 0.125 0 NA
3 328 1 312 0.125 0 NA
4 328 1 312 0.125 15 76
5 791 1 312 0.125 0 NA
6 432 1 312 0.125 0 NA
unmutated_count unmutated_freq unmutated_snp_j_gene_length_count
1 NA NA NA
2 NA NA NA
3 NA NA NA
4 0 0 0
5 NA NA NA
6 NA NA NA
snp_min_seqs_j_max_pass alpha min_seqs j_max min_frac
1 NA 0.05 50 0.15 0.75
2 NA 0.05 50 0.15 0.75
3 NA 0.05 50 0.15 0.75
4 NA 0.05 50 0.15 0.75
5 NA 0.05 50 0.15 0.75
6 NA 0.05 50 0.15 0.75
Any ideas or help would be greatly appreciated!
The attachments was about IGHV, which might be helpful to solve the problem.
Comments (19)
-
reporter -
reporter Here is the head of the fasta file .
$head input.fasta
seq1|ISOTYPE=IGHM
nnnnnnnnnnnnnnnnnnnnnnnagtgtcaggtgcagctggtggagctggggagcgtggt
ccagcctgggaggccctgagtctctcctgtgcagcctctggattcaccttcagtctctat
gctatgcactgggtccgccaggctccaggcaaggggctggagtgggtggcagttatatca
tatcatgtaagcagtaaatactacgcagactccgtgaagggccgattcaccatctccaga
gacaattccaagaacacgctgtatctgcaaatgaacagcctgagagctgaggacacggct
gtgtattactgtgcgagagggccctatagtactggttattactacgagcttgactactgg
ggccagggaacgctcggtcacccgtctcctcacgggtagtgcatccgccccaaccc
seq2|ISOTYPE=IGHM
nnnnnnnnnnnnnnnnnntgtcccaggtgcagctgcaggagtcgggcccaggactggtga
-
-
assigned issue to
This looks like a tigger issue. Could you please take a took @ssnn?
-
assigned issue to
-
reporter - edited description
-
reporter - edited description
-
reporter - attached imgt_human_IGHV.fasta
-
reporter - edited description
-
I have not been able to reproduce the error with
ExampleDb
, after removing all sequences with the alleleIGHV1-8*02
, and producing 0 novels alleles.Does the same error happen with other rows in
nv
, or just with the first one?Why does your data use the germline name
IGHV1-2_02_
innv[1,]
? Is this just a typo introduced during copy/paste to create the issues, or is this present in your data?Can you share your
db
so I can reproduce the error? I can set up a secure file transfer request if the file is too big, or if you’d rather not share it here. If so, please send me a message to immcantation@googlegroups.com, with the email you want to use. -
reporter - edited description
-
reporter - edited description
-
reporter - edited description
-
reporter - attached dp.zip
-
reporter Thank you for your reply.
The same error happen with other rows in
nv
, maybe due to producing no novels alleles because I found the selectNovel(nv) step was inconsistent with the Jupyter Notebook.I’m sorry that it was just a mistake introduced during copy/paste about the germline name, indeed. I have corrected that.
And I have upload the
db
file in the attachments (db.zip). Actually I produced the file according to the Jupyter Notebook, And I create the igblast/imgt reference according to the following codes (downloaded from this website ):
#Import IgBLAST and IMGT germline databases
fetch_igblastdb.sh -o /data/igblast
fetch_imgtdb.sh -o /data/imgt
imgt2igblast.sh -i /data/imgt -o /data/igblast
Here is the igblast version information:
igblastn: 1.16.0
Package: igblast 1.17.0, build Oct 6 2020 11:26:37 -
Thanks for the files. I don’t get the error. I am using the most recent version of
tigger
, 1.0.0.999, installed from the repo. Please upgrade and let me know if that works for you.
-
- attached changeo-issue-173.html
Attaching the result, with
tigger 1.0.0.999
-
reporter Hi, I used the
tigger
(1.0.0) to run the codes in the html you provided above, and I dont’t get any errors either. Maybe I found the key point that I usedplotNovel(db, selectNovel(nv)[1,])
for visualization while theselectNovel(nv)
is empty. When I usedplotNovel(db, nv[i,])
, no error occured. Does it mean that No Novel allele found ?However, I wonder why
selectNovel(nv)
wasn’t empty in the Jupyter Notebook on the official website and therefore, usingplotNovel(db, selectNovel(nv)[1,])
is OK. Is it due to the database/reference, packages version or parameter issues ?Anyway, thank you so much again.
-
selectNovel(nv)
is empty when tigger doesn’t find new polymorphisms that pass all the thresholds. If there are novel alleles, the second panel in the figure fromplotNovel
, shows the nucleotide usage at each of the polymorphic positions as a function of sequence-wide mutation count. If there are no novel alleles (`selectNovel(nv)` is empty), in recent versions of tigger, you can still useplotNovel
, to visualize your data, and then the second panel shows the mutation count. The Jupyter Notebook is a tutorial, and uses a container, data and reference germlines for demonstration purposes only. It has been prepared to be able to find a novel allele. In particular, alleleIGHV3-20*04
has been removed from the reference germline database, and is ‘discovered’ asIGHV3-20*01_C307T
. -
reporter Aha, I got it. Thanks !
-
- changed status to resolved
Seems cleared up.
- Log in to comment