Error in findNovelAlleles(db, germline_db = ighv, nproc = 2)

Issue #39 resolved
Xavier Laulhe created an issue

Hello there, I have a problem: I don’t understand this error. Do I not have enough sequences?

This is my code:

AssignGenes.py igblast -s filtered_contig.fasta -b /usr/local/share/igblast \
    --organism mouse --loci ig --format blast --outdir results

MakeDb.py igblast -i results/filtered_contig_igblast.fmt7 -s filtered_contig.fasta \
    -r /usr/local/share/germlines/imgt/mouse/vdj/ IMGT_Mouse_IG.fasta* \
    filtered_contig_annotations.csv --extended

ParseDb.py select -d results/filtered_contig_igblast_db-pass.tsv -f productive -u T --outname data_p

ParseDb.py select -d results/changeo/data_p_parse-select.tsv \
    -f v_call -u IGHV --regex --outname data_ph

R
suppressPackageStartupMessages(library(airr))
suppressPackageStartupMessages(library(alakazam))
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(ggplot2))
suppressPackageStartupMessages(library(tigger))

db <- read_rearrangement(file.path("results","data_ph_parse-select.tsv"))
colnames(db) # show the column names in the database
ighv <- readIgFasta(file.path("","usr","local","share","germlines","imgt","mouse","vdj","imgt_mouse_IGHV.fasta"))
ighv[1] # show the first germline

nv <- findNovelAlleles(db, germline_db = ighv, nproc = 2) # find novel alleles

Error in findNovelAlleles(db, germline_db = ighv, nproc = 2) :
Not enough sample sequences were assigned to any germline:
(1) germline_min is too large or
(2) sequences names don't match germlines.

Comments (2)

  1. ssnn

    Hi! findNovelAlleles requires a minimum number of sequences to analyze each allele. By default, this number is germline_min=200. Based on the message, it seems that you don’t have at least 200 sequences for any of the alleles. This can happen because (1) your data really doesn’t contain that many sequences, in which case you can try lowering germline_min; or (2) for whatever reason, the names of the germline sequences in germline_db and the v_call values in db don’t match, in which case you should check that you provided the right germline_db data.
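
A quick way to tell the two causes apart is to count sequences per V allele and to compare the allele names in v_call with the names of the germline vector. The sketch below uses base R only and small made-up stand-ins for db and ighv (the toy values are hypothetical, not taken from the data above). Taking the first comma-separated call per sequence is an assumption made for illustration; it approximates, but is not necessarily identical to, how findNovelAlleles groups sequences internally.

```r
# Toy stand-ins for the real objects (hypothetical values, for illustration only):
# `db` mimics the rearrangement table, `ighv` mimics the named germline vector
# returned by readIgFasta().
db <- data.frame(
  v_call = c("IGHV1-18*01", "IGHV1-18*01,IGHV1-18*02", "IGHV1-26*01", "IGHV9-9*01"),
  stringsAsFactors = FALSE
)
ighv <- c("IGHV1-18*01" = "CAGGTC", "IGHV1-26*01" = "CAGGTT")

# Assumption: when v_call lists several alleles, keep only the first one.
first_call <- sub(",.*$", "", db$v_call)

# (1) Sequences per allele, largest first. If even the top count is below
#     germline_min (default 200), no allele has enough sequences.
allele_counts <- sort(table(first_call), decreasing = TRUE)
print(allele_counts)

# (2) Alleles called in the data with no identically named germline entry.
#     Many entries here suggest a naming mismatch rather than too little data.
unmatched <- setdiff(unique(first_call), names(ighv))
print(unmatched)
```

With the real objects, the two toy definitions at the top would be dropped and the remaining lines run as they are. If the counts turn out to be genuinely low, a smaller threshold can be passed, for example findNovelAlleles(db, germline_db = ighv, germline_min = 50, nproc = 2); the value 50 is arbitrary and only illustrates the argument.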
