NA row for geneCounts() on d_call
Using immcantation/suite:4.1.0 docker image, the attached AIRR TSV input file, run with this
db <- airr::read_rearrangement('vdjserver1.airr.tsv')
genes <- countGenes(db, gene='d_call', group='repertoire_id', mode='gene', copy='duplicate_count')
write.table(genes, row.names=F, sep='\t', file='d_gene_usage.tsv')
produces NA
row in the output, based upon the count it seems to be the rearrangements with a blank d_call
"repertoire_id" "gene" "seq_count" "copy_count" "seq_freq" "copy_freq"
"5325894209612354026-242ac113-0001-012" "TRBD1" 81387 132975 0.472979491262429 0.473424499517586
"5325894209612354026-242ac113-0001-012" "TRBD2" 49013 80938 0.284838411604377 0.288159670178262
"5325894209612354026-242ac113-0001-012" NA 41673 66966 0.242182097133193 0.238415830304152
The AIRR TSV was generated by IgBlast, though the data has been uploaded then downloaded from the VDJServer’s ADC API.
Comments (7)
-
-
-
assigned issue to
-
assigned issue to
-
reporter I agree, it sounds like it might be nice to know about missing data. It’s reassuring to see the totals add up to the expected. I wonder though if any programs would get confused and blindly display/graph it thinking it's a real gene? They would have to know to exclude.
I actually filed the issue because
c_call
throws an error in this case. So my preference is the NA versus an exception, but regardless the behavior should be the same for all. -
-
assigned issue to
-
assigned issue to
-
- changed status to open
-
- changed milestone to next release
-
- changed status to resolved
- Log in to comment
Ah, I see. It must be counting “NA” as a gene. I can’t decide whether that’s good or bad. It seems like it might be desirable to know how much missing data you have. What’s your preference?