NA row for geneCounts() on d_call

Issue #90 new
Scott Christley created an issue

Using immcantation/suite:4.1.0 docker image, the attached AIRR TSV input file, run with this

db <- airr::read_rearrangement('vdjserver1.airr.tsv')
genes <- countGenes(db, gene='d_call', group='repertoire_id', mode='gene', copy='duplicate_count')
write.table(genes, row.names=F, sep='\t', file='d_gene_usage.tsv')

produces NA row in the output, based upon the count it seems to be the rearrangements with a blank d_call

"repertoire_id" "gene"  "seq_count" "copy_count"    "seq_freq"  "copy_freq"
"5325894209612354026-242ac113-0001-012" "TRBD1" 81387   132975  0.472979491262429   0.473424499517586
"5325894209612354026-242ac113-0001-012" "TRBD2" 49013   80938   0.284838411604377   0.288159670178262
"5325894209612354026-242ac113-0001-012" NA  41673   66966   0.242182097133193   0.238415830304152

The AIRR TSV was generated by IgBlast, though the data has been uploaded then downloaded from the VDJServer’s ADC API.

Comments (4)

  1. Jason Vander Heiden

    Ah, I see. It must be counting “NA” as a gene. I can’t decide whether that’s good or bad. It seems like it might be desirable to know how much missing data you have. What’s your preference?

  2. Scott Christley reporter

    I agree, it sounds like it might be nice to know about missing data. It’s reassuring to see the totals add up to the expected. I wonder though if any programs would get confused and blindly display/graph it thinking it's a real gene? They would have to know to exclude.

    I actually filed the issue because c_call throws an error in this case. So my preference is the NA versus an exception, but regardless the behavior should be the same for all.

  3. Log in to comment