Missing gene usage values for genes with "D" in the name
Hello,
I am using the function countGenes to calculate IGHV gene usage. I've noticed that I don't get any gene usage values for genes that have the character "D" in the name. For example, I don't get any gene usage values for IGHV3-64D. Is this intended, and is there a way to turn it off?
I believe this comes from the function getSegment in Genes.R. Lines 272 to 275 seem to remove the "D" if strip_d is set to TRUE, and strip_d is set to TRUE in the function getGene (lines 302 to 308).
Thanks!
Oscar
Comments (4)
-
reporter If it's useful, this is the code that I am using:
gene <- countGenes(changoTableMaster, gene="v_call", groups=c("sample","cprimer"), clone="clone_id", mode="gene")
I also checked the Change-O table (changoTableMaster) to make sure I have assignments to genes with "D", e.g.:
$ cat changeoTableMasterClonePatient_expandedStatus.tsv | grep IGHV3-64D | cut -f5 | sort | uniq -c | sort -k1,1nr
7452 IGHV3-64D*06
7372 IGHV3-64D*09
2206 IGHV3-64D*08
1021 IGHV3-64D*06,IGHV3-64D*08
165 IGHV3-64D*06,IGHV3-64D*09
101 IGHV3-64*03,IGHV3-64D*09
83 IGHV3-64D*08,IGHV3-64D*09
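For anyone who wants to run the same check from within R, here is a minimal base R sketch of that tally. The toy data frame is made up for illustration and only stands in for the real Change-O table; the column name v_call matches the countGenes call above.

```r
# Toy stand-in for the Change-O table (made-up rows, not the real data).
db <- data.frame(
  v_call = c("IGHV3-64D*06", "IGHV3-64D*06", "IGHV3-64D*09",
             "IGHV3-64*03,IGHV3-64D*09", "IGHV1-2*02"),
  stringsAsFactors = FALSE
)

# Keep the calls that mention the duplicated gene (literal match, no regex).
hits <- db$v_call[grepl("IGHV3-64D", db$v_call, fixed = TRUE)]

# Tally each distinct call, most frequent first (like sort | uniq -c | sort -nr).
call_counts <- sort(table(hits), decreasing = TRUE)
print(call_counts)
```

Ambiguous calls such as IGHV3-64*03,IGHV3-64D*09 are counted as their own category here, just as in the shell pipeline.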
-
- marked as proposal
Hi Oscar,
Yes, strip_d removes the D that signals a duplicate gene. In the current implementation of countGenes, it is not possible to set strip_d to FALSE. If you want to count the duplicate genes separately, you could add a preliminary step that creates the gene names with getGene and strip_d=FALSE, then use countGenes with mode='asis'.
> db <- data.frame(
+   list(v_call=c("IGHV3-64D*06","IGHV3-64*06"))
+ )
> countGenes(db, gene="v_call")
# A tibble: 1 × 3
  gene     seq_count seq_freq
  <chr>        <int>    <dbl>
1 IGHV3-64         2        1
> db[["v_gene"]] <- getGene(db[["v_call"]], strip_d = FALSE)
> countGenes(db, gene="v_gene", mode="asis")
# A tibble: 2 × 3
  gene      seq_count seq_freq
  <chr>         <int>    <dbl>
1 IGHV3-64          1      0.5
2 IGHV3-64D         1      0.5
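To illustrate why the duplicate and the original gene end up in the same row, here is a rough base R sketch of the collapsing step. It is not the code from Genes.R: the helper strip_dup_marker and its regular expression are assumptions made for illustration, and the real pattern used by getSegment may differ.

```r
# Hypothetical helper, NOT part of alakazam: drop the "D" duplication marker
# from gene names such as "IGHV3-64D".
strip_dup_marker <- function(genes) {
  # Assumed pattern: a "D" preceded by a digit and followed by "*", "," or
  # the end of the string. The "D" in "IGHD" follows a letter, so it is kept.
  gsub("(?<=[0-9])D(?=\\*|,|$)", "", genes, perl = TRUE)
}

calls <- c("IGHV3-64D*06", "IGHV3-64*03", "IGHD3-10*01")

# Reduce allele calls to gene level by removing the "*allele" suffix.
genes <- sub("\\*.*$", "", calls)

# With the marker stripped, IGHV3-64D and IGHV3-64 fall into one category.
collapsed <- table(strip_dup_marker(genes))

# Without stripping, the two genes are counted separately.
separate <- table(genes)

print(collapsed)
print(separate)
```

Under these assumptions the first table has a single IGHV3-64 entry with a count of 2, while the second keeps IGHV3-64 and IGHV3-64D apart. That is the same difference as between the two countGenes calls above.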
-
- changed status to closed