selectNovel removes all identical novel alleles
Issue #18
resolved
selectNovel removes both novel alleles when two identical novel alleles are present. For example, IGHV1-2*02_T163C and IGHV1-2*05_T299C, which share the same sequence, are both dropped when both are present. Changing the keep_alleles argument has no effect.
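To make the reported symptom concrete, here is a minimal base R sketch of two ways to handle rows that share a sequence. It is not tigger's implementation of selectNovel; the shortened sequences are placeholders and the variable names (found, drop_all, keep_first) are assumptions made for illustration only.

```r
# Toy stand-in for a novel-allele table. Column names follow the thread;
# the sequences are shortened placeholders, not real alleles.
novel_df <- data.frame(
    NOVEL_IMGT = c("ACGTCC", NA, "ACGTCC"),
    POLYMORPHISM_CALL = c("IGHV1-2*02_T163C", NA, "IGHV1-2*05_T299C"),
    stringsAsFactors = FALSE
)

# Rows where a novel allele was actually found.
found <- novel_df[!is.na(novel_df$NOVEL_IMGT), ]

# Behavior matching the reported symptom: every row whose sequence occurs
# more than once is dropped, so both alleles disappear.
dup_any <- duplicated(found$NOVEL_IMGT) |
    duplicated(found$NOVEL_IMGT, fromLast = TRUE)
drop_all <- found[!dup_any, ]

# Alternative: keep the first row of each distinct sequence.
keep_first <- found[!duplicated(found$NOVEL_IMGT), ]

print(nrow(drop_all))
print(keep_first$POLYMORPHISM_CALL)
```

In this sketch, drop_all ends up empty, while keep_first retains one row per distinct sequence.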
Comments (6)
-
-
Based on the description above, I think this should reproduce the problem, but it doesn't:
# Both novel alleles share the same IMGT-gapped sequence.
novel_seq <- "CAGGTGCAGCTGGTGCAGTCTGGGGCT...GAGGTGAAGAAGCCTGGGGCCTCAGTGAAGGTCTCCTGCAAGGCTTCTGGATACACCTTC............ACCGGCTACTATATGCACTGGGTGCGACAGGCCCCTGGACAAGGGCTTGAGTGGATGGGACGGATCAACCCTAAC......AGTGGTGGCACAAACTATGCACAGAAGTTTCAG...GGCAGGGTCACCATGACCAGGGACACGTCCATCAGCACAGCCTACATGGAGCTGAGCAGGCTGAGATCTGACGACACGGCCGTGTATTACTGTGCGAGAGA"
# The middle row is a germline call with no novel allele found.
novel_df <- data.frame(
    NOVEL_IMGT = c(novel_seq, NA, novel_seq),
    POLYMORPHISM_CALL = c("IGHV1-2*02_T163C", NA, "IGHV1-2*05_T299C"),
    GERMLINE_CALL = c("IGHV1-2*02", "IGHV1-2*02", "IGHV1-2*05")
)
Output with keep_alleles=T:
> selectNovel(novel_df, keep_alleles = T)
# A tibble: 2 x 3
  NOVEL_IMGT                            POLYMORPHISM_CA… GERMLINE_CALL
  <fct>                                 <fct>            <fct>
1 CAGGTGCAGCTGGTGCAGTCTGGGGCT...GAGGTG… IGHV1-2*02_T163C IGHV1-2*02
2 CAGGTGCAGCTGGTGCAGTCTGGGGCT...GAGGTG… IGHV1-2*05_T299C IGHV1-2*05
Output with keep_alleles=F:
> selectNovel(novel_df, keep_alleles = F)
  NOVEL_IMGT
1 CAGGTGCAGCTGGTGCAGTCTGGGGCT...GAGGTGAAGAAGCCTGGGGCCTCAGTGAAGGTCTCCTGCAAGGCTTCTGGATACACCTTC............ACCGGCTACTATATGCACTGGGTGCGACAGGCCCCTGGACAAGGGCTTGAGTGGATGGGACGGATCAACCCTAAC......AGTGGTGGCACAAACTATGCACAGAAGTTTCAG...GGCAGGGTCACCATGACCAGGGACACGTCCATCAGCACAGCCTACATGGAGCTGAGCAGGCTGAGATCTGACGACACGGCCGTGTATTACTGTGCGAGAGA
  POLYMORPHISM_CALL GERMLINE_CALL
1  IGHV1-2*02_T163C    IGHV1-2*02
-
reporter: I can't reproduce this now using the attached novel allele data.frame, which previously showed the error.
It seems we fixed it at some point?
-
reporter - attached selectNovel_error.rda
-
Yes, seems to work fine with the attached selectNovel_error.rda
> selectNovel(novel_ur01, keep_alleles = F)
  GERMLINE_CALL                NOTE POLYMORPHISM_CALL NT_SUBSTITUTIONS
1    IGHV1-2*02 Novel allele found!  IGHV1-2*02_T163C           163T>C
  NOVEL_IMGT
1 CAGGTGCAGCTGGTGCAGTCTGGGGCT---GAGGTGAAGAAGCCTGGGGCCTCAGTGAAGGTCTCCTGCAAGGCTTCTGGATACACCTTC------------ACCGGCTACTATATGCACTGGGTGCGACAGGCCCCTGGACAAGGGCTTGAGTGGATGGGACGGATCAACCCTAAC------AGTGGTGGCACAAACTATGCACAGAAGTTTCAG---GGCAGGGTCACCATGACCAGGGACACGTCCATCAGCACAGCCTACATGGAGCTGAGCAGGCTGAGATCTGACGACACGGCCGTGTATTACTGTGCGAGAGA
  NOVEL_IMGT_COUNT NOVEL_IMGT_UNIQUE_J NOVEL_IMGT_UNIQUE_CDR3 PERFECT_MATCH_COUNT PERFECT_MATCH_FREQ
1              154                   6                    146                 184          0.7698745
  GERMLINE_CALL_COUNT GERMLINE_CALL_PERC MUT_MIN MUT_MAX MUT_PASS_COUNT
1                 239                1.8       1      10            216
  GERMLINE_IMGT
1 CAGGTGCAGCTGGTGCAGTCTGGGGCT---GAGGTGAAGAAGCCTGGGGCCTCAGTGAAGGTCTCCTGCAAGGCTTCTGGATACACCTTC------------ACCGGCTACTATATGCACTGGGTGCGACAGGCCCCTGGACAAGGGCTTGAGTGGATGGGATGGATCAACCCTAAC------AGTGGTGGCACAAACTATGCACAGAAGTTTCAG---GGCAGGGTCACCATGACCAGGGACACGTCCATCAGCACAGCCTACATGGAGCTGAGCAGGCTGAGATCTGACGACACGGCCGTGTATTACTGTGCGAGAGA
  GERMLINE_IMGT_COUNT POS_MIN POS_MAX Y_INTERCEPT Y_INTERCEPT_PASS SNP_PASS UNMUTATED_COUNT
1                   0       1     312       0.125                1      213             184
  UNMUTATED_FREQ UNMUTATED_SNP_J_GENE_LENGTH_COUNT SNP_MIN_SEQS_J_MAX_PASS ALPHA MIN_SEQS J_MAX
1      0.7698745                                63                       1  0.05       50  0.15
  MIN_FRAC
1     0.75
> selectNovel(novel_ur01, keep_alleles = T)
# A tibble: 2 x 30
  GERMLINE_CALL NOTE   POLYMORPHISM_CA… NT_SUBSTITUTIONS NOVEL_IMGT  NOVEL_IMGT_COUNT NOVEL_IMGT_UNIQ…
  <chr>         <chr>  <chr>            <chr>            <chr>                  <int>            <int>
1 IGHV1-2*02    Novel… IGHV1-2*02_T163C 163T>C           CAGGTGCAGC…              154                6
2 IGHV1-2*05    Novel… IGHV1-2*05_T299C 299T>C           CAGGTGCAGC…              154                6
# ... with 23 more variables: NOVEL_IMGT_UNIQUE_CDR3 <int>, PERFECT_MATCH_COUNT <int>,
#   PERFECT_MATCH_FREQ <dbl>, GERMLINE_CALL_COUNT <int>, GERMLINE_CALL_PERC <dbl>, MUT_MIN <int>,
#   MUT_MAX <int>, MUT_PASS_COUNT <int>, GERMLINE_IMGT <chr>, GERMLINE_IMGT_COUNT <int>,
#   POS_MIN <int>, POS_MAX <int>, Y_INTERCEPT <dbl>, Y_INTERCEPT_PASS <int>, SNP_PASS <int>,
#   UNMUTATED_COUNT <int>, UNMUTATED_FREQ <dbl>, UNMUTATED_SNP_J_GENE_LENGTH_COUNT <int>,
#   SNP_MIN_SEQS_J_MAX_PASS <int>, ALPHA <dbl>, MIN_SEQS <dbl>, J_MAX <dbl>, MIN_FRAC <dbl>
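The NT_SUBSTITUTIONS values in this output (for example 163T>C) describe where NOVEL_IMGT differs from GERMLINE_IMGT. As a rough sketch of how such a label could be derived from two aligned sequences, the hypothetical helper below compares them character by character. The function list_substitutions is not part of tigger, the short sequences are placeholders, and counting positions along the gapped alignment is an assumption.

```r
# Hypothetical helper (not part of tigger): list substitutions between two
# equal-length aligned sequences as "<position><germline base>><novel base>".
# Positions are counted along the gapped alignment (an assumption).
list_substitutions <- function(germline, novel) {
    g <- strsplit(germline, "")[[1]]
    n <- strsplit(novel, "")[[1]]
    stopifnot(length(g) == length(n))
    pos <- which(g != n)
    # Return an empty vector when the sequences are identical.
    if (length(pos) == 0) {
        return(character(0))
    }
    paste0(pos, g[pos], ">", n[pos])
}

# Placeholder sequences with a single T-to-C change at gapped position 8.
subs <- list_substitutions("ACGT---TGA", "ACGT---CGA")
print(subs)
```

Applied to the GERMLINE_IMGT and NOVEL_IMGT strings of a row, a helper like this gives a quick way to check that two calls with different names really carry the same sequence change.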
-
- changed status to resolved
Can't reproduce.