selectNovel removes all identical novel alleles

Issue #18 resolved
Jason Vander Heiden created an issue

selectNovel removes both novel alleles when two novel allele calls have identical sequences. For example, IGHV1-2*02_T163C and IGHV1-2*05_T299C are both dropped when both are present. Changing the keep_alleles argument has no effect.
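
To check whether a table of novel alleles contains the situation described here, one option is to flag every row whose NOVEL_IMGT sequence occurs more than once. The sketch below uses only base R and a toy data.frame: the short sequences and the IGHV1-3*01_G10A row are made up for illustration, and the last two lines only contrast "drop every copy" with "keep one copy". It is not selectNovel's actual implementation.

```r
# Toy stand-in for a novel-allele table. Short strings replace the real
# IMGT-gapped sequences; the fourth row is hypothetical.
novel_toy <- data.frame(
  NOVEL_IMGT = c("CAGGTG", NA, "CAGGTG", "CAGGTA"),
  POLYMORPHISM_CALL = c("IGHV1-2*02_T163C", NA,
                        "IGHV1-2*05_T299C", "IGHV1-3*01_G10A"),
  GERMLINE_CALL = c("IGHV1-2*02", "IGHV1-2*02", "IGHV1-2*05", "IGHV1-3*01"),
  stringsAsFactors = FALSE
)

# Keep only rows where a novel allele was actually called.
called <- novel_toy[!is.na(novel_toy$NOVEL_IMGT), ]

# Flag every copy of a repeated sequence (first and later occurrences).
shared <- duplicated(called$NOVEL_IMGT) |
  duplicated(called$NOVEL_IMGT, fromLast = TRUE)

# Calls that share a sequence with at least one other call.
called[shared, c("POLYMORPHISM_CALL", "GERMLINE_CALL")]

# Dropping every copy loses both alleles (the behaviour reported here);
# dropping only later copies keeps one row per distinct sequence.
drop_all   <- called[!shared, ]
keep_first <- called[!duplicated(called$NOVEL_IMGT), ]
```

With the toy data, `shared` flags the two IGHV1-2 calls, `drop_all` keeps only the hypothetical IGHV1-3 row, and `keep_first` keeps one IGHV1-2 row plus the IGHV1-3 row.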

Comments (6)

  1. ssnn

    Based on the description above, I think this should reproduce the problem, but it doesn't:

    novel_df <- data.frame(NOVEL_IMGT=c("CAGGTGCAGCTGGTGCAGTCTGGGGCT...GAGGTGAAGAAGCCTGGGGCCTCAGTGAAGGTCTCCTGCAAGGCTTCTGGATACACCTTC............ACCGGCTACTATATGCACTGGGTGCGACAGGCCCCTGGACAAGGGCTTGAGTGGATGGGACGGATCAACCCTAAC......AGTGGTGGCACAAACTATGCACAGAAGTTTCAG...GGCAGGGTCACCATGACCAGGGACACGTCCATCAGCACAGCCTACATGGAGCTGAGCAGGCTGAGATCTGACGACACGGCCGTGTATTACTGTGCGAGAGA",
                                        NA,
                                        "CAGGTGCAGCTGGTGCAGTCTGGGGCT...GAGGTGAAGAAGCCTGGGGCCTCAGTGAAGGTCTCCTGCAAGGCTTCTGGATACACCTTC............ACCGGCTACTATATGCACTGGGTGCGACAGGCCCCTGGACAAGGGCTTGAGTGGATGGGACGGATCAACCCTAAC......AGTGGTGGCACAAACTATGCACAGAAGTTTCAG...GGCAGGGTCACCATGACCAGGGACACGTCCATCAGCACAGCCTACATGGAGCTGAGCAGGCTGAGATCTGACGACACGGCCGTGTATTACTGTGCGAGAGA"),
                           POLYMORPHISM_CALL=c("IGHV1-2*02_T163C", NA, "IGHV1-2*05_T299C"),
                           GERMLINE_CALL=c("IGHV1-2*02", "IGHV1-2*02", "IGHV1-2*05"))
    

    Output with keep_alleles=T:

    > selectNovel(novel_df, keep_alleles = T)
    # A tibble: 2 x 3
      NOVEL_IMGT                            POLYMORPHISM_CA… GERMLINE_CALL
      <fct>                                 <fct>            <fct>
    1 CAGGTGCAGCTGGTGCAGTCTGGGGCT...GAGGTG… IGHV1-2*02_T163C IGHV1-2*02
    2 CAGGTGCAGCTGGTGCAGTCTGGGGCT...GAGGTG… IGHV1-2*05_T299C IGHV1-2*05
    

    Output with keep_alleles=F:

    > selectNovel(novel_df, keep_alleles = F)
                                                                                                                                                                                                                                                                                                                            NOVEL_IMGT
    1 CAGGTGCAGCTGGTGCAGTCTGGGGCT...GAGGTGAAGAAGCCTGGGGCCTCAGTGAAGGTCTCCTGCAAGGCTTCTGGATACACCTTC............ACCGGCTACTATATGCACTGGGTGCGACAGGCCCCTGGACAAGGGCTTGAGTGGATGGGACGGATCAACCCTAAC......AGTGGTGGCACAAACTATGCACAGAAGTTTCAG...GGCAGGGTCACCATGACCAGGGACACGTCCATCAGCACAGCCTACATGGAGCTGAGCAGGCTGAGATCTGACGACACGGCCGTGTATTACTGTGCGAGAGA
      POLYMORPHISM_CALL GERMLINE_CALL
    1  IGHV1-2*02_T163C    IGHV1-2*02
    > 
    
  2. Jason Vander Heiden reporter

    I can no longer reproduce this using the attached novel allele data.frame, which previously showed the error.

    Seems like we fixed it at some point?

  3. ssnn

    Yes, it seems to work fine with the attached selectNovel_error.rda:

    > selectNovel(novel_ur01, keep_alleles = F)
      GERMLINE_CALL                NOTE POLYMORPHISM_CALL NT_SUBSTITUTIONS
    1    IGHV1-2*02 Novel allele found!  IGHV1-2*02_T163C           163T>C
                                                                                                                                                                                                                                                                                                                            NOVEL_IMGT
    1 CAGGTGCAGCTGGTGCAGTCTGGGGCT---GAGGTGAAGAAGCCTGGGGCCTCAGTGAAGGTCTCCTGCAAGGCTTCTGGATACACCTTC------------ACCGGCTACTATATGCACTGGGTGCGACAGGCCCCTGGACAAGGGCTTGAGTGGATGGGACGGATCAACCCTAAC------AGTGGTGGCACAAACTATGCACAGAAGTTTCAG---GGCAGGGTCACCATGACCAGGGACACGTCCATCAGCACAGCCTACATGGAGCTGAGCAGGCTGAGATCTGACGACACGGCCGTGTATTACTGTGCGAGAGA
      NOVEL_IMGT_COUNT NOVEL_IMGT_UNIQUE_J NOVEL_IMGT_UNIQUE_CDR3 PERFECT_MATCH_COUNT PERFECT_MATCH_FREQ
    1              154                   6                    146                 184          0.7698745
      GERMLINE_CALL_COUNT GERMLINE_CALL_PERC MUT_MIN MUT_MAX MUT_PASS_COUNT
    1                 239                1.8       1      10            216
                                                                                                                                                                                                                                                                                                                         GERMLINE_IMGT
    1 CAGGTGCAGCTGGTGCAGTCTGGGGCT---GAGGTGAAGAAGCCTGGGGCCTCAGTGAAGGTCTCCTGCAAGGCTTCTGGATACACCTTC------------ACCGGCTACTATATGCACTGGGTGCGACAGGCCCCTGGACAAGGGCTTGAGTGGATGGGATGGATCAACCCTAAC------AGTGGTGGCACAAACTATGCACAGAAGTTTCAG---GGCAGGGTCACCATGACCAGGGACACGTCCATCAGCACAGCCTACATGGAGCTGAGCAGGCTGAGATCTGACGACACGGCCGTGTATTACTGTGCGAGAGA
      GERMLINE_IMGT_COUNT POS_MIN POS_MAX Y_INTERCEPT Y_INTERCEPT_PASS SNP_PASS UNMUTATED_COUNT
    1                   0       1     312       0.125                1      213             184
      UNMUTATED_FREQ UNMUTATED_SNP_J_GENE_LENGTH_COUNT SNP_MIN_SEQS_J_MAX_PASS ALPHA MIN_SEQS J_MAX
    1      0.7698745                                63                       1  0.05       50  0.15
      MIN_FRAC
    1     0.75
    > selectNovel(novel_ur01, keep_alleles = T)
    # A tibble: 2 x 30
      GERMLINE_CALL NOTE   POLYMORPHISM_CA… NT_SUBSTITUTIONS NOVEL_IMGT  NOVEL_IMGT_COUNT NOVEL_IMGT_UNIQ…
      <chr>         <chr>  <chr>            <chr>            <chr>                  <int>            <int>
    1 IGHV1-2*02    Novel… IGHV1-2*02_T163C 163T>C           CAGGTGCAGC…              154                6
    2 IGHV1-2*05    Novel… IGHV1-2*05_T299C 299T>C           CAGGTGCAGC…              154                6
    # ... with 23 more variables: NOVEL_IMGT_UNIQUE_CDR3 <int>, PERFECT_MATCH_COUNT <int>,
    #   PERFECT_MATCH_FREQ <dbl>, GERMLINE_CALL_COUNT <int>, GERMLINE_CALL_PERC <dbl>, MUT_MIN <int>,
    #   MUT_MAX <int>, MUT_PASS_COUNT <int>, GERMLINE_IMGT <chr>, GERMLINE_IMGT_COUNT <int>,
    #   POS_MIN <int>, POS_MAX <int>, Y_INTERCEPT <dbl>, Y_INTERCEPT_PASS <int>, SNP_PASS <int>,
    #   UNMUTATED_COUNT <int>, UNMUTATED_FREQ <dbl>, UNMUTATED_SNP_J_GENE_LENGTH_COUNT <int>,
    #   SNP_MIN_SEQS_J_MAX_PASS <int>, ALPHA <dbl>, MIN_SEQS <dbl>, J_MAX <dbl>, MIN_FRAC <dbl>
    > 
    