Workflow question from Ang

Issue #22 resolved
Jason Vander Heiden created an issue

"I was testing the code and noticed that this block of code is no longer there. Just curious why we removed it. The gaps IMGT puts in break some 5-mers, don't they?"

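# For each row of matInput (two aligned sequences), drop every codon that is
# "---" in either sequence so that both sequences lose the same positions.
matInputCollapsed <- t(apply(matInput, 1, function(x) {
  L <- nchar(x[1])
  codons <- sapply(1:floor(L / 3), function(i) {
    start <- (i - 1) * 3 + 1
    codon1 <- substr(x[1], start, i * 3)
    codon2 <- substr(x[2], start, i * 3)
    # Keep the codon pair only when neither sequence is fully gapped here
    if (codon1 != "---" && codon2 != "---") c(codon1, codon2)
    else c("", "")
  })
  # Row 1 holds the first sequence's codons, row 2 the second's
  apply(codons, 1, paste, collapse = "")
}))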

Comments (4)

  1. Jason Vander Heiden reporter

    Newer version from Mohamed.

        seqsWithImgtGaps <- "THE_GERMLINE_OR_INPUT_SEQUENCE"
        # Removing IMGT gaps (they should come in threes)
        # After converting ... to "" any other . is not an IMGT gap & will be treated like N
        gaplessSeq <- gsub("\\.\\.\\.", "", seqsWithImgtGaps)
        # If a single gap character is left, convert it to an N
        gaplessSeq <- gsub("\\.", "N", gaplessSeq)
    

    Note that this version does not remove gap "codons" from the germline and input sequences simultaneously, i.e., it does not guarantee that the same positions are removed from both.
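
    One way to keep the two sequences in register is to decide per codon, as the removed block did. The following is only a minimal sketch under stated assumptions: `removePairedGapCodons` is a hypothetical helper (not a function from the package), gaps are written as `.`, both sequences have equal length, and a trailing partial codon is dropped (the removed block also ignored it).

    ```r
    # Hypothetical helper, not part of the package: drop every codon that is
    # "..." in either sequence, so both sequences lose the same positions.
    removePairedGapCodons <- function(inputSeq, germlineSeq) {
      stopifnot(nchar(inputSeq) == nchar(germlineSeq))
      nCodons <- nchar(inputSeq) %/% 3
      if (nCodons == 0) return(c(input = "", germline = ""))
      # Codon start positions: 1, 4, 7, ...
      starts <- seq(1, by = 3, length.out = nCodons)
      inputCodons <- substring(inputSeq, starts, starts + 2)
      germCodons <- substring(germlineSeq, starts, starts + 2)
      # Keep a codon pair only when neither member is a full gap codon
      keep <- inputCodons != "..." & germCodons != "..."
      cleaned <- c(paste(inputCodons[keep], collapse = ""),
                   paste(germCodons[keep], collapse = ""))
      # Any remaining "." is not an IMGT gap codon; treat it like N
      cleaned <- gsub(".", "N", cleaned, fixed = TRUE)
      names(cleaned) <- c("input", "germline")
      cleaned
    }

    removePairedGapCodons("AAT...TTA", "AATCCCTTA")
    ```

    In this example the middle codon is removed from both sequences, even though only the input is gapped there, so the remaining positions still line up.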

  2. Jason Vander Heiden reporter

    Ang:

    "Sure, this is prior to building substitution and mutability models. Yes, it deletes codons that are all gaps. I think Moe wrote the original code, so he should know it very well.

    I was wondering about this case: AAT...TTA

    Without removing the gaps, the 5-mers AATTT and ATTTA would not be counted.

    I was wondering whether anything else in the code handles this.

    I think I'll also update you on testing. The consensus sequence building step looks fine. The substitution model, however, gives different values. One source of the discrepancies is this step that deletes empty codons, but there are other differences that I haven't figured out yet. Since the substitution models differ, I haven't tested the mutability model yet. I attached a test case here."
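
    The AAT...TTA case raised in the quote above can be checked directly. This is an illustrative sketch only: `listFivemers` is a hypothetical helper that enumerates overlapping 5-mers, not the counting code used when building the models.

    ```r
    # Hypothetical helper: list every overlapping 5-mer in a sequence
    listFivemers <- function(dnaSeq) {
      n <- nchar(dnaSeq)
      if (n < 5) return(character(0))
      starts <- 1:(n - 4)
      substring(dnaSeq, starts, starts + 4)
    }

    gapped <- "AAT...TTA"
    ungapped <- gsub("...", "", gapped, fixed = TRUE)

    listFivemers(gapped)    # "AAT.." "AT..." "T...T" "...TT" "..TTA"
    listFivemers(ungapped)  # "AATTT" "ATTTA"
    ```

    Every 5-mer taken from the gapped sequence contains at least one gap character, so AATTT and ATTTA only appear once the gap codon has been removed.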

    ...

    "We may also want to make sure that the triple gaps start at 3n+1 positions."
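
    A check along those lines might look like the sketch below. Both function names are hypothetical, gaps are assumed to be written as `.`, and the triple gaps are located left to right without overlap, which is the same way `gsub("\\.\\.\\.", "", ...)` would consume them.

    ```r
    # Hypothetical helper: 1-based start positions of each "..." match,
    # scanning left to right without overlapping matches
    tripleGapStarts <- function(dnaSeq) {
      hits <- gregexpr("...", dnaSeq, fixed = TRUE)[[1]]
      as.integer(hits[hits > 0])  # gregexpr reports -1 when nothing matches
    }

    # TRUE when every triple gap starts at a 3n+1 position (1, 4, 7, ...)
    gapsStartAtCodonBoundary <- function(dnaSeq) {
      all((tripleGapStarts(dnaSeq) - 1) %% 3 == 0)
    }

    gapsStartAtCodonBoundary("AAT...TTA")  # TRUE: the gap starts at position 4
    gapsStartAtCodonBoundary("AA...TTTA")  # FALSE: the gap starts at position 3
    ```

    A sequence with no triple gaps passes the check trivially, since there is nothing to misalign.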
