getClosestBy1Mer is broken when model=aa
The final step of getClosestBy1Mer()
is broken when model="aa"
, due to trying to match translated sequences against nucleotide sequences:
This doesn't work:
arrJunctionsDist <- arrUniqueJunctionsDist[match(names(arrJunctionsDist),
names(arrUniqueJunctionsDist))]
Because:
> names(arrJunctionsDist)
[1] "ACGTACGTACGT" "ACGAACGTACGT" "ACGAACGTATGT" "ACGAACGTATGC" "ACGAACGTATCC" "AAAAAAAAAAAA"
> names(arrUniqueJunctionsDist)
[1] "TYVR" "TNVR" "TNVC" "TNVC" "TNVS" "KKKK"
Thus, all distances returned are NA
.
Comments (10)
-
-
reporter Yeah, should be. I'm leaving now and needed the reminder. In case I feel like fixing it Sunday.
-
reporter I think this is fixed now. I made a lot of changes to
getClosestBy1Mer()
, including a lot of readability refactoring.@namita1025 and @ssnn - It'd be great if you could both look over it to make sure it's behaving how it should. I'm not really familiar with the code, so I don't know if I missed something here.
-
reporter - changed title to getClosetBy1Mer is broken when model=aa
-
reporter - edited description
-
reporter - edited description
- changed title to getClosestBy1Mer is broken when model=aa
-
reporter The following unit test failures occurred:
1. Failure: Test cross distToNearest with model hs1f (@test.distToNearest.R#18) dist_hs1f$DIST_NEAREST[test_idx] not equal to c(...). 6/17 mismatches (average diff: 0.346) [12] 0.607 - 0.070 == 0.5371 [13] 0.615 - 0.196 == 0.4190 [14] 0.196 - 0.607 == -0.4116 [15] 0.597 - 0.615 == -0.0171 [16] NA - 0.196 == NA [17] NA - 0.597 == NA 2. Failure: Test cross distToNearest with model hs1f (@test.distToNearest.R#47) cross_dist_hs1f$CROSS_DIST_NEAREST[1:nrow(dist_hs1f)] not equal to dist_hs1f$DIST_NEAREST. 172/315 mismatches (average diff: 0.158) [34] 0.528 - 0.437 == 0.0905 [35] 0.528 - 0.450 == 0.0780 [37] 0.437 - 0.520 == -0.0833 [38] 0.450 - NA == NA [39] 0.528 - NA == NA [40] 0.528 - NA == NA [41] 0.520 - NA == NA [46] 0.679 - 0.476 == 0.2031 [47] 0.476 - 0.500 == -0.0239
I'm not sure if these are correct or not, and I won't have the time to check today. It could be that the test are wrong, because I switched from
nchar()
tostringi::stri_length
becausenchar(NA)
returns2
, or it could be due to gap/N handling in alakazam. Not sure. A quick check with 4 sequences looks correct to me.(BTW - Thanks a bunch @ssnn for writing the tests.)
-
Fixed it, there was a bug in the rewrite. The original vector of junctions was being given the names of the vector of UNIQUE junctions, which means some names were NA, which led to a bug in the mapping unique distances back to input vector. Test runs smoothly now, though I guess there is still no test for the amino acid model...
-
reporter That's great news! I was worried the bug was in the original code and someone would have to remake all the figures for their paper. :)
-
reporter - changed status to resolved
- Log in to comment
Seems like this has an easy fix of making sure the names of the vectors remain nucleotide, no?