getClosestBy1Mer is broken when model=aa

Namita Gupta

Seems like this has an easy fix of making sure the names of the vectors remain nucleotide, no?

2016-06-10T22:00:45+00:00

Jason Vander Heiden reporter

Yeah, should be. I'm leaving now and needed the reminder. In case I feel like fixing it Sunday.

2016-06-10T22:02:31+00:00

Jason Vander Heiden reporter

I think this is fixed now. I made a lot of changes to getClosestBy1Mer(), including a lot of readability refactoring.

@namita1025 and @ssnn - It'd be great if you could both look over it to make sure it's behaving how it should. I'm not really familiar with the code, so I don't know if I missed something here.

2016-06-12T17:20:26+00:00

Jason Vander Heiden reporter

changed title to getClosetBy1Mer is broken when model=aa

2016-06-12T17:53:32+00:00

Jason Vander Heiden reporter

edited description

2016-06-12T17:53:42+00:00

Jason Vander Heiden reporter

edited description
changed title to getClosestBy1Mer is broken when model=aa

2016-06-12T17:54:13+00:00

Jason Vander Heiden reporter

The following unit test failures occurred:

1. Failure: Test cross distToNearest with model hs1f (@test.distToNearest.R#18) 
dist_hs1f$DIST_NEAREST[test_idx] not equal to c(...).
6/17 mismatches (average diff: 0.346)
[12] 0.607 - 0.070 ==  0.5371
[13] 0.615 - 0.196 ==  0.4190
[14] 0.196 - 0.607 == -0.4116
[15] 0.597 - 0.615 == -0.0171
[16]    NA - 0.196 ==      NA
[17]    NA - 0.597 ==      NA


2. Failure: Test cross distToNearest with model hs1f (@test.distToNearest.R#47) 
cross_dist_hs1f$CROSS_DIST_NEAREST[1:nrow(dist_hs1f)] not equal to dist_hs1f$DIST_NEAREST.
172/315 mismatches (average diff: 0.158)
[34] 0.528 - 0.437 ==  0.0905
[35] 0.528 - 0.450 ==  0.0780
[37] 0.437 - 0.520 == -0.0833
[38] 0.450 -    NA ==      NA
[39] 0.528 -    NA ==      NA
[40] 0.528 -    NA ==      NA
[41] 0.520 -    NA ==      NA
[46] 0.679 - 0.476 ==  0.2031
[47] 0.476 - 0.500 == -0.0239

I'm not sure if these are correct or not, and I won't have the time to check today. It could be that the test are wrong, because I switched from nchar() to stringi::stri_length because nchar(NA) returns 2, or it could be due to gap/N handling in alakazam. Not sure. A quick check with 4 sequences looks correct to me.

(BTW - Thanks a bunch @ssnn for writing the tests.)

2016-06-12T18:23:34+00:00

Namita Gupta

Fixed it, there was a bug in the rewrite. The original vector of junctions was being given the names of the vector of UNIQUE junctions, which means some names were NA, which led to a bug in the mapping unique distances back to input vector. Test runs smoothly now, though I guess there is still no test for the amino acid model...

2016-06-13T15:48:32+00:00

Jason Vander Heiden reporter

That's great news! I was worried the bug was in the original code and someone would have to remake all the figures for their paper. :)

2016-06-13T16:00:13+00:00

Jason Vander Heiden reporter

changed status to resolved

Resolved in 7f0dec8 thru 0aba438. We still need more unit tests for distToNearest(), but that's a separate issue.

2016-06-13T19:02:04+00:00

Comments (10)