Improve performance of observedMutations
Replace rbind with dplyr::bind_rows in observedMutations. Also, probably remove s2c and c2s in favor of equivalent stringi functions. And maybe cbind to dplyr::bind_cols.
Profile.
Comments (9)
-
reporter -
Will hand to Jason but can take over if changes are not affecting speed. Curious about the results...
-
reporter I swapped out the rbind and it didn't help. Turns out it was a matrix rbind and not a data.frame rbind, so there's that.
Looks like we need to search elsewhere for the problem. Probably in calcObservedMutations and associated helpers.
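One way to narrow the search is base R's sampling profiler. The sketch below is only illustrative: count_mismatches is a hypothetical stand-in for the slow code path, not a shazam function, and the loop count is arbitrary, chosen so the profiler has time to collect samples.

```r
# Hypothetical stand-in for a slow helper; NOT part of shazam.
count_mismatches <- function(a, b) {
  sum(strsplit(a, "")[[1]] != strsplit(b, "")[[1]])
}

# Sample the call stack while the code of interest runs.
prof_file <- tempfile(fileext = ".out")
Rprof(prof_file, interval = 0.01)
for (i in seq_len(50000)) {
  count_mismatches("ACGTACGT", "ACGAACGA")
}
Rprof(NULL)  # stop profiling

# Functions ranked by time spent in their own code.
prof <- summaryRprof(prof_file)
print(head(prof$by.self))
```

In practice the loop would be replaced by a call to observedMutations on a typical data set, and the by.self table read from the top down.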
-
reporter Some ideas:
1) translateCodonToAminoAcid doesn't need to be a function. It's just extra overhead:
> system.time(replicate(100000, shazam:::translateCodonToAminoAcid("TGA")))
   user  system elapsed
  1.087   0.001   1.093
> system.time(replicate(100000, AMINO_ACIDS["TGA"]))
   user  system elapsed
  0.167   0.000   0.168
2) mutationType is a private function, so we don't need to use match.arg to verify the function arguments.
-
Ok, are you assigning back to me? Any changes I would include would not be within the scope of the way this issue was defined so maybe we should close it.
I would:
1. Staying in the same function, separate the code into (A) a preprocessing part for formatting the input and (B) a calculation part. I think this will make maintenance easier. This is simply formatting.
2. Optimize the calculation by using lists (like above).
3. Profile to assess significance.
And, depending on the extent of rewriting permitted:
4. Use a pre-calculated R/S codon matrix, i.e. given AGT and CNC, how many R and S SHM may be involved, stored as a 125 (5^3) x 125 table in memory. This would avoid the need to translate on the fly.
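A rough sketch of idea 4, under simplifying assumptions that are mine and not shazam's: only the 64 unambiguous codons are covered (no N or gap characters, so 64 x 64 rather than 125 x 125), only codon pairs differing at exactly one position are classified, and a change counts as R when the translated amino acid differs and S otherwise. The names aa, n_diff and rs_table are made up for illustration.

```r
# Standard genetic code, codons ordered with the third base varying fastest.
bases <- c("T", "C", "A", "G")
grid <- expand.grid(b3 = bases, b2 = bases, b1 = bases,
                    stringsAsFactors = FALSE)
codons <- paste0(grid$b1, grid$b2, grid$b3)
aa <- strsplit(
  "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
  ""
)[[1]]
names(aa) <- codons

# 64 x 3 character matrix, one row per codon.
codon_chars <- do.call(rbind, strsplit(codons, ""))

# Number of positions at which each pair of codons differs.
n_diff <- Reduce(`+`, lapply(1:3, function(k) {
  outer(codon_chars[, k], codon_chars[, k], "!=")
}))

# TRUE where both codons translate to the same amino acid.
same_aa <- outer(aa, aa, "==")

# Precomputed table: "R", "S", or NA when the pair is not a one-base change.
rs_table <- matrix(NA_character_, nrow = 64, ncol = 64,
                   dimnames = list(codons, codons))
rs_table[n_diff == 1 & same_aa] <- "S"
rs_table[n_diff == 1 & !same_aa] <- "R"

# Lookup replaces translating on the fly.
rs_table["AGT", "AGC"]
rs_table["TGG", "TGA"]
```

Extending this to ambiguous characters and to codons with two or three changes would need a rule for how to count them, which is the part that still has to be decided.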
-
reporter We don't need to open a new issue. Same problem.
Best thing at this point is probably to determine how much we care by benchmarking the time it takes on a typical data set. If it's a big deal, we'll see who wants the task. If it's not urgent, then we'll just leave the issue unassigned and active for a later date.
-
reporter - changed title to Improve performance of observedMutations
-
reporter Removed translateCodonToAminoAcid and skipped match.arg in mutationType in 4c6ce6f.
-
- changed status to on hold