kleinstein / shazam / issues / #111 - Improve performance of observedMutations

Issue #111 on hold

Jason Vander Heiden created an issue 2018-07-13

Replace rbind with dplyr::bind_rows in observedMutations. Also, probably remove s2c and c2s in favor of equivalent stringi functions, and maybe replace cbind with dplyr::bind_cols.

Profile.

Comments (9)

Jason Vander Heiden reporter
- edited description
- 2018-07-13T16:13:07+00:00
Roy Jiang
Will hand to Jason but can take over if changes are not affecting speed. Curious about the results...
- 2018-07-13T18:09:55+00:00
Jason Vander Heiden reporter
I swapped out the rbind and it didn't help. Turns out it was a matrix rbind and not a data.frame rbind, so there's that.
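
For context, a minimal base R sketch of why the swap made no difference: rbind on matrices is a single internal operation, while rbind on data frames dispatches to the much heavier rbind.data.frame, which is the case bind_rows is meant to speed up. The `results` list is hypothetical stand-in data, not shazam output.

```r
# Hypothetical per-sequence results: one 1 x 2 count matrix per sequence
set.seed(1)
results <- replicate(
  2000,
  matrix(sample(0:5, 2), nrow = 1, dimnames = list(NULL, c("R", "S"))),
  simplify = FALSE
)

# Matrix rbind: handled internally, cheap even for many pieces
t_matrix <- system.time(m <- do.call(rbind, results))

# data.frame rbind: dispatches to rbind.data.frame, which does far more work
results_df <- lapply(results, as.data.frame)
t_df <- system.time(d <- do.call(rbind, results_df))

# Compare elapsed times; exact numbers depend on the machine
print(c(matrix = t_matrix[["elapsed"]], data.frame = t_df[["elapsed"]]))
```

If the rows being bound are already matrices, switching to a data frame binder only adds conversion cost.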

Looks like we need to search elsewhere for the problem. Probably in calcObservedMutations and associated helpers.
- 2018-07-13T19:15:32+00:00
Jason Vander Heiden reporter
Some ideas:

1) translateCodonToAminoAcid doesn't need to be a function. It's just extra overhead:
```
> system.time(replicate(100000, shazam:::translateCodonToAminoAcid("TGA")))
   user  system elapsed 
  1.087   0.001   1.093 
> system.time(replicate(100000, AMINO_ACIDS["TGA"]))
   user  system elapsed 
  0.167   0.000   0.168 
```
2) mutationType is a private function, so we don't need to use match.arg to validate its arguments.
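
A rough sketch of both ideas in base R. Everything here is a stand-in: `codon_table` is a hypothetical three-entry substitute for AMINO_ACIDS, and `with_check` / `without_check` are made-up helpers, not the actual mutationType signature.

```r
# Hypothetical three-entry stand-in for the AMINO_ACIDS lookup vector
codon_table <- c(TGA = "*", ATG = "M", AGT = "S")

# A wrapper that only forwards to the lookup adds a function call per codon
translate_codon <- function(codon) codon_table[codon]

# Direct indexing is vectorized: many codons are translated in one step
codons <- c("ATG", "AGT", "TGA")
aa <- unname(codon_table[codons])

# match.arg validates the choice on every call; a private helper whose
# callers always pass a valid value can skip the check
with_check <- function(x, type = c("R", "S")) {
  type <- match.arg(type)
  type
}
without_check <- function(x, type = "R") type

# Timings depend on the machine, so no expected numbers are given here
print(system.time(for (i in 1:100000) with_check(1, "S")))
print(system.time(for (i in 1:100000) without_check(1, "S")))
```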
- 2018-07-13T19:26:10+00:00
Roy Jiang
Ok, are you assigning back to me? Any changes I would include would not be within the scope of the way this issue was defined so maybe we should close it.

I would:
1. Staying in the same function, separate the code into (A) a preprocessing part that formats the input and (B) a calculation part. I think this will make maintenance easier. This is simply formatting.
2. Optimize the calculation by using lists (like above).
3. Profile to assess significance.

And, depending on the extent of rewriting permitted:
4. Use a pre-calculated R/S codon matrix: given, for example, AGT and CNC, look up how many R and S mutations (SHM) may be involved. Store it as a 125 (5^3) x 125 table in memory. This would avoid the need to translate on the fly.
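
A minimal sketch of what such a table could look like, in base R. The counting rule is an assumption made for illustration, not shazam's actual logic: each differing position is scored independently against the original codon, positions where the observed base is N are skipped, and source codons containing N are left at zero. All names (`genetic_code`, `count_rs`, `r_table`, `s_table`) are hypothetical.

```r
# Standard genetic code, codons in TCAG order (third position varies fastest)
bases <- c("T", "C", "A", "G")
g4 <- expand.grid(p3 = bases, p2 = bases, p1 = bases, stringsAsFactors = FALSE)
aa <- strsplit(
  "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG", ""
)[[1]]
genetic_code <- setNames(aa, paste0(g4$p1, g4$p2, g4$p3))

# All 125 codons over the alphabet A, C, G, T, N
alphabet <- c("A", "C", "G", "T", "N")
g5 <- expand.grid(p3 = alphabet, p2 = alphabet, p1 = alphabet,
                  stringsAsFactors = FALSE)
codons <- paste0(g5$p1, g5$p2, g5$p3)

# Assumed rule: score each differing position on its own against `from`
count_rs <- function(from, to) {
  f <- strsplit(from, "")[[1]]
  t <- strsplit(to, "")[[1]]
  counts <- c(R = 0L, S = 0L)
  if (any(f == "N")) return(counts)  # cannot translate the source codon
  for (i in which(f != t & t != "N")) {
    mutated <- f
    mutated[i] <- t[i]
    same <- genetic_code[[paste(mutated, collapse = "")]] == genetic_code[[from]]
    if (same) counts["S"] <- counts["S"] + 1L else counts["R"] <- counts["R"] + 1L
  }
  counts
}

# Precompute once; `to` varies fastest, so filling by row gives from x to
pairs <- expand.grid(to = codons, from = codons, stringsAsFactors = FALSE)
rs <- mapply(count_rs, pairs$from, pairs$to)
r_table <- matrix(rs["R", ], nrow = 125, byrow = TRUE,
                  dimnames = list(from = codons, to = codons))
s_table <- matrix(rs["S", ], nrow = 125, byrow = TRUE,
                  dimnames = list(from = codons, to = codons))
```

After the one-time build, a lookup such as `r_table["AGT", "CNC"]` or `s_table["AGT", "CNC"]` replaces translation at run time. Under the assumed rule, AGT to CNC gives one R (position 1, Ser to Arg) and one S (position 3, Ser to Ser).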
- 2018-07-13T21:20:03+00:00
Jason Vander Heiden reporter
We don't need to open a new issue. Same problem.

Best thing at this point is probably to determine how much we care by benchmarking the time it takes on a typical data set. If it's a big deal, we'll see who wants the task. If it's not urgent, then we'll just leave the issue unassigned and active for a later date.
- 2018-07-13T21:40:29+00:00
Jason Vander Heiden reporter
- changed title to Improve performance of observedMutations
- 2018-07-14T18:51:08+00:00
Jason Vander Heiden reporter
Removed translateCodonToAminoAcid and skipped match.arg in mutationType in 4c6ce6f.
- 2018-07-14T18:52:19+00:00
ssnn
- changed status to on hold
- 2020-04-03T15:21:04+00:00

Assignee: Roy Jiang

Type: enhancement

Priority: minor

Status: on hold

Milestone: –

Votes: 0

Watchers: 3
