create shazam::indexByUnion
@javh @nimanouri I think distToNearest first=TRUE groups the same way as DefineClones --act first, but distToNearest first=FALSE does not behave like DefineClones --act set. It would be useful to have a shazam::indexByUnion function that we could use wherever gene calls are used to group. This may be relevant in the case of distToNearest, which is used to find the threshold for DefineClones.
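To make the difference concrete, here is a rough Python sketch of the two grouping behaviours. It is a simplified illustration under stated assumptions, not the actual changeo indexByUnion code: the record layout (v_calls, j_calls, junction_length) and the function names group_by_first and group_by_union are hypothetical, and the toy records are made up.

```python
from collections import defaultdict

# Toy records (made up). "s2" has an ambiguous V call listing two genes.
records = [
    {"id": "s1", "v_calls": ["IGHV1-2"], "j_calls": ["IGHJ4"], "junction_length": 45},
    {"id": "s2", "v_calls": ["IGHV1-3", "IGHV1-2"], "j_calls": ["IGHJ4"], "junction_length": 45},
    {"id": "s3", "v_calls": ["IGHV1-3"], "j_calls": ["IGHJ4"], "junction_length": 45},
    {"id": "s4", "v_calls": ["IGHV3-23"], "j_calls": ["IGHJ4"], "junction_length": 45},
]


def group_by_first(records):
    """Group on the first V call, the first J call and the junction length."""
    groups = defaultdict(list)
    for rec in records:
        key = (rec["v_calls"][0], rec["j_calls"][0], rec["junction_length"])
        groups[key].append(rec["id"])
    return dict(groups)


def group_by_union(records):
    """Merge records that share a junction length and whose V call sets
    overlap and whose J call sets overlap; a merged group carries the
    union of the V and J calls of its members (assumed behaviour)."""
    groups = []
    for rec in records:
        new = {
            "v": set(rec["v_calls"]),
            "j": set(rec["j_calls"]),
            "len": rec["junction_length"],
            "ids": [rec["id"]],
        }
        # Absorbing a group widens the V/J sets, so rescan until stable.
        changed = True
        while changed:
            changed = False
            remaining = []
            for g in groups:
                if g["len"] == new["len"] and g["v"] & new["v"] and g["j"] & new["j"]:
                    new["v"] |= g["v"]
                    new["j"] |= g["j"]
                    new["ids"] = g["ids"] + new["ids"]
                    changed = True
                else:
                    remaining.append(g)
            groups = remaining
        groups.append(new)
    return groups


first_groups = group_by_first(records)
union_groups = [g["ids"] for g in group_by_union(records)]
```

With these toy records, grouping on the first call leaves s1 alone in its group, while the union approach pulls s1, s2 and s3 together through the ambiguous call on s2.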
Comments (14)
-
-
Grouping methods were always suspicious... I agree we should bring everything into one codebase: either R or python. What we have now is an apples-and-oranges situation.
-
reporter I agree we should avoid replication, and my vote goes to moving distToNearest to changeo. Actually, it would be great if this step were done automagically inside DefineClones. But in the short term, the easiest option seems to be an R version of indexByUnion.
-
It'd be a big task to move distToNearest into changeo, but @ruoyijiangyale has already done a decent chunk of the work.
It would still be a lot of work, but probably worth it in the long run.
-
I have no biases either way. In fact I am actually slightly biased towards having all of changeo in R (except maybe MakeDb) rather than the other way around. python is not great for databases in general and groupby is as speedy as a python dict.
-
reporter I am very biased toward R. Say no more. Everything to R! Ok, yes, I understand this is unrealistic. For me, right now, the only annoying thing is having to use distToNearest in R to find a threshold for DefineClones in python. Would be nice to have a --dist auto for DefineClones that internally calls a python distToNearest and uses @nimanouri's method to find the threshold.
-
Yeah, it should really all be possible in one step. The thing that annoys me is having to maintain the exact same algorithm and models across two code bases. It's really easy to make a small error and end up with a mutation model that's different between shazam and changeo.
The problem with R is memory. Everything has to be loaded into memory. And it's finicky about wrapping external applications.
-
reporter Confessions time. As I always end up having to load db's to R, I just make intensive use of system2 to run changeo from my beautiful markdown (now I am using bookdown) files. I will ask Santa Claus to bring me an R package that wraps changeo.
-
Heh. That ain't happenin'. If you really want to go that route, you could probably use something like reticulate to call changeo functions. Everything in changeo goes through a "main" function, so you don't actually need to use the commandline. You can just import and call the main function.
-
reporter I will investigate reticulate, thanks for the suggestion.
-
Seeing as I'm spring cleaning issues, what's the consensus here?
I'm inclined to skip it.
-
I have not worked on this yet. I will investigate it as soon as my deadlines are over. We have new cloning methods in line that we need to discuss, and we need to decide what we are going to do with them.
-
- changed status to resolved
fixed by commit 72ff54d.
This would be another case of replicating python code in R. Alternatively, we could move distToNearest into changeo and reuse the existing code from DefineClones.
Maybe we should see how much this matters? Would something analogous to the --act set actually change the distance-to-nearest distribution? I dunno. Not sure how we could test that without implementing the indexByUnion function anyway.
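One possible way to check this on a small scale is to compute nearest-neighbour distances under both groupings and compare them. The Python sketch below is only an illustration under stated assumptions: it uses a plain Hamming count rather than any shazam mutation model or normalization, it does not treat identical sequences specially, the helper names hamming and nearest_distances are hypothetical, and the junctions and both groupings are made-up toy data written out by hand.

```python
def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def nearest_distances(groups, junctions):
    """For every sequence id, return the Hamming distance to its nearest
    neighbour within the same group, or None if the group is a singleton."""
    nearest = {}
    for ids in groups:
        for i in ids:
            others = [hamming(junctions[i], junctions[k]) for k in ids if k != i]
            nearest[i] = min(others) if others else None
    return nearest


# Toy junctions (made up), all of the same length.
junctions = {
    "s1": "TGTGCGAGAGAT",
    "s2": "TGTGCGAGAGAC",
    "s3": "TGTGCGAAAGAC",
    "s4": "TGTACCTTTGGG",
}

# Hand-written groupings: first-call style vs. union style (assumed).
by_first = [["s1"], ["s2", "s3"], ["s4"]]
by_union = [["s1", "s2", "s3"], ["s4"]]

dist_first = nearest_distances(by_first, junctions)
dist_union = nearest_distances(by_union, junctions)

# Ids whose nearest-neighbour distance differs between the two groupings.
changed = sorted(i for i in junctions if dist_first[i] != dist_union[i])
```

In this toy case only s1 changes: it is a singleton under the first-call grouping and gains a nearest neighbour under the union grouping. Whether that kind of shift is large enough to move the threshold on real data would still need to be checked.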