DefineClones indexJunctions is uber slow with action='set'
Issue #14
resolved
DefineClones bygroup --act set is very slow. With ~400,000 rows, indexing the v/j/junction lengths took a couple of minutes with action='first' and 20+ hours with action='set'. I'm not sure how long it would have taken in total, as I terminated it before it finished.
It took about 20 minutes with ~75,000 rows, so I'm guessing the set method is probably O(n^2) right now, which implies a nested for loop. There's probably a hash table (set or dict key) method we should use instead.
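For illustration, here is a rough sketch of what a hash-table approach could look like. It is not the DefineClones code: the function name `index_by_set`, the `(v_calls, j_calls, junction_length)` tuple layout, and the assumption that action='set' should group records sharing at least one V gene, one J gene and the same junction length (transitively) are all hypothetical. Each record is looked up by dict key and merged with a small union-find, so no record is compared against every other record.

```python
from collections import defaultdict
from itertools import product


def index_by_set(records):
    """Group records that share any (V, J, junction length) key, transitively.

    `records` is a list of (v_calls, j_calls, junction_length) tuples, a
    hypothetical stand-in for the real row objects.
    Returns a sorted list of groups, each a list of record positions.
    """
    parent = list(range(len(records)))

    def find(i):
        # Follow parent links to the group root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    owner = {}  # (v, j, length) key -> first record index seen with that key
    for i, (v_calls, j_calls, length) in enumerate(records):
        for key in product(v_calls, j_calls, [length]):
            if key in owner:
                # Key already seen: merge this record's group with its owner's.
                parent[find(i)] = find(owner[key])
            else:
                owner[key] = i

    groups = defaultdict(list)
    for i in range(len(records)):
        groups[find(i)].append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    rows = [
        ({"V1", "V2"}, {"J1"}, 30),
        ({"V2"}, {"J1"}, 30),
        ({"V3"}, {"J1"}, 30),
        ({"V1"}, {"J1"}, 33),
        ({"V3", "V1"}, {"J1"}, 30),
    ]
    print(index_by_set(rows))
```

The work here scales with the total number of (V, J, length) combinations across all rows rather than with the number of row pairs, which is the point of swapping the nested loop for dict lookups.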
Comments (2)
reporter -
reporter - changed status to resolved
The new faster version passes all tests in 1bd6fe2. I think we're good.
I've gotten some complaints about this speed issue from the AIRR group. I think we should either fix this for v0.3.3 or change the default to 'first', the former being preferred. As is, it gives a bad first impression if you try to run it on a large-ish data set, a species with incomplete germlines, etc.