genomicepidemiology / kma / issues / #70 - Faster way to index a database?

Issue #70 new

Gustavo Tamasco created an issue 2022-11-03

I am trying to index an 11 GB FASTA file.

The command has been running for 9 days and the indexing has still not finished.

Is there a way to speed up this process? I am using the default run: kma index -i db_fasta -o database

Is there a safe way to apply multiprocessing to it?

Best,
Tamasco

Comments (4)

ptlcc
Dear Tamasco

kma index does not support multiprocessing. But you could subsample the k-mers using the “-Sparse” option, or index minimizers instead (-m).
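For readers unfamiliar with minimizers, here is a minimal sketch of the textbook definition: in every window of w consecutive k-mers, keep only the lexicographically smallest one. This is an illustration only, not KMA's code; the function names, the window parameter w and the example sequence are assumptions, and KMA's own scheme may order or select k-mers differently.

```python
def kmers(seq, k):
    """Return all overlapping k-mers of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def minimizers(seq, k, w):
    """Return the set of (position, k-mer) minimizers of seq.

    For every window of w consecutive k-mers, keep the
    lexicographically smallest one (leftmost on ties).
    """
    all_kmers = kmers(seq, k)
    chosen = set()
    for start in range(len(all_kmers) - w + 1):
        window = all_kmers[start:start + w]
        best = min(range(w), key=lambda j: (window[j], j))
        chosen.add((start + best, window[best]))
    return chosen


# Hypothetical toy sequence, chosen only for illustration.
seq = "ACGTTGCATGCA"
all_k = kmers(seq, 4)
mins = minimizers(seq, 4, 3)
print(len(all_k), "k-mers in total,", len(mins), "kept as minimizers")
```

Because neighbouring windows often agree on their smallest k-mer, fewer entries go into the index, while every window of w k-mers is still represented by at least one of them.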

Best,
Philip
- 2022-11-07T10:39:23+00:00
Gustavo Tamasco reporter
Hey Philip, thanks for the advice.

One question: with the -Sparse flag, fewer k-mers will be used. Can this reduce the resolution of my mappings down the road?

Just out of curiosity: why do indexing tools not use multiprocessing? Is there a reason for that?

Best,
Tamasco
- 2022-11-07T13:39:42+00:00
ptlcc
Hi Tamasco

The resolution can be lowered, but we have not seen any notable effect for prefixes of length two or less, as you will then have half-overlapping k-mers on average.
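To make the idea of prefix-based subsampling concrete, here is a minimal sketch that keeps only the k-mers starting with a chosen prefix. It is an illustration under stated assumptions, not KMA's implementation: the function names and the example sequence are hypothetical, and whether KMA counts the prefix as part of the stored k-mer is not covered here. On random DNA, a prefix of length p is expected to match at roughly 1 in 4^p positions.

```python
def sparse_kmers(seq, k, prefix):
    """Return (position, k-mer) pairs for k-mers that start with prefix."""
    return [
        (i, seq[i:i + k])
        for i in range(len(seq) - k + 1)
        if seq.startswith(prefix, i)
    ]


def expected_fraction(prefix):
    """Expected share of positions kept on uniformly random DNA."""
    return 4.0 ** -len(prefix)


# Hypothetical toy sequence, chosen only for illustration.
seq = "ATGCATGGATGA"
kept = sparse_kmers(seq, 4, "AT")
print(kept)
print(expected_fraction("AT"))
```

A longer prefix shrinks the index faster but leaves larger gaps between sampled k-mers, which is where resolution can start to suffer.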

Some mapping and alignment methods do offer multithreading on indexing, but this usually comes at a relatively high memory cost. This is because it is hard to parallelize updates to the same data structure, as you need to ensure that two processes are not writing to or editing the same piece of memory at once.
When performing the mapping and alignment it is easier, as the data structure is constant and you can analyse the individual input reads more or less independently. That is, you just need to make sure that only one thread is reading or writing at a time, together with some collection steps such as the ConClave algorithm.
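The trade-off described above can be sketched with two toy strategies for building a k-mer index from several threads. This is a structural illustration only, not how KMA or any particular tool works: all names and the example templates are assumptions, and in CPython the global interpreter lock means this code shows the shape of the problem rather than a real speed-up. The first strategy shares one index and serializes every write with a lock; the second gives each worker a private index and merges them afterwards, which avoids lock contention but stores shared k-mers more than once.

```python
import threading
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def kmers(seq, k):
    """Yield all overlapping k-mers of seq."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def build_shared(templates, k, workers=2):
    """All workers update one index; a lock serializes every write."""
    index = defaultdict(set)
    lock = threading.Lock()

    def add(item):
        name, seq = item
        for kmer in kmers(seq, k):
            with lock:  # only one thread may modify the index at a time
                index[kmer].add(name)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(add, templates.items()))
    return dict(index)


def build_private(templates, k, workers=2):
    """Each worker fills its own index; the copies are merged at the end."""
    items = list(templates.items())
    chunks = [items[i::workers] for i in range(workers)]

    def build(chunk):
        local = defaultdict(set)
        for name, seq in chunk:
            for kmer in kmers(seq, k):
                local[kmer].add(name)
        return local

    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(build, chunks))

    merged = defaultdict(set)
    for part in partials:  # k-mers seen by several workers were stored several times
        for kmer, names in part.items():
            merged[kmer] |= names
    stored_entries = sum(len(part) for part in partials)
    return dict(merged), stored_entries


# Hypothetical toy templates, chosen only for illustration.
templates = {"t1": "ACGTACGT", "t2": "CGTACGTT", "t3": "TTACGTAC"}
shared = build_shared(templates, 4)
merged, stored_entries = build_private(templates, 4)
print(shared == merged, len(merged), stored_entries)
```

Both strategies produce the same index; the difference is that the private copies together hold at least as many entries as the final merged index, which is one way the extra memory cost arises.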

Best,
Philip
- 2022-11-08T11:18:11+00:00
Gustavo Tamasco reporter
Good to know! I will run some tests using the -Sparse flag.

Oh, I see. Thanks for the explanation and the advice!

Best,

Tamasco
- 2022-11-08T12:02:38+00:00

Assignee: –

Type: enhancement

Priority: trivial

Status: new
