ConClave Version 2 Documentation

ptlcc

Hi Manu

The ConClave2 algorithm starts by calculating the ConClave scores as usual; these are used to identify the significantly overrepresented template sequences.
The significantly overrepresented template sequences are then used to narrow down the set of templates that match each query sequence equally well. From this reduced set, one template (t_i) is chosen for each query sequence, with a probability equal to the ConClave score of that template (t_i) divided by the summed ConClave scores of all templates in the set. For example:

Best_Templates = {1, 2, 3, 4}
ConClave_Scores = {2, 0, 4, 1}

This will choose template 1 with probability 2 / 7, template 2 with probability 0, template 3 with probability 4 / 7, and template 4 with probability 1 / 7.
The choice is implemented with a random seed generated from the query sequence, so that it is reproducible between runs.
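
To make the selection step concrete, here is a minimal Python sketch of the idea. It is only an illustration, not the actual implementation: the function name choose_template is made up, and deriving the seed by hashing the query sequence with SHA-256 is an assumption, since the exact way the seed is generated is not described here.

```python
import hashlib
import random


def choose_template(query_seq, best_templates, conclave_scores):
    """Pick one template with probability proportional to its ConClave score.

    Hypothetical helper for illustration only.
    """
    if len(best_templates) != len(conclave_scores):
        raise ValueError("one score is needed per template")
    if sum(conclave_scores) <= 0:
        raise ValueError("at least one template needs a positive score")

    # Assumption: the seed is derived by hashing the query sequence, so the
    # same query always produces the same choice between runs.
    digest = hashlib.sha256(query_seq.encode("ascii")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))

    # Weighted draw: P(t_i) = score_i / sum(scores).
    return rng.choices(best_templates, weights=conclave_scores, k=1)[0]


best_templates = [1, 2, 3, 4]
conclave_scores = [2, 0, 4, 1]
print(choose_template("ACGTACGTTTGA", best_templates, conclave_scores))
```

Calling the function twice with the same query sequence returns the same template, and a template with a score of 0 (template 2 above) is never picked.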

This naturally decreases the number of false negatives, but at the cost of more false positives. Usually the increase in false positives is rather limited, but it depends on the sequence data and the database.

Best,
Philip

2020-09-29T08:17:41+00:00

Comments (2)

Manu S
Hi Philip,

Thank you very much for that detailed explanation. That clears a lot of my doubts.

Thanks,

Manu
- 2020-09-29T16:33:55+00:00