ConClave Version 2 Documentation

Issue #18 new
Former user created an issue

Hi, it would be helpful if you could provide a bit more explanation of how "ConClave version 2 allows for several closely related templates to present, which limits the assumptions taken by ConClave 1 at the cost of false positives".

Thanks, Manu

Comments (2)

  1. ptlcc

    Hi Manu

    The ConClave2 algorithm starts by calculating the ConClave scores as usual; these are used to identify the significantly overrepresented template sequences.
    The significantly overrepresented template sequences are then used to limit the number of equally well matching templates for each query sequence. One template (t_i) is then chosen for each query sequence from this reduced set, with a probability equal to the ConClave score of that template (t_i) divided by the summed ConClave scores of all templates matching that query sequence equally well. For example:

    Best_Templates = {1, 2, 3, 4}
    ConClave_Scores = {2, 0, 4, 1}

    This will choose template 1 with probability 2 / 7, template 2 with probability 0, template 3 with probability 4 / 7 and template 4 with probability 1 / 7.
    The choice has been implemented with a random seed generated from the query sequence, so that it is reproducible between runs.
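
A minimal Python sketch of the selection step described above may make it more concrete. It is not the ConClave source code: the function names, the use of SHA-256 to derive the seed from the query sequence, and the error raised when all scores are zero are assumptions made for illustration only.

```python
import hashlib
import random


def selection_probabilities(best_templates, conclave_scores):
    """Return each template's selection probability: its score over the summed scores."""
    total = sum(conclave_scores)
    if total <= 0:
        # Assumption: the behaviour for an all-zero score set is not described
        # in the thread, so this sketch simply refuses to choose.
        raise ValueError("at least one template needs a positive ConClave score")
    return {t: s / total for t, s in zip(best_templates, conclave_scores)}


def choose_template(query_sequence, best_templates, conclave_scores):
    """Pick one template with probability proportional to its ConClave score.

    The random generator is seeded from the query sequence, so the same
    query always yields the same choice between runs.
    """
    if len(best_templates) != len(conclave_scores):
        raise ValueError("need exactly one score per template")
    if sum(conclave_scores) <= 0:
        raise ValueError("at least one template needs a positive ConClave score")

    # Assumption: a SHA-256 digest of the sequence serves as the seed source.
    # Python's built-in hash() is salted per process and would not be reproducible.
    digest = hashlib.sha256(query_sequence.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))

    # Weighted draw; a template with score 0 is never selected.
    return rng.choices(best_templates, weights=conclave_scores, k=1)[0]


if __name__ == "__main__":
    templates = [1, 2, 3, 4]
    scores = [2, 0, 4, 1]
    print(selection_probabilities(templates, scores))
    print(choose_template("ACGTACGT", templates, scores))
```

Calling `choose_template` twice with the same query sequence returns the same template, while different query sequences spread their choices over templates 1, 3 and 4 in roughly a 2 : 4 : 1 ratio.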

    This naturally decreases the number of false negatives, but at the cost of more false positives. Usually the increase in false positives is rather limited, but it depends on the sequence data and the database.

    Best,
    Philip

  2. Manu S

    Hi Philip,

    Thank you very much for that detailed explanation. That clears a lot of my doubts.

    Thanks,

    Manu
