NOUMENON is a Contact Prediction (CP) validation dataset designed to remove the observational selection bias caused by the enrichment of available homologs for proteins with known PDB structure. It has been sampled so that its distribution of homologs matches the one expected for random uncharacterized UniProt sequences, as discussed in the paper (under review).
What is this repository for?
- Development and validation of contact prediction software based on evolutionary information
- Version: 1.0
How do I use it?
We provide both the contact maps (one per PDB file, in the contacts/ folder) for testing performance and the jackhmmer (http://hmmer.org/) multiple sequence alignments (MSAs), computed by (i) searching against UniRef100 (ii) with 3 iterations and an E-value cut-off of 0.0001.
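To build comparable alignments for new proteins, the search settings above can be sketched as a jackhmmer command line. This is a minimal sketch, not the exact command used to build NOUMENON: the file names are hypothetical, and mapping the E-value cut-off to the reporting threshold (-E) is an assumption, since it may instead have been applied as the inclusion threshold (--incE).

```python
def build_jackhmmer_command(query_fasta, database_fasta, output_alignment,
                            iterations=3, evalue=0.0001):
    """Return the jackhmmer argument list for a single query sequence."""
    return [
        "jackhmmer",
        "-N", str(iterations),   # maximum number of search iterations
        "-E", str(evalue),       # E-value reporting threshold (assumed flag choice)
        "-A", output_alignment,  # save the alignment of hits from the final iteration
        query_fasta,             # hypothetical query file
        database_fasta,          # hypothetical local copy of UniRef100
    ]


# Print the command for inspection; nothing is executed here.
print(" ".join(build_jackhmmer_command("query.fasta", "uniref100.fasta", "query.sto")))
```

The returned list can be passed to subprocess.run(cmd, check=True) once HMMER and a local sequence database are available.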
We sampled the 150 proteins in NOUMENON so that they (i) respect the length and contact density criteria commonly adopted in the CP field (see the related publication, now under review) and (ii) have a distribution of homologs, in the alignments_jackhmmer_3_iter/ folder, that is indistinguishable from the distribution of homologs obtainable for a random sample of uncharacterized proteins in UniProt.
This removes the risk of inflating the measured performance of CP methods, which arises when they are tested on PDB proteins that tend to have more homologs than real application cases.
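When testing a predictor against the contact maps, a metric commonly used in the CP field is the precision of the top L/k highest-scoring residue pairs, where L is the protein length. The sketch below shows one way to compute it. It works on in-memory matrices because the file format of the contacts/ folder is not described here, and the function name, the default minimum sequence separation, and the toy data are all illustrative assumptions.

```python
def top_l_precision(scores, contacts, min_separation=6, divisor=1):
    """Precision of the top L/divisor predicted residue pairs.

    scores:   L x L nested list of predicted contact scores (higher = more likely)
    contacts: L x L nested list with 1 for a true contact and 0 otherwise
    """
    length = len(contacts)
    # Keep only pairs (i, j) with j - i >= min_separation (upper triangle).
    pairs = [
        (scores[i][j], i, j)
        for i in range(length)
        for j in range(i + min_separation, length)
    ]
    pairs.sort(key=lambda p: p[0], reverse=True)
    top = pairs[: max(1, length // divisor)]
    if not top:
        return 0.0  # protein too short for the chosen separation
    hits = sum(1 for _, i, j in top if contacts[i][j])
    return hits / len(top)


# Toy 4-residue example (hypothetical values, not taken from NOUMENON).
toy_scores = [
    [0.0, 0.0, 0.8, 0.9],
    [0.0, 0.0, 0.0, 0.1],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
toy_contacts = [
    [0, 0, 0, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
print(top_l_precision(toy_scores, toy_contacts, min_separation=2, divisor=2))  # 0.5
```

In the toy case the two best-scoring pairs are (0, 3), a true contact, and (0, 2), not a contact, so the precision is 0.5.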
Who do I talk to?
- NOUMENON has been developed and is maintained by G. Orlando, D. Raimondi and W. Vranken
- Contact: orlando[dot]gabriele89[at]gmail[dot]com, daniele[dot]raimondi[at]vub[dot]ac[dot]be