Quality Control

Issue #58 new
Former user created an issue

What quality control scores should I be looking for?

Comments (1)

  1. Ben Bolduc

    Are you referring to quality scores for input viral genomes or the “quality” column for a cluster in the genome-to-genome reports file? Assuming the latter, there’s no hard-and-fast rule about the quality level. It’s a measure of how cohesive the nodes are within a given cluster. High-quality clusters have fewer external connections, more internal connections, and higher edge weights between cluster members than between cluster members and members of other clusters.

    A low quality score (0.3-0.5) could indicate that a VC’s members share many genes with other VCs. Some genes, like portal proteins, can be shared beyond the genus level. In terms of viral evolution, this is great for determining shared phylogeny. In terms of network structure, it “diffuses” a VC’s connections.

    A high quality score (0.7-1) would suggest a tight-knit structure. These genomes likely share few, if any, genes with other groups.

    That said, quality is qualitative. Many combinations of the explanations above could result in similarly low- or high-quality VCs. Always remember that what we’re looking at are shared genes among viral genomes. Two genomes can share 50% of their genes and still not be in the same genus. As a quality measure, that could come out as “low” or “not good.”
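
    To make the idea of “cohesive” concrete, here is a rough sketch of one way to score a cluster from its edge weights: internal weight divided by internal plus boundary weight. This is only an illustration under my own assumptions. The function name, the genome names, and the weights are all made up, and the value in the “quality” column may well be computed differently.

```python
from typing import Dict, Set, Tuple


def cluster_cohesion(edges: Dict[Tuple[str, str], float], cluster: Set[str]) -> float:
    """Illustrative cohesion score (hypothetical, not the tool's own formula).

    edges maps an undirected genome pair (u, v) to its edge weight.
    Returns internal weight / (internal weight + boundary weight).
    """
    internal = 0.0  # weight of edges with both ends inside the cluster
    boundary = 0.0  # weight of edges with exactly one end inside the cluster
    for (u, v), weight in edges.items():
        u_in, v_in = u in cluster, v in cluster
        if u_in and v_in:
            internal += weight
        elif u_in or v_in:
            boundary += weight
    total = internal + boundary
    return internal / total if total else 0.0


# Made-up network: A, B, C are tightly linked; D and E hang off C.
edges = {
    ("A", "B"): 3.0,
    ("B", "C"): 3.0,
    ("A", "C"): 3.0,
    ("C", "D"): 1.0,
    ("D", "E"): 2.0,
}

tight = cluster_cohesion(edges, {"A", "B", "C"})  # 9 / (9 + 1)
loose = cluster_cohesion(edges, {"D", "E"})       # 2 / (2 + 1)
print(round(tight, 2), round(loose, 2))
```

    In this toy network the A/B/C group scores higher because almost all of its edge weight stays inside the group, while the D/E group loses a third of its weight to a connection with C. That is the “diffusing” effect described above.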

    Not sure if that helped or only confused. Let me know either way.

    -Ben
