vContact2 output file "genome_by_genome_overview.csv" does not contain all nodes present in output file "c1.ntw"
Hey all -
As the title says, I've noticed in my data that the genome_by_genome_overview.csv file output by vContact2 is missing a chunk of the viruses that are present in the network file itself (c1.ntw). I noticed this when I was trying to assign some custom categories so I could color the network nodes by importing genome_by_genome_overview.csv as node attributes.
If this isn't a bug, is there some vContact2 step that I am missing that happens before the genome_by_genome_overview.csv file is made? Does vContact2 apply further quality control to the c1.ntw nodes and only output a subset of the clusters (not all of them) based on some metric? If so, what is that metric?
At first I thought it was caused by header errors in the input genome IDs that I generated with the gene2genome software, but I can see it happening with RefSeq + ICTV + ... viruses as well (e.g., Mycobacterium~virus~Omega). This virus is present as a node in c1.ntw, but not present in genome_by_genome_overview.csv.
This is similar to issue #47. If I look into the files vContact2 uses to generate the network, I find these viruses. For example, Mycobacterium~virus~Omega shows up in the "c1.clusters" file. From that file, I can tell that it is in Viral Cluster #16, which contains 8 viruses and has a density of 600, an internal weight of 1.68E+04, an external weight of 1.57E+04, a quality of 0.5161, and a p-value of 0.000204966. The other members of this cluster are also absent from genome_by_genome_overview.csv, and are all RefSeq + ICTV + … viruses:
Mycobacterium~phage~Thibault; Mycobacterium~phage~Ariel; Mycobacterium~phage~Redno2; Mycobacterium~virus~Courthouse; Mycobacterium~phage~MiaZeal; Mycobacterium~phage~Minerva; Mycobacterium~phage~Wanda.
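For anyone who wants to run the same comparison on their own output, here is a minimal sketch of how the missing nodes can be listed. It assumes that c1.ntw is a whitespace-delimited edge list (source, target, weight) and that genome_by_genome_overview.csv has a "Genome" column holding the node names. That matches my files, but I haven't confirmed it against the vContact2 source, and the function names are just ones I made up for illustration.

```python
import csv


def nodes_in_network(ntw_path):
    """Collect every name that appears as a source or target in the edge list.

    Assumption: each line is "source target weight", separated by whitespace.
    """
    nodes = set()
    with open(ntw_path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) >= 2:  # skip blank or malformed lines
                nodes.update(fields[:2])
    return nodes


def genomes_in_overview(csv_path, column="Genome"):
    """Collect the genome names listed in the overview table.

    Assumption: the names live in a column called "Genome".
    """
    with open(csv_path, newline="") as handle:
        return {row[column] for row in csv.DictReader(handle) if row.get(column)}


def missing_from_overview(ntw_path, csv_path, column="Genome"):
    """Return, sorted, the network nodes that have no row in the overview."""
    return sorted(nodes_in_network(ntw_path) - genomes_in_overview(csv_path, column))


if __name__ == "__main__":
    # Paths are relative to the vContact2 output directory.
    for name in missing_from_overview("c1.ntw", "genome_by_genome_overview.csv"):
        print(name)
```

Run from the vContact2 output directory, this prints one missing genome name per line, which makes it easy to check whether the absent nodes all belong to the same clusters.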
Any help or insight would be very much appreciated! Happy to share files on CyVerse as well. Just let me know which user would like access.