To generate a taxonomic count table
Dear vConTACT2 team, First of all, thank you for your great work on vConTACT2. I am wondering how I can get a taxonomic count table, similar to an OTU table. So far I have 'genome_by_genome_overview.csv' and 'viral_cluster_overview.csv'. Does each line in 'genome_by_genome_overview.csv' represent a specific virus, or one member of a group (viral cluster)? Could I treat each line as a taxonomic unit and use the 'Members' column of 'viral_cluster_overview.csv' to sum the read counts for each VC and generate a count table? Or could you suggest a better way to generate a taxonomic count table based on the vConTACT2 results? I look forward to your reply. Thanks a lot!
genome_by_genome_overview.csv:
,Genome,Order,Family,Genus,VC,VC Status,Size,VC Subcluster,VC Subcluster Size,Quality,Adj P-value,Topology Confidence Score,Genera in VC,Families in VC,Orders in VC,Genus Confidence Score
0,Achromobacter~phage~83-24,Caudovirales,Siphoviridae,Jwxvirus,0_0,Clustered,2,VC_0_0,2,0.1952,0.95226825,0.1859,1,1,1,1.0
1,Achromobacter~phage~JWAlpha,Caudovirales,Podoviridae,Jwalphavirus,8_1,Clustered,11,VC_8_1,11,0.4755,1.0,0.4755,3,1,1,0.9818
viral_cluster_overview.csv:
,,VC,Size,Internal Weight,External Weight,Quality,P-value,Min Dist,Max Dist,Total Dist,Below Thres,Taxon Prediction Score,Avg Dist,Genera,Families,Orders,Members
0,VC_0_0,2,155.06242197581085,639.2083422889191,0.1952261482510373,0.04773175196323191,1.7320508075688772,1.7320508075688772,1,1,1.0,1.7320508075688772,1,1,1,"Achromobacter~phage~83-24,Achromobacter~phage~JWX"
1,VC_1000_0,5,16.007331291300495,12.280942590828833,0.5658645471971648,0.3717981656013571,1.7320508075688772,2.6457513110645907,10,10,1.0,2.3080226590546964,1,1,1,"k141_1143022_length_14828_cov_72.0000,k141_1292517_length_10485_cov_134.5822,k141_1980014_length_12453_cov_102.0362,k141_4986945_length_9470_cov_84.1939,k141_767706_length_6153_cov_139.2982"
Comments (2)
- changed status to on hold
Awaiting future update
Hello,
To get a taxonomic count, it’s a little more complicated than simply taking the lines from the genome-by-genome file and counting up a taxon column.
The easiest way I can think of to do this would be to:
1. Identify the “majority rules” Order, Family, and Genus for each VC.
2. Count up the members of each VC, using the majority taxon to describe that VC.
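The two steps above could be sketched roughly as follows. This is a minimal illustration only, not part of vConTACT2: the function names `majority_taxonomy` and `count_table`, the demo genomes, and the per-sample abundance dictionary are all made up for the example. It assumes that unclassified contigs carry the value "Unassigned" in the taxonomy columns, that genomes without a cluster have an empty "VC Subcluster" field, and that you already have per-contig read counts from your own mapping step.

```python
import csv
import io
from collections import Counter, defaultdict

RANKS = ("Order", "Family", "Genus")
UNKNOWN = {"", "Unassigned"}  # assumption: how unclassified members are labelled


def majority_taxonomy(rows, vc_column="VC Subcluster"):
    """Step 1: pick the most frequent known taxon per rank for each VC."""
    votes = defaultdict(lambda: {rank: Counter() for rank in RANKS})
    for row in rows:
        vc = row.get(vc_column, "")
        if not vc:
            continue  # genome was not placed in any viral cluster
        vc_votes = votes[vc]  # registers the VC even if no member is classified
        for rank in RANKS:
            taxon = row.get(rank, "")
            if taxon not in UNKNOWN:
                vc_votes[rank][taxon] += 1
    return {
        vc: {
            rank: (c.most_common(1)[0][0] if c else "Unassigned")
            for rank, c in per_rank.items()
        }
        for vc, per_rank in votes.items()
    }


def count_table(rows, abundance, vc_column="VC Subcluster"):
    """Step 2: sum per-genome counts into per-VC, per-sample totals.

    `abundance` is a hypothetical {genome: {sample: read_count}} mapping.
    """
    table = defaultdict(Counter)
    for row in rows:
        vc = row.get(vc_column, "")
        if not vc:
            continue
        for sample, n in abundance.get(row["Genome"], {}).items():
            table[vc][sample] += n
    return table


# Made-up demo data using a subset of the genome_by_genome_overview.csv columns.
DEMO_CSV = """Genome,Order,Family,Genus,VC Subcluster
ref_A,Caudovirales,Siphoviridae,GenusX,VC_1_0
ref_B,Caudovirales,Siphoviridae,GenusX,VC_1_0
ref_C,Caudovirales,Siphoviridae,GenusY,VC_1_0
contig_1,Unassigned,Unassigned,Unassigned,VC_1_0
contig_2,Unassigned,Unassigned,Unassigned,VC_2_0
contig_3,Unassigned,Unassigned,Unassigned,
"""
DEMO_ABUNDANCE = {
    "contig_1": {"S1": 10, "S2": 0},
    "contig_2": {"S1": 3, "S2": 7},
    "contig_3": {"S1": 5, "S2": 5},
}

rows = list(csv.DictReader(io.StringIO(DEMO_CSV)))
taxonomy = majority_taxonomy(rows)
counts = count_table(rows, DEMO_ABUNDANCE)

for vc in sorted(counts):
    tax = taxonomy[vc]
    print(vc, tax["Order"], tax["Family"], tax["Genus"], dict(counts[vc]))
```

To run this on real output, the demo rows could be replaced with `csv.DictReader(open("genome_by_genome_overview.csv", newline=""))`. Ties between equally frequent taxa are not handled specially here, and a VC made up only of unclassified contigs simply stays "Unassigned", so both cases may need a rule of your own.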
A future update to vConTACT will “fix” this annoying issue for users.
Thanks for your use of the tool!
Cheers,
Ben