All my viruses are absent from vConTACT2 output files on CyVerse
Hi Ben & Sullivan lab,
I have been trying to use vConTACT2 on CyVerse, without success, after several attempts with v0.9.8 and v0.9.19.
My input is a fasta file containing contigs that were VirSorted, CheckV’ed and VirSorted again as in https://www.protocols.io/view/viral-sequence-identification-sop-with-virsorter2-btv8nn9w
To generate my mapping file I have used vContact2-Gene2Genome_1.1.0 and vContact-Gene2Contig_1.0.1, after talking to Adjie.
The jobs do run, but all of my viruses are absent from the output files.
I would appreciate any advice on which vConTACT2 version and which mapping-file app will work on CyVerse.
If you want, I can share my files with you on CyVerse; just let me know!
Thank you,
Paula
Comments (5)
-
Hi Josue and Paula,
If you could share files with me on CyVerse, that would help. And if possible, use 0.9.19 or above.
One of the challenging issues here is that, as Josué pointed out, there isn’t consistency in what’s being dropped. I have introduced a number of fixes specifically aimed at reducing the number of genomes dropped during the final overview. While that has been mostly successful, there are still a number of users who have encountered this issue.
I do try to limit how often I request user data (I understand concerns about the sensitive nature of research data), but given that I cannot reproduce missing genomes with any of my test data, it’s hard to identify the root issue.
The reason? Due to limitations in ClusterONE, vConTACT2 must interrogate multiple input and intermediate files, cross-compare them, and summarize the data to give VC numbers, cluster status, taxonomic info, overlap/outlier data, etc.
-Ben
-
Hey Ben!
Thanks for the response. I’m happy to share my data folder if it would help you get to the bottom of it. I shared it with user “bbolduc-iplant-2015”. Let me know if you need it shared to any other user as well.
-Josué
-
Hi Josué,
Thank you for providing the data. I’ll run it on my end and see if I can find the appropriate fix.
-Ben
-
Hi Josué,
Thank you again for sharing the data with me. I was at least able to confirm what I noticed in your genome-by-genome file.
I’ve looked over your data and it appears that many of your contigs are clustered. If you download the genome-by-genome file and sort by your genomes, check the “VC Status” column. If they’re “Clustered”, that means they were placed into a genus-level group. They are not, however, assigned to an order/family/genus, so you’ll see Unassigned, Unassigned, Unassigned. We did not want users to blindly use the taxonomies as authoritative, but it seems we confused everyone in the process.
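For anyone who would rather script the check than sort the file by hand, here is a minimal Python sketch of looking up the “VC Status” value for a list of genomes. It is only an illustration: the function name is made up, the "Genome" column name is an assumption (only “VC Status” is named above), and the small table is invented data standing in for the real genome-by-genome file.

```python
import csv
import io


def vc_status_by_genome(csv_file, wanted_ids,
                        genome_col="Genome", status_col="VC Status"):
    """Return {genome_id: VC Status} for the requested genomes.

    A genome with no row in the file maps to None.
    genome_col is an assumed column name; adjust it to match your file.
    """
    wanted = set(wanted_ids)
    found = {gid: None for gid in wanted}
    for row in csv.DictReader(csv_file):
        gid = row.get(genome_col)
        if gid in wanted:
            found[gid] = row.get(status_col)
    return found


# Invented example table standing in for the genome-by-genome file.
example = io.StringIO(
    "Genome,Order,Family,Genus,VC Status\n"
    "contig_1,Unassigned,Unassigned,Unassigned,Clustered\n"
    "contig_2,Unassigned,Unassigned,Unassigned,Singleton\n"
)

statuses = vc_status_by_genome(example, ["contig_1", "contig_2", "contig_3"])
for gid in sorted(statuses):
    print(gid, statuses[gid])
```

With a real file, replace the `io.StringIO` object with `open(path, newline="")`. A genome reported as None has no row at all, which is a different situation from a row that is “Clustered” but shows Unassigned taxonomy.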
The latest version on CyVerse is 0.9.19, but I’ll be pushing 0.10.0 to Bioconda and CyVerse whenever I can find some time.
If you still don’t see your genomes, we can follow up via email.
-Ben
Hey Paula and all,
Just wanted to chime in and say that I seem to be having the same issue. I also followed the SOP Paula posted above. I am not missing all of my viruses, but I am missing 6 of them. They show up in the merged.faa, vConTACT_contigs.csv, vConTACT_proteins.csv, and vConTACT_profiles.csv files. The proteins for the missing viruses also seem to be passed into the modules_mcl_5.0.clusters and vConTACT_pcs.csv files, and even into the c1.ntw and c1.clusters files, just not into genome_by_genome_overview.csv.
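For anyone wanting to run the same comparison, a minimal Python sketch of how the missing IDs could be listed is below: it takes the set of IDs in the contigs file and subtracts the set of IDs in the overview file. The helper names are hypothetical, the column names ("contig_id" and "Genome") are assumptions that should be checked against the actual headers, and the two small tables are invented data.

```python
import csv
import io


def ids_in_column(csv_file, column):
    """Collect the non-empty values of one column from a CSV file object."""
    return {row.get(column) for row in csv.DictReader(csv_file) if row.get(column)}


def missing_from_overview(contigs_file, overview_file,
                          contig_col="contig_id", genome_col="Genome"):
    """Return sorted IDs present in the contigs file but absent from the overview.

    Both column names are assumptions; change them to match your files.
    """
    contig_ids = ids_in_column(contigs_file, contig_col)
    overview_ids = ids_in_column(overview_file, genome_col)
    return sorted(contig_ids - overview_ids)


# Invented stand-ins for the contigs file and the overview file.
contigs_csv = io.StringIO("contig_id,proteins\nvirus_A,12\nvirus_B,30\nvirus_C,7\n")
overview_csv = io.StringIO("Genome,VC Status\nvirus_A,Clustered\nvirus_C,Outlier\n")

missing = missing_from_overview(contigs_csv, overview_csv)
print(missing)
```

Reference genomes that appear only in the overview are ignored by the set difference, so the result lists only user contigs that were dropped.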
There doesn’t seem to be any consistency in what is missing (assembly methods, ID names, numbers of proteins, and genome lengths all vary). Based on the c1.clusters file, the novelty of these viruses also seems to vary: a few cluster with known Clostridium phages, and others would be considered novel genera (no singletons/outliers based on that file). Two of the six missing viruses are in a cluster with each other.
I called my genes using Prodigal (from DRAMv output) and changed the parameter accordingly when I ran vContact2-Gene2Genome_1.1.0. I used vConTACT2 v0.9.8 on CyVerse.
Any idea what might be happening?