How to generate the input files for a vConTACT2 run?

Ben Bolduc

Hi Rodolfo,

The --raw-proteins file is indeed the amino-acid translation of the genes predicted from your nucleotide sequences. I usually use Prodigal to generate it, but any gene prediction software would work.

prodigal -i viral_genomes.fna -o viral_genomes.genes -a viral_genomes.faa -p meta

Once done, you’ll need to create the “Gene to Genome” mapping file (--proteins-fp), which, as you stated, links the protein names to their genomes. I have included a naive wrapper, vcontact2_gene2genome.py, that converts the gene predictions to this mapping file. The wrapper can also handle MetaGeneMark and some outputs from NCBI. Users of CyVerse, an NSF-funded cyberinfrastructure, can use the “vContact2-Gene2Genome” app to generate this file. And anyone using KBase will have their genomes processed automatically.

vcontact2_gene2genome -p viral_genomes.faa -o viral_genomes_g2g.csv -s 'Prodigal-FAA'

That should work for your specific situation. However, you’re free to create the mapping file however you want. All that’s required is a 3-column table with the headers “genome_id, protein_id, keywords.” You can also create keyword annotations with the “keywords” column, and those will be aggregated and summarized in one of the vcontact2 outputs. But very few people use keywords, and even fewer look at those outputs.
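If you would rather build the table yourself, here is a minimal Python sketch of the idea. It is not the code inside vcontact2_gene2genome. It assumes Prodigal's default protein naming, where each protein ID is the contig name followed by an underscore and a gene index, and it uses the header names exactly as written above. Please compare against a file produced by the wrapper before relying on it, since the wrapper's output is the authoritative format. The function names and the empty keywords placeholder are hypothetical.

```python
import csv
import io


def prodigal_faa_to_rows(faa_lines):
    """Yield (genome_id, protein_id) pairs from the header lines of a Prodigal .faa file.

    Assumes Prodigal's default naming: "<contig name>_<gene index>".
    """
    for line in faa_lines:
        if not line.startswith(">"):
            continue  # skip sequence lines
        fields = line[1:].split()
        if not fields:
            continue  # skip malformed, empty headers
        protein_id = fields[0]
        genome_id, sep, _gene_index = protein_id.rpartition("_")
        if not sep:
            # No underscore at all: fall back to the full ID (assumption).
            genome_id = protein_id
        yield genome_id, protein_id


def write_mapping(faa_lines, out_handle,
                  header=("genome_id", "protein_id", "keywords")):
    """Write the 3-column mapping table as CSV.

    The header defaults to the names given in this thread; pass a different
    tuple if the wrapper's own output uses other names or another order.
    """
    writer = csv.writer(out_handle, lineterminator="\n")
    writer.writerow(header)
    for genome_id, protein_id in prodigal_faa_to_rows(faa_lines):
        # Keywords are left empty here (placeholder, assumption).
        writer.writerow([genome_id, protein_id, ""])
```

To use it on real files, open the .faa for reading and the .csv for writing (with newline="") and pass both handles to write_mapping.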

Cheers,

Ben

2020-06-17T17:28:17+00:00

Ben Bolduc

changed status to resolved

Considered resolved. As always, check the readme and wiki for further assistance!

2020-07-02T15:58:46+00:00

Zhanwen Cheng

Hi Ben, I am using prodigal and vcontact2_gene2genome to generate g2g.csv from your previously published ‘GOV2_viral_populations_larger_than_5KB_or_circular.fasta’. I noticed that there were 488131 viral contigs in the fasta file, but prodigal predicted proteins for only 452963 of them, so only those could be used as input to vcontact2. What should I do about the remaining 35168 viral contigs with no predicted proteins?
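For reference, one way to list exactly which contigs have no predicted proteins is to compare the contig IDs in the nucleotide fasta with the contig names recovered from the protein IDs. The sketch below is only an illustration. It assumes Prodigal's default protein naming (contig name, underscore, gene index), and the function names are hypothetical.

```python
def fasta_ids(lines):
    """Return the set of sequence IDs (first word of each header line)."""
    ids = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.add(fields[0])
    return ids


def contigs_without_proteins(genome_lines, protein_lines):
    """Return sorted contig IDs that have no protein in the Prodigal output.

    Assumes each protein ID is "<contig name>_<gene index>".
    """
    contigs = fasta_ids(genome_lines)
    contigs_with_genes = {
        protein_id.rpartition("_")[0] for protein_id in fasta_ids(protein_lines)
    }
    return sorted(contigs - contigs_with_genes)
```

Passing the open nucleotide fasta and the open .faa file as the two arguments returns the contigs missing from the protein file.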

2021-01-02T14:02:00+00:00

Adhip Mukhopadhyay

Hello

I am new to CyVerse and am using the vContact2-Gene2Genome 1.1.0 app: https://de.cyverse.org/apps/agave/vContact2-Gene2Genome-1.1.0u1.

I submitted a job to create the “Gene to Genome” mapping file about 24 hours ago. The analysis id is 7bcd271c-0f95-4469-9028-ddd09f4b94ea-007.

The status is still showing ‘submitted’, but the info panel is showing ‘2021-06-13 14:37:35 - FINISHED; 2021-06-13 14:37:35 - Job completed successfully’.

In the output directory, the ‘protein.csv’ file is also empty.

Please help me to resolve the issue.

Thanks

Adhip

2021-06-14T10:25:51+00:00
