Still identifying genome substrings. Consider adjusting input genomes naming. 'Shewanella~sp.~phage~1/4'
Hello developers,
Thank you for developing this super nice tool!
I am running into an interesting issue here.
At the step “----------------------------Exporting results files-----------------------------” I get the following error:
Still identifying genome substrings. Consider adjusting input genomes naming.
'Shewanella~sp.~phage~1/4'
And then that's it: the execution simply stops.
Snapshot of my .out file from the SLURM job:
Snapshot of my .err file from the SLURM job:
I have used this tool successfully many times with other datasets; this error occurs only with this particular dataset.
I checked whether the job had exceeded its runtime or memory limits, and it had not:
$ seff 8841463
Job ID: 8841463
Cluster: eve
User/Group: brizolat/umb
State: COMPLETED (exit code 0)
Cores: 1
CPU Utilized: 14:41:22
CPU Efficiency: 90.48% of 16:14:06 core-walltime
Job Wall-clock time: 16:14:06
Memory Utilized: 129.31 GB
Memory Efficiency: 64.65% of 200.00 GB
I also listed the intermediate files to give you some more details:
$ ls -lh vcontact-output
total 5,7G
-rw-rw-r--+ 1 brizolat eve_umbmsb 3,1M 15. Dez 06:43 c1.clusters
-rw-rw-r--+ 1 brizolat eve_umbmsb 89M 15. Dez 06:36 c1.ntw
-rw-rw-r--+ 1 brizolat eve_umbmsb 15M 15. Dez 03:08 merged_df.csv
-rw-rw-r--+ 1 brizolat eve_umbmsb 431M 14. Dez 17:57 merged.dmnd
-rw-rw-r--+ 1 brizolat eve_umbmsb 418M 14. Dez 17:56 merged.faa
-rw-rw-r--+ 1 brizolat eve_umbmsb 2,3G 14. Dez 22:27 merged.self-diamond.tab
-rw-rw-r--+ 1 brizolat eve_umbmsb 1,6G 14. Dez 22:29 merged.self-diamond.tab.abc
-rw-rw-r--+ 1 brizolat eve_umbmsb 431M 14. Dez 23:01 merged.self-diamond.tab.mci
-rw-rw-r--+ 1 brizolat eve_umbmsb 41M 14. Dez 23:20 merged.self-diamond.tab_mcl20.clusters
-rw-rw-r--+ 1 brizolat eve_umbmsb 47M 14. Dez 23:01 merged.self-diamond.tab_mcxload.tab
-rw-rw-r--+ 1 brizolat eve_umbmsb 419K 15. Dez 07:09 modules_mcl_5.0.clusters
-rw-rw-r--+ 1 brizolat eve_umbmsb 259K 15. Dez 07:10 modules_mcl_5.0_modules.pandas
-rw-rw-r--+ 1 brizolat eve_umbmsb 7,9M 15. Dez 07:10 modules_mcl_5.0_pcs.pandas
-rw-rw-r--+ 1 brizolat eve_umbmsb 134M 15. Dez 07:09 modules.ntwk
-rw-rw-r--+ 1 brizolat eve_umbmsb 196K 15. Dez 07:14 sig1.0_mcl2.0_clusters.csv
-rw-rw-r--+ 1 brizolat eve_umbmsb 17M 15. Dez 07:14 sig1.0_mcl2.0_contigs.csv
-rw-rw-r--+ 1 brizolat eve_umbmsb 204K 15. Dez 07:14 sig1.0_mcl2.0_modsig1.0_modmcl5.0_minshared3_link_mod_cluster.csv
-rw-rw-r--+ 1 brizolat eve_umbmsb 158K 15. Dez 07:14 sig1.0_mcl5.0_minshared3_modules.csv
-rw-rw-r--+ 1 brizolat eve_umbmsb 11M 15. Dez 03:08 vConTACT_contigs.csv
-rw-rw-r--+ 1 brizolat eve_umbmsb 8,1M 15. Dez 03:08 vConTACT_pcs.csv
-rw-rw-r--+ 1 brizolat eve_umbmsb 53M 15. Dez 03:08 vConTACT_profiles.csv
-rw-rw-r--+ 1 brizolat eve_umbmsb 123M 15. Dez 03:08 vConTACT_proteins.csv
-rw-rw-r--+ 1 brizolat eve_umbmsb 3,0M 15. Dez 07:56 viral_cluster_overview.csv
Could you please help me out on this one?
Thanks a lot and thank you for developing this amazing tool.
Best,
Rodolfo
Comments (2)
Other users have reported this same genome breaking their runs. The issue appears to be that vConTACT2 cannot identify a specific genome within the network because its name is contained within another genome's name; for example, "Pseudomonas Phage P1" is a substring of "Pseudomonas Phage P10". A fix for this was implemented quite some time ago, but it seems to apply only to user genomes, not to the reference database.
We’ll be updating the reference DB to deal with this sequence.
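To make the substring problem concrete, here is a minimal sketch (not vConTACT2's actual code) showing how a plain substring lookup becomes ambiguous when one genome name is contained in another, plus a helper that lists such collisions so they can be spotted in an input set before a run. The function names `substring_matches`, `exact_match`, and `find_substring_collisions` are hypothetical, and the genome names are made-up examples (the `~` separators only mimic the reference naming style seen in the error message).

```python
from itertools import permutations
from typing import List, Tuple


def substring_matches(query: str, names: List[str]) -> List[str]:
    """Naive lookup: every name that contains `query` as a substring.

    Hypothetical illustration of why substring-based lookups are ambiguous;
    this is not vConTACT2's implementation.
    """
    return [name for name in names if query in name]


def exact_match(query: str, names: List[str]) -> List[str]:
    """Whole-string lookup, which is unambiguous when names are unique."""
    return [name for name in names if name == query]


def find_substring_collisions(names: List[str]) -> List[Tuple[str, str]]:
    """Return (shorter, longer) pairs where one name occurs inside another."""
    unique = sorted(set(names))
    return [(a, b) for a, b in permutations(unique, 2) if a in b]


if __name__ == "__main__":
    genomes = [
        "Pseudomonas~Phage~P1",
        "Pseudomonas~Phage~P10",
        "sample_contig_11",
        "sample_contig_110",
        "sample_contig_2",
    ]
    # Substring lookup for P1 also hits P10:
    # ['Pseudomonas~Phage~P1', 'Pseudomonas~Phage~P10']
    print(substring_matches("Pseudomonas~Phage~P1", genomes))
    # Exact lookup returns only the intended genome: ['Pseudomonas~Phage~P1']
    print(exact_match("Pseudomonas~Phage~P1", genomes))
    # Reports the P1/P10 pair and the contig_11/contig_110 pair.
    for shorter, longer in find_substring_collisions(genomes):
        print(f"{shorter!r} is contained in {longer!r}")
```

Assuming the lookup really is substring-based, one way to act on the "Consider adjusting input genomes naming" hint for user genomes is to rename so that no name contains another, for example by zero-padding contig numbers to a fixed width (distinct names of equal length can never contain one another). Collisions inside the reference database itself cannot be fixed this way and need the database update mentioned above.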
Hello!
Thanks for developing this nice tool.
I am also facing the same issue that Rodolfo described.
However, I am not sure whether it is an error, since the run ends with only this message:
----------------------------Exporting results files-----------------------------
There were 687 genomes (including refs) that were singleton, outlier or overlaps.
Still identifying genome substrings. Consider adjusting input genomes naming.
'nchoe_02_contig_11'
There was not even an error message or warning!
Interestingly, the important genome_by_genome_overview.csv file is missing from the output folder.
The run log is provided below.
I am a beginner, so please pardon my inexperience and guide me.
Best,
Adhip