error during vcontact2 run

Issue #27 resolved
Former user created an issue

Dear developer,

I ran vcontact2 using the following command:

vcontact2 --raw-proteins viral.seqs.filtered.faa --rel-mode 'Diamond' --proteins-fp gene_to_genome.mapping.csv --db 'ProkaryoticViralRefSeq201-Merged' --pcs-mode MCL --vcs-mode ClusterONE --c1-bin /home/***/tools/cluster_one-1.0.jar --output-dir vcontact2_out

Protein names in the faa file are as follows:

k141_10497897_flag_0_multi_60.8734_len_15039||full_ORF_1 # 1 # 2535 # 1 # ID=1_1;partial=10;start_type=Edge;rbs_motif=None;rbs_spacer=None;gc_cont=0.570

The gene-to-genome mapping entries are as follows:

k141_10497897_flag_0_multi_60.8734_len_15039||full_ORF_1,k141_10497897_flag_0_multi_60.8734_len_15039||full,ORF_1_1-2535_1

During the run it reported a huge number of warnings like this one:

WARNING:vcontact2.protein_clusters: ** protein(s) without contig: frozenset(

and it finally stopped with the following error:

INFO:vcontact2: Saving intermediate files...

----------------------------------Loading data----------------------------------
INFO:vcontact2: Read 0 entries (dropped 3503 singletons) from vcontact2_out/vConTACT_profiles.csv
Traceback (most recent call last):
  File "/home/****/tools/anaconda2/envs/vcontact2/lib/python3.8/site-packages/scipy/sparse/coo.py", line 140, in __init__
    obj, (row, col) = arg1
ValueError: not enough values to unpack (expected 2, got 0)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/*/tools/anaconda2/envs/vcontact2/bin/vcontact2", line 757, in <module>
    main(options)
  File "/home//tools/anaconda2/envs/vcontact2/bin/vcontact2", line 545, in main
    matrix, singletons = vcontact2.pcprofiles.build_pc_matrices(profiles, contigs_csv_df, pcs_csv_df)
  File "/home//tools/anaconda2/envs/vcontact2/lib/python3.8/site-packages/vcontact2/pcprofiles.py", line 358, in build_pc_matrices
    matrix = sparse.coo_matrix(([1]*len(profiles), (zip(*profiles.values))), shape=(len(contigs), len(pcs)),
  File "/home/*/tools/anaconda2/envs/vcontact2/lib/python3.8/site-packages/scipy/sparse/coo.py", line 142, in __init__
    raise TypeError('invalid input format')
TypeError: invalid input format

Comments (2)

  1. Ben Bolduc

    This is almost certainly an issue with parsing the gene-to-genome csv file, most likely a failure to match the headers in the faa file against the entries in the csv. The main reason I suspect this is the line “INFO:vcontact2: Read 0 entries (dropped 3503 singletons)”. So vContact2 is reading your csv file and filtering everything out before getting started.

    Also ensure that protein_id, contig_id and keywords are headers in the gene-to-genome file. This could be an issue too, but usually, the run will start and you’ll lose a genome.

    If this issue persists, can you share the first 5 or so lines of your csv and the corresponding faa file?

    -Ben
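
The two checks suggested above (matching IDs between the faa and the csv, and the presence of the protein_id, contig_id and keywords headers) can be sketched as a small stand-alone script. This is only an illustration, not part of vConTACT2: the function names `fasta_ids` and `check_mapping` are hypothetical, the sample IDs are made up, and it assumes the protein ID is the text after ">" up to the first whitespace, which may not be exactly how vConTACT2 parses headers.

```python
import csv
import io

# Column names mentioned in the comment above.
REQUIRED_HEADERS = {"protein_id", "contig_id", "keywords"}


def fasta_ids(handle):
    """Yield one ID per FASTA record.

    Assumption: the ID is the text after '>' up to the first whitespace.
    """
    for line in handle:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                yield fields[0]


def check_mapping(faa_handle, csv_handle):
    """Compare FASTA protein IDs with the protein_id column of the mapping csv."""
    reader = csv.DictReader(csv_handle)
    fieldnames = reader.fieldnames or []
    missing_headers = sorted(REQUIRED_HEADERS - set(fieldnames))

    # Without a protein_id column nothing can be matched at all.
    mapped = set()
    if "protein_id" in fieldnames:
        mapped = {row["protein_id"] for row in reader}

    proteins = set(fasta_ids(faa_handle))
    return {
        "missing_headers": missing_headers,
        "proteins_without_mapping": sorted(proteins - mapped),
        "mappings_without_protein": sorted(mapped - proteins),
    }


# Made-up miniature inputs: two proteins, only one of them mapped.
faa = io.StringIO(
    ">contigA||full_ORF_1 # 1 # 2535 # 1 # ID=1_1\nMKT\n"
    ">contigA||full_ORF_2 # 2600 # 3000 # 1 # ID=1_2\nMSS\n"
)
mapping = io.StringIO(
    "protein_id,contig_id,keywords\n"
    "contigA||full_ORF_1,contigA||full,ORF_1\n"
)
report = check_mapping(faa, mapping)
print(report)
```

On real data the same function can be called with the two files opened in text mode (for example viral.seqs.filtered.faa and gene_to_genome.mapping.csv from the command in the report). A non-empty `missing_headers` list, or a `proteins_without_mapping` list that covers every protein, would be consistent with the "Read 0 entries" symptom described above.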
