ERROR:vcontact2: Error in exporting the final summary data

Issue #7 resolved
Tommi Vatanen created an issue

Hello and thanks for this tool. I have successfully run the program on the test data you provided, but when I try to run it on my own test data of 50 viral proteins (attached in this report) I get the following error:

---------------------------Link modules and clusters----------------------------
ERROR:vcontact2: Error in linking modules and clusters
ERROR:vcontact2: 'NoneType' object has no attribute 'clusters'
Traceback (most recent call last):
  File "/home/tvat287/miniconda3/envs/vContact2/bin/vcontact", line 650, in main
    link = modules.link_modules_and_clusters_df(gc.clusters, gc.contigs, thres=args.link_sig,
AttributeError: 'NoneType' object has no attribute 'clusters'


----------------------------Exporting results files-----------------------------
ERROR:vcontact2: Error in identifying excluded contigs: 'NoneType' object has no attribute 'name'
ERROR:vcontact2: Error in exporting the final summary data: 'NoneType' object has no attribute 'contigs'

Any help is greatly appreciated!

Thank you,
Tommi

Comments (14)

  1. Ben Bolduc

    Hi Tommi,

    I appreciate the bug report. Usually, the NoneType cluster error occurs when ClusterONE fails to generate a *.clusters file. Can you check to see if that file was created? It’s often named “c1.clusters”. If that file hasn’t been generated, double-check that the full path to the clusterONE java file was specified on the command line:

    --c1-bin /path/to/location/of/cluster_one-1.0.jar
    

    Your input files look fine, so it’s likely ClusterONE is the culprit!

    Cheers,

    Ben
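The two checks described above (is the ClusterONE jar given as a full path, and was a clusters file produced?) can be scripted before re-running a job. This is only a sketch: `check_clusterone_setup` is a hypothetical helper, not part of vConTACT2, and the default file name `c1.clusters` is taken from the comment above. The empty-file check is an extra assumption about a second way the step could fail.

```python
from pathlib import Path


def check_clusterone_setup(jar_path, output_dir, clusters_name="c1.clusters"):
    """Return a list of human-readable problems (an empty list means none found)."""
    problems = []

    # The jar should be passed to --c1-bin as a full path.
    jar = Path(jar_path)
    if not jar.is_absolute():
        problems.append(f"--c1-bin is not a full path: {jar}")
    if not jar.is_file():
        problems.append(f"ClusterONE jar not found: {jar}")

    # The clusters file should exist in the output directory and have content.
    clusters = Path(output_dir) / clusters_name
    if not clusters.is_file():
        problems.append(f"{clusters} was not created")
    elif clusters.stat().st_size == 0:
        problems.append(f"{clusters} exists but is empty")

    return problems
```

Calling it with the value passed to `--c1-bin` and the value passed to `--output-dir` returns an empty list when both checks pass.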

  2. Megan Dillon

    Hi Ben,

    I’m getting the same error when running the test.

    $ vcontact --raw-proteins /usr/local/software/sl-7.x86_64/sources/vcontact2/test_data/VIRSorter_viral_prots.faa --rel-mode 'Diamond' --proteins-fp /usr/local/software/sl-7.x86_64/sources/vcontact2/test_data/proteins.csv --db 'ProkaryoticViralRefSeq85-Merged' --pcs-mode MCL --vcs-mode ClusterONE --c1-bin /usr/local/software/sl-7.x86_64/sources/vcontact2/bin/cluster_one-1.0.jar --output-dir ~/Software/VirSorter/VirSorted_Outputs

    ------------------------Contig Clustering & Affiliation-------------------------
    ERROR:vcontact2: Error in contig clustering
    ERROR:vcontact2: No columns to parse from file
    Traceback (most recent call last):
    File "/usr/local/software/sl-7.x86_64/modules/vcontact2/bin/vcontact", line 604, in main
    mode=args.vc_mode)
    File "/usr/local/software/sl-7.x86_64/modules/vcontact2/lib/python3.6/site-packages/vcontact/contig_clusters.py", line 92, in __init__
    self.cluster_one, self.one_opts)
    File "/usr/local/software/sl-7.x86_64/modules/vcontact2/lib/python3.6/site-packages/vcontact/contig_clusters.py", line 227, in one_cluster
    return self.load_one_clusters(fi_clusters)
    File "/usr/local/software/sl-7.x86_64/modules/vcontact2/lib/python3.6/site-packages/vcontact/contig_clusters.py", line 318, in load_one_clusters
    clusters_df = pd.read_csv(one_fn, header=0)
    File "/global/software/sl-7.x86_64/modules/langs/python/3.6/lib/python3.6/site-packages/pandas/io/parsers.py", line 709, in parser_f
    return _read(filepath_or_buffer, kwds)
    File "/global/software/sl-7.x86_64/modules/langs/python/3.6/lib/python3.6/site-packages/pandas/io/parsers.py", line 449, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
    File "/global/software/sl-7.x86_64/modules/langs/python/3.6/lib/python3.6/site-packages/pandas/io/parsers.py", line 818, in __init__
    self._make_engine(self.engine)
    File "/global/software/sl-7.x86_64/modules/langs/python/3.6/lib/python3.6/site-packages/pandas/io/parsers.py", line 1049, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
    File "/global/software/sl-7.x86_64/modules/langs/python/3.6/lib/python3.6/site-packages/pandas/io/parsers.py", line 1695, in __init__
    self._reader = parsers.TextReader(src, **kwds)
    File "pandas/_libs/parsers.pyx", line 565, in pandas._libs.parsers.TextReader.__cinit__
    pandas.errors.EmptyDataError: No columns to parse from file
    ERROR:vcontact2: Error in viral clusters
    ERROR:vcontact2: 'NoneType' object has no attribute 'contigs'
    Traceback (most recent call last):
    File "/usr/local/software/sl-7.x86_64/modules/vcontact2/bin/vcontact", line 622, in main
    vc = vcontact.cluster_refinements.ViralClusters(gc.contigs, profiles_fp, optimize=options.optimize)
    AttributeError: 'NoneType' object has no attribute 'contigs'

    --------------------------------Protein modules---------------------------------

    ---------------------------Link modules and clusters----------------------------
    ERROR:vcontact2: Error in linking modules and clusters
    ERROR:vcontact2: 'NoneType' object has no attribute 'clusters'
    Traceback (most recent call last):
    File "/usr/local/software/sl-7.x86_64/modules/vcontact2/bin/vcontact", line 650, in main
    link = modules.link_modules_and_clusters_df(gc.clusters, gc.contigs, thres=args.link_sig,
    AttributeError: 'NoneType' object has no attribute 'clusters'

    ----------------------------Exporting results files-----------------------------
    ERROR:vcontact2: Error in identifying excluded contigs: 'NoneType' object has no attribute 'name'
    ERROR:vcontact2: Error in exporting the final summary data: 'NoneType' object has no attribute 'contigs'

    I think I have the full path to cluster_one-1.0.jar correct, and c1.clusters is generated:

    $ ls ~/Software/VirSorter/VirSorted_Outputs/
    c1.clusters merged.faa merged.self-diamond.tab_mcl20.clusters modules_mcl_5.0_pcs.pandas vConTACT_pcs.csv
    c1.ntw merged.self-diamond.tab merged.self-diamond.tab_mcxload.tab modules.ntwk vConTACT_profiles.csv
    merged_df.csv merged.self-diamond.tab.abc modules_mcl_5.0.clusters sig1.0_mcl5.0_minshared3_modules.csv vConTACT_proteins.csv
    merged.dmnd merged.self-diamond.tab.mci modules_mcl_5.0_modules.pandas vConTACT_contigs.csv

    I would greatly appreciate any help here!

    Thanks,

    Megan

  3. Ben Bolduc

    Hi Megan,

    Thanks for filling out the bug report. Is the c1.clusters file empty? If not, what are the first few lines? And does the network file (c1.ntw) have content? Also, did you start with a fresh job (i.e. a new output directory)?

    I’m trying to figure out where in the processing something went awry. Usually, it’s something with cluster_one, but it could also be a momentary blip with the code or system.

    In the future, I’ll integrate a more judicious test so that users can figure out exactly where something goes wrong so we can narrow down fixes faster.

  4. Megan Dillon

    ah, the c1.clusters file IS empty.

    The c1.ntw has content:

    $ head ~/Software/VirSorter/VirSorted_Outputs/c1.ntw
    Achromobacter~phage~JWX Achromobacter~phage~83-24 137.51358798097203
    Achromobacter~phage~phiAxp-1 Achromobacter~phage~83-24 20.050327607357058
    Acinetobacter~phage~IME_AB3 Achromobacter~phage~83-24 8.495014320953366
    Burkholderia~phage~BcepGomr Achromobacter~phage~83-24 16.647386386818162
    Burkholderia~phage~KL1 Achromobacter~phage~83-24 10.983036439204085
    Paracoccus~phage~vB_PmaS_IMEP1 Achromobacter~phage~83-24 8.834976445171403
    Phage~phiJL001 Achromobacter~phage~83-24 6.834486715043384
    Pseudomonas~phage~73 Achromobacter~phage~83-24 11.21967556201825
    Pseudomonas~phage~M6 Achromobacter~phage~83-24 9.181472719004518
    Pseudomonas~phage~MP1412 Achromobacter~phage~83-24 9.585804543616376

    I am 95% sure I specified a new output directory, but I’ll run the command again to be absolutely sure.

    Thanks again for your help with this, really looking forward to results from this tool!

  5. Megan Dillon

    Ok, update.

    When I re-ran the test file, I got no errors. However, I ran on my data and got this NoneType error again. Both the c1.ntw and c1.clusters files have content.

    Not sure where to go from here :/

  6. Megan Dillon

    Hi again Ben,

    I ran the exact same command on my data again, and this time got no NoneType error. But I don’t have a genome_by_genome_overview.csv. I also have some questions about what exactly all the output files are. Do you have any documentation on what each output contains? I’m quite curious about the viral_clusters_overview.csv and the merged_df.csv, specifically.

    Thanks again!

    Megan

  7. Ben Bolduc

    Hi Megan,

    Thanks for sticking with this. If the test data works and there’s a genome_by_genome file, then that’s a good step forward.

    Getting no errors and no overview file? A few more Qs on my end: Does your proteins.csv file (the one you created containing the genome and gene information) have your genomes, and do they have PCs for their genes? Does the c1.ntw file have your genomes? And finally, does viral_clusters_overview have your genomes?

    The order of the Qs above follows the processing of your data, with proteins.csv being the 1st place for problems. If your genomes and PCs are in there, then that’s good. Next, c1.ntw. This is the actual network file generated from the algorithm that scores relationships. If your genomes aren’t in there, then something went wrong internally in vConTACT. Usually, it has to do with the proteins.csv file. If your genomes are in c1.ntw, then parsing the viral_clusters_overview is the next area for problems. If this file has your genomes, then it’s almost definitely an issue parsing out the sequence names or some mismatch between sequence names.

    If the answer to the 1st Q is no, then it’ll be no for the next 2. If it’s no for the 2nd Q, then it’ll be no for the 3rd, etc.
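The three questions above can be checked in processing order with a short script. The sketch below makes several assumptions based only on the file excerpts in this thread: the gene-to-genome CSV has a `contig_id` column, each line of c1.ntw is a whitespace-separated edge with two node names followed by a weight, and the overview CSV has a comma-separated `Members` column. The function names are hypothetical and not part of vConTACT2.

```python
import csv


def genomes_in_proteins(path):
    """Genome (contig) IDs from the gene-to-genome CSV's contig_id column."""
    with open(path, newline="") as fh:
        return {row["contig_id"] for row in csv.DictReader(fh)}


def genomes_in_network(path):
    """Node names from the first two whitespace-separated fields of each edge."""
    nodes = set()
    with open(path) as fh:
        for line in fh:
            nodes.update(line.split()[:2])
    return nodes


def genomes_in_overview(path):
    """Genome names from the comma-separated Members column."""
    members = set()
    with open(path, newline="") as fh:
        for row in csv.DictReader(fh):
            members.update(m.strip() for m in row["Members"].split(",") if m.strip())
    return members


def first_missing_stage(genome, proteins_csv, network, overview_csv):
    """Name of the first file (in processing order) lacking the genome, else None."""
    stages = [
        ("proteins.csv", genomes_in_proteins, proteins_csv),
        ("c1.ntw", genomes_in_network, network),
        ("viral_clusters_overview", genomes_in_overview, overview_csv),
    ]
    for label, loader, path in stages:
        if genome not in loader(path):
            return label
    return None
```

Because the whole file is read at each stage, this avoids the trap of judging presence from `head` or `tail` output alone.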

    With regard to documentation on output, it only covers the most user-relevant outputs: https://bitbucket.org/MAVERICLab/vcontact2/wiki/Home#output-files

    merged_df is a simple merging of the reference database and the user's data. It contains the taxonomies of the references and re-formats the user data for better suitability with downstream tools.

    viral_clusters is the “raw” parsing result from merging the ClusterONE output with the internal (to vConTACT) calculations of confidences and other metrics for the clusters themselves. This file’s only purpose (from my standpoint) is to serve as an intermediate file for generating the genome_by_genome file. And the only thing that happens from the clusters_overview to the genome_by_genome is splitting the clusters into their individual genomes and reformatting the columns.
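The "splitting the clusters into their individual genomes" step can be illustrated with the standard library alone. This is a sketch of the idea, not vConTACT's actual code: it assumes a comma-separated `Members` column in the cluster overview, and the output column name `Genome` is made up for illustration.

```python
import csv
import io


def split_clusters_into_genomes(overview_csv_text):
    """Yield one dict per genome, carrying over the cluster-level columns."""
    reader = csv.DictReader(io.StringIO(overview_csv_text))
    for row in reader:
        members = [m.strip() for m in row.pop("Members").split(",") if m.strip()]
        row.pop("", None)  # drop the unnamed index column, if present
        for genome in members:
            # Each genome gets its own row with a copy of the cluster's values.
            yield {"Genome": genome, **row}
```

A cluster row with two members therefore becomes two per-genome rows that share the same `VC` and metric values.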

  8. Megan Dillon

    Ben, thank YOU for sticking with this!

    1. I generated the proteins.csv file with the gene2genome.py helper script and I believe it DOES have genomes and PCs because it looks like it’s supposed to (no keywords, but that doesn’t seem crucial).

    $ head Imnavait_Gene2Genome.csv
    protein_id,contig_id,keywords
    VIRSorter_k127_803275_flag=3_multi=6_0040_len=1611-circular-gene_0,VIRSorter_k127_803275_flag=3_multi=6_0040_len=1611-circular,None
    VIRSorter_k127_803275_flag=3_multi=6_0040_len=1611-circular-gene_2,VIRSorter_k127_803275_flag=3_multi=6_0040_len=1611-circular,None
    VIRSorter_k127_803275_flag=3_multi=6_0040_len=1611-circular-gene_3,VIRSorter_k127_803275_flag=3_multi=6_0040_len=1611-circular,None

    2. The c1.ntw DOES NOT include my genomes

    $ head c1.ntw
    Achromobacter~phage~JWX Achromobacter~phage~83-24 166.69791567410283
    Achromobacter~phage~phiAxp-1 Achromobacter~phage~83-24 24.943148819686904

    $ tail c1.ntw
    Pseudomonas~phage~Bf7 Yersinia~phage~vB_YenP_ISAO8 8.674741741163844
    Pseudomonas~phage~YMC11/06/C171_PPU_BP Yersinia~phage~vB_YenP_ISAO8 2.3921963282838608
    Ralstonia~phage~RSB1 Yersinia~phage~vB_YenP_ISAO8 18.368698585439667

    3. It looks to me like the viral_clusters_overview.csv DOES include my genomes.

    $ head viral_cluster_overview.csv
    ,VC,Size,Internal Weight,External Weight,Quality,P-value,Min Dist,Max Dist,Total Dist,Below Thres,Taxon Prediction Score,Avg Dist,Genera,Families,Orders,Members
    0,VC_0_0,2,166.69791567410283,526.3633471363255,0.24052406997633535,0.050060481468243136,2.23606797749979,2.23606797749979,1,1,1.0,2.23606797749979,1,1,1,"Achromobacter~phage~83-24,Achromobacter~phage~JWX"
    1,VC_10002_0,2,4.162435714766747,0,1.0,0.0,1.0,1.0,1,1,1.0,1.0,1,1,1,"VIRSorter_NODE_118228_length_1519_cov_0_352730,VIRSorter_k127_795770_flag=1_multi=7_0000_len=1101"

    Thanks for the information about merged_df and viral_clusters. I think they might be what I’m looking for, really. I don’t want to visualize a network just yet, but I’m trying to figure out the most likely taxonomic annotations for my VirSorter output.

  9. Ben Bolduc

    Thanks for sending this Megan.

    Did you “grep <genome> c1.ntw” to find your genomes? They could be placed elsewhere in the file.

    It does look like your genomes are in clusters, at least 2 of them: VC_10002_0 represents initial VC 10002 and refined cluster 0. If your genomes are found in any clusters with references, you can at least identify that from the viral_cluster_overview file.
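The cluster naming described above (initial VC number, then refined cluster number) can be pulled apart with a small helper. This is a sketch that assumes every name follows the `VC_<initial>_<refined>` pattern seen in this thread; `parse_vc_name` is a hypothetical name, not a vConTACT function.

```python
import re

# Assumed pattern: "VC_", the initial cluster number, "_", the refined cluster number.
VC_NAME = re.compile(r"^VC_(\d+)_(\d+)$")


def parse_vc_name(name):
    """Return (initial_vc, refined_cluster) as integers for a name like VC_10002_0."""
    match = VC_NAME.match(name)
    if match is None:
        raise ValueError(f"unexpected cluster name: {name!r}")
    return int(match.group(1)), int(match.group(2))
```

For example, `parse_vc_name("VC_10002_0")` returns `(10002, 0)`.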

    I’m more bothered that there’s no final output file, when it’s a “simple” parsing of the viral_clusters_overview. I’ll need to investigate this further and perhaps put in a few checks to prevent this from occurring.

    Let me work on this and get a few updates out before year’s end.

  10. Ali Hassan Elbehery

    I also have an error in exporting the final summary data. I ran example data with the option --db None and got this error:

    ERROR:vcontact2: Error in identifying excluded contigs: local variable 'merged_fp' referenced before assignment
    ERROR:vcontact2: Error in exporting the final summary data: local variable 'excluded' referenced before assignment

    When I ran the same data with database (--db) set to default, it ran successfully with no errors. Could you please help me understand why I get this error?

  11. Ben Bolduc

    Hi Ali,

    vConTACT is designed to run with the DB enabled. The reference taxonomy from the databases is used to calibrate and refine the initial clusters, essentially ensuring they are comparable to ICTV genera. The error you received is a Python runtime error (an UnboundLocalError, not a syntax error), but I suspect there’s an earlier error causing the issue that the code didn’t catch early enough.
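A "referenced before assignment" message is how Python reports an UnboundLocalError: a local name is read on a code path where it was never assigned, typically because an earlier step failed and the failure was caught and logged. The sketch below reproduces that pattern in isolation. The function and variable names are illustrative only and are not taken from the vConTACT source.

```python
def export_summary(fail_early):
    """Mimic a pipeline where an earlier, caught failure leaves a name unbound."""
    try:
        if fail_early:
            raise RuntimeError("earlier step failed")
        excluded = ["contig_1"]  # only assigned when the step succeeds
    except RuntimeError as err:
        print(f"Error in identifying excluded contigs: {err}")
    # If the step above failed, 'excluded' was never assigned, so this raises
    # UnboundLocalError.
    return len(excluded)


def export_summary_safe(fail_early):
    """Same flow, but the name is bound up front so later code can always read it."""
    excluded = []
    try:
        if fail_early:
            raise RuntimeError("earlier step failed")
        excluded = ["contig_1"]
    except RuntimeError as err:
        print(f"Error in identifying excluded contigs: {err}")
    return len(excluded)
```

The exact wording of the UnboundLocalError message differs between Python versions, but the cause is the same: the first logged error is the one worth investigating, and the later one is a knock-on effect.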

    If you want to use your own sequences w/out a reference database, I strongly recommend including taxonomic headers in your gene-to-genome file. “order, family, subfamily, genus” are all recognized, though genus is most important.

    Cheers,

    Ben

  12. Ben Bolduc

    Another two updates (0.9.16 and 0.9.17) have been released, and the --db None option should now work without modification to the original input files.

    I might close this issue soon, as the original issue was from over 6 months ago and no others have mentioned this issue since.

  13. Ben Bolduc

    Closing due to inactivity - and I think updates published in the past couple of months have resolved this issue. Please re-open if this isn't the case.
