Error with test data

Issue #5 resolved
Former user created an issue

Hi,

Thanks for releasing this great software. I tried running the test data with a Singularity image built from the def file in your repository and got the following error. Any idea what is going wrong?

============================This is vConTACT 0.9.11=============================



----------------------------------Pre-Analysis----------------------------------
INFO:vcontact2: Found ClusterONE: /usr/local/bin/cluster_one-1.0.jar
INFO:vcontact2: Found Diamond: /miniconda3/bin/diamond
INFO:vcontact2: Found MCL: /miniconda3/bin/mcxload
INFO:vcontact2: Identified 15 CPUs
INFO:vcontact2: Using reference database: ProkaryoticViralRefSeq85-Merged
INFO:vcontact2: Using existing directory VirSorted_Outputs.


------------------------------Reference databases-------------------------------
INFO:vcontact2: Identified existing 'merged.faa' in output path: re-using...
INFO:vcontact2: Re-using existing Diamond file...


-------------------------------Protein clustering-------------------------------
INFO:vcontact2: Loading proteins...
INFO:vcontact2: Merging ProkaryoticViralRefSeq85-Merged to user gene-to-genome mapping...
[mclIO] writing <VirSorted_Outputs/merged.self-diamond.tab.mci>
.......................................
[mclIO] wrote native interchange 2396x2396 matrix with 9076 entries to stream <VirSorted_Outputs/merged.self-diamond.tab.mci>
[mclIO] wrote 2396 tab entries to stream <VirSorted_Outputs/merged.self-diamond.tab_mcxload.tab>
[mcxload] tab has 2396 entries
[mclIO] reading <VirSorted_Outputs/merged.self-diamond.tab.mci>
.......................................
[mclIO] read native interchange 2396x2396 matrix with 9076 entries
[mcl] pid 84467
 ite -------------------  chaos  time hom(avg,lo,hi) m-ie m-ex i-ex fmv
  1  ...................   4.19  0.01 0.96/0.36/1.80 1.43 1.42 1.42   2
  2  ...................   6.44  0.01 0.88/0.40/1.86 1.27 1.04 1.49   9
  3  ...................   7.54  0.01 0.87/0.29/2.13 1.24 0.85 1.26   9
  4  ...................   6.00  0.01 0.87/0.40/3.06 1.17 0.81 1.02   5
  5  ...................   4.43  0.00 0.88/0.43/1.50 1.04 0.79 0.80   3
  6  ...................   1.80  0.00 0.89/0.50/1.31 1.02 0.80 0.64   1
  7  ...................   1.13  0.00 0.90/0.53/1.00 1.01 0.82 0.53   0
  8  ...................   1.06  0.00 0.93/0.53/1.00 1.00 0.87 0.46   0
  9  ...................   0.79  0.00 0.96/0.63/1.00 1.00 0.86 0.40   0
 10  ...................   0.92  0.00 0.98/0.66/1.00 1.00 0.88 0.35   0
 11  ...................   0.42  0.00 0.99/0.71/1.00 1.00 0.90 0.31   0
 12  ...................   0.24  0.00 0.99/0.78/1.00 1.00 0.93 0.29   0
 13  ...................   0.25  0.00 1.00/0.76/1.00 1.00 0.96 0.28   0
 14  ...................   0.25  0.00 1.00/0.76/1.00 1.00 0.98 0.27   0
 15  ...................   0.25  0.00 1.00/0.77/1.00 1.00 0.99 0.27   0
 16  ...................   0.18  0.00 1.00/0.82/1.00 1.00 1.00 0.27   0
 17  ...................   0.04  0.00 1.00/0.96/1.00 1.00 1.00 0.27   0
 18  ...................   0.00  0.00 1.00/1.00/1.00 1.00 1.00 0.27   0
 19  ...................   0.00  0.00 1.00/1.00/1.00 1.00 1.00 0.27   0
[mcl] jury pruning marks: <99,99,99>, out of 100
[mcl] jury pruning synopsis: <99.0 or perfect> (cf -scheme, -do log)
[mcl] output is in VirSorted_Outputs/merged.self-diamond.tab_mcl20.clusters
[mcl] 670 clusters found
[mcl] output is in VirSorted_Outputs/merged.self-diamond.tab_mcl20.clusters

Please cite:
    Stijn van Dongen, Graph Clustering by Flow Simulation.  PhD thesis,
    University of Utrecht, May 2000.
       (  http://www.library.uu.nl/digiarchief/dip/diss/1895620/full.pdf
       or  http://micans.org/mcl/lit/svdthesis.pdf.gz)
OR
    Stijn van Dongen, A cluster algorithm for graphs. Technical
    Report INS-R0010, National Research Institute for Mathematics
    and Computer Science in the Netherlands, Amsterdam, May 2000.
       (  http://www.cwi.nl/ftp/CWIreports/INS/INS-R0010.ps.Z
       or  http://micans.org/mcl/lit/INS-R0010.ps.Z)

INFO:vcontact2: Loading the clusters (this may take some time...)
INFO:vcontact2: Saving intermediate files...


----------------------------------Loading data----------------------------------
INFO:vcontact2: Read 2340 entries (dropped 2765 singletons) from VirSorted_Outputs/vConTACT_profiles.csv


--------------------------------Adding Taxonomy---------------------------------


------------------------Calculating Similarity Networks-------------------------


------------------------Contig Clustering & Affiliation-------------------------
INFO:vcontact2:         entries  reference_entries  classified_entries  reference_classes  ...  precision  recall  specificity  fmeasure
order      2879               1821                   0                  2  ...        0.0     0.0          1.0       NaN
genus      2879                942                   0                266  ...        0.0     0.0          1.0       NaN
family     2879               1960                   0                 22  ...        0.0     0.0          1.0       NaN

[3 rows x 10 columns]
ERROR:vcontact2: Error in viral clusters
ERROR:vcontact2: cannot set a frame with no defined columns
Traceback (most recent call last):
  File "/miniconda3/bin/vcontact", line 622, in main
    vc = vcontact.cluster_refinements.ViralClusters(gc.contigs, profiles_fp, optimize=options.optimize)
  File "/miniconda3/lib/python3.7/site-packages/vcontact/cluster_refinements.py", line 106, in __init__
    evaluations = vcontact.evaluations.Evaluations(adj_contigs, levels=['genus'],  focus='rev_pos_cluster')
  File "/miniconda3/lib/python3.7/site-packages/vcontact/evaluations.py", line 42, in __init__
    clustering_wise_ppv, clustering_wise_sensitivity, accuracy = self.performance_metrics(contingency_tbl)
  File "/miniconda3/lib/python3.7/site-packages/vcontact/evaluations.py", line 59, in performance_metrics
    ppv_tbl = self.calc_ppv(contingency_table)
  File "/miniconda3/lib/python3.7/site-packages/vcontact/evaluations.py", line 111, in calc_ppv
    counts.loc['sum'] = counts.sum(axis=0)  # adds a new ROW with sum of column
  File "/miniconda3/lib/python3.7/site-packages/pandas/core/indexing.py", line 205, in __setitem__
    self._setitem_with_indexer(indexer, value)
  File "/miniconda3/lib/python3.7/site-packages/pandas/core/indexing.py", line 406, in _setitem_with_indexer
    return self._setitem_with_indexer_missing(indexer, value)
  File "/miniconda3/lib/python3.7/site-packages/pandas/core/indexing.py", line 635, in _setitem_with_indexer_missing
    raise ValueError("cannot set a frame with no defined columns")
ValueError: cannot set a frame with no defined columns


--------------------------------Protein modules---------------------------------


---------------------------Link modules and clusters----------------------------


----------------------------Exporting results files-----------------------------
ERROR:vcontact2: Error in exporting the final summary data: 'NoneType' object has no attribute 'contigs'

Thanks in advance!

Best, Shengwei

Comments (3)

  1. Ben Bolduc

    Hi Shengwei,

    Thanks for reporting this. This is a known bug in the resume functionality. Starting a fresh run, with either the original inputs or the legacy inputs (pcs, profiles, contigs), usually “fixes” this problem.

    Cheers,

    Ben
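
One way to read "fresh run" is to point the output at a directory that holds none of the intermediates the log above reports re-using. The following is a minimal Python sketch under that assumption. The helper names `leftover_intermediates` and `fresh_output_dir` are hypothetical and not part of vConTACT, and the file list is taken only from the log in this report, so it may be incomplete.

```python
from pathlib import Path

# Intermediate file names copied from the log in this report.
# Assumption: this list is illustrative, not a full inventory of vConTACT outputs.
INTERMEDIATES = (
    "merged.faa",
    "merged.self-diamond.tab.mci",
    "vConTACT_profiles.csv",
)


def leftover_intermediates(out_dir):
    """Return the listed intermediate files that already exist in out_dir."""
    out_dir = Path(out_dir)
    return [name for name in INTERMEDIATES if (out_dir / name).is_file()]


def fresh_output_dir(base):
    """Return a path that does not exist yet: base, else base_1, base_2, ..."""
    base = Path(base)
    candidate = base
    suffix = 0
    while candidate.exists():
        suffix += 1
        candidate = base.with_name(f"{base.name}_{suffix}")
    return candidate
```

If `leftover_intermediates("VirSorted_Outputs")` returns a non-empty list, the next run would likely resume from those files, so passing the result of `fresh_output_dir("VirSorted_Outputs")` as the output directory instead avoids the resume path without deleting anything.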

  2. Ben Bolduc

    Known bug, minor priority (as it's usually easily fixed by re-running). Future major versions may include a fix.
