Error with test data
Issue #5
resolved
Hi,
Thanks for this great software. I built a Singularity image from the def file in your repository and ran it on the test data, but got the following error. Any idea what is going wrong?
============================This is vConTACT 0.9.11=============================
----------------------------------Pre-Analysis----------------------------------
INFO:vcontact2: Found ClusterONE: /usr/local/bin/cluster_one-1.0.jar
INFO:vcontact2: Found Diamond: /miniconda3/bin/diamond
INFO:vcontact2: Found MCL: /miniconda3/bin/mcxload
INFO:vcontact2: Identified 15 CPUs
INFO:vcontact2: Using reference database: ProkaryoticViralRefSeq85-Merged
INFO:vcontact2: Using existing directory VirSorted_Outputs.
------------------------------Reference databases-------------------------------
INFO:vcontact2: Identified existing 'merged.faa' in output path: re-using...
INFO:vcontact2: Re-using existing Diamond file...
-------------------------------Protein clustering-------------------------------
INFO:vcontact2: Loading proteins...
INFO:vcontact2: Merging ProkaryoticViralRefSeq85-Merged to user gene-to-genome mapping...
[mclIO] writing <VirSorted_Outputs/merged.self-diamond.tab.mci>
.......................................
[mclIO] wrote native interchange 2396x2396 matrix with 9076 entries to stream <VirSorted_Outputs/merged.self-diamond.tab.mci>
[mclIO] wrote 2396 tab entries to stream <VirSorted_Outputs/merged.self-diamond.tab_mcxload.tab>
[mcxload] tab has 2396 entries
[mclIO] reading <VirSorted_Outputs/merged.self-diamond.tab.mci>
.......................................
[mclIO] read native interchange 2396x2396 matrix with 9076 entries
[mcl] pid 84467
ite ------------------- chaos time hom(avg,lo,hi) m-ie m-ex i-ex fmv
1 ................... 4.19 0.01 0.96/0.36/1.80 1.43 1.42 1.42 2
2 ................... 6.44 0.01 0.88/0.40/1.86 1.27 1.04 1.49 9
3 ................... 7.54 0.01 0.87/0.29/2.13 1.24 0.85 1.26 9
4 ................... 6.00 0.01 0.87/0.40/3.06 1.17 0.81 1.02 5
5 ................... 4.43 0.00 0.88/0.43/1.50 1.04 0.79 0.80 3
6 ................... 1.80 0.00 0.89/0.50/1.31 1.02 0.80 0.64 1
7 ................... 1.13 0.00 0.90/0.53/1.00 1.01 0.82 0.53 0
8 ................... 1.06 0.00 0.93/0.53/1.00 1.00 0.87 0.46 0
9 ................... 0.79 0.00 0.96/0.63/1.00 1.00 0.86 0.40 0
10 ................... 0.92 0.00 0.98/0.66/1.00 1.00 0.88 0.35 0
11 ................... 0.42 0.00 0.99/0.71/1.00 1.00 0.90 0.31 0
12 ................... 0.24 0.00 0.99/0.78/1.00 1.00 0.93 0.29 0
13 ................... 0.25 0.00 1.00/0.76/1.00 1.00 0.96 0.28 0
14 ................... 0.25 0.00 1.00/0.76/1.00 1.00 0.98 0.27 0
15 ................... 0.25 0.00 1.00/0.77/1.00 1.00 0.99 0.27 0
16 ................... 0.18 0.00 1.00/0.82/1.00 1.00 1.00 0.27 0
17 ................... 0.04 0.00 1.00/0.96/1.00 1.00 1.00 0.27 0
18 ................... 0.00 0.00 1.00/1.00/1.00 1.00 1.00 0.27 0
19 ................... 0.00 0.00 1.00/1.00/1.00 1.00 1.00 0.27 0
[mcl] jury pruning marks: <99,99,99>, out of 100
[mcl] jury pruning synopsis: <99.0 or perfect> (cf -scheme, -do log)
[mcl] output is in VirSorted_Outputs/merged.self-diamond.tab_mcl20.clusters
[mcl] 670 clusters found
[mcl] output is in VirSorted_Outputs/merged.self-diamond.tab_mcl20.clusters
Please cite:
Stijn van Dongen, Graph Clustering by Flow Simulation. PhD thesis,
University of Utrecht, May 2000.
( http://www.library.uu.nl/digiarchief/dip/diss/1895620/full.pdf
or http://micans.org/mcl/lit/svdthesis.pdf.gz)
OR
Stijn van Dongen, A cluster algorithm for graphs. Technical
Report INS-R0010, National Research Institute for Mathematics
and Computer Science in the Netherlands, Amsterdam, May 2000.
( http://www.cwi.nl/ftp/CWIreports/INS/INS-R0010.ps.Z
or http://micans.org/mcl/lit/INS-R0010.ps.Z)
INFO:vcontact2: Loading the clusters (this may take some time...)
INFO:vcontact2: Saving intermediate files...
----------------------------------Loading data----------------------------------
INFO:vcontact2: Read 2340 entries (dropped 2765 singletons) from VirSorted_Outputs/vConTACT_profiles.csv
--------------------------------Adding Taxonomy---------------------------------
------------------------Calculating Similarity Networks-------------------------
------------------------Contig Clustering & Affiliation-------------------------
INFO:vcontact2: entries reference_entries classified_entries reference_classes ... precision recall specificity fmeasure
order 2879 1821 0 2 ... 0.0 0.0 1.0 NaN
genus 2879 942 0 266 ... 0.0 0.0 1.0 NaN
family 2879 1960 0 22 ... 0.0 0.0 1.0 NaN
[3 rows x 10 columns]
ERROR:vcontact2: Error in viral clusters
ERROR:vcontact2: cannot set a frame with no defined columns
Traceback (most recent call last):
File "/miniconda3/bin/vcontact", line 622, in main
vc = vcontact.cluster_refinements.ViralClusters(gc.contigs, profiles_fp, optimize=options.optimize)
File "/miniconda3/lib/python3.7/site-packages/vcontact/cluster_refinements.py", line 106, in __init__
evaluations = vcontact.evaluations.Evaluations(adj_contigs, levels=['genus'], focus='rev_pos_cluster')
File "/miniconda3/lib/python3.7/site-packages/vcontact/evaluations.py", line 42, in __init__
clustering_wise_ppv, clustering_wise_sensitivity, accuracy = self.performance_metrics(contingency_tbl)
File "/miniconda3/lib/python3.7/site-packages/vcontact/evaluations.py", line 59, in performance_metrics
ppv_tbl = self.calc_ppv(contingency_table)
File "/miniconda3/lib/python3.7/site-packages/vcontact/evaluations.py", line 111, in calc_ppv
counts.loc['sum'] = counts.sum(axis=0) # adds a new ROW with sum of column
File "/miniconda3/lib/python3.7/site-packages/pandas/core/indexing.py", line 205, in __setitem__
self._setitem_with_indexer(indexer, value)
File "/miniconda3/lib/python3.7/site-packages/pandas/core/indexing.py", line 406, in _setitem_with_indexer
return self._setitem_with_indexer_missing(indexer, value)
File "/miniconda3/lib/python3.7/site-packages/pandas/core/indexing.py", line 635, in _setitem_with_indexer_missing
raise ValueError("cannot set a frame with no defined columns")
ValueError: cannot set a frame with no defined columns
--------------------------------Protein modules---------------------------------
---------------------------Link modules and clusters----------------------------
----------------------------Exporting results files-----------------------------
ERROR:vcontact2: Error in exporting the final summary data: 'NoneType' object has no attribute 'contigs'
Thanks in advance!
Best, Shengwei
Comments (3)
Hi Shengwei,
Thanks for reporting this. This is a known bug in the resume functionality. Starting a fresh run, either with the original inputs or with the legacy inputs (pcs, profiles, contigs), usually “fixes” the problem.
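One rough way to check whether a run would resume rather than start fresh is to look for leftover intermediates in the output directory before launching. The sketch below is only an illustration: the file names are copied from the log in this issue, the list is an assumption rather than a complete inventory of what vConTACT re-uses, and `leftover_intermediates` is a hypothetical helper, not part of vConTACT.

```python
from pathlib import Path

# File names copied from the log in this issue. This list is an assumption,
# not an authoritative inventory of the files vConTACT re-uses on resume.
INTERMEDIATE_NAMES = (
    "merged.faa",
    "merged.self-diamond.tab.mci",
    "merged.self-diamond.tab_mcxload.tab",
    "merged.self-diamond.tab_mcl20.clusters",
    "vConTACT_profiles.csv",
)


def leftover_intermediates(output_dir):
    """Return the listed intermediate files that already exist in output_dir."""
    out = Path(output_dir)
    if not out.is_dir():
        return []  # directory does not exist yet, so nothing can be re-used
    candidates = (out / name for name in INTERMEDIATE_NAMES)
    return sorted(path for path in candidates if path.exists())


if __name__ == "__main__":
    found = leftover_intermediates("VirSorted_Outputs")
    if found:
        print("Output directory is not fresh; these files exist from an earlier run:")
        for path in found:
            print(f"  {path}")
    else:
        print("No listed intermediates found in the output directory.")
```

If the helper reports files, pointing the run at a new, empty output directory is the simplest way to get the fresh run described above.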
Cheers,
Ben
- changed status to resolved
Known bug, minor priority (as it's usually easily fixed by re-running). Future major versions may include a fix.
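For anyone hitting the same traceback: the `ValueError` comes from pandas refusing to add a row via `.loc` to a DataFrame that has no columns, which is what the `counts.loc['sum'] = counts.sum(axis=0)` line in `calc_ppv` appears to run into. The snippet below is a minimal sketch of that behaviour and of one possible guard. It is not the actual vConTACT code or fix; `add_column_sums` is a hypothetical helper, and the sample table is made up for illustration.

```python
import pandas as pd


def add_column_sums(counts):
    """Return a copy of counts with a 'sum' row; leave a column-less frame unchanged."""
    result = counts.copy()
    if len(result.columns) == 0:
        # Nothing to sum. Assigning result.loc['sum'] here is the step
        # that fails in the traceback, so skip it.
        return result
    result.loc["sum"] = result.sum(axis=0)  # new row holding each column's total
    return result


# Attempt the same assignment as in the traceback on a frame with no columns.
empty = pd.DataFrame()
try:
    empty.loc["sum"] = empty.sum(axis=0)
except ValueError as err:
    print(f"ValueError: {err}")

# The guarded helper leaves the empty frame alone and still sums a normal one.
# The labels below are made-up placeholders, not real vConTACT output.
table = pd.DataFrame(
    {"cluster_a": [2, 1], "cluster_b": [0, 3]},
    index=["genus_x", "genus_y"],
)
print(add_column_sums(empty).shape)
print(add_column_sums(table))
```

Whether skipping the sum row is the right behaviour for the downstream metrics is an open question; the guard only shows where the empty contingency table trips the assignment.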
Just a quick update: it worked with my own data. No idea why.
Cheers,
Shengwei