vcontact2: Error in contig clustering 'nan'

Issue #77 resolved
Linda Smith created an issue

Getting this 'nan' KeyError during the Calculating Similarity Networks step. Full traceback:

------------------------Calculating Similarity Networks-------------------------
INFO:vcontact2.contig_clusters: Exporting for ClusterONE
INFO:vcontact2.contig_clusters: Clustering the PC Similarity-Network using ClusterONE
INFO:vcontact2.contig_clusters: 837 clusters loaded (singletons and non-connected nodes are dropped).
ERROR:vcontact2: Error in contig clustering
ERROR:vcontact2: 'nan'
Traceback (most recent call last):
File "/home/linda/programs/anaconda3/envs/vcontact2/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 2897, in get_loc
return self._engine.get_loc(key)
File "pandas/_libs/index.pyx", line 107, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 131, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 1607, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 1614, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'nan'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/linda/programs/anaconda3/envs/vcontact2/bin/vcontact2", line 692, in main
gc = vcontact2.contig_clusters.ContigCluster(pcp, output_dir, cluster_one_fp, cluster_one_args,
File "/home/linda/programs/anaconda3/envs/vcontact2/lib/python3.8/site-packages/vcontact2/contig_clusters.py", line 91, in __init__
self.clusters, self.cluster_results = self.one_cluster(os.path.join(self.folder, self.name),
File "/home/linda/programs/anaconda3/envs/vcontact2/lib/python3.8/site-packages/vcontact2/contig_clusters.py", line 231, in one_cluster
return self.load_one_clusters(fi_clusters)
File "/home/linda/programs/anaconda3/envs/vcontact2/lib/python3.8/site-packages/vcontact2/contig_clusters.py", line 344, in load_one_clusters
if pd.isnull(self.contigs.loc[n, "pos_cluster"]): # If never seen before
File "/home/linda/programs/anaconda3/envs/vcontact2/lib/python3.8/site-packages/pandas/core/indexing.py", line 1418, in __getitem__
return self._getitem_tuple(key)
File "/home/linda/programs/anaconda3/envs/vcontact2/lib/python3.8/site-packages/pandas/core/indexing.py", line 805, in _getitem_tuple
return self._getitem_lowerdim(tup)
File "/home/linda/programs/anaconda3/envs/vcontact2/lib/python3.8/site-packages/pandas/core/indexing.py", line 929, in _getitem_lowerdim
section = self._getitem_axis(key, axis=i)
File "/home/linda/programs/anaconda3/envs/vcontact2/lib/python3.8/site-packages/pandas/core/indexing.py", line 1850, in _getitem_axis
return self._get_label(key, axis=axis)
File "/home/linda/programs/anaconda3/envs/vcontact2/lib/python3.8/site-packages/pandas/core/indexing.py", line 160, in _get_label
return self.obj._xs(label, axis=axis)
File "/home/linda/programs/anaconda3/envs/vcontact2/lib/python3.8/site-packages/pandas/core/generic.py", line 3737, in xs
loc = self.index.get_loc(key)
File "/home/linda/programs/anaconda3/envs/vcontact2/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 2899, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas/_libs/index.pyx", line 107, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 131, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 1607, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 1614, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'nan'

------------------------Contig Clustering & Affiliation-------------------------
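The last vcontact2 frame shows the failing lookup: `self.contigs.loc[n, "pos_cluster"]` is called with `n == 'nan'`, a cluster member name that is not in the contig table's index. The sketch below reproduces that failure mode with pandas. It assumes (this is not confirmed in the thread) that a missing contig name was converted to text somewhere upstream, which turns a float NaN into the literal string `'nan'`. The table and the `first_unknown_member` helper are hypothetical.

```python
import pandas as pd

# Hypothetical contig table indexed by contig name, standing in for
# self.contigs in vcontact2/contig_clusters.py.
contigs = pd.DataFrame(
    {"pos_cluster": [float("nan"), float("nan")]},
    index=["contig_A", "contig_B"],
)

# Assumption: a missing member name was turned into text with str(),
# so float("nan") becomes the literal string 'nan'.
members = [str(name) for name in ["contig_A", float("nan")]]


def first_unknown_member(frame, labels):
    """Return the first label that frame.loc cannot resolve, or None."""
    for n in labels:
        try:
            frame.loc[n, "pos_cluster"]  # same lookup as in the traceback
        except KeyError:
            return n
    return None


print(first_unknown_member(contigs, members))  # prints: nan
```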

Package versions:

Name: pandas
Version: 0.25.3

============================This is vConTACT2 0.9.22============================

Command I am running:

vcontact2 --raw-proteins FD_proteome.faa --rel-mode 'Diamond' --proteins-fp protein_to_genome_vcontact2_mapping.csv --db 'ArchaeaViralRefSeq201-Merged' --pcs-mode MCL --vcs-mode ClusterONE --c1-bin /home/linda/programs/anaconda3/envs/vcontact2/bin/cluster_one-1.0.jar --output-dir ./vcontact2_output
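Since `'nan'` looks like a missing genome name, one thing worth checking is the `--proteins-fp` mapping file for blank or literal `nan` IDs. This is a hypothetical helper, assuming the file has a header row with `protein_id` and `contig_id` columns; adjust `id_columns` if your header differs.

```python
import csv
import io


def is_missing(value):
    """Treat absent cells, blank cells and the literal text 'nan' as missing IDs."""
    return value is None or value.strip().lower() in ("", "nan")


def rows_with_missing_ids(handle, id_columns=("protein_id", "contig_id")):
    """Yield (line_number, row) for rows where any ID column is missing."""
    reader = csv.DictReader(handle)
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        if any(is_missing(row.get(col)) for col in id_columns):
            yield line_no, row


# Hypothetical in-memory sample standing in for the real mapping file.
sample = io.StringIO(
    "protein_id,contig_id,keywords\n"
    "prot_1,contig_A,None\n"
    "prot_2,,None\n"
    "prot_3,nan,None\n"
)
for line_no, row in rows_with_missing_ids(sample):
    print(line_no, row["protein_id"])  # prints "3 prot_2", then "4 prot_3"
```

To scan the real file, pass `open("protein_to_genome_vcontact2_mapping.csv", newline="")` instead of the in-memory sample. The line numbers assume no quoted fields span multiple lines.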

Comments

  1. Linda Smith (reporter)

    My issue was resolved by creating a conda env with these dependencies:

    conda create -n vcontact2 -c bioconda -c conda-forge clusterone=1.0 mcl=14.137 diamond=2.0.15 blast=2.12.0 vcontact2=0.11.0 numpy=1.19.5 pandas=0.25.3
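After creating the environment, one way to confirm that the Python packages resolved to the pinned versions is `importlib.metadata` (Python 3.8+). This sketch assumes the installed distributions are named `vcontact2`, `numpy` and `pandas`. ClusterONE, MCL, DIAMOND and BLAST are command-line tools and are not covered by it.

```python
from importlib import metadata

# Python packages pinned in the conda command above. ClusterONE, MCL,
# DIAMOND and BLAST are external tools and are not checked here.
PINNED = {"vcontact2": "0.11.0", "numpy": "1.19.5", "pandas": "0.25.3"}


def version_mismatches(pinned, lookup=metadata.version):
    """Map each package whose installed version differs to (wanted, found)."""
    mismatches = {}
    for name, wanted in pinned.items():
        try:
            found = lookup(name)
        except metadata.PackageNotFoundError:
            found = None  # not installed in this environment
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    print(version_mismatches(PINNED) or "all pinned versions found")
```

Run it with the environment's own `python` after `conda activate vcontact2`, so that it inspects the same site-packages that the `vcontact2` command uses.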
    