Protein clustering : OverflowError
Hi all, first of all, thanks for the great tool.
I am trying to run the program on a set of 60k prophages, but I get an error message at the "Protein clustering" step:
============================This is vConTACT2 0.9.19============================
----------------------------------Pre-Analysis----------------------------------
------------------------------Reference databases-------------------------------
-------------------------------Protein clustering-------------------------------
Traceback (most recent call last):
File "/home/conchae/.conda/envs/vContact2/bin/vcontact2", line 757, in <module>
main(options)
File "/home/conchae/.conda/envs/vContact2/bin/vcontact2", line 470, in main
pcs_fp, gene2genome_df, pcs_mode)
File "/home/conchae/.conda/envs/vContact2/lib/python3.7/site-packages/vcontact2/protein_clusters.py", line 187, in build_clusters
clusters_df, name, c = load_mcl_clusters(fp)
File "/home/conchae/.conda/envs/vContact2/lib/python3.7/site-packages/vcontact2/protein_clusters.py", line 249, in load_mcl_clusters
formatter = "PC_{{:>0{}}}".format(int(round(np.log10(nb_clusters))+1))
OverflowError: cannot convert float infinity to integer
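For context on the error above: `np.log10(0)` evaluates to negative infinity, and converting an infinite float to an integer raises exactly this `OverflowError`. So the traceback suggests that `nb_clusters` was 0 when line 249 ran, presumably because no clusters were read from the MCL output. The sketch below reproduces the failure with the standard library only and shows one possible guard. `pc_formatter` is a hypothetical helper, not the upstream fix, and it computes the padding width differently from the original expression.

```python
def pc_formatter(nb_clusters):
    """Build a zero-padded label template such as 'PC_{:>03}'.

    Hypothetical helper: pads to the number of digits in nb_clusters and
    falls back to a width of 1 when there are no clusters.
    """
    width = len(str(nb_clusters)) if nb_clusters > 0 else 1
    return "PC_{{:>0{}}}".format(width)


# Reproduce the failure: float("-inf") stands in for np.log10(0).
try:
    int(round(float("-inf")) + 1)
    failed = False
except OverflowError:
    failed = True

template = pc_formatter(120)   # three digits -> 'PC_{:>03}'
label = template.format(7)     # 'PC_007'
empty = pc_formatter(0)        # no clusters -> 'PC_{:>01}', no exception
```

A guard like this only avoids the crash. If the cluster count really is 0, the earlier steps (input files, Diamond or MCL output) are probably worth checking first.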
I know that package versions can sometimes be an issue, but I can't spot anything wrong with the current versions:
conda list
# packages in environment at /home/user/.conda/envs/vContact2:
#
# Name Version Build Channel
_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 1_gnu conda-forge
biopython 1.78 py37h5e8e339_2 conda-forge
blas 1.1 openblas conda-forge
blast 2.5.0 hc0b0e79_3 bioconda
blosc 1.21.0 h9c3ff4c_0 conda-forge
boost 1.70.0 py37h9de70de_1 conda-forge
boost-cpp 1.70.0 h7b93d67_3 conda-forge
bzip2 1.0.8 h7f98852_4 conda-forge
c-ares 1.17.1 h7f98852_1 conda-forge
ca-certificates 2022.5.18.1 ha878542_0 conda-forge
certifi 2022.5.18.1 py37h89c1867_0 conda-forge
decorator 4.4.2 py_0 conda-forge
diamond 2.0.8 h56fc30b_0 bioconda
hdf5 1.10.6 nompi_h6a2412b_1114 conda-forge
icu 67.1 he1b5a44_0 conda-forge
joblib 1.0.1 pyhd8ed1ab_0 conda-forge
krb5 1.17.2 h926e7f8_0 conda-forge
ld_impl_linux-64 2.35.1 hea4e1c9_2 conda-forge
libcurl 7.75.0 hc4aaa36_0 conda-forge
libedit 3.1.20191231 he28a2e2_2 conda-forge
libev 4.33 h516909a_1 conda-forge
libffi 3.3 h58526e2_2 conda-forge
libgcc-ng 9.3.0 h2828fa1_18 conda-forge
libgfortran 3.0.0 1 conda-forge
libgfortran-ng 9.3.0 hff62375_18 conda-forge
libgfortran5 9.3.0 hff62375_18 conda-forge
libgomp 9.3.0 h2828fa1_18 conda-forge
libnghttp2 1.43.0 h812cca2_0 conda-forge
libssh2 1.9.0 ha56f1ee_6 conda-forge
libstdcxx-ng 9.3.0 h6de172a_18 conda-forge
lz4-c 1.9.3 h9c3ff4c_0 conda-forge
lzo 2.10 h516909a_1000 conda-forge
mcl 14.137 pl526h516909a_5 bioconda
mock 4.0.3 py37h89c1867_1 conda-forge
ncurses 6.2 h58526e2_4 conda-forge
networkx 2.5 py_0 conda-forge
numexpr 2.7.1 py37h0da4684_1 conda-forge
numpy 1.19.0 pypi_0 pypi
openblas 0.3.3 ha44fe06_1 conda-forge
openssl 1.1.1k h7f98852_0 conda-forge
pandas 0.25.0 py37hb3f55d8_0 conda-forge
perl 5.26.2 h36c2ea0_1008 conda-forge
pip 21.0.1 pyhd8ed1ab_0 conda-forge
psutil 5.8.0 py37h5e8e339_1 conda-forge
pyparsing 2.4.7 pyh9f0ad1d_0 conda-forge
pytables 3.6.1 py37h56451d4_2 conda-forge
python 3.7.9 h7579374_0
python-dateutil 2.8.1 py_0 conda-forge
python_abi 3.7 1_cp37m conda-forge
pytz 2021.1 pyhd8ed1ab_0 conda-forge
readline 8.0 he28a2e2_2 conda-forge
scikit-learn 0.20.4 py37_blas_openblashebff5e3_0 [blas_openblas] conda-forge
scipy 1.2.0 py37_blas_openblashb06ca3d_200 [blas_openblas] conda-forge
setuptools 49.6.0 py37h89c1867_3 conda-forge
singularity 2.4.2 0 bioconda
six 1.15.0 pyh9f0ad1d_0 conda-forge
sqlite 3.35.2 h74cdb3f_0 conda-forge
threadpoolctl 2.1.0 pyh5ca1d4c_0 conda-forge
tk 8.6.10 h21135ba_1 conda-forge
vcontact2 0.9.19 py_0 bioconda
wheel 0.36.2 pyhd3deb0d_0 conda-forge
xz 5.2.5 h516909a_1 conda-forge
zlib 1.2.11 h516909a_1010 conda-forge
zstd 1.4.9 ha95c52a_0 conda-forge
Thanks in advance, Best, Robby
Comments (5)
-
-
Hi there, I have the same error as you:
OverflowError: cannot convert float infinity to integer
So I changed the file
protein_clusters.py, line 249
the same way you did. However, I then got another error: KeyError: 'cluster'
INFO:vcontact2: Found Diamond: /home/xubotang2/miniconda3/envs/vContact2/bin/diamond
INFO:vcontact2: Found MCL: /home/xubotang2/miniconda3/envs/vContact2/bin/mcxload
INFO:vcontact2: Identified 4 CPUs
INFO:vcontact2: Using reference database: ProkaryoticViralRefSeq94-Merged
INFO:vcontact2: Using existing directory ./output.
INFO:vcontact2: Identified existing 'merged.faa' in output path: re-using...
INFO:vcontact2: Re-using existing Diamond file...
INFO:vcontact2: Loading proteins...
INFO:vcontact2: Merging ProkaryoticViralRefSeq94-Merged to user gene-to-genome mapping...
DEBUG:vcontact2: Read 333767 proteins from out_map.csv.
DEBUG:vcontact2: File merged.self-diamond.tab_mcl20.clusters exists and will be used. Use -f to overwrite.
INFO:vcontact2: Building the cluster and profiles (this may take some time...)
If it fails, try re-running using --blast-fp flag and specifying merged.self-diamond.tab (or merged.self-blastp.tab)
Traceback (most recent call last):
File "/home/xubotang2/miniconda3/envs/vContact2/bin/vcontact2", line 834, in <module>
main(options)
File "/home/xubotang2/miniconda3/envs/vContact2/bin/vcontact2", line 526, in main
protein_df, clusters_df, profiles_df, contigs_df = vcontact2.protein_clusters.build_clusters(
File "/home/xubotang2/miniconda3/envs/vContact2/lib/python3.8/site-packages/vcontact2/protein_clusters.py", line 209, in build_clusters
for clust, prots in gene2genome.groupby("cluster"):
File "/home/xubotang2/miniconda3/envs/vContact2/lib/python3.8/site-packages/pandas/core/frame.py", line 6717, in groupby
return DataFrameGroupBy(
File "/home/xubotang2/miniconda3/envs/vContact2/lib/python3.8/site-packages/pandas/core/groupby/groupby.py", line 560, in __init__
grouper, exclusions, obj = get_grouper(
File "/home/xubotang2/miniconda3/envs/vContact2/lib/python3.8/site-packages/pandas/core/groupby/grouper.py", line 811, in get_grouper
raise KeyError(gpr)
KeyError: 'cluster'
Did you have the same issues?
Thank you !
-
Hi, I have the same issue as Jiaojiao. Was anyone able to solve it?
Thank you!
Wilson
-
Hi all,
Here is a recipe that worked for me in the end:
conda create -n vcontact2 python=3.7
conda activate vcontact2
conda install -c anaconda networkx=2.2
conda install -c anaconda numpy=1.15.4
conda install -c anaconda scipy=1.2.0
conda install -c anaconda pandas=1.0.5
conda install -c anaconda scikit-learn=0.20.2
conda install -c anaconda biopython=1.73
conda install -c anaconda hdf5=1.10.4
conda install -c anaconda pytables=3.6.1
conda install -c anaconda pyparsing=2.4.6
conda install -c bioconda diamond=0.9.14
conda install -c bioconda mcl=14.137
conda install -c bioconda blast=2.7
conda install -c bioconda clusterone
conda install -y -c bioconda vcontact2
Then make the following change in the respective .py files:
AttributeError: 'DataFrame' object has no attribute 'ix'
==> Change ".ix" to ".loc" and it should work correctly.
/home/user/.conda/envs/vcontact2/lib/python3.7/site-packages/vcontact/matrices.py line 70
/home/user/.conda/envs/vcontact2/lib/python3.7/site-packages/vcontact/modules.py line 252
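Some background on the ".ix" to ".loc" change: the `.ix` indexer was removed in pandas 1.0, which is why the pandas 1.0.5 pinned in the recipe needs this patch. A minimal sketch of how the edit could be located and applied to source text follows. `replace_ix`, `lines_with_ix` and the sample lines are hypothetical, not part of vConTACT2. Since `.ix` accepted both labels and positions while `.loc` is label-based, it is safer to review each hit than to patch blindly.

```python
import re

# Matches the attribute access ".ix" only when it is a whole name,
# so identifiers such as ".ix_map" or ".ixx" are left alone.
IX_PATTERN = re.compile(r"\.ix\b")


def replace_ix(source):
    """Return source text with every '.ix' accessor rewritten to '.loc'."""
    return IX_PATTERN.sub(".loc", source)


def lines_with_ix(source):
    """Return (line_number, line) pairs that still use '.ix'."""
    return [
        (number, line)
        for number, line in enumerate(source.splitlines(), start=1)
        if IX_PATTERN.search(line)
    ]


# Hypothetical code lines, used only to exercise the helpers.
sample = "row = matrix.ix[name]\nidx = self.ix_map\n"
hits = lines_with_ix(sample)
patched = replace_ix(sample)
```

Running `lines_with_ix` on the contents of the two files listed above should point at the lines to edit by hand.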
Best
-
To whom it may concern,
I ran into the same problem. It was fixed when I changed the “gene_to_genome.csv” file from two columns to three,
for example from “F1608-028contig-4748.fna_1,F1608-028contig-4748.fna” to “F1608-028contig-4748.fna_1,F1608-028contig-4748.fna,None”.
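The two-to-three-column change can be scripted rather than done by hand. This is a minimal sketch using only the standard library. `pad_gene_to_genome` is a hypothetical helper, and the header names `protein_id` and `keywords` are an assumption about the file layout, so adjust them if your file differs.

```python
import csv
import io


def pad_gene_to_genome(src, dst, filler="None"):
    """Copy CSV rows from src to dst, padding two-column rows to three."""
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    for row in reader:
        if not row:
            continue  # skip blank lines
        if len(row) == 2:
            # Assumed header names; data rows get the filler value instead.
            extra = "keywords" if row[0] == "protein_id" else filler
            row = row + [extra]
        writer.writerow(row)


# In-memory demonstration using the example row from the comment above.
src = io.StringIO("F1608-028contig-4748.fna_1,F1608-028contig-4748.fna\n")
dst = io.StringIO()
pad_gene_to_genome(src, dst)
result = dst.getvalue()
```

For real files, the same function can be called with two handles opened via `open(path, newline="")`, writing to a new file instead of overwriting the original.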
-
So this change in the file
protein_clusters.py, line 249
did the job. However, I then landed on this error:
ERROR:vcontact2: 'DataFrame' object has no attribute 'ix'
Changing the pandas version to 0.25.3 led me to another error. I’ll try Jeffrey’s solution.
Best
Robby