Error in "Contig Clustering & Affiliation" when running under Slurm

Issue #48 new
Former user created an issue

Hi, I am using vConTACT2 0.9.22, installed through conda.
I have hit a tough problem. Yesterday I ran the pipeline successfully on the command line (after fixing countless problems), but today, when I submitted the script through Slurm, it failed. I tried three times and every run failed, so I ran the pipeline locally instead, and it succeeded. I have no idea why this happens.

This is my shell command:

vcontact2 --raw-proteins ../6_prodigal/ERR1995212_virus_proteins.faa --rel-mode 'Diamond' --proteins-fp ERR1995212_virus_g2g.csv --db 'ProkaryoticViralRefSeq94-Merged' --pcs-mode MCL --vcs-mode ClusterONE --c1-bin /picb/evolgen/users/zhousumei/miniconda/envs/virus_als/bin/cluster_one-1.0.jar -o ERR1995212_vcontact_1 -t 70

Here is the error output from Slurm:

Please cite:
    Stijn van Dongen, Graph Clustering by Flow Simulation.  PhD thesis,
    University of Utrecht, May 2000.
       (  http://www.library.uu.nl/digiarchief/dip/diss/1895620/full.pdf
       or  http://micans.org/mcl/lit/svdthesis.pdf.gz)
OR
    Stijn van Dongen, A cluster algorithm for graphs. Technical
    Report INS-R0010, National Research Institute for Mathematics
    and Computer Science in the Netherlands, Amsterdam, May 2000.
       (  http://www.cwi.nl/ftp/CWIreports/INS/INS-R0010.ps.Z
       or  http://micans.org/mcl/lit/INS-R0010.ps.Z)

INFO:vcontact2: Building the cluster and profiles (this may take some tim
If it fails, try re-running using --blast-fp flag and specifiying merged.self-diamond.tab 
INFO:vcontact2: Saving intermediate files...
INFO:vcontact2: Read 233809 entries (dropped 2362 singletons) from ERR199
INFO:vcontact2.contig_clusters: Exporting for ClusterONE
INFO:vcontact2.contig_clusters: Clustering the PC Similarity-Network usin
INFO:vcontact2.contig_clusters: Running clusterONE: java -jar /picb/evolg
Loaded graph with 2568 nodes and 95850 edges
[                    ]   0% Growing clusters from seeds...^M[                    ]   1% Gr
[                    ]   0% Growing clusters from seeds...^M[                    ]   0% Fi
[                    ]   0% Finding highly overlapping clusters...^M[                    ]
Detected 350 complexes
INFO:vcontact2.contig_clusters: 350 clusters loaded (singletons and non-c
INFO:vcontact2.contig_clusters: Computing membership matrix...
ERROR:vcontact2: Error in viral clusters
ERROR:vcontact2: type object 'object' has no attribute 'dtype'
Traceback (most recent call last):
  File "/picb/evolgen/users/zhousumei/miniconda/envs/virus_als/bin/vcontact2", line 674, i
    vc = vcontact2.cluster_refinements.ViralClusters(gc.contigs, profiles_fp, optimize=opt
  File "/picb/evolgen/users/zhousumei/miniconda/envs/virus_als/lib/python3.8/site-packages
    self.metrics = pd.DataFrame(columns=summary_headers)
  File "/picb/evolgen/users/zhousumei/miniconda/envs/virus_als/lib/python3.8/site-packages
    mgr = init_dict(data, index, columns, dtype=dtype)
  File "/picb/evolgen/users/zhousumei/miniconda/envs/virus_als/lib/python3.8/site-packages
    val = construct_1d_arraylike_from_scalar(np.nan, len(index), nan_dtype)
  File "/picb/evolgen/users/zhousumei/miniconda/envs/virus_als/lib/python3.8/site-packages
    dtype = dtype.dtype
AttributeError: type object 'object' has no attribute 'dtype'

Comments (3)

  1. zhousm

    Oh sorry, running on the command line also failed, with the same error as before. (The whole process takes a long time to finish, and I mistakenly assumed it would succeed, so I created this issue.) Sorry again for the mess. The question is now about the “AttributeError: type object 'object' has no attribute 'dtype'”.

  2. Ben Bolduc

    This error has been seen in connection with the installed versions of pandas and numpy. I think for 0.9.22 you need 'numpy>=1.15.4,<=1.19.5' and 'pandas>=0.25.0,<=0.25.3'.

    It’s possible your local machine (that worked) has a different installation than the one available through Slurm (an HPC?)

    -Ben
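One way to test the "different installation" idea is to print the interpreter path and the installed numpy and pandas versions, once in the interactive shell and once from inside the Slurm job script, and compare the two outputs. The following is only a sketch: `environment_report` is a hypothetical helper name, not part of vConTACT2, and it relies on `importlib.metadata`, which is available from Python 3.8 onward.

```python
import sys
from importlib import metadata


def environment_report(packages=("numpy", "pandas")):
    """Collect the interpreter path and the installed version of each package."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed in this environment
    return {
        "executable": sys.executable,
        "python": sys.version.split()[0],
        "packages": versions,
    }


report = environment_report()
print("executable:", report["executable"])
print("python:", report["python"])
for name, version in report["packages"].items():
    print(name + ":", version if version is not None else "not installed")
```

If the two runs report different executables or versions, the job is most likely not using the same conda environment as the interactive shell.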

  3. Ivan Pchelin

    Hi, I also encountered a similar problem with vConTACT2 0.9.22. It says

    ------------------------Contig Clustering & Affiliation-------------------------
    INFO:vcontact2.contig_clusters: Exporting for ClusterONE
    INFO:vcontact2.contig_clusters: Clustering the PC Similarity-Network using ClusterONE
    INFO:vcontact2.contig_clusters: Running clusterONE: java -jar ./cluster_one-1.0.jar Vespunovirus_38_vcontact2_output/c1.ntw --input-format edge_list --output-format csv --min-density 0.3 --min-size 2 --max-overlap 0.9 --penalty 2.0 --haircut 0.55 --merge-method single --similarity match --seed-method nodes > Vespunovirus_38_vcontact2_output/c1.clusters
    Loaded graph with 36 nodes and 330 edges
    [====================] 100% Growing clusters from seeds...
    [====================] 100% Finding highly overlapping clusters...
    [====================] 100% Merging highly overlapping clusters...
    Detected 7 complexes
    INFO:vcontact2.contig_clusters: 7 clusters loaded (singletons and non-connected nodes are dropped).
    INFO:vcontact2.contig_clusters: Computing membership matrix...
    ERROR:vcontact2: Error in viral clusters
    ERROR:vcontact2: type object 'object' has no attribute 'dtype'
    Traceback (most recent call last):
      File "./vcontact2", line 714, in main
        vc = vcontact2.cluster_refinements.ViralClusters(gc.contigs, profiles_fp, optimize=options.optimize)
      File "/home/arcella/miniconda3/lib/python3.8/site-packages/vcontact2/cluster_refinements.py", line 37, in __init__
        self.metrics = pd.DataFrame(columns=summary_headers)
      File "/home/arcella/miniconda3/lib/python3.8/site-packages/pandas/core/frame.py", line 411, in __init__
        mgr = init_dict(data, index, columns, dtype=dtype)
      File "/home/arcella/miniconda3/lib/python3.8/site-packages/pandas/core/internals/construction.py", line 242, in init_dict
        val = construct_1d_arraylike_from_scalar(np.nan, len(index), nan_dtype)
      File "/home/arcella/miniconda3/lib/python3.8/site-packages/pandas/core/dtypes/cast.py", line 1221, in construct_1d_arraylike_from_scalar
        dtype = dtype.dtype
    AttributeError: type object 'object' has no attribute 'dtype'

    Still, I see a meaningful file “c1.clusters” in the results. Can I just ignore the warning?

    Update: the output shown above was obtained with numpy 1.22.3 and pandas 0.25.3. With numpy 1.19.5 and pandas 0.25.3, the analysis with a re-installed vConTACT2 0.11.3 ran smoothly, and “c1.clusters” was identical to the previous one.
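The two combinations reported in this update can be checked against the version ranges quoted earlier in the thread. This is a rough sketch under stated assumptions: the ranges were suggested for 0.9.22, and whether they also hold for 0.11.3 is an assumption; `parse` and `out_of_range` are hypothetical helper names, and only plain numeric versions such as `1.19.5` are handled.

```python
# Version ranges quoted earlier in this thread for vConTACT2 0.9.22.
# Applying them to 0.11.3 as well is an assumption.
SUPPORTED = {
    "numpy": ((1, 15, 4), (1, 19, 5)),
    "pandas": ((0, 25, 0), (0, 25, 3)),
}


def parse(version):
    """Turn '1.19.5' into (1, 19, 5) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def out_of_range(installed):
    """Return the names of packages whose version falls outside SUPPORTED."""
    bad = []
    for name, (low, high) in SUPPORTED.items():
        if not low <= parse(installed[name]) <= high:
            bad.append(name)
    return bad


failing = {"numpy": "1.22.3", "pandas": "0.25.3"}  # combination that raised the error
working = {"numpy": "1.19.5", "pandas": "0.25.3"}  # combination that ran smoothly

print(out_of_range(failing))  # ['numpy']
print(out_of_range(working))  # []
```

The failing combination differs from the working one only in numpy, which is the one package outside the quoted range.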
