'dtype' error

Issue #52 new
wangya created an issue

Hi Ben,

When I run vcontact2 on the test data, I get the following error.

My command is: vcontact2 --raw-proteins /hl/YaxiangWang/miniconda3/envs/vContact2/MAVERICLab-vcontact2-1c9c0d7eed06/test_data/VIRSorter_genomes.faa -t 80 --rel-mode 'Diamond' --proteins-fp /hl/YaxiangWang/miniconda3/envs/vContact2/MAVERICLab-vcontact2-1c9c0d7eed06/test_data/VIRSorter_genomes_g2g.csv --db 'ProkaryoticViralRefSeq201-Merged' --pcs-mode MCL --vcs-mode ClusterONE --c1-bin /hl/YaxiangWang/miniconda3/envs/new_vContact2/bin/cluster_one-1.0.jar --output-dir vConTACT2_Results

The error screenshot is as follows:

error.txt :

Comments (3)

  1. Ben Bolduc

    Thanks for reporting this error. Can you let me know which version of NumPy and pandas you’re using? This could be an issue with using certain version combinations of NumPy and pandas, specifically older versions of pandas and newer versions of NumPy.

    Once you’ve activated your environment:

    $ conda list | grep -E 'numpy|pandas'
    

    -Ben
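
If `conda list` is not convenient, the same check can be sketched from Python. This is only an illustration, not part of vConTACT2: the helper names `installed_version` and `version_report` are made up here. It reads the package metadata instead of importing the packages, so it should still report versions when `import pandas` itself fails.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def version_report(packages=("numpy", "pandas")):
    """Map each package name to its installed version string (or None)."""
    return {name: installed_version(name) for name in packages}


if __name__ == "__main__":
    # Run inside the activated vConTACT2 environment.
    for name, version in version_report().items():
        print(f"{name}: {version or 'not installed'}")
```

Run it with the environment's own Python interpreter, otherwise it reports the versions of a different installation.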

  2. wangya reporter

    Hi Ben,

    Thank you for your reply!

    My numpy and pandas versions are 1.20.1 and 0.25.3.

    I also suspected a problem with the NumPy version, so I changed NumPy and pandas to 1.15.4 and 0.25.1.

    Surprisingly, vConTACT2 then ran normally on the test data (MAVERICLab-vcontact2-1c9c0d7eed06/VIRSorter_genomes.faa). But when I used my own data, another error occurred.

    My data set is relatively large: 600,000 contigs containing 3 million protein sequences. So I wonder whether the error occurs because the data set is too large.

    I look forward to your response, thank you!

    -wyx

  3. Ben Bolduc

    Hi wyx,

    Thank you for letting me know that changing the NumPy version fixed the issue!

    Regarding the broken pipe… I don’t see enough of the history to tell where this is occurring in the code. Broken pipes can occur for a lot of reasons… though 600K contigs is in the “grey” area where vConTACT2 may or may not work due to some technical limitations in the underlying Python code.

    Can you try de-replicating the viral genomes to 95% ANI, 85% coverage using dRep or similar software and see if that lowers the numbers? Then give it another attempt?

    -Ben
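
To see whether de-replication actually lowers the numbers, one option is to count the sequences in the FASTA files before and after. The sketch below is a rough illustration and not part of vConTACT2 or dRep: `count_fasta_records` is a hypothetical helper, and it assumes plain-text (uncompressed) FASTA where every record starts with a `>` header line.

```python
import sys


def count_fasta_records(path):
    """Count sequences in a FASTA file by counting header lines that start with '>'."""
    count = 0
    with open(path, "r") as handle:
        for line in handle:
            if line.startswith(">"):
                count += 1
    return count


if __name__ == "__main__":
    # Pass one or more FASTA paths, e.g. the contigs before and after de-replication.
    for fasta_path in sys.argv[1:]:
        print(f"{fasta_path}: {count_fasta_records(fasta_path)} sequences")
```

The same function works on the protein `.faa` file, which gives the protein count that goes into vConTACT2.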
