pandas.errors.EmptyDataError: No columns to parse from file

Issue #24 resolved
Former user created an issue

Hi Ben,

I just installed the latest version of vContact2 and get this error when running it on the test data:

------------------------Contig Clustering & Affiliation-------------------------
INFO:vcontact2.contig_clusters: Exporting for ClusterONE
INFO:vcontact2.contig_clusters: Clustering the PC Similarity-Network using ClusterONE
INFO:vcontact2.contig_clusters: Running clusterONE: java -jar /root/miniconda3/bin/cluster_one-1.0.jar VirSorted_Outputs/c1.ntw --input-format edge_list --output-format csv --min-density 0.3 --min-size 2 --max-overlap 0.9 --penalty 2.0 --haircut 0.55 --merge-method single --similarity match --seed-method nodes > VirSorted_Outputs/c1.clusters
Exception in thread "main" java.lang.NumberFormatException: empty String
at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1842)
at sun.misc.FloatingDecimal.parseDouble(FloatingDecimal.java:110)
at java.lang.Double.parseDouble(Double.java:538)
at uk.ac.rhul.cs.cl1.io.EdgeListReader.readGraph(EdgeListReader.java:46)
at uk.ac.rhul.cs.cl1.ui.cmdline.CommandLineApplication.loadGraph(CommandLineApplication.java:311)
at uk.ac.rhul.cs.cl1.ui.cmdline.CommandLineApplication.run(CommandLineApplication.java:147)
at uk.ac.rhul.cs.cl1.ui.cmdline.CommandLineApplication.main(CommandLineApplication.java:321)
ERROR:vcontact2: Error in contig clustering
ERROR:vcontact2: No columns to parse from file
Traceback (most recent call last):
File "/root/miniconda3/envs/vContact2/bin/vcontact2", line 618, in main
mode=args.vc_mode)
File "/root/miniconda3/envs/vContact2/lib/python3.7/site-packages/vcontact2/contig_clusters.py", line 92, in __init__
self.cluster_one, self.one_opts)
File "/root/miniconda3/envs/vContact2/lib/python3.7/site-packages/vcontact2/contig_clusters.py", line 227, in one_cluster
return self.load_one_clusters(fi_clusters)
File "/root/miniconda3/envs/vContact2/lib/python3.7/site-packages/vcontact2/contig_clusters.py", line 318, in load_one_clusters
clusters_df = pd.read_csv(one_fn, header=0)
File "/root/miniconda3/envs/vContact2/lib/python3.7/site-packages/pandas/io/parsers.py", line 685, in parser_f
return _read(filepath_or_buffer, kwds)
File "/root/miniconda3/envs/vContact2/lib/python3.7/site-packages/pandas/io/parsers.py", line 457, in _read
parser = TextFileReader(fp_or_buf, **kwds)
File "/root/miniconda3/envs/vContact2/lib/python3.7/site-packages/pandas/io/parsers.py", line 895, in __init__
self._make_engine(self.engine)
File "/root/miniconda3/envs/vContact2/lib/python3.7/site-packages/pandas/io/parsers.py", line 1135, in _make_engine
self._engine = CParserWrapper(self.f, **self.options)
File "/root/miniconda3/envs/vContact2/lib/python3.7/site-packages/pandas/io/parsers.py", line 1917, in __init__
self._reader = parsers.TextReader(src, **kwds)
File "pandas/_libs/parsers.pyx", line 545, in pandas._libs.parsers.TextReader.__cinit__
pandas.errors.EmptyDataError: No columns to parse from file

The command is "vcontact2 --raw-proteins test_data/VIRSorter_genome.faa --rel-mode 'Diamond' --proteins-fp test_data/VIRSorter_genome_g2g.csv --db 'ProkaryoticViralRefSeq97-Merged' --pcs-mode MCL --vcs-mode ClusterONE --c1-bin /root/miniconda3/bin/cluster_one-1.0.jar --output-dir VirSorted_Outputs -t 20".

I also reinstalled pandas 0.25.3 using the command "conda install -y -c conda-forge pandas=0.25.3".

The problem has not been fixed. Can you suggest how to fix it?

Best,

Jun Liu
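
A note on reading the log above: the Java `NumberFormatException` is raised first, while ClusterONE is still reading `c1.ntw`, so `c1.clusters` is most likely left empty and the pandas `EmptyDataError` is a downstream symptom, not the root cause. The sketch below is one way to check both files by hand. It assumes the edge list has three whitespace-separated columns (two node names and a numeric weight), which is an assumption about the file layout and is not confirmed anywhere in this thread. `is_empty_output` and `find_suspect_edges` are hypothetical helper names and are not part of vContact2.

```python
import os


def is_empty_output(path):
    """Return True if the file is missing or has zero bytes."""
    return not os.path.isfile(path) or os.path.getsize(path) == 0


def find_suspect_edges(ntw_path, limit=10):
    """Return up to `limit` (line_number, text) pairs that do not look like
    'node_a node_b weight' with a numeric weight.

    Assumption: three whitespace-separated columns per line.
    """
    suspects = []
    with open(ntw_path) as handle:
        for number, raw in enumerate(handle, start=1):
            text = raw.rstrip("\n")
            fields = text.split()
            ok = len(fields) == 3
            if ok:
                try:
                    float(fields[2])  # the weight column must parse as a number
                except ValueError:
                    ok = False
            if not ok:
                suspects.append((number, text))
                if len(suspects) >= limit:
                    break
    return suspects


if __name__ == "__main__":
    # Paths taken from the command in the report above.
    out_dir = "VirSorted_Outputs"
    clusters = os.path.join(out_dir, "c1.clusters")
    network = os.path.join(out_dir, "c1.ntw")
    if is_empty_output(clusters):
        print(f"{clusters} is missing or empty, so there are no clusters to parse")
    if os.path.isfile(network):
        for number, text in find_suspect_edges(network):
            print(f"{network}:{number}: {text!r}")
```

If the second check prints any lines, those rows are a reasonable place to start looking for the empty value that ClusterONE fails to parse. If it prints nothing, the three-column assumption may simply not match the real file.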

Comments (7)

  1. Giesela Goergens

    Hi Ben and Jun Liu,

    I have the same issue as Jun Liu.

    When I run vContact2 with --db 'ProkaryoticViralRefSeq94-Merged', everything works fine. If I change to 'ProkaryoticViralRefSeq97-Merged', I get the same error as above:

    ------------------------Contig Clustering & Affiliation-------------------------
    INFO:vcontact2.contig_clusters: Exporting for ClusterONE
    INFO:vcontact2.contig_clusters: Clustering the PC Similarity-Network using ClusterONE
    INFO:vcontact2.contig_clusters: Running clusterONE: java -jar /home/xx/miniconda3/envs/vContact2/lib/python3.8/site-packages/vcontact2/cluster_one-1.0.jar /mnt/xio/xx/xxx/c1.ntw --input-format edge_list --output-format csv --min-density 0.3 --min-size 2 --max-overlap 0.9 --penalty 2.0 --haircut 0.55 --merge-method single --similarity match --seed-method nodes > /mnt/xio/xx/xxx/c1.clusters
    Exception in thread "main" java.lang.NumberFormatException: empty String
    at java.base/jdk.internal.math.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1842)
    at java.base/jdk.internal.math.FloatingDecimal.parseDouble(FloatingDecimal.java:110)
    at java.base/java.lang.Double.parseDouble(Double.java:543)
    at uk.ac.rhul.cs.cl1.io.EdgeListReader.readGraph(EdgeListReader.java:46)
    at uk.ac.rhul.cs.cl1.ui.cmdline.CommandLineApplication.loadGraph(CommandLineApplication.java:311)
    at uk.ac.rhul.cs.cl1.ui.cmdline.CommandLineApplication.run(CommandLineApplication.java:147)
    at uk.ac.rhul.cs.cl1.ui.cmdline.CommandLineApplication.main(CommandLineApplication.java:321)
    ERROR:vcontact2: Error in contig clustering
    ERROR:vcontact2: No columns to parse from file
    Traceback (most recent call last):
    File "/home/lk/miniconda3/envs/vContact2/bin/vcontact2", line 615, in main
    gc = vcontact2.contig_clusters.ContigCluster(pcp, output_dir, cluster_one_fp, cluster_one_args,
    File "/home/lk/miniconda3/envs/vContact2/lib/python3.8/site-packages/vcontact2/contig_clusters.py", line 91, in __init__
    self.clusters, self.cluster_results = self.one_cluster(os.path.join(self.folder, self.name),
    File "/home/lk/miniconda3/envs/vContact2/lib/python3.8/site-packages/vcontact2/contig_clusters.py", line 227, in one_cluster
    return self.load_one_clusters(fi_clusters)
    File "/home/lk/miniconda3/envs/vContact2/lib/python3.8/site-packages/vcontact2/contig_clusters.py", line 318, in load_one_clusters
    clusters_df = pd.read_csv(one_fn, header=0)
    File "/home/lk/miniconda3/envs/vContact2/lib/python3.8/site-packages/pandas/io/parsers.py", line 685, in parser_f
    return _read(filepath_or_buffer, kwds)
    File "/home/lk/miniconda3/envs/vContact2/lib/python3.8/site-packages/pandas/io/parsers.py", line 457, in _read
    parser = TextFileReader(fp_or_buf, **kwds)
    File "/home/lk/miniconda3/envs/vContact2/lib/python3.8/site-packages/pandas/io/parsers.py", line 895, in __init__
    self._make_engine(self.engine)
    File "/home/lk/miniconda3/envs/vContact2/lib/python3.8/site-packages/pandas/io/parsers.py", line 1135, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
    File "/home/lk/miniconda3/envs/vContact2/lib/python3.8/site-packages/pandas/io/parsers.py", line 1917, in __init__
    self._reader = parsers.TextReader(src, **kwds)
    File "pandas/_libs/parsers.pyx", line 545, in pandas._libs.parsers.TextReader.__cinit__
    pandas.errors.EmptyDataError: No columns to parse from file

    It seems that there is a problem with the ClusterONE .jar file or with RefSeq97?

    Thank you for your help

    Best wishes

    Giesela

  2. Ben Bolduc

    Thank you Jun Liu and Giesela,

    Have either of you tried v201? It’s a relatively new DB that should be included in the more recent versions of vContact2.

    If ClusterONE does work with other DBs, then I strongly suspect it's an issue with v97.

    I will double-check v97 and get back to you.

    Cheers,

    Ben

  3. wangya

    I'm helpless!!

    I have the same problem as you!

    Did you solve it in the end?

    Hope to get your reply!

    Best wishes

    wyx

  4. Ben Bolduc

    Wyx,

    Specify the full path to the ClusterONE java file, so basically add “cluster_one-1.0.jar” to the nearly full path you already have for “--c1-bin”. Optionally, you can place the ClusterONE jarfile in your $PATH.

    -Ben

  5. Migun Shakya

    Hi all, adding the full path on the command line didn't work for me. However, adding the ClusterONE jarfile to the $PATH worked. Also, I couldn't get it working with ProkaryoticViralRefSeq97-Merged, but it worked with ProkaryoticViralRefSeq94-Merged.

    Thanks

  6. Ben Bolduc

    Thank you for the report. I rebuilt RefSeq97 not too long ago because a number of users also had issues with that version specifically, but it seems that may have been in vain.

    The 0.9.22 version includes a ClusterONE check to ensure vContact2 can 1) find and 2) use ClusterONE. Hopefully, that identifies issues before spending all that time/energy towards the end of the run.

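
The advice in this thread comes down to two things worth verifying before a long run: `--c1-bin` should point at the jar file itself (not just the directory that contains it), and Java must be runnable. A minimal sketch of such a pre-flight check follows. It is not the check shipped in vContact2 0.9.22, whose implementation is not shown in this thread, and `check_clusterone` is a hypothetical name used only for illustration.

```python
import os
import shutil


def check_clusterone(jar_path, java="java"):
    """Return a list of human-readable problems; an empty list means the
    basic checks passed. This only checks that the files can be found,
    not that ClusterONE runs correctly on a given network."""
    problems = []
    # shutil.which returns None when the executable cannot be located.
    if shutil.which(java) is None:
        problems.append(f"'{java}' executable not found on PATH")
    if os.path.isdir(jar_path):
        # The case described in comment 4: a directory was given instead
        # of the jar file inside it.
        problems.append(
            f"{jar_path} is a directory; append the jar file name, "
            "e.g. cluster_one-1.0.jar"
        )
    elif not os.path.isfile(jar_path):
        problems.append(f"ClusterONE jar not found: {jar_path}")
    return problems


if __name__ == "__main__":
    # Path taken from the command in the original report.
    for problem in check_clusterone("/root/miniconda3/bin/cluster_one-1.0.jar"):
        print("ClusterONE check:", problem)
```

Passing these checks does not rule out the RefSeq97-specific failure reported above; it only catches the path problems early, before the clustering step is reached.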