"pandas.errors.ParserError"

Issue #70 resolved
Fiona Newberry created an issue

Received this error while running vContact2:

-------------------------------Protein clustering-------------------------------
Traceback (most recent call last):
  File "/users/bio3newbef/miniconda3/envs/vContact2/bin/vcontact2", line 815, in <module>
    main(options)
  File "/users/bio3newbef/miniconda3/envs/vContact2/bin/vcontact2", line 472, in main
    gene2genome_df = pd.read_csv(usr_gene2genome_fp, sep=',', header=0)  # Don't rely on pandas to identify
  File "/users/bio3newbef/miniconda3/envs/vContact2/lib/python3.8/site-packages/pandas/io/parsers.py", line 676, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/users/bio3newbef/miniconda3/envs/vContact2/lib/python3.8/site-packages/pandas/io/parsers.py", line 454, in _read
    data = parser.read(nrows)
  File "/users/bio3newbef/miniconda3/envs/vContact2/lib/python3.8/site-packages/pandas/io/parsers.py", line 1133, in read
    ret = self._engine.read(nrows)
  File "/users/bio3newbef/miniconda3/envs/vContact2/lib/python3.8/site-packages/pandas/io/parsers.py", line 2037, in read
    data = self._reader.read(nrows)
  File "pandas/_libs/parsers.pyx", line 860, in pandas._libs.parsers.TextReader.read
  File "pandas/_libs/parsers.pyx", line 875, in pandas._libs.parsers.TextReader._read_low_memory
  File "pandas/_libs/parsers.pyx", line 929, in pandas._libs.parsers.TextReader._read_rows
  File "pandas/_libs/parsers.pyx", line 916, in pandas._libs.parsers.TextReader._tokenize_rows
  File "pandas/_libs/parsers.pyx", line 2071, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 3 fields in line 1572668, saw 4

I don't know much about pandas, but I assumed the error came from the input CSV file. I printed line 1572668:

HKINOCCC_30195,B_1977_r3-phage_3,3',5'-cyclic adenosine monophosphate phosphodiesterase CpdA

The protein name contains an additional comma, so the row splits into four fields instead of the expected three. There are a couple more rows like this in the CSV file.
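
A minimal sketch of how such rows could be found and repaired, in case it helps anyone hitting the same error. It assumes the file has three columns and that any surplus commas belong to the last column (the protein name). The function name `repair_rows` and the header names in the usage note are illustrative assumptions, not part of vContact2. Surplus fields are folded back into the third column, and `csv.writer` then wraps that value in double quotes so that a comma-separated parser reads it as one field.

```python
import csv
import io


def repair_rows(src, dst, expected_fields=3):
    """Copy CSV rows from src to dst, folding surplus fields into the last column.

    Returns the 1-based line numbers of the rows that were repaired.
    Assumes no field spans multiple lines, so row numbers equal line numbers.
    """
    repaired = []
    # lineterminator="\n" avoids the csv module's default "\r\n" endings.
    writer = csv.writer(dst, lineterminator="\n")
    for line_number, fields in enumerate(csv.reader(src), start=1):
        if len(fields) > expected_fields:
            # Rejoin the surplus pieces with the comma that split them.
            tail = ",".join(fields[expected_fields - 1:])
            fields = fields[:expected_fields - 1] + [tail]
            repaired.append(line_number)
        # Fields that contain a comma are quoted automatically on output.
        writer.writerow(fields)
    return repaired


if __name__ == "__main__":
    # Hypothetical header; the data row is the one quoted in this report.
    sample = io.StringIO(
        "protein_id,contig_id,keywords\n"
        "HKINOCCC_30195,B_1977_r3-phage_3,3',5'-cyclic adenosine "
        "monophosphate phosphodiesterase CpdA\n"
    )
    out = io.StringIO()
    print("repaired lines:", repair_rows(sample, out))
    print(out.getvalue(), end="")
```

For a real file, `src` and `dst` would be opened with `open(path, newline="")`. Whether to keep the comma (quoted) or drop it from the protein name is a separate choice; this sketch keeps it.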

My question is: if I remove the comma from the CSV file, do I need to make sure the header names within the proteins.faa file match exactly?

Thanks
