- changed status to resolved
"pandas.errors.ParserError"
Issue #70
resolved
Received this error while running vContact2:
-------------------------------Protein clustering-------------------------------
Traceback (most recent call last):
  File "/users/bio3newbef/miniconda3/envs/vContact2/bin/vcontact2", line 815, in <module>
    main(options)
  File "/users/bio3newbef/miniconda3/envs/vContact2/bin/vcontact2", line 472, in main
    gene2genome_df = pd.read_csv(usr_gene2genome_fp, sep=',', header=0) # Don't rely on pandas to identify
  File "/users/bio3newbef/miniconda3/envs/vContact2/lib/python3.8/site-packages/pandas/io/parsers.py", line 676, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/users/bio3newbef/miniconda3/envs/vContact2/lib/python3.8/site-packages/pandas/io/parsers.py", line 454, in _read
    data = parser.read(nrows)
  File "/users/bio3newbef/miniconda3/envs/vContact2/lib/python3.8/site-packages/pandas/io/parsers.py", line 1133, in read
    ret = self._engine.read(nrows)
  File "/users/bio3newbef/miniconda3/envs/vContact2/lib/python3.8/site-packages/pandas/io/parsers.py", line 2037, in read
    data = self._reader.read(nrows)
  File "pandas/_libs/parsers.pyx", line 860, in pandas._libs.parsers.TextReader.read
  File "pandas/_libs/parsers.pyx", line 875, in pandas._libs.parsers.TextReader._read_low_memory
  File "pandas/_libs/parsers.pyx", line 929, in pandas._libs.parsers.TextReader._read_rows
  File "pandas/_libs/parsers.pyx", line 916, in pandas._libs.parsers.TextReader._tokenize_rows
  File "pandas/_libs/parsers.pyx", line 2071, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 3 fields in line 1572668, saw 4
I don’t know much about pandas, but I assumed the problem was in the input CSV file. I printed line 1572668, the line named in the error:
HKINOCCC_30195,B_1977_r3-phage_3,3',5'-cyclic adenosine monophosphate phosphodiesterase CpdA
I saw that the protein name contains an additional comma (in "3',5'-cyclic"), so the line splits into 4 fields instead of 3. There are a couple more lines like this in the CSV file.
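For anyone hitting the same error, the other offending lines can be listed before touching the file. This is a minimal sketch, not part of vContact2: the function name `find_rows_with_extra_fields` is hypothetical, the header row and the second data row in the sample are made up for illustration, and it counts raw commas, so it assumes the file contains no quoted fields.

```python
import io


def find_rows_with_extra_fields(lines, expected_fields=3, sep=","):
    """Return (line_number, text) pairs whose field count differs from expected_fields.

    Counts raw separators, so it assumes no field is wrapped in quotes.
    Line numbers are 1-based and include the header row.
    """
    bad = []
    for line_number, line in enumerate(lines, start=1):
        text = line.rstrip("\r\n")
        if text and text.count(sep) + 1 != expected_fields:
            bad.append((line_number, text))
    return bad


# Illustrative input: the header names and the last row are assumptions,
# only the middle row is taken from the report above.
sample = io.StringIO(
    "protein_id,contig_id,keywords\n"
    "HKINOCCC_30195,B_1977_r3-phage_3,3',5'-cyclic adenosine monophosphate phosphodiesterase CpdA\n"
    "HKINOCCC_30196,B_1977_r3-phage_3,hypothetical protein\n"
)

bad_rows = find_rows_with_extra_fields(sample)
for line_number, text in bad_rows:
    print(line_number, text)
```

To run it on a real file, pass an open file handle instead of the `io.StringIO` sample.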
My question is: if I remove the commas from the CSV file, do I need to make sure the header names within the proteins.faa file match exactly?
Thanks
Comments (1)
reporter
There were commas in the protein names in the protein file.
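One possible way to handle this without deleting the commas is to rewrite the CSV so that the free-text third column is quoted. The sketch below is an assumption-laden illustration, not something vContact2 provides: `quote_last_field` is a hypothetical helper, the header names in the sample are assumed, and it expects an input file with no existing quoting. It splits each line on the first two commas only, so the first two columns (the IDs) are left unchanged, and `csv.writer` then quotes any field that still contains a comma.

```python
import csv
import io


def quote_last_field(src, dst, n_fields=3, sep=","):
    """Copy src to dst, keeping everything after the (n_fields - 1)th separator as one field.

    Assumes the input has no quoted fields. csv.writer with QUOTE_MINIMAL
    wraps a field in double quotes only when it contains the separator.
    """
    writer = csv.writer(dst, delimiter=sep, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    for line in src:
        text = line.rstrip("\r\n")
        if not text:
            continue  # skip blank lines
        writer.writerow(text.split(sep, n_fields - 1))


# Illustrative input: the header names are an assumption,
# the data row is the one quoted in the report.
raw = io.StringIO(
    "protein_id,contig_id,keywords\n"
    "HKINOCCC_30195,B_1977_r3-phage_3,3',5'-cyclic adenosine monophosphate phosphodiesterase CpdA\n"
)
fixed = io.StringIO()
quote_last_field(raw, fixed)

# Reading the rewritten text back with a quote-aware parser yields 3 fields per row.
rows = list(csv.reader(io.StringIO(fixed.getvalue())))
print(rows[1])
```

`pandas.read_csv` treats double quotes as the quote character by default, so a file rewritten this way should parse as three columns; whether the rest of the pipeline accepts quoted protein names is not confirmed here.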