parse_file cannot parse windows-1251 encoding

Issue #124 resolved
Former user created an issue

I am trying to parse the following BibTeX file, which contains Cyrillic letters.

Here is the file mybib.bib:

% Encoding: windows-1251

@article{Demidov-2010,
  author  = {Д.Е. Демидов and А.Е. Егоров and А.Н. Нуриев},
  title   = {Решение задач вычислительной гидродинамики с применением технологии Nvidia CUDA},
  journal = {Ученые записки Казанского Государственного Университета},
  year    = {2010},
  volume  = {152},
  pages   = {142--154}
}


The following code in Python 3.7

from pybtex.database import parse_file

bib_data = parse_file('mybib.bib')

produces the error

PybtexError: 'utf-8' codec can't decode byte 0xc4 in position 64: invalid continuation byte.

If I open the file in Notepad++ and change the encoding to UTF-8, I can see this byte at this position, but neither BibTeX nor JabRef gives any error.

Is there any way to read Cyrillic with pybtex?

Comments (1)

  1. Andrey Golovizin

    Hi,

    Pybtex uses the UTF-8 encoding by default. It is generally easier to embrace Unicode and use UTF-8 for all your source files. :)
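
    For a one-off conversion, a minimal sketch using only the Python standard library could look like the following. The helper name convert_to_utf8 and the output file name are made up for illustration. The byte 0xc4 from the error message is the letter Д in Windows-1251, which is not a valid sequence on its own in UTF-8, so the file has to be decoded with the right codec first.

```python
from pathlib import Path


def convert_to_utf8(src, dst, source_encoding="cp1251"):
    """Re-encode a text file as UTF-8 and return the destination path.

    Hypothetical helper: the name and defaults are illustrative only.
    """
    # Work on raw bytes so that line endings are preserved exactly.
    raw = Path(src).read_bytes()
    # Decode with the legacy codec; b"\xc4" becomes the letter "Д" here.
    text = raw.decode(source_encoding)
    # Write the same characters back out as UTF-8.
    Path(dst).write_bytes(text.encode("utf-8"))
    return Path(dst)


# Example call (file names are placeholders):
# convert_to_utf8("mybib.bib", "mybib-utf8.bib")
```

    After the conversion, the "% Encoding: windows-1251" comment at the top of the file no longer matches the content and may need to be updated by hand.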

    If you still need to use the legacy Windows-1251 encoding, you can tell Pybtex about that with the --encoding option:

    $ pybtex --encoding cp1251 <filename>
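
    Since the question is about parse_file rather than the command line, here is a sketch of the same idea from Python. It assumes that parse_file forwards extra keyword arguments such as encoding to the BibTeX parser; please check this against the Pybtex version you have installed. The wrapper name load_bibliography is made up for illustration.

```python
def load_bibliography(path, encoding="cp1251"):
    """Parse a BibTeX file that is stored in a legacy encoding.

    Hypothetical wrapper: the name and the default encoding are illustrative.
    """
    # Imported inside the function so this module loads without Pybtex.
    from pybtex.database import parse_file

    # Assumption: the `encoding` keyword is passed through to the parser.
    return parse_file(path, bib_format="bibtex", encoding=encoding)


# Example call (file name taken from the question above):
# bib_data = load_bibliography("mybib.bib")
# print(bib_data.entries["Demidov-2010"].fields["title"])
```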
    

    Thanks for using Pybtex. :)
