Backslashes accumulate when saving/loading iteratively
When pybtex writes to a file or converts entries to a string, it performs a number of substitutions, e.g. “_” to “\_” or “~” to “\textasciitilde”.
However, when the string is parsed or the file is loaded, these substitutions are NOT undone.
This illustrates the problem:
from pybtex.database import Entry, Person, BibliographyData, parse_file, parse_string
bd = BibliographyData()
e = Entry("misc", fields=dict(url="https:some.org/a?x=%2#x", keywords="as#sfdfd%dfdf_and~too"))
bd.add_entry("key1", e)
str1 = bd.to_string("bibtex")
bd2 = parse_string(str1, "bibtex")
print("URL Original: ", bd.entries["key1"].fields["url"])
print("URL ser/deser:", bd2.entries["key1"].fields["url"])
print("KW Original: ", bd.entries["key1"].fields["keywords"])
print("KW ser/deser:", bd2.entries["key1"].fields["keywords"])
This will output:
URL Original: https:some.org/a?x=%2#x
URL ser/deser: https:some.org/a?x=\%2\#x
KW Original: as#sfdfd%dfdf_and~too
KW ser/deser: as\#sfdfd\%dfdf\_and\textasciitilde too
Comments (5)
reporter The bottom line is that any field containing “&”, “%”, “_”, or another character that needs backslash-escaping in LaTeX will ACCUMULATE backslashes on each save/load iteration: the extra backslashes are added on saving (serialization), and nothing is undone on loading (deserialization).
I am a bit stumped as to how such a basic problem can still be in the code.
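The accumulation described above can be illustrated with a toy model. This is only a sketch: `toy_escape` and `toy_round_trip` are hypothetical stand-ins, not pybtex functions, and the model assumes (as reported in this thread) that every save prefixes each special character with one more backslash while loading undoes nothing.

```python
import re


def toy_escape(text: str) -> str:
    """Toy stand-in for escape-on-save: prefix each special character with a backslash."""
    # Assumption: only the characters named in this thread are treated as special.
    return re.sub(r"([&%_#])", r"\\\1", text)


def toy_round_trip(text: str, times: int) -> str:
    """Simulate `times` save/load cycles where loading leaves the text unchanged."""
    for _ in range(times):
        text = toy_escape(text)  # "save": escaping is applied again
        # "load": nothing is undone, so the escaped text becomes the new field value
    return text


for n in range(4):
    print(n, toy_round_trip("a_b", n))
```

In this model the number of backslashes in front of the underscore equals the number of save/load cycles, which matches the behaviour reported here.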
reporter - changed title to Backslashes accumulate when saving/loading iteratively
reporter BTW this also happens when using pybtex-convert to repeatedly read in a bibtex file A, save it to B, then read B, save to C etc. At each iteration any underscore or hash character in a field will gain another backslash.
@Johann Petrak, I’m currently using the following workaround:
# db.to_file(f_out, bib_format='bibtex')  # <- what doesn't work correctly
# Serialize to a string first, then strip the extra escaping before writing.
astr = db.to_string(bib_format='bibtex')
astr = astr.replace("\\\\", "\\")  # collapse doubled backslashes
astr = astr.replace("\\_", "_")
astr = astr.replace("\\&", "&")
astr = astr.replace("\\%", "%")
with open(f_out, "w") as f_handle:
    f_handle.write(astr)
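The workaround above patches the whole serialized string. An alternative would be to undo one level of escaping on individual field values after parsing. The following is a minimal sketch under stated assumptions: `unescape_field` is a hypothetical helper (not part of pybtex), and it covers only the substitutions that appear in this thread (`\_`, `\&`, `\%`, `\#`, and `\textasciitilde` followed by an optional space).

```python
import re

# Assumption: "\textasciitilde" may be followed by one separating space,
# as in the "\textasciitilde too" output shown in the report.
_TILDE = re.compile(r"\\textasciitilde ?")

# Assumption: only these backslash-escaped characters are handled.
_ESCAPED = re.compile(r"\\([_&%#])")


def unescape_field(value: str) -> str:
    """Undo one level of the escaping shown in this issue (hypothetical helper)."""
    value = _TILDE.sub("~", value)
    return _ESCAPED.sub(r"\1", value)


print(unescape_field(r"as\#sfdfd\%dfdf\_and\textasciitilde too"))
print(unescape_field(r"https:some.org/a?x=\%2\#x"))
```

Such a helper could be applied to each value in `entry.fields` for every entry in the parsed `BibliographyData`. It removes only one level of escaping per call, so values that have already been through several save/load cycles would need repeated passes.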
This is really bad, because pybtex adds backslashes on each iteration of serialization and deserialization.