query about parse_string vulnerabilities

Issue #94 resolved
noahdesu created an issue

I will be using pybtex.database.parse_string to parse BibTeX entries from untrusted sources. I've been putting parse_string through its paces with various bad inputs and haven't seen any issues.

I will be doing some basic checking of code paths, but I wanted to ask the developers about any known issues. I'm particularly interested in any state that is maintained between calls to parse_string, and in any paths from parse_string that may invoke other services such as the file system or system calls.

Thanks

Comments (2)

  1. Andrey Golovizin

    No security issues in the parsing code that I know of. The actual parsing happens in pybtex.database.input.bibtex, with additional name parsing in pybtex.database.Person. You can disable name parsing by passing person_fields=() to parse_string() if you don't need it. Anyway, there shouldn't be any content-dependent filesystem access or stuff like that. Each call to parse_string() creates a new parser instance, so there shouldn't be any state preserved between calls.

    There's also a lower-level parser that just returns an AST without doing anything clever: pybtex.database.input.bibtex.LowLevelParser (it used to be called BibTeXEntryIterator; I've renamed it for clarity). It might be enough for your needs and is easier to audit.
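
The suggestion above to pass person_fields=() could be sketched as a small defensive wrapper. This is only an illustration under stated assumptions: parse_untrusted, MAX_BIBTEX_CHARS, the size cap, and the injectable parse parameter are hypothetical and not part of pybtex. Only pybtex.database.parse_string and its person_fields=() argument come from the thread.

```python
# Hypothetical cap on input size; not a pybtex setting. Tune for your workload.
MAX_BIBTEX_CHARS = 100_000


def parse_untrusted(text, parse=None, max_chars=MAX_BIBTEX_CHARS):
    """Parse untrusted BibTeX text with name parsing disabled.

    `parse` defaults to pybtex.database.parse_string. It is a parameter
    (an assumption of this sketch) so the wrapper can be exercised with a
    stand-in parser. Returns the parsed result, or None if the input is
    rejected or fails to parse.
    """
    # Reject oversized input before handing it to the parser.
    if len(text) > max_chars:
        return None

    if parse is None:
        # Imported lazily so the wrapper itself has no import-time dependency.
        from pybtex.database import parse_string as parse

    try:
        # person_fields=() disables name parsing, as described in the comment.
        return parse(text, "bibtex", person_fields=())
    except Exception:
        # Deliberately broad: any failure on untrusted input counts as a rejection.
        return None
```

With pybtex installed, parse_untrusted(some_text) would call the real parser. Since name parsing is skipped, fields such as author should stay as plain strings rather than being turned into Person objects.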
