- changed status to resolved
query about parse_string vulnerabilities
Issue #94
resolved
I will be using pybtex.database.parse_string to parse BibTeX entries from untrusted sources. I've been putting parse_string through its paces with various bad inputs, and haven't seen any issues.
I will be doing some basic checking of code paths, but I wanted to ask the developers about any known issues. In particular, I'm interested in any state that is maintained between calls to parse_string, and any paths from parse_string that may invoke other services such as the file system or system calls.
Thanks
Comments (2)
reporter Thanks a lot for the information; that's really helpful.
No security issues in the parsing code that I know of. The actual parsing happens in pybtex.database.input.bibtex, with additional name parsing in pybtex.database.Person. It is possible to disable name parsing by passing person_fields=() to parse_string() if you don't need it. Anyway, there shouldn't be any content-dependent filesystem access or anything like that. Each call to parse_string() creates a new parser instance, so there shouldn't be any state preserved between calls.
There's also a lower-level parser that just returns an AST without doing anything clever: pybtex.database.input.bibtex.LowLevelParser (it used to be called BibTeXEntryIterator; I've renamed it for clarity). It might be enough for your needs and is easier to audit.