Using the in operator to check for string keys in _get_info results is dangerous for formats with base64-encoded data
_get_info_smart uses the in operator to check whether a particular string key is in the info variable returned by _get_info. _get_info usually returns a dict, except when it encounters a tag whose only purpose is to contain text, in which case it returns the tag's text instead of a dict.
This is dangerous if the same string can appear either as a key or inside a tag's text. It should be rare, since only a small set of keys is tested for: "binary" and "binaryDataArray". Unfortunately, rare events do happen.
In this example, one of the base64-encoded blobs in an mzML file contains the sequence "binary" in its encoded text:
The check in the code is:
if 'binary' in info:
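The reason this check misfires is that in means two different things depending on the type of info: key membership for a dict, but substring search for a str. The sketch below uses made-up stand-ins for what _get_info might return (they are not taken from a real mzML file); the string is valid base64 that happens to contain "binary", and the check is True for both.

```python
import base64

# Hypothetical stand-ins for _get_info return values; not real mzML data.
info_from_element = {"binary": "AAAA", "encodedLength": 4}
info_from_text_only_tag = "AAAAbinaryAA"  # valid base64 that contains "binary"

# Confirm the made-up text really is decodable base64 (raises if not).
decoded = base64.b64decode(info_from_text_only_tag, validate=True)

def original_check(info):
    # Key lookup for a dict, but substring search for a str.
    return 'binary' in info

print(original_check(info_from_element))        # True: "binary" is a key
print(original_check(info_from_text_only_tag))  # True: "binary" is a substring
```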
To make the check handle this scenario, test the type first, so the membership test only runs when info is a dict:
if isinstance(info, dict) and 'binary' in info:
Conceivably, this could happen anywhere we might receive a string instead of a dict, but it is most likely around the base64 blobs, whose contents are effectively unpredictable.
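Since the same pattern could apply to every key tested against _get_info output, one option is to route all such checks through a single guard. The helper below, has_key, is a hypothetical name used only for illustration; it is a minimal sketch assuming the only non-dict value to reject is a string.

```python
from typing import Any


def has_key(info: Any, key: str) -> bool:
    """Return True only if info is a dict that contains key as a key.

    Hypothetical helper: a string returned for a text-only tag is never
    treated as containing the key, even if the key appears in its text.
    """
    return isinstance(info, dict) and key in info


# Made-up inputs: a dict with "binary", base64 text containing "binary",
# and a dict with "binaryDataArray".
for info in ({"binary": "AAAA"}, "AAAAbinaryAA", {"binaryDataArray": []}):
    print(has_key(info, "binary"), has_key(info, "binaryDataArray"))
# True False
# False False
# False True
```

Because isinstance also accepts dict subclasses such as collections.OrderedDict, the guard keeps working if the parser returns an ordered mapping.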