UnicodeDecodeError

Issue #3 resolved
Prof Patsch
created an issue

I get this error with python2 & python3, in ipython and the normal interpreter:

In [1]: import gnupg
In [2]: gpg = gnupg.GPG()
In [3]: gpg.list_keys
Out[3]: <bound method GPG.list_keys of <gnupg.GPG object at 0x279e550>>
In [4]: gpg.list_keys()

UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-4-141890d59f55> in <module>()
----> 1 gpg.list_keys()

/home/philip/.virtualenvs/test2/lib/python2.7/site-packages/gnupg.pyc in list_keys(self, secret)
   1060         self._collect_output(p, result, stdin=p.stdin)
   1061         lines = result.data.decode(self.encoding,
-> 1062                                    self.decode_errors).splitlines()
   1063         valid_keywords = 'pub uid sec fpr sub'.split()
   1064         for line in lines:

/home/philip/.virtualenvs/test2/lib/python2.7/encodings/utf_8.pyc in decode(input, errors)
     14 
     15 def decode(input, errors='strict'):
---> 16     return codecs.utf_8_decode(input, errors, True)
     17 
     18 class IncrementalEncoder(codecs.IncrementalEncoder):

UnicodeDecodeError: 'utf8' codec can't decode byte 0xe9 in position 89628: invalid continuation byte

Comments (15)

  1. Vinay Sajip repo owner

    It is possible to get this error if the keyring is somehow corrupted. To see where the problem is, please turn on logging and report here exactly what is read from the gpg process and what encoding is used. What is the default encoding on your system? What OS are you using (looks like POSIX - is it Linux)? What happens if you set gpg.decode_errors = 'ignore'?
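For readers following along, the effect of the `decode_errors` setting asked about here can be sketched with plain `bytes.decode`, which is the call the traceback shows inside `list_keys`. The sample bytes are invented for illustration; only the stray 0xe9 byte mirrors the report.

```python
# Hypothetical sample: a Latin-1 encoded "é" (0xe9) inside otherwise ASCII output.
sample = b"uid:Andr\xe9 Example"


def try_decode(data, errors):
    """Decode as UTF-8 with the given error handler; return the text or the exception."""
    try:
        return data.decode("utf-8", errors)
    except UnicodeDecodeError as exc:
        return exc


strict_result = try_decode(sample, "strict")    # a UnicodeDecodeError instance
ignore_result = try_decode(sample, "ignore")    # the offending byte is dropped
replace_result = try_decode(sample, "replace")  # the offending byte becomes U+FFFD

print(repr(strict_result))
print(ignore_result)
print(replace_result)
```

Note that `'ignore'` silently loses the character, while `'replace'` at least leaves a visible marker where the undecodable byte was.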

  2. Paul Wouters

    If you define all known gpg pub keyrings as broken, then yes, this is not a bug. I filed this bug months ago on the old bug tracker and got the same response.

    Simply grab 20 random keys from a key server into a keyring and you'll see these failures. It's not hard to reproduce. My original fedora/epel package had:

    • return result.data.decode(self.encoding, self.decode_errors)
    • return result.data.decode(self.encoding, self.decode_errors)

    • return result.data

    I was hoping upstream would fix it properly, but currently this bug is impacting openpgpkey-milter and I'm tempted to reinstate this patch.

  3. Vinay Sajip repo owner

    @Paul Wouters : Your patch will break backward compatibility, since the original code returns Unicode, whereas your code returns a byte-string. If the problem can be easily reproduced with 20 random keys from a key server, please post the keyring file which causes the problem, which should make it easier for me to track down the precise problem to see what the most appropriate solution is.

    I'm not trying to avoid the issue, and I've explained why I can't adopt your suggestion to just skip the decode step (which is what your patch does).

    OP hasn't yet answered my question about what happens if you set gpg.decode_errors = 'ignore'. Can you answer it for your case?
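The compatibility concern can be illustrated without gnupg at all. Under Python 3, byte-strings and text never compare equal, so caller code written against text output, such as the `valid_keywords` check visible in the traceback, would stop matching if the decode step were skipped. This is only a sketch; the sample line is invented and is not real gpg output.

```python
valid_keywords = "pub uid sec fpr sub".split()

# Hypothetical colon-separated line, loosely modelled on a key listing.
raw_line = b"pub:u:2048:1:0123456789ABCDEF"
text_line = raw_line.decode("ascii")

# Text output: the first field is "pub", which matches a keyword.
text_match = text_line.split(":")[0] in valid_keywords

# Byte output: the first field is b"pub", which is not equal to "pub" in Python 3.
bytes_match = raw_line.split(b":")[0] in valid_keywords

print(text_match, bytes_match)
```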

  4. Vinay Sajip repo owner

    @Paul Wouters : The keys you sent seem to contain the byte-string sequence J\xf6rgen, which is not valid utf-8 - so it's no wonder the decoding fails when that encoding is used. To decode correctly, the appropriate encoding would have to be used. I'm not sure what that is, but I would guess Latin-1 or a variant thereof, resulting in the Unicode u'Jörgen' (o with umlaut).

    Key identifiers are supposed to be text (i.e. Unicode), which is why the decoding step is there. The decoding can be controlled using gpg.encoding and gpg.decode_errors when needed.
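A minimal sketch of the point above, using an assumed sample in which ö is stored as the single Latin-1 byte 0xf6 (the actual keyring bytes are not reproduced here). The two `decode` calls stand in for what `gpg.encoding` would select.

```python
name_bytes = b"J\xf6rgen"  # assumed sample: a Latin-1 encoded name

# 0xf6 can never start a valid UTF-8 sequence, so strict UTF-8 decoding fails.
try:
    name_bytes.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# The same bytes decode cleanly once the matching encoding is used.
latin1_text = name_bytes.decode("iso-8859-1")

print(utf8_ok, latin1_text)
```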

  5. Paul Wouters

    But what's the point of throwing an error? Anyone who receives a pubkey runs into this potential explosion. Why wouldn't you make ignore the default? What badness are you preventing by blowing up?

    sent from a tiny device

  6. Vinay Sajip repo owner

    but what's the point of throwing an error?

    Just the Zen of Python:

    Errors should never pass silently.

    Unless explicitly silenced.

    Looking into it further, passing --display-charset utf-8 on the command line might encourage gpg to produce utf-8 output. I'll investigate that soon.

  7. Vinay Sajip repo owner

    On GnuPG 1.4.x at least, --display-charset makes no difference. With the keyring that was causing @Paul Wouters a problem, setting gpg.encoding = 'iso-8859-1' caused no errors even with gpg.decode_errors set to strict. Can I close the issue now?

  8. Paul Wouters

    I still think being tolerant is the better default. It's a string error: try to convert it as best you can and don't explode. Otherwise everyone is just going to need to set that default to ignore.

    sent from a tiny device

  9. Vinay Sajip repo owner

    Alternatively, one could set the default encoding to iso-8859-1 (IIUC the default that gpg uses), as that should never throw a decoding error. That seems better, because if a user sets the encoding to something else, they should be told about errors.
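The claim that iso-8859-1 never raises a decoding error can be checked directly, since that encoding maps each of the 256 byte values to exactly one code point. The sketch below also shows the trade-off, as an illustration rather than a statement about gpg's behaviour: data that was really UTF-8 decodes without error but comes out garbled.

```python
# Every possible byte value decodes under iso-8859-1, even with strict error handling.
all_bytes = bytes(range(256))
decoded = all_bytes.decode("iso-8859-1")
round_trip = decoded.encode("iso-8859-1") == all_bytes

# The cost: UTF-8 input is accepted silently but misread, one character per byte.
utf8_name = "J\u00f6rgen".encode("utf-8")   # b"J\xc3\xb6rgen"
mojibake = utf8_name.decode("iso-8859-1")   # 0xc3 and 0xb6 become two separate characters

print(len(decoded), round_trip)
print(mojibake)
```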
