Issue #33 resolved

UnicodeDecodeError when entering password with unicode characters

Gamesh
created an issue
 #!python
    ** Mercurial version (2.8.1).  TortoiseHg version (2.10.1)
    ** Command: 
    ** CWD: Z:\ituvs\jusurb
    ** Encoding: cp1257
    ** Extensions loaded: purge, mercurial_keyring
    ** Python version: 2.7.3 (default, Apr 10 2012, 23:24:47) [MSC v.1500 64 bit (AMD64)]
    ** Windows version: sys.getwindowsversion(major=6, minor=2, build=9200, platform=2, service_pack='')
    ** Processor architecture: x64
    ** Qt-4.8.4 PyQt-4.10.2 QScintilla-2.7.2
    Traceback (most recent call last):
      File "tortoisehg\hgqt\thread.pyo", line 269, in run
      File "tortoisehg\util\hglib.pyo", line 708, in dispatch
      File "mercurial\dispatch.pyo", line 806, in _dispatch
      File "mercurial\dispatch.pyo", line 585, in runcommand
      File "mercurial\dispatch.pyo", line 897, in _runcommand
      File "mercurial\dispatch.pyo", line 868, in checkargs
      File "mercurial\dispatch.pyo", line 803, in <lambda>
      File "mercurial\util.pyo", line 512, in check
      File "mercurial\commands.pyo", line 4590, in pull
      File "mercurial\hg.pyo", line 122, in peer
      File "mercurial\hg.pyo", line 102, in _peerorrepo
      File "mercurial\httppeer.pyo", line 238, in instance
      File "mercurial\httppeer.pyo", line 57, in _fetchcaps
      File "mercurial\httppeer.pyo", line 171, in _call
      File "mercurial\httppeer.pyo", line 118, in _callstream
      File "urllib2.pyo", line 406, in open
      File "urllib2.pyo", line 519, in http_response
      File "urllib2.pyo", line 438, in error
      File "urllib2.pyo", line 378, in _call_chain
      File "urllib2.pyo", line 890, in http_error_401
      File "mercurial\url.pyo", line 436, in http_error_auth_reqed
      File "hgext\mercurial_keyring.pyo", line 389, in basic_http_error_auth_reqed
      File "urllib2.pyo", line 865, in http_error_auth_reqed
      File "urllib2.pyo", line 878, in retry_http_basic_auth
      File "urllib2.pyo", line 406, in open
      File "urllib2.pyo", line 519, in http_response
      File "urllib2.pyo", line 438, in error
      File "urllib2.pyo", line 378, in _call_chain
      File "urllib2.pyo", line 890, in http_error_401
      File "mercurial\url.pyo", line 436, in http_error_auth_reqed
      File "hgext\mercurial_keyring.pyo", line 389, in basic_http_error_auth_reqed
      File "urllib2.pyo", line 865, in http_error_auth_reqed
      File "urllib2.pyo", line 878, in retry_http_basic_auth
      File "urllib2.pyo", line 406, in open
      File "urllib2.pyo", line 519, in http_response
      File "urllib2.pyo", line 438, in error
      File "urllib2.pyo", line 378, in _call_chain
      File "urllib2.pyo", line 890, in http_error_401
      File "mercurial\url.pyo", line 436, in http_error_auth_reqed
      File "hgext\mercurial_keyring.pyo", line 389, in basic_http_error_auth_reqed
      File "urllib2.pyo", line 865, in http_error_auth_reqed
      File "urllib2.pyo", line 871, in retry_http_basic_auth
      File "hgext\mercurial_keyring.pyo", line 383, in find_user_password
      File "hgext\mercurial_keyring.pyo", line 268, in find_auth
      File "hgext\mercurial_keyring.pyo", line 121, in set_http_password
      File "keyring\core.pyo", line 42, in set_password
      File "keyring\backends\Windows.pyo", line 146, in set_password
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xe0 in position 6: ordinal not in range(128)

Comments (15)

  1. Jason R. Coombs

    Marcin, I believe the issue does indeed lie with mercurial_keyring or with mercurial.

    For set_password, the keyring library requires the password to be passed as a unicode string or as a byte string which can be decoded as unicode using the system's default encoding. In the case above, the password solicited by ui.getpass() (mercurial_keyring.py:260) is a byte string and can't be decoded in ascii (the default encoding).

    Keyring is doing the best it can here with the inputs.

    I believe the proper fix here is for mercurial itself to solicit passwords as unicode or mercurial_keyring should use whatever guarantees mercurial provides about the password to present the password as unicode to keyring.

    Please note that I am not trying to pass the buck. I believe the proper fix must be implemented in mercurial or mercurial_keyring, as the current implementation of keyring is as designed (and explicitly documented in the unit tests which apply to all keyrings).

    I'm happy to help put together a patch for mercurial keyring or mercurial, but I'm particularly eager to hear your feedback given the above information.

  2. Marcin Kasperski repo owner

    the keyring library requires the password to be passed as a unicode string or as a byte string which can be decoded as unicode using the system's default encoding

    This is first time I got anybody to claim anything about keyring requirements for password encoding (and I asked in the past...). Still, this definition is completely vague, considering the concept of „default encoding” is simply unclear. In particular, I strongly believe that on windows user's default encoding on system level is not ascii (if it were ascii, he or she would not be able to specify non-ascii character while prompted for password)

    mercurial_keyring should use whatever guarantees mercurial provides about the password to present the password as unicode to keyring

    I do not know about any such „guarantees”.

    My source for the password provided by the user is ui.getpass method, which simply returns byte string with no information attached, that's all I have. In fact,, in mercurial code this method is simply tiny wrapper around standard getpass.getpass which does just that.

    (To make picture more confuding, TortoiseHg substututes ui.getpass with it’s GUI implementation IIRC)

    There simply is no place I could use to get password encoding from.

  3. Marcin Kasperski repo owner

    And to summarize my point.

    1. Keyring (the library) accepts binary string, I see no reason it insists on recoding/decoding/doing anything with this string. It should simply and just return whatever it was fed with. And, it seems to me that many backends behave so. Vault is different for unclear reason.

    2. I won't attempt to guess password encoding, this is the cure worse than the disease.

    3. It is of course possible to base64 passwords, or hex them, but I feel keyring is proper place for such ideas. Because:

    a) non-ascii passwords are not only problems of mercurial, they may happen anywhere where keyring library is used

    b) various backends have also - undefined - length limitations

    c) at least some backends are handling non-ascii passwords fine and there is no need to modify the way they behave

    d) some backends save passwords in the way which make them possible to be recovered via GUIs, so keeping password in original form if only possible make sense

  4. Gamesh reporter

    everything sounds solid and valid arguments, but shouldn't the front-end - tortoise, in this case handle at least the error and output to user a message saying "your password cannot have Unicode chars" or something, instead of producing an error with a back-trace :)

  5. Marcin Kasperski repo owner

    well, keyring (the library) should throw sensible specific exception in the first place.

    Note also, once more, that IMHO there is no need to throw any exception. I do not understand why vault backend attempts decoding the string instead of simply saving (and later returning) whatever it got.

  6. Jason R. Coombs

    In order to be a good citizen, keyring is adopting the position that Microsoft adopted a decade ago and which Python adopted with Python 3 - that user-input strings (a.k.a. text) are Unicode, including passwords.

    For this reason, Keyring now expects and prefers Unicode as inputs for passwords. The test suite guarantees that Unicode passwords work on all keyring backends. This approach is the only viable approach, as some backends can store Unicode text natively (Windows.Vault) while others store only bytes. Keyring can encode to bytes for those that require it and store text natively for those that support it, but it can't go the other way. This point is the key point: keyring intends to be a password store, not an arbitrary byte string store.

    Mercurial is still behind in this regard. Since it runs on Python 2, it uses bytes and Unicode interchangeably. As a result, mercurial_keyring gets binary byte streams as passwords (as you've pointed out). You've also implied that Mercurial provides no guarantees about the password other than it's a byte stream. I consider that a defect in Mercurial. Only Mercurial has control over how the passwords are entered, so is the only library in a suitable position to properly decode bytes to text.

    However, assuming that Mercurial is going to be providing byte strings and keyring expects Unicode text, it seems most appropriate to me that the intermediary adapting library, mercurial_keyring, should be responsible for accounting for that disparity. I agree and sympathize with your plight. I agree that guessing the encoding is an untenable proposition. Encoding as base64 or similar also seems like a horrible hack. It would also mean that the passwords stored would not be readily usable outside of Mercurial, which is an obvious non-goal. However, given the goals of keyring, it would not be appropriate to accept bytes.

    The more I think about this issue, the more I'm inclined to file a ticket upstream and to mark it as wontfix for mercurial_keyring and keyring and require users to enter their Mercurial passwords as ascii until the upstream issue is fixed.

  7. Jason R. Coombs

    According to the response in the Mercurial bug tracker, user-input, such as passwords, is encoded using the "local" encoding and can be transformed to "utf-8" using mercurial.encoding.fromlocal. Therefore, it should be straightforward to transform the password provided by mercurial using:

    password = mercurial.encoding.fromlocal(password_from_mercurial).decode('utf-8')
    

    I suggest re-opening this issue.

  8. Marcin Kasperski repo owner

    OK, I added password recoding as suggested above (and reverse operation on password read). As I hope it won't spoil ascii passwords, I released 0.6.2 with this fix.

    Testers having non-ascii passwords welcome...

  9. Log in to comment