TypeError: decoding Unicode is not supported

Issue #8 resolved
Mobus Dorphin created an issue

On Ubuntu 14.04 (I did not have this issue on CentOS 6, likely because of an older version of wget), trying to use the library to download anything raised the TypeError above. I noticed in the TODO list at the bottom that Unicode support on Linux is still in progress, but in the meantime, a try/except clause around line 222 solved (or at least avoided) the issue. Would it be possible to use that workaround temporarily?

Thanks

Comments (13)

  1. Mobus Dorphin reporter

    I researched this a little further using the یـــــــــاد case that prompted this change in the first place. It seems that, in my case, the error is thrown only if the string we're trying to decode is already unicode. So I changed the try/except clause to an if isinstance(filename, unicode): check instead. This way it handles this specific issue, hopefully without breaking anything else.

  2. anatoly techtonik repo owner

    Can you be more specific about your case and how it differs from the standard case? Are you using the API or the CLI? Also, which Python version?

  3. Mobus Dorphin reporter

    CLI, Python 2.7. As for the case: basically any time I pass a unicode string to download(), it raises an exception in to_unicode(), because Python 2 cannot decode an object that is already unicode. I submitted a pull request that modifies to_unicode() to check whether the string is already unicode before trying to convert it, and to skip the conversion if it is.
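A minimal sketch of the isinstance check described in the pull request, written as a Python 3 analogue: in Python 2 the text type is unicode and the check was isinstance(filename, unicode); in Python 3 the text type is str and encoded data is bytes. The helper name to_unicode_sketch is hypothetical and is not the library's actual to_unicode().

```python
def to_unicode_sketch(value, encoding="utf-8"):
    """Return value as text, decoding only when it is still bytes.

    Hypothetical Python 3 analogue of the fix: skip decoding when the
    argument is already text, since decoding text is what raises.
    """
    if isinstance(value, str):
        # Already text: decoding it again would raise TypeError.
        return value
    # Encoded bytes: decode them into text.
    return value.decode(encoding)


name = "\u06cc\u0627\u062f.txt"  # a short Persian filename
print(to_unicode_sketch(name) == name)                  # True
print(to_unicode_sketch(name.encode("utf-8")) == name)  # True

# Without the check, decoding text fails, mirroring the original error.
try:
    str(name, "utf-8")
except TypeError as exc:
    print(exc)  # decoding str is not supported
```

The check keeps the conversion idempotent: callers can pass either form and always get text back.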

  4. anatoly techtonik repo owner

    Does it save the file correctly? In particular, I am not sure what urlparse.urlparse on 2.7 does when passed a unicode argument.

  5. Mobus Dorphin reporter

    Yes, the file is saved correctly for any string I pass, even a filename such as یـــــــــاد.txt.
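The question above concerns Python 2.7's urlparse. As a hedged Python 3 reference point only: urllib.parse.urlparse returns str components for a str URL and bytes components for a bytes URL, and mixing the two raises TypeError. The helper filename_from_url_sketch below is hypothetical and only assumes the filename is taken from the last path segment; it is not the library's own filename logic.

```python
import os.path
from urllib.parse import unquote, urlparse


def filename_from_url_sketch(url):
    """Hypothetical helper: return the last path segment of a str URL."""
    path = urlparse(url).path  # str in, str out
    # unquote() decodes %XX escapes as UTF-8 by default.
    return os.path.basename(unquote(path))


result = filename_from_url_sketch("http://example.com/files/%DB%8C%D8%A7%D8%AF.txt")
print(ascii(result))  # '\u06cc\u0627\u062f.txt'

# A bytes URL yields bytes components instead of text.
print(urlparse(b"http://example.com/a.txt").path)  # b'/a.txt'
```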

  6. Mobus Dorphin reporter

    Granted, the standard output looks like this:

    100% [....................................................................................] 26 / 26
    
    '\xdb\x8c\xd9\x80\xd9\x80\xd9\x80\xd9\x80\xd9\x80\xd9\x80\xd9\x80\xd9\x80\xd9\x80\xd8\xa7\xd8\xaf.txt'
    

    However, the file is saved correctly, and running ls displays the filename correctly as یـــــــــاد.txt.
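The escaped line in the output above looks like the repr of a UTF-8 encoded byte string rather than the decoded text; this is an interpretation, not something confirmed in the thread. A small Python 3 sketch rebuilds the name from code points (ی U+06CC, a run of tatweels U+0640, ا U+0627, د U+062F), assuming nine tatweels as the escaped bytes suggest, and shows that the escaped bytes and the readable name are the same data.

```python
# Rebuild the filename from code points; nine tatweels is an assumption
# read off the escaped bytes shown in the console output.
name = "\u06cc" + "\u0640" * 9 + "\u0627\u062f" + ".txt"
raw = name.encode("utf-8")

print(raw)  # escaped byte form, similar to the console output
# Decoding the bytes gives back the readable name that ls shows.
print(raw.decode("utf-8") == name)  # True
```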

  7. anatoly techtonik repo owner

    Ok. Thanks.

    I need a test suite for all of this. The biggest problem is that if I redirect stdout, the output settings change, so captured output would not match what the terminal shows. It really needs to be something like QEMU or VirtualBox with per-pixel output matching.

  8. anatoly techtonik repo owner

    Can you create a separate issue for fixing unicode output on Ubuntu? It would be nice to have a complete example to inspect how the initial argument is passed.
