fuse-zip displays file names with invalid UTF-8 sequences

When mounting some ZIP archives containing files whose names have non-ASCII characters, fuse-zip exposes those names as invalid UTF-8 sequences.

Example with https://www.hueber.de/shared/audio/schritte-neu/011081_Schritte_Neu_Uebungsgrammatik_V2_Audiodateien.zip

$ fuse-zip -V
fuse-zip version: 0.7.1
libzip version: 1.5.2
FUSE library version: 2.9.9
fusermount version: 2.9.9
using FUSE kernel interface version 7.19

$ echo $LANG
en_US.utf8

$ fuse-zip -r '011081_Schritte_Neu_Uebungsgrammatik_V2_Audiodateien.zip' mnt

$ ls -l mnt
total 0
drwxrwxr-x 60 root root 0 Feb  2  2017 '011081 Schritte Neu '$'\232''bungsgrammatik V2'

$ ls mnt | od -c -tx1
0000000   0   1   1   0   8   1       S   c   h   r   i   t   t   e    
         30  31  31  30  38  31  20  53  63  68  72  69  74  74  65  20
0000020   N   e   u     232   b   u   n   g   s   g   r   a   m   m   a
         4e  65  75  20  9a  62  75  6e  67  73  67  72  61  6d  6d  61
0000040   t   i   k       V   2  \n
         74  69  6b  20  56  32  0a
0000047

Note that the U with umlaut (Ü) is emitted as the single byte <0x9A> (octal 232 in the dump above), which is not a valid UTF-8 sequence. 0x9A is the code page 437 encoding of Ü; the correct UTF-8 sequence would be <0xC3 0x9C>.
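
To illustrate the byte-level difference, here is a minimal Python sketch. It assumes the archive stores the name in code page 437 (the traditional ZIP default); that is an assumption based on the byte value 0x9A seen in the dump, not something verified against the archive's headers. The helper name is_valid_utf8 is made up for this example.

```python
# Raw bytes of the directory name, copied from the od dump above.
raw_name = b"011081 Schritte Neu \x9abungsgrammatik V2"


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Assumption: the name is encoded in CP437, where byte 0x9A is 'Ü'.
decoded = raw_name.decode("cp437")

# Re-encode as UTF-8, which is what a UTF-8 locale expects to see.
utf8_name = decoded.encode("utf-8")

print(is_valid_utf8(raw_name))   # the raw bytes are not valid UTF-8
print(decoded)
print(utf8_name[20:22].hex())    # the two bytes that replace 0x9A
```

Under that assumption, the lone 0x9A byte becomes the two-byte sequence C3 9C, and the name grows by one byte.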

In fuse-zip’s code, calls to zip_get_name use the ZIP_FL_ENC_RAW flag. Changing this flag to ZIP_FL_ENC_GUESS fixes the issue. With the attached patch:

$ ls -l mnt
total 0
drwxrwxr-x 60 root root 0 Feb  2  2017 '011081 Schritte Neu Übungsgrammatik V2'

$ ls mnt | od -c -tx1
0000000   0   1   1   0   8   1       S   c   h   r   i   t   t   e    
         30  31  31  30  38  31  20  53  63  68  72  69  74  74  65  20
0000020   N   e   u     303 234   b   u   n   g   s   g   r   a   m   m
         4e  65  75  20  c3  9c  62  75  6e  67  73  67  72  61  6d  6d
0000040   a   t   i   k       V   2  \n
         61  74  69  6b  20  56  32  0a
0000050
