Git zip files with UTF-8 characters don't extract properly on Windows (BB-8425)

Issue #7178 closed
Jason Worley
created an issue

If a Git repository hosted on Bitbucket contains files/paths that use international characters such as "ä" and "ö", the corresponding file names are corrupted (e.g. "ö" gets replaced with "├╢") in a ZIP package download of the repository that Bitbucket provides.
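
The "├╢" pattern is consistent with the UTF-8 bytes of the file name being decoded as code page 437, the legacy DOS encoding that ZIP tools commonly fall back to when an entry is not marked as UTF-8. A minimal Python sketch of that assumption (the helper name `as_legacy_zip_name` is made up for illustration):

```python
def as_legacy_zip_name(name: str) -> str:
    """Show how a UTF-8 encoded name looks when its bytes are read as CP437."""
    return name.encode("utf-8").decode("cp437")


print(as_legacy_zip_name("ö"))  # ├╢
```

If a different fallback code page were in play, the garbled characters would differ, so this only supports the CP437 reading; it does not prove which tool is at fault.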

I've created a public repository in Bitbucket that should show you the problem:

Comments (11)

  1. Brodie Rao

    I think this is dependent on the zip utility you're using. With Info-Zip unzip 5.52 (that comes with OS X 10.8), it incorrectly interprets the file paths in the zip file. However, if you use ditto (which is what OS X runs when you double click on a zip file), it unzips it correctly.

    From what I can see in Git's changelog, newer versions may produce zip files that extract properly with more zip utilities (not because it was generating the wrong thing, but because it adds extra metadata to work around bugs in those utilities). However, that version of unzip still doesn't work properly with the latest version of Git.

    I think the bottom line is that if you need those characters to be preserved properly, you should be using .tar.gz or .tar.bz2, not .zip.

  2. Kalle Immonen

    I created the original customer support query about this bug.

    Jason: I would have appreciated it if you had somehow marked that you quoted that piece of text from me (the test repo is also mine).

    Brodie: I agree that it's not a bug in Bitbucket per se (as I understand it, it's also not a bug in unzip utilities, but a flaw in the ZIP file specification). But it's not up to me to decide to use tar.gz or tar.bz2 because Bitbucket doesn't provide those download formats!

  3. Brodie Rao

    Hi Kalle,

    While the download link on the repo overview page links to the zip file, we do in fact support .tar.gz and .tar.bz2. You can find links to those in the download section of your repo (under tags and branches). You can also just change the file extension of any download link from .zip to .tar.gz or .tar.bz2 and access it that way.

    Tar is encoding agnostic, and as far as I know, almost every tar extraction utility will use the path names in the tarball verbatim when creating those files on disk. The problem with Info-Zip's unzip is that it's trying to interpret the file paths, and it ends up garbling them in the process. Tar doesn't do that.

    Edit: I should add that the current zip specification does allow for UTF-8 path names (but not any other encodings, and it is not encoding agnostic). The zip files Git outputs contain both ASCII path names and UTF-8 path names as extra metadata. The issue you're running into is specific to Info-Zip unzip. If you use another zip utility (like ditto or 7z), you'll see that it extracts the zip file correctly.
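
    One way the zip format marks a UTF-8 path name is general-purpose flag bit 11 on the entry. The sketch below uses Python's standard `zipfile` module, which sets that bit when a name is not pure ASCII. It illustrates the spec mechanism only and makes no claim about how Git writes its archives; `utf8_flags` and `UTF8_NAME_FLAG` are names invented for this example.

    ```python
    import io
    import zipfile

    UTF8_NAME_FLAG = 0x800  # general-purpose bit 11: name is UTF-8


    def utf8_flags(names):
        """Write empty entries to an in-memory zip, then report the flag per name."""
        buf = io.BytesIO()
        with zipfile.ZipFile(buf, "w") as zf:
            for name in names:
                zf.writestr(name, b"")
        with zipfile.ZipFile(buf) as zf:
            return {
                info.filename: bool(info.flag_bits & UTF8_NAME_FLAG)
                for info in zf.infolist()
            }


    print(utf8_flags(["plain.txt", "ö.txt"]))
    ```

    An extractor that ignores this bit has to guess the encoding of the name, which is where garbled paths can come from.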

  4. Kalle Immonen

    Ah, good to know that you provide .tar.gz and .tar.bz2 as well, thanks.

    I don't know where you got the idea that I'm using Info-Zip's unzip, though? In fact, I'm using the latest version of 7-Zip on Windows, and it doesn't interpret the file names "correctly" in those ZIP files (but I guess "correct" doesn't really exist anyways). The tar.gz and tar.bz2 files don't work at all (it doesn't recognize the .tar as valid -- but .tar files without international characters in them open fine).

    At this point I've pretty much lost interest in this issue; it doesn't seem to be fixable in any good way. But I don't think the issue is quite as clear cut as you make it out to be.

    It seems the only solution for me at this point is to find a ZIP utility that lets me manually pick the character encoding to use.
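
    For what it's worth, picking the encoding by hand can be sketched in a few lines of Python. This assumes the reader is Python's `zipfile`, which decodes names without the UTF-8 flag as CP437, so the original bytes can be recovered and decoded again. `redecode_name` is a hypothetical helper, not part of any library.

    ```python
    import zipfile


    def redecode_name(info: zipfile.ZipInfo, encoding: str) -> str:
        """Re-read an entry name in `encoding` when the UTF-8 flag is unset."""
        if info.flag_bits & 0x800:
            return info.filename  # flagged as UTF-8, already decoded correctly
        # Undo the CP437 decoding, then decode with the chosen encoding.
        return info.filename.encode("cp437").decode(encoding)


    # Usage: a garbled name as it would appear after a CP437 fallback.
    print(redecode_name(zipfile.ZipInfo("├╢.txt"), "utf-8"))  # ö.txt
    ```

    A wrong guess for `encoding` either raises `UnicodeDecodeError` or yields a different kind of garbage, so the choice still has to come from knowing how the archive was produced.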

  5. Brodie Rao
    • changed status to open

    Hi Kalle,

    Sorry about that. When Jason originally submitted the issue, I intended to mention unzip as an example of something that was broken. I should've asked you what utility you were using.

    It looks like the zip improvements in Git 1.8 should make the zip file work with 7-Zip on Windows (and Windows' built-in zip utility). I'm going to upgrade our test servers shortly, and roll out the upgrade to the live site in a week or so. When it goes out, the zip files we generate should work on Windows.
