UTF-16 little endian files are downloaded rather than displayed (BB-6918)

Issue #5648 resolved
Eric Knibbe created an issue

When working in AppleScript, using Script Editor to save a file as text generally encodes it in Latin-9, or Mac OS Roman if accented characters are used. But if any characters not covered by those sets are present, the text file is encoded as UTF-16LE. And if such a file is clicked on in Bitbucket, the file is downloaded rather than displayed and colourized inline.

As an example, here's a UTF-16LE file: https://bitbucket.org/EricFromCanada/ericfromcanada.bitbucket.org/src/55865a3ee2bf/applescript/close%20Safari%20Web%20Inspector.applescript

And its next revision, after conversion to UTF-8+BOM: https://bitbucket.org/EricFromCanada/ericfromcanada.bitbucket.org/src/51ba083a8253/applescript/close%20Safari%20Web%20Inspector.applescript
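
For readers who need the same workaround, a conversion like the one described above could be sketched in Python as follows. This is only an illustration, not necessarily how the file in the linked revision was converted; the function name `utf16le_to_utf8_bom` and the sample text are made up for the example.

```python
import codecs


def utf16le_to_utf8_bom(data: bytes) -> bytes:
    """Re-encode UTF-16LE bytes (with or without a BOM) as UTF-8 with a BOM."""
    # Drop a leading UTF-16LE BOM, if present, so it does not end up in the text.
    if data.startswith(codecs.BOM_UTF16_LE):
        data = data[len(codecs.BOM_UTF16_LE):]
    text = data.decode("utf-16-le")
    # The "utf-8-sig" codec prepends the UTF-8 BOM (EF BB BF) when encoding.
    return text.encode("utf-8-sig")


# Hypothetical sample content standing in for a script saved as UTF-16LE.
sample_text = 'tell application "Safari" to activate \u2192'
sample = codecs.BOM_UTF16_LE + sample_text.encode("utf-16-le")
converted = utf16le_to_utf8_bom(sample)
```

To convert a real file, read it in binary mode, pass the bytes through the function, and write the result back in binary mode.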

Comments (31)

  1. Ben Lachman

    I don't think trivial is the correct priority for this bug. Cocoa projects lose the ability to diff strings files because of it, which makes the localization process super annoying.

  2. Ben Lachman

    Good link Eric. That definitely improves things. I'm kind of surprised Xcode doesn't auto convert these at this point since the feature has been around for quite a while.

  3. _dev_

    I would expect that normal UTF-16 files (the ones with a BOM) should definitely be classed as text files. In any case, file content is already examined when deciding what type it is.

  4. Jeff Gardner

    This is much more than a minor issue, especially if you expect applications to use externalized resource bundles for i18n.

  5. anwenkom Dev-Team

    We have a policy that all SQL files have to be UTF-16, and BB's current behavior is super annoying: I can't diff or view any of the SQL files in my repositories.

  6. Will Brown

    My organization was looking to move our Microsoft BizTalk Server codebase (which is developed in Visual Studio) into Bitbucket.org, but most of the BizTalk code artifacts are UTF-16 LE BOM... As long as we didn't attempt merges, we were ok, but as soon as we did, the code became corrupt and unreadable.

    I'm a Git noob (I pretty much rely on the SourceTree GUI) so I was hoping this was my fault, but it looks like I'm not alone in the UTF-16 woes...

  7. Scoopta

    All of my code is UTF-16BE and it'd be really nice to actually have a source view on the website. There is a Mercurial extension for UTF-16 diffs. I haven't used it, but that's primarily because I haven't needed diffs; otherwise I'd probably give it a try. Either way, that doesn't fix the issues on Bitbucket's end.

  8. Abhin Chhabra Account Deactivated

    Sorry for the wait. A fix for this has been merged and will be included in the next deploy.

  9. Abhin Chhabra Account Deactivated

    @Scoopta Yes. The fix checks for UTF-8, UTF-16-LE, UTF-16-BE, UTF-32-LE and UTF-32-BE BOMs.

  10. Abhin Chhabra Account Deactivated

    The fix has been deployed to production. The example link (in the bug report description) now works as expected. Bitbucket now respects the BOM in the file and doesn't consider files starting with the BOMs for UTF-8, UTF-16-LE, UTF-16-BE, UTF-32-LE and UTF-32-BE to be binary.
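
A BOM check of the kind described in this comment might look roughly like the Python sketch below. It is not Bitbucket's actual implementation, which is not shown in this thread; the names `BOMS` and `detect_bom` are hypothetical.

```python
import codecs

# Order matters: the UTF-32-LE BOM (FF FE 00 00) begins with the
# UTF-16-LE BOM (FF FE), so the longer UTF-32 marks are tested first.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(data: bytes):
    """Return the encoding named by a leading BOM, or None if there is none."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None
```

A file for which `detect_bom` returns an encoding name can be decoded and shown as text instead of being treated as binary. One known ambiguity of this approach: a UTF-16-LE file whose first character after the BOM is U+0000 is indistinguishable from UTF-32-LE.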

  11. Tom Kedem

    Hey, thanks a lot for finally fixing this :)

    However, it's still not possible to view commit diffs... I see a modified marker, but it says "File contents unchanged." while showing +0 and -0 lines added/removed, which is not the case. I expect those files to get the same treatment as regular text files.

  12. Abhin Chhabra Account Deactivated

    You're right @vToMy. But since that issue is unrelated to this one (this one was about the source view), I've created a separate ticket to track it (https://bitbucket.org/site/master/issues/13930/utf-16-and-utf-32-files-dont-show-up-in).

    In this case, the issue is that Git itself doesn't (by default) recognize UTF-16 files as text. In fact, running git show locally on a commit that updates a UTF-16 file also reports both versions of the file as binary. As mentioned in that new ticket, it is possible to fix this, but it would have to be a separate piece of work.
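
To illustrate why Git's default behaviour flags UTF-16 files as binary: to the best of my understanding, Git's heuristic looks for a NUL byte near the start of the content, and UTF-16 encodes every ASCII character with a zero byte. The Python sketch below mimics that idea; the 8000-byte window and the name `looks_binary` are assumptions made for illustration, not Git's code.

```python
# Assumed size of the window inspected for NUL bytes (illustrative value).
FIRST_FEW_BYTES = 8000


def looks_binary(data: bytes) -> bool:
    """Treat content as binary if a NUL byte occurs in the leading window."""
    return b"\x00" in data[:FIRST_FEW_BYTES]


utf8_bytes = "hello".encode("utf-8")       # 68 65 6c 6c 6f
utf16_bytes = "hello".encode("utf-16-le")  # 68 00 65 00 6c 00 6c 00 6f 00

utf8_is_binary = looks_binary(utf8_bytes)
utf16_is_binary = looks_binary(utf16_bytes)
```

Under this heuristic the UTF-8 bytes pass as text while the UTF-16 bytes are classified as binary, which matches the behaviour described in this comment.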

  13. Nate Cook

    Dang, I thought this issue was about diffs. Just voted for the new issue. Hopefully it won't take 5 years to do that one. :-)
