Issue #5648 open

UTF-16 little endian files are downloaded rather than displayed (BB-6918)

Eric Knibbe
created an issue

When working in AppleScript, saving a file as text from Script Editor generally encodes it in Latin-9, or in Mac OS Roman if accented characters are used. If the file contains any characters not covered by those sets, however, it is encoded as UTF-16LE. When such a file is clicked on in Bitbucket, it is downloaded rather than displayed and colourized inline.

As an example, here's a UTF-16LE file: https://bitbucket.org/EricFromCanada/ericfromcanada.bitbucket.org/src/55865a3ee2bf/applescript/close%20Safari%20Web%20Inspector.applescript

And its next revision, after conversion to UTF-8+BOM: https://bitbucket.org/EricFromCanada/ericfromcanada.bitbucket.org/src/51ba083a8253/applescript/close%20Safari%20Web%20Inspector.applescript
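For anyone wanting to do the same conversion by hand, here is a minimal Python sketch of re-encoding UTF-16LE bytes as UTF-8 with a BOM. The function name `utf16le_to_utf8_bom` is made up for illustration, and the sketch assumes the input really is little-endian UTF-16 (with or without a leading BOM); it is not necessarily how the revision above was produced.

```python
import codecs


def utf16le_to_utf8_bom(data: bytes) -> bytes:
    """Re-encode UTF-16LE bytes as UTF-8 with a leading BOM.

    Hypothetical helper: assumes the input is little-endian UTF-16.
    """
    # Drop a UTF-16LE BOM (FF FE) if one is present, so it is not
    # carried over into the decoded text as U+FEFF.
    if data.startswith(codecs.BOM_UTF16_LE):
        data = data[len(codecs.BOM_UTF16_LE):]
    text = data.decode("utf-16-le")
    # The "utf-8-sig" codec prepends the UTF-8 BOM (EF BB BF) when encoding.
    return text.encode("utf-8-sig")


if __name__ == "__main__":
    sample = codecs.BOM_UTF16_LE + "tell application \u201cSafari\u201d".encode("utf-16-le")
    print(utf16le_to_utf8_bom(sample))
```

Reading the whole file as bytes, passing it through this function, and writing the result back would give a file in the same encoding as the second revision linked above.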

Comments (20)

  1. Ben Lachman

    I don't think trivial is the correct priority for this bug. Cocoa projects lose the ability to diff strings files because of it, which makes the localization process super annoying.

  2. Ben Lachman

    Good link, Eric. That definitely improves things. I'm kind of surprised Xcode doesn't auto-convert these at this point, since the feature has been around for quite a while.

  3. _dev_

    I would expect that normal UTF-16 files (the ones with a BOM) should definitely be classed as text files. In any case, file content is already examined when deciding what type it is.

  4. Will Brown

    My organization was looking to move our Microsoft BizTalk Server codebase (which is developed in Visual Studio) into Bitbucket.org, but most of the BizTalk code artifacts are UTF-16 LE BOM... As long as we didn't attempt merges, we were ok, but as soon as we did, the code became corrupt and unreadable.

    I'm a Git noob (I pretty much rely on the SourceTree GUI) so I was hoping this was my fault, but it looks like I'm not alone in the UTF-16 woes...

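The suggestion in comment 3, that files starting with a BOM be classed as text when the content is examined, could look roughly like the Python sketch below. This is only an illustration of the idea, not Bitbucket's actual detection code; the names `sniff_bom` and `looks_like_text` are hypothetical, and the NUL-byte fallback for files without a BOM is just one common heuristic.

```python
import codecs

# UTF-32 BOMs are checked first because the UTF-32 LE BOM (FF FE 00 00)
# begins with the UTF-16 LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes):
    """Return (encoding, bom_length) if data starts with a known BOM, else None."""
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            return encoding, len(bom)
    return None


def looks_like_text(data: bytes) -> bool:
    """Hypothetical classifier: text if a BOM is present and the rest decodes."""
    hit = sniff_bom(data)
    if hit is not None:
        encoding, skip = hit
        try:
            data[skip:].decode(encoding)
            return True
        except UnicodeDecodeError:
            return False
    # No BOM: fall back to treating any NUL byte as a sign of binary data.
    # This is why BOM-less UTF-16 tends to be mistaken for binary.
    return b"\x00" not in data


if __name__ == "__main__":
    utf16 = codecs.BOM_UTF16_LE + "on run".encode("utf-16-le")
    print(sniff_bom(utf16), looks_like_text(utf16))
```

With a check like this, a UTF-16 file carrying a BOM is recognized as text before the NUL-byte test gets a chance to flag it as binary, and the detected encoding can then be used to decode it for display.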