Repository character encoding detection isn't accurate in some cases (BB-2979)

Karlo Bruni created an issue

Dear BitBucket!

When I create a folder named "РегистрСведений1", it appears with the name "–егистр—ведений1".

You can find this bug in this repo -

Thank you very much for your response!

Comments (9)

  1. David Chambers

    Unless I'm mistaken, this is not a Bitbucket error. If a file's name is correctly encoded it will display correctly on Bitbucket. See, for example, davidchambers/i18n-test/src/35c52e8acd66/Семестр4.

    How was the folder created? If your operating system is letting you down you could use the mkdir command (or the Windows equivalent). This is how I created the folder displayed in the aforementioned link.

  2. Brodie Rao

    The problem here is that we're trying to convert the filename to Unicode, but our detection code thinks it's MacCyrillic instead of windows-1251.
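The mismatch described above can be reproduced outside Bitbucket. The following is a minimal sketch that assumes the filename was stored as windows-1251 bytes and then decoded with Python's `mac_cyrillic` codec. It is not Bitbucket's detection code, and the variable names are illustrative only.

```python
# Sketch only: reproduces the garbled name from the report by decoding
# windows-1251 bytes as MacCyrillic. Not Bitbucket's actual code path.
original = "РегистрСведений1"

# Assumption: the client stored the folder name as windows-1251 bytes.
raw = original.encode("cp1251")

# Decoding the same bytes as MacCyrillic turns the two capital letters
# into dashes, while the lowercase letters happen to survive.
garbled = raw.decode("mac_cyrillic")
print(garbled)

# Decoding with the correct encoding restores the intended name.
print(raw.decode("cp1251"))
```

In both encodings the bytes 0xE0 to 0xFE map to the same lowercase letters, which is why only "Р" (0xD0) and "С" (0xD1) are affected in this name.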

    Detecting character encoding is inherently about making educated guesses, so we can't always get it right. When this happens, we do try to degrade gracefully, and I think we have in this case.

    That said, we can probably improve our character encoding detection. For example, we could run detection at a whole-repo level (using every filename) instead of at a per-filename/path level. I've filed an internal issue to take a look at doing this.
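One way to picture whole-repo detection is sketched below. It assumes a small fixed list of candidate encodings and a deliberately naive scoring rule (letters and digits count for, other non-ASCII symbols count against). The function names `score` and `guess_repo_encoding`, the candidate list, and the sample filenames are all hypothetical and are not taken from Bitbucket.

```python
import unicodedata

# Assumption: a short, fixed list of candidates, tried in this order.
CANDIDATES = ("utf-8", "cp1251", "mac_cyrillic")


def score(text):
    """Naive plausibility score: +1 per letter or digit,
    -1 per non-ASCII character that is neither."""
    total = 0
    for ch in text:
        if unicodedata.category(ch)[0] in ("L", "N"):
            total += 1
        elif ord(ch) > 127:
            total -= 1
    return total


def guess_repo_encoding(raw_names, candidates=CANDIDATES):
    """Pick the candidate whose decoding of ALL filenames scores highest.

    Candidates that cannot decode every name are skipped. On a tie,
    the earlier candidate wins.
    """
    best, best_score = None, None
    for enc in candidates:
        try:
            decoded = [raw.decode(enc) for raw in raw_names]
        except UnicodeDecodeError:
            continue
        total = sum(score(name) for name in decoded)
        if best_score is None or total > best_score:
            best, best_score = enc, total
    return best


# Hypothetical repository listing, stored as windows-1251 bytes.
names = ["РегистрСведений1", "Документы", "Справочник"]
raw_names = [n.encode("cp1251") for n in names]
print(guess_repo_encoding(raw_names))
```

The point of pooling filenames is that a single name can be ambiguous: an all-lowercase Cyrillic name decodes to the same text under windows-1251 and MacCyrillic, so only the other names in the repository can break the tie. A real detector would need a far better scoring model than this one.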
