Issue #9915 resolved

Doubled wiki-pages and removal problems if name contains Russian «й» symbol.

yaroslavkharitonov
created an issue

Hello BitBucket Team!

Whenever I create a wiki page online with the Russian «й» symbol in its name, I get an immediate ‘uncommitted changes’ message after pulling the changes locally. After committing and pushing, I see doubled wiki pages online and run into problems removing them.

My issue looks very similar to «Can not remove file with wrong encoding from repo» opened about 10 hours ago: https://bitbucket.org/site/master/issue/9911/can-not-remove-file-with-wrong-encoding

Thanks!

Comments (7)

  1. Paul Phönixweiß

    +1. This may also sometimes affect other uncommon Latin characters, such as the German "ß", "ö", "ä", "ü".

    I think all of these problems come down to the encoding of non-UTF-8 characters.

    In fact, it's very difficult to check every file for proper encoding in any repo with more than 1000 files.

  2. yaroslavkharitonov reporter

    I'm not sure about "non-UTF-8 characters". Actually, I noticed this bug whenever I used "интерфейс" ("interface") in wiki-page names, while there was no problem with other Russian words, e.g. "веб" ("web"). So I started playing around with the "интерфейс" string: I created a page named "и" - ok, then renamed it to "ин" - ok, "инт" - ok, "инте" - ok, "интер" - ok, "интерф" - ok, "интерфе" - ok, "интерфей" - BUG!

    Created page "test_й" - BUG!
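
Read alongside the diagnosis given later in this thread (base character plus accent versus the combined character), the character-by-character test above can be reproduced offline. The following is only a sketch: the helper name `chars_changed_by_nfd` is hypothetical, and it assumes the relevant difference is Unicode canonical decomposition (NFD), which the thread does not name explicitly.

```python
import unicodedata


def chars_changed_by_nfd(text):
    """Return the characters of `text` that NFD rewrites as base + combining mark.

    Hypothetical helper for illustration; not part of Bitbucket or git.
    """
    return [ch for ch in text if unicodedata.normalize("NFD", ch) != ch]


# "интерфейс" contains "й", which decomposes into "и" + combining breve.
print(chars_changed_by_nfd("интерфейс"))  # ['й']

# "веб" contains no decomposable characters.
print(chars_changed_by_nfd("веб"))  # []
```

Under that assumption, only names containing a decomposable character such as "й" would be at risk, which matches the behaviour reported above.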

  3. Abhin Chhabra [Atlassian] staff

    Hi yaroslavkharitonov

    I was able to create a wiki on a repo with the name й.md and was able to make and push changes to it using both the web-ui and a terminal. I am using UTF-8 encoding on my computer. Please make sure your $LANG environment variable is set to en_US.UTF-8 and let me know if the error persists.

  4. yaroslavkharitonov reporter

    Hello Abhin Chhabra [Atlassian]

    Yes, it was ru_RU.UTF-8:

    $ locale
    LANG="ru_RU.UTF-8"
    LC_COLLATE="ru_RU.UTF-8"
    LC_CTYPE="ru_RU.UTF-8"
    LC_MESSAGES="ru_RU.UTF-8"
    LC_MONETARY="ru_RU.UTF-8"
    LC_NUMERIC="ru_RU.UTF-8"
    LC_TIME="ru_RU.UTF-8"
    LC_ALL=
    

    But changing it to en_US.UTF-8

    $ export LANG="en_US.UTF-8"
    $ locale
    LANG="en_US.UTF-8"
    LC_COLLATE="en_US.UTF-8"
    LC_CTYPE="en_US.UTF-8"
    LC_MESSAGES="en_US.UTF-8"
    LC_MONETARY="en_US.UTF-8"
    LC_NUMERIC="en_US.UTF-8"
    LC_TIME="en_US.UTF-8"
    LC_ALL=
    

    didn't solve the problem - wiki pages with "й" are still doubled online after a local pull and push.

  5. Abhin Chhabra [Atlassian] staff

    Hi yaroslavkharitonov

    We've been digging into this issue further, and it seems that the problem is due to the way your machine encodes the filename when it creates the new object (after you make a commit on your local machine). The root cause is really that git is not Unicode-aware.

    On our server, I can see two files in your last commit with the names t_\320\270\314\206.md and t_\320\271.md. If you look up these byte sequences, you'll notice that the first one is encoded as a base character plus a combining accent (the decomposed form), while the second one uses the single combined (precomposed) character.

    So basically the problem is that your environment encodes the above-mentioned character differently from Bitbucket itself. Since this problem has to do with your local machine (possibly your editor or your OS), I'm going to mark this issue as closed. I apologize for not being able to provide any further assistance on this.

    Abhin
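
The two file names quoted in the comment above can be inspected with a short script. This is a minimal sketch, assuming the octal escapes are UTF-8 bytes as git prints them; the variable names are illustrative only, and the use of Unicode normalization forms (NFC/NFD) to relate the two names is an interpretation, not something the comment states.

```python
import unicodedata

# Byte sequences copied from the comment, written with the same octal escapes.
decomposed = b"t_\320\270\314\206.md".decode("utf-8")
precomposed = b"t_\320\271.md".decode("utf-8")

# The names look alike when rendered but are different code point sequences.
print(decomposed == precomposed)  # False
print(len(decomposed), len(precomposed))  # 7 6

# List the code points that make up the Cyrillic part of each name.
for name in (decomposed, precomposed):
    print([f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in name[2:-3]])

# Normalizing both names to the same form makes them compare equal.
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True
```

Because git compares file names byte by byte, two names that differ only in normalization form are stored as separate files, which is consistent with the doubled pages described in this issue.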
