Issue #2422 resolved

README.markdown and UTF-8 (BB-1092)

created an issue

For some reason the README parser has a slight hiccup when the README uses a UTF-8 (and perhaps other Unicode?) encoding.

In my particular README, the first line uses the "#" syntax for a header, and it doesn't get parsed correctly. If I switch the file to the ANSI encoding and push it to the repo, the file gets parsed correctly.

See: https://bitbucket.org/seth/sandbox/src

Comments (4)

  1. Erik van Zijst staff

    Hi Seth,

    I cloned your public repo and saw that your README.markdown file does not really start with a '#'. In fact, the first 3 bytes of your UTF-8 file (EF BB BF) encode the Unicode code point 0xFEFF.

    Apparently 0xFEFF is the "Zero-Width No-Break Space" character, which I suppose explains why you wouldn't see it on the screen (http://acronyms.thefreedictionary.com/ZWNBSP).

    Actually, if you look at the file's source page (https://bitbucket.org/seth/sandbox/src/1be43d012f6a/README.markdown) you can see the character taking up space (because that page uses a monospaced font :) ).

    To fix it, you can remove the first 3 bytes from the file.

    Cheers, Erik
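
The fix suggested above (dropping the first 3 bytes) can be sketched in Python. This is only an illustration, not anything Bitbucket provides: the helper names `strip_utf8_bom` and `remove_bom_from_file` are hypothetical, and the sketch assumes the file is small enough to read into memory.

```python
import codecs
from pathlib import Path


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (bytes EF BB BF), if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def remove_bom_from_file(path: str) -> bool:
    """Rewrite the file without its BOM. Returns True if a BOM was removed."""
    file_path = Path(path)
    original = file_path.read_bytes()
    cleaned = strip_utf8_bom(original)
    if cleaned != original:
        file_path.write_bytes(cleaned)
        return True
    return False


# 0xFEFF encoded as UTF-8 is exactly the three bytes EF BB BF.
sample = "\ufeff# Heading\n".encode("utf-8")
assert sample[:3] == b"\xef\xbb\xbf"
assert strip_utf8_bom(sample) == b"# Heading\n"
```

Calling `remove_bom_from_file("README.markdown")` would rewrite the file in place only when a BOM is actually present, so running it on an already clean file changes nothing.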

  2. Erik van Zijst staff
    • changed status to open

    Hi Seth,

    I didn't realize 0xFEFF was used by Unicode as the BOM (byte order mark). It is neither required nor recommended when UTF-8 is used as the encoding, and apparently it may also confuse compilers and script shebang lines.

    Having said that (and having only just learned that!), having the Unicode BOM in UTF-8 is not illegal, and it would be nice if we handled it more gracefully.

    I have raised an issue on our internal bugtracker for it.

    Cheers, Erik
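
One way a README renderer could tolerate the BOM, as discussed above, is to decode with Python's built-in `utf-8-sig` codec, which drops a leading BOM when present and otherwise behaves like plain UTF-8. This is a sketch under assumptions, not a description of how Bitbucket fixed the issue; `decode_readme` is a hypothetical name.

```python
def decode_readme(raw: bytes) -> str:
    """Decode README bytes as UTF-8, ignoring an optional leading BOM."""
    # "utf-8-sig" strips EF BB BF at the start of the input if it is there.
    return raw.decode("utf-8-sig")


with_bom = b"\xef\xbb\xbf# Title\n"
without_bom = b"# Title\n"

# Both inputs decode to the same text, so the '#' is the first character
# the Markdown parser sees in either case.
assert decode_readme(with_bom) == decode_readme(without_bom) == "# Title\n"

# Plain "utf-8" would keep the BOM as an invisible first character instead.
assert with_bom.decode("utf-8").startswith("\ufeff")
```

Decoding this way leaves the stored file untouched, which suits a renderer that should not rewrite repository contents.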
