console utils for NT: BOM instead of national letters

Issue #333 new
Yuri Safonov created an issue

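Console utilities print error messages in which each national (non-ASCII) letter is replaced by the bytes EF BF BD. That sequence is the UTF-8 encoding of U+FFFD, the replacement character; a UTF-8 byte order mark would be EF BB BF.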

example: rm.exe file-not-found

rm: file-not-found: пїЅпїЅ пїЅпїЅпїЅпїЅпїЅпїЅпїЅ пїЅпїЅпїЅпїЅпїЅ пїЅпїЅпїЅпїЅпїЅпїЅпїЅпїЅпїЅ пїЅпїЅпїЅпїЅ.

Comments (3)

  1. Yuri Safonov reporter

    char *s="\xEF\xF0\xE8\xE2\xE5\xF2"; // word "привет"
    fprint(2,"%s",s); // bad: get "пїЅпїЅпїЅпїЅ"

    but fprint(2,"\xEF\xF0\xE8\xE2\xE5\xF2\n"); // ok: get "привет"

  2. Charles Forsyth

    Windows support for Unicode from the command line seems to be generally troublesome: http://stackoverflow.com/questions/388490/unicode-characters-in-windows-command-line-how is just one page, and several others turn up in Google, but I can't find one that works out how to get everything working correctly. Java has similar trouble with Unicode characters in arguments and console output. Basically, UTF-8 is still not properly supported.

    The fprint one is a little odd, if there's a difference between the two contexts for literal strings. I'd have expected the problem to be the Windows (Visual Studio) assumption that the source text is in the current "code page" (nicely 1980s) unless it is in UTF-16, with or without a BOM. It can only be UTF-8 if it has a BOM, even though the whole point of UTF-8 is that there isn't a byte order as such. (Obviously they use it as an elaborate flag.) http://www.nubaria.com/en/blog/?p=289 discusses some of the problems.

    I'll leave it open for now, since there might be a complete scheme to fix it that's not well known, although it doesn't look very hopeful.
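
One possible reading of the `%s` case above, offered as an assumption and not confirmed against the fprint source: if the `%s` path decodes its argument as UTF-8 and re-encodes each character, while literal text in the format string is copied through byte for byte, then every CP1251 byte that does not start a valid UTF-8 sequence would come out as U+FFFD, which encodes to EF BF BD. The sketch below uses hypothetical helpers (`decode_rune`, `encode_rune`, `reencode`) and a simplified decoder to show that effect; it is not the library's code.

```c
#include <assert.h>
#include <stddef.h>

enum { RUNE_ERROR = 0xFFFD };	/* Unicode replacement character */

/*
 * Decode one UTF-8 sequence from s (n bytes available) into *r.
 * Returns the number of bytes consumed. A malformed or truncated
 * sequence consumes one byte and yields RUNE_ERROR.
 * Simplified: overlong forms and surrogates are not rejected.
 */
static size_t
decode_rune(const unsigned char *s, size_t n, unsigned long *r)
{
	size_t i, len;
	unsigned long c;

	assert(n > 0);
	c = s[0];
	if(c < 0x80){ *r = c; return 1; }
	if((c & 0xE0) == 0xC0){ len = 2; c &= 0x1F; }
	else if((c & 0xF0) == 0xE0){ len = 3; c &= 0x0F; }
	else if((c & 0xF8) == 0xF0){ len = 4; c &= 0x07; }
	else { *r = RUNE_ERROR; return 1; }	/* stray continuation or invalid lead byte */
	if(len > n){ *r = RUNE_ERROR; return 1; }
	for(i = 1; i < len; i++){
		if((s[i] & 0xC0) != 0x80){ *r = RUNE_ERROR; return 1; }
		c = (c << 6) | (s[i] & 0x3F);
	}
	*r = c;
	return len;
}

/* Encode rune r as UTF-8 into out; returns the number of bytes written (1 to 4). */
static size_t
encode_rune(unsigned long r, unsigned char *out)
{
	if(r < 0x80){
		out[0] = (unsigned char)r;
		return 1;
	}
	if(r < 0x800){
		out[0] = (unsigned char)(0xC0 | (r >> 6));
		out[1] = (unsigned char)(0x80 | (r & 0x3F));
		return 2;
	}
	if(r < 0x10000){
		out[0] = (unsigned char)(0xE0 | (r >> 12));
		out[1] = (unsigned char)(0x80 | ((r >> 6) & 0x3F));
		out[2] = (unsigned char)(0x80 | (r & 0x3F));
		return 3;
	}
	out[0] = (unsigned char)(0xF0 | (r >> 18));
	out[1] = (unsigned char)(0x80 | ((r >> 12) & 0x3F));
	out[2] = (unsigned char)(0x80 | ((r >> 6) & 0x3F));
	out[3] = (unsigned char)(0x80 | (r & 0x3F));
	return 4;
}

/*
 * Pass in[0..n) through decode followed by encode.
 * out needs room for 3*n bytes (worst case: every byte becomes EF BF BD).
 * Returns the number of bytes written.
 */
size_t
reencode(const unsigned char *in, size_t n, unsigned char *out)
{
	size_t i = 0, o = 0;
	unsigned long r;

	while(i < n){
		i += decode_rune(in + i, n - i, &r);
		o += encode_rune(r, out + o);
	}
	return o;
}
```

Feeding `reencode` the six bytes EF F0 E8 E2 E5 F2 yields eighteen bytes, six repetitions of EF BF BD, which show up as "пїЅ" repeated when viewed as CP1251. Valid UTF-8 such as D0 BF passes through unchanged.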

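
If the utilities expect UTF-8 internally while the text they receive is in the current code page, as the discussion of code-page assumptions suggests may be the case, one conceivable workaround is to convert such text to UTF-8 before formatting it. The following is only a sketch under that assumption: `cp1251_to_utf8` is a hypothetical helper, it handles only ASCII and the contiguous Cyrillic block of CP1251 (bytes 0xC0 to 0xFF, which map to U+0410 to U+044F), and it replaces every other byte with `?`.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Convert CP1251 bytes in[0..n) to UTF-8.
 * Only ASCII and bytes 0xC0..0xFF (U+0410..U+044F) are mapped;
 * any other byte becomes '?'. Hypothetical helper for illustration.
 * out must have room for 2*n bytes. Returns the number of bytes written.
 */
size_t
cp1251_to_utf8(const unsigned char *in, size_t n, unsigned char *out)
{
	size_t i, o = 0;
	unsigned int r;

	assert(in != NULL && out != NULL);
	for(i = 0; i < n; i++){
		if(in[i] < 0x80){
			out[o++] = in[i];	/* ASCII is unchanged */
			continue;
		}
		if(in[i] < 0xC0){
			out[o++] = '?';		/* outside the range this sketch covers */
			continue;
		}
		r = 0x0410 + (in[i] - 0xC0);	/* Cyrillic letter as a code point */
		out[o++] = (unsigned char)(0xC0 | (r >> 6));
		out[o++] = (unsigned char)(0x80 | (r & 0x3F));
	}
	return o;
}
```

With the reporter's bytes EF F0 E8 E2 E5 F2 as input, this produces D0 BF D1 80 D0 B8 D0 B2 D0 B5 D1 82, the UTF-8 form of "привет". A real fix would more likely rely on the platform's own conversion routines and the active code page than on a hard-coded table.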