Something wrong with charset (utf-8 when locale isn't)

Issue #402 closed
Don created an issue

The latest Monit 5.19, built from the tarball, prints monit summary in the wrong encoding (probably UTF-8) when the locale is not set to UTF-8. This makes the terminal output hard to read. Example (hostname is edited):

root@x [~]# monit summary
Monit uptime: 9m
Б■▄Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■╛Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■╛Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■░
Б■┌ Service Name                    Б■┌ Status                     Б■┌ Type          Б■┌
Б■°Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■╪Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■╪Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■╓
Б■┌ x.xxxxxxxxxx.xx           Б■┌ Running                    Б■┌ System        Б■┌
Б■°Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■╪Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■╪Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■╓
Б■┌ root                            Б■┌ Resource limit matched     Б■┌ Filesystem    Б■┌
Б■■Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■╢Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■╢Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■≤
root@x [~]# locale
LANG=C
LC_CTYPE="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_COLLATE="C"
LC_MONETARY="C"
LC_MESSAGES="C"
LC_PAPER="C"
LC_NAME="C"
LC_ADDRESS="C"
LC_TELEPHONE="C"
LC_MEASUREMENT="C"
LC_IDENTIFICATION="C"
LC_ALL=
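
The garbled table above is consistent with UTF-8 box-drawing bytes being shown by a terminal that decodes them as KOI8-R. The terminal's encoding is an inference from the characters in the output, not something the report states. A minimal Python sketch that reproduces the effect (the helper name as_seen_in is made up for illustration):

```python
def as_seen_in(text: str, terminal_encoding: str) -> str:
    """Return what a terminal decoding with `terminal_encoding` shows for UTF-8 output."""
    return text.encode("utf-8").decode(terminal_encoding)


# Top-left corner, horizontal bar and vertical bar of a Unicode table.
for glyph in ("┌", "─", "│"):
    print(repr(glyph), "->", as_seen_in(glyph, "koi8_r"))
```

Each box-drawing character is three bytes in UTF-8, so a single-byte encoding shows it as three unrelated characters, which is why the table rules above are three times too wide.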

Comments (5)

  1. Tildeslash repo owner

    1) On OS X we can set the terminal's Character Encoding in preferences. On Linux/Ubuntu I could not find any such preference, and I assume the Character Encoding used by the terminal is set up based on the locale settings. I tested and you are correct. If I put export LANG=C in my .profile, the terminal does not support UTF8 character encoding and prints something like

    Screen Shot 2016-06-19 at 18.34.15.png

    2) If I change my profile to something like export LANG=nb_NO.UTF-8, the terminal does support UTF8 character encoding and output from Monit works as expected:

    Screen Shot 2016-06-19 at 18.36.00.png

    We'll see if Monit can be changed to better check whether the terminal (TTY) supports UTF8 Character Encoding and, if not, refrain from printing tables. This is a minor inconvenience which can be worked around by setting up your terminal to permanently support UTF8 encoding as in 2) above, or by using the -B switch to Monit to prevent it from printing tabular output.

  2. Tildeslash repo owner

    There is no reliable way to determine a terminal's character encoding from a program. Investigating locale settings by looking at environment variables such as LANG does not really say anything about the text encoding capabilities of the terminal.

    The setup is as follows: monit summary and monit procmatch print UTF8 table characters to present data in a nice table. If your terminal is not configured to handle UTF8 encoding, it is your responsibility to configure it to support UTF8 character encoding.

    On modern systems (2000 and later) it is very unlikely that your terminal does not have the capability to handle UTF8 character encoding.

    In any case, it is possible to permanently turn off tabular output in Monit by putting this statement in your .monitrc file: set terminal batch. Ref: terminal FAQ entry

  3. Don reporter

    A correctly set up terminal usually has LANG, LC_ALL, or LC_CTYPE configured to indicate support for UTF-8 output. So, even though the terminal cannot inform a program of its charset support, it is possible to determine it indirectly. And I think the derived LC_CTYPE value is enough. By 'derived' I mean the value determined by the locale algorithms even if the LC_CTYPE environment variable is not set, the way glibc's locale program does it.
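
The 'derived LC_CTYPE' idea can be sketched with the POSIX precedence rule for that category: the first non-empty value among LC_ALL, LC_CTYPE and LANG wins, otherwise the POSIX locale applies. The Python sketch below only inspects environment variables and the codeset spelled out in the locale name; a C program would more likely call setlocale(LC_CTYPE, "") followed by nl_langinfo(CODESET). All function names here are hypothetical, and this is not how Monit behaves.

```python
import os
from typing import Mapping, Optional


def effective_ctype_locale(env: Mapping[str, str]) -> str:
    """First non-empty of LC_ALL, LC_CTYPE, LANG; otherwise the POSIX default."""
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name, "")
        if value:
            return value
    return "POSIX"


def codeset_of(locale_name: str) -> Optional[str]:
    """Extract the codeset from language[_territory][.codeset][@modifier]."""
    name = locale_name.split("@", 1)[0]  # drop the @modifier, if any
    if "." not in name:
        return None  # e.g. "C", "POSIX", "ru_RU": codeset is not spelled out
    return name.split(".", 1)[1]


def wants_utf8(env: Mapping[str, str]) -> bool:
    """True only when the effective LC_CTYPE locale explicitly names UTF-8."""
    codeset = codeset_of(effective_ctype_locale(env))
    return codeset is not None and codeset.lower().replace("-", "") == "utf8"


if __name__ == "__main__":
    print(effective_ctype_locale(os.environ), wants_utf8(os.environ))
```

Treating a locale name without an explicit codeset as "not UTF-8" is a deliberately conservative choice for this sketch: it falls back to ASCII whenever the environment does not clearly ask for UTF-8.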

    About modern systems: yes, many modern terminals support UTF-8, but not all of them are set to UTF-8. There are still systems that use non-UTF-8 charsets, like koi8-r/cp1251 in Russia or eucjp/ujis in Japan. PuTTY and xfce4-terminal have a charset switching function in their menus. Why would they have it if everybody is supposedly using UTF-8? Because not everybody uses UTF-8, even on capable terminals.

    On these terminals, which are internationalized and capable of UTF-8 but not set to it, UTF-8 pseudographics are not rendered correctly. It would be better to output ASCII pseudographics with '-', '|', '\', and '+'.
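
A minimal Python sketch of such a fallback: the same table is drawn with either Unicode box-drawing characters or plain '-', '|' and '+'. The glyph tables and the render_table function are illustrative assumptions, not Monit's code, and the utf8 argument is assumed to come from whatever locale check the program uses.

```python
UNICODE = {"h": "─", "v": "│",
           "tl": "┌", "tm": "┬", "tr": "┐",
           "ml": "├", "mm": "┼", "mr": "┤",
           "bl": "└", "bm": "┴", "br": "┘"}
# ASCII fallback: every corner and junction becomes '+'.
ASCII = {key: "+" for key in UNICODE}
ASCII.update({"h": "-", "v": "|"})


def render_table(rows, utf8):
    """Render rows (first row is the header) with Unicode or ASCII borders."""
    g = UNICODE if utf8 else ASCII
    widths = [max(len(row[i]) for row in rows) for i in range(len(rows[0]))]

    def rule(left, mid, right):
        return left + mid.join(g["h"] * (w + 2) for w in widths) + right

    def line(row):
        cells = (" " + cell.ljust(w) + " " for cell, w in zip(row, widths))
        return g["v"] + g["v"].join(cells) + g["v"]

    out = [rule(g["tl"], g["tm"], g["tr"]), line(rows[0]),
           rule(g["ml"], g["mm"], g["mr"])]
    out += [line(row) for row in rows[1:]]
    out.append(rule(g["bl"], g["bm"], g["br"]))
    return "\n".join(out)


if __name__ == "__main__":
    rows = [("Service Name", "Status"), ("root", "Running")]
    print(render_table(rows, utf8=False))
```

Because only the glyph table changes, the column layout is identical in both modes, and the ASCII output stays readable in any single-byte locale.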

    Also, http://linux.die.net/man/3/setlocale

    The locale "C" or "POSIX" is a portable locale; its LC_CTYPE part corresponds to the 7-bit ASCII character set.

    Utf-8 is definitely not 7-bit ascii.

  4. Don reporter

    An example of a program that handles this better, supporting Unicode graphics while still falling back to ASCII automatically, is lsblk. Output:

    root@test:~# lsblk
    NAME                 MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
    fd0                    2:0    1    4K  0 disk
    sda                    8:0    0 48.8G  0 disk
    ├─sda1                 8:1    0  500M  0 part /boot
    └─sda2                 8:2    0 48.3G  0 part
      ├─devel-root       253:0    0   40G  0 lvm  /
      └─devel-swap       253:1    0    4G  0 lvm  [SWAP]
    sr0                   11:0    1  603M  0 rom
    root@test:~# locale
    LANG=en_US.UTF-8
    LC_CTYPE="en_US.UTF-8"
    LC_NUMERIC="en_US.UTF-8"
    LC_TIME="en_US.UTF-8"
    LC_COLLATE="en_US.UTF-8"
    LC_MONETARY="en_US.UTF-8"
    LC_MESSAGES="en_US.UTF-8"
    LC_PAPER="en_US.UTF-8"
    LC_NAME="en_US.UTF-8"
    LC_ADDRESS="en_US.UTF-8"
    LC_TELEPHONE="en_US.UTF-8"
    LC_MEASUREMENT="en_US.UTF-8"
    LC_IDENTIFICATION="en_US.UTF-8"
    LC_ALL=
    root@test:~# unset LANG
    root@test:~# locale
    LANG=
    LC_CTYPE="POSIX"
    LC_NUMERIC="POSIX"
    LC_TIME="POSIX"
    LC_COLLATE="POSIX"
    LC_MONETARY="POSIX"
    LC_MESSAGES="POSIX"
    LC_PAPER="POSIX"
    LC_NAME="POSIX"
    LC_ADDRESS="POSIX"
    LC_TELEPHONE="POSIX"
    LC_MEASUREMENT="POSIX"
    LC_IDENTIFICATION="POSIX"
    LC_ALL=
    root@test:~# lsblk
    NAME                 MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
    fd0                    2:0    1    4K  0 disk
    sda                    8:0    0 48.8G  0 disk
    |-sda1                 8:1    0  500M  0 part /boot
    `-sda2                 8:2    0 48.3G  0 part
      |-devel-root       253:0    0   40G  0 lvm  /
      `-devel-swap       253:1    0    4G  0 lvm  [SWAP]
    sr0                   11:0    1  603M  0 rom
    