Filesystem statistic error with zfs-2.1.1 and kernel 5.13.19

Issue #1021 resolved
Val Kulkov created an issue

As of zfs-2.1.1 and Linux 5.13.19-2-pve #1 SMP PVE 5.13.19-4 (Mon, 29 Nov 2021 12:10:09 +0100), the /proc/spl/kstat/zfs/<poolname>/io files are no longer available. As a result, Monit cannot read filesystem statistics and logs:

[2021-12-05T12:51:55-0500] error    : filesystem statistic error: cannot read /proc/spl/kstat/zfs/rpool/io -- No such file or directory
[2021-12-05T12:51:55-0500] error    : Filesystem '/' not mounted
[2021-12-05T12:51:55-0500] error    : 'root' unable to read filesystem '/' state
[2021-12-05T12:51:55-0500] info     : 'root' trying to restart

The “monit summary” and “monit status” commands report that the filesystem “Does not exist”.

Ordinary filesystem operations, such as “ls -l /” or “df -h”, work normally.

Listing /proc/spl/kstat/zfs/rpool/ confirms that the “io” file is no longer there.

Comments (16)

  1. Lutz Mader

    Hello,
    I think the status information Monit reads from
    "/proc/spl/kstat/zfs/<POOL>/io"
    has moved to
    "/proc/spl/kstat/zfs/<POOL>/iostats".
    Could you check this, please?

    I have no access to a proper Linux system at the moment.

    With regards,
    Lutz

  2. Val Kulkov reporter

    Hi Lutz Mader! Are you saying I should try the HEAD version from this repo and see if I still have the problem?

  3. Richard Bergoin

    Here is sample output of:

    # cat /proc/spl/kstat/zfs/rpool/iostats
    25 1 0x01 18 4896 34810932099 312628448789680
    name                            type data
    trim_extents_written            4    0
    trim_bytes_written              4    0
    trim_extents_skipped            4    0
    trim_bytes_skipped              4    0
    trim_extents_failed             4    0
    trim_bytes_failed               4    0
    autotrim_extents_written        4    0
    autotrim_bytes_written          4    0
    autotrim_extents_skipped        4    0
    autotrim_bytes_skipped          4    0
    autotrim_extents_failed         4    0
    autotrim_bytes_failed           4    0
    simple_trim_extents_written     4    0
    simple_trim_bytes_written       4    0
    simple_trim_extents_skipped     4    0
    simple_trim_bytes_skipped       4    0
    simple_trim_extents_failed      4    0
    simple_trim_bytes_failed        4    0
    

  4. Richard Bergoin

    HEAD won’t work either; it still looks for “/io”:

    https://bitbucket.org/tildeslash/monit/src/62a014d0f3943dd1a30324e3c894e33a0423110f/src/device/sysdep_LINUX.c#lines-372

                            } else if (IS(mnt->mnt_type, "zfs")) {
                                    // ZFS
                                    inf->filesystem->object.getDiskActivity = _getZfsDiskActivity;
                                    // Need base zpool name for /proc/spl/kstat/zfs/<NAME>/io lookup:
                                    snprintf(inf->filesystem->object.key, sizeof(inf->filesystem->object.key), "%s", inf->filesystem->object.device);
                                    Str_replaceChar(inf->filesystem->object.key, '/', 0);
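The key derivation above relies on Monit's own Str_replaceChar() helper. For illustration only, here is a minimal sketch of the same step using plain libc: copy the mount source (a dataset name such as rpool/data/subvol-999-disk-0) and cut it at the first '/', which leaves the pool name used in the /proc/spl/kstat/zfs/<NAME>/ path. The function name zfs_pool_from_dataset is hypothetical and not part of Monit.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy the pool name (the part of a ZFS dataset
 * name before the first '/') into buf. Returns 0 on success, or -1 if
 * the dataset name does not fit into buf. */
int zfs_pool_from_dataset(const char *dataset, char *buf, size_t size)
{
        int n = snprintf(buf, size, "%s", dataset);
        if (n < 0 || (size_t)n >= size)
                return -1;
        char *slash = strchr(buf, '/');
        if (slash)
                *slash = '\0'; /* "rpool/data/subvol-999-disk-0" -> "rpool" */
        return 0;
}
```

Unlike the in-place truncation in the snippet, the sketch reports a buffer that is too small instead of silently cutting the name.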
    

  5. Lutz Mader

    Hello Val,
    an "ls" of "/proc/spl/kstat/zfs/<POOL>" and a "cat" of "/proc/spl/kstat/zfs/<POOL>/iostats" should be enough.

    With regards,
    Lutz

  6. Val Kulkov reporter
    root@pve1:~# ls -l /proc/spl/kstat/zfs/rpool/
    total 0
    -rw-r--r-- 1 root root 0 Dec  5 14:57 dmu_tx_assign
    -rw-r--r-- 1 root root 0 Dec  5 14:57 iostats
    -rw-r--r-- 1 root root 0 Dec  5 14:57 multihost
    -rw-r--r-- 1 root root 0 Dec  5 14:57 objset-0x112
    -rw-r--r-- 1 root root 0 Dec  7 08:53 objset-0x12
    -rw-r--r-- 1 root root 0 Dec  5 14:57 objset-0x129
    -rw-r--r-- 1 root root 0 Dec  5 14:57 objset-0x183
    -rw-r--r-- 1 root root 0 Dec  5 14:57 objset-0x189
    -rw-r--r-- 1 root root 0 Dec  5 14:57 objset-0x36
    -rw-r--r-- 1 root root 0 Dec  5 14:57 objset-0x394
    -rw-r--r-- 1 root root 0 Dec  5 14:57 objset-0x41c
    -rw-r--r-- 1 root root 0 Dec  5 14:57 objset-0x483
    -rw-r--r-- 1 root root 0 Dec  5 14:57 objset-0x4a
    -rw-r--r-- 1 root root 0 Dec  5 14:57 objset-0x58
    -rw-r--r-- 1 root root 0 Dec  5 14:57 objset-0x58c
    -rw-r--r-- 1 root root 0 Dec  5 14:57 objset-0x707
    -rw-r--r-- 1 root root 0 Dec  5 14:57 objset-0x86
    -rw-r--r-- 1 root root 0 Dec  5 14:57 objset-0x8b2
    -rw-r--r-- 1 root root 0 Dec  5 14:57 objset-0x8c2
    -rw-r--r-- 1 root root 0 Dec  5 14:57 objset-0x909
    -rw-r--r-- 1 root root 0 Dec  5 14:57 objset-0x94
    -rw-r--r-- 1 root root 0 Dec  5 14:57 objset-0xab
    -rw------- 1 root root 0 Dec  5 14:57 reads
    -rw-r--r-- 1 root root 0 Dec  5 14:57 state
    -rw-r--r-- 1 root root 0 Dec  5 14:57 txgs
    
    root@pve1:~# cat /proc/spl/kstat/zfs/rpool/iostats 
    25 1 0x01 18 4896 3219827410 158782342259233
    name                            type data
    trim_extents_written            4    0
    trim_bytes_written              4    0
    trim_extents_skipped            4    0
    trim_bytes_skipped              4    0
    trim_extents_failed             4    0
    trim_bytes_failed               4    0
    autotrim_extents_written        4    0
    autotrim_bytes_written          4    0
    autotrim_extents_skipped        4    0
    autotrim_bytes_skipped          4    0
    autotrim_extents_failed         4    0
    autotrim_bytes_failed           4    0
    simple_trim_extents_written     4    0
    simple_trim_bytes_written       4    0
    simple_trim_extents_skipped     4    0
    simple_trim_bytes_skipped       4    0
    simple_trim_extents_failed      4    0
    simple_trim_bytes_failed        4    0
    

    I have just realised that the information above is not relevant to the filesystem of my LXC container: my instance of Monit runs inside an LXC container, which itself runs on a Proxmox VE hypervisor.

    The following is the relevant information about the filesystems of container ‘102’:

    root@pve1:~# df -h | grep subvol-102
    rpool/data/subvol-102-disk-0          40G  9.3G   31G  24% /rpool/data/subvol-102-disk-0
    local-zfs-1/subvol-102-disk-0         40G  6.1G   34G  16% /local-zfs-1/subvol-102-disk-0
    rpool/data/subvol-102-disk-1          20G  562M   20G   3% /rpool/data/subvol-102-disk-1
    

  7. Richard Bergoin

    Here is an excerpt of an strace of what `df -h /` does:

    stat("/", {st_mode=S_IFDIR|0755, st_size=24, ...}) = 0
    uname({sysname="Linux", nodename="mmonit", ...}) = 0
    statfs("/", {f_type=ZFS_SUPER_MAGIC, f_bsize=131072, f_blocks=131072, f_bfree=107218, f_bavail=107218, f_files=27501798, f_ffree=27447840, f_fsid={val=[960432164, 13924911]}, f_namelen=255, f_frsize=131072, f_flags=ST_VALID|ST_NOATIME}) = 0
    openat(AT_FDCWD, "/usr/lib/x86_64-linux-gnu/gconv/gconv-modules.cache", O_RDONLY) = 3
    fstat(3, {st_mode=S_IFREG|0644, st_size=26402, ...}) = 0
    mmap(NULL, 26402, PROT_READ, MAP_SHARED, 3, 0) = 0x7f4ba3133000
    close(3)                                = 0
    fstat(1, {st_mode=S_IFCHR|0600, st_rdev=makedev(0x88, 0x2), ...}) = 0
    write(1, "Filesystem                    Si"..., 63Filesystem                    Size  Used Avail Use% Mounted on
    ) = 63
    write(1, "rpool/data/subvol-999-disk-0   1"..., 54rpool/data/subvol-999-disk-0   16G  3.0G   14G  19% /
    ) = 54
    

  8. Richard Bergoin

    I just saw that 5.30.0 is out. I’d love to help fix this issue for 5.30.1 (or 5.31.0); maybe a more complete strace will be useful:

    # df -h /
    Filesystem                    Size  Used Avail Use% Mounted on
    rpool/data/subvol-999-disk-0   16G  3.1G   13G  19% /
    

    The full strace:

    # strace df -h /
    execve("/bin/df", ["df", "-h", "/"], 0x7ffd1af0a5c0 /* 22 vars */) = 0
    brk(NULL)                               = 0x5640c163f000
    access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
    openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
    fstat(3, {st_mode=S_IFREG|0644, st_size=35243, ...}) = 0
    mmap(NULL, 35243, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f88295b0000
    close(3)                                = 0
    openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
    read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\260A\2\0\0\0\0\0"..., 832) = 832
    fstat(3, {st_mode=S_IFREG|0755, st_size=1824496, ...}) = 0
    mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f88295ae000
    mmap(NULL, 1837056, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f88293ed000
    mprotect(0x7f882940f000, 1658880, PROT_NONE) = 0
    mmap(0x7f882940f000, 1343488, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x22000) = 0x7f882940f000
    mmap(0x7f8829557000, 311296, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x16a000) = 0x7f8829557000
    mmap(0x7f88295a4000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1b6000) = 0x7f88295a4000
    mmap(0x7f88295aa000, 14336, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f88295aa000
    close(3)                                = 0
    arch_prctl(ARCH_SET_FS, 0x7f88295af540) = 0
    mprotect(0x7f88295a4000, 16384, PROT_READ) = 0
    mprotect(0x5640bfd9d000, 4096, PROT_READ) = 0
    mprotect(0x7f88295e0000, 4096, PROT_READ) = 0
    munmap(0x7f88295b0000, 35243)           = 0
    brk(NULL)                               = 0x5640c163f000
    brk(0x5640c1660000)                     = 0x5640c1660000
    openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3
    fstat(3, {st_mode=S_IFREG|0644, st_size=3036208, ...}) = 0
    mmap(NULL, 3036208, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f8829107000
    close(3)                                = 0
    openat(AT_FDCWD, "/usr/share/locale/locale.alias", O_RDONLY|O_CLOEXEC) = 3
    fstat(3, {st_mode=S_IFREG|0644, st_size=2995, ...}) = 0
    read(3, "# Locale name alias data base.\n#"..., 3072) = 2995
    read(3, "", 3072)                       = 0
    close(3)                                = 0
    openat(AT_FDCWD, "/usr/share/locale/en_US.UTF-8/LC_MESSAGES/coreutils.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
    openat(AT_FDCWD, "/usr/share/locale/en_US.utf8/LC_MESSAGES/coreutils.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
    openat(AT_FDCWD, "/usr/share/locale/en_US/LC_MESSAGES/coreutils.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
    openat(AT_FDCWD, "/usr/share/locale/en.UTF-8/LC_MESSAGES/coreutils.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
    openat(AT_FDCWD, "/usr/share/locale/en.utf8/LC_MESSAGES/coreutils.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
    openat(AT_FDCWD, "/usr/share/locale/en/LC_MESSAGES/coreutils.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
    stat("/", {st_mode=S_IFDIR|0755, st_size=24, ...}) = 0
    openat(AT_FDCWD, "/", O_RDONLY|O_NOCTTY) = 3
    close(3)                                = 0
    openat(AT_FDCWD, "/proc/self/mountinfo", O_RDONLY) = 3
    fstat(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
    read(3, "2973 2705 0:52 / / rw,noatime ma"..., 1024) = 1024
    read(3, "974 0:287 / /dev/.lxc/sys rw,rel"..., 1024) = 1024
    read(3, "- fuse.lxcfs lxcfs rw,user_id=0,"..., 1024) = 1024
    read(3, "osuid,noexec,relatime master:3 -"..., 1024) = 1024
    read(3, "dev/mqueue rw,relatime - mqueue "..., 1024) = 167
    read(3, "", 1024)                       = 0
    lseek(3, 0, SEEK_CUR)                   = 4263
    close(3)                                = 0
    stat("/", {st_mode=S_IFDIR|0755, st_size=24, ...}) = 0
    uname({sysname="Linux", nodename="mmonit", ...}) = 0
    statfs("/", {f_type=ZFS_SUPER_MAGIC, f_bsize=131072, f_blocks=131072, f_bfree=106395, f_bavail=106395, f_files=27291221, f_ffree=27237152, f_fsid={val=[960432164, 13924911]}, f_namelen=255, f_frsize=131072, f_flags=ST_VALID|ST_NOATIME}) = 0
    openat(AT_FDCWD, "/usr/lib/x86_64-linux-gnu/gconv/gconv-modules.cache", O_RDONLY) = 3
    fstat(3, {st_mode=S_IFREG|0644, st_size=26402, ...}) = 0
    mmap(NULL, 26402, PROT_READ, MAP_SHARED, 3, 0) = 0x7f88295b2000
    close(3)                                = 0
    fstat(1, {st_mode=S_IFCHR|0600, st_rdev=makedev(0x88, 0x2), ...}) = 0
    write(1, "Filesystem                    Si"..., 63Filesystem                    Size  Used Avail Use% Mounted on
    ) = 63
    write(1, "rpool/data/subvol-999-disk-0   1"..., 54rpool/data/subvol-999-disk-0   16G  3.1G   13G  19% /
    ) = 54
    close(1)                                = 0
    close(2)                                = 0
    exit_group(0)                           = ?
    +++ exited with 0 +++
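The trace shows df reading /proc/self/mountinfo to find which dataset is mounted on "/". A simpler route to the same dataset name, assumed here rather than taken from Monit or df, is /proc/self/mounts parsed with glibc's setmntent(3)/getmntent(3). zfs_dataset_for_mountpoint is a hypothetical helper; the mounts-table path is a parameter so it can be pointed at /proc/self/mounts.

```c
#define _GNU_SOURCE
#include <mntent.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: find the dataset mounted on `dir` in a mounts
 * table such as /proc/self/mounts. Returns 0 and copies the mount
 * source into `dataset` if the effective mount on `dir` is ZFS,
 * otherwise -1. */
int zfs_dataset_for_mountpoint(const char *table, const char *dir,
                               char *dataset, size_t size)
{
        FILE *f = setmntent(table, "r");
        if (!f)
                return -1;
        int found = -1;
        struct mntent *m;
        while ((m = getmntent(f)) != NULL) {
                if (strcmp(m->mnt_dir, dir) != 0)
                        continue;
                /* A later entry on the same directory shadows earlier
                 * ones, so only the last match decides the result. */
                if (strcmp(m->mnt_type, "zfs") == 0) {
                        int n = snprintf(dataset, size, "%s", m->mnt_fsname);
                        found = (n >= 0 && (size_t)n < size) ? 0 : -1;
                } else {
                        found = -1;
                }
        }
        endmntent(f);
        return found;
}
```

Calling it with "/proc/self/mounts" and "/" inside the container from this trace should yield rpool/data/subvol-999-disk-0, the same name df prints.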
    

  9. Tildeslash repo owner

    The problem is a little bit tricky; I haven’t had time to look at it in detail yet.

    The ZFS I/O statistics are now available in /proc/spl/kstat/zfs/<pool>/objset-<hex>.

    The <hex> is unique for each filesystem in that pool. It is not yet clear to me whether the <hex> for a given filesystem can be obtained without parsing all objset* files.

    Example for the “tank” zpool:

    alpine-arm64:~# cat /proc/spl/kstat/zfs/tank/objset-0x36 
    40 1 0x01 7 2160 83309813285 113514182925
    name                            type data
    dataset_name                    7    tank
    writes                          4    0
    nwritten                        4    0
    reads                           4    0
    nread                           4    0
    nunlinks                        4    0
    nunlinked                       4    0
    
    alpine-arm64:~# cat /proc/spl/kstat/zfs/tank/objset-0x10 
    43 1 0x01 7 2160 83314151160 132322538725
    name                            type data
    dataset_name                    7    tank/test1
    writes                          4    0
    nwritten                        4    0
    reads                           4    0
    nread                           4    0
    nunlinks                        4    0
    nunlinked                       4    0
    

    I’ll look into it.
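The objset files above use the SPL named-kstat text layout: a header line, a "name type data" line, then one row per statistic, where type 7 is a string and type 4 an unsigned 64-bit counter. A minimal parsing sketch, assuming that layout stays stable; struct objset_io and parse_objset_kstat are hypothetical names, not Monit code.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for the counters of one objset-0x<hex> file. */
struct objset_io {
        char     dataset[256];
        uint64_t reads, nread, writes, nwritten;
};

/* Parse a named kstat stream. Returns 0 only if the dataset_name row
 * and all four I/O counters were found. */
int parse_objset_kstat(FILE *f, struct objset_io *io)
{
        char line[512], name[64];
        int type, seen = 0;

        memset(io, 0, sizeof(*io));
        /* Skip the kstat header and the "name type data" line. */
        if (!fgets(line, sizeof(line), f) || !fgets(line, sizeof(line), f))
                return -1;
        while (fgets(line, sizeof(line), f)) {
                int off = -1;
                if (sscanf(line, "%63s %d %n", name, &type, &off) != 2 || off < 0)
                        continue;
                char *value = line + off;
                value[strcspn(value, "\n")] = '\0';
                if (strcmp(name, "dataset_name") == 0 && type == 7) {
                        /* String row: keep the rest of the line as is. */
                        snprintf(io->dataset, sizeof(io->dataset), "%s", value);
                        seen |= 1;
                } else if (strcmp(name, "reads") == 0) {
                        io->reads = strtoull(value, NULL, 10);
                        seen |= 2;
                } else if (strcmp(name, "nread") == 0) {
                        io->nread = strtoull(value, NULL, 10);
                        seen |= 4;
                } else if (strcmp(name, "writes") == 0) {
                        io->writes = strtoull(value, NULL, 10);
                        seen |= 8;
                } else if (strcmp(name, "nwritten") == 0) {
                        io->nwritten = strtoull(value, NULL, 10);
                        seen |= 16;
                }
        }
        return seen == 31 ? 0 : -1;
}
```

The counters are cumulative, so read and write rates would come from the difference between two samples.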

  10. Richard Bergoin

    OK, here is a clue that may help: there is a command in the ZFS repository that gets a path from an ID:

        fprintf(stderr, "Usage: zfs_ids_to_path [-v] <pool> <objset id> "
            "<object id>\n");
    

    https://github.com/openzfs/zfs/blob/60ffc1c460e4cdf3c3ca12c8840fd0675a98ed0d/cmd/zfs_ids_to_path/zfs_ids_to_path.c#L89

    It calls the function zpool_obj_to_path, defined here: https://github.com/openzfs/zfs/blob/f291fa658efd146540b03ce386133632bde237bf/lib/libzfs/libzfs_pool.c#L4737

    Then I think I found a good clue:

    /*
     * Convert from a dataset to a objset id. Note that
     * we grab the object number from the inode number.
     */
    static int
    object_from_path(const char *dataset, uint64_t object, zinject_record_t *record)
    {
        zfs_handle_t *zhp;
    
        if ((zhp = zfs_open(g_zfs, dataset, ZFS_TYPE_DATASET)) == NULL)
            return (-1);
    
        record->zi_objset = zfs_prop_get_int(zhp, ZFS_PROP_OBJSETID);
        record->zi_object = object;
    
        zfs_close(zhp);
    
        return (0);
    }
    

    from: https://github.com/openzfs/zfs/blob/60ffc1c460e4cdf3c3ca12c8840fd0675a98ed0d/cmd/zinject/translate.c#L129

  11. Tildeslash repo owner

    Thanks for the information. Using libzfs would probably be problematic, though:

    1.) We want to keep Monit’s dependencies on third-party libraries to a minimum (I’d prefer not to link with libzfs).

    2.) The CDDL license may not be fully compatible with Monit’s AGPLv3 license.

    In the worst case, we can use a ‘brute force’ scan of the objset-<hex> files, as mentioned in my previous post.
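A brute-force scan could look roughly like this: walk the pool's kstat directory (e.g. /proc/spl/kstat/zfs/rpool), open each objset-* file, and compare its dataset_name row with the dataset being monitored. This is a sketch under those assumptions; find_objset_kstat is a hypothetical helper, not Monit code.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy into `out` the path of the objset-* file in
 * `dir` whose dataset_name equals `dataset`. Returns 0 on success. */
int find_objset_kstat(const char *dir, const char *dataset, char *out, size_t size)
{
        DIR *d = opendir(dir);
        if (!d)
                return -1;
        int rc = -1;
        struct dirent *e;
        while (rc != 0 && (e = readdir(d)) != NULL) {
                if (strncmp(e->d_name, "objset-", 7) != 0)
                        continue;
                char path[4096], line[512];
                snprintf(path, sizeof(path), "%s/%s", dir, e->d_name);
                FILE *f = fopen(path, "r");
                if (!f)
                        continue;
                while (fgets(line, sizeof(line), f)) {
                        char name[64];
                        int type, off = -1;
                        if (sscanf(line, "%63s %d %n", name, &type, &off) != 2 || off < 0)
                                continue;
                        if (strcmp(name, "dataset_name") != 0 || type != 7)
                                continue;
                        char *value = line + off;
                        value[strcspn(value, "\n")] = '\0';
                        if (strcmp(value, dataset) == 0) {
                                int n = snprintf(out, size, "%s", path);
                                rc = (n >= 0 && (size_t)n < size) ? 0 : -1;
                        }
                        break; /* one dataset_name row per file */
                }
                fclose(f);
        }
        closedir(d);
        return rc;
}
```

Rather than scanning on every cycle, a caller would presumably cache the resulting path and rescan only when reading it fails.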

  12. Richard Bergoin

    Assuming df is not linked to libzfs, the interesting line in the strace log is the statfs syscall:

    statfs("/", {f_type=ZFS_SUPER_MAGIC, f_bsize=131072, f_blocks=131072, f_bfree=106395, f_bavail=106395, f_files=27291221, f_ffree=27237152, f_fsid={val=[960432164, 13924911]}, f_namelen=255, f_frsize=131072, f_flags=ST_VALID|ST_NOATIME}) = 0
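For reference, here is the same statfs(2) call from C. This sketch only detects ZFS via f_type and exposes the two f_fsid words. ZFS_SUPER_MAGIC is defined locally as 0x2fc12fc1, the value OpenZFS on Linux uses, because glibc headers do not provide it. zfs_statfs is a hypothetical name. The f_fsid words in the trace (960432164, 13924911) differ from the objsetid 105740 that `zfs get objsetid` reports for the same dataset in this thread, so f_fsid does not appear to map directly to the objset-<hex> file name.

```c
#include <stdio.h>
#include <string.h>
#include <sys/vfs.h>

/* Assumed OpenZFS on Linux superblock magic (strace prints it as
 * ZFS_SUPER_MAGIC); not exported by glibc headers. */
#define ZFS_SUPER_MAGIC 0x2fc12fc1

/* Hypothetical helper: returns 1 if `path` is on ZFS, 0 if not, -1 on
 * error. If fsid is non-NULL, the two f_fsid words are copied into it. */
int zfs_statfs(const char *path, unsigned int fsid[2])
{
        struct statfs sfs;
        if (statfs(path, &sfs) != 0)
                return -1;
        if (fsid)
                memcpy(fsid, &sfs.f_fsid, 2 * sizeof(unsigned int));
        return sfs.f_type == ZFS_SUPER_MAGIC;
}
```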
    

  13. Richard Bergoin
    # zfs get objsetid rpool/data/subvol-999-disk-0
    NAME                          PROPERTY  VALUE     SOURCE
    rpool/data/subvol-999-disk-0  objsetid  105740    -
    

    105740 in hexadecimal is 0x19d0c, so:

    # cat /proc/spl/kstat/zfs/rpool/objset-0x19d0c
    43 1 0x01 7 2160 55671716180 3540084614242925
    name                            type data
    dataset_name                    7    rpool/data/subvol-999-disk-0
    writes                          4    35128423
    nwritten                        4    842467604105
    reads                           4    8558897
    nread                           4    41588065809
    nunlinks                        4    33062
    nunlinked                       4    32983
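The conversion above is plain decimal-to-hex formatting. Here is a small sketch that builds the kstat path from the pool name and the objsetid. It assumes the lower-case, unpadded objset-0x<hex> naming seen throughout this thread. objset_kstat_path is a hypothetical helper.

```c
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build /proc/spl/kstat/zfs/<pool>/objset-0x<hex>
 * from a pool name and a decimal objsetid. Returns 0 on success, -1 if
 * `pool` looks like a full dataset name or the path does not fit. */
int objset_kstat_path(const char *pool, uint64_t objsetid, char *buf, size_t size)
{
        /* The kstat directory is named after the pool only, e.g. "rpool". */
        if (strchr(pool, '/') != NULL)
                return -1;
        /* Lower-case hex without zero padding: 105740 -> objset-0x19d0c. */
        int n = snprintf(buf, size, "/proc/spl/kstat/zfs/%s/objset-0x%" PRIx64,
                         pool, objsetid);
        return (n >= 0 && (size_t)n < size) ? 0 : -1;
}
```

The objsetid itself still has to be obtained without libzfs; in this comment it came from `zfs get objsetid`.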
    
