Pipeline not dying when started from systemd

Issue #96 resolved
Former user created an issue

Hi, I see this on SuSE Linux Enterprise 12 SP3 with sh=/bin/bash. No idea if that's Linux-, bash-, or only SuSE-specific. Starting /usr/sbin/sshguard manually and then killing any part of the pipeline (the backend, sshg-parser, sshg-blocker) causes the whole sshguard process to end. That's fine.

But when started from systemd, all (or most) other parts of the pipeline will continue to run when one is killed. Initially it will look like:

   Active: active (running) since Mon 2018-07-23 10:19:07 CEST; 1s ago
 Main PID: 3971 (sshguard)
    Tasks: 8 (limit: 1024)
   CGroup: /system.slice/mysshguard.service
           |-3971 /bin/sh /usr/sbin/sshguard
           |-3975 /bin/sh /usr/sbin/sshguard
           |-3976 /usr/lib/sshg-parser
           |-3977 /usr/lib/sshg-blocker -a 100 -b 240:/var/log/sshguard.db -p 1200 -s 5400 -N 128 -n 32 -w /etc/sshguard.whitelist
           |-3978 /bin/sh /usr/sbin/sshguard
           |-3979 /bin/bash /usr/sbin/mysshguard
           `-3980 tail -F -n 0 /var/log/messages

but after e.g. "pkill -f sshg-blocker" we see this:

   Active: active (running) since Mon 2018-07-23 10:19:07 CEST; 7s ago
 Main PID: 3971 (sshguard)
    Tasks: 4 (limit: 1024)
   CGroup: /system.slice/mysshguard.service
           |-3971 /bin/sh /usr/sbin/sshguard
           |-3975 /bin/sh /usr/sbin/sshguard
           |-3976 /usr/lib/sshg-parser
           `-3980 tail -F -n 0 /var/log/messages

and for systemd it looks like the process is still fine. Using -$$ inside the sshguard script to kill the whole process group doesn't work, although it should solve exactly this problem. In my environment the solution is to add -9 to the kill command in the last line of sshguard:

eval $tailcmd | $libexec/sshg-parser | \
    $libexec/sshg-blocker $flags | ( $BACKEND ; kill -9 -PIPE $$ )

With -9 it doesn't matter whether you use -PIPE or not, or -$$ or $$; it always works. As soon as any part of the pipeline dies, systemd restarts the whole process group and everything is running fine again.
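
For illustration, here is a minimal sketch of the pattern in that last line. It is not the actual sshguard script: `sleep 1` stands in for the upstream stages and `true` for $BACKEND, both placeholders chosen only for this example. It just shows that the subshell at the end of the pipe can kill the top-level shell through $$, and that the caller then sees a SIGKILL exit status.

```shell
#!/bin/sh
# Sketch only: sleep and true are hypothetical stand-ins for the real
# pipeline stages (tail, sshg-parser, sshg-blocker) and for $BACKEND.
demo=$(mktemp)
cat > "$demo" <<'EOF'
#!/bin/sh
# Inside the ( ... ) subshell, $$ still expands to the PID of the
# top-level shell, so kill -9 $$ terminates the script itself.
sleep 1 | ( true ; kill -9 $$ )
echo "not reached"
EOF

# Run the demo in a child shell and capture how it ended.
status=0
sh "$demo" || status=$?
rm -f "$demo"

# A shell reports death by signal N as 128 + N, so SIGKILL (9) gives 137.
echo "exit status: $status"
```

Note that kill -9 $$ only hits the shell itself. The stand-in sleep keeps running until it finishes on its own, so cleaning up the remaining stages is still left to whatever supervises the script.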

Of course I don't know if this is a general solution for all shells or distributions...

cu, Frank

Comments (4)

  1. Frank Steiner

    Even with the -9 it's still possible to get systemd into an insane state. Initially it looks like this:

       Active: active (running) since Mon 2018-08-06 13:00:44 CEST; 22s ago
     Main PID: 64528 (sshguard)
        Tasks: 8 (limit: 1024)
       CGroup: /system.slice/mysshguard.service
               |-64528 /bin/sh /usr/sbin/sshguard
               |-64530 /bin/sh /usr/sbin/sshguard
               |-64531 /usr/lib/sshg-parser
               |-64532 /usr/lib/sshg-blocker -a 80 -b 24000:/var/log/sshguard.db -p 1200 -s 5400 -N 128 -n 32 -w /etc/sshguard.whitelist
               |-64533 /usr/bin/journalctl -afb -p info -n1 -t sshd -t pure-ftpd -t ftpd -t ftp -o cat
               |-64534 /bin/sh /usr/sbin/sshguard
               `-64535 /bin/bash /usr/sbin/mysshguard

    Killing the backend (/usr/sbin/mysshguard), sshg-parser, or sshg-blocker causes a complete restart by systemd, which is fine. But after killing the last sshguard (PID 64534), you can e.g. kill the parser, which will also finish the backend and the blocker. For systemd, however, it still looks fine:

       Active: active (running) since Mon 2018-08-06 13:00:44 CEST; 2min 13s ago
     Main PID: 64528 (sshguard)
        Tasks: 3 (limit: 1024)
       CGroup: /system.slice/mysshguard.service
               |-64528 /bin/sh /usr/sbin/sshguard
               |-64530 /bin/sh /usr/sbin/sshguard
               `-64533 /usr/bin/journalctl -afb -p info -n1 -t sshd -t pure-ftpd -t ftpd -t ftp -o cat

    And so "systemctl status sshguard" will return 0.

    No idea if that could really happen, but it seems that it's very problematic to make systemd monitor all parts of a longer pipe.
