exec'd program aborts with SIGPIPE when writing to stderr
My exec program (which is bash script) stopped to work recently. Journal is getting such error messages:
Aug 05 16:28:59 live monit[1101]: 'apache' exec: /root/apache-fail.sh
Aug 05 16:28:59 live systemd[1]: Received SIGCHLD from PID 26094 (apache-fail.sh).
Aug 05 16:28:59 live systemd[1]: Child 26094 (apache-fail.sh) died (code=killed, status=13/PIPE)
Script was run with #!/bin/bash -x
shebang, thus, outputing every command into stderr. When I remove -x
and inside of script redirect stdout and stderr into a file script started to work again.
lsof -nP -p $$
inside of exec script shows:
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
apache-fa 8012 root cwd DIR 253,0 4096 2 /
apache-fa 8012 root rtd DIR 253,0 4096 2 /
apache-fa 8012 root txt REG 253,0 960376 917674 /usr/bin/bash
apache-fa 8012 root mem REG 253,0 61928 927293 /usr/lib64/libnss_files-2.17.so
apache-fa 8012 root mem REG 253,0 106065056 925303 /usr/lib/locale/locale-archive
apache-fa 8012 root mem REG 253,0 2112384 927116 /usr/lib64/libc-2.17.so
apache-fa 8012 root mem REG 253,0 19520 927141 /usr/lib64/libdl-2.17.so
apache-fa 8012 root mem REG 253,0 174520 927410 /usr/lib64/libtinfo.so.5.9
apache-fa 8012 root mem REG 253,0 164440 927070 /usr/lib64/ld-2.17.so
apache-fa 8012 root mem REG 253,0 26254 1182178 /usr/lib64/gconv/gconv-modules.cache
apache-fa 8012 root 0r CHR 1,3 0t0 1028 /dev/null
apache-fa 8012 root 1u unix 0xffff880036ac5a00 0t0 22102 socket
apache-fa 8012 root 2u unix 0xffff880036ac5a00 0t0 22102 socket
apache-fa 8012 root 255r REG 253,0 796 2108312 /root/apache-fail.sh
Is it correct that descriptors 1 and 2 are connected to some unix/socket instead of FIFO/pipe?
This is in monit 5.18, I didn't notice such behavior in 5.14.
Comments (6)
-
repo owner -
reporter Do you think having socket on the stdout and stderr is ok? I can reproduce that (just
lsof -nP -p $$ >> /root/exec.log
at the top of the script).Problem with script aborting with SIGPIPE is intermittent. I did not investigate it deeper than finding socket on stdout and stderr — and asking you question, which you did not answer. Month before it was working without problem, but since last week I got a lot of failures.
-
reporter I am unable to reproduce SIGPIPE abort on other box in minimal configuration. On where it aborts there is many monit rules (maybe that's important factor), and example simplified rule is:
check file restart-apache-test with path /root/restart-apache-test if changed timestamp then exec "/root/apache-fail.sh"
Simplified version of script is:
#!/bin/bash set -x cd /root D=`date +"%Y-%m-%dT%H:%M:%S"` mkdir $D cd $D exec >> stdout.log 2>&1 date ps axuwwwf date systemctl restart httpd.service date
Experiment:
root@live:~# touch /root/restart-apache-test # I wait for alert, then root@live:~# ls -altr|tail -rwxr-xr-x 1 root root 678 Aug 6 20:25 apache-fail.sh -rw-r--r-- 1 root root 0 Aug 6 20:26 restart-apache-test # dir with date is not created. root@live:~# journalctl -r|grep 'apache-fail.sh' Aug 06 20:30:02 live systemd[1]: Child 7168 (apache-fail.sh) died (code=killed, status=13/PIPE) Aug 06 20:30:02 live systemd[1]: Received SIGCHLD from PID 7168 (apache-fail.sh). Aug 06 20:30:02 live monit[1101]: 'restart-apache-test' exec: /root/apache-fail.sh ...
-
reporter Experiment 2. Script:
#!/bin/bash date >> /root/restart-apache.log echo stdout echo stdout >> /root/restart-apache.log echo stderr >&2 echo stderr >> /root/restart-apache.log set -x echo after set-x >> /root/restart-apache.log cd /root echo after cd/root >> /root/restart-apache.log D=`date +"%Y-%m-%dT%H:%M:%S"` mkdir $D cd $D exec >> stdout.log 2>&1 date ps axuwwwf date systemctl restart httpd.service date
After restart-apache-test is triggered and script aborted with SIGPIPE:
root@live:~# cat restart-apache.log Sat Aug 6 20:39:26 MSK 2016
-
reporter root@live:~# ps ax|grep monit 1101 ? Ssl 2:01 /usr/bin/monit -I root@live:~# strace -o monit.strace.log -ff -p 1101 & root@live:~# touch /root/restart-apache-test
After alert is received:
root@live:~# egrep -i 'execve|clone|pipe' monit.strace.log.* monit.strace.log.1101:write(2, "'restart-apache-test' timestamp "..., 130) = -1 EPIPE (Broken pipe) monit.strace.log.1101:--- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=1101, si_uid=0} --- monit.strace.log.1101:write(1, "'restart-apache-test' exec: /roo"..., 49) = -1 EPIPE (Broken pipe) monit.strace.log.1101:--- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=1101, si_uid=0} --- monit.strace.log.1101:clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fc5bfa64b10) = 8339 monit.strace.log.8339:clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fc5bfa64b10) = 8341 monit.strace.log.8341:rt_sigaction(SIGPIPE, {SIG_DFL, [], SA_RESTORER|SA_RESTART, 0x7fc5bf13c100}, {SIG_IGN, [], SA_RESTORER|SA_RESTART, 0x7fc5bf13c100}, 8) = 0 monit.strace.log.8341:execve("/root/apache-fail.sh", ["/root/apache-fail.sh"], [/* 7 vars */]) = 0 monit.strace.log.8341:clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f6139457a10) = 8344 monit.strace.log.8341:write(1, "stdout\n", 7) = -1 EPIPE (Broken pipe) monit.strace.log.8341:--- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=8341, si_uid=0} --- monit.strace.log.8341:+++ killed by SIGPIPE +++ monit.strace.log.8344:execve("/usr/bin/date", ["date"], [/* 10 vars */]) = 0
No pipe() syscalls.
root@live:~# monit -V This is Monit version 5.18 Built with ssl, with pam and with large files Copyright (C) 2001-2016 Tildeslash Ltd. All Rights Reserved.
-
repo owner - changed version to 5.18.0
- Log in to comment
Please provide the following so we can reproduce: