Piler does not respond

Issue #324 resolved
s s created an issue

Hi. We have piler 0.1.25-master-branch, build 857. Sometimes I notice that piler does not respond (for example, when I run telnet localhost 25). I see the following records in the log (verbosity=5):

Jun 16 12:08:38 server piler[11370]: child (pid: 11370) served enough: 1000
Jun 16 12:08:38 server piler[11370]: child decides to exit (pid: 11370)

I tried to stop piler, but the process '/usr/local/sbin/piler' was still alive. Only after I killed it was I able to start piler again and accept mail.

What is the limit for a child process? I think piler could not fork another process.

I cannot update this server to the latest piler. Please reply if you have a solution for this issue.

Thank you.

Comments (12)

  1. s s reporter

    I see this setting in the config: max_requests_per_child=1000. If I change it to 1000000, do I still need to restart piler when max_requests_per_child is exceeded? What is the maximum value for this?

  2. Janos SUTO repo owner

    The syslog message looks normal with verbose logging. It merely informs you that a child has processed 1000 (by default) messages, and it's time to die, and allow or force the master process to fork a new child. Note that you may disable this behaviour by setting max_requests_per_child=0.

    However, I think it may not be the root of your problem. I had a similar issue: 10(+1) piler processes were running, yet they couldn't process any new emails. It turned out that a buggy attachment caused a helper program (e.g. pdftotext) to hang, thus blocking the piler child process indefinitely.
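One way to check for this situation is to look for helper processes that have been running far longer than a text extraction should take. The following is only a sketch: the function name find_stuck_helpers, the 60-second threshold, and every helper name other than pdftotext are assumptions for illustration, and the etimes column needs a procps-ng version of ps.

```shell
# Hypothetical helper: reads "pid ppid elapsed_seconds command" lines on stdin
# and prints those whose command is a known text-extraction helper that has
# been running longer than the given number of seconds.
find_stuck_helpers() {
    threshold="$1"
    awk -v max="$threshold" \
        '$3 > max && $4 ~ /^(pdftotext|catdoc|catppt|xls2csv|unzip)$/ { print $1, $2, $3, $4 }'
}

# The trailing "=" on each column suppresses the ps header line.
ps -eo pid=,ppid=,etimes=,comm= | find_stuck_helpers 60
```

If this prints a helper whose parent (second column) is a piler child, that child is most likely blocked waiting for it.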

    The solution was to run the helpers through the timeout program. To find the right workaround for you, please run ./configure --help and check whether you have --with-plugin-timeout=N.
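The check can be scripted from the top of the piler source tree. This is a minimal sketch; the function name supports_plugin_timeout is made up for illustration, and it only searches the help text for the option name mentioned above.

```shell
# Hypothetical helper: succeeds if the configure help text read from stdin
# mentions the --with-plugin-timeout option.
supports_plugin_timeout() {
    grep -q -- '--with-plugin-timeout'
}

# Run from the directory that contains piler's configure script.
if ./configure --help 2>/dev/null | supports_plugin_timeout; then
    echo "--with-plugin-timeout is available"
else
    echo "--with-plugin-timeout is not available in this source tree"
fi
```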

  3. s s reporter

    Thank you for the answer. I do not have the option '--with-plugin-timeout'. I think you're right, because this server did not have this issue previously (it hung on Sunday and Saturday). Before that it had been working for several months with ~25000 per day. If I decide to compile the latest version of piler, which value for --with-plugin-timeout do you recommend?

  4. Janos SUTO repo owner

    Yes, upgrading is one possible solution; another is fixing the piler-config.h file. To do this you may edit e.g. HAVE_PDFTOTEXT to look like this:

    #define HAVE_PDFTOTEXT "/bin/timeout 10 /usr/bin/pdftotext"

    and perhaps do the same with the other helpers; then all you have to do is run "make clean all" to get the updated binaries.
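To see what the wrapper does, the behaviour of GNU timeout can be tried on its own. In this sketch, sleep stands in for a hung helper and the 1-second limit is arbitrary. Note that GNU timeout expects the duration before the command, so a wrapped helper string needs a number of seconds between timeout and the helper path.

```shell
# "sleep 5" plays the role of a helper that never finishes in time.
# timeout kills it after 1 second and exits with status 124.
status=0
timeout 1 sleep 5 || status=$?
echo "exit status: $status"

# A command that finishes within the limit keeps its own exit status.
ok=0
timeout 5 true || ok=$?
echo "exit status: $ok"
```

With the helper wrapped this way, a hanging extraction is killed after the limit instead of blocking the piler child indefinitely.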

  5. s s reporter

    With the latest version from the master branch I see this option available:

    --with-plugin-timeout=N
    

    Which value for N do you recommend?

  6. Janos SUTO repo owner

    I'd say as many seconds as all the plugins need to safely do their job of extracting the text. 10 to 15 seconds could be a sane value.

  7. Janos SUTO repo owner

    I don't think it's worth disabling the "max_requests_per_child" feature. Instead, try the timeout fix I've suggested.
