Mailpiler failed after Debian kernel upgrade

Issue #433 resolved
Michael Schefczyk created an issue

Dear Janos Suto, dear All,

In early 2014, I switched to archiving my personal e-mail history with mailpiler. Thanks to Janos Suto for the excellent software! I am running mailpiler on a Debian wheezy virtual machine.

Until yesterday, everything worked fine. Today, I installed a new Linux kernel (3.2.0-4-amd64). Thereafter, the VM with that kernel, Debian 4.7.2-5, Piler 1.1.0, build 884 and mysql 5.5.38-0+wheezy1 failed. Of course, I do not know whether the kernel upgrade is the cause.

Trying to access mailpiler from the browser, I received: Error: SQLSTATE[HY000] [2003] Can't connect to MySQL server on '127.0.0.1' (111) on database: sphinx

I was a bit surprised, as I knew of a mysql DB “piler” but never of a mysql database named “sphinx” like the search package. I have the piler key, all relevant directories (/var/piler, /var/lib/mysql) and the databases backed up at all times. Reverting the VM from a backup did not help. I also could not detect any basic mysql issues, including issues around InnoDB. After some experimenting, mysql would no longer start because of a missing mysqld.sock file.

I then decided to reinstall a VM with mailpiler. That basically worked. However, when I mounted not only the usual /var/piler but also /var/lib/mysql, mysql would again not start.

Next, I started with a fresh VM with the usual /var/piler, but instead of mounting /var/lib/mysql, I imported the piler DB via phpmyadmin and set the piler key right after make postinstall. That worked insofar as mysql was running, the system recognized users, and the health monitor looked OK. While the health monitor showed the expected number of messages, I could not search for messages. My next step, reindexing, seemed to work.

After a reboot, unfortunately, I again saw: Error: SQLSTATE[HY000] [2003] Can't connect to MySQL server on '127.0.0.1' (111) on database: sphinx

Now mysql is running, but reindexing does not work (“cant open: xxxxxx.eml”), although the key is present and /var/piler/store contains a lot of data.

Please let me know what might be the issue and how to fix it.

Regards,

Michael Schefczyk

Comments (11)

  1. Janos SUTO repo owner

    Hello Michael,

    A Debian upgrade shouldn't cause any issues.

    Error: SQLSTATE[HY000] [2003] Can't connect to MySQL server on '127.0.0.1' (111) on database: sphinx
    

    The above message says that the GUI can't connect to searchd. Sphinx is actually not a mysql database; rather, it provides an SQL-like interface that can be accessed like a mysql database. So check that searchd is running and listening on 127.0.0.1:9306. It's important to use the piler-shipped version of the rc.searchd script, because it starts searchd as user piler.
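
One way to run that check is to look at the listener table. The (111) in the error message is the Linux errno for "connection refused", which fits a port that nothing is listening on. The sketch below is not part of piler: `has_listener` is a hypothetical helper, and it assumes `netstat -ltn` or `ss -ltn` style output, where the local address is the fourth column. The sample rows are made up.

```shell
# Hypothetical helper (not shipped with piler): succeed if the listener
# table read from stdin has a socket bound to the given address:port.
# Assumes `netstat -ltn` / `ss -ltn` layout: local address in column 4.
has_listener() {
    awk -v addr="$1" '$4 == addr { found = 1 } END { exit !found }'
}

# Made-up rows standing in for real `netstat -ltn` output.
sample='tcp        0      0 127.0.0.1:3306          0.0.0.0:*               LISTEN
tcp        0      0 127.0.0.1:9306          0.0.0.0:*               LISTEN'

if printf '%s\n' "$sample" | has_listener 127.0.0.1:9306; then
    echo "searchd port is listening"
else
    echo "nothing is listening on 127.0.0.1:9306"
fi
```

On the server itself, the same helper could be fed the live table with `netstat -ltn | has_listener 127.0.0.1:9306`. If searchd is bound to all interfaces, the row would show 0.0.0.0:9306 and the address argument would need adjusting.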

    Regarding the reindexing issue: make sure you change to a directory where user piler can write files, e.g. /tmp or /var/piler/imap. Recent builds of piler actually warn you if the current directory is not suitable for reindexing.
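
The directory requirement can be checked before starting a reindex. A minimal sketch, meant to be run as user piler: `pick_workdir` is a hypothetical helper, and the candidate list simply repeats the directories named in this comment.

```shell
# Hypothetical helper: print the first candidate directory that exists
# and is writable by the current user; fail if none qualifies.
pick_workdir() {
    for dir in "$@"; do
        if [ -d "$dir" ] && [ -w "$dir" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}

# Candidates taken from the comment; the result depends on who runs this.
if workdir=$(pick_workdir /var/piler/imap /tmp); then
    echo "reindex from: $workdir"
else
    echo "no writable directory found" >&2
fi
```

Note that the writability test reflects the user running the script, so running it as root says nothing about what user piler may write.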

    So keep me posted on how it's going.

  2. Michael Schefczyk reporter
    • changed status to open

    Dear Janos,

    Thank you very much for your prompt response. I run four Debian wheezy machines, and on two of them - including the mailpiler one - I had issues with mysql during the kernel/mysql update. In both cases, InnoDB needed repair, and on the mailpiler server I could not disentangle that from the sphinx issue.

    As it turns out, the only specific cause here was probably that searchd was not running. /etc/init.d/rc.searchd, which was copied from your piler package, seems to be the same file before and after the upgrade, and it seems to have had the right permissions and been executable throughout. Nevertheless, searchd plainly refuses to start.

    Starting searchd manually by entering "/etc/init.d/rc.searchd start" works without issues. Having it start automatically at boot, which is the purpose of /etc/init.d as far as I know, does not work. Starting it from crontab via @reboot ... as a last resort does not work either.

    For the time being, I am unable to resolve the basic issue so I will need to start searchd manually after rebooting the server. At least, however, all data and the complete functionality are available.

    Regards,

    Michael

  3. Janos SUTO repo owner

    Is a teamviewer or similar session possible? If so, then find me on skype (janos.suto).

  4. Michael Schefczyk reporter

    Dear Janos,

    Thank you very much for your intervention! Afterwards, I found a way to cure the problem. While the VM has sufficient CPU and RAM, there seems to be a timing issue after the kernel upgrade: searchd does start, but not right at bootup. It works after waiting for a short while. Thus it works if I add the following line to crontab:

    @reboot root sleep 120 && /etc/init.d/rc.searchd start

    Making the delay substantially shorter than 120 seconds creates problems. Maybe one must consider that /var/piler is attached from a NAS via fstab, but that has been the case with my server all along.
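
If the NAS mount is what searchd has to wait for (an assumption; nothing in this thread confirms it), a fixed delay could be replaced by polling for a condition with an upper time limit. A minimal sketch: `wait_for` is a hypothetical helper, and the demonstration polls for a temporary file rather than a real mount.

```shell
# Hypothetical helper: retry a command once per second until it succeeds
# or the given number of seconds has passed.
# Usage: wait_for SECONDS COMMAND [ARGS...]
wait_for() {
    limit=$1
    shift
    waited=0
    until "$@"; do
        if [ "$waited" -ge "$limit" ]; then
            return 1    # gave up: the condition never became true
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Demonstration: a flag file appears after about one second.
demo_dir=$(mktemp -d)
( sleep 1; : > "$demo_dir/ready" ) &
if wait_for 10 test -e "$demo_dir/ready"; then
    echo "condition met after about ${waited}s"
else
    echo "timed out"
fi
wait
rm -rf "$demo_dir"
```

On the server this might become `wait_for 300 mountpoint -q /var/piler && /etc/init.d/rc.searchd start` in a small boot script. The `mountpoint` command ships with Debian; the 300-second limit is an arbitrary choice.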

    No such delay was necessary in the many months before. To confirm that timing is the issue, I rebuilt the VM from scratch again.

    The rebuild, however, brought another critical problem: I had backed up all data and database directories. Now, if I want to reindex, I get:

    zpipe: invalid or incomplete deflate data

    The key is certainly the right one and its permissions are OK as well. To be safe, I set the permissions of /var/piler to 777 and the owner to piler:piler, both recursively. The system (Debian wheezy 7.7, kernel 3.2.0-4-amd64) indicates that zlib1g-dev is at the most recent version.

    Should I rebuild the server again or does this zpipe issue tell you anything?

    Regards,

    Michael

  5. Janos SUTO repo owner

    You also need the "iv" parameter from (the old?) piler.conf. The encryption uses both the iv (initialization vector) and the piler key. By the way, can you somehow measure whether there's any delay before the /var/piler mount becomes available after rc.local runs? E.g. put "df -h > /tmp/df.test" as the first command in rc.local.
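
A timestamped variant of that measurement is sketched below. `mount_state` is a hypothetical helper; it assumes the Linux /proc/mounts layout, where the mount point is the second field, and it accepts an alternative table file so it can be tried without a real NAS mount. The demo table is made up.

```shell
# Hypothetical helper: print "<timestamp> <mount point> mounted|not mounted".
# $1 = mount point, $2 = mount table to read (defaults to /proc/mounts).
mount_state() {
    table=${2:-/proc/mounts}
    if awk -v mp="$1" '$2 == mp { found = 1 } END { exit !found }' "$table"; then
        state="mounted"
    else
        state="not mounted"
    fi
    printf '%s %s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$1" "$state"
}

# Demonstration with a made-up mount table instead of /proc/mounts.
demo=$(mktemp)
printf 'nas:/export/piler /var/piler nfs rw 0 0\n' > "$demo"
mount_state /var/piler "$demo"        # reports: mounted
mount_state /var/lib/mysql "$demo"    # reports: not mounted
rm -f "$demo"
```

Appending `mount_state /var/piler >> /tmp/df.test` at the top of rc.local and again right before searchd is started would show whether the mount appears between those two points.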

  6. Michael Schefczyk reporter

    Dear Janos,

    Thank you very much, and sorry for having overlooked that fact. On the last rebuild, I just copied the piler.conf file. This time, I entered the parameters individually and forgot one critical item. With the correct iv, the reindexing does run.
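
Because decryption needs both the key and the iv, comparing the old and the new configuration before reindexing can catch this kind of omission. A minimal sketch, assuming piler.conf consists of plain `name=value` lines: `conf_get` and `same_setting` are hypothetical helpers, and the two files and their contents are made-up stand-ins for the old and new piler.conf.

```shell
# Hypothetical helper: print the value of the first "name=value" line.
# $1 = setting name (plain word, used inside a sed pattern), $2 = file.
conf_get() {
    sed -n "s/^$1=//p" "$2" | head -n 1
}

# Hypothetical helper: succeed only if the setting is present in the old
# file and has the same value in the new one.
same_setting() {
    old_value=$(conf_get "$1" "$2")
    new_value=$(conf_get "$1" "$3")
    [ -n "$old_value" ] && [ "$old_value" = "$new_value" ]
}

# Made-up stand-ins for the old and the newly written piler.conf.
conf_dir=$(mktemp -d)
printf 'other_setting=1\niv=0123456789abcdef\n' > "$conf_dir/old.conf"
printf 'other_setting=1\n' > "$conf_dir/new.conf"    # iv was forgotten

if same_setting iv "$conf_dir/old.conf" "$conf_dir/new.conf"; then
    echo "iv matches"
else
    echo "iv missing or different in new config"
fi
rm -rf "$conf_dir"
```

With the made-up files above, the second branch is taken, because the new file has no iv line.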

    Regards,

    Michael

  7. datapharmer

    I'm also having trouble with searchd not starting at boot under Debian. Is "@reboot root sleep 120 && /etc/init.d/rc.searchd start" considered the official fix, or is there a better solution that I've missed?
