pilerpurge causes a segfault in libmysqlclient.so.18.0.0

Issue #1228 closed
supporthq created an issue

Hi Janos,

It has been a very long time since I last posted an issue, as your solution has been working great for many years.

I have enabled pilerpurge now, as it has been 7+ years of collecting emails and there are finally emails that it can delete.

Running pilerpurge as user piler faults at roughly the same spot each time, give or take a few records:

removing attachment: /var/piler/store/00/538/91/1a/40000000538b62db354f02c400a5deb0911a.a3
removing attachment: /var/piler/store/00/538/91/1a/40000000538b62db354f02c400a5deb0911a.a4
removing attachment: /var/piler/store/00/538/91/1a/40000000538b62db354f02c400a5deb0911a.a5
removing attachment: /var/piler/store/00/538/91/1a/40000000538b62db354f02c400a5deb0911a.a6
removing attachment: /var/piler/store/00/538/91/1a/40000000538b62db354f02c400a5deb0911a.a7
Segmentation fault

In the logs, I can see this entry:

Jan 30 11:41:10 piler kernel: [ 2857.729531] pilerpurge[3730]: segfault at 2c02cc3711 ip 00007f2e22a0c79a sp 00007fffee763370 error 4 in libmysqlclient.so.18.0.0[7f2e229d7000+2bf000]

I have tried lots of things, like stopping piler, searchd, and cron while running pilerpurge, and also gave the machine a lot more RAM and CPU, but there was no improvement.

My version of piler is very old; I haven't had a need to update and am a little nervous about upgrading. If you think that is the solution, are there any instructions for jumping this many versions, or other suggestions? Maybe just export into a new OVA?

root@piler:~# piler -v
piler 1.1.1, build 904, Janos SUTO <sj@acts.hu>

Build Date: Sun Sep 27 15:05:42 EST 2015
ldd version: ldd (Debian EGLIBC 2.13-38) 2.13
gcc version: gcc version 4.7.2 (Debian 4.7.2-5)
Configure command: ./configure --localstatedir=/var --with-database=mysql --enable-starttls --enable-tcpwrappers

Comments (17)

  1. supporthq reporter

    I may have worked around the issue: I set the system date back to 2013 and ran pilerpurge, which seemed to complete without error, discarding a few thousand emails. Then I set the date to 2014 and ran pilerpurge again, and so on. So far it seems to work.

    Out of interest, how often should you run it? Daily? Weekly? Monthly?

  2. Janos SUTO repo owner

    Well, I usually suggest upgrading for both new features and bugfixes. Since your version is indeed pretty old, it's worth upgrading; however, Debian 7 is outdated as well, and recent versions of piler require recent OS packages.

    Anyway, if you managed to find a solution, that's great. I'd run pilerpurge daily, because then it needs to remove fewer messages and it finishes faster.
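
    For example, a crontab entry for the piler user along these lines would do (the pilerpurge path is an assumption; check where your build installed it):

    30 1 * * * /usr/local/sbin/pilerpurge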

  3. supporthq reporter

    Thanks for your reply.

    I have been rolling through the purge. I suspect it is faulting on a particular record rather than on the volume of records to purge; I will try to narrow it down.

  4. supporthq reporter

    I stand corrected. It was segfaulting when I purged the whole month between 2020-04 and 2020-05, so I made up a script that increments the clock by one day and runs the purge (a sketch of the idea is below), and it completed without issue; each day takes around 8 to 9 minutes. So maybe it is the volume after all, if not the number of emails then the total number of attachments?
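
    Roughly, the script does something like this (a sketch only: it assumes GNU date, that ntp is stopped so the clock is not corrected mid-run, and that pilerpurge is on the PATH):

    #!/bin/bash
    # Walk the clock forward one day at a time, purging after each step.
    d="2020-04-01"
    while [ "$d" != "2020-05-01" ]; do
        date -s "$d"                   # jump the system clock to the next day
        su -c pilerpurge piler         # run the purge as the piler user
        d=$(date -I -d "$d + 1 day")   # compute the next day
    done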

    Here is when the fault occurred for a one-month batch:

    May 1 01:07:03 piler kernel: [ 8720.063675] pilerpurge[4307]: segfault at 2c019d380a ip 00007fec07c7c79a sp 00007fffdb7b2c20 error 4 in libmysqlclient.so.18.0.0[7fec07c47000+2bf000]

    These are the day-by-day batches:

    Apr 1 01:02:16 piler pilerpurge[4809]: purged 0 messages, 0 bytes
    Apr 2 01:09:57 piler pilerpurge[4827]: purged 123 messages, 25842526 bytes
    Apr 3 01:09:04 piler pilerpurge[4895]: purged 619 messages, 160913947 bytes
    Apr 4 01:08:18 piler pilerpurge[4959]: purged 690 messages, 238340349 bytes
    Apr 5 01:08:32 piler pilerpurge[5007]: purged 597 messages, 256511351 bytes
    Apr 6 01:07:56 piler pilerpurge[5049]: purged 454 messages, 111199477 bytes
    Apr 7 01:06:54 piler pilerpurge[5100]: purged 92 messages, 35165971 bytes
    Apr 8 01:08:16 piler pilerpurge[5144]: purged 121 messages, 40104701 bytes
    Apr 9 01:08:18 piler pilerpurge[5184]: purged 571 messages, 132024614 bytes
    Apr 10 01:08:45 piler pilerpurge[5234]: purged 576 messages, 186112442 bytes
    Apr 11 01:08:15 piler pilerpurge[5286]: purged 555 messages, 151551863 bytes
    Apr 12 01:08:14 piler pilerpurge[5334]: purged 470 messages, 168594676 bytes
    Apr 13 01:08:15 piler pilerpurge[5374]: purged 424 messages, 166376689 bytes
    Apr 14 01:06:56 piler pilerpurge[5429]: purged 86 messages, 16708693 bytes
    Apr 15 01:07:15 piler pilerpurge[5473]: purged 156 messages, 73282748 bytes
    Apr 16 01:09:13 piler pilerpurge[5508]: purged 601 messages, 139985711 bytes
    Apr 17 01:08:02 piler pilerpurge[5573]: purged 414 messages, 62453444 bytes
    Apr 18 01:09:04 piler pilerpurge[5622]: purged 449 messages, 238614441 bytes
    Apr 19 01:08:12 piler pilerpurge[5679]: purged 527 messages, 108773786 bytes
    Apr 20 01:08:27 piler pilerpurge[5727]: purged 387 messages, 104392763 bytes
    Apr 21 01:06:52 piler pilerpurge[5778]: purged 88 messages, 10268590 bytes
    Apr 22 01:07:20 piler pilerpurge[5822]: purged 100 messages, 66370951 bytes
    Apr 23 01:09:30 piler pilerpurge[5856]: purged 667 messages, 214714134 bytes
    Apr 24 01:08:56 piler pilerpurge[5927]: purged 643 messages, 157633252 bytes
    Apr 25 01:08:30 piler pilerpurge[5979]: purged 547 messages, 137847128 bytes
    Apr 26 01:07:20 piler pilerpurge[6029]: purged 118 messages, 23254623 bytes
    Apr 27 01:08:10 piler pilerpurge[6064]: purged 441 messages, 117676670 bytes
    Apr 28 01:07:27 piler pilerpurge[6114]: purged 82 messages, 41702033 bytes
    Apr 29 01:07:19 piler pilerpurge[6160]: purged 180 messages, 169735874 bytes
    Apr 30 01:09:07 piler pilerpurge[6195]: purged 607 messages, 168567069 bytes

  5. supporthq reporter

    It seems to be faulting quite randomly now; I might just let it keep running and hope it eventually finishes.

    I am not sure what state the database and filesystem are left in when the purge segfaults. Is there a query/script I can run to check the integrity? I am worried that it is leaving orphaned files or records. (A rough check is sketched after the log below.)

    30-04-2020
    Thu Apr 30 00:00:00 EST 2020
    07-05-2020
    Thu May 7 00:00:00 EST 2020
    14-05-2020
    Thu May 14 00:00:00 EST 2020
    21-05-2020
    Thu May 21 00:00:00 EST 2020
    Segmentation fault
    28-05-2020
    Thu May 28 00:00:00 EST 2020
    Segmentation fault
    04-06-2020
    Thu Jun 4 00:00:00 EST 2020
    11-06-2020
    Thu Jun 11 00:00:00 EST 2020
    18-06-2020
    Thu Jun 18 00:00:00 EST 2020
    Segmentation fault
    25-06-2020
    Thu Jun 25 00:00:00 EST 2020
    Segmentation fault
    02-07-2020
    Thu Jul 2 00:00:00 EST 2020
    09-07-2020
    Thu Jul 9 00:00:00 EST 2020
    16-07-2020
    Thu Jul 16 00:00:00 EST 2020
    23-07-2020
    Thu Jul 23 00:00:00 EST 2020
    30-07-2020
    Thu Jul 30 00:00:00 EST 2020
    06-08-2020
    Thu Aug 6 00:00:00 EST 2020
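
    Something like this might spot mismatches between the store and the database (a sketch only: it assumes the stock schema where metadata.piler_id matches a store file's base name, that the database is named piler, and that mysql credentials are in ~/.my.cnf):

    #!/bin/bash
    # Flag .m files whose metadata row is missing or already marked deleted.
    find /var/piler/store -name '*.m' | while read -r f; do
        id=$(basename "$f" .m)
        deleted=$(mysql -N piler -e "SELECT deleted FROM metadata WHERE piler_id='$id'")
        if [ -z "$deleted" ]; then
            echo "orphan file, no metadata row: $f"
        elif [ "$deleted" = "1" ]; then
            echo "file still on disk but marked deleted: $f"
        fi
    done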

  6. supporthq reporter

    I managed to get through a full year by incrementing one day at a time. It is a slow process which takes about 7 minutes to run the purge for each day. I think I can manage through it, but it definitely seems like a daily task to keep the database in tip-top shape.

    I do need help with one thing, however. I had three emails that were imported with a weird future date in the year 2034, and I followed some advice in another thread that said to update the metadata record with an old retention date. Interestingly, the three emails are still appearing at the top of the search but are marked as 'message not verified', and if I try to download one I get a 'pilerget cannot open' error. So it seems that the files have been purged, but the metadata records are still in the database.

    What steps can I take to manually remove this entry?

  7. Janos SUTO repo owner

    In order to keep the messaging history, the purge utility doesn't remove anything from the piler mysql database.

    If you insist on removing those messages (I personally believe that it's not a good idea), then delete the affected id from both the metadata and rcpt tables.
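
    For example (a sketch only: 12345 is a placeholder for the affected id, which you should look up in the metadata table first; the database name piler is an assumption, and back up the database before deleting):

    mysql piler -e "DELETE FROM rcpt WHERE id=12345"
    mysql piler -e "DELETE FROM metadata WHERE id=12345"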

  8. supporthq reporter

    Hello again. I think I am done with the purge and am putting it into the daily routine, so it should be OK from now on. One thing I did notice is that all the purged messages are still appearing in the auditor search with a 'message not verified' message. It indeed looks like the files have been purged and the database has deleted = 1 in the metadata. Is this expected behavior, or does something need to run to remove these purged messages from appearing in the search?

  9. Janos SUTO repo owner

    Yes, the deleted=1 column setting is intentional. Run the delta indexer to get rid of the deleted messages from the GUI.
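
    For example, invoking by hand the same script the stock crontab runs:

    /usr/local/libexec/piler/indexer.delta.sh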

  10. supporthq reporter

    As far as I can tell, the messages have deleted=1 in their records, the .m file has been deleted, the .a files are still present, and the messages still appear in the search. I have these two items in crontab (below); am I missing something?

    5,35 4-23 * * * /usr/local/libexec/piler/indexer.delta.sh
    30   2    * * * /usr/local/libexec/piler/indexer.main.sh
    

  11. Janos SUTO repo owner

    No, the crontab entries look fine. I'm not sure why the kill-list doesn't kick in. By the way, what sphinx version do you have?
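
    For context, sphinx builds the kill-list from the sql_query_killlist directive in the delta source definition; an illustrative example (not necessarily the stock piler query; check your own sphinx.conf) would be:

    sql_query_killlist      = SELECT id FROM metadata WHERE deleted=1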

  12. supporthq reporter

    Here is the sphinx version

    Sphinx 2.0.4-release (r3135)
    Copyright (c) 2001-2012, Andrew Aksyonoff
    Copyright (c) 2008-2012, Sphinx Technologies Inc (http://sphinxsearch.com)
    

  13. supporthq reporter

    Would it have anything to do with being up to main4?

    /etc/sphinxsearch/sphinx.conf
    
    index main1
    {
            source                  = main1
            path                    = /var/piler/sphinx/main1
            docinfo                 = extern
            charset_type            = utf-8
            enable_star             = 1
            min_prefix_len          = 6
            min_word_len            = 1
    
    }
    
    index main2
    {
            source                  = main2
            path                    = /var/piler/sphinx/main2
            docinfo                 = extern
            charset_type            = utf-8
            enable_star             = 1
            min_prefix_len          = 6
            min_word_len            = 1
    }
    
    index main3
    {
            source                  = main3
            path                    = /var/piler/sphinx/main3
            docinfo                 = extern
            charset_type            = utf-8
            enable_star             = 1
            min_prefix_len          = 6
            min_word_len            = 1
    }
    
    index main4
    {
            source                  = main4
            path                    = /var/piler/sphinx/main4
            docinfo                 = extern
            charset_type            = utf-8
            enable_star             = 1
            min_prefix_len          = 6
            min_word_len            = 1
    }

  14. Janos SUTO repo owner

    No, I don't think so. I suspect that the 'no subject' email is caused by a phantom or bogus sphinx entry. Check the mail log when you click on it to see whether it syslogs any error.

  15. Log in to comment