Fewer emails in web UI after delta / main index job

Issue #1229 closed
Alexander Rieder created an issue

Hey 🙂

I am facing a strange issue. I imported several PST files into piler.

Not all of them appeared in the web UI for the auditor, so I reindexed the mails using “reindex -a” as the piler user. After manually running the delta indexer, about 40000 emails are visible in the web UI. Running the delta indexer again leaves only 30000 emails, and after running the main indexer the email count drops to 15000.

Here is my piler -V output:

piler 1.3.9, build 998, Janos SUTO <sj@acts.hu>

Build Date: Thu Sep 3 10:42:52 CEST 2020
ldd version: ldd (Ubuntu GLIBC 2.31-0ubuntu9) 2.31
gcc version: gcc version 9.3.0 (Ubuntu 9.3.0-10ubuntu2)
OS: Linux archive01 5.4.0-45-generic #49-Ubuntu SMP Wed Aug 26 13:38:52 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Configure command: ./configure --localstatedir=/var --with-database=mysql --enable-memcached
MySQL client library version: 10.4.14
Extractors: /usr/bin/pdftotext /usr/bin/catdoc /usr/bin/catppt /usr/bin/xls2csv /usr/local/bin/ppthtml /usr/bin/unrtf /usr/bin/tnef libzip

Here are the relevant sphinx-queries from the syslog:

Feb  3 16:14:40 archive01 piler-webui[742]: sphinx query: 'SELECT id FROM main1,dailydelta1,delta1 WHERE        MATCH(' (@rcptdomain xyzdomainXde | @senderdomain xyzdomainXde ) ') ORDER BY `sent` DESC LIMIT 0,20 OPTION max_matches=1000' in 0.01 s, 20 hits, 39726 total found
Feb  3 16:14:41 archive01 qemu-ga: info: guest-ping called
Feb  3 16:14:45 archive01 piler-webui[128756]: sphinx query: 'SELECT id FROM main1,dailydelta1,delta1 WHERE        MATCH(' (@rcptdomain xyzdomainXde | @senderdomain xyzdomainXde ) ') ORDER BY `sent` DESC LIMIT 0,20 OPTION max_matches=1000' in 0.01 s, 20 hits, 39726 total found
Feb  3 16:14:51 archive01 root: INDEXER INFO: indexing delta1 started
Feb  3 16:14:51 archive01 root: INDEXER INFO: indexing delta1 finished
Feb  3 16:14:52 archive01 qemu-ga: info: guest-ping called
Feb  3 16:14:54 archive01 piler-webui[741]: sphinx query: 'SELECT id FROM main1,dailydelta1,delta1 WHERE        MATCH(' (@rcptdomain xyzdomainXde | @senderdomain xyzdomainXde ) ') ORDER BY `sent` DESC LIMIT 0,20 OPTION max_matches=1000' in 0.01 s, 20 hits, 30014 total found
Feb  3 16:14:56 archive01 root: INDEXER INFO: merging delta to dailydelta started
Feb  3 16:14:57 archive01 piler-webui[742]: sphinx query: 'SELECT id FROM main1,dailydelta1,delta1 WHERE        MATCH(' (@rcptdomain xyzdomainXde | @senderdomain xyzdomainXde ) ') ORDER BY `sent` DESC LIMIT 0,20 OPTION max_matches=1000' in 0.01 s, 20 hits, 30014 total found
Feb  3 16:14:58 archive01 piler-webui[128756]: sphinx query: 'SELECT id FROM main1,dailydelta1,delta1 WHERE        MATCH(' (@rcptdomain xyzdomainXde | @senderdomain xyzdomainXde ) ') ORDER BY `sent` DESC LIMIT 0,20 OPTION max_matches=1000' in 0.00 s, 20 hits, 30014 total found
Feb  3 16:15:01 archive01 root: INDEXER INFO: merging delta to dailydelta finished
Feb  3 16:15:01 archive01 CRON[173405]: (piler) CMD (/usr/bin/find /var/piler/www/tmp -type f -name i.\* -exec rm -f {} \;)
Feb  3 16:15:01 archive01 CRON[173406]: (piler) CMD (/usr/local/bin/indexer --quiet tag1 --rotate --config /usr/local/etc/piler/sphinx.conf)
Feb  3 16:15:01 archive01 CRON[173407]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Feb  3 16:15:01 archive01 CRON[173409]: (piler) CMD (/usr/local/bin/indexer --quiet note1 --rotate --config /usr/local/etc/piler/sphinx.conf)
Feb  3 16:15:01 archive01 CRON[173408]: (piler) CMD (/usr/bin/find /var/piler/error -type f|wc -l > /var/piler/stat/error)
Feb  3 16:15:03 archive01 qemu-ga: info: guest-ping called
Feb  3 16:15:14 archive01 qemu-ga: info: guest-ping called
Feb  3 16:15:25 archive01 qemu-ga: info: guest-ping called
Feb  3 16:15:36 archive01 qemu-ga: info: guest-ping called
Feb  3 16:15:46 archive01 piler-webui[742]: sphinx query: 'SELECT id FROM main1,dailydelta1,delta1 WHERE        MATCH(' (@rcptdomain xyzdomainXde | @senderdomain xyzdomainXde ) ') ORDER BY `sent` DESC LIMIT 0,20 OPTION max_matches=1000' in 0.01 s, 20 hits, 30014 total found
Feb  3 16:15:47 archive01 qemu-ga: info: guest-ping called
Feb  3 16:15:52 archive01 piler-webui[128756]: sphinx query: 'SELECT id FROM main1,dailydelta1,delta1 WHERE        MATCH(' (@rcptdomain xyzdomainXde | @senderdomain xyzdomainXde ) ') ORDER BY `sent` DESC LIMIT 0,20 OPTION max_matches=1000' in 0.00 s, 20 hits, 30014 total found
Feb  3 16:15:54 archive01 root: INDEXER INFO: indexing delta1 started
Feb  3 16:15:54 archive01 root: INDEXER INFO: indexing delta1 finished
Feb  3 16:15:57 archive01 piler-webui[741]: sphinx query: 'SELECT id FROM main1,dailydelta1,delta1 WHERE        MATCH(' (@rcptdomain xyzdomainXde | @senderdomain xyzdomainXde ) ') ORDER BY `sent` DESC LIMIT 0,20 OPTION max_matches=1000' in 0.01 s, 20 hits, 30014 total found
Feb  3 16:15:58 archive01 qemu-ga: info: guest-ping called
Feb  3 16:15:59 archive01 piler-webui[742]: sphinx query: 'SELECT id FROM main1,dailydelta1,delta1 WHERE        MATCH(' (@rcptdomain xyzdomainXde | @senderdomain xyzdomainXde ) ') ORDER BY `sent` DESC LIMIT 0,20 OPTION max_matches=1000' in 0.01 s, 20 hits, 30014 total found
Feb  3 16:15:59 archive01 root: INDEXER INFO: merging delta to dailydelta started
Feb  3 16:16:04 archive01 root: INDEXER INFO: merging delta to dailydelta finished
Feb  3 16:16:09 archive01 qemu-ga: info: guest-ping called
Feb  3 16:16:20 archive01 qemu-ga: info: guest-ping called
Feb  3 16:16:29 archive01 piler-webui[742]: sphinx query: 'SELECT id FROM main1,dailydelta1,delta1 WHERE        MATCH(' (@rcptdomain xyzdomainXde | @senderdomain xyzdomainXde ) ') ORDER BY `sent` DESC LIMIT 0,20 OPTION max_matches=1000' in 0.01 s, 20 hits, 30014 total found
Feb  3 16:16:31 archive01 qemu-ga: info: guest-ping called
Feb  3 16:16:31 archive01 piler-webui[128756]: sphinx query: 'SELECT id FROM main1,dailydelta1,delta1 WHERE        MATCH(' (@rcptdomain xyzdomainXde | @senderdomain xyzdomainXde ) ') ORDER BY `sent` DESC LIMIT 0,20 OPTION max_matches=1000' in 0.00 s, 20 hits, 30014 total found
Feb  3 16:16:42 archive01 qemu-ga: info: guest-ping called
Feb  3 16:16:53 archive01 qemu-ga: info: guest-ping called
Feb  3 16:16:56 archive01 piler-smtp[858]: connected from 127.0.0.1:34502 on fd=6 (active connections: 1)
Feb  3 16:16:56 archive01 piler-smtp[858]: disconnected from 127.0.0.1 on fd=6 (0 active connections)
Feb  3 16:17:01 archive01 piler-smtp[858]: connected from 127.0.0.1:34508 on fd=6 (active connections: 1)
Feb  3 16:17:01 archive01 piler-smtp[858]: disconnected from 127.0.0.1 on fd=6 (0 active connections)
Feb  3 16:17:01 archive01 CRON[173733]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Feb  3 16:17:05 archive01 qemu-ga: info: guest-ping called
Feb  3 16:17:16 archive01 qemu-ga: info: guest-ping called
Feb  3 16:17:27 archive01 qemu-ga: info: guest-ping called
Feb  3 16:17:38 archive01 qemu-ga: info: guest-ping called
Feb  3 16:17:49 archive01 qemu-ga: info: guest-ping called
Feb  3 16:17:53 archive01 piler-webui[742]: sphinx query: 'SELECT id FROM main1,dailydelta1,delta1 WHERE        MATCH(' (@rcptdomain xyzdomainXde | @senderdomain xyzdomainXde ) ') ORDER BY `sent` DESC LIMIT 0,20 OPTION max_matches=1000' in 0.01 s, 20 hits, 30014 total found
Feb  3 16:18:00 archive01 qemu-ga: info: guest-ping called
Feb  3 16:18:02 archive01 root: INDEXER INFO: merging to main started
Feb  3 16:18:05 archive01 piler-webui[742]: sphinx query: 'SELECT id FROM main1,dailydelta1,delta1 WHERE        MATCH(' (@rcptdomain xyzdomainXde | @senderdomain xyzdomainXde ) ') ORDER BY `sent` DESC LIMIT 0,20 OPTION max_matches=1000' in 0.01 s, 20 hits, 30014 total found
Feb  3 16:18:07 archive01 piler-webui[742]: sphinx query: 'SELECT id FROM main1,dailydelta1,delta1 WHERE        MATCH(' (@rcptdomain xyzdomainXde | @senderdomain xyzdomainXde ) ') ORDER BY `sent` DESC LIMIT 0,20 OPTION max_matches=1000' in 0.01 s, 20 hits, 30014 total found
Feb  3 16:18:08 archive01 piler-webui[742]: sphinx query: 'SELECT id FROM main1,dailydelta1,delta1 WHERE        MATCH(' (@rcptdomain xyzdomainXde | @senderdomain xyzdomainXde ) ') ORDER BY `sent` DESC LIMIT 0,20 OPTION max_matches=1000' in 0.01 s, 20 hits, 30014 total found
Feb  3 16:18:09 archive01 piler-webui[742]: sphinx query: 'SELECT id FROM main1,dailydelta1,delta1 WHERE        MATCH(' (@rcptdomain xyzdomainXde | @senderdomain xyzdomainXde ) ') ORDER BY `sent` DESC LIMIT 0,20 OPTION max_matches=1000' in 0.01 s, 20 hits, 30014 total found
Feb  3 16:18:10 archive01 root: INDEXER INFO: merging to main finished
Feb  3 16:18:11 archive01 piler-webui[742]: sphinx query: 'SELECT id FROM main1,dailydelta1,delta1 WHERE        MATCH(' (@rcptdomain xyzdomainXde | @senderdomain xyzdomainXde ) ') ORDER BY `sent` DESC LIMIT 0,20 OPTION max_matches=1000' in 0.01 s, 20 hits, 30014 total found
Feb  3 16:18:12 archive01 qemu-ga: info: guest-ping called
Feb  3 16:18:12 archive01 piler-webui[742]: sphinx query: 'SELECT id FROM main1,dailydelta1,delta1 WHERE        MATCH(' (@rcptdomain xyzdomainXde | @senderdomain xyzdomainXde ) ') ORDER BY `sent` DESC LIMIT 0,20 OPTION max_matches=1000' in 0.01 s, 20 hits, 30014 total found
Feb  3 16:18:13 archive01 piler-webui[742]: sphinx query: 'SELECT id FROM main1,dailydelta1,delta1 WHERE        MATCH(' (@rcptdomain xyzdomainXde | @senderdomain xyzdomainXde ) ') ORDER BY `sent` DESC LIMIT 0,20 OPTION max_matches=1000' in 0.00 s, 20 hits, 30014 total found
Feb  3 16:18:14 archive01 piler-webui[742]: sphinx query: 'SELECT id FROM main1,dailydelta1,delta1 WHERE        MATCH(' (@rcptdomain xyzdomainXde | @senderdomain xyzdomainXde ) ') ORDER BY `sent` DESC LIMIT 0,20 OPTION max_matches=1000' in 0.00 s, 20 hits, 30014 total found
Feb  3 16:18:15 archive01 root: INDEXER INFO: resetting daily delta started
Feb  3 16:18:15 archive01 root: INDEXER INFO: resetting daily delta finished
Feb  3 16:18:16 archive01 piler-webui[742]: sphinx query: 'SELECT id FROM main1,dailydelta1,delta1 WHERE        MATCH(' (@rcptdomain xyzdomainXde | @senderdomain xyzdomainXde ) ') ORDER BY `sent` DESC LIMIT 0,20 OPTION max_matches=1000' in 0.00 s, 20 hits, 15507 total found

Thank you in advance for your time :)

Comments (16)

  1. Alexander Rieder reporter

    Thanks for the response. Just as a note: the auditor/revisor only has access to his own emails, which should theoretically be about 50000. However, I think some are “invalid”, so I expect about 45k emails.

    Here are my results:

    285658 metadata rows
    auditor has 15608 items in webui count

    reindexing:

    reindex -a -p
    failed to add to sph_index table: 285651.eml
    failed to add to sph_index table: 285652.eml
    failed to add to sph_index table: 285653.eml
    failed to add to sph_index table: 285654.eml
    failed to add to sph_index table: 285655.eml
    failed to add to sph_index table: 285656.eml
    failed to add to sph_index table: 285657.eml
    failed to add to sph_index table: 285658.eml
    failed to add to sph_index table: 285659.eml
    failed to add to sph_index table: 285660.eml
    failed to add to sph_index table: 285661.eml
    processed:   285647 [ 99%]
    put 285647 messages to sph_index table for reindexing
    

    after reindex:

    285658 metadata rows
    auditor has 15608 items in webui count

    after reindex and first delta-index run:
    285658 metadata rows
    auditor has 44824 items in webui count

    after second delta-index run:
    285658 metadata rows
    auditor has 30216 items in webui count

    after third delta-index run:
    285658 metadata rows
    auditor has 30216 items in webui count

    after main-index run:
    285658 metadata rows
    auditor has 15608 items in webui count
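
Note on the numbers above: the count drops by the same amount at each step. A quick check (variable names are only illustrative) makes the pattern visible. One possible reading, which is an assumption and not confirmed anywhere in this thread, is that the same messages temporarily sit in more than one of the three indexes and are counted once per index.

```python
# Auditor item counts copied from the report above.
counts = {
    "after first delta-index run": 44824,
    "after second delta-index run": 30216,
    "after main-index run": 15608,
}

values = list(counts.values())

# Difference between consecutive measurements.
drops = [earlier - later for earlier, later in zip(values, values[1:])]
print(drops)

# Check whether every drop is the same size, and whether the highest
# count equals the final count plus two such drops.
same_drop = len(set(drops)) == 1
adds_up = values[-1] + 2 * drops[0] == values[0]
print(same_drop, adds_up)
```
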

  2. Janos SUTO repo owner

    I think you didn’t mention that you turned on the ENABLE_SAAS variable. Try the following:

    $ mysql -h 127.0.0.1 -P9306
    mysql> select id FROM main1,dailydelta1,delta1;
    mysql> show meta like 'total_found';
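
The same check can be run per index to see where the documents actually live. Below is a minimal sketch, not part of piler: `total_found_per_index` is a hypothetical helper that takes any Python DB-API cursor connected to the SphinxQL port used above (9306). Which client library provides that cursor is left open, and the sketch assumes it returns rows as tuples.

```python
def total_found_per_index(cursor, indexes=("main1", "dailydelta1", "delta1")):
    """Return {index_name: total_found}, issuing one query per index.

    `cursor` is assumed to be a DB-API cursor that returns rows as tuples
    and is connected to searchd's SphinxQL listener.
    """
    totals = {}
    for index in indexes:
        # Index names cannot be passed as bound parameters, so accept
        # only plain identifiers before putting them into the statement.
        if not index.isidentifier():
            raise ValueError(f"unexpected index name: {index!r}")
        cursor.execute(f"SELECT id FROM {index} LIMIT 0,1")
        cursor.fetchall()  # rows are not needed, only the metadata
        cursor.execute("SHOW META LIKE 'total_found'")
        rows = cursor.fetchall()  # one row: (Variable_name, Value)
        totals[index] = int(rows[0][1])
    return totals
```

Comparing the per-index values with the combined `total_found` may show whether the same ids are present in several indexes at once.
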
    

  3. Alexander Rieder reporter

    Here is the output:

    MySQL [(none)]> select id FROM main1,dailydelta1,delta1;
    +------+
    | id   |
    +------+
    |    1 |
    |    2 |
    |    3 |
    |    4 |
    |    5 |
    |    6 |
    |    7 |
    |    8 |
    |    9 |
    |   10 |
    |   11 |
    |   12 |
    |   13 |
    |   14 |
    |   15 |
    |   16 |
    |   17 |
    |   18 |
    |   19 |
    |   20 |
    +------+
    20 rows in set (0.003 sec)
    
    MySQL [(none)]> show meta like 'total_found';
    +---------------+--------+
    | Variable_name | Value  |
    +---------------+--------+
    | total_found   | 286919 |
    +---------------+--------+
    1 row in set (0.000 sec)
    

    Actually I have not enabled “ENABLE_SAAS”… should I?

    cat /var/piler/www/config-site.php
    <?php
    $config['ENABLE_PDF_DOWNLOAD'] = 1;
    $config['RESTRICTED_AUDITOR'] = 1;
    $config['ENABLE_MOBILE_PREVIEW'] = 1;
    

  4. Janos SUTO repo owner

    No, you shouldn’t. Well, if ENABLE_SAAS is not enabled, then the auditor user is not really an auditor. Auditor users don’t have any email filter applied to their sphinx query like MATCH(' (@rcptdomain xyzdomainXde | @senderdomain xyzdomainXde ) ')
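
To illustrate the difference described here, a rough sketch that is not piler's actual code: `domain_filter` and `build_query` are made-up names, and the replacement of non-alphanumeric characters with `X` is only inferred from the logged query (`xyzdomainXde`).

```python
import re


def domain_filter(domain):
    """Build a MATCH() expression limiting results to a single domain."""
    # Assumption: characters outside [A-Za-z0-9] are written as "X",
    # which is what the logged query in this thread suggests.
    token = re.sub(r"[^A-Za-z0-9]", "X", domain)
    return f"(@rcptdomain {token} | @senderdomain {token})"


def build_query(domain=None):
    """Return the search statement, with or without a domain filter."""
    sql = "SELECT id FROM main1,dailydelta1,delta1"
    if domain:
        # Filtered user: results are limited to the user's domain.
        sql += f" WHERE MATCH('{domain_filter(domain)}')"
    # Without a domain the query sees every archived message.
    return sql + " ORDER BY `sent` DESC LIMIT 0,20 OPTION max_matches=1000"


print(build_query())
print(build_query("xyzdomain.de"))
```
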

  5. Alexander Rieder reporter

    Ok, so I think I “misused” the ‘RESTRICTED_AUDITOR’ to provide some kind of a multi-tenant function.

    I am still facing the issue that emails are missing in the web UI after several index runs.

  6. Alexander Rieder reporter

    OK, some more news.

    Apart from the count, I have the problem that some emails cannot be found (especially sent items). It seems some emails do not have the correct “From” header (it is set to mailer-daemon).

    Is there any way to correct the header in the existing emails and reimport them? How could I delete only the emails that have a wrong “From” header and then import them again?

  7. Janos SUTO repo owner

    Unfortunately, no. The primary objective of the archive is to preserve emails as they are. Since you have ~300k emails, I’d suggest dropping the archived content, reinitializing the archive, fixing the email headers where needed, then importing them again.
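
Before importing again, it may help to list the messages that need fixing. A minimal sketch using only the Python standard library, assuming the PST content was exported to individual `.eml` files first; the function names and the directory argument are hypothetical.

```python
from email.parser import BytesParser
from email.utils import parseaddr
from pathlib import Path


def has_placeholder_sender(raw_message, placeholder="mailer-daemon"):
    """True if From is missing or its address starts with the placeholder."""
    headers = BytesParser().parsebytes(raw_message, headersonly=True)
    _, address = parseaddr(headers.get("From", ""))
    return address == "" or address.lower().startswith(placeholder)


def find_suspect_files(directory):
    """Yield the .eml files below `directory` that need a manual look."""
    for path in sorted(Path(directory).rglob("*.eml")):
        if has_placeholder_sender(path.read_bytes()):
            yield path
```

The sketch only reports candidates. How to repair each header still has to be decided per message, since the correct sender may not be present in the PST at all.
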

  8. Janos SUTO repo owner

    Also note that the open source edition of piler has only limited support for multitenancy, and I’ve discontinued developing this feature. If you need multitenancy, then check the commercial edition for providers at https://mailpiler.com/

  9. Alexander Rieder reporter

    We host an email archive for our customers as a service. We would like to host one instance for multiple clients.

  10. Janos SUTO repo owner

    OK, then you definitely need multitenancy. Anyway, you should fix the email headers where necessary. The readpst command sometimes either can’t properly extract the email addresses in the headers, or they are not present in the PST at all, and “mailer-daemon” is kind of a bogus placeholder. Once you have fixed all emails, you may either

    a) start over importing the emails. Then set ENABLE_SAAS=1 and RESTRICTED_AUDITOR=1, and see if you got what you expected

    or

    b) deploy the commercial version, and see what it can do for you

    I’m a little biased toward option “b” :-)

  11. Alexander Rieder reporter

    Thanks for your fast response 😉

    What are the benefits of option b, at $800 per year, compared to option a?
