Imported emails disappearing from search index / main1 index missing

Issue #1123 resolved
rient created an issue

Hi,

A few days ago I installed Piler on a fresh Ubuntu 18.04 server.

After I verified that the always_bcc forwarded emails from my mail server arrived at Piler and were visible to the auditor, I started importing mails via IMAP. So far so good: many old emails were showing up in Piler.

What I didn’t notice right away was that imported emails disappear after a while.
They no longer show up once indexer.delta.sh has been run again from cron.

Piler version

piler 1.3.9, build 998, Janos SUTO <sj@acts.hu>

Build Date: Fri Oct 30 13:31:58 CET 2020
ldd version: ldd (Ubuntu GLIBC 2.27-3ubuntu1.2) 2.27
gcc version: gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)
OS: Linux mail-archief.regiobode.nl 4.15.0 #1 SMP Tue Jun 9 12:58:54 MSK 2020 x86_64 x86_64 x86_64 GNU/Linux
Configure command: ./configure --localstatedir=/var --with-database=mysql --enable-memcached
MySQL client library version: 10.5.6
Extractors: /usr/bin/pdftotext /usr/bin/catdoc /usr/bin/catppt /usr/bin/xls2csv /usr/local/bin/ppthtml /usr/bin/unrtf /usr/bin/tnef libzip

When the indexer.delta.sh is run manually it throws the following error.

/usr/local/libexec/piler/indexer.delta.sh
FATAL: failed to merge index 'delta1' into index 'dailydelta1': kbatch target 'main1' from merge source index must also be in destination index

Likewise indexer.main.sh shows

/usr/local/libexec/piler/indexer.main.sh
FATAL: failed to merge index 'dailydelta1' into index 'main1': failed to open /var/piler/sphinx/main1.sph: No such file or directory

It appears there’s a problem with my search index. The file /var/piler/sphinx/main1.sph is missing

Restarting the rc.searchd service gives similar results and logs this to syslog

Nov  3 12:29:29 mail-archief rc.searchd[26568]: stopping searchd
Nov  3 12:29:29 mail-archief rc.searchd[26570]: starting searchd . . .
Nov  3 12:29:29 mail-archief rc.searchd[26570]: [Tue Nov  3 12:29:29.688 2020] [26574] using config file '/usr/local/etc/piler/sphinx.conf'...
Nov  3 12:29:29 mail-archief rc.searchd[26570]: Sphinx 3.3.1 (commit b72d67b)
Nov  3 12:29:29 mail-archief rc.searchd[26570]: Copyright (c) 2001-2020, Andrew Aksyonoff
Nov  3 12:29:29 mail-archief rc.searchd[26570]: Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)
Nov  3 12:29:29 mail-archief rc.searchd[26570]: listening on 127.0.0.1:9312
Nov  3 12:29:29 mail-archief rc.searchd[26570]: listening on 127.0.0.1:9306
Nov  3 12:29:29 mail-archief rc.searchd[26570]: Sphinx 3.3.1 (commit b72d67b)
Nov  3 12:29:29 mail-archief rc.searchd[26570]: Copyright (c) 2001-2020, Andrew Aksyonoff
Nov  3 12:29:29 mail-archief rc.searchd[26570]: Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)
Nov  3 12:29:29 mail-archief rc.searchd[26570]: precaching index 'main1'
Nov  3 12:29:29 mail-archief rc.searchd[26570]: WARNING: index 'main1': prealloc: failed to open /var/piler/sphinx/main1.sph: No such file or directory; NOT SERVING
Nov  3 12:29:29 mail-archief rc.searchd[26570]: precaching index 'main2'
Nov  3 12:29:29 mail-archief rc.searchd[26570]: WARNING: index 'main2': prealloc: failed to open /var/piler/sphinx/main2.sph: No such file or directory; NOT SERVING
Nov  3 12:29:29 mail-archief rc.searchd[26570]: precaching index 'delta1'
Nov  3 12:29:29 mail-archief rc.searchd[26570]: precaching index 'tag1'
Nov  3 12:29:29 mail-archief rc.searchd[26570]: precaching index 'main3'
Nov  3 12:29:29 mail-archief rc.searchd[26570]: precaching index 'main4'
Nov  3 12:29:29 mail-archief rc.searchd[26570]: precaching index 'dailydelta1'
Nov  3 12:29:29 mail-archief rc.searchd[26570]: precaching index 'note1'
Nov  3 12:29:29 mail-archief rc.searchd[26570]: WARNING: index 'main3': prealloc: failed to open /var/piler/sphinx/main3.sph: No such file or directory; NOT SERVING
Nov  3 12:29:29 mail-archief rc.searchd[26570]: WARNING: index 'main4': prealloc: failed to open /var/piler/sphinx/main4.sph: No such file or directory; NOT SERVING
Nov  3 12:29:29 mail-archief rc.searchd[26570]: WARNING: kbatch: index 'delta1': target 'main1' not found; kbatch not applied
Nov  3 12:29:29 mail-archief rc.searchd[26570]: WARNING: kbatch: index 'delta1': target 'main2' not found; kbatch not applied
Nov  3 12:29:29 mail-archief rc.searchd[26570]: WARNING: kbatch: index 'delta1': target 'main3' not found; kbatch not applied
Nov  3 12:29:29 mail-archief rc.searchd[26570]: WARNING: kbatch: index 'delta1': target 'main4' not found; kbatch not applied
Nov  3 12:29:29 mail-archief rc.searchd[26570]: Sphinx 3.3.1 (commit b72d67b)
Nov  3 12:29:29 mail-archief rc.searchd[26570]: Copyright (c) 2001-2020, Andrew Aksyonoff
Nov  3 12:29:29 mail-archief rc.searchd[26570]: Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)

Listing /var/piler/sphinx does indeed show the files are missing.

ls -alh  /var/piler/sphinx/
total 200K
drwx------ 2 piler piler 4.0K Nov  3 12:35 .
drwxr-xr-x 9 piler piler 4.0K Nov  2 22:03 ..
-rw-r--r-- 1 piler piler    0 Nov  3 12:23 dailydelta1.spa
-rw-r--r-- 1 piler piler    1 Nov  3 12:23 dailydelta1.spd
-rw-r--r-- 1 piler piler    1 Nov  3 12:23 dailydelta1.spe
-rw-r--r-- 1 piler piler 1.3K Nov  3 12:23 dailydelta1.sph
-rw-r--r-- 1 piler piler    1 Nov  3 12:23 dailydelta1.spi
-rw-r--r-- 1 piler piler    8 Nov  3 12:23 dailydelta1.spj
-rw-r--r-- 1 piler piler    0 Nov  3 12:23 dailydelta1.spk
-rw-r--r-- 1 piler piler    0 Nov  3 12:29 dailydelta1.spl
-rw-r--r-- 1 piler piler    1 Nov  3 12:23 dailydelta1.spp
-rw-r--r-- 1 piler piler    0 Nov  3 12:35 dailydelta1.tmp.spa
-rw-r--r-- 1 piler piler  200 Nov  3 12:35 delta1.spa
-rw-r--r-- 1 piler piler  28K Nov  3 12:35 delta1.spd
-rw-r--r-- 1 piler piler    1 Nov  3 12:35 delta1.spe
-rw-r--r-- 1 piler piler 1.3K Nov  3 12:35 delta1.sph
-rw-r--r-- 1 piler piler  39K Nov  3 12:35 delta1.spi
-rw-r--r-- 1 piler piler 7.7K Nov  3 12:35 delta1.spj
-rw-r--r-- 1 piler piler    4 Nov  3 12:35 delta1.spk
-rw-r--r-- 1 piler piler    0 Nov  3 12:35 delta1.spl
-rw-r--r-- 1 piler piler  26K Nov  3 12:35 delta1.spp
-rw-r--r-- 1 piler piler    0 Nov  3 12:30 note1.spa
-rw-r--r-- 1 piler piler    1 Nov  3 12:30 note1.spd
-rw-r--r-- 1 piler piler    1 Nov  3 12:30 note1.spe
-rw-r--r-- 1 piler piler  911 Nov  3 12:30 note1.sph
-rw-r--r-- 1 piler piler    1 Nov  3 12:30 note1.spi
-rw-r--r-- 1 piler piler    8 Nov  3 12:30 note1.spj
-rw-r--r-- 1 piler piler    0 Nov  3 12:30 note1.spk
-rw-r--r-- 1 piler piler    0 Nov  3 12:30 note1.spl
-rw-r--r-- 1 piler piler    1 Nov  3 12:30 note1.spp
-rw-r--r-- 1 piler piler    0 Nov  3 12:30 tag1.spa
-rw-r--r-- 1 piler piler    1 Nov  3 12:30 tag1.spd
-rw-r--r-- 1 piler piler    1 Nov  3 12:30 tag1.spe
-rw-r--r-- 1 piler piler  910 Nov  3 12:30 tag1.sph
-rw-r--r-- 1 piler piler    1 Nov  3 12:30 tag1.spi
-rw-r--r-- 1 piler piler    8 Nov  3 12:30 tag1.spj
-rw-r--r-- 1 piler piler    0 Nov  3 12:30 tag1.spk
-rw-r--r-- 1 piler piler    0 Nov  3 12:30 tag1.spl
-rw-r--r-- 1 piler piler    1 Nov  3 12:30 tag1.spp
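
The same check can be scripted instead of read off the listing. This is a minimal sketch, assuming that a missing `<name>.sph` header file is what makes searchd report an index as NOT SERVING (as in the log above); the function name `missing_indexes` is hypothetical and not part of Piler or Sphinx.

```shell
#!/bin/sh
# List the indexes whose header file (<name>.sph) is absent from a directory.
# Usage: missing_indexes DIR NAME...
missing_indexes() {
    dir=$1
    shift
    for name in "$@"; do
        # searchd logs "failed to open .../<name>.sph" when this file is absent.
        [ -f "$dir/$name.sph" ] || printf '%s\n' "$name"
    done
}

# Example call, using the index names that appear in the searchd log:
# missing_indexes /var/piler/sphinx main1 main2 main3 main4 dailydelta1 delta1 tag1 note1
```

For the directory listing above, that call would print main1, main2, main3 and main4, since only dailydelta1, delta1, note1 and tag1 have a .sph file.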

When and how should the main1 index be created? Why doesn’t it exist yet?

Should I run reindex? If so, which switches are needed?

Edit:

Scrolling way back in my terminal I found these messages after running make postinstall

Continue and modify system? [Y/N] [N] y

Creating mysql database... Done.
Writing sphinx configuration... Done.
Sphinx 3.3.1 (commit b72d67b)
Initializing sphinx indices... Sphinx 3.3.1 (commit b72d67b)
Copyright (c) 2001-2020, Andrew Aksyonoff
Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)

using config file '/usr/local/etc/piler/sphinx.conf'...
indexing index 'main1'...
ERROR: index 'main1': reserved field name 'from'.
total 0 docs, 0.0 Kb
total 0.0 sec, 0.0 Kb/sec, 0 docs/sec
indexing index 'main2'...
ERROR: index 'main2': reserved field name 'from'.
total 0 docs, 0.0 Kb
total 0.0 sec, 0.0 Kb/sec, 0 docs/sec
indexing index 'main3'...
ERROR: index 'main3': reserved field name 'from'.
total 0 docs, 0.0 Kb
total 0.0 sec, 0.0 Kb/sec, 0 docs/sec
indexing index 'main4'...
ERROR: index 'main4': reserved field name 'from'.
total 0 docs, 0.0 Kb
total 0.0 sec, 0.0 Kb/sec, 0 docs/sec
indexing index 'dailydelta1'...
ERROR: index 'dailydelta1': reserved field name 'from'.
total 0 docs, 0.0 Kb
total 0.0 sec, 0.0 Kb/sec, 0 docs/sec
indexing index 'delta1'...
ERROR: index 'delta1': reserved field name 'from'.
total 0 docs, 0.0 Kb
total 0.0 sec, 0.0 Kb/sec, 0 docs/sec
indexing index 'tag1'...
collected 0 docs, 0.0 MB
total 0 docs, 0.0 Kb
total 0.0 sec, 0.0 Kb/sec, 0 docs/sec
indexing index 'note1'...
collected 0 docs, 0.0 MB
total 0 docs, 0.0 Kb
total 0.0 sec, 0.0 Kb/sec, 0 docs/sec
Done.
installing cron entries for piler... Done.
installing keyfile (piler.key) to /usr/local/etc/piler/piler.key... Done.
Fix piler.conf path in pilerpurge.py
Making an ssl certificate ... Can't load /root/.rnd into RNG
139770627531200:error:2406F079:random number generator:RAND_load_file:Cannot open file:../crypto/rand/randfile.c:88:Filename=/root/.rnd
Generating a RSA private key
.......................................................................................................................++++
..........................................................++++
writing new private key to '/usr/local/etc/piler/piler.pem'
-----
Copying www files to /var/piler/www... Done.

Done post installation tasks.
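
Errors like these are easy to miss in a long install transcript. As a rough sketch, assuming the postinstall output was saved to a file (the name `postinstall.log` below is hypothetical), the affected indexes can be pulled out of lines in the `ERROR: index '<name>': ...` form shown above:

```shell
#!/bin/sh
# Print the names of indexes for which indexer reported an error.
# Reads the log text on standard input; relies only on the
# "ERROR: index '<name>': ..." line format seen in the output above.
failed_indexes() {
    sed -n "s/^ERROR: index '\([^']*\)':.*/\1/p" | sort -u
}

# Example call (postinstall.log is a hypothetical file name):
# failed_indexes < postinstall.log
```

An empty result would suggest that the initial index build went through; any name listed is an index that was never created.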

Thanks in advance

Comments (6)

  1. Janos SUTO repo owner

    You need to set SPHINX_STRICT_SCHEMA to 1 in sphinx.conf. Then stop searchd, and run as user piler

     indexer --all --config /usr/local/etc/piler/sphinx.conf
    

    Then start searchd, and reindex everything.
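
The steps above can be sketched as a small script. It is only an illustration under several assumptions: that searchd is controlled through /etc/init.d/rc.searchd (the thread only names the rc.searchd service, not its path), that `su piler -c` works for the piler account, that reindex is also run as piler with the `-a` switch used later in this thread, and that SPHINX_STRICT_SCHEMA has already been set to 1 by hand. The helper names `run` and `rebuild_sphinx_indexes` are hypothetical. By default it only prints what it would do.

```shell
#!/bin/sh
# Dry-run by default: each step is printed instead of executed.
# Set DRY_RUN=0 (and run as root) to execute the commands for real.
SPHINX_CONF=${SPHINX_CONF:-/usr/local/etc/piler/sphinx.conf}
SEARCHD_CTL=${SEARCHD_CTL:-/etc/init.d/rc.searchd}   # assumed location

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Note: the printed form drops the quoting around the su -c argument.
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

rebuild_sphinx_indexes() {
    run "$SEARCHD_CTL" stop                                 # stop searchd
    run su piler -c "indexer --all --config $SPHINX_CONF"   # build all indexes as piler
    run "$SEARCHD_CTL" start                                # start searchd again
    run su piler -c "reindex -a"                            # assumed: reindex everything as piler
}

# Example: rebuild_sphinx_indexes
```

Keeping the dry run as the default makes it possible to review the four commands before anything touches the index files.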

  2. rient reporter

    Hi Janos,

    Funny thing is that SPHINX_STRICT_SCHEMA was already set to 1 in my sphinx.conf. I must have set it after the postinstall then.

    Ran the indexer and reindex commands but I now have an unrealistic message count in the webui.

    I removed all files from /var/piler/sphinx/

    Ran the indexer --all --config /usr/local/etc/piler/sphinx.conf command which created the missing index files.

    Started reindex -a

    While re-indexing the messages my connection broke so I restarted reindex -a in a screen session.

    After that I noticed the message count in the webui is totally off from the message count processed by the reindex command and shown on the admin’s health page. So I wiped the sphinx data again and ran the reindex -a command once more, which resulted in a webui message count about three times the real message count.

    How can I get the webui to show a realistic message count matching the real number of messages?

    Thanks again.

  3. Janos SUTO repo owner

    The message count reported by sphinx will eventually come close to the real number of archived messages. The (initial) difference arises because you have 3 index files defined for the query (main1, dailydelta1, delta1), and each one’s count contributes to the aggregated number, which distorts it somewhat. As the indexed data grows, the difference shrinks over time. I know it’s somewhat confusing, but that’s how these joint indices lie to a certain degree.

    So let the reindex process complete, it will be fine, because you are doing it right.
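
The aggregation described above amounts to a plain sum over the queried indexes. The following sketch only illustrates that arithmetic; the function name `aggregate_count` and the counts in the example are hypothetical and are not taken from this installation.

```shell
#!/bin/sh
# Add up per-index document counts, the way a query over several
# indexes is described as reporting one aggregated number.
aggregate_count() {
    total=0
    for n in "$@"; do
        total=$((total + n))
    done
    printf '%s\n' "$total"
}

# Hypothetical counts for main1, dailydelta1 and delta1:
# aggregate_count 1000 1000 1000
```

With those made-up values the call prints 3000, which shows how an aggregated figure can sit well above the number of distinct messages while the same messages are counted in more than one index.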

  4. rient reporter

    In that case, I assume that this is only a deviating representation and indeed, in the course of time, the counters will become more accurate again.

    Thank you for your help Janos,
