reindex does not reindex all mails

Issue #1030 closed
Torsten Kruse created an issue

Setup: piler 1.2.0 and sphinx 2.2.11-r2 (from gentoo portage)

Our sphinx index broke some days ago. There were some leftover new and temp files, and the log contained segfaults.

So we tried to set up a new index.

This was easy:

rm /var/piler/sphinx/*
indexer --all

But reindex -a did not reindex all mails (about 4M), only about 20% of them (about 780,000).

Then we tried reindex with -f/-t parameters. (see details below)

I don’t understand why there are no mails to put in the index from 1 to 500,000, and why the other calls never reach 100% but stop before.

After the processing below we had 1254921 hits for an empty search, although there are about 4M mails in the archive (piler stats page).

Then we reactivated the cron jobs, and some time later only about 780,000 mails were found again. This is the same number that “reindex -a” processes.

What should we do to get all mails in the search?

piler@archive02 /tmp $ reindex -f 1 -t 500000 -p

put 0 messages to sph_index table for reindexing
piler@archive02 /tmp $ reindex -f 500000 -t 1000000 -p

put 0 messages to sph_index table for reindexing
piler@archive02 /tmp $ reindex -f 1500000 -t 2000000 -p

put 0 messages to sph_index table for reindexing
piler@archive02 /tmp $ reindex -f 2500000 -t 3000000 -p

put 0 messages to sph_index table for reindexing
piler@archive02 /tmp $ reindex -f 3500000 -t 4000000 -p
processed:   312024 [ 62%]
put 312024 messages to sph_index table for reindexing
piler@archive02 /tmp $ /usr/libexec/piler/indexer.delta.sh
piler@archive02 /tmp $ /usr/libexec/piler/indexer.main.sh
piler@archive02 /tmp $ reindex -f 4500000 -t 5000000 -p

put 0 messages to sph_index table for reindexing
piler@archive02 /tmp $ reindex -f 4000000 -t 4500000 -p

put 0 messages to sph_index table for reindexing
piler@archive02 /tmp $ reindex -f 3000000 -t 3500000 -p
processed:   471449 [ 94%]
put 471449 messages to sph_index table for reindexing
piler@archive02 /tmp $ /usr/libexec/piler/indexer.delta.sh
piler@archive02 /tmp $ /usr/libexec/piler/indexer.main.sh
piler@archive02 /tmp $ reindex -f 2000000 -t 2500000 -p

put 0 messages to sph_index table for reindexing
piler@archive02 /tmp $ reindex -f 1000000 -t 1500000 -p

put 0 messages to sph_index table for reindexing

Comments (13)

  1. Janos SUTO repo owner
    piler@archive02 /tmp $ reindex -f 1 -t 500000 -p
    put 0 messages to sph_index table for reindexing
    

    It’s odd. I suspect that the sph_index table is pretty huge. Check select count(*) from sph_index;

    After running the delta indexer it should be emptied. Also make sure you have enough memory allocated for the mysql buffers. The below settings are suggested in the faq as a starting point. Feel free to set bigger buffers, big enough to support running through 500k messages.

    innodb_buffer_pool_size = 256M
    innodb_flush_log_at_trx_commit=1
    innodb_log_buffer_size=64M
    innodb_log_file_size=64M
    innodb_read_io_threads=4
    innodb_write_io_threads=4
    innodb_log_files_in_group=2
    innodb_file_per_table
    

    Finally, it’s worth backing up the main index files as well.

  2. Torsten Kruse reporter

    The sph_index count is zero (or nearly zero when receiving new mails).

    I checked the mysql settings and had to increase log buffer and files from 48M to 64M. Also I added both thread params. Others were ok.

    Then I started a new try.

    I deleted sphinx folder content.

    Now as user piler I call: indexer --all

    It runs through main1 to note1 without finding anything, just att1 will collect about 1M documents (75 MB).

    Then “reindex -a” will process the known mails (about 780.000) which is btw the sum of 312024 and 471449 from the manual indexing in steps.

    So no progress with the changed mysql settings so far.

  3. Janos SUTO repo owner

    I suggest removing the att1 index from sphinx.conf, as its query is not that well developed and it may slow down the indexer later.

    So if the mysql config is fine, then start the process over. Stop searchd and remove any index data, just as you did before. Index a smaller chunk, e.g. 100k messages in one batch, then run the delta indexer as user piler. Make sure that reindex was able to put 100k messages into the sph_index table (unless some of them were deleted before), and that the dailydelta1 index actually contains 100k index entries. Then proceed to the next batch, i.e. 100001 to 200000, and so on.
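
    The batch procedure described above could be scripted roughly as follows. This is a minimal sketch, not part of piler: the helper names batch_ranges, run and reindex_in_batches and the DRY_RUN variable are made up for illustration. Only the reindex -f/-t/-p call and the /usr/libexec/piler/indexer.delta.sh path are taken from this thread, and the path may differ on other installs. By default the sketch only prints the commands, so the ranges can be checked before anything is run.

    ```shell
    #!/bin/sh
    # Hypothetical helpers for batch-wise reindexing; not shipped with piler.

    # Print "FROM TO" pairs covering FIRST..LAST in steps of STEP ids.
    batch_ranges() {
        first=$1; last=$2; step=$3
        from=$first
        while [ "$from" -le "$last" ]; do
            to=$((from + step - 1))
            if [ "$to" -gt "$last" ]; then
                to=$last
            fi
            echo "$from $to"
            from=$((to + 1))
        done
    }

    # Print the command when DRY_RUN is 1 (the default), otherwise execute it.
    run() {
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "$*"
        else
            "$@" </dev/null
        fi
    }

    # Queue each batch with reindex, then run the delta indexer before the next one.
    reindex_in_batches() {
        batch_ranges "$1" "$2" "$3" | while read -r from to; do
            run reindex -f "$from" -t "$to" -p
            run /usr/libexec/piler/indexer.delta.sh
        done
    }

    # Example: reindex_in_batches 1 4000000 100000
    ```

    Running it with DRY_RUN=0 as user piler would execute the commands instead of printing them. The counts in sph_index and dailydelta1 still need to be checked by hand between batches.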

  4. Torsten Kruse reporter

    Ok, I removed the att1 index definition (not the att source definition) and removed the sphinx index files.

    Then as the piler user I called “indexer --all”.

    Table sph_index is empty.

    Then I call:

    piler@archive02 /tmp $ reindex -f 1 -t 100000 -p
    
    put 0 messages to sph_index table for reindexing
    

    So again no messages there?

    A “select min(id) from metadata;” returns “1”. But perhaps I still did not understand from where the messages are read?

    When trying to reindex in 100k batches, the first messages are processed between 3,000,000 and 3,100,000.

    piler@archive02 /tmp $ reindex -f 3000001 -t 3100000 -p
    processed:    71449 [ 71%]
    put 71449 messages to sph_index table for reindexing
    

    I don’t understand why 71449 and not 100000 but I’ll proceed.

    The same number of records is in sph_index.

    Now I start searchd again (because otherwise the delta indexer will warn that it failed to open searchd.pid and that the indices were NOT rotated).

    Then I run delta indexer. (not main indexer)

    After that sph_index is 0 again.

    dailydelta1 contains 71449+17 records (maybe some stuff from normal processing before I reset?)

    delta1 contains the 71449 messages from above.

    Now I process from 3100000 to 3500000 (it doesn’t look like smaller batches change anything).

    This processed exactly the expected 400000 messages.

    Records in sph_index: 400000 / in delta: 0 / in dailydelta 71449

    Running delta indexer. (not main indexer)

    Records in sph_index: 0 / in delta: 400000 / in dailydelta 471449

    Now I process from 3500000 to 4000000.

    Processed records: 312773 / same count in sph_index

    Records in sph_index: 312773 / in delta: 400000 / in dailydelta 471449

    Running delta indexer. (not main indexer)

    Records in sph_index: 0 / in delta: 712773 / in dailydelta 784222

    Processing reindex with id above 4M will not find any messages.

    So as a result we are back at 784222 again.

    Finally I run the main indexer, which puts the 784222 records into the main index table.

    The dailydelta is empty after that, but the delta still contains 312773 records, the count of the last reindex run.

    The piler UI shows me as auditor 1095995 messages (sum of main1 and delta).

    I’ll now activate the piler cron jobs again and expect the 312773 to vanish from delta and search result.

    So finally I’m back at 784222 mails.

    But the piler health page says: “received messages: 4138659”

    Table metadata contains: 3812773 records (the difference might be the mails filtered according to archiving rules)

    But what am I missing? Where are the other 2M mails in my search?

    Is there any possible reason why mails are not processed during indexing?

    Hoping for help… 🙂

  5. Janos SUTO repo owner

    When you look at the first 100k messages, can you retrieve any of them? Connect to the mysql database, and run

    select piler_id from metadata limit 10;
    

    Then try running pilerget with each piler_id, and see if it can retrieve these messages. If it can’t, then we know what the problem is. If it can, then I’d like to look around and see what’s going on. In that case, see my email in the piler -V output to discuss the details.
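
    The retrieval check above could be looped over several ids along these lines. This is only a sketch under assumptions: the function name check_retrievable and the RETRIEVE variable are hypothetical, and the thread does not say how pilerget signals a failure, so the sketch treats a non-zero exit status or empty output as "could not retrieve".

    ```shell
    #!/bin/sh
    # Hypothetical helper: reads one piler_id per line from stdin and reports
    # which ones the retrieval command (pilerget by default) can fetch.
    # Assumption: a failed retrieval gives a non-zero exit status or no output.
    check_retrievable() {
        retrieve=${RETRIEVE:-pilerget}
        failed=0
        while read -r id; do
            [ -n "$id" ] || continue
            if out=$("$retrieve" "$id" 2>/dev/null </dev/null) && [ -n "$out" ]; then
                echo "ok $id"
            else
                echo "FAILED $id"
                failed=$((failed + 1))
            fi
        done
        echo "failed: $failed"
    }

    # Example: paste the piler_id values from the query above into ids.txt, then
    #   check_retrievable < ids.txt
    ```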

  6. Torsten Kruse reporter

    pilerget can retrieve these messages.

    When I search in piler UI for keywords from these messages, they are not found.

    So I’ll switch to your email.

  7. Torsten Kruse reporter

    Finally we found out what is happening here: when we started to use piler, we had enable_folders=0 (or the feature did not exist at all). During the update to piler 1.2.0 we enabled this feature. So all messages imported after the update had the required records in folder_messages, while the older ones did not. The reindex tool selects the messages to reindex according to the enable_folders config value. The result: for the older messages with a missing folder_messages record, the reindex tool did not select any record and so did not process them for reindexing.

    We set enable_folders back to 0, and then it was possible to reindex all messages.
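
    A query like the one printed by the sketch below might help to spot this situation by counting metadata rows that have no folder record. It rests on assumptions: the table name folder_messages is used as written in this comment, the join column id is a guess at the schema, and the helper name missing_folder_rows_sql is made up. The script only prints the SQL and does not connect to any database.

    ```shell
    #!/bin/sh
    # Hypothetical helper: prints a diagnostic query, nothing more.
    # Assumption: folder_messages refers to metadata.id via a column named "id".
    missing_folder_rows_sql() {
        printf '%s\n' \
            'select count(*) from metadata m' \
            '  left join folder_messages f on f.id = m.id' \
            '  where f.id is null;'
    }

    # Example (the database name "piler" is an assumption):
    #   missing_folder_rows_sql | mysql piler
    ```

    A large count while enable_folders=1 would be consistent with reindex skipping those messages, as described above.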
