Emails disappear in WebGUI after second partial Reindex and Indexer

Issue #998 resolved
MS created an issue

Hello,

I have another question:

I am trying to restore our piler emails after adding additional disk space.
The first reindex was: nohup reindex -f 1 -t 300000
After that I ran nohup /usr/local/libexec/piler/indexer.delta.sh and nohup /usr/local/libexec/piler/indexer.main.sh
So far everything worked fine and the WebGUI showed 275298 emails.
Then I ran nohup reindex -f 300001 -t 600000, followed again by nohup /usr/local/libexec/piler/indexer.delta.sh and nohup /usr/local/libexec/piler/indexer.main.sh
After that the WebGUI showed only 190567 emails.
Could you give me instructions on how to properly restore all emails in batches, since I can’t restore all of them at once?

Thanks!

Comments (10)

  1. Janos SUTO repo owner

    Hello. You apparently did it right. Try the following:

    mysql -h 127.0.0.1 -P 9306
    
    select * from main1;
    
    SHOW META LIKE 'total_found';
    

    The latter should reveal how many messages are in the main1 index. You should do this after each batch, and verify that you have more and more emails indexed.
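    The per-batch check can be sketched as a small shell script. This is only a sketch under assumptions: the function names batch_ranges, run and reindex_in_batches and the DRY_RUN switch are made up for illustration, while the reindex, indexer and mysql commands are the ones quoted in this thread. It also assumes the mysql client runs both statements passed to -e in one session, so that SHOW META refers to the preceding select. Unless DRY_RUN=0 is set, it only prints what it would run.

```shell
#!/bin/sh
# Sketch: reindex piler in ID batches and check total_found after each batch.
# All function names and the DRY_RUN switch are hypothetical helpers.

# Print "from to" pairs covering first..last in steps of size.
batch_ranges() {
    first=$1; last=$2; size=$3
    from=$first
    while [ "$from" -le "$last" ]; do
        to=$((from + size - 1))
        if [ "$to" -gt "$last" ]; then to=$last; fi
        echo "$from $to"
        from=$((to + 1))
    done
}

# Run a command, or only print it when DRY_RUN is 1 (the default).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would run: $*"
    else
        # stdin is redirected so the command cannot swallow the batch list
        "$@" < /dev/null
    fi
}

# For each batch: reindex, run both indexer scripts, then query total_found.
reindex_in_batches() {
    batch_ranges "$1" "$2" "$3" | while read -r from to; do
        run reindex -f "$from" -t "$to"
        run /usr/local/libexec/piler/indexer.delta.sh
        run /usr/local/libexec/piler/indexer.main.sh
        run mysql -h 127.0.0.1 -P 9306 \
            -e "select * from main1; SHOW META LIKE 'total_found';"
    done
}
```

    For example, reindex_in_batches 1 600000 300000 prints the commands for the two batches used above, and DRY_RUN=0 reindex_in_batches 1 600000 300000 would actually run them. The total_found value printed after each batch should keep growing.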

  2. MS reporter

    So select * from main1 shows me 20 rows and SHOW META LIKE 'total_found' shows 586150, which makes some sense after reindexing the first 600000. At this point the WebGUI showed 190567 emails.
    As a test I reindexed another 20000 emails (reindex -f 600001 -t 620000) and total_found increased to 614730.
    However, they are not shown in the WebGUI, which currently shows only 183233 emails.

  3. MS reporter

    As auditor, the WebGUI shows me 638310 emails, and the query is:
    Jul 15 09:37:10 mailarchiv01 piler-webui[19331]: sphinx query: 'SELECT id FROM main1,dailydelta1,delta1 WHERE MATCH('') ORDER BY sent DESC LIMIT 0,5000 OPTION max_matches=5000' in 0.21 s, 5000 hits, 638310 total found

    However, I also tried it as a user (who is added to all email addresses), and the query appears to be cut off in the log: it stops in the middle of an email address in the MATCH clause. But I think that is more a log limitation than an actual issue.

  4. Janos SUTO repo owner

    An auditor user can see all emails, while regular users can see only their own. This should explain why a regular user sees fewer emails. I don’t see a bug here.

  5. MS reporter

    The user I am testing with is added to all groups, so he should see all the emails. Also, I don’t see any reason why the user sees fewer and fewer emails after each reindex when his group access has not changed.

  6. Janos SUTO repo owner

    Try this: every time you finish processing a reindexed batch, have this user hit the search button, and note the sphinx query as well as the total hits (from the sphinx query, not the GUI). When you have 5-10 such entries on your list, show them to me.
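    Collecting these numbers from the log could be sketched as below. This is a sketch that assumes the log lines look like the piler-webui line quoted in comment 3; the function name sphinx_totals is made up for illustration. It reads log lines on stdin and prints the timestamp, the hits and the total found. Lines that are cut off before the "total found" part do not match and are skipped.

```shell
#!/bin/sh
# Sketch: extract "<timestamp> <hits> <total found>" from piler-webui log lines.
# sphinx_totals is a hypothetical helper; the line format is taken from the
# sample log line in comment 3.
sphinx_totals() {
    # \1 = the 15-character syslog timestamp, \2 = hits, \3 = total found
    sed -n 's/^\(.\{15\}\) .*sphinx query: .* in [0-9.]* s, \([0-9]*\) hits, \([0-9]*\) total found.*$/\1 \2 \3/p'
}
```

    A possible use, assuming the web UI logs to /var/log/syslog (the path depends on the syslog setup): grep piler-webui /var/log/syslog | sphinx_totals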

  7. MS reporter

    Hi,
    I would love to do that, but the problem is that the query is too long for the log. It is cut off in the middle of the MATCH clause:

    Jul 15 11:08:29 mailarchiv01 piler-webui[24205]: sphinx query: 'SELECT id FROM main1,dailydelta1,delta1 WHERE MATCH(' (@from testXsXcompanyXde| paymentXconfirmationXenXarchivXcompanyXde| premiumXrenewalXinfoXenXarchivXcompanyXde| renewalXconfirmationXenXarchivXcompanyXde| chargebackXenXarchivXcompanyXde| paymentXconfirmationXdachXarchivXcompanyXde| premiumXrenewalXinfoXdachXarchivXcompanyXde| renewalXconfirmationXdachXarchivXcompanyXde| chargebackXdachXarchivXcompanyXde| premiumXrenewalXinfoXatXcompanyXat| paymentXconfirmationXemXarchivXcompanyXde| premiumXrenewalXinfoXemXarchivXcompanyXde| renewalXconfirmationXemXarchivXcompanyXde| chargebackXemXarchivXcompanyXde| paymentXconfirmationXnlXarchivXcompanyXde| chargebackXnlXarchivXcompanyXde| paymentXconfirmationXruXarchivXcompanyXde| chargebackXruXarchivXcompanyXde| paymentXconfirmationXplXarchivXcompanyXde| chargebackXplXarchivXcompanyXde| paymentXconfirmationXnordicsXarchivXcompanyXde| premiumXrenewalXinfoXnordicsXarchivXcompanyXde| renewalXconfirmationXnordicsXarchivXcompanyXde| chargebackXnordicsXarchivXcompanyXde| paymentXconfirmationXitXarchivXcompanyXde| chargebackXitXarchivXcompanyXde| premiumXrenewalXinfoXfrXcompanyXfr| premiumXrenewalXinfoXfrXerXcompanyXfr| paymentXconfirmationXfrXcompanyXfr| paymentXconfirmationXfrXerXcompanyXfr| infoXnewsXcompanyXfr| infoXnewsXcompanyXfr| ServicesXPremiumXnewsXcompanyXfr| paymentXconfirmationXfrXarchivXcompanyXde| premiumXrenewalXinfoXFRXarchivXcompanyXde| renewalXconfirmationXfrXarchivXcompanyXde| chargebackXFRXarchivXcompanyXde| jeanXloumXcompanyXde| sabrinaXjeblaouiXcompanyXde| caroleXlevyXcompanyXde| paymentXconfirmationXesXarchivXcompanyXde| chargebackXesXarchivXcompanyXde| agatheXcontactXcompanyXnet| amandineXcontactXcompanyXnet| archivesXawXcompanyXnet| archivageXcontactXcompanyXcom| brunoXcontactXcompanyXnet| contactXcompanyXnet| julienXeventXcompanyXnet| infoX

    So I don’t see the end of the query in the log. Is there any other place where I could see it?

  8. Janos SUTO repo owner

    Well, it could be an issue for sphinx as well. Check sphinx’s limits on the length of the select query and on the length of the MATCH expression. Feel free to check out the sphinx forum. However, wouldn’t it be much simpler to use an auditor account for this purpose?
