Duplicate mails showing in piler

Issue #609 closed
Arunmani Murugan created an issue

Dear jsuto,

We are importing our Zimbra data (4.9 TiB) into piler 1.1.1. Unfortunately the import process was interrupted, so we started importing the data again from the beginning, and this time the import completed successfully. Now the issue is that we are getting every mail twice in the auditor login.

How can I remove those duplicated mails? Kindly advise and guide me to resolve this issue.

Comments (26)

  1. Janos SUTO repo owner

    So you have 200 hits for every 100 messages? What sphinx version do you have? If that's the case, then reset the sphinx indices, then reindex everything. See the FAQ on how to do that.

  2. Arunmani Murugan reporter

    Yes, I can see each mail twice in the auditor login, and the message count is also double the actual number. We are using the sphinx version below:

    -sh-4.1$ searchd -piler Sphinx 2.2.9-id64-release (rel22-r5006)

    If I reset the sphinx indices and reindex, will my mail data in piler be lost?

  3. Janos SUTO repo owner

    OK, the sphinx version is fine. Your emails won't be lost, you just recreate the index data used for searching.

  4. Arunmani Murugan reporter

    Dear jsuto, I have started the reindexer on the piler machine. I will get back with its output.

  5. Arunmani Murugan reporter

    Dear jsuto,

    I have reset the sphinx indices and reindexed everything. I can still see the same mail twice in the auditor login, i.e., there are 12000 actual mails but it still shows around 24000.

  6. Janos SUTO repo owner

    Show me the output of the following mysql query. Be sure to format it!

    $ mysql -u piler -p piler
    mysql> select id, message_id from metadata order by id asc limit 30;
    
  7. Janos SUTO repo owner

    Running the import twice is not a problem for good emails (with a valid message-id), because the parser can prevent the duplicate from being imported. However, pilerimport can't prevent duplication on a repeated import of messages that have no message-id.

    One possible solution to the problem: reset the archive, drop all emails (see the FAQ for how), and start importing again. Make sure you don't run pilerimport twice on the same data. I believe that this should eliminate (most) duplicates. Anyway, the missing message-id is a problem for sure.
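
To illustrate the point about message-ids, here is a minimal Python sketch of an importer that skips any message whose Message-ID it has already stored. It is not piler's actual code: the `import_messages` function, the in-memory `archive` list and the sample messages are all hypothetical, and only serve to show why a repeated import duplicates just the messages that lack a Message-ID.

```python
from email import message_from_string


def import_messages(raw_messages, archive, seen_ids):
    """Append messages to `archive`, skipping any Message-ID already in `seen_ids`."""
    for raw in raw_messages:
        message_id = message_from_string(raw)["Message-ID"]  # None if the header is missing
        if message_id is not None:
            if message_id in seen_ids:
                continue  # duplicate of a message that was stored earlier
            seen_ids.add(message_id)
        # Without a Message-ID there is nothing to compare against,
        # so the message is stored again on every import run.
        archive.append(raw)


# Hypothetical sample data: two messages with a Message-ID, one without.
mailbox = [
    "Message-ID: <a1@example.com>\nSubject: first\n\nbody",
    "Message-ID: <b2@example.com>\nSubject: second\n\nbody",
    "Subject: no message-id\n\nbody",
]

archive, seen_ids = [], set()
import_messages(mailbox, archive, seen_ids)  # first import run
import_messages(mailbox, archive, seen_ids)  # same data imported again

print(len(archive))
```

After both runs the archive holds four entries: the two messages with a Message-ID once each, and the message without one twice.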

  8. Arunmani Murugan reporter

    Dear jsuto,

    Thanks for the suggestion. The re-do went well and it works. It seems the issue was with the message-id of the actual mail data.

  9. Arunmani Murugan reporter

    Dear jsuto,

    Now that the import is done, I changed always_bcc on my Zimbra mail server to archive mails in piler. Since then I see a lot of duplicate mails in the webui. I verified that both copies have the same Message-ID, so I ran "reindex -a", but there was no improvement.

    I am facing one more issue now: I am not getting up-to-date mails in the piler webui. It only shows mails up to one day old, but according to the piler cron jobs it should show mails with about half an hour of delay.

    For your reference, the headers of both the original and the duplicate mail are attached. Please help me to resolve this issue. header1.PNG header2.PNG

  10. Janos SUTO repo owner

    Let me take a look via teamviewer. If it's ok for you then contact me on skype (janos.suto).

  11. Arunmani Murugan reporter

    Dear jsuto,

    I tried to contact you on Skype (janos.suto) but got no response. My Skype name is arunmani.murugan1.

  12. Arunmani Murugan reporter

    Dear jsuto,

    Thanks for your support. I have monitored the webui over these days. We still receive duplicate mails in the piler webui after compiling src/message.c with the changes. Please help to resolve this.

  13. Arunmani Murugan reporter

    Dear jsuto,

    As of today's observation, duplicates are much reduced, so it seems to be fixed. Thanks a lot for the changes. I can only see mails delayed by one day (n-1) in the piler webui, which is why I was late in updating you; sorry for that. Please tell me how to remove the mails that were already duplicated up to last week, and help me view mails in the webui immediately rather than with a one-day delay.

    Thanks in advance.

  14. Arunmani Murugan reporter

    Hello jsuto,

    I can now see mails in the piler webui at the normal interval after changing the sphinx conf. Please help me delete the mails that were already duplicated.

  15. Janos SUTO repo owner

    I'd like you to run the following query, and check how long it takes to return results:

    select id, message_id, count(message_id) as count from metadata where deleted = 0 group by message_id limit 100;
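
The query above groups on message_id but returns every group, including message-ids that occur only once. As a hedged sketch, the same grouping can be narrowed to actual duplicates with a HAVING clause. The example below runs against an in-memory SQLite table so that it is self-contained: the table mimics only the three columns used in the query, and the sample rows, the `find_duplicates` helper and the use of SQLite instead of MySQL are assumptions for illustration. Timing on a real metadata table of this size may differ considerably.

```python
import sqlite3


def find_duplicates(conn, limit=100):
    """Return (message_id, copies) for message-ids stored more than once."""
    return conn.execute(
        "SELECT message_id, COUNT(*) AS copies "
        "FROM metadata WHERE deleted = 0 "
        "GROUP BY message_id HAVING COUNT(*) > 1 "
        "ORDER BY message_id LIMIT ?",
        (limit,),
    ).fetchall()


# Toy stand-in for the metadata table (only the columns used in the query).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE metadata (id INTEGER PRIMARY KEY, message_id TEXT, deleted INTEGER)"
)
conn.executemany(
    "INSERT INTO metadata (message_id, deleted) VALUES (?, ?)",
    [
        ("<a1@example.com>", 0),
        ("<a1@example.com>", 0),  # live duplicate
        ("<b2@example.com>", 0),
        ("<c3@example.com>", 0),
        ("<c3@example.com>", 1),  # second copy already marked deleted
    ],
)

duplicates = find_duplicates(conn)
print(duplicates)
```

With these sample rows only `<a1@example.com>` is reported, with two copies; the message-id whose second copy is marked deleted is not counted as a duplicate.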
    
  16. Arunmani Murugan reporter

    Dear jsuto,

    Sorry for not updating the status. The issue seems to be fixed after your last modifications. Regarding the discussions, I will update you soon on Skype.
