Duplicate mails showing in piler
Dear jsuto,
We are importing our Zimbra data (4.9 TiB) into piler 1.1.1. Unfortunately, the import was interrupted partway through, so we started it again from the beginning, and the second run completed successfully. The issue now is that every mail appears twice in the auditor login.
How can I remove the duplicated mails? Kindly advise and guide me to resolve this issue.
Comments (26)
-
repo owner -
reporter Yes, I can see each mail twice in auditor, and the message count is also double the actual count. We are using the sphinx version below:
-sh-4.1$ searchd -piler Sphinx 2.2.9-id64-release (rel22-r5006)
If I reset the sphinx indices and reindex, will I lose my mail data in piler?
-
repo owner OK, the sphinx version is fine. Your emails won't be lost; you only recreate the index data used for searching.
-
reporter Dear jsuto, I have started the reindexer on the piler machine. I will get back with its output.
-
repo owner ok, take your time
-
reporter Dear jsuto,
I have reset the sphinx indices and reindexed everything. I can still see each mail twice in the auditor login, i.e. there are 12000 actual mails but it still shows around 24000.
-
repo owner Show me the output of the following mysql query. Be sure to format it!
$ mysql -u piler -p piler
mysql> select id, message_id from metadata order by id asc limit 30;
-
reporter Dear jsuto,
FYI, the requested output is in the attachment.
-
repo owner - changed status to invalid
Running the import twice is not a problem for good emails (with a valid message-id), because the parser can prevent the duplicate from being imported. However, pilerimport can't prevent duplication on a repeated import for messages that have no message-id.
Probably one solution to the problem: reset the archive, drop all emails (see the FAQ for how), and start importing again. Make sure you don't run pilerimport twice on the same data. I believe that this should eliminate (most) duplicates. Anyway the missing message-id is a problem for sure.
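To illustrate why a missing message-id defeats duplicate detection, here is a minimal Python sketch. It is not piler's parser code; the function name `import_once` and the sample messages are hypothetical, and it only assumes that duplicates are recognised by their Message-ID header, as described above.

```python
from email import message_from_string


def import_once(raw_messages):
    """Keep the first copy of each message that carries a Message-ID.

    A message without a Message-ID cannot be matched against earlier
    copies, so every copy of it is kept. This mirrors how a repeated
    import can produce duplicates (illustrative only, not piler code).
    """
    seen_ids = set()
    archive = []
    for raw in raw_messages:
        msg = message_from_string(raw)
        message_id = (msg.get("Message-ID") or "").strip()
        if message_id:
            if message_id in seen_ids:
                continue  # known Message-ID: skip the duplicate
            seen_ids.add(message_id)
        archive.append(raw)
    return archive


with_id = "Message-ID: <abc@example.com>\nSubject: hello\n\nbody\n"
without_id = "Subject: no id\n\nbody\n"

# Simulate running the import twice over the same two messages.
archive = import_once([with_id, without_id, with_id, without_id])
print(len(archive))
```

The message with a Message-ID is stored once, while the one without it is stored on both passes.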
-
reporter Dear jsuto,
Thanks for the suggestion. The redo went well and it works. It seems the issue was with the message-id of the actual mail data.
-
repo owner OK, I'm glad that you made it.
-
reporter - changed status to open
-
reporter Dear jsuto,
Once the import was done, I changed always_bcc on my Zimbra mail server to archive mails in piler. Since then I can see a lot of duplicate mails in the webui. I verified that both copies have the same Message-ID, so I ran "reindex -a", but there was no improvement.
I am facing one more issue now: I am not getting up-to-date mails in the piler webui. It shows only mails from 1 day ago, but per the piler cronjobs it should display mails with about half an hour's delay.
For your reference, the headers of both the original and the duplicate mail are attached. Please help me to resolve this issue.
-
repo owner Let me take a look via teamviewer. If it's ok for you then contact me on skype (janos.suto).
-
reporter Dear jsuto,
I tried to contact you on skype (janos.suto) but got no response. My skype name is arunmani.murugan1
-
reporter Dear jsuto,
Thanks for your support. I have monitored the webui over these days. We still receive duplicate mails in the piler webui after compiling src/message.c with the changes. Please help to resolve this.
-
repo owner Show me the related log entries for two duplicates.
-
reporter Dear jsuto,
As of today's observation, duplicates are much reduced, so it seems to be fixed. Thanks a lot for the changes. I can only see (n-1) day delayed mails in the piler webui, which is why I was late in updating you; sorry for that. Please tell me how to remove the mails that were already duplicated up to last week, and help me view mails in the webui immediately rather than 1 day delayed.
Thanks in advance.
-
reporter Hello jsuto,
I can now see mails in the piler webui at the normal interval after changing the sphinx conf. Please help me delete the already duplicated mails.
-
repo owner I'd like you to run the following query, and check how long does it take to return results:
select id, message_id, count(message_id) as count from metadata where deleted = 0 group by message_id limit 100;
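As a rough sketch of what this kind of check does, the Python snippet below runs a similar grouping query against a tiny in-memory SQLite table and times it. The table is a hypothetical stand-in with only the three columns named in the query above (the real piler metadata table lives in MySQL and has more columns), and the added HAVING clause, which keeps only message-ids that occur more than once, is an assumption about what one would want to list.

```python
import sqlite3
import time

# Hypothetical, simplified stand-in for piler's metadata table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE metadata ("
    "id INTEGER PRIMARY KEY, message_id TEXT, deleted INTEGER DEFAULT 0)"
)
rows = [
    (1, "<a@example.com>", 0),
    (2, "<b@example.com>", 0),
    (3, "<a@example.com>", 0),  # second copy of message 1
    (4, "<c@example.com>", 1),  # deleted copy, ignored by the query
    (5, "<c@example.com>", 0),
]
conn.executemany(
    "INSERT INTO metadata (id, message_id, deleted) VALUES (?, ?, ?)", rows
)

# Group live rows by message_id and keep only ids seen more than once.
query = """
    SELECT MIN(id) AS first_id, message_id, COUNT(*) AS copies
    FROM metadata
    WHERE deleted = 0
    GROUP BY message_id
    HAVING COUNT(*) > 1
    ORDER BY first_id
    LIMIT 100
"""

start = time.perf_counter()
duplicates = conn.execute(query).fetchall()
elapsed = time.perf_counter() - start

for first_id, message_id, copies in duplicates:
    print(first_id, message_id, copies)
print(f"query took {elapsed:.4f} s")
```

On a real archive the elapsed time is the interesting number, since it indicates whether grouping the whole table by message_id is practical.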
-
reporter Hi,
Please find the query output,
-
repo owner I didn't ask for the query output, I asked how long it took.
-
reporter It took less than 1 minute to display the complete output.
-
repo owner - changed status to wontfix
You stopped cooperating, thus the issue is closed.
-
reporter Dear jsuto,
Sorry for not updating the status. The issue seems to be fixed after your last modifications. Regarding the discussions, I will update you soon on skype.
-
reporter - changed status to closed
Thanks a lot for all your support. Issue resolved.
So you have 200 hits for every 100 messages? What sphinx version do you have? If that's the case, then reset the sphinx indices, then reindex everything. See the FAQ on how to do that.