Duplicate Mails after Import

Issue #1177 resolved
Saul Godman created an issue

Hi,

I have the problem that some imported mails are listed twice. The duplicates also have identical message IDs.

I have now run the following scripts manually, unfortunately without success:

indexer.main.sh
indexer.delta.sh

What could be the reason?

Piler Version: 1.3.11 build 1001
Sphinx: Sphinx 3.3.1 (commit b72d67b)

Comments (41)

  1. Janos SUTO repo owner

    How is that possible? The metadata table has a constraint that prevents the same message-id from being stored twice.

    So take a message-id that is associated with multiple emails in the search results, and verify that it appears in multiple rows of the metadata table.
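
    The check described above can be sketched as a grouped query. This is only an illustration under stated assumptions: piler stores metadata in MySQL, while the sketch uses an in-memory SQLite table as a stand-in, models only the columns quoted in this thread (id, arrived, piler_id, message_id), and fills it with made-up rows. The helper names are hypothetical.

```python
import sqlite3


def find_duplicate_message_ids(conn):
    """Return (message_id, row_count, ids) for every message_id stored more than once."""
    rows = conn.execute(
        """
        SELECT message_id, COUNT(*) AS n, GROUP_CONCAT(id) AS ids
        FROM metadata
        GROUP BY message_id
        HAVING COUNT(*) > 1
        ORDER BY message_id
        """
    ).fetchall()
    # GROUP_CONCAT returns a comma-separated string; turn it into sorted ints.
    return [(mid, n, sorted(int(i) for i in ids.split(","))) for mid, n, ids in rows]


def make_demo_db():
    """Build a stand-in metadata table (assumption: not piler's real schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE metadata (id INTEGER PRIMARY KEY, arrived INTEGER, "
        "piler_id TEXT, message_id TEXT)"
    )
    conn.executemany(
        "INSERT INTO metadata (id, arrived, piler_id, message_id) VALUES (?, ?, ?, ?)",
        [
            (1, 1600000000, "piler-a", "<a@example.com>"),
            (2, 1600000001, "piler-b", "<b@example.com>"),
            (3, 1600000002, "piler-c", "<a@example.com>"),  # same message_id as row 1
        ],
    )
    return conn


if __name__ == "__main__":
    for message_id, count, ids in find_duplicate_message_ids(make_demo_db()):
        print(message_id, count, ids)
```

    The SELECT statement itself should also run unchanged in a MySQL client against the real table, since GROUP BY, HAVING and GROUP_CONCAT exist in both databases.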

  2. Saul Godman reporter

    Sorry, I had no time to answer earlier. Unfortunately, the error still exists after retesting.

  3. Janos SUTO repo owner

    Can you show me the arrived column as well for these messages?

    Anyway, I was wrong. The table itself has no such constraint; it is the piler code that tries to prevent such duplication.

    Also try setting mmap_dedup_test=1 in piler.conf, then restart the piler daemon.

  4. Janos SUTO repo owner

    OK, one more query please:

    select id, arrived, piler_id, message_id from metadata where id>=2982 and id<=2987;

  5. Janos SUTO repo owner

    Thank you. I suggest enabling mmap_dedup_test=1 in piler.conf, then let’s see if you keep finding such duplicates from now on.

  6. Saul Godman reporter

    Now I'm confused. If I set mmap_dedup_test = 1, it no longer imports any email. If I set mmap_dedup_test = 0, everything is suddenly ok after a new import and no duplicates can be seen.

    Why? :D

    I cleaned up my installation before setting mmap_dedup_test=1.

  7. Janos SUTO repo owner

    Sorry, I was wrong. This feature is designed for the piler daemon. Now I understand that you are running pilerimport. However, I’m still confused. The pilerimport utility processes the emails sequentially, so I still don’t get how it fails to recognize already archived emails.

  8. Saul Godman reporter

    Yes, I use pilerimport in a Docker container.

    I have now set everything up again, and unfortunately I have the same problem.

  9. Janos SUTO repo owner

    Assuming you have a few thousand emails to test with, let’s try one more thing. Import the emails one at a time, e.g.

    for i in *.eml; do pilerimport -e "$i"; sleep 1; done
    

  10. Saul Godman reporter

    Just for info, we import the mails via IMAP and pilerimport.

    Example:

    pilerimport -i imap.my-server.com -u imap-mail@my-server.com -p '<PASSWORD>' -P 993 -f <FOLDER_WITH_MAILS> -r
    

  11. Saul Godman reporter

    Okay, when I only download the mails with pilerimport -o, I do not get .eml files but .txt files like "13303-imap-2033.txt".

  12. Saul Godman reporter

    Yes, but the command pilerimport -e works only with .eml files, or am I misunderstanding you?

  13. Janos SUTO repo owner

    Well, not really, I’m afraid. I just can’t reproduce the issue. I have a similar test environment also in docker, running pilerimport, and it properly detects duplicates:

    Cipher: TLS_AES_256_GCM_SHA384
    List of IMAP folders:
    => '"INBOX" [\HasNoChildren]'
    skipping => '"[Gmail]" [\HasChildren \Noselect]'
    => '"[Gmail]/All Mail" [\All \HasNoChildren]'
    => '"[Gmail]/Drafts" [\Drafts \HasNoChildren]'
    => '"[Gmail]/Important" [\HasNoChildren \Important]'
    => '"[Gmail]/Sent Mail" [\HasNoChildren \Sent]'
    => '"[Gmail]/Spam" [\HasNoChildren \Junk]'
    => '"[Gmail]/Starred" [\Flagged \HasNoChildren]'
    => '"[Gmail]/Trash" [\HasNoChildren \Trash]'
    processing folder: "[Gmail]/Spam"... found 0 messages
    processing folder: "[Gmail]/All Mail"... found 12 messages
    Syntax Error: Couldn't find trailer dictionary
    Syntax Error: Couldn't find trailer dictionary
    Syntax Error: Couldn't read xref table
    processed:      12 [100%]
    processing folder: "[Gmail]/Sent Mail"... found 0 messages
    processing folder: "[Gmail]/Important"... found 9 messages
    duplicate: 764-imap-13.txt (duplicate id: 2897)
    duplicate: 764-imap-14.txt (duplicate id: 2898)
    Syntax Error: Couldn't find trailer dictionary
    Syntax Error: Couldn't find trailer dictionary
    Syntax Error: Couldn't read xref table
    duplicate: 764-imap-15.txt (duplicate id: 2899)
    duplicate: 764-imap-16.txt (duplicate id: 2902)
    duplicate: 764-imap-17.txt (duplicate id: 2903)
    duplicate: 764-imap-18.txt (duplicate id: 2905)
    duplicate: 764-imap-19.txt (duplicate id: 2906)
    duplicate: 764-imap-20.txt (duplicate id: 2907)
    duplicate: 764-imap-21.txt (duplicate id: 2908)
    

  14. Janos SUTO repo owner

    Btw. I’m not sure if those 2-3000 emails are sensitive or not, but if it’s possible to give access to that mailbox, then I could test with it.

  15. Saul Godman reporter

    OK, unfortunately we cannot give you access to our mailbox.

    Two other questions about the commands, then:

    1. Is it possible to throttle the import through the limit options (-b, -s)?
    2. Is there a debug mode we can run? Then we could send you the output.

  16. Janos SUTO repo owner

    You may try tweaking the -b and -s options. You may also try setting verbosity=5, so pilerimport will log much more to syslog. Or perhaps I may add an option to pilerimport to wait a few milliseconds after importing each email.

  17. Saul Godman reporter

    An option to always wait a few ms would of course be great. Maybe you can build it in. At least it would solve the problem.

  18. Saul Godman reporter

    Sorry, I have to reopen this.

    Now that I have set the -Z parameter, it suddenly imports only 1000 of 2000 mails. I tried again and also changed the value of the -Z parameter, but unfortunately without success.

  19. Janos SUTO repo owner

    How would adding a small delay before importing each individual email cause half of the emails not to be imported?

  20. Saul Godman reporter

    That's a good question... I'll import again today without the Z parameter and see if the 2000 mails are imported again.

  21. Saul Godman reporter

    Hey, I ran it without the -Z parameter and only 1008 of 2058 mails were imported. My old Docker container was based on “piler-1.3.11.tar.gz”; with it the import worked and all 2058 mails were imported. Now I pull the source code directly into the Docker project. Has anything changed in the source?

  22. Janos SUTO repo owner

    Yes, since the project is under development, there are changes. However, I don’t think there is anything that would explain why only half of the emails are imported. I’m not sure whether half of your emails are duplicates. Anyway, I can’t help you unless you provide me with a similar mailbox with roughly the same number of emails, including some duplicates, so I can see the issue for myself. If you can’t or won’t create it for me, then good luck finding the solution yourself.

  23. Saul Godman reporter

    We don't blame you and are happy and grateful that you are developing this project further!

    Before we think about how we can grant you access to our e-mails in a privacy-compliant manner, we would still like to briefly explain what we have done so far.

    • In the MySQL database, all 2000 mails can be seen in the metadata table
    • However, only the 1000 mails mentioned can be viewed in the Sphinx database

    Do you have another idea how we can debug this? The version from the download "piler-1.3.11.tar.gz" works without problems, and it is strange that we are the only ones who have this problem. Compared to the old working setup, we have changed nothing except the source code.
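
    One way to narrow this down is to list exactly which ids exist in the metadata table but are missing from the search index. The sketch below assumes the two id lists have already been exported by some other means (for example from a SELECT on the metadata table and from a query against the Sphinx index); how they are obtained is not shown here, and the function name and sample ids are hypothetical.

```python
def missing_from_index(metadata_ids, indexed_ids):
    """Return the sorted ids present in metadata but absent from the search index."""
    return sorted(set(metadata_ids) - set(indexed_ids))


if __name__ == "__main__":
    # Made-up sample data: ten archived messages, five of them indexed.
    metadata_ids = range(1, 11)
    indexed_ids = [1, 2, 3, 5, 8]
    print(missing_from_index(metadata_ids, indexed_ids))
```

    Looking up the resulting ids in the metadata table may show whether the missing mails share something, such as a common arrival time range or folder.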

  24. Saul Godman reporter

    We have now created another email account for you containing one mail that is not imported. This mail is processed but is not displayed afterwards in the Mailpiler client administration. Maybe you can see why this happens?

    The mail has the message_id “<1593837997.519141283248938032.JavaMail.support@geotrust.com>”.

    We copy the mail to the folder “mailarchiv” using the webmailer, and after that we run the import command.

    <censored content>

  25. Janos SUTO repo owner

    I can’t see any email using this account. Btw. feel free to edit your previous message, and remove the password. You know, it’s a publicly accessible issue tracker.

  26. Saul Godman reporter

    Hello, there is a mail in the inbox that is not displayed by Piler. And yes, that’s OK, I will change the password of the account later… I created it only as a test for you 😉
