Option to ignore duplicates in pilerimport

Issue #714 closed
Jose M. Albarran created an issue

Hi, I'm using pilerimport to load a lot of really old e-mail backups from several old systems (pst, evolution mbox, thunderbird, evolution maildir, google mail), and I migrated the mailbox between them in the past.

So, now, when I use pilerimport for these backups, I get a lot of duplicates.

Can I have an option in pilerimport to ignore duplicates? Or, alternatively, how can I delete the duplicates from storage?

Comments (8)

  1. Janos SUTO repo owner

    Pilerimport detects and ignores duplicates; it just prints a warning that the email being imported is a duplicate. Problems usually arise when the email has no message-id. In that case, when you rerun the import process, pilerimport archives it again and assigns a bogus (internal) message-id to prevent the duplicate detection from discarding the email. So it's best to ensure before the import that all your emails have a unique message-id. If one is missing, then create one.
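
A minimal sketch of the "make one" step, assuming each backup message is available as a single raw file (no mbox "From " separator line). It derives the message-id from a hash of the message bytes, so that the same file gets the same id on every run. The function name `ensure_message_id` and the domain `import.invalid` are hypothetical and not part of piler.

```python
import hashlib
from email import message_from_bytes


def ensure_message_id(raw: bytes, domain: str = "import.invalid") -> bytes:
    """Return raw unchanged if it has a Message-ID, else prepend a derived one.

    Assumption: raw is one message starting directly with its headers,
    not an mbox entry that begins with a "From " separator line.
    """
    msg = message_from_bytes(raw)
    if str(msg.get("Message-ID", "")).strip():
        return raw

    # Hash the original bytes so identical files always get an identical id.
    digest = hashlib.sha256(raw).hexdigest()[:32]

    # Reuse the line ending style already present in the message.
    newline = b"\r\n" if b"\r\n" in raw else b"\n"
    header = f"Message-ID: <{digest}@{domain}>".encode("ascii")

    # Prepend the header so the rest of the message stays byte-identical.
    return header + newline + raw
```

The rewritten files could then be passed to pilerimport as usual. Because the id depends only on the content, two identical copies of a message from different backups end up with the same message-id.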

  2. Jose M. Albarran reporter

    Hi,

    The FAQ is not clear to me on this point: it says duplicates are stored for e-mails without a message-id, and I assumed it does so for all e-mails.

    This is the current situation in the GUI (see attachment). Are these real duplicates? Are they consuming storage? Are they all without a message-id? How can I locate one to check?

  3. Janos SUTO repo owner

    The piler daemon requires a unique message-id, otherwise it discards the email as a duplicate. A duplicate email is discarded, not stored. Pilerimport, on the other hand, assumes that you want to import an email no matter what, even if it has no message-id. I hope that clarifies it.
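
To see which messages in a backup would fall into the "no message-id" case before importing, something like the following could be used. It is only a sketch for mbox files, using Python's standard `mailbox` module; the function name `audit_mbox` is hypothetical, and the check runs on the backup file itself, not on piler's storage.

```python
import mailbox
from collections import Counter


def audit_mbox(path):
    """Return (keys of messages without Message-ID, repeated ids with counts)."""
    missing = []
    seen = Counter()

    # create=False: fail instead of silently creating an empty mbox.
    box = mailbox.mbox(path, create=False)
    try:
        for key, msg in box.items():
            mid = str(msg.get("Message-ID", "")).strip()
            if mid:
                seen[mid] += 1
            else:
                missing.append(key)
    finally:
        box.close()

    # Keep only the ids that occur more than once in this file.
    repeated = {mid: count for mid, count in seen.items() if count > 1}
    return missing, repeated
```

The first list points at messages that pilerimport would archive again on every rerun; the second shows ids that appear more than once within the same backup.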

  4. Jose M. Albarran reporter

    Crystal clear, thanks! But what is the meaning of "duplicated message" (see image)? Are they real, stored duplicates (with no message-id), or report-only (the import was attempted but rejected as a duplicate)?

  5. Janos SUTO repo owner

    Just as the name suggests: the number of duplicated messages that have hit the piler daemon so far. They are counters for statistics. Since piler deduplicates messages and stores everything in a single copy, duplicates are discarded.
