Queries about the pilerimport option

Issue #588 resolved
Arunmani Murugan created an issue

Dear jsuto, I have a few queries about piler:

  1. I am going to import terabytes of mail data from Zimbra 8.x into piler 1.1.1 using pilerimport, but from what I have observed in practice it would take about a month. How can I import this mail data into piler in less time?
  2. My mail data already goes back 4 years, as I have been using Zimbra since 2011. If I import it into piler now and set the retention policy to delete mails after 6 years, when will the mails get deleted from piler? Any advice would be appreciated. Thanks, Arun

Comments (14)

  1. eXtremeSHOK

    I wrote a multi-threaded shell script for pilerimport, which we used to migrate terabytes of data from an atmail archive to mailpiler. It took around 20 hours instead of 4 months.

    First we exported the entire archive of individually compressed emails to /datastore/import and ran the script against this directory, mainly to remove network and protocol latencies.

    1. The data will be imported first; a separate process will then run the purge against emails older than 6 years.
  2. Arunmani Murugan reporter

    Dear extremeshok,

    Thanks for the reply. However, I am not importing over the network; the import runs from my local HDD, as I copied all the Zimbra mail data to the piler machine, and even so it still seems to take about a month. Regarding my second query: if I import four-year-old data into piler now, what timestamp will those mails have in piler? If piler treats those mails as having the current time at import, then those four-year-old mails would only be purged once they are 10 (6+4) years old. Can you please clarify this?

  3. eXtremeSHOK

    Date of the email. Not the import date.

    You'll need custom-written scripts to speed up the import. Note that speed is limited by CPU and memory.

  4. Arunmani Murugan reporter

    Dear extremeshok, no luck for me. Could you please share the script with me? Thanks in advance.

  5. Kanthanathan S

    Hi extremeshok

    Can you give some tips on how to create this multi-threaded script? It would be very useful, since we have a similar requirement.

    Thanks in advance

  6. Janos SUTO repo owner

    Pilerimport is indeed a single-threaded utility that imports one email at a time. However, you can run several instances in parallel, e.g. pilerimport -d dir1, pilerimport -d dir2, pilerimport -d dir3, etc.

    To speed up the import process, make sure you have enough CPU power, a finely tuned MySQL server with big enough buffers, and a fast disk.

    Another option is to turn off attachment processing, provided that indexing attached files is not important. I'll add an option allowing you to disable attachment processing.
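
Running several instances side by side can also be scripted rather than typed into separate terminals. The following is only a minimal sketch, not the script mentioned earlier in the thread: it starts one pilerimport process per directory in the background and waits for all of them. The function name import_parallel, the IMPORT_CMD override (useful for a dry run) and the log file names are assumptions made for illustration; only `pilerimport -d <dir>` comes from the thread.

```shell
#!/bin/sh
# Hypothetical helper: run one import process per directory, in parallel.
# IMPORT_CMD is an assumed override for dry runs; it defaults to pilerimport.
IMPORT_CMD="${IMPORT_CMD:-pilerimport}"

import_parallel() {
    for dir in "$@"; do
        # Start each instance in the background and give it its own
        # log file (written to the current working directory).
        "$IMPORT_CMD" -d "$dir" > "import-$(basename "$dir").log" 2>&1 &
    done
    # Block until every background import has finished.
    wait
}

# Example (run as the piler user):
#   import_parallel dir1 dir2 dir3
```

How many instances are worth running at once depends on the CPU, database and disk limits described above, so it is probably best to start with a few directories and watch the load.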

  7. Kanthanathan S

    Thanks Janos for the reply.

    Can we temporarily stop indexing, since we have not released the piler server to users yet, and then start indexing once the import is complete? Would that help overall?

  8. Arunmani Murugan reporter

    Dear Janos,

    I tried running pilerimport on multiple directories, but the import worked for the last directory only. This is the command I ran: su - piler pilerimport -d dir1, pilerimport -d dir2, pilerimport -d dir3

    I separated the mail data into three directories, dir1, dir2 and dir3, but only dir3 was imported. Any advice on this, please?

  9. Janos SUTO repo owner

    You didn't think you should type "pilerimport -d dir1, pilerimport -d dir2, pilerimport -d dir3", did you? I meant that you should issue these 3 commands in 3 terminals, one command per terminal, e.g.

    pilerimport -d dir1
    pilerimport -d dir2
    pilerimport -d dir3
    

    Btw, the option I mentioned before already exists: extract_attachments. Set it to 0 in piler.conf to disable attachment processing, if that's a viable option for you.
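
For anyone who prefers to script the configuration change, here is a minimal sketch that sets extract_attachments to 0 in a given piler.conf. It assumes the file uses plain key=value lines without spaces around the equals sign, and the function name disable_attachments is made up for this example. The location of piler.conf depends on how piler was installed, so the path is passed as an argument rather than hard-coded.

```shell
#!/bin/sh
# Hypothetical helper: set extract_attachments=0 in the given piler.conf.
# Assumes key=value lines with no spaces around "=".
disable_attachments() {
    conf="$1"
    # Keep a backup copy next to the original file.
    cp "$conf" "$conf.bak"
    if grep -q '^extract_attachments=' "$conf"; then
        # Rewrite the existing line, reading from the backup copy.
        sed 's/^extract_attachments=.*/extract_attachments=0/' "$conf.bak" > "$conf"
    else
        # The option is not present yet, so append it.
        echo 'extract_attachments=0' >> "$conf"
    fi
}

# Example (the path is a placeholder):
#   disable_attachments /path/to/piler.conf
```

The original file is kept as piler.conf.bak, so the setting can be restored once the import is finished.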
