pilerexport Speed for Large Archive

Issue #1258 resolved
Shades created an issue

I’ve been tasked with exporting over 70 million emails we have archived from 2010-2022.

After fixing what was left of a broken Piler setup, I managed to get pilerexport to work and have been trying to export some of the emails we have. I did the math, and at the current rate it would take over 100 days to export all of these emails.

I’m sorry if this is somewhere in the documentation, but is there any way to dedicate more resources to piler or the pilerexport command?

I’ve assigned 16 cores to the box, but the CPU usage seems to stay at 5%, with no increase in export speed going from 4 to 16 cores.

Thank you!
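
The estimate above can be checked with a few lines of arithmetic. This is only a sketch: the function names are made up for illustration, and the idea that throughput scales linearly with the number of parallel processes is an assumption that the thread does not confirm (disk or database I/O may well become the bottleneck first).

```python
SECONDS_PER_DAY = 86_400


def implied_rate(total_emails, days):
    """Average emails per second needed to export total_emails in `days` days."""
    return total_emails / (days * SECONDS_PER_DAY)


def export_days(total_emails, emails_per_second, workers=1):
    """Days needed at a given per-process rate.

    Assumes (hypothetically) that every extra worker adds the same throughput.
    """
    return total_emails / (emails_per_second * workers) / SECONDS_PER_DAY


# 70 million emails in 100 days works out to roughly 8 emails per second.
rate = implied_rate(70_000_000, 100)

# Under the linear-scaling assumption, 13 processes would need about 100/13 days.
best_case_days = export_days(70_000_000, rate, workers=13)
```

Since the reported figure is "over 100 days", the real single-process rate is at most the value computed here, so the parallel figure is a best case rather than a prediction.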

Comments (6)

  1. Janos SUTO repo owner

    Unfortunately, pilerexport retrieves emails sequentially one after the other.

    Since you have 13 years to export, try the following. Create directories named 2010, 2011, 2012, … 2022, and make sure they are owned by user piler. Then run pilerexport in each of these directories, using the -a and -b options to restrict the export to the given year, e.g. -a 2010.01.01. -b 2010.12.31 for directory 2010, -a 2011.01.01. -b 2011.12.31 for directory 2011, etc. In effect you will run 13 pilerexport processes in parallel, which gives you better overall speed.
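
The per-year approach described above could be scripted roughly as follows. This is a minimal sketch, not a tested recipe: the helper names `year_jobs` and `run_parallel` are hypothetical, the dates are assumed to be accepted in YYYY.MM.DD form, pilerexport is assumed to be on the PATH, and the script is assumed to run as user piler so that the directories it creates have the right owner.

```python
import subprocess
from pathlib import Path


def year_jobs(first_year, last_year, base_dir="."):
    """Build one (working directory, command line) pair per calendar year."""
    jobs = []
    for year in range(first_year, last_year + 1):
        workdir = Path(base_dir) / str(year)
        # -a / -b are the start and end date options mentioned in the reply above.
        argv = ["pilerexport", "-a", f"{year}.01.01", "-b", f"{year}.12.31"]
        jobs.append((workdir, argv))
    return jobs


def run_parallel(jobs):
    """Start every job at once, each in its own directory, and wait for all."""
    procs = []
    for workdir, argv in jobs:
        workdir.mkdir(parents=True, exist_ok=True)
        procs.append(subprocess.Popen(argv, cwd=workdir))
    # Collect exit codes in the same order as the jobs.
    return [proc.wait() for proc in procs]
```

Calling `run_parallel(year_jobs(2010, 2022))` would start 13 processes, one per directory. Nothing here limits how many run at the same time, so on a smaller machine it may be worth launching the jobs in batches.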

  2. Shades reporter

    Got it, I started doing that instead and it works!

    However, another issue popped up: I can’t export large chunks at a time. For example, I can’t do 2018/01/01 - 2018/12/31 (I was able to do yearly exports for 2010-2017), but if I split the year into quarters, pilerexport is able to run. Otherwise it just doesn’t do anything. Not really a big issue, but is that some sort of limitation due to the number of files being exported? I think I may have to start going down to week-by-week exports, as I’ve heard it got quite busy around 2018, so it’d be a lot easier if there was a way around that.

  3. Janos SUTO repo owner

    There shouldn’t be such a limitation. I can imagine that you had many more emails archived in 2018 than in the previous years, so pilerexport needed more time to gather what needs to be exported.

  4. Shades reporter

    I’m not sure why it’s behaving like that either, but I did look and the Piler version installed on this box is 1.1.1 from 09/25/2015.

    It seems to run for a long time and then it just stops and I’m able to input another command. The only workaround was to split the exports into smaller date ranges, where it would begin exporting after a moment.

    However, I’m perfectly fine doing the smaller date ranges. I ended up just setting up a script to make it easier. I really appreciate the help!

    Thank you so much!
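
The reporter's script is not shown in the thread. One way the date splitting might look is sketched below. The helpers `split_range` and `export_args` are hypothetical names, the YYYY.MM.DD date format is assumed, and both the -a and -b dates are assumed to be inclusive, which the thread does not confirm.

```python
from datetime import date, timedelta


def split_range(start, end, days_per_chunk):
    """Split [start, end] into consecutive, non-overlapping inclusive chunks."""
    if days_per_chunk < 1:
        raise ValueError("days_per_chunk must be at least 1")
    one_day = timedelta(days=1)
    chunks = []
    current = start
    while current <= end:
        # The last chunk is cut short so it never runs past `end`.
        chunk_end = min(current + timedelta(days=days_per_chunk) - one_day, end)
        chunks.append((current, chunk_end))
        current = chunk_end + one_day
    return chunks


def export_args(start, end):
    """Build the pilerexport command line for one chunk (date format assumed)."""
    return [
        "pilerexport",
        "-a", start.strftime("%Y.%m.%d"),
        "-b", end.strftime("%Y.%m.%d"),
    ]


# Week-by-week chunks for the busy year mentioned in the thread.
weekly_2018 = split_range(date(2018, 1, 1), date(2018, 12, 31), 7)
```

Each chunk can then be turned into a command with `export_args(*chunk)` and run one after another, or a few at a time. If the end date turns out to be exclusive rather than inclusive, the chunk boundaries would need to be shifted by a day.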
