pilerimport stops working after random number of eml files

Issue #800 on hold
Matt Byrnes created an issue

I'm importing a very large number of eml files into mailpiler (I have a few million that were exported from another instance of piler... long story), and it randomly stops importing files and starts spewing out the following for every file:

failed to import: <path_to>.eml (id: 4000000058c4cb8431b28ab400dbf9bc24cc) error importing: '<path_to>.eml'

Every single file will fail from that point forward and the processed count will stop progressing. I have the option to delete imported files turned on and it stops deleting files when it starts failing (thankfully!).

If I grab one of the files listed in the error and do a manual import of that file via

pilerimport -e /path/to/file.eml

it works fine: the email is imported, and if I try again it comes up as a duplicate, as I would expect.

If I CTRL+C to stop it running and then immediately start pilerimport again, it works fine for another batch. It fails after a random number of files each time; I've had it as low as around 1000 and as high as around 75,000.

I originally thought this was because I was importing from a mounted samba share but it also happens when I have the files on local disk.

I then looked at resources to see if it was leaking memory or something, but it seems fine, using very little RAM and about 15% CPU.

This is a fresh install on this machine, but the eml files were exported from a poorly built server that had some issues (I had a lot of emails fail verification and produce a 0b file on export, but I am just ignoring those).

This isn't a complete train smash, I just have to restart it, but it's annoying as I have a very large number of emails to get through and it always fails not long after I walk away and leave it running.

piler -V output:

piler 1.2.0, build 952, Janos SUTO sj@acts.hu

Build Date: Sat Mar 11 20:55:33 AEDT 2017
ldd version: ldd (Ubuntu EGLIBC 2.19-0ubuntu6.9) 2.19
gcc version: gcc version 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04.3)
OS: Linux VEPILER01 4.2.0-42-generic #49~14.04.1-Ubuntu SMP Wed Jun 29 20:22:11 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
Configure command: ./configure --enable-memcached --enable-tcpwrappers --enable-tweak-sent-time --with-database=mysql --localstatedir=/var --with-piler-user=piler
MySQL client library version: 5.5.54
Extractors: /usr/bin/pdftotext /usr/bin/catdoc /usr/bin/catppt /usr/bin/xls2csv /usr/local/bin/ppthtml /usr/bin/unrtf /usr/bin/tnef

Thanks!

Comments (16)

  1. Janos SUTO repo owner

    Hmm, it's odd. How many resources does your host have (CPU, memory, free disk space)? Is there any clue in the logs as to why it can't import emails from that point on?

  2. eXtremeSHOK

    Looks like you're using a VM/container and the process is killed by the host for excessive resource usage.

    I've seen similar with clients in the past. Our commercial import script does get around this.

  3. Matt Byrnes reporter

    Nothing specific in the logs. I was originally looking at mail.log but have noticed stuff in syslog; I'll see if I can get something solid next time it happens, although since I posted this I've imported a couple of hundred thousand messages without issue.

    This is a VM running on a cluster that is under very low load, I have 2 vCPU and 8gig of RAM assigned, 3TB partition for /var and I have a 350gb drive mounted to /var/piler/sphinx to separate that out.

    This is running on top of Hyper-V, The host is a 2x 10-Core Intel Proc with 384gig of RAM and doesn't have a lot of load on it at the moment so I doubt it's killing stuff in the VM.

    I'll keep an eye on it and see if I can get something useful out of the logs.

  4. eXtremeSHOK

    Give your VM 20cpu and 128gb ram.. see if you get further.

    On most VMs, if you max out the CPU for extended periods, the host will kill/terminate the process.

  5. Janos SUTO repo owner

    Pilerimport runs as a single process, so there's no use in adding countless CPU cores. The same goes for memory. Pilerimport shouldn't need more memory than twice the message size (give or take).

    Since you reported that you had imported n x 100k messages, I think there must be something with the environment.

  6. Matt Byrnes reporter

    Yeah, that's what I thought, Janos. I didn't think more resources would help, especially since the VM is running at less than 25% CPU. I tried running multiple imports (in separate source & temp folders in separate ssh sessions) to try and 'stress' the machine to reproduce. Currently I'm at 469,550 items on one of the runs, 170,000 on another, and a run on a folder with around 400,000 items in it finished without a hitch... So why I was bombing out at single-digit thousands earlier is beyond me. The load on both the host and the VM hasn't changed since this morning, when it was happening regularly, so I'm at a loss. I still have a lot more email to import, so if the problem pops up again I'll pull the syslog entries and see if they shed any more light on the subject.

    Thanks for your help so far and Piler is a great product :)

  7. Janos SUTO repo owner

    It seems that you run out of available file descriptors pretty randomly. I suggest grouping the emails to be imported into directories containing 10k files or so, then running pilerimport over these directories with the -r option (= remove successfully imported files), assuming you have the emails at another place as well.

    Btw. can you try launching another VM with ubuntu 16.04 lts as well? I'm curious if the issue occurs on that platform as well.
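
The batching suggestion above could be sketched roughly as follows. This is only an illustration under assumptions: the function names, the `batch-NNNNN` directory naming, and the `IMPORTER` and `BATCH_SIZE` variables are hypothetical, and the `-d <directory>` option is assumed to be how pilerimport is pointed at a directory (check `pilerimport` usage output on your build). Only `-r` and `-e` are mentioned in this thread.

```shell
#!/bin/sh
# Sketch: split a large directory of .eml files into batches of ~10k files,
# then run the importer once per batch directory.
# IMPORTER and BATCH_SIZE are hypothetical knobs, not piler settings.
IMPORTER="${IMPORTER:-pilerimport}"
BATCH_SIZE="${BATCH_SIZE:-10000}"

# Move every .eml in $1 into numbered batch directories under $2.
split_into_batches() {
    src_dir="$1"
    out_dir="$2"
    n=0
    batch=0
    for f in "$src_dir"/*.eml; do
        [ -f "$f" ] || continue
        # Start a new batch directory every BATCH_SIZE files.
        if [ $((n % BATCH_SIZE)) -eq 0 ]; then
            batch=$((batch + 1))
            batch_dir=$(printf '%s/batch-%05d' "$out_dir" "$batch")
            mkdir -p "$batch_dir" || return 1
        fi
        mv -- "$f" "$batch_dir"/ || return 1
        n=$((n + 1))
    done
}

# Run the importer over each batch directory; stop at the first failure.
# ASSUMPTION: "-d <dir>" selects a directory; "-r" removes imported files.
import_batches() {
    out_dir="$1"
    for d in "$out_dir"/batch-*/; do
        [ -d "$d" ] || continue
        "$IMPORTER" -d "$d" -r || {
            echo "importer failed on $d" >&2
            return 1
        }
    done
}
```

Because each batch is a fresh pilerimport process, any descriptors leaked during one batch are released when that process exits. Since `-r` deletes imported files, this should only be run against a copy of the emails.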

  8. Matt Byrnes reporter

    I ended up writing a script that does a for-each on a directory and calls 'pilerimport -e <file>' on each file, then checks the exit code and if it's zero it moves the file (I was using -r originally but I didn't trust my script at first). It seems to be going OK, I'll have to keep an eye on it though.

    I built this on 16.04 originally but had a heap of issues that miraculously went away when I built it on 14.04. I did get everything up and running on 16.04, but I was getting emails coming through blank and I couldn't add retention or archiving rules no matter what I tried (I went through a heap of the articles on here and got a few steps forward, but never all the way). I just figured there was something about PHP7 and the combination of stuff on 16.04 that didn't work, as I've had quite a few apps not work properly on 16.04, so I went back to 14.04 and it was perfect.

    When I get some time I'll spin up a test machine on 16.04 and see how it goes.
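
The per-file loop described at the start of this comment might look something like the sketch below. The actual script was not posted, so everything here is an assumption apart from calling `pilerimport -e <file>` and moving the file only when the exit code is zero; the function name, the `IMPORTER` variable, and the "done" directory are hypothetical.

```shell
#!/bin/sh
# Sketch: import each .eml in its own pilerimport process and move it
# aside only if the importer exits with status 0.
# IMPORTER is a hypothetical override, handy for a dry run.
IMPORTER="${IMPORTER:-pilerimport}"

# $1 = directory holding the .eml files, $2 = where imported files are moved.
import_each() {
    src_dir="$1"
    done_dir="$2"
    mkdir -p "$done_dir" || return 1
    failed=0
    for f in "$src_dir"/*.eml; do
        [ -f "$f" ] || continue
        if "$IMPORTER" -e "$f"; then
            # Success: move rather than delete, so nothing is lost
            # if the check turns out to be wrong.
            mv -- "$f" "$done_dir"/
        else
            echo "import failed, left in place: $f" >&2
            failed=$((failed + 1))
        fi
    done
    # Non-zero return if any file failed to import.
    [ "$failed" -eq 0 ]
}
```

A call such as `import_each /path/to/emldir /path/to/imported` would leave failed files where they are, so the same command can simply be re-run later. The cost is one process per message.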

  9. Janos SUTO repo owner

    That's also a solution, though you'll fork a lot.

    Ubuntu 16 is officially supported, though I suggest using mariadb 10.1 instead of the shipped oracle version of mysql 5.7. The latter has some interoperability issues with piler, while mariadb is just fine.

  10. Matt Byrnes reporter

    OK I'll give that a go. As a side note, I tried using the batch option/switch on pilerimport but it doesn't seem to work, not sure if it's only implemented on IMAP or something?

  11. Leszek Piatek

    I think I have a similar issue, with the difference that I'm on an I/O- and network-stressed machine (sentry + elastic apm) that is dedicated, not a VM.

    Always after importing 13804 messages from the volume, I get "failed to import: error importing: " for every other file in the directory.

    For me this works for now: find /path/to/emldir -type f -print0 | xargs -0 -L1 -I % pilerimport -e %

    piler 1.3.1, build 956, Janos SUTO <sj@acts.hu>
    
    Build Date: Thu Feb 8 12:46:43 CET 2018
    ldd version: ldd (Ubuntu GLIBC 2.23-0ubuntu10) 2.23
    gcc version: gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.5) 
    OS: Linux docker2.irynek.pl 4.4.0-104-generic #127-Ubuntu SMP Mon Dec 11 12:16:42 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
    Configure command: ./configure --localstatedir=/var --with-database=mysql --enable-tcpwrappers
    MySQL client library version: 5.7.21
    Extractors: /usr/bin/pdftotext /usr/bin/catdoc /usr/bin/catppt /usr/bin/xls2csv /usr/local/bin/ppthtml /usr/bin/unrtf /usr/bin/tnef 
    
  12. Janos SUTO repo owner

    Any hints in the logs? Eg. mail log, kernel log, etc.

    Also, the issue seems to be consistent: check the 13804th message, and run pilertest against it to see if the parser can process it properly. Also try running pilerimport -e for this message only.
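
A quick way to look for such hints is to grep the usual logs around the time of the failure. The sketch below is only a starting point: the function name is hypothetical, the log paths in the usage note are the Ubuntu defaults, and the search patterns are guesses at what might appear (it is not known from this thread what, if anything, piler or the kernel logs in this situation).

```shell
#!/bin/sh
# Sketch: print lines from the given log files that mention pilerimport,
# file descriptor exhaustion, or the kernel killing a process.
# The patterns are assumptions, not known piler log messages.
scan_logs() {
    for log in "$@"; do
        # Skip logs that do not exist or are not readable by this user.
        [ -r "$log" ] || continue
        grep -H -i -E \
            'pilerimport|too many open files|out of memory|killed process' \
            "$log" || true
    done
}
```

For example, `scan_logs /var/log/mail.log /var/log/syslog /var/log/kern.log` run as root shortly after the import starts failing.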

  13. Leszek Piatek

    I've successfully imported the maildir that was failing using: find /path/to/emldir -type f -print0 | xargs -0 -L1 -I % ionice -c2 -n7 pilerimport -e %

    In previous logs I had only those: failed to import: *.eml

    and errors/notices from pdf / excel files parser

    I didn't manage to check that 13804th message, because the find + xargs solution just worked...
