pilerimport segfault
Hello Janos
Thanks for your work.
When converting a large number of emails, I received the following error:
pilerimport[2024]: segfault at 0 ip 00007fbf4a58bad3 sp 00007ffdff140258 error 4 in libc-2.23.so[7fbf4a502000+1c0000]
Comments (13)
-
repo owner -
reporter With
mysql -V
mysql Ver 15.1 Distrib 10.1.26-MariaDB, for debian-linux-gnu (x86_64) using readline 5.2
the result is 6864 messages imported without the errors
Incorrect string value: ... (errno: 1366)
and
Data too long for column 'body' at row 1 (errno: 1406)
but with other errors and a segfault:
pilerimport[15362]: error: helper: execl
pilerimport[4408]: 40000000599d906c09a54aac002b3d701b54: error opening ...
kernel: [ 3065.901507] pilerimport[4408]: segfault at 0 ip 00007f385d7bfad3 sp 00007ffce8c77f28 error 4 in libc-2.23.so[7f385d736000+1c0000]
-
repo owner My best bet is that, for some reason, the available file descriptors are being used up. Try importing only 1000 emails at a time, and let's see if that solves the problem. In the meantime, I'll try to mass import a few tens of thousands of messages and see what happens.
-
reporter I tried importing different numbers of emails. I think the problem is not the number of emails processed, but either the number of errors during processing or a single email that cannot be processed.
I also tried different versions of MySQL and MariaDB. With MySQL 5.6 and 5.7 and with MariaDB 10.2, I was getting the error
mysql_stmt_execute error: Incorrect string value: '\xF0\x9D\x91\x83\xF0\x9D…' for column 'body' at row 1 (errno: 1366)
and the fix from https://bitbucket.org/jsuto/piler/issues/709/mysql_stmt_execute-error-incorrect-string
ALTER DATABASE piler CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci;
ALTER TABLE piler.sph_index CONVERT TO CHARACTER SET utf8mb4, COLLATE = utf8mb4_unicode_ci;
ALTER TABLE piler.sph_index CHANGE body body TEXT CHARACTER SET utf8mb4, COLLATE = utf8mb4_unicode_ci;
did not help me. Only MariaDB 10.1 ran without these errors.
I also found the following error:
No permission for <XXX>
I tried to fix it as described in https://bitbucket.org/jsuto/piler/issues/723/no-permission-for
but
select `to` from v_messages where id=5062;
Empty set (0.01 sec)
and I found that the header of those messages was
From: "John Freeman" <jf@censored.org>
To: 'Mike Huch'
i.e. the To: field has no e-mail address at all (thanks, Outlook).
-
reporter Nov 14 15:09:27 hamburg pilerimport[24013]: error: helper: execl
Nov 14 15:09:27 hamburg pilerimport[24014]: error: helper: execl
Nov 14 15:09:28 hamburg pilerimport[24016]: error: helper: execl
Nov 14 15:09:28 hamburg pilerimport[24017]: error: helper: execl
Nov 14 15:09:34 hamburg pilerimport[24061]: error: helper: execl
Nov 14 15:09:53 hamburg pilerimport[24298]: error: helper: execl
Nov 14 15:09:55 hamburg pilerimport[24311]: error: helper: execl
Nov 14 15:09:55 hamburg pilerimport[24312]: error: helper: execl
Nov 14 15:09:56 hamburg pilerimport[24314]: error: helper: execl
Nov 14 15:09:56 hamburg pilerimport[24315]: error: helper: execl
Nov 14 15:10:11 hamburg pilerimport[24442]: error: helper: execl
Nov 14 15:10:11 hamburg pilerimport[24443]: error: helper: execl
Nov 14 15:10:12 hamburg pilerimport[24445]: error: helper: execl
Nov 14 15:10:12 hamburg pilerimport[24446]: error: helper: execl
Nov 14 15:14:03 hamburg kernel: [ 1520.493364] show_signal_msg: 15 callbacks suppressed
Nov 14 15:14:03 hamburg kernel: [ 1520.493369] pilerimport[23966]: segfault at 0 ip 00007fbdbb4efad3 sp 00007ffd8a084398 error 4 in libc-2.23.so[7fbdbb466000+1c0000]
Again a segfault, but pilerimport 1.3.1 does better than 1.3.0: it could import more messages before the error occurred.
-
reporter I found what was wrong: it happens when an attachment's file name contains a special character:
Content-Type: application/msword
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename*=utf-8''P;LAN%20Holden%204.doc; filename="P;LAN Holden 4.doc"
After fixing it in the .eml file, everything imported without a segfault.
Could you fix the parser in pilerimport?
I had to import more than 20,000 files manually, one by one, to find the problem.
Thank you again for your work.
-
repo owner Thanks for nailing the issue. That must have been a hell of a troubleshooting session. So, just to clarify: with the following line, you get a segfault:
filename*=utf-8''P;LAN%20Holden%204.doc;
However fixing such a line to the following solved it:
filename="P;LAN Holden 4.doc"
Please confirm it.
-
reporter Hello,
With segfault:
Content-Type: application/msword
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename*=utf-8''P;LAN%20Holden%204.doc; filename="P;LAN Holden 4.doc"
Without segfault:
Content-Type: application/msword
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename*=utf-8''PLAN%20Holden%204.doc; filename="PLAN Holden 4.doc"
i.e. just without the ";".
-
repo owner So a single attachment has 2 filenames in the Content-Disposition header field? That's new to me.
-
repo owner - attached 1.diff
Try this diff. It simply truncates the 2nd filename definition, and continues. Let me know if it works for you.
-
reporter See RFC 6266, Use of the Content-Disposition Header Field in the Hypertext Transfer Protocol (HTTP): https://tools.ietf.org/html/rfc6266
Thank you for the patch! I will try it soon and get back to you.
-
repo owner - changed status to closed
No news is good news.
Hello. I need more information that can help with the troubleshooting, e.g. the piler version, your environment, how you invoke pilerimport, any relevant syslog entries from piler, and the output of df -h and free -m.