Piler cannot determine the names of attachments in Cyrillic
Good day. Piler does not determine the names of some attachments written in Cyrillic. In the WebGUI they are displayed as (null).
Pilertest output:
pilertest "/tmp/12/0000020810-Test.eml"
locale: ru_RU.UTF-8
build: 836
parsing...
post parsing...
message-id: 855623385.462054.1391501089014.JavaMail.zimbra@test.com
from: somebody somebody@test.com somebody test com (test.com)
to: test test@test.com test test com (test.com )
reference:
subject: Test
body: Test *
sent: 1391501089, delivered-date: 0
hdr len: 1769
body digest: 594e7ba193b4427a6d83ddd66b8f6d0f5fcce9e97b14b4a33b8ebdb26fd1f8b9
rules check: (null)
retention period: 1612426603
i:1, name=(null), type: application/pdf*, size: 317856, int.name: /tmp/12/0000020810-Test.eml.a1, digest: b5eb708b53dd3b05d2e9643ae3bc4157017522f6c82083f9a0a6c267efd1b50c
attachments:pdf,
direction: 0
spam: 0
Linux vPiler 3.5.0-23-generic #35~precise1-Ubuntu SMP Fri Jan 25 17:13:26 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
Comments (8)
-
repo owner -
Yes, because as I understand it, piler takes the attachment's name in its encoded (RFC 2047) form. A typical name looks like "=?UTF-8?Q?=D1=82=D1=83=D1=80=D0=B3=D0=B5=D0=BD=D0=B5=D0=B2=D0=B0_189=2Ezip?=", but decoded into Cyrillic, this name consists of only 17 characters, including the file extension.
-
repo owner You are right. The parser copies 128 bytes and then tries to decode them. The decoded result is indeed shorter than this limit, but the encoded raw text is longer. Anyway, I'll update the parser to use a buffer twice that size.
-
repo owner - changed status to resolved
OK, I've just updated the parser to fix this issue. Please try it.
-
Good day. Unfortunately, the problem is not completely solved. Some attachment names are split across multiple lines according to RFC 2231. In this case, the parser tries to decode only the first line of the filename, but this line isn't terminated by ; or ", so the decoded result is returned as (null).
For example,
Content-Transfer-Encoding: base64
Content-Type: application/vnd.ms-excel;
 name="=?UTF-8?Q?=D0=9A=D0=BE=D0=BF=D0=B8=D1=8F_=D1=84=D0=BE=D1=80=D0=BC?=
 =?UTF-8?Q?=D0=B0_=D0=BE=D0=B1_=D0=BE=D1=82=D0=BA=D1=80=D1=8B=D1=82=D0=B8?=
 =?UTF-8?Q?=D0=B8_=D0=BE=D0=B1=D0=BE=D1=81=D0=BE=D0=B1=D0=BE=D0=BA=2Exls?=
 =?UTF-8?Q??="
Content-Disposition: attachment;
 filename*0*=UTF-8''%D0%9A%D0%BE%D0%BF%D0%B8%D1%8F%20%D1%84%D0%BE%D1%80%D0;
 filename*1*=%BC%D0%B0%20%D0%BE%D0%B1%20%D0%BE%D1%82%D0%BA%D1%80%D1%8B%D1;
 filename*2*=%82%D0%B8%D0%B8%20%D0%BE%D0%B1%D0%BE%D1%81%D0%BE%D0%B1%D0%BE;
 filename*3*=%D0%BA.xls
-
repo owner Thanks for pointing it out. I've just committed a fix for this issue. Please try it with the latest master branch.
-
Thank you very much, jsuto. I confirm that this issue is resolved. Excellent project.
-
Hello, it appears that you have an unusually long filename. That is perfectly fine; however, the parser skips it because it uses a 128-byte buffer to process the filename, and the filename in this email exceeds that limit.
Do you think you have many emails with such long, legitimate attachment names?