Piler cannot determine the name of an attachment in Cyrillic

Issue #254 resolved
Former user created an issue

Good day. Piler fails to determine some attachment names written in Cyrillic. In the WebGUI they are displayed as (null).

Pilertest output:

pilertest "/tmp/12/0000020810-Test.eml"
locale: ru_RU.UTF-8
build: 836
parsing...
post parsing...
message-id: 855623385.462054.1391501089014.JavaMail.zimbra@test.com
from: somebody somebody@test.com somebody test com (test.com)
to: test test@test.com test test com (test.com )
reference:
subject: Test
body: Test *
sent: 1391501089, delivered-date: 0
hdr len: 1769
body digest: 594e7ba193b4427a6d83ddd66b8f6d0f5fcce9e97b14b4a33b8ebdb26fd1f8b9
rules check: (null)
retention period: 1612426603
i:1, name=(null), type: application/pdf*, size: 317856, int.name: /tmp/12/0000020810-Test.eml.a1, digest: b5eb708b53dd3b05d2e9643ae3bc4157017522f6c82083f9a0a6c267efd1b50c
attachments:pdf, direction: 0
spam: 0

Linux vPiler 3.5.0-23-generic #35~precise1-Ubuntu SMP Fri Jan 25 17:13:26 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

Comments (8)

  1. Janos SUTO repo owner

    Hello, it appears that you have an unusually long filename. That is perfectly fine in itself, but the parser skips it: it uses a 128-byte buffer to process the filename, and the filename in this email exceeds that limit.

    Do you expect to have many emails with such long, legitimate attachment names?

  2. nektod

    Yes. As I understand it, piler takes the attachment's name in its MIME-encoded form, and a typical name looks like "=?UTF-8?Q?=D1=82=D1=83=D1=80=D0=B3=D0=B5=D0=BD=D0=B5=D0=B2=D0=B0_189=2Ezip?=". In Cyrillic, however, this name consists of only 17 characters, including the file extension.

  3. Janos SUTO repo owner

    You are right. The parser copies 128 bytes, then tries to decode them. The decoded result is indeed shorter than this limit, but the encoded raw text is longer. Anyway, I'll update the parser to use a double-sized buffer.
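
To make the size difference concrete, here is a minimal Python sketch (not piler's C parser) that decodes the encoded name quoted in this thread and compares lengths. It assumes the charset label is UTF-8, and `decode_mime_words` is a hypothetical helper name. This particular name stays under 128 bytes even when encoded; the point is only the expansion ratio, which pushes longer Cyrillic names past a fixed buffer before decoding.

```python
from email.header import decode_header

# Encoded attachment name quoted in the thread (charset assumed to be UTF-8).
raw = "=?UTF-8?Q?=D1=82=D1=83=D1=80=D0=B3=D0=B5=D0=BD=D0=B5=D0=B2=D0=B0_189=2Ezip?="


def decode_mime_words(value):
    """Decode RFC 2047 encoded words into a single str (illustrative helper)."""
    parts = []
    for chunk, charset in decode_header(value):
        if isinstance(chunk, bytes):
            chunk = chunk.decode(charset or "ascii", errors="replace")
        parts.append(chunk)
    return "".join(parts)


name = decode_mime_words(raw)

# Encoded length, decoded UTF-8 byte length, decoded character count.
print(len(raw), len(name.encode("utf-8")), len(name))
```

Each Cyrillic letter is 2 bytes in UTF-8 but 6 characters in "Q" encoding, so a buffer sized for the decoded name can be too small for the raw header text.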

  4. nektod

    Good day. Unfortunately, the problem is not completely solved. Some attachment names are split across multiple lines, as RFC 2231 allows. In this case, the parser tries to decode only the first line of the filename, but that line isn't terminated by ; or ". As a result, the decoded name is returned as 'null'.

    For example,

    Content-Transfer-Encoding: base64
    Content-Type: application/vnd.ms-excel;
     name="=?UTF-8?Q?=D0=9A=D0=BE=D0=BF=D0=B8=D1=8F_=D1=84=D0=BE=D1=80=D0=BC?=
     =?UTF-8?Q?=D0=B0_=D0=BE=D0=B1_=D0=BE=D1=82=D0=BA=D1=80=D1=8B=D1=82=D0=B8?=
     =?UTF-8?Q?=D0=B8_=D0=BE=D0=B1=D0=BE=D1=81=D0=BE=D0=B1=D0=BE=D0=BA=2Exls?=
     =?UTF-8?Q??="
    Content-Disposition: attachment;
     filename*0*=UTF-8''%D0%9A%D0%BE%D0%BF%D0%B8%D1%8F%20%D1%84%D0%BE%D1%80%D0;
     filename*1*=%BC%D0%B0%20%D0%BE%D0%B1%20%D0%BE%D1%82%D0%BA%D1%80%D1%8B%D1;
     filename*2*=%82%D0%B8%D0%B8%20%D0%BE%D0%B1%D0%BE%D1%81%D0%BE%D0%B1%D0%BE;
     filename*3*=%D0%BA.xls
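
The Content-Disposition header above can be decoded roughly as follows. This is a Python sketch under simplifying assumptions, not the fix that was committed to piler: it handles only percent-encoded `filename*N*` segments, and `decode_rfc2231_filename` is a hypothetical name. The key step is to join all segments before decoding, because a multi-byte character may be split across two segments (here `%D0` ends segment 0 and `%BC` starts segment 1).

```python
import re
from urllib.parse import unquote_to_bytes

# Content-Disposition value from the example above, with folded lines.
header = (
    "attachment;\n"
    " filename*0*=UTF-8''%D0%9A%D0%BE%D0%BF%D0%B8%D1%8F%20%D1%84%D0%BE%D1%80%D0;\n"
    " filename*1*=%BC%D0%B0%20%D0%BE%D0%B1%20%D0%BE%D1%82%D0%BA%D1%80%D1%8B%D1;\n"
    " filename*2*=%82%D0%B8%D0%B8%20%D0%BE%D0%B1%D0%BE%D1%81%D0%BE%D0%B1%D0%BE;\n"
    " filename*3*=%D0%BA.xls"
)

# Matches numbered, percent-encoded segments such as filename*0*=...
SEGMENT = re.compile(r"filename\*(\d+)\*=([^;\s]+)")


def decode_rfc2231_filename(value):
    """Join filename*N* segments in order, then percent-decode the result."""
    segments = sorted((int(num), text) for num, text in SEGMENT.findall(value))
    if not segments:
        return None
    joined = "".join(text for _, text in segments)
    # The first segment carries charset'language'encoded-text.
    charset, _language, encoded = joined.split("'", 2)
    return unquote_to_bytes(encoded).decode(charset or "ascii", errors="replace")


print(decode_rfc2231_filename(header))
```

Decoding each line separately would fail on the split characters, which is why the segments are concatenated first.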
    
  5. Janos SUTO repo owner

    Thanks for pointing it out, I've just committed a fix to solve this issue. Please try it with the latest master branch.
