Use date from Received header if Date header is badly malformed

Issue #1195 resolved
Jörn Stein created an issue

While importing some previously exported mails (which piler had filed as being sent 1970/01/01) I noticed these mails have really bad Date: headers, sometimes only mentioning the month and day and not even the year. Let’s not start about time zones…

Date: 20091023171906
Date: Tue, 26 Aug   XXXXXXXX XXXXXX
Date: Mon, 03 Nov   XXXXXXXXXXXX
Date: Sat, 30 Dec 1899 00:00:00 +0300
Date: Sun, 24 Aug   XXXX (XWXXXX)
Date: Fri, 04 Jul   Xrodkowoeuropejski czas stand.
Date: $DATE
Date: 2/25/2015 12:32 PM
Date: Wed, 17 Sep   Jerusalem Standard Time
Date: Donnerstag 26-Dez-2002 01:31:03
Date: 20100122133637

I propose also taking the earliest date from the Received headers and using it in place of the Date header if the two differ too much. That could look roughly like this (in parser.c / parse_line, after testing for “Received:”):

if (state->message_state == MSG_RECEIVED && state->is_1st_header == 1)
  split buf at ';'
  store last split component in new property state->received_date

when parsing the Date, compare it to state->received_date instead of sdata->now. If the difference 
is more than a few days, use the received date as Sent date.

Please excuse the pseudocode, my C is rather rusty, so I didn’t really try.

I think that would give a much better approximation of the message date. After all, the servers along the way adhere to the format standards much better than some random mail client…
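
As a rough illustration of the proposal (not piler's actual C implementation), the fallback could be sketched in Python as follows. The names `parse_rfc_date`, `earliest_received` and `choose_sent_date`, and the three-day `MAX_SKEW` threshold standing in for "a few days", are assumptions made for this example. A Date: value without a usable zone is assumed to be UTC here.

```python
from datetime import timedelta, timezone
from email.utils import parsedate_to_datetime

MAX_SKEW = timedelta(days=3)  # assumption: stands in for "a few days"


def parse_rfc_date(value):
    """Return an aware datetime, or None if the value cannot be parsed."""
    try:
        parsed = parsedate_to_datetime(value.strip())
    except (TypeError, ValueError, IndexError):
        return None
    if parsed.tzinfo is None:
        # Assumption: treat a date without a usable zone as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def earliest_received(received_headers):
    """Parse the text after the last ';' of each Received: value, keep the earliest."""
    dates = []
    for header in received_headers:
        _, sep, tail = header.rpartition(";")
        parsed = parse_rfc_date(tail) if sep else None
        if parsed is not None:
            dates.append(parsed)
    return min(dates) if dates else None


def choose_sent_date(date_header, received_headers):
    """Prefer the Date: header unless it is garbage or far from the Received date."""
    date_dt = parse_rfc_date(date_header)
    recv_dt = earliest_received(received_headers)
    if recv_dt is None:
        return date_dt
    if date_dt is None or abs(date_dt - recv_dt) > MAX_SKEW:
        return recv_dt
    return date_dt


received = [
    "from murder ([unix socket])\n"
    "\t by hermes (Cyrus v2.3.8) with LMTPA;\n"
    "\t Thu, 08 Nov 2012 20:32:16 +0100"
]
for bad in ["$DATE", "20091023171906", "Sat, 30 Dec 1899 00:00:00 +0300"]:
    print(repr(bad), "->", choose_sent_date(bad, received))
```

In this sketch all three malformed samples fall back to the Received timestamp, while a Date: header within the threshold would be kept unchanged.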

Comments (4)

  1. Jörn Stein reporter

    That works very well, thank you for implementing it. Only thing I noticed is this:

    If the timezone in the received header is the same as my local timezone, it is fine:

    Received: from [192.168.64.228] (unknown [192.168.64.228])
            by hermes2.company.com (Postfix) with ESMTPSA id 862F224A0BBB;
            Thu, 15 Jul 2021 13:33:44 +0200 (CEST)
    
    piler.metadata.sent: 1626348824
    
    root@archive:/tmp#  date +"%Y/%m/%d %H:%M:%S %z" -d @1626348824
    2021/07/15 13:33:44 +0200
    
    root@archive:/tmp#  date +"%Y/%m/%d %H:%M:%S %z" -d @1626348824 -u
    2021/07/15 11:33:44 +0000
    

    But if the timezone is +0100:

    Return-Path: <d544d0cb6abb24a077@textechno.com>
    Received: from murder ([unix socket])
             by hermes (Cyrus v2.3.8) with LMTPA;
             Thu, 08 Nov 2012 20:32:16 +0100
    
    piler.metadata.sent: 1352406736
    
    root@archive:/tmp#  date +"%Y/%m/%d %H:%M:%S %z" -d @1352406736
    2012/11/08 21:32:16 +0100
    
    root@archive:/tmp#  date +"%Y/%m/%d %H:%M:%S %z" -d @1352406736 -u
    2012/11/08 20:32:16 +0000
    

    So in this case the header timestamp does not match the epoch time in the database.

    I manually edited a test mail to have different time zones in the first received header, but the resulting 'sent' time was always off:

    root@archive:/var/piler/export/debug# pisql "select id,piler_id,sent from metadata where id=2670;"
    +------+--------------------------------------+------------+
    | id   | piler_id                             | sent       |
    +------+--------------------------------------+------------+
    | 2670 | 5000000060f82f130b2d75340064acf7e62e | 1225824720 |
    +------+--------------------------------------+------------+
    root@archive:/var/piler/export/debug# pilerget 5000000060f82f130b2d75340064acf7e62e | egrep "^Date:"
    Date: Mon, 05 Jan   XXXXXXXX XXXXXX
    
    root@archive:/var/piler/export/debug# pilerget 5000000060f82f130b2d75340064acf7e62e | head -5
    Return-Path: <tequilakid@yahoo.com>
    Received: from murder ([unix socket])
             by hermes (Cyrus v2.3.8) with LMTPA;
             Wed, 05 Nov 2008 02:52:00 +0900
    X-Sieve: CMU Sieve 2.3
    
    root@archive:/var/piler/export/debug# date +"%Y/%m/%d %H:%M:%S %z" -d @1225824720
    2008/11/04 19:52:00 +0100
    root@archive:/var/piler/export/debug# date +"%Y/%m/%d %H:%M:%S %z" -d @1225824720 -u
    2008/11/04 18:52:00 +0000
    

    Interestingly, when the Date: string is good and parsed, the timezone is handled correctly.

    Since the difference seems to be only an hour independent of the timezone in the received header, and since this only affects a small number of mails with malformed dates, I would consider the issue resolved as it is. Just wanted to point that out.
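
The drift described in this comment can be quantified with a small Python check, offered only as a sketch. It converts each Received timestamp quoted above to epoch seconds, honouring the stated UTC offset, and subtracts that from the value stored in `metadata.sent`. The function name `drift_seconds` is an assumption for this example, and the trailing "(CEST)" comment is omitted from the first sample for simplicity.

```python
from email.utils import parsedate_to_datetime

# (timestamp from the first Received header, value stored in metadata.sent)
CASES = [
    ("Thu, 15 Jul 2021 13:33:44 +0200", 1626348824),
    ("Thu, 08 Nov 2012 20:32:16 +0100", 1352406736),
    ("Wed, 05 Nov 2008 02:52:00 +0900", 1225824720),
]


def drift_seconds(header_date, stored_epoch):
    """Stored epoch minus the epoch implied by the header, in seconds."""
    expected = int(parsedate_to_datetime(header_date).timestamp())
    return stored_epoch - expected


for header_date, stored in CASES:
    print(header_date, drift_seconds(header_date, stored))
# Thu, 15 Jul 2021 13:33:44 +0200 0
# Thu, 08 Nov 2012 20:32:16 +0100 3600
# Wed, 05 Nov 2008 02:52:00 +0900 3600
```

The two samples that drift are both one hour ahead regardless of the offset in the header, which matches the observation above. Both carry November dates while the matching sample is from July; whether that is related is not established in this thread.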

  2. Janos SUTO repo owner

    Thank you for the exhaustive research you did. Well, right. It’s kind of a workaround. Even with a 1 hour drift it should work fine, because the Received: date parsing only comes into play if the Date: header is garbage. Anyway, thanks for reporting and for the suggested workaround.
