pilerimport - email dates incorrect

Issue #868 closed
Micro Data Systems Ltd created an issue

I am running pilerimport on around 500,000 emails. These emails span a period of around 10 years.

Some emails are being given the import date, while others pick up the correct date. When I check the message headers of an email that was given today's date, they show a date back in 2014.

One example of part of the headers is as follows (edited to remove identities):

Received: from XXXXXXXX.XXX.com (XXXXXXX.XXX.com [xxx.xxx.xxx.xxx]) by mydomain.co.uk (mydomain.co.uk) (Cipher TLSv1:-SHA:128) (MAILSERVER v1234) with ESMTP id XX50001433136.XXX for user@mydomain.co.uk; Thu, 05 Jun 2014 00:35:12 +0100

Received: from XXXXXXXX.XXX.com ([xxx.xxx.xxx.xxx]) by XXXXXXXX.XXX.com with ESMTP; 04 Jun 2014 23:00:01 +0000

Comments (8)

  1. Micro Data Systems Ltd reporter

    I think that is the cause of the issue, then. Some of the emails being imported do not have a Date: header, so they are imported with today's date as a default. This may be because they are particularly old and the mail server at the time did not record a Date: header. When this is the case, could the Received: header line be parsed for a date/time instead? I can see that the RFC requires the origination date of an email, but perhaps it was not a requirement when these emails were sent.

    Sorry for classing this as a "bug" when it may well be more appropriate for it to be a feature request.

  2. Janos SUTO repo owner

    It's definitely possible; however, I believe that it may be easier to fix those old emails than to fix the parser. I'll give it a second thought later.

  3. Micro Data Systems Ltd reporter

    Thanks. I will probably manipulate the emails in some way. If you could point me in the right direction with the parser, though, I may take a look at it myself. I imagine that if I am experiencing the issue, other users may experience it too, and it makes more sense to me to help you by modifying your code than to fix the emails myself and not share the result.

  4. Janos SUTO repo owner

    Actually, nobody has reported such an issue yet (besides you). The header shown is from an email sent in 2014, so I believe a buggy mailer is a more likely explanation than the age of the email.

    Anyway, if you really want to fix the parser instead of fixing the emails by adding a proper Date: header, then check src/parser.c and adapt it to your needs. Basically, you have to find the first Received: entry, which is usually a multiline header, extract the RFC822 formatted date, put it into a buffer, then use

    sdata->sent = parse_date_header(buf);
    

    to pass it to the date parser function.
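
The extraction step described above could be sketched roughly as follows. This is a minimal, standalone illustration and not code from piler: `extract_received_date()` and `has_prefix_ci()` are hypothetical helper names, and the sketch assumes the whole header section is available as one NUL-terminated string and that the date is whatever follows the last `;` of the first Received: header.

```c
#include <ctype.h>
#include <stddef.h>

/* Case-insensitive check that s begins with prefix (hypothetical helper). */
static int has_prefix_ci(const char *s, const char *prefix)
{
    while (*prefix) {
        if (tolower((unsigned char)*s) != tolower((unsigned char)*prefix))
            return 0;
        s++;
        prefix++;
    }
    return 1;
}

/*
 * Copy the date part of the first Received: header found in `headers`
 * into `buf`, unfolding continuation lines along the way.
 * Returns 1 on success, 0 if no usable date was found.
 * Hypothetical helper, not part of piler.
 */
int extract_received_date(const char *headers, char *buf, size_t buflen)
{
    const char *p = headers, *end, *semi = NULL, *q;
    size_t n = 0;

    if (!headers || !buf || buflen == 0) return 0;
    buf[0] = '\0';

    /* Walk line by line until a line starts with "Received:". */
    while (*p) {
        if (*p == '\r' || *p == '\n') return 0;   /* blank line: end of headers */
        if (has_prefix_ci(p, "Received:")) break;
        while (*p && *p != '\n') p++;             /* skip to end of this line */
        if (*p == '\n') p++;
    }
    if (!*p) return 0;

    /* A folded header continues while the next line starts with space or tab. */
    for (end = p; *end; end++) {
        if (*end == '\n' && end[1] != ' ' && end[1] != '\t') break;
    }

    /* The date follows the last ';' of the header. */
    for (q = p; q < end; q++) {
        if (*q == ';') semi = q;
    }
    if (!semi) return 0;

    /* Copy the date, collapsing CR, LF, tabs and spaces into single spaces. */
    for (q = semi + 1; q < end && n + 1 < buflen; q++) {
        if (isspace((unsigned char)*q)) {
            if (n > 0 && buf[n - 1] != ' ') buf[n++] = ' ';
        } else {
            buf[n++] = *q;
        }
    }
    if (n > 0 && buf[n - 1] == ' ') n--;          /* trim trailing space */
    buf[n] = '\0';

    return n > 0;
}
```

The filled buffer would then be handed to the date parser as in the line quoted above; the exact signature of `parse_date_header()` has not been checked here, so treat that call as the maintainer's suggestion rather than a verified API. Since each relay prepends its own Received: header, the first one in the file normally carries the delivery time at the final server, which is later than the original sending time.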

  5. Micro Data Systems Ltd reporter

    I have closed this now as I am quite happy to accept that it is not an issue with MailPiler - more an issue with the emails being presented.
