pilerimport - email dates incorrect
I am running pilerimport on around 500,000 emails spanning a period of about 10 years.
Some emails are being given the import date, while others are picking up the correct date. When I check the message headers of an email that was given today's date, they show a date back in 2014.
Here is part of the headers of one such email (edited to remove identities):
Received: from XXXXXXXX.XXX.com (XXXXXXX.XXX.com [xxx.xxx.xxx.xxx]) by mydomain.co.uk (mydomain.co.uk) (Cipher TLSv1:-SHA:128) (MAILSERVER v1234) with ESMTP id XX50001433136.XXX for user@mydomain.co.uk; Thu, 05 Jun 2014 00:35:12 +0100
Received: from XXXXXXXX.XXX.com ([xxx.xxx.xxx.xxx]) by XXXXXXXX.XXX.com with ESMTP; 04 Jun 2014 23:00:01 +0000
-
repo owner I need the Date: header line from the problematic email.
reporter I think that is the cause of the issue, then. Some of the emails being imported do not have a Date: header, so they are being imported with today's date as the default. This may be because they are particularly old and the mail server at the time did not record a Date: header. In that case, could the Received: header also be parsed for a date/time? I can see that the RFC indicates that an origination date is required for an email, but perhaps it was not a requirement when these emails were sent.
Sorry for classing this as a "bug" when it may be more appropriate as a feature request.
-
repo owner It's definitely possible; however, I believe it may be easier to fix those old emails than to fix the parser. I'll give it a second thought later.
-
reporter Thanks. I will probably modify the emails in some way. If you could point me in the right direction with the parser, though, I may take a look at it myself. I imagine that if I am experiencing this issue, other users may experience it too, and it makes more sense to me to help you by modifying your code than to work around it myself and not share the result.
-
repo owner Actually, nobody besides you has reported such an issue yet. The header shown indicates an email from 2014, so I believe the missing Date: header is more likely caused by a buggy mailer than excused by the email being old.
Anyway, if you really want to fix the parser instead of fixing the emails and adding a proper Date: header, check src/parser.c and adapt it to your needs. Basically, you have to find the first Received: entry, which is usually a multiline header, extract the RFC 822 formatted date, put it into a buffer, then use
sdata->sent = parse_date_header(buf);
to pass it to the date parser function.
-
reporter Thanks for the suggestion.
-
reporter - changed status to closed
I have closed this now, as I am quite happy to accept that it is not an issue with MailPiler but rather with the emails being presented to it.
-
Issue #1053 was marked as a duplicate of this issue.