piler uses received date instead of message sent date

Issue #179 resolved
datapharmer created an issue

We made some changes that required taking piler offline for a bit, which resulted in mail being held in the queue on Exchange for a couple of days. When the queue for piler was cleared and piler processed the messages, they all showed up under the day they were imported from the mail handler, even though the header and date show correctly in the messages (see attached). This makes messages difficult to search for by date, as they don't appear under the date they were sent or received by the end user.

Comments (10)

  1. Janos SUTO repo owner

    OK, I see your point. Btw, what if the message's date is in the future? Shall I just blindly accept it? Or shall I add some sort of safety check, so that I accept the Date header only if it's at most 24 hours off the current time? Or perhaps I can use the same method for the past: accept the Date header if it's at most 168 hours off the actual time.

  2. datapharmer reporter

    Good question. As different users might have different requirements, my suggestion is a config option: use the mail header datetime or use the server datetime. You could include a max_offset=168 or something if you want, so it will correct for crazy dates; if you do, one week does seem like a good default. If that is too much work, I would say the server/source setting above would be sufficient, with a simple check that the datetime isn't completely invalid (corruption check).

    Thanks!

  3. datapharmer reporter

    Oh, and just to be clear, I would apply the max offset (if enabled) to both past and future, as the problem could simply be a misconfigured date/time on the mailpiler server, which would cause "future" dates.

  4. Janos SUTO repo owner

    OK, it seems reasonable to use the Date: header, and I'll add a sanity check that the timestamp extracted from the Date header must be within +/- 168 hours of server time; otherwise it will discard the Date header value and use the actual server time. Is that OK?

  5. Janos SUTO repo owner

    OK, just committed to the master branch. The parser now always takes the parsed Date: header. However, the piler daemon allows at most +/- 1 week of drift from the current local time. When in doubt, it uses the actual time.
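
For readers following along, the rule described in this comment can be sketched roughly as below. This is only an illustration in Python, not piler's actual code: the function name `effective_timestamp`, the `MAX_DRIFT` constant, and the choice to treat a Date header without a usable timezone as UTC are all assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

# One week (168 hours), the window discussed in this thread.
MAX_DRIFT = timedelta(hours=168)


def effective_timestamp(date_header, now=None, max_drift=MAX_DRIFT):
    """Return the Date: header time if it looks plausible, else the server time."""
    if now is None:
        now = datetime.now(timezone.utc)
    try:
        sent = parsedate_to_datetime(date_header)
    except (TypeError, ValueError):
        # Missing or unparseable header: fall back to the server time.
        return now
    if sent is None:
        return now
    if sent.tzinfo is None:
        # Assumption for this sketch: a naive result is interpreted as UTC.
        sent = sent.replace(tzinfo=timezone.utc)
    # The same limit applies to dates in the past and in the future.
    if abs(now - sent) > max_drift:
        return now
    return sent
```

Applying one symmetric limit covers both cases raised above: mail that sat in a queue for a few days keeps its original date, while a badly wrong clock on either side falls back to the server time.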

  6. datapharmer reporter

    Will reindexing correct the date, or is there a different/better way to get the dates updated for existing archives?

  7. Janos SUTO repo owner

    Not completely. Reindexing only updates the sphinx database, which is used for searching. It doesn't touch the metadata table, which also holds dates. I recommend reindexing 1-200 problematic messages, then checking the search results in the GUI.

    You can give reindex a numeric range of serial numbers to process. You can get the serial numbers from the metadata.id column.
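
Since reindex takes numeric ranges, a scattered set of affected ids first needs collapsing into contiguous ranges. The helper below is a small sketch of that step in Python; `id_ranges` is a hypothetical name, and the ids are assumed to have already been read from the metadata.id column. How each range is then passed to reindex is left out, because the exact options are not given in this thread.

```python
def id_ranges(ids):
    """Collapse message ids into sorted, inclusive (first, last) ranges."""
    ranges = []
    for i in sorted(set(ids)):
        if ranges and i == ranges[-1][1] + 1:
            # The id continues the current range, so extend it by one.
            ranges[-1] = (ranges[-1][0], i)
        else:
            # A gap in the ids starts a new range.
            ranges.append((i, i))
    return ranges
```

For example, the ids 3, 4, 5, 10, 11 and 20 collapse into three ranges, so reindex would need three runs rather than six.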

  8. datapharmer reporter

    Thanks for the info. As you suspected, the dates in search results are still inconsistent with the search query even after a reindex. Is there anything else I can try, other than reimporting, that might fix the date search?
