Pilerimport marking some emails as false duplicates.
I am trying to import messages we have stored in PST files. However, after converting them with readpst, pilerimport marks the majority of them as duplicates. When I search for them in Piler, I either find them, find only parts of the email chain, or find nothing at all.
Is this caused by something in Exchange 2003? Possibly a bad use of pilerimport on my end? Some of these are from an old server which was decommissioned a few years ago.
Here is a sample header:
From: "*" <MAILER-DAEMON>
Subject: RE: PORVs
To: ***
Date: Mon, 23 Dec 2013 15:41:20 +0000
Message-Id: EA6B9842E2B4474581863C3A8A919E670162744259@********.com
When I run pilerimport -e on this file I get
duplicate: 14 (id: 400000005669eabb352cbd54004668b0792b)
Comments (12)
-
repo owner -
Some PST conversion utilities run in a demo/unlicensed mode and can make all the headers identical.
Please check your utility.
-
I ran that select and it returned an empty set.
I'll look into the utility that was used to create the archives.
-
repo owner Try another query:
select * from metadata where id=14;
-
That returned a completely different email. Is piler using the Message-ID field of the header or something else?
-
repo owner When pilerimport runs into a duplicate, it prints or syslogs the already archived id. Note that the parser uses the Message-ID to detect duplicate emails.
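Since duplicates are detected by Message-ID, it may help to check the readpst output for repeated IDs before importing. The following is a minimal sketch in Python, assuming the messages were extracted one per file. The `read_message_id` and `find_duplicate_ids` helpers are hypothetical and not part of piler; they only compare the raw header value and do not reproduce piler's own parser.

```python
from collections import defaultdict
from email.parser import BytesHeaderParser
from pathlib import Path


def read_message_id(path):
    """Return the Message-ID header of one message file, or None if absent."""
    with open(path, "rb") as fh:
        # BytesHeaderParser reads only the header block, not the body.
        headers = BytesHeaderParser().parse(fh)
    value = headers["Message-Id"]  # header lookup is case-insensitive
    if value is None:
        return None
    return str(value).strip()


def find_duplicate_ids(directory):
    """Map each Message-ID seen more than once to the files that carry it."""
    seen = defaultdict(list)
    for path in sorted(Path(directory).rglob("*")):
        if path.is_file():
            message_id = read_message_id(path)
            if message_id is not None:
                seen[message_id].append(path)
    return {mid: paths for mid, paths in seen.items() if len(paths) > 1}
```

Calling `find_duplicate_ids` on the directory that readpst wrote to would list every ID shared by more than one file. If nearly all files share the same ID, the conversion step is the likely culprit rather than piler.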
-
I have one message-id as MAIL12TRqf69aZbLAX300000030@mail1.corp.nattcompany.com
the other is EA6B9842E2B4474581863C3A8A919E670162744259@mail1.corp.nattcompany.com
Could the problems be caused by using ExMerge to make the PSTs? These are both from our old mail server, so I'm not sure how this problem could be occurring.
-
I thought Microsoft replaced ExMerge with Import-Mailbox and Export-Mailbox?
What are the headers of both of those messages?
-
Our server is 2003.
The one I'm trying to add is:
From: "*" <MAILER-DAEMON>
Subject: RE: PORVs
To: Chris Tancock
Date: Mon, 23 Dec 2013 15:41:20 +0000
Message-Id: EA6B9842E2B4474581863C3A8A919E670162744259@mail1.corp.*****.com
X-libpst-forensic-sender: /O=NA/OU=/CN=RECIPIENTS/CN=**
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="--boundary-LibPST-iamunique-232012881_-_-"

----boundary-LibPST-iamunique-232012881_-_-
Content-Type: text/plain; charset="windows-1252"
Below is what I got from Piler with that query. I'm not sure which PST it came from.
id:          14
from:        monitor1@corp.*.com
fromdomain:  corp.*.com
subject:     MM: High temperature threshold violation for 'Rear T&H' at 'Server Rack'.
spam:        0
arrived:     1448457020
sent:        1374105461
retained:    1595030261
deleted:     0
size:        1117
hlen:        679
direction:   0
attachments: 0
piler_id:    400000005655b34604af072c005d7a7e6516
message_id:  MAIL12TRqf69aZbLAX300000030@mail1.corp.****.com
reference:
digest:      e5d10fb48129387ce36b2e7ce4023cb4e40b1beb6bda5140af47cccc32067928
bodydigest:  7082b6b660a39616b985f212ebe0c876ae5b2b57e278fe97792721fc829c8d72
vcode:       4910f7787d90e56503d1514ea61924dcf4ec824328af7525a0dcdcf98b476070
-
repo owner How long is the message-id actually? I saw a somewhat similar problem at a company where Outlook produced an insanely long message-id, which was truncated when it was inserted into SQL.
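To illustrate how truncation could lead to false duplicates: if IDs are cut to a fixed width before they are stored and compared, two distinct IDs that share a long prefix end up identical. This is a minimal sketch in Python; the 32-character limit and the example IDs are made up for the illustration and do not reflect piler's actual schema.

```python
def truncate(message_id, max_len):
    """Mimic a fixed-width column that silently drops the overflow."""
    return message_id[:max_len]


def collide_after_truncation(id_a, id_b, max_len):
    """True when two different IDs become equal once truncated."""
    return id_a != id_b and truncate(id_a, max_len) == truncate(id_b, max_len)


# Hypothetical limit and IDs: the two IDs differ only beyond the limit.
LIMIT = 32
first = "A" * 40 + "0001@mail.example.com"
second = "A" * 40 + "0002@mail.example.com"

print(len(first), collide_after_truncation(first, second, LIMIT))
```

Comparing the length of a suspicious message-id against the width of the column it is stored in would show whether this scenario applies.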
-
repo owner Any update on the matter? Any more messages marked as false positives?
-
repo owner - changed status to closed
No news is good news.
You clearly have some improper mail headers; note the bogus From address. To check whether the message is already archived, run the following SQL query:
If you get the same message (metadata) back, then piler is right. Otherwise there must be something wrong.