Pilerimport marking some emails as false duplicates.

Issue #643 closed
Former user created an issue

I am trying to import messages that we have stored in PST files. However, after running readpst, the majority of them are marked as duplicates. When I search for them in Piler, I either find them, find parts of the email chain, or find nothing at all.

Is this caused by something in Exchange 2003? Possibly a bad use of pilerimport on my end? Some of these are from an old server which was decommissioned a few years ago.

Here is a sample header:

From: "*" <MAILER-DAEMON>
Subject: RE: PORVs
To: ***
Date: Mon, 23 Dec 2013 15:41:20 +0000
Message-Id: EA6B9842E2B4474581863C3A8A919E670162744259@********.com

When I run pilerimport -e on this file, I get:

duplicate: 14 (id: 400000005669eabb352cbd54004668b0792b)
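
One way to inspect the Message-Id of a file produced by readpst before importing it is sketched below in Python. This is a minimal sketch and not part of Piler: the function name `message_id_info` is hypothetical, and the sample uses `example.com` as a stand-in for the redacted domain. It only reports the header value and its length.

```python
from email import message_from_string


def message_id_info(raw_message: str):
    """Return (message_id, length) for a raw RFC 822 message, or (None, 0)."""
    msg = message_from_string(raw_message)
    value = msg.get("Message-ID")  # header lookup is case-insensitive
    if value is None:
        return None, 0
    message_id = value.strip()
    return message_id, len(message_id)


# Sample modelled on the header above; example.com replaces the redacted domain.
sample = (
    'From: "*" <MAILER-DAEMON>\n'
    "Subject: RE: PORVs\n"
    "Date: Mon, 23 Dec 2013 15:41:20 +0000\n"
    "Message-Id: EA6B9842E2B4474581863C3A8A919E670162744259@example.com\n"
    "\n"
    "body\n"
)

print(message_id_info(sample))
```

Running the same function over every exported file and comparing the values would show whether several files share one Message-Id.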

Comments (12)

  1. Janos SUTO repo owner

    You clearly have some improper mail headers; observe the bogus From address. To check whether the message is already archived, run the following SQL query:

    select * from metadata where piler_id='400000005669eabb352cbd54004668b0792b';
    

    If you get the same message (metadata) back, then piler is right. Otherwise there must be something wrong.

  2. eXtremeSHOK

    Some PST conversion utilities run in a demo/unlicensed mode and can make all the headers identical.

    Please check your utility.

  3. Chris Tancock

    I ran that select and it returned an empty set.

    I'll look into the utility that was used to create the archives.

  4. Chris Tancock

    That returned a completely different email. Is piler using the Message-ID field of the header or something else?

  5. Janos SUTO repo owner

    When pilerimport runs into a duplicate, it prints or syslogs the already archived id. Note that the parser uses the Message-ID to detect duplicate emails.

  6. eXtremeSHOK

    I thought Microsoft replaced ExMerge with Import-Mailbox and Export-Mailbox?

    What are the headers of both of those messages?

  7. Chris Tancock

    Our server is 2003.

    The one I'm trying to add is:

    From: "*" <MAILER-DAEMON>
    Subject: RE: PORVs
    To: Chris Tancock
    Date: Mon, 23 Dec 2013 15:41:20 +0000
    Message-Id: EA6B9842E2B4474581863C3A8A919E670162744259@mail1.corp.*****.com
    X-libpst-forensic-sender: /O=NA/OU=/CN=RECIPIENTS/CN=**
    MIME-Version: 1.0
    Content-Type: multipart/mixed; boundary="--boundary-LibPST-iamunique-232012881_-_-"

    ----boundary-LibPST-iamunique-232012881_-_-
    Content-Type: text/plain; charset="windows-1252"

    Below is what I got from Piler with that query; I'm not sure which PST it came from.

    id:          14
    from:        monitor1@corp.*.com
    fromdomain:  corp.*.com
    subject:     MM: High temperature threshold violation for 'Rear T&H' at 'Server Rack'.
    spam:        0
    arrived:     1448457020
    sent:        1374105461
    retained:    1595030261
    deleted:     0
    size:        1117
    hlen:        679
    direction:   0
    attachments: 0
    piler_id:    400000005655b34604af072c005d7a7e6516
    message_id:  MAIL12TRqf69aZbLAX300000030@mail1.corp.****.com
    reference:
    digest:      e5d10fb48129387ce36b2e7ce4023cb4e40b1beb6bda5140af47cccc32067928
    bodydigest:  7082b6b660a39616b985f212ebe0c876ae5b2b57e278fe97792721fc829c8d72
    vcode:       4910f7787d90e56503d1514ea61924dcf4ec824328af7525a0dcdcf98b476070

  8. Janos SUTO repo owner

    How long is the Message-ID actually? I saw a somewhat similar problem at a company where Outlook produced an insanely long Message-ID, which was truncated when it was inserted into SQL.
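
The truncation scenario described above can be checked offline with a small Python sketch. It assumes you already have a list of Message-IDs (for example, collected from the exported files) and a guess at the stored column width. The name `find_truncation_collisions` and the `max_len` value are hypothetical; the real limit should be taken from your own Piler schema, not from this example.

```python
from collections import defaultdict


def find_truncation_collisions(message_ids, max_len):
    """Group distinct Message-IDs that become identical when cut to max_len characters."""
    groups = defaultdict(set)
    for mid in message_ids:
        groups[mid[:max_len]].add(mid)
    # Keep only prefixes shared by more than one distinct Message-ID.
    return {prefix: sorted(ids) for prefix, ids in groups.items() if len(ids) > 1}


# Made-up IDs for illustration: the first two differ only after character 20.
ids = [
    "A" * 20 + "-0001@mail.example.com",
    "A" * 20 + "-0002@mail.example.com",
    "short@mail.example.com",
]

collisions = find_truncation_collisions(ids, max_len=20)
for prefix, originals in collisions.items():
    print(prefix, "<-", originals)
```

If the result is empty at the real column width, truncation is unlikely to be the cause of the false duplicates.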
