Recreate metadata from store after complete loss of database

Issue #690 closed
Dean Pearson created an issue

Due to a lack of understanding (and being an idiot), I wiped the database clean. I thought the re-indexing relied on just the files in the store and the key, not the metadata! :(

Which brings me to the question: is it possible to recreate the index from the store content and the key? Or have I lost the 7.5 million emails that were in the archive? :(

Comments (23)

  1. Janos SUTO repo owner

    It was a bad move. A very bad one. There's no ready-to-use tool to fix that. I assume you have no backup of the database, right?

  2. Dean Pearson reporter

    Hi,

    Thanks for responding so quickly. No backup, unfortunately. Is there anything available to decrypt the files within the store back to EML? Even if it's manual, we have a couple of programmers working here who might be able to come up with something.

    Thank you

  3. Janos SUTO repo owner

    I'll reproduce what you have done and try to come up with something usable. Keep your fingers crossed.

  4. Janos SUTO repo owner

    OK. You have lots of files under /var/piler/store/00 named like 400000.<somehexastuff>.m, e.g. 4000000057508ca9263760ec0022a998e271.m. They are the message frame files without the attachments. (The attachments are stored as separate files, e.g. 4000000057508ca9263760ec0022a998e271.a1, 4000000057508ca9263760ec0022a998e271.a2, ...)

    Now run "pilerget 4000000057508ca9263760ec0022a998e271", and it should return the email (w/o attachments), decrypted and uncompressed. For emails with an attachment, you will see something like:

    --_004_BLUPR17MB00989D3A64DFF74BA4506C66E9740BLUPR17MB0098namp_
    Content-Type: image/png; name="image001.png"
    Content-Description: image001.png
    Content-Disposition: inline; filename="image001.png"; size=13105;
            creation-date="Fri, 13 May 2016 16:10:35 GMT";
            modification-date="Fri, 13 May 2016 16:10:35 GMT"
    Content-ID: <image001.png@01D1AD10.73AEE9E0>
    Content-Transfer-Encoding: base64
    
    ATTACHMENT_POINTER_4000000057508cd116380e9400f01e214626.a1_XXX_PILER--_004_BLUPR17MB00989D3A64DFF74BA4506C66E9740BLUPR17MB0098namp_--
    
    --_42cab280-2f75-499f-860b-f194d666c902_--
    

    The next task is to replace "ATTACHMENT_POINTER_4000000057508cd116380e9400f01e214626.a1_XXX_PILER" with the actual attachment. Run the following:

    pileraget 4000000057508cd116380e9400f01e214626 1

    If you are lucky, it returns the attachment. If it returns nothing, the attachment was deduplicated, so we would need the attachment table to look up the piler id of the original, which we no longer have. I have a hard time thinking of any workaround for that.
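A minimal bash sketch of the two steps above for a single message. It assumes `pilerget` and `pileraget` behave exactly as described here, and that `pileraget` prints the attachment in the already-encoded form (e.g. base64) that belongs in place of its placeholder; that last point is an assumption, not something stated in this thread. The names `rebuild_message`, `capture_into`, `PILERGET` and `PILERAGET` are hypothetical; the command names are overridable only so the logic can be tried without a piler install. Placeholders for which `pileraget` returns nothing (deduplicated attachments) are left in place and reported on stderr.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rebuild one message (frame + attachments) from the
# piler store using only pilerget/pileraget, without the database.
# ASSUMPTION: `pileraget <id> <n>` prints the attachment exactly as it must
# appear in place of ATTACHMENT_POINTER_<id>.a<n>_XXX_PILER.

# Overridable command names (hypothetical variables, handy for testing).
PILERGET=${PILERGET:-pilerget}
PILERAGET=${PILERAGET:-pileraget}

# Run a command and store its stdout in the variable named by $1,
# keeping the trailing newlines that plain $(...) would strip.
capture_into() {
  local __var=$1 __out
  shift
  __out=$("$@" && printf x) || return 1
  printf -v "$__var" '%s' "${__out%x}"
}

# Print the reassembled message for piler id $1 on stdout.
rebuild_message() {
  local id=$1 msg token ref n att
  capture_into msg "$PILERGET" "$id" || return 1

  # Collect each distinct placeholder, then substitute it.
  while read -r token; do
    [[ -n $token ]] || continue
    ref=${token#ATTACHMENT_POINTER_}   # <id>.a<n>_XXX_PILER
    ref=${ref%_XXX_PILER}              # <id>.a<n>
    n=${ref##*.a}                      # <n>
    ref=${ref%.a*}                     # <id>
    if capture_into att "$PILERAGET" "$ref" "$n" && [[ -n $att ]]; then
      msg=${msg//"$token"/"$att"}
    else
      # Empty output: probably deduplicated, original id unknown.
      echo "unresolved: $token" >&2
    fi
  done < <(grep -o 'ATTACHMENT_POINTER_[0-9a-f]*\.a[0-9]*_XXX_PILER' <<<"$msg" | sort -u)

  printf '%s' "$msg"
}
```

Usage, once the file is sourced: `rebuild_message 4000000057508ca9263760ec0022a998e271 > message.eml`. Substituting in a shell variable keeps the sketch short; for very large attachments a streaming tool would be kinder to memory.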

    You have 3 options:

    • as bad as it is, you drop the whole archive, and act like you've just deployed piler

    • you drop the whole archive, and try to reimport those 7.5M emails. And while pilerimport is running, perhaps you should write into a table a hundred times: "I will never ever touch the mysql database."

    • get a script or utility that iterates over all the directories, reads and parses all the emails, and fills the metadata, attachment and rcpt tables. In other words: try to salvage as much as you can. Note that you have to accept some loss of attachments, i.e. some (I can't tell how many) emails will be missing some or all of their attachments. The attachments are there on the disk, but without the attachment table there's no way of telling (except for the first message with that specific attachment) which other emails contain that particular attachment, too.
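As a first step toward option 3, a bash sketch (hypothetical function `inventory_store`) that walks the store and prints one line per message frame: its piler id and how many `.aN` attachment files sit next to it. It relies only on the naming described above (`<id>.m`, `<id>.a1`, `<id>.a2`, ...). The count cannot include deduplicated attachments, which (judging by the recovery log later in this thread) have no file under the message's own id. Filling the metadata, rcpt and attachment tables is not shown.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list every message frame in a piler store together
# with the number of attachment files stored under the same id.
# Relies only on the naming <id>.m, <id>.a1, <id>.a2, ... described above.
inventory_store() {
  local store=${1:-/var/piler/store} frame dir id count
  while IFS= read -r -d '' frame; do
    dir=${frame%/*}
    id=${frame##*/}
    id=${id%.m}
    # Count sibling attachment files <id>.a1, <id>.a2, ...
    count=$(( $(find "$dir" -maxdepth 1 -type f -name "$id.a[0-9]*" | wc -l) ))
    printf '%s\t%d\n' "$id" "$count"
  done < <(find "$store" -type f -name '*.m' -print0 | sort -z)
}
```

Usage: `inventory_store /var/piler/store > inventory.tsv`.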

    Think about it, and if you pick option #3 and want me to work on it, then please visit https://piler.io and obtain the 5 or the 10 incident support package.

  5. Dean Pearson reporter

    Hi,

    Thanks for your response. I'm just wondering what the 5 / 10 incident support package will get us. Would it guarantee a utility that could populate the metadata, or would it just allow you more time to 'try' and resolve it? I have to be able to justify the cost to my boss.

    Regards

  6. Dean Pearson reporter

    Also, in option two you mention re-importing the 7.5 million emails. Unfortunately, the emails only live on mailpiler; they no longer exist on Exchange. Unless there's a way of pointing pilerimport at the store?

  7. Janos SUTO repo owner

    OK, I understand your situation. I'm positive that the utility will (and not only try to) populate the metadata, rcpt and attachment tables; however, as I said, expect some messages where attachments will be missing. Anyway, it's OK to postpone spending money on it until you see it in action. We may arrange a demo where you can see it working on a few tens of thousands of messages.

  8. eXtremeSHOK

    @Dean the current eXtremeSHOK.com Piler Import Dir script will mass import millions of emails quickly, using up to 32 concurrent imports.

    It works even if the emails are compressed, spread over multiple subfolders (maildir), etc., so that part is already done.

    https://www.youtube.com/playlist?list=PLraKhf7nOp9Lv8X9dcNZJiysTQiwZBaOD

    The code for the reconstruction is almost complete; once QA completes, I will upload videos.

    Please note these scripts will only be available on piler.io

  9. Dean Pearson reporter

    Thanks, both.

    I'll wait on the completion of the code. Would 5 support points be enough to get access to the code?

    Regards

  10. Dean Pearson reporter

    Good Morning,

    I'm just wondering how payment is made for the support tickets, and whether a credit card is OK. Also, can you confirm that the software will be able to re-import emails from mailpiler's encrypted store, as the video suggests?

    Thanks

  11. eXtremeSHOK

    I will update here once I am happy with the code. Those videos are for mass importing of message files.

    There is a lot of reverse engineering required to deal with the piler store, i.e. the files need to be decrypted and decompressed.

    I have pretty much reverse engineered the C++ code that deals with message storage into bash (command line), which means it takes 0.006 s to decrypt and uncompress a message from the store.

    I'm working on a live archive of 5.5 million emails; if the code is not optimized, it will be too slow to be of any use with millions of messages.
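To put the 0.006 s figure in perspective, a back-of-the-envelope estimate for the 5.5 million message archive, assuming decryption and decompression dominate and the per-message cost stays constant. The second line is an ideal lower bound for 32 parallel jobs (the job count used elsewhere in this thread); real speedup depends on disk and CPU contention.

```latex
t_{\text{serial}} = 5.5\times 10^{6}\ \text{msgs} \times 0.006\ \tfrac{\text{s}}{\text{msg}} = 33\,000\ \text{s} \approx 9.2\ \text{h}
t_{32\ \text{jobs}} \geq \frac{33\,000\ \text{s}}{32} \approx 1\,031\ \text{s} \approx 17\ \text{min}
```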

  12. eXtremeSHOK

    Actual recovery of emails with non-deduplicated attachments is done.

    It seems like I might be able to recover the missing/de-duplicated attachments as well.

    ################################################################################
    Creating Temporary Directory: /tmp/import_tmp/xshok_piler_import/
    ====================
    Runtime Options
    ====================
    maildir /var/piler/store/
    enable_logging 0
    logfile
    temp /tmp/import_tmp/xshok_piler_import/
    quiet 0
    continue 0
    continuefile
    fix 0
    jobs 32
    number 0
    skip 0
    ----------------------------------------------------
    -------------------|| S T A R T E D ||--------------
    ----------------------------------------------------
    ----------------------------
    Directory: /var/piler/store/
    ----------------------------
    found in /var/piler/store/00/54e/35/d6/4000000054ee7e6c15453e6c005eb90435d6.m
    found in /var/piler/store/00/54e/a9/57/4000000054ee7e6c2658893400218909a957.m
    found in /var/piler/store/00/563/38/1a/40000000563adab238537cac00215863381a.m
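The paths in this log (and elsewhere in the thread) follow a visible pattern: after the `00` directory come characters 9 to 11 of the piler id, then its last four characters split into two 2-character directories. A bash sketch of that mapping, inferred purely from the example paths shown in this thread, not from piler's source; the function name `store_path` is hypothetical, and the leading `00` is a fixed default parameter because the thread does not explain it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a piler id to its file in the store.
# Layout INFERRED from example paths in this thread, e.g.
#   4000000057472c5a083dcf540089af6327d2 ->
#   /var/piler/store/00/574/27/d2/4000000057472c5a083dcf540089af6327d2.m
# Args: ID [SUFFIX=m] [STORE=/var/piler/store] [TOP=00]
store_path() {
  local id=$1 suffix=${2:-m} store=${3:-/var/piler/store} top=${4:-00}
  local n=${#id}
  # chars 9-11 of the id, then the last 4 chars as two 2-char directories
  printf '%s/%s/%s/%s/%s/%s.%s\n' "$store" "$top" "${id:8:3}" \
    "${id:n-4:2}" "${id:n-2:2}" "$id" "$suffix"
}
```

Usage: `store_path 4000000054ee880601bb041c00b211046dcd a1` gives the path of that message's first attachment file.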
    
  13. eXtremeSHOK

    Update: the exported/recovered mail is now 100% binary- and content-identical.

    [root@archive ~]# bash test.sh 4000000057472c5a083dcf540089af6327d2
    Found Message: /var/piler/store/00/574/27/d2/4000000057472c5a083dcf540089af6327d2.m
    Found Attachment: /var/piler/store/00/574/27/d2/4000000057472c5a083dcf540089af6327d2.a1
    Exported Message: /tmp/00/574/27/d2/4000000057472c5a083dcf540089af6327d2
    [root@archive ~]# pilerget 4000000057472c5a083dcf540089af6327d2 > /tmp/pilerget-output
    [root@archive ~]# diff -c /tmp/00/574/27/d2/4000000057472c5a083dcf540089af6327d2 /tmp/pilerget-output
    [root@archive ~]# cmp -b /tmp/00/574/27/d2/4000000057472c5a083dcf540089af6327d2 /tmp/pilerget-output 
    

    The code is running overnight on 1 million mails to identify issues/blockers.
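The `diff`/`cmp` check above can be wrapped in a loop to spot-check many exported messages at once. A sketch, assuming each exported message is a file named after its piler id somewhere under the export directory (as in the `/tmp/00/574/27/d2/...` path above) and that `pilerget` output is the reference, as in that comparison. `verify_exports` and `PILERGET` are hypothetical names. `cmp -s` is used because only the exit status matters.

```shell
#!/usr/bin/env bash
# Hypothetical spot-check: compare each exported message with pilerget output.
PILERGET=${PILERGET:-pilerget}

# Usage: verify_exports EXPORT_DIR ID...
# Prints OK/DIFF/MISSING per id; returns non-zero if any check failed.
verify_exports() {
  local export_dir=$1 id file failed=0
  shift
  for id in "$@"; do
    file=$(find "$export_dir" -type f -name "$id" -print -quit)
    if [[ -z $file ]]; then
      echo "MISSING $id"; failed=1
    elif cmp -s "$file" <("$PILERGET" "$id"); then
      echo "OK $id"
    else
      echo "DIFF $id"; failed=1
    fi
  done
  return "$failed"
}
```

Usage: `verify_exports /tmp 4000000057472c5a083dcf540089af6327d2`.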

  14. Dean Pearson reporter

    Hi Extremeshok,

    Thanks for the update, looks like you're making good progress! I'll keep watching here for when you're ready, and I'll purchase some support.

    Regards

  15. eXtremeSHOK

    Bulk message + non-deduplicated attachment reconstruction is done. Test set: 10000 encrypted and compressed messages with attachments (with deduplication); some messages have 13+ attachments totalling 20+ MB. Note: no piler database is used for this example.

    Found Attachment: /var/piler/store/00/54e/6d/cd/4000000054ee880601bb041c00b211046dcd.a1
    Found Message: /var/piler/store/00/54e/6d/5a/4000000054ee8d172a4488a400fa4f666d5a.m
    Exported Message: /datastore/export/00/54e/6d/69/4000000054ee879d12f92b8c00ec92886d69
    Exported Message: /datastore/export/00/54e/6d/cd/4000000054ee880601bb041c00b211046dcd
    Found Message: /var/piler/store/00/54e/6d/5a/4000000054ee89471c071e14007d2aff6d5a.m
    ------------------- WAITING FOR JOBS TO COMPLETE --------------
    Found Message: /var/piler/store/00/54e/6d/5a/4000000054ee8b6b0da3ad24005ac1186d5a.m
    Found Attachment: /var/piler/store/00/54e/6d/48/4000000054ee834414c782ec00d814bd6d48.a5
    Exported Message: /datastore/export/00/54e/6d/5a/4000000054ee8d172a4488a400fa4f666d5a
    Exported Message: /datastore/export/00/54e/6d/5a/4000000054ee89471c071e14007d2aff6d5a
    Exported Message: /datastore/export/00/54e/6d/48/4000000054ee834414c782ec00d814bd6d48
    Found Attachment: /var/piler/store/00/54e/6d/5a/4000000054ee8b6b0da3ad24005ac1186d5a.a3
    Found Attachment: /var/piler/store/00/54e/6d/5a/4000000054ee8b6b0da3ad24005ac1186d5a.a4
    Exported Message: /datastore/export/00/54e/6d/5a/4000000054ee8b6b0da3ad24005ac1186d5a
    --------------------------------------------------------
    -------------------|| C O M P L E T E D ||--------------
    --------------------------------------------------------
    Total Msg: 10000 | Ignored: 0
    Runtime: 0:00:04:10 | Rate: 40.00 m/s | 2400.0 m/m | 144000 m/h | 3456000 m/d
    ##############################################
          Powered By https://eXtremeSHOK.com
    ##############################################
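The rate line follows directly from the totals, and the same rate gives a rough figure for the full archives mentioned in this thread, assuming it holds at scale (it was measured on 10,000 messages only):

```latex
\text{rate} = \frac{10\,000\ \text{msgs}}{4\ \text{min}\ 10\ \text{s}} = \frac{10\,000\ \text{msgs}}{250\ \text{s}} = 40\ \tfrac{\text{msgs}}{\text{s}}
40 \times 3600 = 144\,000\ \tfrac{\text{msgs}}{\text{h}}, \qquad 144\,000 \times 24 = 3\,456\,000\ \tfrac{\text{msgs}}{\text{day}}
t_{5.5\text{M}} \approx \frac{5.5\times 10^{6}}{144\,000\ \text{h}^{-1}} \approx 38\ \text{h}, \qquad t_{7.5\text{M}} \approx \frac{7.5\times 10^{6}}{144\,000\ \text{h}^{-1}} \approx 52\ \text{h}
```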
    

    I'm busy investigating recovery of deduplicated/missing attachments.

    This will require a second pass over the recovered mails, as the information it needs is only generated while the mails are being recovered.
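A second pass starts from whatever the first pass could not fill in. A bash sketch (hypothetical function `list_unresolved`) that scans an export directory for leftover `ATTACHMENT_POINTER_<id>.a<n>_XXX_PILER` placeholders and prints, per placeholder, the exported file it sits in and the missing attachment id, roughly the "Missing Message ID / Missing Attachment ID" pairs in the next comment's log. How the matching original attachment is then located is not described in this thread, so that step is not sketched.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list attachment placeholders left unresolved after the
# first pass, as "<exported file><TAB><missing attachment id>".
list_unresolved() {
  local export_dir=${1:-/datastore/export}
  # -r recurse, -H always prefix the file name, -o print only the match
  grep -rHo 'ATTACHMENT_POINTER_[0-9a-f]*\.a[0-9]*_XXX_PILER' "$export_dir" |
    awk -F':ATTACHMENT_POINTER_' '{ sub(/_XXX_PILER$/, "", $2); print $1 "\t" $2 }' |
    sort -u
}
```

Usage: `list_unresolved /datastore/export > missing.tsv`. File names containing `:` would confuse the split; piler ids never do.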

  16. eXtremeSHOK

    Recovery of de-duplicated/missing attachments is done.

    14 HIT : 4 @ name : 027.JPG
    - Missing Message ID: 4000000054ee87023633c92c009bd21d4862
    - Missing Attachment ID: 4000000054ee87023633c92c009bd21d4862.a3
    - Found Attachment: /var/piler/store/00/54e/ea/2b/4000000054ee850e02b8674c006a4466ea2b.a3
    - Found Message: /datastore/export/00/54e/48/62/4000000054ee87023633c92c009bd21d4862
    - Pointr: 202
    - Key: ATTACHMENT_POINTER_4000000054ee87023633c92c009bd21d4862.a3_XXX_PILER
    = Injecting Attachment into message
    200 HIT : 39 @ name : 2015distrandlistpricelist.pdf
    - Missing Message ID: 4000000054ee83ae1c57227400f0f201a222
    - Missing Attachment ID: 4000000054ee83ae1c57227400f0f201a222.a3
    - Found Attachment: /var/piler/store/00/54e/b0/bc/4000000054ee83811b81679c00c16c9db0bc.a3
    - Found Message: /datastore/export/00/54e/a2/22/4000000054ee83ae1c57227400f0f201a222
    - Pointr: 342
    - Key: ATTACHMENT_POINTER_4000000054ee83ae1c57227400f0f201a222.a3_XXX_PILER
    = Injecting Attachment into message
    

    Now busy running the scripts against 5.5 million emails.

  17. eXtremeSHOK

    The script had to be split into 2 separate scripts (one does message and found-attachment recovery, the other does deduplicated/missing attachment recovery).

    Still busy testing the reliability of attachment recovery for deduplicated/missing attachments.

    So far I have successfully recovered 1.7 million deduplicated attachments.

    Accepted your invite on Skype.
