pilerexport based on search

Issue #504 closed
Scott Savarese created an issue

During legal discovery I may need to export thousands of messages ("I need all mail about the client"). I can enter the search in the web interface, but I would need a "select all" button to export the results. There is also a search limit of 1000 hits, which would make this painful to do through the web page...

So I'm wondering whether pilerexport could do this. The search might look like:

to:@clientdomain.com or from:@clientdomain.com or body:client or subject:client

Since it runs out of band, I don't care how long it takes, and having it output to a zip file would be huge.

Thanks, Scott

Comments (12)

  1. Janos SUTO repo owner

    Tough issue, indeed. Pilerexport doesn't support the fancy search queries you can type in the GUI; however, we might be able to solve the problem. First, note the number in parentheses in the middle horizontal bar: it shows the total number of hits Sphinx is actually aware of. If it's a sane number (e.g. <5000 or so) and you have Sphinx 2.2.x, then it's easy to get more than 1000 hits: just edit config-site.php and raise the MAX_SEARCH_HITS variable (make sure PHP has enough memory to handle it).

    However, in the longer term it may be a better option to develop an API that returns all Sphinx IDs matching the query, then extend pilerexport to use that ID list to find the emails to be exported.
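
    The longer-term idea above amounts to collecting every matching Sphinx ID without ever asking for one huge result set. A minimal sketch of one way to do that is keyset pagination on the document ID: each query asks only for IDs greater than the last one seen. This is not piler code; `build_batch_query`, `collect_ids` and the `fetch_batch` callable are hypothetical names, the index names are copied from the maillog excerpt quoted later in this thread, and it assumes your Sphinx version accepts an `id > N` filter together with `ORDER BY id ASC` in SphinxQL.

```python
from typing import Callable, List

# Index names copied from the piler-webui maillog excerpt in this thread (assumption: adjust to your setup).
INDEXES = "main1,dailydelta1,delta1"


def build_batch_query(where: str, last_id: int, batch: int) -> str:
    """Build SphinxQL for one page of IDs greater than last_id (keyset pagination)."""
    return (
        f"SELECT id FROM {INDEXES} WHERE {where} AND id > {last_id} "
        f"ORDER BY id ASC LIMIT 0,{batch} OPTION max_matches={batch}"
    )


def collect_ids(fetch_batch: Callable[[int, int], List[int]], batch: int = 1000) -> List[int]:
    """Call fetch_batch(last_id, batch) repeatedly until a short page signals the end.

    fetch_batch is a hypothetical callable that runs build_batch_query(...) against
    searchd and returns the IDs sorted ascending.
    """
    ids: List[int] = []
    last_id = 0
    while True:
        page = fetch_batch(last_id, batch)
        ids.extend(page)
        if len(page) < batch:  # fewer rows than requested: nothing left to fetch
            return ids
        last_id = page[-1]  # continue after the highest ID seen so far


if __name__ == "__main__":
    # Stand-in for a real SphinxQL connection: 2,500 fake matching IDs.
    fake_matches = list(range(1, 2501))

    def fake_fetch(last_id: int, batch: int) -> List[int]:
        return [i for i in fake_matches if i > last_id][:batch]

    print(build_batch_query("MATCH('client')", 0, 1000))
    print(len(collect_ids(fake_fetch)))  # 2500
```

    Because no single query asks for more than `batch` rows, each request stays under the 1000-hit ceiling discussed in this thread, however many messages match in total.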

  2. Scott Savarese reporter

    If I raise MAX_SEARCH_HITS, how can I export all the results from the web page? Is there a "select all" button? Right now I can only select 20 at a time (or whatever my page size is).

  3. Scott Savarese reporter

    That adds the button at the bottom, but increasing MAX_SEARCH_HITS makes the search return nothing. Even setting it to 1001 returns nothing, while 1000 produces output.

  4. Janos SUTO repo owner

    What sphinx version do you have? 2.2.x? For earlier releases you have to adjust sphinx.conf to support more hits.

  5. Scott Savarese reporter

    Still nothing. Do I need to do anything after setting it in config-site.php and /etc/sphinx/sphinx.conf? I tried running the 4 indexer cron jobs (2 shell scripts and 2 actual indexer commands), but still nothing.

    Found this in /var/log/maillog:

    Feb 17 08:16:19 server piler-webui[46437]: sphinx query: 'SELECT id FROM main1,dailydelta1,delta1 WHERE        MATCH('') ORDER BY `sent` DESC LIMIT 0,999999 OPTION max_matches=999999' in 0.01 s, 0 hits, 0 total found
    Feb 17 08:20:50 server piler-webui[46438]: sphinx query: 'SELECT id FROM main1,dailydelta1,delta1 WHERE        MATCH('') ORDER BY `sent` DESC LIMIT 0,1000 OPTION max_matches=1000' in 0.03 s, 1000 hits, 789492 total found
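
    The two log lines above carry the useful numbers: the `max_matches` the GUI asked for, how many hits came back, and how many matches Sphinx reports in total. A small hypothetical helper can pull them out; `parse_sphinx_log_line` is a made-up name, and the regex assumes the exact piler-webui log format quoted here. Run on the second line, it shows the working 1000-hit query reporting 789492 total matches, well beyond the "sane" range of a few thousand mentioned earlier.

```python
import re
from typing import Dict, Optional

# Assumed format: the tail of the piler-webui "sphinx query" lines quoted above.
LOG_RE = re.compile(
    r"max_matches=(?P<max_matches>\d+)' in [\d.]+ s, "
    r"(?P<hits>\d+) hits, (?P<total>\d+) total found"
)


def parse_sphinx_log_line(line: str) -> Optional[Dict[str, int]]:
    """Return max_matches, hits and total found from one log line, or None if it doesn't match."""
    m = LOG_RE.search(line)
    if m is None:
        return None
    return {
        "max_matches": int(m.group("max_matches")),
        "hits": int(m.group("hits")),
        "total_found": int(m.group("total")),
    }


if __name__ == "__main__":
    sample = (
        "Feb 17 08:20:50 server piler-webui[46438]: sphinx query: "
        "'SELECT id FROM main1,dailydelta1,delta1 WHERE MATCH('') "
        "ORDER BY `sent` DESC LIMIT 0,1000 OPTION max_matches=1000' "
        "in 0.03 s, 1000 hits, 789492 total found"
    )
    print(parse_sphinx_log_line(sample))
    # {'max_matches': 1000, 'hits': 1000, 'total_found': 789492}
```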
    
  6. Janos SUTO repo owner

    I've improved the pilerexport utility to support a -w option, where you can specify the WHERE part of the Sphinx query (just as the GUI issues the query!). It can then extract all matching messages. Be sure to have enough disk space.

    To try this feature, get the latest master branch.
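
    Since -w takes the WHERE part "just as the GUI issues the query", one practical route is to run the search in the GUI, then copy the WHERE part from the matching piler-webui line in /var/log/maillog. A minimal sketch, assuming -w expects the text between `WHERE` and `ORDER BY` in that logged query; `where_from_log` and `pilerexport_argv` are hypothetical helpers, and -w is the only pilerexport flag taken from this thread.

```python
import re
import shlex
from typing import List


def where_from_log(line: str) -> str:
    """Extract the WHERE part (between WHERE and ORDER BY) from a logged sphinx query."""
    m = re.search(r"WHERE\s+(.*?)\s+ORDER BY", line)
    if m is None:
        raise ValueError("no 'WHERE ... ORDER BY' section found in log line")
    return m.group(1)


def pilerexport_argv(where: str) -> List[str]:
    """Build an argument list for pilerexport; only -w comes from this thread."""
    return ["pilerexport", "-w", where]


if __name__ == "__main__":
    logged = (
        "sphinx query: 'SELECT id FROM main1,dailydelta1,delta1 WHERE        MATCH('') "
        "ORDER BY `sent` DESC LIMIT 0,1000 OPTION max_matches=1000'"
    )
    argv = pilerexport_argv(where_from_log(logged))
    # shlex.join quotes the embedded single quotes safely for pasting into a shell.
    print(shlex.join(argv))
```

    Keeping the command as a list means it can also be passed straight to `subprocess.run(argv, check=True)` without the shell ever re-parsing the single quotes inside `MATCH('...')`.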
