pilerexport based on search
We may need to export thousands of messages during legal discovery ("I need all mail that is about the client"). I can put the search into the web browser, but I would need a "select all" button to do it. Plus there is a search limit of 1000 hits, which would make this a bit painful to do through the web page...
So, I'm wondering if it is doable to use pilerexport? The search may look like:
to:@clientdomain.com or from:@clientdomain.com or body:client or subject:client
Since it is out of band, I don't care how long it takes and having it output to a zip file would be huge.
Thanks, Scott
Comments (12)
-
repo owner -
reporter If I update MAX_SEARCH_HITS, in the web page, how can I use that to export them all... Is there a "select all" button? Right now I'm only able to select 20 at a time (or whatever is my page size)
-
repo owner Set this in config-site.php:
$config['ENABLE_DOWNLOADING_ALL_SEARCH_HITS'] = 1;
-
reporter That adds the button on the bottom, but increasing MAX_SEARCH_HITS is causing it to return nothing in the search. Even setting it to 1001 returns nothing in my searches. But 1000 produces output.
-
repo owner What sphinx version do you have? 2.2.x? For earlier releases you have to adjust sphinx.conf to support more hits.
-
reporter Looks like:
[root ~]# rpm -qa | grep sphin
sphinx-2.0.8-1.el6.x86_64
[root ~]#
-
repo owner Then fix sphinx.conf as well to support more than 1000 hits.
-
reporter Still nothing. Do I need to do anything after I set it in config-site.php and /etc/sphinx/sphinx.conf? I tried running the 4 indexer cron jobs (2 shell scripts and 2 actual indexer commands) but nothing
Found this in /var/log/maillog:
Feb 17 08:16:19 server piler-webui[46437]: sphinx query: 'SELECT id FROM main1,dailydelta1,delta1 WHERE MATCH('') ORDER BY `sent` DESC LIMIT 0,999999 OPTION max_matches=999999' in 0.01 s, 0 hits, 0 total found
Feb 17 08:20:50 server piler-webui[46438]: sphinx query: 'SELECT id FROM main1,dailydelta1,delta1 WHERE MATCH('') ORDER BY `sent` DESC LIMIT 0,1000 OPTION max_matches=1000' in 0.03 s, 1000 hits, 789492 total found
-
repo owner Did you restart searchd? By the way, I'd start with 2000-3000 the first time.
-
reporter Duh. So that is what searchd does! It works now...
Thanks.
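For anyone retracing these steps, here is a minimal sketch of checking which max_matches value is actually written in sphinx.conf before restarting searchd. The helper name get_max_matches is made up for illustration, the path is the one mentioned in this thread, and the restart command in the comment is an assumption about an EL6-style init script, so adjust it to your system.

```shell
# Hypothetical helper: print the first max_matches value found in a
# sphinx.conf-style file (prints nothing if the directive is absent).
get_max_matches() {
    sed -n 's/^[[:space:]]*max_matches[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$1" | head -n 1
}

conf=/etc/sphinx/sphinx.conf   # path taken from the thread
if [ -r "$conf" ]; then
    echo "max_matches in $conf: $(get_max_matches "$conf")"
fi

# searchd only picks up the new value after a restart. On an EL6-style
# system that is probably something like (assumption, not verified here):
#   service searchd restart
```

The value printed here should be at least as large as MAX_SEARCH_HITS in config-site.php, otherwise the search may come back empty as described above.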
-
repo owner I've improved the pilerexport utility to support a -w option, where you can specify the WHERE part of the sphinx query (just as the GUI issues the query!). It is able to extract all matching messages. Be sure to have enough disk space.
To try this feature, get the latest master branch.
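A rough sketch of what such a call might look like, going by the logged GUI query earlier in this thread (WHERE MATCH('...')). Only the -w option itself comes from the comment above; the wrapper name export_cmd, the example clause, and the use of "subject" as a sphinx field name are assumptions. The wrapper only prints the command so it can be reviewed before anything is exported.

```shell
# Hypothetical dry-run wrapper: prints the pilerexport command line for a
# given WHERE clause instead of running it.
export_cmd() {
    printf 'pilerexport -w "%s"\n' "$1"
}

# Assumed example clause; the "subject" field name is a guess, so compare it
# with the sphinx queries piler-webui writes to the mail log.
export_cmd "MATCH('@subject client')"
```

A practical way to get the clause right is to run the search in the GUI first, copy the MATCH(...) part from the logged sphinx query, and pass that to -w. Check free disk space before running the real command.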
-
repo owner - changed status to closed
Tough issue, indeed. pilerexport doesn't support the fancy search queries you can type in the GUI, but we might be able to solve the problem. First, note the number in parentheses in the middle horizontal bar: it shows the total number of hits sphinx is actually aware of. If it's a sane number (e.g. fewer than 5000 or so) and you have sphinx 2.2.x, then it's easy to get more than 1000 hits: just edit config-site.php and raise the MAX_SEARCH_HITS variable (make sure PHP has enough memory to support it).
In the longer term, however, it may be a better option to develop an API or something similar that returns all sphinx ids matching the query, then teach pilerexport to use that id list to find the emails to be exported.