whoosh index tmp files going to /tmp

Issue #885 new
Former user created an issue

We've noticed that reindexing creates several files in the /tmp directory on Linux. Below is a conversation I had on IRC with marcinkuzminski. I'm submitting this issue to request a way to control which directory the whoosh indexing temp files are stored in.

[18:07] <cdsmmeyer> hello.  When I do reindexing I see several files like this in the /tmp directory on linux.  what are they and how can I redirect them to something other than /tmp?
[18:07] <cdsmmeyer> -rw------- 1 rhodecode rhodecode 57481268 Aug  1 10:47 whoosh_HG_INDEX_jwmHIS.run
[18:07] <marcinkuzminski> ahh i didn't have time to answer that on discussion group
[18:08] <marcinkuzminski> cdsmmeyer: it's whoosh temporary files created during indexing
[18:08] <cdsmmeyer> do they get deleted when indexing is done?
[18:08] <marcinkuzminski> they should be gone when indexing is finished and index commited
[18:08] <cdsmmeyer> is there anyway to have them created somewhere else other than /tmp?
[18:09] <marcinkuzminski> I don't know i would have to check in whoosh docs
[18:09] <cdsmmeyer> ok.  so far we love the features of rhodecode :)
[18:10] <marcinkuzminski> great :)
[18:10] <cdsmmeyer> keep up the good work
[18:10] <marcinkuzminski> did you seen our new UI at secure.rhodecode.org and website rhodecode.com ?
[18:11] <marcinkuzminski> btw: https://groups.google.com/forum/#!topic/whoosh/4av9dOUXIPc
[18:11] <cdsmmeyer> i have not yet.  big changes on the way?
[18:11] <marcinkuzminski> yeah, we now started to work full time on it ;)
[18:11] <cdsmmeyer> cool.   wow that topic was created in 2010.  no replies huh?
[18:13] <marcinkuzminski> seems to be fixed
[18:13] <marcinkuzminski> https://bitbucket.org/mchaput/whoosh/issue/48/temp-directories-are-not-deleted-when
[18:14] <marcinkuzminski> from the commit it seems that this is configurable
[18:15] <cdsmmeyer> the location of the temp files?
[18:15] <marcinkuzminski> yes
[18:16] <marcinkuzminski> there is a 'dir=/path' param that can be passed in into a whoosh writer
[18:17] <cdsmmeyer> so not able to put in the production.ini file I take it?
[18:17] <marcinkuzminski> no
[18:17] <marcinkuzminski> we would have to add it
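
Until such an option exists, one possible stopgap is to redirect Python's temp directory for the indexing process. The following is a minimal sketch, assuming whoosh creates the `.run` files through Python's `tempfile` module without passing an explicit directory (the file name in the log above looks like `mkstemp` output, but that is not confirmed here). The helper name `redirected_tempdir` and the scratch directory are hypothetical, and the `mkstemp` call is only a stand-in for the real indexing run.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def redirected_tempdir(path):
    """Temporarily make the tempfile module create files under `path`.

    Hypothetical helper; not part of RhodeCode or whoosh.
    """
    os.makedirs(path, exist_ok=True)
    previous = tempfile.tempdir
    tempfile.tempdir = path  # consulted by mkstemp() when no dir= is given
    try:
        yield path
    finally:
        tempfile.tempdir = previous  # restore the original setting


if __name__ == "__main__":
    # Assumed scratch location, chosen only for this demonstration.
    scratch = os.path.join(tempfile.gettempdir(), "whoosh_scratch_demo")
    with redirected_tempdir(scratch):
        # Stand-in for the indexing run: any tempfile created here
        # without an explicit directory lands in `scratch`.
        fd, name = tempfile.mkstemp(prefix="whoosh_HG_INDEX_", suffix=".run")
        os.close(fd)
        print(name)
        os.remove(name)
```

Exporting the `TMPDIR` environment variable before starting the indexing process should have the same effect under the same assumption, since `tempfile` checks it when choosing its default directory.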

Comments (6)

  1. valentijnscholten

    I think the main problem is that the index files don't get deleted after indexing is complete. Over the weekend our RhodeCode server ran out of disk space because more than 10 GB of whoosh index files were left behind in /tmp (RhodeCode 1.7.1).

  2. Marcin Kuzminski repo owner

    Did the indexer finish properly, or did it exit prematurely? If it finishes and still leaves the tmp files behind, then I think we have to add a cleanup step at the end.

  3. valentijnscholten

    I think it finishes successfully:

    2013-10-12 07:07:27.982 DEBUG [whoosh_indexer] >> COMMITING CHANGES TO FILE INDEX <<

    No more logging after that until the next index run.

    I see a ticket about tmp files in the whoosh tracker, but I am not sure if that's talking about temp dirs only or also temp files. In my opinion whoosh should remove the files when done.


    Maybe an upgrade to newest whoosh?

  4. Marcin Kuzminski repo owner

    We're already using whoosh 2.4.1 (not the latest, but much newer than the version that supposedly fixes this issue).

  5. valentijnscholten

    A workaround for now could be to wrap the make-index command inside a (bash) script, which after indexing also removes the tmp index files.
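
The wrapper idea above could look roughly like this, sketched in Python rather than bash. It assumes the leftover files match `whoosh_*.run` (inferred from the single file name quoted in the IRC log) and live in the default temp directory. The function names are hypothetical, and the indexing command is passed in by the caller because the exact invocation depends on the installation.

```python
import glob
import os
import subprocess
import tempfile


def remove_stale_runs(tmp_dir, pattern="whoosh_*.run"):
    """Delete leftover run files matching `pattern` in `tmp_dir`.

    Returns the sorted list of paths that were removed.
    The default pattern is an assumption based on one observed file name.
    """
    removed = []
    for path in glob.glob(os.path.join(tmp_dir, pattern)):
        try:
            os.remove(path)
            removed.append(path)
        except OSError:
            # File vanished or is not removable by this user; skip it.
            pass
    return sorted(removed)


def index_then_clean(index_cmd, tmp_dir=None):
    """Run `index_cmd` (an argv list), then clean up whatever it left behind.

    Cleanup runs even if the command fails; the exit code is returned.
    """
    tmp_dir = tmp_dir or tempfile.gettempdir()
    try:
        return subprocess.call(index_cmd)
    finally:
        remove_stale_runs(tmp_dir)
```

Note that this removes every matching file, so it should not run while another indexing job is still using the same directory.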
