whoosh.filedb.filestore.FileStorage.temp_storage eventually fails, at random

In my test environment, a lot of whoosh indexes are created and destroyed at a fast pace. Eventually I get IOErrors, seemingly at random, in whoosh.filedb.filestore.FileStorage.temp_storage:

   writer = self.index.writer()
 File "/home/pombredanne/bin/scripts/eggs/Whoosh-2.5.2-py2.6.egg/whoosh/index.py", line 464, in writer
   return SegmentWriter(self, **kwargs)
 File "/home/pombredanne/bin/scripts/eggs/Whoosh-2.5.2-py2.6.egg/whoosh/writing.py", line 531, in __init__
   self.perdocwriter = codec.per_document_writer(self.storage, newsegment)
 File "/home/pombredanne/bin/scripts/eggs/Whoosh-2.5.2-py2.6.egg/whoosh/codec/whoosh3.py", line 84, in per_document_writer
   return W3PerDocWriter(self, storage, segment)
 File "/home/pombredanne/bin/scripts/eggs/Whoosh-2.5.2-py2.6.egg/whoosh/codec/whoosh3.py", line 161, in __init__
   self._cols = compound.CompoundWriter(tempst)
 File "/home/pombredanne/bin/scripts/eggs/Whoosh-2.5.2-py2.6.egg/whoosh/filedb/compound.py", line 245, in __init__
   self._temp = tempstorage.create_file(self._tempname, mode="w+b")
 File "/home/pombredanne/bin/scripts/eggs/Whoosh-2.5.2-py2.6.egg/whoosh/filedb/filestore.py", line 483, in create_file
   fileobj = open(path, mode)
IOError: [Errno 13] Permission denied: '/tmp/MAIN.tmp/n05fcltsozfzct46fmzi882w1w8w.ctmp'

I'm not sure whether the problem is in my code or somewhere else, but the failure consistently occurs at the same location, and only on Linux (Python 2.5 or 2.6).

  1. pombredanne NA reporter

    Well, let's not get too excited about this. I will wait a few test cycles, but it sounds like a failing disk to me...

    However, just curious: @Matt Chaput, why don't you use tempfile for that temp storage? These are RamIndex instances, which are not file based, so I was surprised to see temp files being written, and I dug a bit. RamStorage uses a tempdir in whoosh.filedb.filestore.RamStorage.temp_storage, but FileStorage does not use a tempdir in whoosh.filedb.filestore.FileStorage.temp_storage.

    Yet both create random names rather than using tempfile directly... any reason?

  2. Matt Chaput repo owner

    The idea was to keep the temporary files inside the index dir so it's slightly more visible if a bug results in lots of temp files being left behind, instead of silently filling up $TEMP. (Now that I think about it, I should clear out that directory when a writer locks the index.)

    Anyway once you're putting named files in a known location, using tempfile doesn't give you anything over doing it yourself. Using random names avoids the weird locking thing tempfile does to avoid name conflicts.
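
To illustrate the "named files in a known location" approach, here is a minimal stdlib-only sketch. It is not Whoosh's actual code: `random_name`, `create_temp_file`, and their parameters are hypothetical names made up for this example, and the `MAIN.tmp` directory and `.ctmp` extension are only borrowed from the path in the traceback above.

```python
import os
import random
import string


def random_name(size=28):
    # Hypothetical helper: build a random lowercase/digit file name.
    chars = string.ascii_lowercase + string.digits
    return "".join(random.choice(chars) for _ in range(size))


def create_temp_file(base_dir, prefix="MAIN", ext=".ctmp"):
    # Keep temp files in a visible, predictable subdirectory of base_dir
    # instead of scattering them in the system temp location.
    tmp_dir = os.path.join(base_dir, prefix + ".tmp")
    os.makedirs(tmp_dir, exist_ok=True)
    path = os.path.join(tmp_dir, random_name() + ext)
    # "w+b" mirrors the mode shown in the traceback above.
    return path, open(path, "w+b")
```

With a long enough random name, collisions are unlikely, so nothing beyond `open()` is needed to pick a file name.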

    RamStorage uses on-disk temp files because the temp files are used in an external merge sort -- keeping them in memory would defeat the point. If you want to avoid using disk when indexing, increase the limitmb= keyword argument to writer() to allow it to use more memory. :)
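
For readers unfamiliar with the technique, the following is a rough sketch of an external merge sort that spills sorted runs to on-disk temp files. It is an illustration under assumptions, not Whoosh's implementation: `external_sort` and `max_in_memory` are hypothetical names, and `max_in_memory` is only loosely analogous to the role `limitmb=` plays (a larger in-memory budget means fewer spills to disk).

```python
import heapq
import os
import tempfile


def external_sort(items, max_in_memory=1000, tmp_dir=None):
    """Yield text items (without newlines) in sorted order using on-disk runs."""
    run_paths = []
    buffer = []

    def flush():
        # Sort the in-memory buffer and write it out as one sorted run.
        buffer.sort()
        fd, path = tempfile.mkstemp(suffix=".run", dir=tmp_dir)
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.writelines(item + "\n" for item in buffer)
        run_paths.append(path)
        del buffer[:]

    for item in items:
        buffer.append(item)
        if len(buffer) >= max_in_memory:
            flush()
    if buffer:
        flush()

    files = [open(p, encoding="utf-8") for p in run_paths]
    try:
        # Strip newlines before comparing so the merge order matches
        # the order used when each run was sorted in memory.
        runs = [(line.rstrip("\n") for line in f) for f in files]
        for item in heapq.merge(*runs):
            yield item
    finally:
        for f in files:
            f.close()
        for p in run_paths:
            os.remove(p)
```

For example, `list(external_sort(["d", "a", "c", "b", "e"], max_in_memory=2))` writes three small runs to disk and merges them back in sorted order. Holding the runs in memory instead would remove the memory saving that motivates the technique.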

