ValueError: mmap length is too large

Issue #380 new
Anonymous created an issue

I'm trying to index the Wikipedia dataset (4.4 million files). I wrote a script to write and commit the files iteratively, but after 3.1 million files I got the error below. Even when I try to open the index I get the same error. Any help would be greatly appreciated.

Traceback (most recent call last):
  File "", line 33, in <module>
    searcher = ix.searcher();
  File "C:\Python27\lib\site-packages\whoosh\", line 318, in searcher
    return Searcher(self.reader(), fromindex=self, **kwargs)
  File "C:\Python27\lib\site-packages\whoosh\", line 548, in reader
    info.generation, reuse=reuse)
  File "C:\Python27\lib\site-packages\whoosh\", line 535, in _reader
    readers = [segreader(segment) for segment in segments]
  File "C:\Python27\lib\site-packages\whoosh\", line 524, in segreader
  File "C:\Python27\lib\site-packages\whoosh\", line 614, in __init__
    files = segment.open_compound_file(storage)
  File "C:\Python27\lib\site-packages\whoosh\codec\", line 530, in open_c
    return CompoundStorage(dbfile, use_mmap=storage.supports_mmap)
  File "C:\Python27\lib\site-packages\whoosh\filedb\", line 64, in __
    self._source = mmap.mmap(fileno, 0, access=mmap.ACCESS_READ)
ValueError: mmap length is too large

Comments (3)

  1. Matt Chaput repo owner

    I assume this is on a 32-bit Python?

    To tell Whoosh not to use mmap, try creating/opening the index like this:

    from whoosh.filedb.filestore import FileStorage
    storage = FileStorage(indexdir, supports_mmap=False)
    # To create a new index
    myindex = storage.create_index(myschema)
    # To open an existing index
    myindex = storage.open_index()

    I'll see if I can fix the code to automatically fall back to not using mmap on 32-bit machines when the file is too large.
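    The fallback described above could be sketched like this. On 32-bit Python, `mmap.mmap` raises `ValueError` (or `OverflowError`) when the file exceeds the address space, so one option is to attempt the mapping and fall back to plain file access on failure. This is a hypothetical helper, not Whoosh's actual implementation:

    ```python
    import mmap

    def open_source(fileobj, length=0):
        """Try to memory-map a file; fall back to the plain file object
        if the mapping is too large for this interpreter's address space.
        A sketch, not Whoosh's actual code."""
        try:
            # length=0 maps the whole file
            return mmap.mmap(fileobj.fileno(), length, access=mmap.ACCESS_READ)
        except (ValueError, OverflowError, OSError):
            # e.g. "ValueError: mmap length is too large" on 32-bit Python
            return fileobj
    ```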

