4gbyte file limit (due to use of 4 byte int for record indexes) on data storage

Issue #17 invalid
created an issue
Traceback (most recent call last):
  File "test_codernitydb.py", line 136, in <module>
    added = insert_iterator_rows(pci, 10000)
  File "test_codernitydb.py", line 119, in insert_iterator_rows
  File "/usr/local/lib/python2.7/dist-packages/CodernityDB-0.4.2-py2.7.egg/CodernityDB/database.py", line 893, in insert
    self._insert_indexes(_rev, data)
  File "/usr/local/lib/python2.7/dist-packages/CodernityDB-0.4.2-py2.7.egg/CodernityDB/database.py", line 723, in _insert_indexes
    _id = self._insert_id_index(_rev, data)
  File "/usr/local/lib/python2.7/dist-packages/CodernityDB-0.4.2-py2.7.egg/CodernityDB/database.py", line 716, in _insert_id_index
    self.id_ind.insert_with_storage(_id, _rev, value)
  File "/usr/local/lib/python2.7/dist-packages/CodernityDB-0.4.2-py2.7.egg/CodernityDB/hash_index.py", line 757, in insert_with_storage
    return self.insert(_id, _rev, start, size)
  File "/usr/local/lib/python2.7/dist-packages/CodernityDB-0.4.2-py2.7.egg/CodernityDB/hash_index.py", line 661, in insert
struct.error: 'I' format requires 0 <= number <= 4294967295

hi, this happens in the middle of a 2.2 million-record run, and the exception doesn't report which value caused it.

Comments (7)

  1. lkcl reporter

    ok i tried skipping the first 2.1 million records: no exception was reported. i'm now re-running the entire 2.2 million adds, but with a try: except around the offending line, and a repr (print out to file)... BUGGER! :) forgot to add "w" to the open :) let's just do that again....

    ('ff91a82e4c4f4ad291c5300e21613d73', '00012437', 4294967717L, 1937, 'o', 0)

                    f = open("/tmp/x.txt", "w")
                    f.write(repr((key, rev, start, size, status, _next)))

    ok so this is a bug. insert (hash_index.py) is receiving a Long. that's bad.
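
A minimal sketch reproducing the error outside the database, using only Python's struct module and the start value from the tuple above. The helper name `fits` is made up for illustration and is not part of CodernityDB.

```python
import struct

# Start offset taken from the repr printed above.
start = 4294967717
# Largest value the unsigned 4-byte 'I' format can hold.
U32_MAX = 2 ** 32 - 1


def fits(fmt, value):
    """Return True if value can be packed with the given struct format."""
    try:
        struct.pack(fmt, value)
        return True
    except struct.error:
        return False


print(fits('<I', start))  # 4-byte unsigned int
print(fits('<Q', start))  # 8-byte unsigned long long
print(start - U32_MAX)    # how far past the 4-byte limit the offset is
```

The offset only just crosses the 4-byte boundary, which would explain why the failure shows up so late in the run.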

  2. lkcl reporter

    rrright. tracked this down. the file being created is over 4gbytes in size.

    drwxr-xr-x 2 root root       4096 Apr 30 11:57 _indexes
    drwxr-xr-x 4 root root       4096 Apr 30 11:57 ..
    drwxr-xr-x 3 root root       4096 Apr 30 11:57 .
    -rw-r--r-- 1 root root 4294969654 Apr 30 11:59 id_stor
    -rw-r--r-- 1 root root  122675458 Apr 30 11:59 id_buck

    so yes, it is correct: an exception is correctly being raised. you need to use an 8-byte (long) type rather than a 4-byte int for offsets into the file.

  3. lkcl reporter

    clearly, this is a pretty major limitation on the usefulness of this database engine, which makes it a showstopper for any serious use.

  4. codernity repo owner


    Sorry for late reply (something bad happened to notifications...).

    @lkcl All you need to do in your example is change the I format to Q.

    We might not have explained it very clearly in the documentation, but we did write about it in the entry_line_format section:

    If you expect that your index might require more than 4294967295 bytes of space or metadata (that’s the max number for I format), change it to Q.

    That limitation applies only to the I format in Python's struct (and in C, obviously); changing it to any larger number format will be fine.

    Please note that the same applies to "storage". Please also note that for some sizes our "simple sharding" might help too (http://labs.codernity.com/codernitydb/database_indexes.html#sharding-in-indexes)

    There is no such limitation in the database itself; I is just the default setting, chosen to avoid overhead in most cases (I vs Q).
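
A rough sketch of the I vs Q trade-off, using struct only. The two format strings are assumptions inferred from the shape of the tuple in the first comment (key, rev, start, size, status, next); they are not copied from the CodernityDB source, so check the real entry_line_format of your index before changing anything.

```python
import struct

# ASSUMED formats, for illustration only: 32-byte key, 8-byte rev,
# start offset, size, 1-byte status, next pointer.
ASSUMED_I_FORMAT = '<32s8sIIcI'  # 4-byte start offset
ASSUMED_Q_FORMAT = '<32s8sQIcI'  # 8-byte start offset

# The record that triggered the exception in the first comment.
record = (b'ff91a82e4c4f4ad291c5300e21613d73', b'00012437',
          4294967717, 1937, b'o', 0)

# Per-entry size of each layout: the wider offset costs 4 extra bytes.
size_i = struct.calcsize(ASSUMED_I_FORMAT)
size_q = struct.calcsize(ASSUMED_Q_FORMAT)
print(size_i, size_q, size_q - size_i)

# With Q for the start offset the record packs and round-trips cleanly.
packed = struct.pack(ASSUMED_Q_FORMAT, *record)
assert struct.unpack(ASSUMED_Q_FORMAT, packed) == record
```

Under these assumptions the overhead is 4 bytes per index entry, which is the cost the I default avoids for stores that stay under the 4-byte limit.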
