Issue #322 wontfix

No sync operations in filedb

Bastian Blank
created an issue

The whoosh filedb support doesn't include any calls to fsync or fdatasync. So the result of the writes and the rename is undefined if a crash happens, and it usually results in an empty file.

To properly write data to disk, it needs to:

  • Call fdatasync before .close().
  • Call fsync on the directory after the rename, if the new file needs to survive a crash.
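
The two steps above can be sketched roughly as follows. This is only an illustration of the general write, sync, rename, sync-directory pattern on a POSIX system, not Whoosh's actual code: the helper name `durable_replace` is hypothetical, and opening a directory with `os.open` is assumed to work (it does not on Windows).

```python
import os
import tempfile


def durable_replace(path, data):
    """Hypothetical helper: replace `path` with `data` (bytes) durably."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory so the rename
    # stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()  # push Python's buffer down to the OS
            # Step 1: sync the file contents before close().
            # os.fdatasync is missing on some platforms, so fall back to fsync.
            sync = getattr(os, "fdatasync", os.fsync)
            sync(f.fileno())
        os.replace(tmp_path, path)  # atomic rename on POSIX
    except BaseException:
        # Clean up the temporary file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # Step 2: sync the directory so the rename itself reaches the disk.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Syncing the file before the rename is what prevents the "renamed but empty" outcome described above; the directory sync only matters if the new name must be guaranteed to exist after a crash.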

Comments (5)

  1. Matt Chaput repo owner

    A crash in the middle of writing will not affect the index... it's like an isolated transaction. (You will lose the data added to the writer so far, but that's a different issue than syncing). Any partial files will be cleaned up by the next writer.

    I don't really understand everything about fsync/fdatasync, but I don't see how calling them would change anything... the new segment only becomes "real" to new readers after the new TOC file is atomically renamed into place, before which all the other files are closed.

    The only thing that would cause problems is if the rename was synced to disk before the contents of the closed files. Is that even possible?

  2. Bastian Blank reporter

    There are no transactions on filesystems. While the rename may be atomic, the order of operations is not. write(), rename() can be (and usually will be) re-ordered to rename() <crash> write().

  3. Matt Chaput repo owner

    In that case fsync-ing wouldn't help anyway unless you did a "fullfsync" which AFAIK Python doesn't support. Do you have a citation that such a reordering is possible? I would have thought that would be handled properly at the filesystem level. Thanks!
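
On the "fullfsync" point, a possible sketch, offered with some caution: on macOS the full flush is requested through `fcntl` with the `F_FULLFSYNC` command, and Python's `fcntl` module appears to expose that constant where the platform defines it. The helper name `full_sync` is hypothetical, and the availability of the constant is treated as an assumption, which is why the code probes for it and otherwise falls back to a plain `os.fsync`.

```python
import fcntl  # Unix-only module
import os


def full_sync(fd):
    """Hypothetical helper: request the strongest flush available for fd."""
    # Assumption: F_FULLFSYNC is only present where the platform
    # defines it (macOS); probe for it rather than rely on it.
    op = getattr(fcntl, "F_FULLFSYNC", None)
    if op is not None:
        fcntl.fcntl(fd, op)
    else:
        os.fsync(fd)
```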

    (Interestingly as an aside, some experimental filesystems do have transactions, but that's not what I was talking about. :)
