
Large Objects

When a large object is read or written, the cost of seeking to it is amortized over many bytes, so Bitsy's "No seek" design principle is not particularly useful for large objects. Bitsy's strength lies in quickly querying and updating graphs with many small vertices and edges in a transactional setting.

There are better technologies to store large objects, such as:

  • A plain old file system: The java.io and java.nio packages are efficient at storing, loading and streaming large files. You can use an ID- or date-based naming scheme for the files.
  • Distributed file systems like HDFS
  • Key-value stores like Apache Cassandra
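As a rough sketch of the first option, the Java class below stores large objects as plain files using java.nio, naming each file by an owner ID, a date and a random UUID. The class name `FileObjectStore` and the directory layout are assumptions for illustration; they are not part of Bitsy.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.LocalDate;
import java.util.UUID;

/** Hypothetical helper that stores large objects as plain files under a root directory. */
final class FileObjectStore {
    private final Path root;

    FileObjectStore(Path root) {
        this.root = root;
    }

    /** Builds an ID- and date-based path such as root/2024/05/07/<ownerId>/<uuid><extension>. */
    Path newPath(String ownerId, LocalDate date, String extension) {
        return root.resolve(String.valueOf(date.getYear()))
                .resolve(String.format("%02d", date.getMonthValue()))
                .resolve(String.format("%02d", date.getDayOfMonth()))
                .resolve(ownerId)
                .resolve(UUID.randomUUID() + extension);
    }

    /** Streams the content into a new file; Files.copy fails if the target already exists. */
    Path save(String ownerId, InputStream content, String extension) {
        Path target = newPath(ownerId, LocalDate.now(), extension);
        try {
            Files.createDirectories(target.getParent());
            Files.copy(content, target);
            return target;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Opens a stream over a previously saved object; the caller must close it. */
    InputStream open(Path path) {
        try {
            return Files.newInputStream(path);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Streaming through `InputStream` keeps memory use flat regardless of the object's size, which is the main advantage over holding the bytes in a vertex property.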

By moving large objects to an external store, you can conserve the memory used by Bitsy and reduce the time taken during startup. The rest of this section discusses how you can combine Bitsy with an external store for large objects while maintaining the ACID guarantees.

Ensuring ACID with large objects

You can place references to the large objects in vertex or edge properties. A reference could be a URI pointing to the file or the object's key in a key-value store.
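A minimal sketch of turning a file path into a string property and back, assuming the reference is stored as a `file:` URI. `LargeObjectRef` and its methods are hypothetical helpers, not Bitsy API; a key-value store key could simply be stored as a plain string instead.

```java
import java.net.URI;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helpers for storing a large-object reference in a String property. */
final class LargeObjectRef {
    private LargeObjectRef() {}

    /** Converts a file path into an absolute file: URI string suitable for a vertex property. */
    static String toProperty(Path file) {
        return file.toAbsolutePath().toUri().toString();
    }

    /** Resolves a property value written by toProperty back into a file path. */
    static Path fromProperty(String value) {
        return Paths.get(URI.create(value));
    }
}
```

Storing an absolute URI rather than a relative path keeps the reference meaningful even if the application's working directory changes.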

When multiple transactions operate on vertices and edges, you can follow these rules to preserve the ACID guarantees:

  1. Save all large objects before committing the vertices/edges that reference them.
  2. Delete a large object only after the vertices/edges that reference it have been deleted and committed.
  3. Do not modify a large object once it has been created.
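The ordering in rules 1 and 2 can be sketched with plain callbacks. The class and method names below are hypothetical, and the callbacks stand in for whatever code saves or deletes the object and commits the Bitsy transaction.

```java
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical sketch of rules 1 and 2 as ordering constraints between two steps. */
final class LargeObjectRules {
    private LargeObjectRules() {}

    /** Rule 1: the large object is fully saved before the referencing vertex is committed. */
    static String saveThenCommit(Supplier<String> saveObject, Consumer<String> commitReference) {
        String ref = saveObject.get();   // if this throws, nothing references the object yet
        commitReference.accept(ref);     // only now does committed data point at it
        return ref;
    }

    /** Rule 2: the large object is deleted only after the reference removal is committed. */
    static void commitThenDelete(String ref, Runnable commitRemoval, Consumer<String> deleteObject) {
        commitRemoval.run();             // the vertex/edge deletion is now durable
        deleteObject.accept(ref);        // a crash before this leaves only an orphaned object
    }
}
```

If `saveObject` throws, `commitReference` never runs, so no committed vertex can reference a missing object.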

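The last rule calls for immutable large objects: to change an object, write a new one and update the reference to it. A crash while the new object is being written then leaves at most an unreferenced partial file, while committed vertices and edges still point to intact objects, so the database is never corrupted.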
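One way to enforce write-once files with java.nio is the `CREATE_NEW` open option, which makes the write fail if the file already exists rather than overwriting it. This is a sketch; `WriteOnceFiles` is a hypothetical helper name.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical sketch of write-once storage: a large object's file is never rewritten. */
final class WriteOnceFiles {
    private WriteOnceFiles() {}

    /**
     * Writes bytes to a brand-new file. CREATE_NEW makes the call fail with
     * FileAlreadyExistsException instead of overwriting an existing object.
     */
    static void createOnce(Path target, byte[] content) throws IOException {
        Files.createDirectories(target.getParent());
        Files.write(target, content, StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE);
    }
}
```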

Example

Here is a quick example. Consider an application that maintains one vertex per "person" in Bitsy and stores each person's image in a file system. Using the approach above, you could model the CRUD operations as follows:

Creating the Person for the first time:

  1. Save the image in a file, say /resource/<user-id>/<random-uuid>.jpg. Inserting a random UUID into the path ensures that the file won't conflict with an older image.
  2. Create the person vertex with an 'imageURI' property that references the file
  3. Commit the changes to Bitsy
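The three creation steps might look like the sketch below. To keep it runnable without Bitsy on the classpath, the nested `Graph` class is a hypothetical in-memory stand-in that only makes writes visible on `commit()`; with Bitsy you would use its own vertex and commit calls instead.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

final class CreatePerson {
    private CreatePerson() {}

    /** Hypothetical stand-in for a Bitsy transaction: new vertices become visible on commit(). */
    static final class Graph {
        final Map<String, Map<String, Object>> committed = new HashMap<>();
        private final Map<String, Map<String, Object>> pending = new HashMap<>();

        void addVertex(String id, Map<String, Object> properties) {
            pending.put(id, new HashMap<>(properties));
        }

        void commit() {
            committed.putAll(pending);
            pending.clear();
        }
    }

    /** Saves the image, adds the person vertex with an 'imageURI' property, then commits. */
    static String create(Graph graph, Path resourceRoot, String userId, byte[] jpeg) throws IOException {
        // 1. Save the image under <root>/<user-id>/<random-uuid>.jpg; the UUID avoids clashes.
        Path image = resourceRoot.resolve(userId).resolve(UUID.randomUUID() + ".jpg");
        Files.createDirectories(image.getParent());
        Files.write(image, jpeg, StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE);

        // 2. The vertex holds only a reference to the file, not the image bytes.
        String imageUri = image.toUri().toString();
        Map<String, Object> properties = new HashMap<>();
        properties.put("imageURI", imageUri);
        graph.addVertex(userId, properties);

        // 3. Commit last, so a committed vertex never points at a missing file.
        graph.commit();
        return imageUri;
    }
}
```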

Reading the Person's details:

  1. Read the person vertex
  2. If the image is needed, use the 'imageURI' property to look up the image.
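A sketch of the read path, where a plain `Map` of committed vertices is a hypothetical stand-in for Bitsy. The image file is only opened when `loadImage` is called, so reading a person's other properties never touches the large object.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Map;
import java.util.Optional;

final class ReadPerson {
    private ReadPerson() {}

    /** 1. Reads the person's properties; the image bytes are not loaded here. */
    static Optional<Map<String, Object>> readVertex(Map<String, Map<String, Object>> graph, String userId) {
        return Optional.ofNullable(graph.get(userId));
    }

    /** 2. Only when the image is needed, follows the 'imageURI' property to the file. */
    static byte[] loadImage(Map<String, Object> person) throws IOException {
        String uri = (String) person.get("imageURI");
        return Files.readAllBytes(Paths.get(URI.create(uri)));
    }
}
```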

Updating the Person's image:

  1. Save the new image in a file with a new random UUID
  2. Set the 'imageURI' property of the person vertex to the new file
  3. Commit the changes to Bitsy

A crash before the commit will leave the person vertex in its last committed state, i.e., pointing to the old image file. After the commit, the old image file is no longer referenced and can be deleted, as in rule 2.
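The update steps are sketched below against a hypothetical in-memory `Graph` stand-in for Bitsy, where property changes become visible only on `commit()`. Deleting the old file after the commit is an optional cleanup step that follows rule 2; it is an assumption of this sketch rather than one of the listed steps.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

final class UpdatePerson {
    private UpdatePerson() {}

    /** Hypothetical stand-in for Bitsy: property changes become visible only on commit(). */
    static final class Graph {
        final Map<String, Map<String, Object>> committed = new HashMap<>();
        private final Map<String, Map<String, Object>> pending = new HashMap<>();

        Object getProperty(String id, String key) {
            return committed.get(id).get(key);
        }

        void setProperty(String id, String key, Object value) {
            // Copy the committed properties so the change stays private until commit().
            pending.computeIfAbsent(id, k -> new HashMap<>(committed.get(k))).put(key, value);
        }

        void commit() {
            committed.putAll(pending);
            pending.clear();
        }
    }

    static String updateImage(Graph graph, Path resourceRoot, String userId, byte[] jpeg) throws IOException {
        String oldUri = (String) graph.getProperty(userId, "imageURI");

        // 1. Write the new image under a fresh UUID; the old file is never modified.
        Path image = resourceRoot.resolve(userId).resolve(UUID.randomUUID() + ".jpg");
        Files.createDirectories(image.getParent());
        Files.write(image, jpeg, StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE);

        // 2. Point the vertex at the new file (not yet committed).
        String newUri = image.toUri().toString();
        graph.setProperty(userId, "imageURI", newUri);

        // 3. Commit. A crash before this line leaves the vertex pointing at the old image.
        graph.commit();

        // Optional cleanup (rule 2): committed data no longer references the old image.
        Files.deleteIfExists(Paths.get(URI.create(oldUri)));
        return newUri;
    }
}
```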

Removing a Person:

  1. Load the person vertex
  2. Keep track of the imageURI in a local variable
  3. Remove the vertex
  4. Commit the changes to Bitsy
  5. Remove the file

A crash in the middle of this process will still leave the database in a valid state. In the worst case, an orphaned image file remains on the file system.
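A sketch of the removal steps, again with a hypothetical in-memory `Graph` stand-in in place of Bitsy, where removals take effect only on `commit()`. The file is deleted only after the removal is committed, so a crash at any point leaves, at worst, an orphaned image file.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class RemovePerson {
    private RemovePerson() {}

    /** Hypothetical stand-in for Bitsy: vertex removals take effect only on commit(). */
    static final class Graph {
        final Map<String, Map<String, Object>> committed = new HashMap<>();
        private final Set<String> pendingRemovals = new HashSet<>();

        Map<String, Object> getVertex(String id) {
            return committed.get(id);
        }

        void removeVertex(String id) {
            pendingRemovals.add(id);
        }

        void commit() {
            committed.keySet().removeAll(pendingRemovals);
            pendingRemovals.clear();
        }
    }

    static void remove(Graph graph, String userId) throws IOException {
        // 1-2. Load the vertex and keep its image reference in a local variable.
        String imageUri = (String) graph.getVertex(userId).get("imageURI");

        // 3-4. Remove the vertex and commit; afterwards nothing references the image.
        graph.removeVertex(userId);
        graph.commit();

        // 5. Delete the file last. A crash before this line leaves only an orphaned file.
        Files.deleteIfExists(Paths.get(URI.create(imageUri)));
    }
}
```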
