Serialization Format

Understanding the serialization format used by the Bitsy graph database lets you:

Inspect the text files to see if the vertex and edge properties set by the application are serialized correctly
Use text editors and text processing tools (like sed, awk, perl) to investigate the database contents
Fix corrupt data files: Ideally, you shouldn't have to do this. The Backup and Recovery section discusses how you can make online/offline backups and recover the database in case of failures.

All Bitsy files are text files encoded in UTF-8. Each file consists of zero or more records separated by a Unix line separator (\n). The format for a record is as follows:

<record type>=<record contents>#<checksum>

The record type is a single character and the checksum is a six-digit hexadecimal number. The rest of this page discusses the various record formats.
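As a rough illustration, the following Java sketch splits one line into its record type, contents and checksum. It only extracts the checksum; the algorithm Bitsy uses to compute it is not described on this page, so verification is left out. The class name RecordLine is hypothetical, and splitting at the last '#' is a choice made so that a '#' inside a JSON property value cannot confuse the split (the six-digit checksum itself never contains one). Records require Java 16 or later.

```java
import java.util.Optional;

// Hypothetical helper: splits "<record type>=<record contents>#<checksum>".
// The checksum is extracted but NOT verified (its algorithm is not described).
final class RecordLine {
    record Parsed(char type, String contents, String checksum) {}

    static Optional<Parsed> parse(String line) {
        // Smallest possible record: one type char, '=', '#', six hex digits.
        if (line.length() < 9 || line.charAt(1) != '=') {
            return Optional.empty();
        }
        // Search from the end: '#' may also occur inside JSON contents,
        // but never inside the six-digit checksum.
        int hash = line.lastIndexOf('#');
        if (hash < 2) {
            return Optional.empty();
        }
        String checksum = line.substring(hash + 1);
        if (!checksum.matches("[0-9a-fA-F]{6}")) {
            return Optional.empty();
        }
        return Optional.of(new Parsed(line.charAt(0), line.substring(2, hash), checksum));
    }
}
```

For example, parsing the line H=5#00ab12 (a made-up checksum) would yield type H, contents 5 and checksum 00ab12.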

Header

The first record of every file is a header record of the form:

H=<log number>#<checksum>

A header record appears only on the first line of a data file. It identifies the order in which the logs should be loaded into memory. It also helps identify partially reorganized vertex/edge log files, which can be left behind if the system crashes in the middle of a reorganization performed by the VEReorg thread.

Note: The log number is important to the database's consistency. Do not delete files that contain only a header record.
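The following Java sketch shows how the header line of each file could be used to order files for loading. HeaderOrder is a hypothetical name, and two details are assumptions rather than facts stated above: that the log number is a plain decimal integer, and that files load in ascending log-number order.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper: orders data files by the log number in their header.
// Assumes a decimal log number and ascending load order.
final class HeaderOrder {
    // Extracts the log number from a line such as "H=<log number>#<checksum>".
    static long logNumber(String headerLine) {
        int hash = headerLine.lastIndexOf('#');
        if (!headerLine.startsWith("H=") || hash < 2) {
            throw new IllegalArgumentException("not a header record: " + headerLine);
        }
        return Long.parseLong(headerLine.substring(2, hash));
    }

    // Maps each file name to its first line and returns the names sorted by log number.
    static List<String> loadOrder(Map<String, String> firstLineByFile) {
        return firstLineByFile.entrySet().stream()
                .sorted(Comparator.comparingLong(
                        (Map.Entry<String, String> e) -> logNumber(e.getValue())))
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```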

Vertex

The vertex record captures a vertex that is inserted, modified or deleted. The record has the following format:

V={"id":"<ID>","v":<version>,"s":<state>,"p":<JSON-encoded map of properties>}#<checksum>

The version is an integer, and the state is either M or D, for modified and deleted vertices respectively. A vertex record whose version number doesn't match the version of that vertex in the in-memory database is obsolete, and is removed when the log files are reorganized.
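The obsolescence rule can be illustrated with a small Java sketch. VertexRecord and the in-memory version map are hypothetical stand-ins rather than Bitsy classes. The sketch also assumes that a vertex absent from memory (for example, one that has been deleted) has no current version, so all of its records count as obsolete; how Bitsy actually treats D records during reorganization is not covered on this page.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch of the obsolescence rule for V records.
final class ObsoleteFilter {
    // Stand-in for the "id" and "v" fields of a V record (not a Bitsy class).
    record VertexRecord(String id, int version) {}

    // Keeps only records whose version equals the in-memory version of that vertex.
    // A vertex missing from inMemoryVersion (e.g. deleted) is treated as having
    // no current version, so all of its records are dropped (assumption).
    static List<VertexRecord> live(List<VertexRecord> records,
                                   Map<String, Integer> inMemoryVersion) {
        return records.stream()
                .filter(r -> {
                    Integer current = inMemoryVersion.get(r.id());
                    return current != null && current == r.version();
                })
                .collect(Collectors.toList());
    }
}
```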

Edge

The edge record captures an edge and is similar to the vertex record. It has the following format (written on a single line; the backslashes below only mark where the line is wrapped for display):

E={"id":"<edge ID>","v":<version>,"s":<M/D>,\
   "o":"<out vertex ID>","l":"<edge label>","i":"<in vertex ID>",\
   "p":<JSON-encoded map of properties>}#<checksum>

The fields "o", "l" and "i" refer to the outgoing vertex, the edge label and the incoming vertex, respectively.
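To make the layout concrete, here is a Java sketch that assembles an E record as one line. It is illustrative only: the checksum is passed in because its algorithm is not described on this page, the properties are assumed to be JSON-encoded already, and the state is written as a quoted JSON string ("M" or "D"), which is an assumption since the template leaves the quoting unspecified. EdgeLine and Edge are hypothetical names, and the escaping covers only quotes, backslashes and newlines.

```java
// Hypothetical sketch: builds the single-line text of an E record.
final class EdgeLine {
    // Stand-in for the fields of an E record (not a Bitsy class).
    record Edge(String id, int version, char state, String outId,
                String label, String inId, String propertiesJson) {}

    // Minimal JSON string quoting: escapes quotes, backslashes and newlines only,
    // so that a value can never split the record across two lines.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') {
                sb.append('\\').append(c);
            } else if (c == '\n') {
                sb.append("\\n");
            } else {
                sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    // Writes the fields in the order of the template; the checksum is supplied
    // by the caller and propertiesJson must already be valid JSON.
    static String toLine(Edge e, String checksum) {
        return "E={\"id\":" + quote(e.id())
                + ",\"v\":" + e.version()
                + ",\"s\":" + quote(String.valueOf(e.state()))
                + ",\"o\":" + quote(e.outId())
                + ",\"l\":" + quote(e.label())
                + ",\"i\":" + quote(e.inId())
                + ",\"p\":" + e.propertiesJson()
                + "}#" + checksum;
    }
}
```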

Transaction

A transaction record marks the end of a transaction flushed to the log, and thereby records a successful commit. It is only present in the transaction log files, viz. txA.txt and txB.txt. The record has the following format:

T=<long ID>#<checksum>

This record makes it possible to recover from crashes in which the MemToTxLogWriter thread wrote only part of a batch of transactions to the transaction log. The checksum lets Bitsy detect a corrupt record left behind by a partial flush. To restore the database to a valid state, Bitsy removes all records after the last valid T record and does not load them into the in-memory database during startup.
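A Java sketch of this truncation rule follows. The checksum check is supplied by the caller as a predicate because the algorithm is not described here, TxRecovery is a hypothetical name, and keeping the header line when no valid T record exists is an assumption.

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch of transaction-log recovery: keep every line up to and
// including the last T record whose checksum is valid, and drop the rest.
final class TxRecovery {
    static List<String> recover(List<String> lines, Predicate<String> isValid) {
        // The header on the first line is kept even if no T record is valid (assumption).
        int keep = (!lines.isEmpty() && lines.get(0).startsWith("H=")) ? 1 : 0;
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            if (line.startsWith("T=") && isValid.test(line)) {
                keep = i + 1; // everything up to this commit marker survives
            }
        }
        // Lines after the last valid T record belong to a partial flush.
        return List.copyOf(lines.subList(0, keep));
    }
}
```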

Log

A log record captures the end of a flush from the transaction log to the vertex/edge log. It is only present in vertex and edge log files, viz. vA.txt, vB.txt, eA.txt and eB.txt. The format of this record looks like this:

L=<log counter>#<checksum>

The log counter here is the log counter of the next transaction log to be flushed into this V/E log. The purpose of this record is to recover from crashes that occur when a transaction log is only partially flushed to a V/E log: if the log counter of an L record matches the log number in the header record of txA.txt or txB.txt, Bitsy truncates all records that follow that L record.
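The rule can be sketched in Java as follows. VeLogRecovery is a hypothetical name, and two details are assumptions about how the rule is applied: only the last L record in the file is examined, and counters are parsed as decimal integers. The caller passes in the log numbers read from the headers of txA.txt and txB.txt.

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch of V/E log recovery after a partial flush.
final class VeLogRecovery {
    // Extracts the counter from a line such as "L=<log counter>#<checksum>".
    static long counter(String line) {
        return Long.parseLong(line.substring(2, line.lastIndexOf('#')));
    }

    // If the last L record names a transaction log that still exists (its number
    // appears in txHeaderNumbers), drop every line after that L record.
    static List<String> recover(List<String> lines, Set<Long> txHeaderNumbers) {
        for (int i = lines.size() - 1; i >= 0; i--) {
            String line = lines.get(i);
            if (line.startsWith("L=")) {
                if (txHeaderNumbers.contains(counter(line))) {
                    return List.copyOf(lines.subList(0, i + 1));
                }
                break; // last L record does not match: nothing to truncate
            }
        }
        return List.copyOf(lines);
    }
}
```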
