Serialization Format

There are a few benefits to understanding the serialization format used by the Bitsy graph database:

  • Inspect the text files to see if the vertex and edge properties set by the application are serialized correctly
  • Use text editors and text processing tools (like sed, awk, perl) to investigate the database contents
  • Fix corrupt data files. Ideally, you shouldn't have to do this: the Backup and Recovery section discusses how to make online/offline backups and recover the database after a failure.

All Bitsy files are text files encoded in UTF-8. Each file consists of zero or more records separated by a Unix line separator (\n). Each record has the following format:

<record type>=<record contents>#<checksum>

The record type is a single character and the checksum is a six-digit hexadecimal number. The rest of this page discusses the various record formats.
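The following Java sketch splits a record line into its three parts. It is an illustration, not Bitsy's own parser: the class name RecordLine is hypothetical, and the checksum is only checked for its documented shape (six hex digits) because the checksum algorithm is not described on this page. Since JSON property values may contain '#', the sketch assumes the checksum separator is the last '#' on the line.

```java
import java.util.Optional;

// Hypothetical helper: splits "<record type>=<record contents>#<checksum>".
// It does NOT verify the checksum; it only checks its documented shape.
final class RecordLine {
    record Parts(char type, String contents, String checksum) {}

    static Optional<Parts> split(String line) {
        // The record type is a single character followed by '='.
        if (line.length() < 3 || line.charAt(1) != '=') {
            return Optional.empty();
        }
        // JSON contents may contain '#', so use the LAST '#' as the separator.
        int hash = line.lastIndexOf('#');
        if (hash < 2) {
            return Optional.empty();
        }
        String checksum = line.substring(hash + 1);
        // The checksum is documented as a six-digit hexadecimal number.
        if (!checksum.matches("[0-9a-fA-F]{6}")) {
            return Optional.empty();
        }
        return Optional.of(new Parts(line.charAt(0), line.substring(2, hash), checksum));
    }
}
```

For example, splitting H=1#00ab12 yields type 'H', contents "1" and checksum "00ab12".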

Header

The first record of every file is a header record of the form:

H=<Log number>#<Checksum>

A header record appears only on the first line of a data file. Its purpose is to identify the order in which the logs should be loaded into memory. It also helps identify partially reorganized vertex/edge log files, which can occur if the system crashes in the middle of a reorganization performed by the VEReorg thread.

Note: The log number is important to the database's consistency. Do not delete files that contain only a header record.
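As a rough illustration of how header records determine load order, the Java sketch below sorts files by the log number in their first line. HeaderOrder is a hypothetical name, and the sketch assumes that log numbers are decimal integers and that lower numbers are loaded first; it does not verify checksums.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper: orders log files by the log number in their header.
// Assumes decimal log numbers, ascending load order, unverified checksums.
final class HeaderOrder {
    // Extracts <Log number> from "H=<Log number>#<Checksum>".
    static long logNumber(String headerLine) {
        int hash = headerLine.lastIndexOf('#');
        if (!headerLine.startsWith("H=") || hash < 2) {
            throw new IllegalArgumentException("not a header record: " + headerLine);
        }
        return Long.parseLong(headerLine.substring(2, hash));
    }

    // Maps file name -> first line of that file; returns the file names
    // sorted by ascending log number.
    static List<String> loadOrder(Map<String, String> firstLineByFile) {
        return firstLineByFile.entrySet().stream()
                .sorted(Comparator.comparingLong(e -> logNumber(e.getValue())))
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```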

Vertex

The vertex record captures a vertex that is inserted, modified or deleted. The record has the following format:

V={"id":"<ID>","v":<version>,"s":<state>,"p":<JSON-encoded map of properties>}#<checksum>

The version is an integer and the state is either M (modified) or D (deleted). A vertex record whose version number doesn't match the version of that vertex in the in-memory database is obsolete and is removed during reorganization.
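The obsolescence rule can be sketched in Java as a filter over already-parsed records. VertexRecord and the inMemoryVersions map are hypothetical stand-ins rather than Bitsy classes, and treating a vertex that is absent from memory as having no current version is an assumption.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the reorganization rule: keep a vertex record only
// if its version equals the version of that vertex currently in memory.
final class ReorgFilter {
    // Parsed form of a V record: "id", "v" (version) and "s" (M or D).
    record VertexRecord(String id, long version, char state) {}

    static List<VertexRecord> liveRecords(List<VertexRecord> records,
                                          Map<String, Long> inMemoryVersions) {
        return records.stream()
                .filter(r -> {
                    Long current = inMemoryVersions.get(r.id());
                    // Assumption: a vertex absent from memory (e.g. deleted)
                    // has no current version, so all its records are obsolete.
                    return current != null && current == r.version();
                })
                .toList();
    }
}
```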

Edge

The edge record captures an edge and is similar to the vertex record. It has the following format (wrapped here for readability; the actual record is a single line):

E={"id":"<edge ID>","v":<version>,"s":<M/D>,\
   "o":"<out vertex ID>","l":"<edge label>","i":"<in vertex ID>",\
   "p":<JSON-encoded map of properties>}#<checksum>

The fields "o", "l" and "i" hold the out-vertex ID, the edge label and the in-vertex ID (respectively).
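To make the single-line layout concrete, the Java sketch below assembles an edge record in the field order shown above. EdgeRecordWriter is hypothetical; the properties map is passed in already JSON-encoded, the checksum is supplied by the caller because its algorithm is not described here, and writing the state as a JSON string ("M" or "D") is an assumption. Escaping newlines inside string values is what keeps the record on one line.

```java
// Hypothetical sketch: builds "E={...}#<checksum>" in the documented field order.
// Assumptions: state is written as a JSON string; propsJson is already valid JSON.
final class EdgeRecordWriter {
    // Minimal JSON string encoder; control characters are escaped so the
    // record never spans more than one line.
    static String jsonString(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"' -> sb.append("\\\"");
                case '\\' -> sb.append("\\\\");
                case '\n' -> sb.append("\\n");
                case '\r' -> sb.append("\\r");
                case '\t' -> sb.append("\\t");
                default -> {
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
                }
            }
        }
        return sb.append('"').toString();
    }

    static String edgeRecord(String id, long version, char state, String outId,
                             String label, String inId, String propsJson, String checksum) {
        return "E={\"id\":" + jsonString(id)
                + ",\"v\":" + version
                + ",\"s\":" + jsonString(String.valueOf(state))
                + ",\"o\":" + jsonString(outId)
                + ",\"l\":" + jsonString(label)
                + ",\"i\":" + jsonString(inId)
                + ",\"p\":" + propsJson
                + "}#" + checksum;
    }
}
```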

Transaction

A transaction record marks the end of a transaction flushed to the log, i.e. a successful transaction commit. It is only present in the transaction log files, viz. txA.txt and txB.txt. The format of the record looks like this:

T=<long ID>#<checksum>

This record makes it possible to recover from crashes in which a batch of transactions is only partially written to the transaction log by the MemToTxLogWriter thread. The checksum detects the corrupt state left by a partial flush. To restore the database to a valid state, Bitsy removes all records after the last valid T record and does not load them into the in-memory database during startup.
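A minimal Java sketch of this recovery rule, assuming the checksum check is supplied as a predicate because its algorithm is not described on this page. TxLogRecovery is a hypothetical name. The sketch keeps the header line even when no T record survives, consistent with the note above that header-only files must not be deleted.

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch of transaction-log recovery: keep every line up to and
// including the last T record with a valid checksum; drop everything after it.
final class TxLogRecovery {
    static List<String> truncateAfterLastCommit(List<String> lines,
                                                Predicate<String> checksumOk) {
        // Always keep the header record on the first line, if present.
        int keep = (!lines.isEmpty() && lines.get(0).startsWith("H=")) ? 1 : 0;
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            if (line.startsWith("T=") && checksumOk.test(line)) {
                keep = i + 1; // the committed batch ends here
            }
        }
        // Lines after the last valid T record belong to a partially flushed
        // batch and are not loaded.
        return List.copyOf(lines.subList(0, keep));
    }
}
```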

Log

A log record captures the end of a flush from the transaction log to the vertex/edge log. It is only present in vertex and edge log files, viz. vA.txt, vB.txt, eA.txt and eB.txt. The format of this record looks like this:

L=<log counter>#<checksum>

The log counter here is that of the next transaction log to be flushed into this V/E log. The purpose of this record is to recover from crashes that occur when a transaction log is only partially flushed to a V/E log: Bitsy truncates all records that follow an L record if the L record's log counter matches the log number in the header record of txA.txt or txB.txt.
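The truncation rule can be sketched in Java as follows. VeLogRecovery is a hypothetical name, and txHeaderNumbers stands for the log numbers read from the header records of txA.txt and txB.txt. The sketch assumes checksums have already been verified and that, if several L records match, the last one applies.

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch of V/E log recovery. txHeaderNumbers holds the log
// numbers from the header records of txA.txt and txB.txt. Assumes checksums
// were already verified and that the last matching L record applies.
final class VeLogRecovery {
    // Extracts <log counter> from "L=<log counter>#<checksum>".
    static long counterOf(String lRecord) {
        int hash = lRecord.lastIndexOf('#');
        if (hash < 2) {
            throw new IllegalArgumentException("not an L record: " + lRecord);
        }
        return Long.parseLong(lRecord.substring(2, hash));
    }

    static List<String> truncate(List<String> lines, Set<Long> txHeaderNumbers) {
        int keep = lines.size(); // by default nothing is truncated
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            if (line.startsWith("L=") && txHeaderNumbers.contains(counterOf(line))) {
                keep = i + 1; // keep the matching L record itself
            }
        }
        // Records after the matching L record came from an interrupted flush.
        return List.copyOf(lines.subList(0, keep));
    }
}
```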