Clone wiki

bitsy / DesignPrinciples

Design Principles

The speed and simplicity of Bitsy relies on three design principles: No Seek, No Socket, NoSQL.

No Seek

Bitsy's first principle is "No Seek." All updates to the database are written to the end of a transaction log file. Changes committed concurrently from different threads are written to the disk using file-append operations followed by a force operation to the disk. The rest of this section describes the reason behind the "No Seek" principle.

Seek time: The seek time of a hard disk measures the amount of time required for the read/write heads to move between tracks over the surfaces of the platters -- From PC Guide

Most databases persist information in tree-like structures, such a B-Trees. These data structures are stored on random-access storage devices, typically hard disk-drives. Each query/update on these data structures involves several seek operations on the disk as the query/update algorithm traverses the tree.

The average seek time is in the order of 5-15 milliseconds for typical hard disks, which is 1000x slower than in-memory operations. The bottleneck in the seek operation is the mechanical movement of the drive head. Once the head is moved to the right track, there is a rotational latency which is the time it takes for the spinning disk to get to the correct sector. This is between 2-4ms.

While caches can help speed up queries, the database will trigger one or more seeks per updated element to register updates to the B-Tree. Some databases write the updates to a log and then move them to a B-Tree at a later time. But in the long-run, an ordered on-disk structure (like the B-Tree) cannot perform an update to a large data structure faster than the seek time of the underlying random-access device. The seek-time bottleneck is traditionally addressed by (a) adding more disks to the disk array, or (b) moving to solid-state drives. Both methods are expensive and only improve the seek time to 0.1ms at best -- still 100x slower than RAM lookups.

Although hard-disk drives have high seek times, they are pretty fast at reading from and writing to a continuous block of data. The sequential data transfer rate for a standard hard-disk is around 100MB/sec. If a record is 1KB in size, you could write ~100K records in a second, which is still 2-3 orders of magnitude faster than 100K seek operations.

This is the reason behind the "No Seek" principle. A cheap off-the-shelf hard disk drive can keep up an in-memory database, as long as each updated element doesn't trigger a seek operation. There are a couple of caveats here:

  1. Though a commit operation in Bitsy will not result in a seek operation in the common case, the OS will fire seeks to reach different blocks of the file. Similary re-organization processes that access different files will require a seek to get to each file. The cost of these seek operations, however, can be amortized over the 1000s of records that are processed per seek operation.
  2. Rotational latency, which can be around 2-4ms, can slow down a commit operation from the perspective of a single thread. However, since concurrent commits from different threads are written at once, the rotational latency can be amortized over the threads that commit at the same time. When sufficient threads are operating concurrently, the rotational latency doesn't affect the overall throughput of the system. The write throughput in the benchmarks page demonstrates this effect.

No Socket

Bitsy's second principle is "No Socket." It is an embedded database and is meant to be integrated into an application. The application can be any of the following:

  • A standalone Java program
  • A web-application that implements all "three tiers"
  • A stateful server with a remote interface based on your favorite protocol (REST, SOAP, Thrift, Protocol buffer, RMI, Avro) exposed to a cluster of stateless web-applications.

Typically, databases are designed to be accessed by applications over sockets. This allows multiple applications to access the same database and makes it possible to restart the application without restarting the database.

The downside is that a sockets-based interface requires OS-level calls and serialization/de-serialization of requests and responses which reduces the overall throughput of the system. Every socket interaction at the application level corresponds to several interactions at the database level. While connection pooling addresses the cost of establishing a new connection to the database, the cost of transmission, serialization and de-serialization can not be avoided.

The speed-up gained by avoiding sockets is especially visible while querying Bitsy. Queries in Bitsy can take less than a micro-second on desktop PCs because serialization is not required. This is hard to achieve over sockets.

Socket-based interfaces also complicate the database's design. Since the interface is exposed to other servers, the database must now manage users accounts, and handle authentication and authorization. This is a layer of authentication and security that is different from the application-level authentication and security rules. With an embedded database like Bitsy, the application has sole control over the authentication and authorization rules.

No SQL

If you have come this far, you must have your own reasons to look for a "NoSQL" graph database. Bitsy was originally developed to support an application that talked to a different graph database using the Blueprints API and Gremlin. When the graph database implementation was not fast enough in a multi-threaded OLTP setting, the author of the application built Bitsy.

The biggest issue with SQL is that it is very easy (almost natural) to write poorly performing queries. Tuning a SQL database, on the other hand, is a non-trivial task. SQL was developed as a 4th-generation declarative language. The idea is that the programmer declares what he/she wants to query and system will take care of the rest. This works fine as long the tables are small (<1K rows). Once the tables become big, the "execution plan" for each query starts to matter. Soon indexes have be created, queries have to be re-written, profiling tools have to be launched, DBAs have to be hired, and so on. By the end of the process, any development time saved by the use of SQL is offset by the cost of tuning the queries and the database. The widespread adoption of "3GL" object-relational mapping tools like Hibernate, to hide SQL from the programmer, is a testament to the SQL's failure as a 4GL. Bitsy, on the other hand, has very few tuning parameters. When launched with the allowFullGraphScans option set of false, it guarantees that all operations are executed quickly.

The second issue with SQL is its complexity from the database implementor's as well as the application developer's point of view. SQL is a database implementor's nightmare. The SQL 2003 standard consists of 1300+ BNF rules. The vendors that got together to standardize SQL had their interests in mind (read: barrier to entry) rather than the interest of the SQL programmer. Furthermore, most application developers aren't as comfortable with SQL as they may think, because of its complex syntax, database-specific variants and varying programming styles. The Blueprints API was developed by one expert, Marko Rodriguez, not a consortium of software giants. In the author's opinion, the Property Graph Model is simpler and more dynamic than the relational model with SQL.

Final thoughts

The design principles, "No Seek", "No Socket" and "No SQL" keep Bitsy very simple and very fast. A couple of other recent developments are worth mentioning in the context of Bitsy's design:

  1. Journaled file-systems: Modern file systems support metadata journaling which ensures that files aren't (badly) corrupted in the middle of a write operation. While Bitsy's startup procedure accounts for incomplete writes, and has the logic to repair these problems, it doesn't have to deal with the hard problems of metadata corruption.
  2. Cheap RAM: The cost of RAM has come down drastically over the last few decades. Amazon EC2 offers high-memory instances with up to 68GB of memory. Off-the-shelf servers support upto 512GB RAM. This has enabled a paradigm shift in the database market.

Updated