These benchmarks are based on tests run on a low-end desktop PC ($600 HP Pavilion desktop p7-1287c running Windows 7 with a single 7200rpm hard disk). You can find these benchmarks in the unit test named BitsyGraphTest.
Note: Please model and execute performance tests based on your use-cases. These tests aren't meant to be performance guarantees.
The plot below shows the throughput of a test application that repeatedly commits a small transaction (1 vertex + 1 edge) from multiple threads. The code for this test is available in BitsyGraphTest.XtestMultiThreadedPerformance().
Bitsy's throughput increases as more threads are added, until the system performs 50K+ operations/second at ~750 concurrent threads. Throughput is lower when the number of concurrent threads is small, because the rotational latency of the disk slows down the force() operation on the file channel.
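The shape of such a throughput test can be sketched in plain Java. This is an illustrative harness only, not the actual BitsyGraphTest code; the commit is a stand-in Runnable, and all names here are made up for the example:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class ThroughputHarness {
    // Runs 'task' from 'threads' threads for 'millis' ms and returns total ops completed.
    // In the real test, 'task' would add 1 vertex + 1 edge and commit the transaction.
    public static long measure(int threads, long millis, Runnable task) throws InterruptedException {
        AtomicLong ops = new AtomicLong();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long deadline = System.currentTimeMillis() + millis;
        for (int i = 0; i < threads; i++) {
            pool.execute(() -> {
                while (System.currentTimeMillis() < deadline) {
                    task.run();            // stand-in for one small transaction
                    ops.incrementAndGet();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(millis + 5000, TimeUnit.MILLISECONDS);
        return ops.get();
    }

    public static void main(String[] args) throws InterruptedException {
        long total = measure(8, 1000, () -> { /* 1 vertex + 1 edge + commit */ });
        System.out.println("ops in 1s with 8 threads: " + total);
    }
}
```

Sweeping the thread count and plotting ops/second against it produces curves like the one above.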
By comparison, Neo4J 1.9.2 stays at 1600-1900 tps until 750 concurrent threads and hangs beyond that. The test uses an embedded version of the Neo4J Blueprints implementation (Neo4JGraph) that operates directly on the local disk. This is not meant to be a benchmark of Neo4J -- rather a demonstration of the performance benefit of the "No Seek" design principle.
Unlike Bitsy, Neo4J cannot batch the updates from multiple threads into a single write, because the updates go to different locations on the disk. The reason is that Neo4J's on-disk structure, like that of most other databases, is designed to efficiently locate vertices and edges that are not cached. However, cache misses in Neo4J are throttled by the seek time as well, resulting in a significant performance drop when the entire graph does not fit in memory.
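The batching idea can be illustrated with a small group-commit sketch. This is not Bitsy's actual implementation, only a demonstration of the technique: transaction threads hand their records to a single log-writer thread, which appends a whole batch and issues one force() that covers every transaction in it.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class GroupCommitLog implements AutoCloseable {
    private final FileChannel channel;
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final Thread writer;
    private volatile boolean closed = false;

    public GroupCommitLog(Path path) throws IOException {
        channel = FileChannel.open(path, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.APPEND);
        writer = new Thread(this::drainLoop);
        writer.start();
    }

    // Called by many transaction threads concurrently.
    public void append(String record) {
        queue.add(record);
    }

    private void drainLoop() {
        List<String> batch = new ArrayList<>();
        try {
            while (!closed || !queue.isEmpty()) {
                String first = queue.poll(50, TimeUnit.MILLISECONDS);
                if (first == null) continue;
                batch.clear();
                batch.add(first);
                queue.drainTo(batch);       // pick up everything queued meanwhile
                StringBuilder sb = new StringBuilder();
                for (String r : batch) sb.append(r).append('\n');
                channel.write(ByteBuffer.wrap(sb.toString().getBytes(StandardCharsets.UTF_8)));
                channel.force(false);       // ONE disk sync covers the whole batch
            }
        } catch (IOException | InterruptedException e) {
            throw new RuntimeException(e);
        }
    }

    @Override
    public void close() throws Exception {
        closed = true;
        writer.join();
        channel.close();
    }
}
```

Because all appends go to the end of one file, the cost of a force() is amortized over every transaction queued while the previous sync was in flight; a seek-based layout cannot coalesce writes this way.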
Bitsy is a fully in-memory database that caches the entire database in memory. It offers very good write and read performance for small- to medium-sized graphs in an OLTP setting. Depending on your use-case, Bitsy may be an attractive alternative to Neo4J and other graph databases.
Read throughput for Bitsy
The following plot shows the throughput of a test application that repeatedly traverses a portion of a Bitsy graph. The code for this test is available in BitsyGraphTest.XtestMultiThreadedPerformance().
Bitsy 1.5 performs a lot better than Bitsy 1.0 due to its (mostly) lock-free read algorithms. The 1.0 version used locks that triggered thread-switching and dropped to ~400K operations/second once the number of threads exceeded the number of cores. Bitsy 1.5 achieves read throughputs in excess of 10M operations/second when all four cores are busy. The read performance is also comparable to Neo4J, which has a well-deserved reputation for performing fast graph traversals when the graph fits in memory.
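One common way to achieve (mostly) lock-free reads is to publish an immutable snapshot through an AtomicReference: readers dereference the snapshot with no locking at all, while writers copy, modify, and atomically swap in a new version. The following is an illustration of that technique, not Bitsy's actual code:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

public class SnapshotAdjacency {
    // Readers always see a consistent, immutable adjacency map -- no locks, no
    // thread-switching, so read throughput scales with the number of cores.
    private final AtomicReference<Map<String, List<String>>> snapshot =
            new AtomicReference<>(Collections.emptyMap());

    // Lock-free read path.
    public List<String> neighbors(String v) {
        return snapshot.get().getOrDefault(v, Collections.emptyList());
    }

    // Writers serialize among themselves, then publish a new snapshot atomically.
    public synchronized void addEdge(String from, String to) {
        Map<String, List<String>> next = new HashMap<>(snapshot.get());
        List<String> adj = new ArrayList<>(next.getOrDefault(from, Collections.emptyList()));
        adj.add(to);
        next.put(from, Collections.unmodifiableList(adj));
        snapshot.set(Collections.unmodifiableMap(next));
    }
}
```

The trade-off is extra copying on the write path, which is why designs like this pair well with the write batching described earlier.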
The following plot shows the throughput of another test in which multiple threads traverse a bi-partite graph with 1000 vertices and an out-degree of 3. Again, the results are very close, showing that Bitsy can keep up with Neo4J's read speeds.
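The traversal workload in that test can be sketched as follows. This builds the graph as a plain adjacency array rather than through the Bitsy API, and the construction details (random targets, fixed seed) are illustrative assumptions:

```java
import java.util.Random;

public class BipartiteTraversal {
    // Builds a bi-partite graph: vertices 0..499 form layer A, 500..999 layer B.
    // Every vertex has exactly 3 outgoing edges to vertices in the other layer.
    static int[][] build(long seed) {
        Random rnd = new Random(seed);
        int[][] out = new int[1000][3];
        for (int v = 0; v < 1000; v++) {
            int base = (v < 500) ? 500 : 0;   // edges always cross to the other layer
            for (int j = 0; j < 3; j++) out[v][j] = base + rnd.nextInt(500);
        }
        return out;
    }

    // Counts the vertices reached at the end of every path of length 'depth'.
    static long traverse(int[][] out, int start, int depth) {
        if (depth == 0) return 1;
        long count = 0;
        for (int next : out[start]) count += traverse(out, next, depth - 1);
        return count;
    }

    public static void main(String[] args) {
        int[][] g = build(42);
        System.out.println(traverse(g, 0, 4));   // out-degree 3, depth 4: 3^4 = 81 paths
    }
}
```

Running traverse() in a loop from many threads, as in the write test, yields the operations/second figures plotted above.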
Sizing thread pools
In typical OLTP applications, a thread pool is dedicated to servicing clients, with each thread committing changes to the database. The above plot indicates that the maximum size of this thread pool when using Bitsy can be as high as 500-1000 (provided the system has enough resources for the application). The best size for the thread pool may be higher or lower depending on the application, the nature of transactions, and the hardware capabilities.
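In code, the pool size is simply the bound on a fixed executor. A minimal sketch, where the structure is illustrative and the pool size must be tuned against your own workload rather than copied from the plot:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class CommitPool {
    // Submits 'tasks' no-op commits to a fixed pool of 'poolSize' threads and
    // returns how many completed. 'poolSize' is the knob discussed above.
    static int runCommits(int poolSize, int tasks) throws InterruptedException {
        AtomicInteger done = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        for (int i = 0; i < tasks; i++) {
            pool.execute(done::incrementAndGet); // stand-in for open/mutate/commit
        }
        pool.shutdown();
        pool.awaitTermination(30, TimeUnit.SECONDS);
        return done.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runCommits(16, 1000) + " commits completed");
    }
}
```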
Note: The throughput will automatically scale up and down depending on the number of concurrent requests. However, when fewer active threads are committing transactions, the latency seen by the application will be lower.