Frequently Asked Questions
When should I use Bitsy?
The Bitsy graph database is a good fit for you if
- your data can be modeled as a property graph
- your data can fit in memory
- your application is multi-threaded
- your application needs ACID transactions and transparent concurrency control
- your application needs high read/write performance on small transactions
- you need the flexibility to move to a different graph database implementation in the future
What is different about Bitsy?
Bitsy is designed differently from disk-based and cluster-based graph databases. You can read more about Bitsy's design principles in this presentation. Because Bitsy doesn't perform disk seeks, it has a higher write throughput in a multi-threaded setting while offering ACID guarantees.
Bitsy's read throughput is also comparable to the fastest graph databases out there. This presentation discusses how Bitsy achieves cutting-edge read performance.
Can I move from/to a different graph database?
Yes. Bitsy is a strict implementation of the Blueprints API and supports the Tinkerpop software stack. The Blueprints API provides "a common set of interfaces to allow developers to plug-and-play their graph database backend."
To move your data across databases, you can write a simple program that reads all vertices and edges from your old graph database and stores them in the new one. In this manner, you can avoid getting locked into any single graph database implementation.
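A minimal sketch of such a copy program follows. To stay self-contained it uses small stand-in classes (`SimpleGraph`, `Node`, `Link`, all hypothetical) rather than the real Blueprints types; with Blueprints 2.x the same two loops would presumably go through `Graph.getVertices()`, `Graph.getEdges()`, `addVertex` and `addEdge`. The point it illustrates is the order of work: copy the vertices first while remembering which new vertex replaces each old id, then recreate the edges between the new vertices. It assumes the target database assigns its own ids.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

/** Hypothetical stand-ins for a property graph; NOT the Blueprints or Bitsy API. */
final class GraphCopySketch {
    static final class Node {
        final UUID id = UUID.randomUUID();               // assumption: each graph assigns its own ids
        final Map<String, Object> props = new HashMap<>();
    }

    static final class Link {
        final Node out, in;
        final String label;
        final Map<String, Object> props = new HashMap<>();
        Link(Node out, Node in, String label) { this.out = out; this.in = in; this.label = label; }
    }

    static final class SimpleGraph {
        final List<Node> nodes = new ArrayList<>();
        final List<Link> links = new ArrayList<>();
        Node addNode() { Node n = new Node(); nodes.add(n); return n; }
        Link addLink(Node out, Node in, String label) {
            Link l = new Link(out, in, label);
            links.add(l);
            return l;
        }
    }

    /** Copies every vertex and edge of {@code from} into {@code to}. */
    static void copy(SimpleGraph from, SimpleGraph to) {
        // Pass 1: vertices. Remember which new vertex stands for each old id.
        Map<UUID, Node> oldIdToNew = new HashMap<>();
        for (Node old : from.nodes) {
            Node fresh = to.addNode();
            fresh.props.putAll(old.props);
            oldIdToNew.put(old.id, fresh);
        }
        // Pass 2: edges, re-attached to the new vertices via the id map.
        for (Link old : from.links) {
            Link fresh = to.addLink(oldIdToNew.get(old.out.id), oldIdToNew.get(old.in.id), old.label);
            fresh.props.putAll(old.props);
        }
    }
}
```

For a large graph you would likely commit in batches rather than in one transaction; the sketch leaves transactions out entirely.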
How can I use Bitsy in my application?
Bitsy is an embedded database, which means that you directly invoke its methods through the standard Blueprints API from your application.
To embed Bitsy, your application must be Java-based, i.e., written in Java, Groovy, Scala, Clojure, JRuby, Jython, etc. Reads from your application will access Bitsy's in-memory data-structures, and writes will modify these structures and write the updates to files. Bitsy guarantees clean recovery on file-systems with metadata journaling.
Server-based applications that embed Bitsy can expose domain-specific APIs corresponding to the business logic layer. The server could even include the presentation layer for small to medium installations. You can configure a large thread pool (50-200) to get the best performance out of Bitsy.
A domain-specific API built on the current generation of servers and protocols (Tomcat/HTTP, Jetty/HTTP, Jersey/JAX-RS, Thrift, Avro, etc.) should be able to support thousands, if not tens of thousands, of transactions per second.
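The thread-pool suggestion above can be sketched with the standard `java.util.concurrent` classes. This is only an illustration: `WorkerPoolSketch`, `runAll` and the `transaction` parameter are hypothetical names, and in a real server the pool would usually be configured in the container (Tomcat, Jetty, etc.) rather than created by hand.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class WorkerPoolSketch {
    /**
     * Runs {@code requests} copies of {@code transaction} on a fixed-size pool
     * and returns how many of them finished without throwing.
     */
    static int runAll(int poolSize, int requests, Runnable transaction) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        AtomicInteger completed = new AtomicInteger();
        for (int i = 0; i < requests; i++) {
            pool.execute(() -> {
                transaction.run();          // placeholder for one small graph transaction
                completed.incrementAndGet();
            });
        }
        pool.shutdown();                    // accept no new work, let queued tasks finish
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                pool.shutdownNow();
            }
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
        }
        return completed.get();
    }
}
```

A call such as `WorkerPoolSketch.runAll(100, 1000, work)` would run 1000 small units of work on 100 threads, which falls inside the 50-200 range mentioned above.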
Is there an option to use Bitsy over the network?
You can use Rexster, a component of the Tinkerpop software stack, to do this. Rexster supports a binary protocol called RexPro. However, as with any graph database, this won't be as fast as the embedded mode, because each domain-level API call translates to multiple queries and updates sent to the graph database.
You can find instructions to use Rexster with Bitsy here. Rexster's Doghouse is a great way to view and debug the graph.
Can I use Bitsy in an eco-system with other NoSQL technologies?
Yes. All Bitsy identifiers are UUIDs that don't change over time. The toString() method on these identifiers gives you a String identifier that can be used to access the vertices/edges in the graph. You can maintain such references to Bitsy vertices/edges in external stores, such as search indexes and other databases.
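As a rough sketch of this pattern, the class below keeps String ids in a map that stands in for an external store such as a search index. Everything here is hypothetical (`ExternalReferenceSketch`, `index`, `lookup`), and the element id is taken as a plain `Object` whose `toString()` is stored. The String returned by `lookup` is what you would hand back to the graph to fetch the vertex or edge.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical stand-in for an external store that references graph elements by id. */
final class ExternalReferenceSketch {
    // term -> String form of the element id; a real system might use Lucene or another database
    private final Map<String, String> termToElementId = new HashMap<>();

    /** Records that {@code term} refers to the element with the given id. */
    void index(String term, Object elementId) {
        termToElementId.put(term, elementId.toString());
    }

    /** Returns the stored String id for {@code term}, if any. */
    Optional<String> lookup(String term) {
        return Optional.ofNullable(termToElementId.get(term));
    }
}
```

Because the ids don't change over time, the external store does not need to be updated when the element's properties change, only when the element is deleted.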
A couple of common use-cases involving complementary NoSQL technologies have been documented in the following pages:
- Large Objects: This page describes how large objects/blobs can be stored outside Bitsy while maintaining ACID guarantees.
- Fancy Indexes: This page describes a way to integrate Bitsy with Lucene or other search indexes.
Other uses are possible too. For example, you could use Bitsy to
- Maintain temporary user information such as 'shopping carts' which can later be committed to the main database
- Store the configuration of your application
- Keep track of application statistics that need to survive crashes and restarts
How do I get the best performance out of Bitsy?
Please refer to the page on Tuning Bitsy.
How can I monitor a Bitsy application?
Please refer to the page on Monitoring and Management.
Can I back up a Bitsy database?
Please refer to the page on Backup and Restore.
What is the BitsyRetryException? How do I deal with it?
Please refer to the page on Optimistic Concurrency. The retry mechanism is going to be a part of the Tinkerpop standard.
What is the BitsyException with error code ACCESS_OUTSIDE_TX_SCOPE? How do I deal with it?
"Element references created in a transactional context may not be accessed outside the transactional context", according to the Blueprints standard. The standard continues to say that "in cases where the element reference needs to be accessed outside its original transactional context, it should be re-instantiated based on the element id."
Since other graph databases seem to automatically reload vertices/edges, Bitsy offers a wrapper class named BitsyAutoReloadingGraph which can be used as follows:
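    import com.lambdazen.bitsy.BitsyGraph;
    import com.lambdazen.bitsy.wrapper.BitsyAutoReloadingGraph;
    ...
    BitsyGraph baseGraph = new BitsyGraph(...);
    // Wrap the base graph so that element references are reloaded automatically
    BitsyAutoReloadingGraph graph = new BitsyAutoReloadingGraph(baseGraph);
    // ... use graph in any number of threads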
Bitsy gives me an older version of vertices/edges. What is happening?
The most likely cause is that a transaction is already in progress at the time of the query. The default isolation level for Bitsy is REPEATABLE_READ, which ensures that you don't see different properties for the same element in the same transaction. So if you read vertex/edge elements but don't call commit or rollback, later reads in the same thread may keep returning that older information.
The best way to avoid these problems is to move to a transaction model such as the one described in the "Recommended method" section, which automatically does the commits and retries. The other option is to use the isTransactionActive() method available in BitsyGraph and BitsyAutoReloadingGraph to see if a transaction is in progress.
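The general shape of a commit-and-retry wrapper can be sketched as follows. This is not the code from the "Recommended method" section: `RetrySketch`, `withRetries` and `RetryableConflict` are hypothetical names, and `RetryableConflict` is a local stand-in for `BitsyRetryException` so that the example compiles without Bitsy on the classpath. The comments mark where commit and rollback calls would go in a real application.

```java
import java.util.function.Supplier;

final class RetrySketch {
    /** Local stand-in for BitsyRetryException (assumption: it is an unchecked exception). */
    static final class RetryableConflict extends RuntimeException {}

    /**
     * Runs {@code txBody} up to {@code maxAttempts} times, retrying only when it
     * fails with a retryable conflict. Other exceptions propagate immediately.
     */
    static <T> T withRetries(int maxAttempts, Supplier<T> txBody) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RetryableConflict last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                T result = txBody.get();
                // real code: commit here, so the thread never leaves a transaction open
                return result;
            } catch (RetryableConflict e) {
                // real code: roll back here before trying again
                last = e;
            }
        }
        throw last; // every attempt hit a conflict
    }
}
```

Because each attempt ends in either a commit or a rollback, a thread that goes through such a wrapper starts every unit of work in a fresh transaction and does not keep reading an old snapshot.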
How does the open-source license work?
Bitsy is a dual-licensed product. The open-source license is AGPLv3. If you release your product with Bitsy under this license, you must license your product as AGPLv3 as well. Otherwise, you must purchase a commercial license.
Why isn't Bitsy Apache-licensed like other LambdaZen projects?
A project like Bitsy needs to be maintained to stay fast, bug-free and on the cutting edge. The original reason for developing Bitsy was to have a fast database for a semantic Wiki built by LambdaZen. The first release of Bitsy solved this need.
But different users have different needs. The commercial license helps drive improvements and innovation past the first release, by keeping the author motivated. For example, one of the key motivators for lock-free reads in Bitsy 1.5 was customer interest in read-only use-cases with deep graph traversals.
In the future, based on the revenues generated, LambdaZen may hold design and implementation contests to further improve Bitsy.
How does the commercial license work?
Bitsy licenses are tied to the integrated product or service, not to the number of instances or processors used. As long as the license is valid, the product can be hosted on any number of machines and redistributed to any number of your customers. Similarly, a service that uses Bitsy, like a web-site, can be hosted in any number of domains and exposed to any number of users and customers.
The current license works for any number of integrated products or services that your company develops. You can embed Bitsy in your products as long as you have a valid annual subscription or a perpetual license.
Please refer to the commercial web-site for the latest prices and contact information. For most organizations, a perpetual Bitsy license costs less than a couple of weeks of developer time.