Ocelot: A Hardware-Oblivious Database Engine
Welcome to Ocelot, the hardware-oblivious extension for the in-memory column-store MonetDB.
Ocelot uses OpenCL to provide hardware-oblivious replacements for MonetDB's relational database operators. The principal goal is a database engine that runs essentially unchanged on any device that supports OpenCL. We have tested Ocelot on a variety of devices, including CPUs from AMD and Intel, GPUs from AMD and Nvidia, and the Xeon Phi accelerator cards from Intel. For more information on the design principles behind Ocelot, see our VLDB 2013 publication Hardware-Oblivious Data Processing For In-Memory Column Stores.
All Ocelot-related source code is located in the repository folder
Supported Operations & Limitations
Ocelot currently supports SQL queries using the major relational operations (selection, projection, grouping, aggregation, join, sorting) on four-byte floating point (REAL) and datatypes. This also includes operations on DATE columns, and equality comparisons on VARCHAR columns.
However, please note that Ocelot is primarily a research prototype, and there are several limitations because of this. In particular:
- We don't support non-trivial string operations (inequality comparison, sorting, like) on string (CHAR / VARCHAR) columns.
- A number of operators (multi-column sorting, top-k) are not implemented, which limits the set of queries Ocelot can run.
- You will probably run into several glitches :) Even though we tried to make everything as robust as possible, there are very likely many hidden bugs. If you run into any problems, feel free to contact us!
Checking out the Repository
This repository contains a full clone of MonetDB's Mercurial repository with the Ocelot extension included. Therefore, the full database can be built from this repository. After checking out the source tree, make sure to switch to the simple_mem_manager branch via:
hg update -r simple_mem_manager
In order to build Ocelot, you will need:
- A POSIX-compatible operating system. Sadly, we don't support Windows :(
- A working gcc (or clang) build environment
- The following UNIX tools:
  - autotools-dev & automake
  - flex & bison
- Development versions of the following libraries:
  - libssl (OpenSSL)
- An OpenCL development SDK.
Linux Build Instructions
After installing the dependencies, you can configure and build MonetDB with Ocelot via the following commands (from the root directory):
./bootstrap
./configure --prefix=/path/to/install/folder \
    --enable-oid32 --disable-int128 \
    --enable-optimize --disable-testing --disable-developer --disable-debug \
    --with-opencl=/path/to/opencl/sdk/root
make && make install
We also recommend adding the MonetDB binary folder
$PREFIX/bin to your system's PATH.
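A minimal sketch of one way to do this without creating duplicate PATH entries. It assumes PREFIX holds the same directory that was passed to --prefix; the helper name prepend_path is hypothetical and not part of MonetDB.

```shell
# Hypothetical helper: prepend a directory to PATH unless it is already listed.
prepend_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                 # already present, nothing to do
        *) PATH="$1:$PATH" ;;
    esac
    export PATH
}

# PREFIX is assumed to be the directory passed to ./configure --prefix=...
PREFIX="${PREFIX:-/path/to/install/folder}"
prepend_path "$PREFIX/bin"
```

Placing these lines in your shell startup file (for example ~/.profile) would make the setting persistent.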
In case the configure script aborts because it cannot find the OpenCL header files, you can specify the path to the OpenCL header files in the environment variable C_INCLUDE_PATH. For example, if the SDK include directory is /opt/AMDAPP/include, you have to run
export C_INCLUDE_PATH=/opt/AMDAPP/include before starting the configure script.
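If you are unsure where the headers are installed, the following sketch looks for CL/cl.h in a few candidate directories and exports C_INCLUDE_PATH for the first match. The helper name find_opencl_include and the candidate directories are assumptions for illustration; substitute the include directory of your own SDK.

```shell
# Hypothetical helper: print the first candidate directory that contains CL/cl.h.
find_opencl_include() {
    for dir in "$@"; do
        if [ -f "$dir/CL/cl.h" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}

# The candidate directories below are examples only; adjust them to your SDK.
if inc=$(find_opencl_include /opt/AMDAPP/include /usr/include /usr/local/include); then
    # Keep any existing C_INCLUDE_PATH entries after the new one.
    export C_INCLUDE_PATH="$inc${C_INCLUDE_PATH:+:$C_INCLUDE_PATH}"
    echo "Using OpenCL headers from $inc"
else
    echo "CL/cl.h not found in the candidate directories" >&2
fi
```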
Note: You might see the following error:
...include/CL/clplatform.h:95:11: error: "WIN32" is not defined [-Werror=undef]
 #elif WIN32
If this happens, modify the line
#elif WIN32 in your OpenCL SDK's clplatform.h to
OS X Build Instructions
The following instructions have been tested on OS X 10.11.2, with dependencies installed via Homebrew:
brew install libtool automake gettext readline openssl libatomic_ops
Create a local ruby gem dir:
mkdir -p $HOME/.gem/ruby/$(ruby --version | sed -e 's/^ruby \([^p]*\)p.*$/\1/')
LIBTOOLIZE=glibtoolize M4DIRS="/usr/local/opt/gettext/share/aclocal" ./bootstrap
LDFLAGS="-L/usr/local/opt/readline/lib -L/usr/local/opt/libxml2/lib -L/usr/local/opt/openssl/lib" \
CPPFLAGS="-I/usr/local/opt/libxml2/include/libxml2 -I/usr/local/opt/readline/include -I/usr/local/opt/openssl/include" \
./configure --prefix=/path/to/install/folder \
    --enable-oid32 --disable-int128 \
    --disable-testing --disable-developer --enable-debug --disable-assert \
    --without-hype --with-readline=/usr/local/opt/readline \
    --with-rubygem-dir=$HOME/.gem/ruby/$(ruby --version | sed -e 's/^ruby \([^p]*\)p.*$/\1/')/
Setting up Ocelot
OpenCL Runtime Driver
In order to use Ocelot, working OpenCL runtime drivers must be registered in the system for all devices. A good tutorial for setting up OpenCL on Linux can be found here: http://wiki.tiker.net/OpenCLHowTo. On OS X systems, OpenCL should already be working out of the box.
Note: On AMD Graphics Cards, it is recommended to set the following environment variables to ensure that Ocelot can use the full device memory:
export GPU_MAX_ALLOC_PERCENT=100
export GPU_FORCE_64BIT_PTR=1
Setting up MonetDB
Before starting MonetDB, you will need to initialize a MonetDB DBFARM (folder where database data is kept). To do this, issue:
monetdbd create /path/to/dbfarm/folder
We also suggest creating a file
.monetdb in your home folder with the following content to prevent having to re-enter the password for every new connection:
Afterwards, you can start the MonetDB merovingian daemon via:
monetdbd start /path/to/dbfarm/folder
Creating a database
After starting the merovingian daemon, you can create a new database via:
monetdb create dbname
monetdb release dbname
You can then connect to the database and issue SQL commands via:
mclient -lsql -ddbname
After the database has been loaded, we strongly recommend setting it to read-only:
monetdb stop dbname
monetdb set readonly=true dbname
Loading our modified TPC-H database
As discussed in our VLDB publication, we have prepared a modified TPC-H benchmark suite that only uses operations that are supported by Ocelot. In order to use this modified benchmark, you first have to download and build the TPC-H dbgen utility. After building dbgen, you can prepare the raw TPC-H data files via:
cd $DBGEN_DIR
./dbgen -s scale_factor
Here, scale_factor denotes the size of the generated TPC-H instance in GB (i.e., 1 will create a 1GB instance, 0.1 a 0.1GB instance, etc.). This will create a set of .tbl files, containing the raw data for all TPC-H tables. In order to create a new database from these files, you have to move to the following directory in the source tree:
monetdb5/extras/ocelot/benchmarks/tpch/load and issue the following commands:
python prepare.py --tpch_dir=$DBGEN_DIR
monetdb create tpch
monetdb release tpch
mclient -lsql -dtpch tpch_load.sql
monetdb stop tpch
monetdb set readonly=true tpch
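Before running the load commands, it can help to confirm that dbgen produced all of its output. dbgen normally writes one .tbl file for each of the eight TPC-H tables; the sketch below lists any that are missing. The helper name missing_tbl_files is hypothetical, and whether prepare.py requires every file is an assumption we have not verified here.

```shell
# Hypothetical helper: list the TPC-H .tbl files that are missing from a directory.
missing_tbl_files() {
    for table in customer lineitem nation orders part partsupp region supplier; do
        [ -f "$1/$table.tbl" ] || printf '%s.tbl\n' "$table"
    done
}

# DBGEN_DIR is assumed to be the directory in which dbgen was run.
missing=$(missing_tbl_files "${DBGEN_DIR:-.}")
if [ -n "$missing" ]; then
    echo "Missing TPC-H data files:" >&2
    echo "$missing" >&2
fi
```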
Our modified TPC-H query files can be found under
Before running queries on Ocelot, you have to initialize the Ocelot runtime once. Internally, this sets up the OpenCL context, prepares the internal data structures and compiles the OpenCL kernels for all detected devices. To initialize Ocelot, issue the following command:
mclient -lmal -ddatabase
mal> ocelot.init();
Afterwards, you can see a list of discovered devices via
ocelot.listDevices(). For instance:
mal> ocelot.listDevices();
# Device 1: NVIDIA Corporation - Tesla K40m
# Device 2: Intel(R) Corporation - Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz (default device)
# Device 3: Intel(R) Corporation - Intel(R) Many Integrated Core Acceleration Card
If this list is empty, Ocelot was unable to detect any OpenCL devices. Please ensure that your system has a correctly configured OpenCL runtime and a supported device. By default, Ocelot will select the CPU as its default device, meaning the device that all operators will run on. The default device can be manually set via
mal> ocelot.setDefaultDevice(1);
mal> ocelot.listDevices();
# Device 1: NVIDIA Corporation - Tesla K40m (default device)
# Device 2: Intel(R) Corporation - Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz
# Device 3: Intel(R) Corporation - Intel(R) Many Integrated Core Acceleration Card
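When scripting experiments, it can be useful to record which device was the default. The sketch below extracts the device number from a saved device listing, assuming the output has exactly the format shown in the examples above. The helper name default_device is hypothetical and not part of Ocelot.

```shell
# Hypothetical helper: read ocelot.listDevices() output on stdin and print the
# number of the device marked "(default device)".
default_device() {
    sed -n 's/^# Device \([0-9][0-9]*\): .*(default device)[[:space:]]*$/\1/p'
}

# Example input in the listing format shown above.
default_device <<'EOF'
# Device 1: NVIDIA Corporation - Tesla K40m (default device)
# Device 2: Intel(R) Corporation - Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz
EOF
```

If no line carries the "(default device)" marker, the helper prints nothing.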
You can display more detailed information about these devices (available memory, core frequency, supported OpenCL version, etc.) via
After finishing your experiments, you can issue
ocelot.release(); to release all resources held by Ocelot.
In order to run SQL queries using Ocelot from the MonetDB SQL interface, you have to enable the Ocelot query rewriter. This rewriter automatically replaces supported MonetDB operators with the corresponding ones from Ocelot.
To select the Ocelot rewriter for the current session, open the MonetDB SQL interface and run the following SQL command:
From now on, all SQL queries in the current session will run on Ocelot. You can take a look at the rewritten query plan via:
If you want to switch back to MonetDB, simply select one of the other MonetDB optimizers. In particular:
set optimizer='default_pipe'; will activate the regular, parallelized MonetDB optimizer.
set optimizer='sequential_pipe'; will use the sequential MonetDB optimizer, which runs queries on a single CPU core.