
Ocelot: A Hardware-Oblivious Database Engine

Welcome to Ocelot, the hardware-oblivious extension for the in-memory column-store MonetDB.

Ocelot uses OpenCL to provide hardware-oblivious replacements of MonetDB's relational database operators. The principal goal is to arrive at a database engine that can run (basically) unchanged on any device that has support for OpenCL. We have tested Ocelot on a variety of devices, including CPUs from AMD and Intel, GPUs from AMD and Nvidia, as well as the Xeon Phi accelerator cards from Intel. For more information on the design principles behind Ocelot, we refer to our VLDB 2013 publication Hardware-Oblivious Data Processing For In-Memory Column Stores.

All Ocelot-related source code is located in the repository folder monetdb5/extras/ocelot.

Ocelot builds on MonetDB, which is developed by the CWI Database Architecture Group. For more information on MonetDB, we refer to the MonetDB Readme and website.

Supported Operations & Limitations

Ocelot currently supports SQL queries using the major relational operations (selection, projection, grouping, aggregation, join, sorting) on four-byte integer (INT) and four-byte floating point (REAL) datatypes. This also includes operations on DATE columns, and equality comparisons on VARCHAR columns.

However, please note that Ocelot is primarily a research prototype, and there are several limitations because of this. In particular:

  • We don't support non-trivial string operations (inequality comparison, sorting, LIKE) on string (CHAR / VARCHAR) columns.
  • There are a number of unimplemented operators (multi-column sorting, top-k) that limit the set of queries Ocelot can run.
  • You will probably run into several glitches :) Even though we tried to make everything as robust as possible, there are very likely many hidden bugs. If you run into any problems, feel free to contact us!

Build Instructions

Checking out the Repository

This repository contains a full clone of MonetDB's Mercurial repository with the Ocelot extension included. Therefore, the full database can be built from this repository. After checking out the source tree, make sure to switch to the simple_mem_manager branch via:

hg update -r simple_mem_manager

Build Prerequisites

In order to build Ocelot, you will need:

  • A POSIX-compatible operating system. Sadly, we don't support Windows :(
  • A working gcc (or clang) build environment
  • The following UNIX tools:
    • libtool
    • autotools-dev & automake
    • pkg-config
    • gettext
    • flex & bison
  • Development versions of the following libraries:
    • libssl (OpenSSL)
    • libpcre
    • libxml2
    • zlib
    • libreadline
    • libnlopt
  • An OpenCL development SDK.
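
Before starting the build, it can save time to check that the command-line tools are on the PATH. The following is a minimal sketch: the executable names passed to the function are assumptions (package and binary names vary between distributions), and the check does not cover the development libraries or the OpenCL SDK.

```shell
# Print the name of every given command that cannot be found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Assumed executable names for the tools listed above; adjust for your system.
missing_tools gcc libtoolize automake pkg-config gettext flex bison
```

If the call prints nothing, all listed tools were found.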

Linux Build Instructions

After installing the dependencies, you can configure and build MonetDB with Ocelot via the following commands (from the root directory):

./bootstrap
./configure --prefix=/path/to/install/folder \
            --enable-oid32 --disable-int128 \
            --enable-optimize --disable-testing --disable-developer --disable-debug \
            --with-opencl=/path/to/opencl/sdk/root
make && make install

We also recommend adding the MonetDB binary folder $PREFIX/bin to your system's PATH.

In case the configure script aborts because it cannot find the OpenCL header files, you can specify the path to the OpenCL header files in the environment variable C_INCLUDE_PATH. For example, if the SDK include directory is /opt/AMDAPP/include, you have to run export C_INCLUDE_PATH=/opt/AMDAPP/include before starting the configure script.

Note: You might see the following error:

...include/CL/clplatform.h:95:11: error: "WIN32" is not defined [-Werror=undef] #elif WIN32

If this happens, modify the line #elif WIN32 in your OpenCL SDK's clplatform.h to #elif defined(WIN32).
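
The edit described above can also be scripted. This is a sketch, assuming the offending line contains exactly `#elif WIN32` with no leading whitespace; the function name and the example path are hypothetical. A backup of the header is kept with a `.bak` suffix.

```shell
# Rewrite a bare "#elif WIN32" into "#elif defined(WIN32)" in the given header.
# The original file is preserved next to it as <file>.bak.
patch_win32_check() {
    sed -i.bak 's/^#elif WIN32[[:space:]]*$/#elif defined(WIN32)/' "$1"
}

# Example (hypothetical path; use the header named in your error message):
# patch_win32_check /opt/AMDAPP/include/CL/clplatform.h
```

Depending on where the SDK is installed, the command may need to be run with elevated permissions.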

OS X Build Instructions

The following instructions have been tested on OS X 10.11.2, with dependencies installed via Homebrew:

  1. Install dependencies:

    brew install libtool automake gettext readline openssl libatomic_ops
    
  2. Create a local ruby gem dir:

    mkdir -p $HOME/.gem/ruby/$(ruby --version | sed -e 's/^ruby \([^p]*\)p.*$/\1/')
    
  3. Run bootstrap:

    LIBTOOLIZE=glibtoolize M4DIRS="/usr/local/opt/gettext/share/aclocal" ./bootstrap
    
  4. Run configure:

    LDFLAGS="-L/usr/local/opt/readline/lib -L/usr/local/opt/libxml2/lib -L/usr/local/opt/openssl/lib" \
    CPPFLAGS="-I/usr/local/opt/libxml2/include/libxml2 -I/usr/local/opt/readline/include -I/usr/local/opt/openssl/include" \
    ./configure --prefix=/path/to/install/folder \
                --enable-oid32 --disable-int128 \
                --disable-testing --disable-developer --enable-debug --disable-assert \
                --without-hype --with-readline=/usr/local/opt/readline \
                --with-rubygem-dir=$HOME/.gem/ruby/$(ruby --version | sed -e 's/^ruby \([^p]*\)p.*$/\1/')/
    

Setting up Ocelot

OpenCL Runtime Driver

In order to use Ocelot, working OpenCL runtime drivers must be registered in the system for all devices. A good tutorial for setting up OpenCL on Linux can be found here: http://wiki.tiker.net/OpenCLHowTo. On OS X systems, OpenCL should already be working out of the box.

Note: On AMD Graphics Cards, it is recommended to set the following environment variables to ensure that Ocelot can use the full device memory:

export GPU_MAX_ALLOC_PERCENT=100
export GPU_FORCE_64BIT_PTR=1

Setting up MonetDB

Before starting MonetDB, you will need to initialize a MonetDB DBFARM (folder where database data is kept). To do this, issue:

monetdbd create /path/to/dbfarm/folder

We also suggest creating a file .monetdb in your home folder with the following content, so that you do not have to re-enter the password for every new connection:

user=monetdb
password=monetdb
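
The file can also be created from the shell. The sketch below writes the two lines shown above; restricting the file to its owner is an added precaution (not something the instructions above require), since the password is stored in plain text. The function name is hypothetical.

```shell
# Write the client credentials file and make it readable by the owner only.
# The target defaults to ~/.monetdb; pass another path as the first argument.
write_monetdb_rc() {
    rc_file="${1:-$HOME/.monetdb}"
    # Create the file under a restrictive umask, then fix permissions in case
    # the file already existed with wider access.
    ( umask 077 && printf 'user=%s\npassword=%s\n' monetdb monetdb > "$rc_file" ) &&
        chmod 600 "$rc_file"
}

# Example:
# write_monetdb_rc
```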

Afterwards, you can start the MonetDB merovingian daemon via:

monetdbd start /path/to/dbfarm/folder

Creating a database

After starting the merovingian daemon, you can create a new database via:

monetdb create dbname
monetdb release dbname

You can then connect to the database and issue SQL commands via:

mclient -lsql -ddbname

After the database has been loaded, we strongly recommend setting it to read-only:

monetdb stop dbname
monetdb set readonly=true dbname

Loading our modified TPC-H database

As discussed in our VLDB publication, we have prepared a modified TPC-H benchmark suite that only uses operations that are supported by Ocelot. In order to use this modified benchmark, you first have to download and build the TPC-H dbgen utility. After building dbgen, you can prepare the raw TPC-H data files via:

cd $DBGEN_DIR
./dbgen -s scale_factor

Here, scale_factor denotes the size of the generated TPC-H instance in GB (i.e., 1 will create a 1GB instance, 0.1 a 0.1GB instance, etc.). This will create a set of .tbl files, containing the raw data for all TPC-H tables. In order to create a new database from these files, you have to move to the following directory in the source tree: monetdb5/extras/ocelot/benchmarks/tpch/load and issue the following commands:

python prepare.py --tpch_dir=$DBGEN_DIR
monetdb create tpch
monetdb release tpch
mclient -lsql -dtpch tpch_load.sql
monetdb stop tpch
monetdb set readonly=true tpch
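
Before running prepare.py, it may help to confirm that dbgen produced a file for each of the eight TPC-H tables. The following sketch only checks that the files exist in the given directory; the function name is hypothetical, and which files prepare.py actually reads is an assumption.

```shell
# Print the name of every expected TPC-H table file missing from directory $1.
missing_tbl_files() {
    for table in customer lineitem nation orders part partsupp region supplier; do
        [ -f "$1/$table.tbl" ] || printf '%s.tbl\n' "$table"
    done
}

# Example:
# missing_tbl_files "$DBGEN_DIR"
```

An empty output means all eight files are present.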

Our modified TPC-H query files can be found under monetdb5/extras/ocelot/benchmarks/tpch/queries.

Initializing Ocelot

Before running queries on Ocelot, you have to initialize the Ocelot runtime once. Internally, this sets up the OpenCL context, prepares the internal data structures and compiles the OpenCL kernels for all detected devices. To initialize Ocelot, issue the following command:

mclient -lmal -ddatabase
mal> ocelot.init();

Afterwards, you can see a list of discovered devices via ocelot.listDevices(). For instance:

mal> ocelot.listDevices();
# Device 1: NVIDIA Corporation - Tesla K40m
# Device 2: Intel(R) Corporation - Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz (default device)
# Device 3: Intel(R) Corporation - Intel(R) Many Integrated Core Acceleration Card

If this list is empty, Ocelot was unable to detect any OpenCL devices. Please ensure that your system has a correctly configured OpenCL runtime and a supported device. By default, Ocelot will select the CPU as its default device, meaning the device that all operators will run on. The default device can be manually set via ocelot.setDefaultDevice():

mal> ocelot.setDefaultDevice(1);
mal> ocelot.listDevices();
# Device 1: NVIDIA Corporation - Tesla K40m (default device)
# Device 2: Intel(R) Corporation - Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz
# Device 3: Intel(R) Corporation - Intel(R) Many Integrated Core Acceleration Card

You can display more detailed information about these devices (available memory, core frequency, supported OpenCL version, etc.) via ocelot.listDevices(true);.

After finishing your experiments, you can issue ocelot.release(); to release all resources held by Ocelot.
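
For repeated experiments, the initialization steps above can be generated as a small MAL script. This is a sketch: the function name is hypothetical, and the usage line assumes that mclient accepts statements on standard input.

```shell
# Print the MAL statements that initialize Ocelot, optionally select a default
# device by its number from ocelot.listDevices(), and then list the devices.
ocelot_init_script() {
    printf 'ocelot.init();\n'
    [ -z "${1:-}" ] || printf 'ocelot.setDefaultDevice(%s);\n' "$1"
    printf 'ocelot.listDevices();\n'
}

# Example (dbname is a placeholder for your database):
# ocelot_init_script 1 | mclient -lmal -ddbname
```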

Using Ocelot

In order to run SQL queries using Ocelot from the MonetDB SQL interface, you have to enable the Ocelot query rewriter. This rewriter will automatically replace supported MonetDB operators with the corresponding ones from Ocelot.

To select the Ocelot rewriter for the current session, open the MonetDB SQL interface and run the following SQL command:

SET optimizer='cl_pipe';

From now on, all SQL queries in the current session will run on Ocelot. You can take a look at the rewritten query plan via:

EXPLAIN <query>;

If you want to switch back to MonetDB, simply select one of the other MonetDB optimizers. In particular:

  • set optimizer='default_pipe'; will activate the regular, parallelized MonetDB optimizer.
  • set optimizer='sequential_pipe'; will use the sequential MonetDB optimizer, which runs queries on a single CPU core.
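
To compare the same query under different optimizers, the SET statement can be prepended to a query file before it is sent to the server. A minimal sketch, assuming mclient accepts statements on standard input; the function name and the query file name are hypothetical.

```shell
# Print a SET optimizer statement for pipeline $1, followed by the contents
# of the query file $2.
with_optimizer() {
    printf "SET optimizer='%s';\n" "$1"
    cat "$2"
}

# Example (q01.sql is a hypothetical query file; tpch is the database from above):
# with_optimizer cl_pipe q01.sql | mclient -lsql -dtpch
# with_optimizer default_pipe q01.sql | mclient -lsql -dtpch
```

Because the optimizer setting applies to the current session, sending it together with the query in one mclient invocation keeps each run self-contained.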