# Ocelot: A Hardware-Oblivious Database Engine

Welcome to Ocelot, the hardware-oblivious extension for the in-memory column-store [MonetDB](http://www.monetdb.org). Ocelot utilizes [OpenCL](https://www.khronos.org/opencl/) to provide *hardware-oblivious* replacements of MonetDB's relational database operators. The principal goal is to arrive at a database engine that can run (basically) unchanged on any device that has support for OpenCL. We have tested Ocelot on a variety of devices, including CPUs from AMD and Intel, GPUs from AMD and Nvidia, as well as the Xeon Phi accelerator cards from Intel.

For more information on the design principles behind Ocelot, we refer to our VLDB 2013 publication [Hardware-Oblivious Data Processing For In-Memory Column Stores](http://www.vldb.org/pvldb/vol6/p709-heimel.pdf).

All Ocelot-related source code is located in the repository folder `monetdb5/extras/ocelot`.

Ocelot builds on MonetDB, which is developed by the [CWI Database Architecture Group](http://www.cwi.nl/research-groups/database-architectures). For more information on MonetDB, we refer to the MonetDB [Readme](src/simple_mem_manager/README.monetdb) and [website](http://www.monetdb.org).

## Supported Operations & Limitations

Ocelot currently supports SQL queries using the major relational operations (selection, projection, grouping, aggregation, join, sorting) on four-byte integer (INT) and floating point (REAL) datatypes. This also includes operations on DATE columns, and equality comparisons on VARCHAR columns.

However, please note that Ocelot is primarily a **research prototype**, and there are several limitations because of this. In particular:

* We don't support non-trivial string operations (inequality comparison, sorting, like) on string (CHAR / VARCHAR) columns.
* There are a number of unimplemented operators (multi-column sorting, top-k) that limit the number of queries Ocelot can run.
* You will probably run into several glitches :) Even though we tried to make everything as robust as possible, there are very likely many hidden bugs.

If you run into any problems, feel free to contact us!

## Build Instructions

#### Checking out the Repository ####

This repository contains a full clone of MonetDB's [Mercurial repository](http://dev.monetdb.org/hg/MonetDB/) with the Ocelot extension included. Therefore, the full database can be built from this repository. After checking out the source tree, **make sure to switch to the simple\_mem\_manager branch** via:

    hg update -r simple_mem_manager

#### Build Prerequisites ####

In order to build Ocelot, you will need:

* A POSIX-compatible operating system. Sadly, we don't support Windows :(
* A working gcc (or clang) build environment
* The following UNIX tools:
    * libtool
    * autotools-dev & automake
    * pkg-config
    * gettext
    * flex & bison
* Development versions of the following libraries:
    * libssl (OpenSSL)
    * libpcre
    * libxml2
    * zlib
    * libreadline
    * libnlopt
* An OpenCL development SDK.

#### Linux Build Instructions ####

After installing the dependencies, you can configure and build MonetDB with Ocelot via the following commands (from the root directory):

    ./bootstrap
    ./configure --prefix=/path/to/install/folder \
                --enable-oid32 --disable-int128 \
                --enable-optimize --disable-testing --disable-developer --disable-debug \
                --with-opencl=/path/to/opencl/sdk/root
    make && make install

We also recommend adding the MonetDB binary folder `$PREFIX/bin` to your system's PATH.

If the configure script aborts because it cannot find the OpenCL header files, you can specify their location in the environment variable `C_INCLUDE_PATH`. For example, if the SDK include directory is `/opt/AMDAPP/include`, run `export C_INCLUDE_PATH=/opt/AMDAPP/include` before starting the configure script.

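
As a rough illustration of the `C_INCLUDE_PATH` workaround, the following sketch picks the first candidate directory that actually contains `CL/cl.h`. The function name `find_cl_include_dir` and the candidate list are hypothetical and not part of Ocelot or MonetDB; adjust the directories to wherever your OpenCL SDK is installed.

```shell
# Hypothetical helper (not part of Ocelot): print the first directory
# from the argument list that contains the OpenCL header CL/cl.h.
# Returns non-zero if none of the candidates matches.
find_cl_include_dir() {
  for dir in "$@"; do
    if [ -f "$dir/CL/cl.h" ]; then
      printf '%s\n' "$dir"
      return 0
    fi
  done
  return 1
}

# Example use before running ./configure (candidate paths are assumptions):
#   if dir="$(find_cl_include_dir /opt/AMDAPP/include /usr/local/include)"; then
#     export C_INCLUDE_PATH="$dir"
#   fi
```

If no candidate matches, the function prints nothing, so `C_INCLUDE_PATH` is left untouched and the configure script reports the missing headers as before.
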

***Note:*** You might see the following error:

    ...include/CL/clplatform.h:95:11: error: "WIN32" is not defined [-Werror=undef]
    #elif WIN32

If this happens, modify the line `#elif WIN32` in your OpenCL SDK's clplatform.h to `#elif defined(WIN32)`.

#### OS X Build Instructions

The following instructions have been tested on OS X 10.11.2, with dependencies installed via [Homebrew](http://brew.sh/):

1. Install dependencies:

        brew install libtool automake gettext readline openssl libatomic_ops

2. Create a local ruby gem dir:

        mkdir -p $HOME/.gem/ruby/$(ruby --version | sed -e 's/^ruby \([^p]*\)p.*$/\1/')

3. Run bootstrap:

        LIBTOOLIZE=glibtoolize M4DIRS="/usr/local/opt/gettext/share/aclocal" ./bootstrap

4. Run configure:

        LDFLAGS="-L/usr/local/opt/readline/lib -L/usr/local/opt/libxml2/lib -L/usr/local/opt/openssl/lib" \
        CPPFLAGS="-I/usr/local/opt/libxml2/include/libxml2 -I/usr/local/opt/readline/include -I/usr/local/opt/openssl/include" \
        ./configure --prefix=/path/to/install/folder \
                    --enable-oid32 --disable-int128 \
                    --disable-testing --disable-developer --enable-debug --disable-assert \
                    --without-hype --with-readline=/usr/local/opt/readline \
                    --with-rubygem-dir=$HOME/.gem/ruby/$(ruby --version | sed -e 's/^ruby \([^p]*\)p.*$/\1/')/

## Setting up Ocelot

#### OpenCL Runtime Driver

In order to use Ocelot, working OpenCL runtime drivers must be registered in the system for all devices. A good tutorial for setting up OpenCL on Linux can be found here: http://wiki.tiker.net/OpenCLHowTo. On OS X systems, OpenCL should work out of the box.

***Note:*** On AMD graphics cards, it is recommended to set the following environment variables to ensure that Ocelot can use the full device memory:

    export GPU_MAX_ALLOC_PERCENT=100
    export GPU_FORCE_64BIT_PTR=1

#### Setting up MonetDB

Before starting MonetDB, you will need to initialize a MonetDB DBFARM (the folder where database data is kept).
To do this, issue:

    monetdbd create /path/to/dbfarm/folder

We also suggest creating a file `.monetdb` in your home folder with the following content, so that you do not have to re-enter the password for every new connection:

    user=monetdb
    password=monetdb

Afterwards, you can start the MonetDB merovingian daemon via:

    monetdbd start /path/to/dbfarm/folder

#### Creating a database

After starting the merovingian daemon, you can create a new database via:

    monetdb create dbname
    monetdb release dbname

You can then connect to the database and issue SQL commands via:

    mclient -lsql -ddbname

After the database has been loaded, we **strongly recommend** setting it to read-only:

    monetdb stop dbname
    monetdb set readonly=true dbname

#### Loading our modified TPC-H database

As discussed in our VLDB publication, we have prepared a modified TPC-H benchmark suite that only uses operations that are supported by Ocelot. In order to use this modified benchmark, you first have to download and build the [TPC-H dbgen utility](http://www.tpc.org/tpch/dbgen-download-request.asp). After building dbgen, you can prepare the raw TPC-H data files via:

    cd $DBGEN_DIR
    ./dbgen -s scale_factor

Here, scale\_factor denotes the size of the generated TPC-H instance in GB (i.e., 1 will create a 1GB instance, 0.1 a 0.1GB instance, etc.). This will create a set of .tbl files, containing the raw data for all TPC-H tables. In order to create a new database from these files, move to the directory `monetdb5/extras/ocelot/benchmarks/tpch/load` in the source tree and issue the following commands:

    python prepare.py --tpch_dir=$DBGEN_DIR
    monetdb create tpch
    monetdb release tpch
    mclient -lsql -dtpch tpch_load.sql
    monetdb stop tpch
    monetdb set readonly=true tpch

Our modified TPC-H query files can be found under `monetdb5/extras/ocelot/benchmarks/tpch/queries`.

## Initializing Ocelot

Before running queries on Ocelot, you have to initialize the Ocelot runtime once.
Internally, this sets up the OpenCL context, prepares the internal data structures and compiles the OpenCL kernels for all detected devices. To initialize Ocelot, issue the following command:

    mclient -lmal -ddatabase
    mal> ocelot.init();

Afterwards, you can see a list of discovered devices via `ocelot.listDevices()`. For instance:

    mal> ocelot.listDevices();
    # Device 1: NVIDIA Corporation - Tesla K40m
    # Device 2: Intel(R) Corporation - Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz (default device)
    # Device 3: Intel(R) Corporation - Intel(R) Many Integrated Core Acceleration Card

If this list is empty, Ocelot was unable to detect any OpenCL devices. Please ensure that your system has a correctly configured OpenCL runtime and a supported device.

By default, Ocelot will select the CPU as its *default device*, meaning the device that all operators will run on. The default device can be manually set via `ocelot.setDefaultDevice()`:

    mal> ocelot.setDefaultDevice(1);
    mal> ocelot.listDevices();
    # Device 1: NVIDIA Corporation - Tesla K40m (default device)
    # Device 2: Intel(R) Corporation - Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz
    # Device 3: Intel(R) Corporation - Intel(R) Many Integrated Core Acceleration Card

You can display more detailed information about these devices (available memory, core frequency, supported OpenCL version, etc.) via `ocelot.listDevices(true);`.

After finishing your experiments, you can issue `ocelot.release();` to release all resources held by Ocelot.

## Using Ocelot

In order to run SQL queries using Ocelot from the MonetDB SQL interface, you have to enable the Ocelot query rewriter. This rewriter will automatically replace supported MonetDB operators by the corresponding ones from Ocelot. To select the Ocelot rewriter for the current session, open the MonetDB SQL interface and run the following SQL command:

    SET optimizer='cl_pipe';

From now on, all SQL queries in the current session will run on Ocelot.
You can take a look at the rewritten query plan via:

    EXPLAIN <query>;

If you want to switch back to MonetDB, simply select one of the other MonetDB optimizers. In particular:

* `set optimizer='default_pipe';` will activate the regular, parallelized MonetDB optimizer.
* `set optimizer='sequential_pipe';` will use the sequential MonetDB optimizer, which runs queries on a single CPU core.
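
Because the optimizer setting applies per session, one way to compare pipelines from the command line is to prepend the `SET optimizer` statement to a query file and pipe the result into `mclient`. The sketch below is only an illustration: the helper name `with_pipeline`, the query file `q01.sql`, and the database name `tpch` are assumptions, and it relies on `mclient` reading SQL from standard input.

```shell
# Hypothetical helper (not part of Ocelot): print the statement that selects
# the given optimizer pipeline, followed by the contents of a query file.
with_pipeline() {
  pipeline="$1"
  query_file="$2"
  printf "SET optimizer='%s';\n" "$pipeline"
  cat "$query_file"
}

# Example use (assumes a database named tpch and a query file q01.sql):
#   with_pipeline cl_pipe      q01.sql | mclient -lsql -dtpch
#   with_pipeline default_pipe q01.sql | mclient -lsql -dtpch
```

Keeping the pipeline selection and the query in the same input stream means both statements run in the same session, which is what the optimizer setting requires.
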