Artifact for the ECOOP Paper "Making an Embedded DBMS JIT-friendly"
by Carl Friedrich Bolz, Darya Kurilova, Laurence Tratt
The paper describes SQPyte: a composed VM allowing the efficient jitted
execution of SQLite queries from within Python programs. The artifact contains
SQPyte, the PyPy-SQPyte bridge as well as benchmarks for comparing the
performance of SQPyte, SQLite and the Java-based embedded database H2.
The abstract of the paper is:
While DataBase Management Systems (DBMSs) are highly optimized,
interactions across the Programming Language (PL) / DBMS boundary are
costly, even for in-process embedded DBMSs. In this paper we show that
programs that interact with the widely-used embedded DBMS SQLite can be
significantly optimized – by a factor of 3.4 in our benchmarks – by
inlining across the PL / DBMS boundary. We achieved this speed-up by
replacing parts of SQLite’s C interpreter with RPython code and composing
the resulting meta-tracing VM – called SQPyte – with the PyPy VM. SQPyte
does not compromise stand-alone SQL performance: it is 2.2% faster than
SQLite on the widely used TPC-H benchmark suite.
We have packaged the artifact as a script that downloads and builds all the
code. We also provide a VirtualBox VM that already contains everything needed
and can be used directly.
Building the Code
If you don't want to build the relevant code yourself, you can use the
VirtualBox VM and jump to the next section.
Otherwise you need to install the following dependencies:
- Java, including a JDK
- the PyPy dependencies, as documented here: http://doc.pypy.org/en/latest/build.html
- latex, if you want to make a PDF showing the benchmark results
Then you can use the ``make.py`` script to download and build all the
needed software of the artifact. The script assumes a unixy environment, most
flavors of Unix (including Mac OS X) should work.
To download and build all the code, run::
$ python make.py download
$ python make.py build
This will take about two hours.
Using the VirtualBox VM
You can run and start the VM with the VirtualBox GUI. The password is ``sqpyte``.
To start the VirtualBox VM from the commandline, first import it::
$ VBoxManage import sqpyte_0.2.ovf
Then to start the VM run::
$ VBoxHeadless --startvm sqpyte -e "TCP/Ports=4455"
The VM will then boot. You can then use a VNC client to connect to the host
port 4455, or you can SSH to host port 3122 (SSH port forward is already
configured in the ovf file):
$ ssh sqpyte@localhost -p 3122
The password is 'sqpyte'. The sqpyte account has sudo rights, should you need
After building has finished, or immediately after starting the VirtualBox VM,
we envision the following scenarios when using the artifact. We have put the
sections of the paper that are relevant for each scenario.
Using SQPyte from PyPy
Section 7 in the paper.
You can start the PyPy with the SQPyte bridge built in using the ``./pypy``
command. This will start a completely standard Python interpreter, compatible
with Python 2.7.10 that contains an extra ``sqpyte`` module. The ``sqpyte``
module exposes a subset of the interface of the standard Python ``sqlite3``
module. It can be used to interact with SQLite databases in the normal ways from
Python, for example like this::
Python 2.7.10 (cbb1faa2a19f, Mar 08 2016, 14:03:27)
[PyPy 4.1.0-alpha0 with GCC 4.7.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>> import sqpyte
>>>> conn = sqpyte.Connection("tpch/sqlite/tpch.db")
>>>> # simple arithmetic calculation
>>>> conn.execute("select 1 + 2").next()
>>>> conn.execute("select avg(quantity) from lineitem").next()
>>>> # iterate over query result
>>>> for items in conn.execute("select * from orders limit 5"):
.... print items
(1, 36901, u'O', 173665.47, u'1996-01-02', u'5-LOW', u'Clerk#000000951', u'0', u'nstructions sleep furiously among ')
(2, 78002, u'O', 46929.18, u'1996-12-01', u'1-URGENT', u'Clerk#000000880', u'0', u' foxes. pending accounts at the pending, silent asymptot')
(3, 123314, u'F', 193846.25, u'1993-10-14', u'5-LOW', u'Clerk#000000955', u'0', u'sly final accounts boost. carefully regular ideas cajole carefully. depos')
(4, 136777, u'O', 32151.78, u'1995-10-11', u'5-LOW', u'Clerk#000000124', u'0', u'sits. slyly regular warthogs cajole. regular, regular theodolites acro')
(5, 44485, u'F', 144659.2, u'1994-07-30', u'5-LOW', u'Clerk#000000925', u'0', u'quickly. bold deposits sleep slyly. packages use slyly')
>>>> # access table metadata
>>>> for name, sql in conn.execute("select name, sql from sqlite_master where type = 'table';"):
.... print name, sql
Part CREATE TABLE Part(
partkey INTEGER PRIMARY KEY, name, mfgr, brand, type, size INTEGER, container, retailprice real, comment)
Supplier CREATE TABLE Supplier(
suppKey INTEGER PRIMARY KEY, name, address, nationkey INTEGER, phone, acctbal REAL, comment)
PartSupp CREATE TABLE PartSupp(
partKey INTEGER, suppKey INTEGER, availqty INTEGER, supplycost REAL, comment)
Customer CREATE TABLE Customer(CustKey INTEGER PRIMARY KEY, name, address, nationkey INTEGER, phone, acctbal REAL, mktsegment, comment)
Nation CREATE TABLE Nation(
nationkey INTEGER PRIMARY KEY, name, regionkey INTEGER, comment)
Region CREATE TABLE Region(
regionkey INTEGER PRIMARY KEY, name, comment)
LineItem CREATE TABLE LineItem(
orderKey INTEGER, partKey INTEGER, suppKey INTEGER, lineNumber INTEGER, quantity INTEGER, extendedPrice REAL, discount REAL, tax REAL, returnFlag, lineStatus, shipDate, commitDate, receiptDate, shipInstruct, shipMode, comment)
Orders CREATE TABLE Orders(
orderKey INTEGER PRIMARY KEY, custKey INTEGER, orderStatus, totalPrice REAL, orderDate, orderPriority, clerk, shipPriority, comment)
Producing the Tables from the Paper
Sections 6 & 8 in the paper.
The raw data for the tables in the paper are in the file
``sqpyte-benchmarks/paper-submission.json``. To process and reproduce the tables
in the paper, run::
$ python make.py makepdf
This will produce the file ``results.pdf`` containing all the tables that appear
in the paper. The process takes about 15 minutes.
Within the JSON file, in the benchmark scripts and in the debug output on the
console the names for the various sqpyte variants are different that in the
paper. The mapping is:
- "sqpyte" in the files corresponds to SQPyte in the paper.
- "dontcache" in the files corresponds to SQPyte[no-flags] in the paper.
- "dontlookinside" in the files corresponds to SQPyte[no-inline] in the paper.
- "sqlite" in the files corresponds to SQLite in the paper.
- "java" in the files corresponds to H2 in the paper.
Running the Benchmarks
Sections 6 & 8 in the paper.
You can re-run the benchmarks of the paper. Doing a full re-run is a somewhat
lengthy process that takes a few days. To start it, run::
$ python make.py runbench
You can also run the benchmarks with a lot less iterations, which takes only a
few hours (probably at least three), like this::
$ python make.py quickbench
However, the quick results are not as meaningful, because they contain very few
iterations, thus not giving the JITs enough time to warm up. This gives a
strong advantage to unjitted SQLite, particularly in the TPC-H benchmark set.
The quick benchmark run also does not repeat benchmarks, so there is not enough
data to produce meaningful errors for the numbers.
Both of these commands will put the benchmark results into a file
After running the benchmark, you can produce tables with your own results by
$ python make.py makepdf
(makepdf will use the data from the most recent JSON file).
Looking at the Source Code
Sections 4 & 7 in the paper.
If you are interested in looking at the source code, here are some entry points
The directories created by the ``make.py`` are:
- ``sqpyte`` the SQPyte implementation:
- ``sqpyte/sqlite`` a modified version of the sqlite source code to make it callable from RPython
- ``sqpyte/sqpyte`` the RPython source code of SQPyte
- ``pypy-sqpyte`` a fork of PyPy that contains the PyPy-SQPyte bridge
- ``tpch`` the TPC-H benchmark database generator
- ``sqpyte-benchmarks`` the benchmark set for SQPyte
- ``sqpyte/sqlite/src/vdbe.c`` contains the source code of the SQLite bytecode
- ``sqpyte/sqlite/src/sqpyte.c`` contains the SQLite bytecodes broken into
- ``sqpyte/sqpyte/translated.py`` contains the RPython versions of those
bytecodes that were translated to RPython
- ``sqpyte/sqpyte/function.py`` contains the RPython versions of (aggregate)
functions like ``sum`` and ``avg``.
- ``pypy-sqpyte/pypy/module/sqpyte/interp_sqpyte.py`` contains the RPython
implementation of the ``sqpyte`` module, which is mostly type-conversion
code, and some code interacting with ``sqpyte``.
This virtual machine image is based upon Ubuntu. Information about Ubuntu
licensing can be found at:
A binary and the source for SQPyte is also included. Its constituent
interpreters are licensed separately, both under the MIT License. For more
information, please refer to:
The source code of SQLite is in the public domain:
The TPC-H benchmark is from TPC: