BGEN reference implementation
This repository contains a reference implementation of the BGEN format in C++, originally sourced from the qctool implementation. In addition, three utilities are provided: bgenix, which provides indexed access to BGEN files; cat-bgen, which efficiently concatenates BGEN files; and edit-bgen, which manipulates BGEN metadata. See the wiki for documentation on these programs.
An example program, bgen_to_vcf, is also provided; as the name suggests, it converts a BGEN file to VCF. It is intended to show how to use the BGEN file reading API.
!! Important note on the UK Biobank data
The UK Biobank has released imputed data for the full release in BGEN format, with accompanying bgenix index files. However, these index files appear not to be named in the way bgenix expects by default. Please see here for information on working around this.
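One possible workaround is sketched below. It rests on an assumption: that bgenix by default looks for an index named after the BGEN file with .bgi appended (for example data.bgen.bgi next to data.bgen). Under that assumption, a symbolic link with the expected name pointing at the supplied index may be enough. The function name link_bgen_index and all file names are hypothetical; the page linked above remains the authoritative description.

```shell
# Hypothetical helper; not part of the BGEN repository.
# Assumption: bgenix looks for "<bgen file>.bgi" by default.
# Usage: link_bgen_index BGEN_FILE INDEX_FILE
link_bgen_index() {
    bgen="$1"     # path to the BGEN file
    index="$2"    # path to the index file as distributed
    [ -f "$bgen" ] || { echo "no such BGEN file: $bgen" >&2; return 1; }
    [ -f "$index" ] || { echo "no such index file: $index" >&2; return 1; }
    # ln -s stores the target path verbatim, so make it absolute first.
    index_abs="$(cd "$(dirname "$index")" && pwd)/$(basename "$index")"
    # Create the link next to the BGEN file under the assumed default name.
    ln -s "$index_abs" "$bgen.bgi"
}
```

This requires write access to the directory holding the BGEN file, and it fails rather than overwrites if a file with the link's name already exists.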
The following programs are built with the BGEN repository.
- bgenix - a tool to index and efficiently retrieve subsets of a BGEN file.
- cat-bgen - a tool to efficiently concatenate BGEN files.
- edit-bgen - a tool to edit BGEN file metadata.
This BGEN implementation is released under the Boost Software License v1.0. This is a relatively permissive open-source license that is compatible with many other open-source licenses. See this page and the file LICENSE_1_0.txt for full details.
This repository also contains code from the sqlite, boost, and zstandard libraries, which come with their own licenses (respectively, public domain, the Boost Software License, and the BSD license). These libraries are not used in the core BGEN implementation, but may be used in the example programs provided.
A tarball of the latest master branch is available here: http://bitbucket.org/gavinband/bgen/get/master.tar.gz.
Alternatively, use mercurial to download the master branch as follows:
hg clone https://bitbucket.org/gavinband/bgen -u master
(This command can take a while.)
Additionally, pre-built versions of the bgen utilities may be available from this page. Note: the recommended use is to download and compile bgenix for your platform; these binaries are provided for convenience in getting started quickly.
To compile the code, use the supplied waf build tool:
./waf-1.8.13 configure
./waf-1.8.13
Results will appear under the
Note: a full build requires a compiler that supports C++11, e.g. gcc v4.7 or above. To specify the compiler used, set the CXX environment variable during the configure step. For example (if your shell is
CXX=/path/to/g++ ./waf-1.8.13 configure
./waf-1.8.13
The sqlite and zstd libraries are written in C; to specify the C compiler, additionally set CC=/path/to/gcc. We have tested compilation on gcc 4.9.3 and 5.4.0, and using clang, among others.
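The configure-then-build sequence with explicit compilers can be wrapped in a small shell function. This is a minimal sketch, assuming a POSIX-style shell and that it is run from the repository root. The function name bgen_build and the WAF variable are illustrative and not part of the repository.

```shell
# Hypothetical helper; not part of the BGEN repository.
# WAF defaults to the waf script named in this README.
WAF="${WAF:-./waf-1.8.13}"

# Usage: bgen_build [CXX_PATH [CC_PATH]]
# The body runs in a subshell, so the exports do not leak into the caller.
bgen_build() (
    cxx="${1:-}"
    cc="${2:-}"
    # Export only what was given; otherwise the existing environment is untouched.
    if [ -n "$cxx" ]; then export CXX="$cxx"; fi
    if [ -n "$cc" ]; then export CC="$cc"; fi
    # Configure first, then build only if configuration succeeded.
    "$WAF" configure && "$WAF"
)
```

For example, bgen_build /path/to/g++ /path/to/gcc would run the same two waf commands with both compilers set, while bgen_build with no arguments leaves compiler selection to the environment.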
If you don't have access to a compiler with C++11 support, you can still build the core bgen implementation, but won't be able to build the applications or example programs. See the wiki for more information.
BGEN's tests can be run by typing
or, for more recent versions:
If all goes well, a message like "All tests passed" should be printed.
If you have Robot Test Framework installed, you can instead run the full suite of unit and functional tests like so:
Test results will be placed in the directory
Trying an example
The example program bgen_to_vcf reads a bgen file (v1.1 or v1.2) and outputs it as a VCF file to stdout. You can try running it
which should output vcf-formatted data to stdout. We've provided further example bgen files in the
Running ./waf-1.8.13 install will install the applications listed above into a specified system or user directory. By default this is /usr/local. To change it, specify the prefix at the configure step:
./waf-1.8.13 configure --prefix=/path/to/installation/directory
./waf-1.8.13 install
The programs listed above will be installed into a folder called bin/ under the prefix dir, e.g. bgenix will be installed as
Note that in many cases there's no need for installation; the executables are self-contained. The install step simply copies them into the destination directory.
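The prefix-based installation can be sketched as a shell function that runs both steps and reports where the executables should end up. This is an illustration only, assuming a POSIX-style shell run from the repository root. The function name bgen_install and the WAF variable are hypothetical, and the prefix must be supplied by the caller.

```shell
# Hypothetical helper; not part of the BGEN repository.
# WAF defaults to the waf script named in this README.
WAF="${WAF:-./waf-1.8.13}"

# Usage: bgen_install PREFIX
bgen_install() (
    prefix="${1:?usage: bgen_install PREFIX}"
    # Configure with the chosen prefix, then install; stop on the first failure.
    "$WAF" configure --prefix="$prefix" || exit 1
    "$WAF" install || exit 1
    echo "Executables installed under: $prefix/bin"
    # Point out when the bin directory is not yet on the search path.
    case ":$PATH:" in
        *":$prefix/bin:"*) ;;
        *) echo "Note: $prefix/bin is not on PATH" ;;
    esac
)
```

Because the body runs in a subshell, a failed step ends the function with a non-zero status without terminating the calling shell.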
(The installation prefix need not be a system-wide directory. For example, I typically specify an installation directory within my home dir, e.g.
This repo follows the branch naming practice in which master represents the most up-to-date code considered in a 'releasable' state. If you are interested in using bgen code in your own project, we therefore recommend cloning the master branch. Code development takes place in the default branch and/or in feature branches branched from the default branch. The command given above downloads the master branch, which is what most people will want.