This page documents the index file format used by bgenix. For more information on bgenix, see the wiki page on bgenix.
bgenix index files are sqlite files, so they are easy to read or manipulate from popular programming languages or with the sqlite3 command-line tool. (sqlite version 3.8.2 or above is needed to work with these files.)
For example, the command
sqlite3 myfile.bgen.bgi "SELECT * FROM Variant LIMIT 10"
will list the first few variants in the index.
This snippet uses the RSQLite package to load the same information into R:
library( RSQLite )
index = dbConnect( dbDriver( "SQLite" ), "myfile.bgen.bgi" )
variants = dbGetQuery( index, "SELECT * FROM Variant LIMIT 10" )
And this snippet does the same thing in python:
import sqlite3
index = sqlite3.connect( "myfile.bgen.bgi" )
# execute() returns a cursor; fetchall() loads the rows into a list of tuples
variants = index.execute( "SELECT * FROM Variant LIMIT 10" ).fetchall()
The index is stored in a single table (called Variant by default). The schema of this table is as follows:
CREATE TABLE Variant (
  chromosome TEXT NOT NULL,
  position INT NOT NULL,
  rsid TEXT NOT NULL,
  number_of_alleles INT NOT NULL,
  allele1 TEXT NOT NULL,
  allele2 TEXT NULL,
  file_start_position INT NOT NULL,
  size_in_bytes INT NOT NULL,
  PRIMARY KEY (chromosome, position, rsid, allele1, allele2, file_start_position )
) WITHOUT ROWID;
By default this table is created using the "WITHOUT ROWID" option. This means that, unlike standard tables in sqlite, the table does not have an extra, hidden rowid column. Instead, the table is stored on disk as a sorted table, sorted in lexicographical order by the columns listed in the PRIMARY KEY clause.
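Because the primary key begins with chromosome and position, a lookup by genomic range can follow the table's sort order. The following is a minimal sketch of such a query in python, not bgenix's own implementation; the function name variants_in_range and its parameters are illustrative assumptions, and the chromosome value must match however chromosomes are encoded in your index.

```python
import sqlite3

def variants_in_range(index_path, chromosome, start, end, table="Variant"):
    """Return index rows with start <= position <= end on one chromosome.

    Hypothetical helper for illustration; not part of bgenix.
    """
    # A table name cannot be bound as a query parameter, so it is
    # interpolated here. Only pass trusted table names.
    query = (
        "SELECT chromosome, position, rsid, allele1, allele2, "
        "file_start_position, size_in_bytes "
        f"FROM {table} "
        "WHERE chromosome = ? AND position BETWEEN ? AND ? "
        "ORDER BY chromosome, position"
    )
    connection = sqlite3.connect(index_path)
    try:
        return connection.execute(query, (chromosome, start, end)).fetchall()
    finally:
        connection.close()
```

For example, variants_in_range( "myfile.bgen.bgi", "01", 1000000, 2000000 ) would list the indexed variants in that interval, assuming the index stores that chromosome as "01".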
The index table stores the first two alleles of each variant in the index. Other alleles are not stored at the moment; bgenix currently does not make use of allele information.
The file_start_position and size_in_bytes columns specify the range of bytes within the indexed bgen file that contain the data for each variant. Implementations may seek to byte file_start_position in the bgen file and read size_in_bytes bytes from the file. The resulting data will then contain the "variant data" and "genotype data" blocks for the corresponding variant.
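The seek-and-read procedure described above can be sketched in python as follows. This is an illustrative sketch rather than bgenix's own code: the function name read_variant_bytes is an assumption, and the returned bytes are left undecoded, since parsing them requires the bgen format itself.

```python
import sqlite3

def read_variant_bytes(bgen_path, index_path, rsid, table="Variant"):
    """Return the raw bytes of every indexed variant with the given rsid.

    Hypothetical helper for illustration; not part of bgenix.
    """
    connection = sqlite3.connect(index_path)
    try:
        # Table names cannot be bound as parameters; pass trusted names only.
        rows = connection.execute(
            f"SELECT file_start_position, size_in_bytes FROM {table} "
            "WHERE rsid = ? ORDER BY file_start_position",
            (rsid,),
        ).fetchall()
    finally:
        connection.close()

    blocks = []
    with open(bgen_path, "rb") as bgen_file:
        for start, size in rows:
            # Seek to the recorded offset and read exactly size bytes.
            bgen_file.seek(start)
            data = bgen_file.read(size)
            if len(data) != size:
                raise IOError("bgen file is shorter than the index expects")
            blocks.append(data)
    return blocks
```

A list is returned because several variants in one file can share an rsid; each element holds the undecoded bytes for one of them.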
Newer versions of bgenix additionally store metadata about the bgen file in a Metadata table. When loading an index, this information is used to verify that the index file matches the bgen file it is being used for. The schema of this table is:
CREATE TABLE Metadata (
  filename TEXT NOT NULL,
  file_size INT NOT NULL,
  last_write_time INT NOT NULL,
  first_1000_bytes BLOB NOT NULL,
  index_creation_time INT NOT NULL
);
The table will have one row. The first three columns record the name, size, and last write time of the bgen file corresponding to this index. The fourth column contains the first 1000 bytes (or fewer if the file is smaller) of the bgen file. The fifth column, index_creation_time, records when the index was created.
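A check of this kind might look like the python sketch below. It is an assumption-laden illustration, not a description of what bgenix itself checks: the function name index_matches_bgen is hypothetical, and only file_size and first_1000_bytes are compared, because the units of last_write_time and the form of filename are not specified here.

```python
import os
import sqlite3

def index_matches_bgen(bgen_path, index_path):
    """Compare the Metadata row against the bgen file.

    Returns True or False when metadata is present, and None when the
    index has no Metadata table (or no row), as in older index files.
    Hypothetical helper for illustration; not part of bgenix.
    """
    connection = sqlite3.connect(index_path)
    try:
        has_table = connection.execute(
            "SELECT COUNT(*) FROM sqlite_master "
            "WHERE type = 'table' AND name = 'Metadata'"
        ).fetchone()[0]
        if not has_table:
            return None
        row = connection.execute(
            "SELECT file_size, first_1000_bytes FROM Metadata"
        ).fetchone()
    finally:
        connection.close()
    if row is None:
        return None

    expected_size, expected_head = row
    with open(bgen_path, "rb") as bgen_file:
        # read(1000) returns fewer bytes if the file is smaller than that.
        actual_head = bgen_file.read(1000)
    return (
        os.path.getsize(bgen_path) == expected_size
        and actual_head == bytes(expected_head)
    )
```

Returning None for a missing table keeps "cannot verify" distinct from "verified as mismatched", which matters because only newer index files carry metadata.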