

This page documents the API for the BGEN library.

High-level API for file reading

The bgen::View class provides a high-level way to read data from a bgen file. It uses the lower-level bgen.hpp API (described below) under the hood, but wraps up many of the details in a user-friendly class. (In addition it can be used with the bgen::IndexQuery class for indexed access.) Here is a brief pseudo-program using the bgen::View class (for a fully working example, see the bgen_to_vcf.cpp example program):

#include <cstdint>
#include <string>
#include <vector>
#include "genfile/bgen/View.hpp"

using namespace genfile ;

struct ProbSetter {
 // class to do something with probability data
 // See below for the API
} ;

int main( void ) {
  bgen::View view( "myfile.bgen" ) ;

  // Storage for the identifying data of each variant
  std::string SNPID ;
  std::string chromosome ;
  uint32_t position ;
  std::string rsid ;
  std::vector< std::string > alleles ;
  std::vector< std::vector< double > > probs ;

  while( view.read_variant( &SNPID, &rsid, &chromosome, &position, &alleles ) ) {
    ProbSetter setter ;
    view.read_genotype_data_block( setter ) ;
  }
}

To get the above example to do some work, you will need to implement the ProbSetter class. This must implement the API described on the parsing genotype probability data page. Example implementations are the ProbSetter class in the bgen_to_vcf.cpp example program and the AlleleCounter class in the count_alleles.cpp example program.
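
As a rough illustration of the shape such a class takes, here is a minimal sketch that stores one vector of probabilities per sample. The callback names and signatures are assumptions modelled on the ProbSetter in bgen_to_vcf.cpp, and the types in the sketch namespace are stand-ins so that the block compiles without the library. Check both against the parsing genotype probability data page before relying on them. Recording missing values as -1.0 is an arbitrary choice made for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in types so this sketch compiles on its own. In real code these
// would come from the bgen headers; the names here are assumptions.
namespace sketch {
  enum OrderType { ePerUnorderedGenotype, ePerPhasedHaplotypePerAllele } ;
  enum ValueType { eProbability } ;
  struct MissingValue {} ;
}

// Stores probabilities as one vector of doubles per sample.
struct ProbSetter {
  typedef std::vector< std::vector< double > > Data ;

  explicit ProbSetter( Data* result ): m_result( result ), m_sample_i( 0 ) {}

  // Assumed to be called once per variant, before any sample data.
  void initialise( std::size_t number_of_samples, std::size_t /* number_of_alleles */ ) {
    m_result->assign( number_of_samples, std::vector< double >() ) ;
  }

  // Assumed to be called once per sample; returning true requests its data.
  bool set_sample( std::size_t i ) {
    m_sample_i = i ;
    return true ;
  }

  // Assumed to be called once per sample with the number of values to expect.
  void set_number_of_entries(
    std::size_t /* ploidy */,
    std::size_t number_of_entries,
    sketch::OrderType,
    sketch::ValueType
  ) {
    (*m_result)[ m_sample_i ].assign( number_of_entries, 0.0 ) ;
  }

  // Assumed to be called once per probability value.
  void set_value( std::uint32_t entry_i, double value ) {
    (*m_result)[ m_sample_i ][ entry_i ] = value ;
  }

  // Assumed to be called instead when the value is missing.
  // -1.0 is an arbitrary marker chosen for this sketch.
  void set_value( std::uint32_t entry_i, sketch::MissingValue ) {
    (*m_result)[ m_sample_i ][ entry_i ] = -1.0 ;
  }

private:
  Data* m_result ;
  std::size_t m_sample_i ;
} ;
```

With the real headers, the stand-in types would presumably be replaced by the library's own, and an instance constructed as ProbSetter setter( &probs ) could then be passed to view.read_genotype_data_block( setter ).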

The bgen::View class has the following methods:

| operation | bgen::View method | notes |
|---|---|---|
| Open a bgen file | constructor | Takes filename as argument. |
| List samples | get_sample_ids | Returns data through a setter object. For example, a C++11 lambda function such as []( std::string const& id ) { /* do something with id */ } should work. |
| Read variant identifying data for next variant | read_variant() | |
| Ignore genotype probability data for this variant | ignore_genotype_data_block | |
| Read genotype probability data for this variant | read_genotype_data_block | Data is returned using the parsing genotype probability data API. |
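
The "setter object" convention used by get_sample_ids can be illustrated without the library. In the sketch below, for_each_sample_id and collect_sample_ids are hypothetical helpers (not part of bgen) that mimic a function handing each identifier to a caller-supplied setter, with a C++11 lambda collecting the identifiers into a vector.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a function that returns data through a setter.
// It is NOT part of the bgen library; it only mimics the calling convention.
template< typename Setter >
void for_each_sample_id( std::vector< std::string > const& ids, Setter setter ) {
  for( std::size_t i = 0; i < ids.size(); ++i ) {
    setter( ids[i] ) ;  // the setter receives one identifier per call
  }
}

// Collect identifiers into a vector, using a lambda as the setter.
std::vector< std::string > collect_sample_ids( std::vector< std::string > const& source ) {
  std::vector< std::string > result ;
  for_each_sample_id(
    source,
    [&result]( std::string const& id ) { result.push_back( id ) ; }
  ) ;
  return result ;
}
```

A lambda of the same form should be usable as the argument to get_sample_ids on a bgen::View, assuming that method invokes the setter once per sample as described in the table.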

Note: the API for the bgen::View class is currently somewhat experimental and might change in a future version.

Accessing data subsets using an index file

The bgen::IndexQuery class can be used with bgen::View to provide a view of a subset of data. Here is an example:

#include <cstdint>
#include <string>
#include <vector>
#include "genfile/bgen/View.hpp"
#include "genfile/bgen/IndexQuery.hpp"

using namespace genfile ;

struct ProbSetter ; // as above

int main( void ) {
  bgen::View view( "myfile.bgen" ) ;

  // implement query
  bgen::IndexQuery::UniquePtr query = bgen::IndexQuery::create( "myfile.bgen.bgi" ) ;
  // Use bgen::IndexQuery methods to build a query
  // relevant methods are include_range(), exclude_range(), include_rsids(), exclude_rsids()
  // Here we'll just include a specific range
  query->include_range( "1:0-1000000" ) ;
  query->initialise() ;
  view.set_query( query ) ;

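  // Storage for the identifying data of each variant, as before
  std::string SNPID, rsid, chromosome ;
  uint32_t position ;
  std::vector< std::string > alleles ;

  // Ok, now iterate as before.
  while( view.read_variant( &SNPID, &rsid, &chromosome, &position, &alleles ) ) {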
    ProbSetter setter ;
    view.read_genotype_data_block( setter ) ;
  }
}

Note: the API for the bgen::IndexQuery class is currently somewhat experimental and might change in a future version. In particular, we're likely to (1) remove the need to treat IndexQuery as a reference type (i.e. to hold it by pointer) and (2) remove the need to call query->initialise().

Lower-level API for file reading

A lower-level, C-style API for working with BGEN files is found in bgen.hpp. The following table lists the major functions.

| operation | function | notes |
|---|---|---|
| Read offset value | read_offset | Reads the first four bytes of the given stream. |
| Read header block | read_header_block | Reads a bgen header block from the given stream and populates a bgen::Context object with information about the file. The number of bytes read is returned. |
| Read sample identifiers | read_sample_identifier_block | Reads a sample identifier block from the given stream and returns values using a specific setter object. |
| Read identifying data for a variant | read_snp_identifying_data | Data is returned in fields specified as arguments. |
| Read and parse probability data from this variant | read_genotype_data_block | Returns data using the parsing genotype probability data API. To avoid repeated memory allocations when processing many variants, a working buffer must be provided to this function. |
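
To make the "read offset value" step concrete, here is a minimal sketch of decoding four bytes from a stream. It assumes, as in the BGEN format specification, that the offset is stored as a little-endian unsigned 32-bit integer. The function read_little_endian_u32 is a hypothetical helper written for this illustration; it is not the library's read_offset and makes no claim about that function's signature.

```cpp
#include <cstdint>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of bgen.hpp): read four bytes from the
// stream and assemble them as a little-endian unsigned 32-bit integer.
std::uint32_t read_little_endian_u32( std::istream& stream ) {
  unsigned char bytes[4] ;
  stream.read( reinterpret_cast< char* >( bytes ), 4 ) ;
  if( !stream ) {
    throw std::runtime_error( "could not read four bytes from stream" ) ;
  }
  // The least significant byte comes first in the file.
  return static_cast< std::uint32_t >( bytes[0] )
    | ( static_cast< std::uint32_t >( bytes[1] ) << 8 )
    | ( static_cast< std::uint32_t >( bytes[2] ) << 16 )
    | ( static_cast< std::uint32_t >( bytes[3] ) << 24 ) ;
}
```

Assembling the value byte by byte, rather than copying the bytes directly into an integer, keeps the result independent of the byte order of the machine running the code.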

Parsing probability data

See parsing genotype probability data.
