
bgen / parsing genotype probability data

This page documents the API that probability "setter" objects (such as those passed to the bgen::View::read_genotype_data_block() function, or the lower-level bgen::parse_probability_data() function) must implement. See the bgen library API for an overview of the API.

Setter API overview

The following pseudocode shows how methods of the setter object are called during a call to parse_probability_data() or bgen::View::read_genotype_data_block():

setter.initialise( number of samples, number of alleles ) ;
setter.set_min_max_ploidy( minimum and maximum ploidy and value counts ) ;
for each sample:
   if( setter.set_sample( index of sample ) ):
      setter.set_number_of_entries( ploidy, number of values, order type, value type ) ;
      for each probability value for this sample:
         setter.set_value( index, probability value ) ;
setter.finalise() ;

Thus, the setter object must implement six methods - initialise(), set_min_max_ploidy(), set_sample(), set_number_of_entries(), set_value(), and finalise(). A working example of an object implementing this API can be found in the ProbSetter class in the bgen_to_vcf example program.

Note: The methods set_min_max_ploidy() and finalise() are currently optional: they are called only if they are present and have the correct signature (see below). However, we recommend implementing all six methods, because this optionality may be removed in a future version of the API.
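The call sequence above can be replayed outside the library with a small driver. The sketch below is not bgen's parser: `drive_setter`, `ToySample` and `LoggingSetter` are hypothetical names, the data are hard-coded rather than decoded from a BGEN block, and `OrderType`/`ValueType` are local stand-ins for the enums declared in bgen/types.hpp, whose exact enumerators may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-ins for the enums in bgen/types.hpp (assumption:
// the real enumerators may differ).
enum OrderType { ePerUnorderedGenotype, ePerOrderedHaplotype };
enum ValueType { eProbability };

// Toy input for one sample: its ploidy and the probability values to deliver.
struct ToySample {
    uint32_t ploidy;
    std::vector<double> values;
};

// Hypothetical driver: replays the documented call order on hard-coded data.
template <typename Setter>
void drive_setter(Setter& setter, std::vector<ToySample> const& samples,
                  uint32_t number_of_alleles) {
    setter.initialise(static_cast<uint32_t>(samples.size()), number_of_alleles);

    // Bounds on ploidy and value counts across all samples (all 0 if none).
    uint32_t minP = 0, maxP = 0, minV = 0, maxV = 0;
    for (std::size_t i = 0; i < samples.size(); ++i) {
        uint32_t P = samples[i].ploidy;
        uint32_t V = static_cast<uint32_t>(samples[i].values.size());
        minP = (i == 0) ? P : std::min(minP, P);
        maxP = std::max(maxP, P);
        minV = (i == 0) ? V : std::min(minV, V);
        maxV = std::max(maxV, V);
    }
    setter.set_min_max_ploidy(minP, maxP, minV, maxV);

    for (uint32_t i = 0; i < samples.size(); ++i) {
        // The setter may decline a sample by returning false.
        if (setter.set_sample(i)) {
            ToySample const& s = samples[i];
            setter.set_number_of_entries(s.ploidy,
                                         static_cast<uint32_t>(s.values.size()),
                                         ePerUnorderedGenotype, eProbability);
            for (uint32_t j = 0; j < s.values.size(); ++j) {
                setter.set_value(j, s.values[j]);
            }
        }
    }
    setter.finalise();
}

// Setter that records each call so the order can be inspected.
struct LoggingSetter {
    std::vector<std::string> log;
    void initialise(uint32_t, uint32_t) { log.push_back("initialise"); }
    void set_min_max_ploidy(uint32_t, uint32_t, uint32_t, uint32_t) {
        log.push_back("set_min_max_ploidy");
    }
    bool set_sample(uint32_t i) {
        log.push_back("set_sample");
        return i != 1;  // decline sample 1 to show filtering
    }
    void set_number_of_entries(uint32_t, uint32_t, OrderType, ValueType) {
        log.push_back("set_number_of_entries");
    }
    void set_value(uint32_t, double) { log.push_back("set_value"); }
    void finalise() { log.push_back("finalise"); }
};
```

With two diploid samples of three values each, where the setter declines sample 1, the log reads: initialise, set_min_max_ploidy, set_sample, set_number_of_entries, three set_value calls, set_sample, finalise. The declined sample receives no further calls.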

API detail

The following gives more detail on the semantics of each method call above:

| Setter object method | Called | Semantics |
|---|---|---|
| initialise( N, K ) | once | Prepare to receive data for N samples and a variant with K alleles. |
| set_min_max_ploidy( minP, maxP, minV, maxV ) | once | Samples for this variant have ploidy within minP...maxP inclusive, and the number of probability values per sample lies within minV...maxV inclusive. This is useful for preallocating storage. |
| set_sample( i ) | once per sample | Prepare to receive data for sample i. Return false if the data for sample i is not wanted. |
| set_number_of_entries( P, V, order_type, value_type ) | once per sample for which set_sample() returned true | Set the ploidy (P), the number of probability values (V), the ordering of the values (order_type, which indicates whether the data is phased) and the type of the values (value_type) for sample i. |
| set_value( j, value ) | once per probability value of each such sample | Set probability value j for sample i to the given value. |
| finalise() | once | Called at the end of parsing. |
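As an illustration of these semantics, here is a minimal sketch of a setter that stores one variant's probabilities in a dense samples-by-maxV matrix. It allocates once using maxV from set_min_max_ploidy(), uses set_sample() to skip unwanted samples, and pads unused slots with NaN. `MatrixSetter` and its `wanted` filter are hypothetical, and `OrderType`/`ValueType` are local stand-ins for the bgen enums, so check the real declarations before adapting it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical stand-ins for bgen::OrderType / bgen::ValueType (assumption).
enum OrderType { ePerUnorderedGenotype, ePerOrderedHaplotype };
enum ValueType { eProbability };

struct MatrixSetter {
    std::vector<bool> wanted;      // hypothetical filter; if its size != N, keep all
    std::vector<double> data;      // row-major, one row of width maxV per sample
    std::vector<uint32_t> ploidy;  // ploidy per sample; stays 0 if skipped
    uint32_t row_width = 0;        // maxV, the widest row needed
    uint32_t current = 0;          // sample currently being filled
    bool finished = false;

    void initialise(uint32_t N, uint32_t /* K */) {
        if (wanted.size() != N) wanted.assign(N, true);
        ploidy.assign(N, 0);
        finished = false;
    }
    void set_min_max_ploidy(uint32_t, uint32_t, uint32_t, uint32_t maxV) {
        // maxV bounds the values per sample, so a single allocation suffices.
        row_width = maxV;
        data.assign(ploidy.size() * static_cast<std::size_t>(maxV),
                    std::numeric_limits<double>::quiet_NaN());
    }
    bool set_sample(uint32_t i) {
        current = i;
        return wanted[i];  // false: no further calls for this sample
    }
    void set_number_of_entries(uint32_t P, uint32_t V, OrderType, ValueType) {
        assert(V <= row_width);
        ploidy[current] = P;
    }
    void set_value(uint32_t j, double value) {
        assert(j < row_width);
        data[static_cast<std::size_t>(current) * row_width + j] = value;
    }
    void finalise() { finished = true; }
};
```

The dense layout keeps indexing trivial (value j of sample i lives at i * maxV + j) at the cost of some padding when ploidy varies between samples.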

The method arguments above have specific type requirements. These are:

| Setter object method | Argument | Type requirement |
|---|---|---|
| initialise | N | unsigned integer, convertible from uint32_t |
| initialise | K | unsigned integer, convertible from uint32_t |
| set_min_max_ploidy | (return type) | must be void |
| set_min_max_ploidy | minP | must be uint32_t |
| set_min_max_ploidy | maxP | must be uint32_t |
| set_min_max_ploidy | minV | must be uint32_t |
| set_min_max_ploidy | maxV | must be uint32_t |
| set_sample | i | unsigned integer, convertible from uint32_t |
| set_number_of_entries | P | unsigned integer, convertible from uint32_t |
| set_number_of_entries | V | unsigned integer, convertible from uint32_t |
| set_number_of_entries | order_type | convertible from the bgen::OrderType enum |
| set_number_of_entries | value_type | convertible from the bgen::ValueType enum |
| finalise | (return type) | must be void |

Note: As mentioned above, set_min_max_ploidy() and finalise() are currently called only if they are declared with exactly the signatures given in the table above. Because we may make them mandatory in a future version of the API, we recommend implementing them.
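One way a library can call a method only when it is declared with an exact signature is the C++17 detection idiom. The sketch below is an assumption about such a mechanism, not a copy of bgen's code: `has_set_min_max_ploidy`, `has_finalise`, the `call_*_if_present` helpers and the two example setters are hypothetical names. It does illustrate the practical pitfall: a method declared with `int` parameters, or with a non-void return type, does not match and is skipped without any compiler error.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical trait: true only if T has set_min_max_ploidy with exactly the
// signature void (uint32_t, uint32_t, uint32_t, uint32_t). The static_cast to
// this member-pointer type is ill-formed for any other signature, so the
// partial specialization is discarded (SFINAE) and the primary template wins.
template <typename T, typename = void>
struct has_set_min_max_ploidy : std::false_type {};

template <typename T>
struct has_set_min_max_ploidy<
    T, std::void_t<decltype(static_cast<void (T::*)(uint32_t, uint32_t, uint32_t, uint32_t)>(
           &T::set_min_max_ploidy))>> : std::true_type {};

// Same check for finalise(): no arguments, void return.
template <typename T, typename = void>
struct has_finalise : std::false_type {};

template <typename T>
struct has_finalise<T, std::void_t<decltype(static_cast<void (T::*)()>(&T::finalise))>>
    : std::true_type {};

// Hypothetical helpers: call the optional method only when the trait matches.
template <typename Setter>
void call_min_max_if_present(Setter& setter, uint32_t minP, uint32_t maxP,
                             uint32_t minV, uint32_t maxV) {
    if constexpr (has_set_min_max_ploidy<Setter>::value) {
        setter.set_min_max_ploidy(minP, maxP, minV, maxV);
    }
}

template <typename Setter>
void call_finalise_if_present(Setter& setter) {
    if constexpr (has_finalise<Setter>::value) {
        setter.finalise();
    }
}

// Declares both methods with the exact signatures: both are called.
struct FullSetter {
    int calls = 0;
    void set_min_max_ploidy(uint32_t, uint32_t, uint32_t, uint32_t) { ++calls; }
    void finalise() { ++calls; }
};

// Near-misses: int parameters and a non-void return type do not match,
// so under this scheme both methods are silently skipped.
struct NearMissSetter {
    int calls = 0;
    void set_min_max_ploidy(int, int, int, int) { ++calls; }
    int finalise() { return ++calls; }
};

static_assert(has_set_min_max_ploidy<FullSetter>::value, "exact signature detected");
static_assert(!has_set_min_max_ploidy<NearMissSetter>::value, "int parameters rejected");
static_assert(!has_finalise<NearMissSetter>::value, "non-void return rejected");
```

Under a mechanism like this, declaring set_min_max_ploidy() with exactly four uint32_t parameters and a void return type matters: near-misses compile cleanly but are never called.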

Understanding the order_type parameter

The order_type argument to set_number_of_entries() is used to communicate the type of data being returned. Its values are defined in bgen/types.hpp and have the following meanings:

| order_type | Meaning |
|---|---|
| ePerUnorderedGenotype | Data is unphased, i.e. one probability per possible genotype. |
| ePerOrderedHaplotype | Data is phased, i.e. one probability per possible allele per haplotype. |

Additional values declared in bgen/types.hpp are currently unused in the bgen API.

Note: For bgen <= 1.1, all data is unphased, so order_type is always ePerUnorderedGenotype.
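The meanings in the order_type table imply how many probability values to expect per sample. For unphased data there is one value per unordered genotype, i.e. per multiset of P alleles drawn from K, giving C(P + K - 1, K - 1) values; for phased data there is one value per allele per haplotype, giving P * K values. The sketch below computes both and maps a phased value index to a (haplotype, allele) pair. It assumes phased values arrive haplotype by haplotype; that ordering, the helper names and the local `OrderType` stand-in are assumptions, and the counts are best used as a sanity check against V rather than as a guarantee of what the library passes in every case (for example, for missing data).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for bgen::OrderType (assumption).
enum OrderType { ePerUnorderedGenotype, ePerOrderedHaplotype };

// Binomial coefficient C(n, k). After step i, result == C(n - k + i, i),
// so every division is exact.
uint64_t choose(uint64_t n, uint64_t k) {
    if (k > n) return 0;
    uint64_t result = 1;
    for (uint64_t i = 1; i <= k; ++i) {
        result = result * (n - k + i) / i;
    }
    return result;
}

// Number of probabilities implied by order_type for ploidy P and K alleles.
uint64_t expected_number_of_values(uint32_t P, uint32_t K, OrderType order_type) {
    if (K == 0) return 0;
    if (order_type == ePerOrderedHaplotype) {
        return static_cast<uint64_t>(P) * K;  // K alleles for each of P haplotypes
    }
    // Unordered genotypes: multisets of size P drawn from K alleles.
    return choose(static_cast<uint64_t>(P) + K - 1, K - 1);
}

// Assumption: phased values are grouped by haplotype, K values per haplotype,
// so value j refers to haplotype j / K and allele j % K.
struct HaplotypeAllele {
    uint32_t haplotype;
    uint32_t allele;
};

HaplotypeAllele phased_index(uint32_t j, uint32_t K) {
    return HaplotypeAllele{ j / K, j % K };
}
```

For a diploid, biallelic variant this gives 3 unphased values (one per genotype) and 4 phased values (2 alleles for each of 2 haplotypes).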
