Merging and conversion of .gen to .bgen

Issue #36 new
Former user created an issue

Dear Gavin,

Just as a follow-up while I wait to hear from you: I can now confirm that the analysis of ~156,000 (genotyped ~60K and imputed ~90K) ran successfully with a .bgen file generated by the two-step process, i.e. merging genotyped.bgen and imputed.bgen into genotyped_imputed.gen and then converting this .gen to .bgen.

I will wait to hear your thoughts on what I could be doing wrong in the one-step merging and generation of the .bgen file.

Kind Regards, Praveen.

Dear Gavin,

I was trying to merge a set of genotyped UKBiobank variants with imputed UKBiobank variants to create a .bgen/.sample file and analyse the variants in SNPSTATS, BOLT-LMM and Rvtest. Since all of these jobs got killed, I did some testing with a subset of 10 variants in ~450K samples.

For genotyped variants,

  • Extracted 10 variants and 450K samples from the PLINK files to create .bed/.bim/.fam files
  • Converted this fileset to genotyped.bgen and genotyped.samples using qctool

For imputed variants,

  • Extracted 10 variants and 450K samples from the UKBiobank HRC data to create imputed.bgen and imputed.samples using qctool.

I then merged these two datasets:

qctool -g imputed.bgen -s imputed.samples -merge-in genotyped.bgen genotyped.samples -og genotyped_imputed.bgen -os genotyped_imputed.samples

When I tried to analyse variants in this merged file, all the jobs got killed. To understand the issue, and to check whether it is a chromosome/position formatting problem, I tried to generate SNP stats with qctool using the following command:

qctool -g genotyped_imputed.bgen -snp-stats -osnp genotyped_imputed.stats

This terminated with the following error:

Processing SNPs : (4/?,4.0s,1.0/s)qctool: ../genfile/include/genfile/zlib.hpp:70: void genfile::zlib_uncompress(const byte_t, const byte_t, std::vector<T>*) [with T = unsigned char; genfile::byte_t = unsigned char]: Assertion `result == 0' failed. Aborted

But if I follow the steps below, the final .bgen/.samples works fine with qctool and all other programs:

qctool -g imputed.bgen -s imputed.samples -merge-in genotyped.bgen genotyped.samples -og genotyped_imputed_stepone.gen -os genotyped_imputed_stepone.samples

qctool -g genotyped_imputed_stepone.gen -s genotyped_imputed_stepone.samples -og genotyped_imputed_steptwo.bgen -os genotyped_imputed_steptwo.samples

I tried using zlib compression in the single step of merging and creating the merged .bgen file, but this didn’t work. While the second option is okay, I would like to avoid writing a large intermediate .gen file just to obtain a working .bgen file, given that my final merged dataset contains 250,000 variants, a combination of ~50K genotyped variants and ~150K imputed variants.
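
Since the assertion fails inside zlib_uncompress, one check that might help narrow this down is comparing the header flags (compression and layout) of the two input files and the merged file. Below is a minimal sketch, assuming the header layout described in the BGEN format specification (a 4-byte offset, then header length, variant count, sample count, the magic bytes, a free data area, and a 4-byte flags field). The helper name `read_bgen_header` is hypothetical and not part of qctool, and a header mismatch is only one possible cause.

```python
import struct

# Values of the two lowest flag bits (CompressedSNPBlocks) per the BGEN spec.
COMPRESSION = {0: "none", 1: "zlib", 2: "zstd"}


def read_bgen_header(path):
    """Return basic header fields of a BGEN file (hypothetical helper)."""
    with open(path, "rb") as fh:
        # Offset of the first variant block, header length, variant count and
        # sample count: four little-endian unsigned 32-bit integers.
        offset, header_len, n_variants, n_samples = struct.unpack("<4I", fh.read(16))
        magic = fh.read(4)
        # Skip the free data area (header_len - 20 bytes) to reach the flags.
        fh.seek(header_len - 20, 1)
        (flags,) = struct.unpack("<I", fh.read(4))
    return {
        "magic": magic,
        "n_variants": n_variants,
        "n_samples": n_samples,
        "compression": COMPRESSION.get(flags & 0b11, "unknown"),
        "layout": (flags >> 2) & 0b1111,
        "has_sample_ids": bool(flags >> 31),
    }


if __name__ == "__main__":
    import sys

    # Print the header fields of every file given on the command line.
    for path in sys.argv[1:]:
        print(path, read_bgen_header(path))
```

Saved under a name of your choice, the script could be run with imputed.bgen, genotyped.bgen and genotyped_imputed.bgen as arguments. If the inputs report different layouts or compression settings, that would be worth mentioning alongside the error above.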

Sorry about this long e-mail; I hope I have managed to explain the case. I am almost certain that I am missing some options that I should use while merging and creating a .bgen/.sample file, but I can’t figure out what’s going on.

Thank you in advance for looking into this.

Kind Regards, Praveen.
