Just as a follow-up while I wait to hear from you: I can now confirm that the analysis of ~156,000 (genotyped ~60K and imputed ~90K) ran successfully with a .bgen file generated by the two-step process, i.e. merging genotyped.bgen and imputed.bgen into genotyped_imputed.gen and then converting this .gen to .bgen.
I will wait to hear your thoughts on what I could be doing wrong in the one-step merge and generation of the .bgen file.
Kind Regards, Praveen.
I was trying to merge a set of genotyped UKBiobank variants with imputed UKBiobank variants to create a .bgen/.sample file and analyse the variants in SNPSTATS, BOLT-LMM and Rvtest. Since all the jobs got killed, I did some testing with a subset of 10 variants in ~450K samples.
For genotyped variants,
- Extracted 10 variants and 450K samples from PLINK files to create .bed/.bim/.fam files
- Converted this .bfile to genotyped.bgen and genotyped.samples in qctool
For imputed variants,
- Extracted 10 variants and 450K samples from UKBiobank HRC data to create imputed.bgen and imputed.samples files in qctool.
And then merged these two datasets -
qctool -g imputed.bgen -s imputed.samples -merge-in genotyped.bgen genotyped.samples -og genotyped_imputed.bgen -os genotyped_imputed.samples
When I tried to analyse variants in this merged file, all the jobs got killed. So I tried to generate SNP stats in qctool with the command below, to understand the issue and to check whether it is a chromosome/position formatting problem.
qctool -g genotyped_imputed.bgen -snp-stats -osnp genotyped_imputed.stats
This terminated with the following error -
Processing SNPs : (4/?,4.0s,1.0/s)
qctool: ../genfile/include/genfile/zlib.hpp:70: void genfile::zlib_uncompress(const byte_t, const byte_t, std::vector<T>*) [with T = unsigned char; genfile::byte_t = unsigned char]: Assertion `result == 0' failed.
Aborted
But if I follow the steps below, the final .bgen/.samples works fine with qctool and all other programs -
qctool -g imputed.bgen -s imputed.samples -merge-in genotyped.bgen genotyped.samples -og genotyped_imputed_stepone.gen -os genotyped_imputed_stepone.samples
qctool -g genotyped_imputed_stepone.gen -s genotyped_imputed_stepone.samples -og genotyped_imputed_steptwo.bgen -os genotyped_imputed_steptwo.samples
I tried using zlib compression in the single step of merging and creating the merged .bgen file, but this didn't work. While the second option is okay, I would like to avoid writing a large .gen file just to create a working .bgen file, given that my final merged dataset contains 250,000 variants, a combination of ~50K genotyped variants and ~150K imputed variants.
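In the meantime, a minimal sketch of how the two-step workaround could be wrapped so that the intermediate .gen file is deleted as soon as the conversion finishes. It still writes the .gen file, so it limits how long the disk space is used rather than avoiding it. The function name merge_two_step and the QCTOOL_BIN variable are hypothetical names for this example; the qctool options are only the ones used in the commands above, and the input file names are assumed to be in the current directory.

```shell
#!/bin/sh
# Sketch only: runs the two qctool steps and removes the intermediate .gen.
# QCTOOL_BIN and merge_two_step are hypothetical names for this example.
QCTOOL_BIN="${QCTOOL_BIN:-qctool}"

merge_two_step() {
    out_prefix="$1"                      # e.g. genotyped_imputed
    tmp_dir=$(mktemp -d) || return 1     # holds the temporary .gen/.samples

    # Step one: merge the two .bgen inputs into a temporary .gen file.
    "$QCTOOL_BIN" -g imputed.bgen -s imputed.samples \
        -merge-in genotyped.bgen genotyped.samples \
        -og "$tmp_dir/merged.gen" -os "$tmp_dir/merged.samples" \
        || { rm -rf "$tmp_dir"; return 1; }

    # Step two: convert the temporary .gen file to the final .bgen file.
    "$QCTOOL_BIN" -g "$tmp_dir/merged.gen" -s "$tmp_dir/merged.samples" \
        -og "$out_prefix.bgen" -os "$out_prefix.samples"
    status=$?

    # Remove the intermediate files whether or not step two succeeded.
    rm -rf "$tmp_dir"
    return "$status"
}
```

Calling merge_two_step genotyped_imputed would then leave only genotyped_imputed.bgen and genotyped_imputed.samples behind.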
Sorry about this long e-mail; I hope I managed to explain the case. I am almost certain that I am missing some options that I should use while merging and creating a .bgen/.sample file, but I can't figure out what's going on.
Thank you in advance for looking into this.
Kind Regards, Praveen.