Merging cohorts

Issue #67 new
Former user created an issue

Hi,

I have QCTOOL v2.0.3 installed.

I want to merge three datasets into one, but qctool throws an error (see the log below). How can I fix this?

Best!

Sander

Welcome to qctool (version: 2.0.3, revision 554d151)

(C) 2009-2017 University of Oxford

Opening genotype files : [ ] (0/3,0.0s,0.0/s)
Opening genotype files : [****] (3/3,0.1s,51.3/s)
Opening genotype files : [****] (3/3,0.1s,45.5/s)
========================================================================

Input SAMPLE file(s): "aegs1_postqcmichimp_n890.txt" "aegs2_postqcmichimp_n954.txt" "aegs3_postqcmichimp_n649.txt"
Output SAMPLE file: "aegs.1kgp3.chr22.sample".
Sample exclusion output file: "(n/a)".

Input GEN file(s):
                             cohort 1:
                                     (not computed)  "../AEGS1_AffySNP5/AEGS1_SNP5_1000Gp3/aegs1.1kgp3.chr22.dose.vcf.gz"
                                     (total 1 sources, number of snps not computed).
                  Number of samples: 890

                             cohort 2:
                                     (not computed)  "../AEGS2_AffyAxiomGWCEU1/AEGS2_AxiomGWCEU_1000Gp3/aegs2.1kgp3.chr22.dose.vcf.gz"
                                     (total 1 sources, number of snps not computed).
                  Number of samples: 954

                             cohort 3:
                                     (not computed)  "../AEGS3_GSA/AEGS3_GSA_1000Gp3/aegs3.1kgp3.chr22.dose.vcf.gz"
                                     (total 1 sources, number of snps not computed).
                  Number of samples: 649

                  Total all cohorts: 2493 samples.
    Number of samples (post-filter): 2143

Output GEN file(s): "aegs.1kgp3.chr22.vcf.gz"
Output SNP position file(s): (n/a)
Sample filter: QC2018_FILTER IN (passed).

# of samples in input files: 2493.
# of samples after filtering: 2143 (350 filtered out).

========================================================================

VCFFormatSNPDataSink::write_header(): FORMAT entries are:
##FORMAT=<ID=GT,Type=String,Number=1,Description="Genotype">
##FORMAT=<ID=DS,Type=Float,Number=1,Description="Estimated Alternate Allele Dosage : [P(0/1)+2*P(1/1)]">
##FORMAT=<ID=GP,Type=Float,Number=3,Description="Estimated Posterior Probabilities for Genotypes 0/0, 0/1 and 1/1 ">

Processing SNPs : (0/?,0.0s,0.0/s)
qctool_v203: ../genfile/src/VCFFormatSNPDataSink.cpp:77: virtual bool genfile::{anonymous}::DataWriter::set_sample(std::size_t): Assertion `m_state == eInitialised || m_state == eValueSet || m_state == eSampleSet' failed.
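
As a quick sanity check on the log above, the sample counts it reports can be verified by hand. The sketch below hard-codes the numbers printed in the log (it does not parse qctool output, and the variable names are illustrative only) and confirms that the totals add up, which suggests the failure comes from the VCF writer and not from the sample filter.

```python
# Sample counts copied from the qctool log above (one entry per cohort).
cohort_sizes = {"cohort 1": 890, "cohort 2": 954, "cohort 3": 649}

# Values reported by qctool after applying the QC2018_FILTER sample filter.
reported_total = 2493
reported_post_filter = 2143
reported_filtered_out = 350

# Recompute the totals from the per-cohort counts.
total = sum(cohort_sizes.values())
filtered_out = total - reported_post_filter

counts_consistent = (
    total == reported_total and filtered_out == reported_filtered_out
)
print(f"total={total}, filtered_out={filtered_out}, consistent={counts_consistent}")
```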

Comments (9)

  1. Gavin Band repo owner

    It looks like this works in qctool_v2.0.1 but not in qctool_v2.0.3, so it is likely a recent bug. Can you use the earlier version for now?

  2. Gavin Band repo owner

    Fixed in 98bb83e06001 (qctool_v2.0.4)

    Additionally, there was an issue with the order of VCF output fields in qctool_v2.0.1 using your data - this is now fixed so it mirrors the input order.

  3. Sander W. van der Laan

    What is different about our data compared with what you would normally expect? I am trying to understand why the fields would change order. These data come straight from the Michigan Imputation Server, and I am just merging them using qctool.

  4. Gavin Band repo owner

    Nothing really different. It’s a bug (#68) that I didn’t know about before, but it’s pretty annoying if you run into it, so I’ve fixed it.
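
For anyone scripting around this, here is a minimal sketch of how one might check whether an installed qctool already includes the fix mentioned above (qctool_v2.0.4). It assumes the startup banner has the same shape as the one in the log ("version: X.Y.Z"). The function names and the regular expression are hypothetical helpers, not part of qctool.

```python
import re

# Assumption: the banner looks like the one in the log above, e.g.
# "Welcome to qctool (version: 2.0.3, revision 554d151)".
BANNER_VERSION_RE = re.compile(r"version:\s*(\d+(?:\.\d+)*)")

# First release reported in this thread to contain the fix.
FIXED_IN = (2, 0, 4)


def parse_version(banner):
    """Extract the version from a qctool banner as a tuple of ints."""
    match = BANNER_VERSION_RE.search(banner)
    if match is None:
        raise ValueError(f"no version found in banner: {banner!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def has_merge_fix(banner):
    """Return True if the banner reports version 2.0.4 or later."""
    return parse_version(banner) >= FIXED_IN


banner = "Welcome to qctool (version: 2.0.3, revision 554d151)"
print(parse_version(banner), has_merge_fix(banner))
```

Comparing tuples of integers avoids the pitfalls of comparing version strings directly (for example, "2.0.10" sorts before "2.0.4" as text).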
