Error sample/snp stats - source malformed

Issue #70 (status: new)
Arjan Boltjes created an issue

I created vcf.gz files per chromosome for a sample of our genotyping data, together with a .sample file.
If I check these by computing sample stats or SNP stats, this works fine and the stats are written to a text file. The commands, e.g. for chromosome 22:

qctool_v204 -g TAX_aegs_combo_1kGp3GoNL5_RAW_chr22.vcf.gz -s TAX_aegs_combo_1kGp3GoNL5_RAW.taxonomisis.sample -sample-stats -osample TAX_aegs_combo_1kGp3GoNL5_RAW_chr22.samplestat.txt

qctool_v204 -g TAX_aegs_combo_1kGp3GoNL5_RAW_chr22.vcf.gz -s TAX_aegs_combo_1kGp3GoNL5_RAW.taxonomisis.sample -snp-stats -osnp TAX_aegs_combo_1kGp3GoNL5_RAW_chr22.snpstat.txt

Subsequently, I renamed the sample names in the vcf.gz files with bcftools, like this:

bcftools_v1.6 reheader -s TAX_sample_names_ordered.txt -o TAX_REN_aegs_combo_1kGp3GoNL5_RAW_chr22.vcf.gz TAX_aegs_combo_1kGp3GoNL5_RAW_chr22.vcf.gz
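A minimal Python model of what a sample-renaming step does, to show how stray characters in the names file would end up in the VCF header. It assumes the names file (TAX_sample_names_ordered.txt) has one name per line and is read without stripping carriage returns or quotes. How bcftools reheader itself handles those characters has not been checked here, so this is a worst-case sketch, not a description of bcftools. The helpers `read_names` and `reheader` are hypothetical.

```python
def read_names(text):
    """Split a names file into one name per line, as a naive reader would.

    Hypothetical helper: it deliberately does NOT strip carriage returns or
    quote marks, to model a names file written with Windows line endings or
    R-style quoting.
    """
    return [line for line in text.split("\n") if line != ""]


def reheader(vcf_lines, new_names):
    """Replace the sample columns (10th field onwards) of the #CHROM line."""
    out = []
    for line in vcf_lines:
        if line.startswith("#CHROM"):
            fields = line.split("\t")
            n_samples = len(fields) - 9  # the first 9 columns are fixed VCF columns
            if n_samples != len(new_names):
                raise ValueError(
                    f"header has {n_samples} samples but {len(new_names)} names were given"
                )
            line = "\t".join(fields[:9] + list(new_names))
        out.append(line)
    return out


fixed = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT"]
header = "\t".join(fixed + ["S1", "S2"])
# A names file saved with quotes and CRLF line endings (assumed, for illustration).
names = read_names('"sampleA"\r\n"sampleB"\r\n')
renamed = reheader(["##fileformat=VCFv4.2", header], names)
print(repr(renamed[1]))
```

If the real names file contained such characters, the renamed header would carry them too, and a trailing carriage return is invisible in most terminals.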

Listing the sample names shows that the renaming seems to have worked properly:

bcftools_v1.6 query -l TAX_REN_aegs_combo_1kGp3GoNL5_RAW_chr22.vcf.gz
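Because a trailing carriage return does not show up when the names are printed, a visual check of the `query -l` output can look fine even if the names are not. A small Python sketch that flags suspicious sample names; `read_lines_raw` and `suspicious_names` are hypothetical helpers, and the checks (carriage returns, quotes, spaces or tabs, duplicates) are an assumed set, not a list taken from qctool's documentation.

```python
import collections


def read_lines_raw(path):
    """Read a text file without newline translation, so carriage returns survive."""
    with open(path, newline="") as handle:
        return [line for line in handle.read().split("\n") if line != ""]


def suspicious_names(names):
    """Return (index, name, problems) for each name that looks odd."""
    counts = collections.Counter(names)
    report = []
    for index, name in enumerate(names):
        problems = []
        if "\r" in name:
            problems.append("carriage return")
        if '"' in name or "'" in name:
            problems.append("quote character")
        if " " in name or "\t" in name:
            problems.append("space or tab")  # .sample files are whitespace-delimited
        if counts[name] > 1:
            problems.append("duplicate")
        if problems:
            report.append((index, name, problems))
    return report


demo = ["sampleA", '"sampleB"', "sampleC\r", "sampleA"]
for index, name, problems in suspicious_names(demo):
    print(index, repr(name), ", ".join(problems))
```

To check the real file, one could redirect the `query -l` output to a file (the name names.txt is only an example) and pass `read_lines_raw("names.txt")` to `suspicious_names`.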

However, when I now compute sample or SNP stats, both calls fail with an error:

qctool_v204 -g TAX_REN_aegs_combo_1kGp3GoNL5_RAW_chr22.vcf.gz -s TAX_REN_aegs_combo_1kGp3GoNL5_RAW.taxonomisis.sample -sample-stats -osample TAX_REN_aegs_combo_1kGp3GoNL5_RAW_chr22.samplestat.txt


Error from the sample-stats call:
!! Error (genfile::MalformedInputError): Source "TAX_REN_aegs_combo_1kGp3GoNL5_RAW_chr22.vcf.gz" is malformed on line 55..


qctool_v204 -g TAX_REN_aegs_combo_1kGp3GoNL5_RAW_chr22.vcf.gz -s TAX_REN_aegs_combo_1kGp3GoNL5_RAW.taxonomisis.sample -snp-stats -osnp TAX_REN_aegs_combo_1kGp3GoNL5_RAW_chr22.snpstat.txt


Error from the snp-stats call:
!! value = "!! Error reading data for variant rs201906224:16051722:TA:T 22 16051722 TA T: Source "TAX_REN_aegs_combo_1kGp3GoNL5_RAW_chr22.vcf.gz" is malformed on line 55.".
qctool_v204: ../statfile/src/DelimitedStatSink.cpp:85: std::string statfile::{anonymous}::escape_string(const string&, char, const string&, const string&, bool): Assertion `value.find_first_of( begin_quote ) == std::string::npos' failed.
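The failed assertion requires that the value being written (`value`) contain no `begin_quote` character. Judging from the log, `value` here is qctool's own error message, and that message puts the file name in double quotes. Assuming `begin_quote` is a double quote (not verified against DelimitedStatSink.cpp), the Python sketch below reproduces the check and suggests the snp-stats crash may be a knock-on effect of the line-55 error rather than a second problem. `quote_for_output` is a hypothetical stand-in for qctool's `escape_string`, not its actual code.

```python
def quote_for_output(value, begin_quote='"', end_quote='"'):
    """Hypothetical analogue of the check named in the assertion message.

    Like std::string::find_first_of, this looks for ANY character of
    begin_quote inside value; begin_quote='"' is an assumption.
    """
    if any(ch in value for ch in begin_quote):
        # qctool aborts via assert(); raising AssertionError mirrors that here.
        raise AssertionError("value.find_first_of( begin_quote ) == std::string::npos failed")
    return begin_quote + value + end_quote


# The message text copied from the snp-stats log.
message = (
    "!! Error reading data for variant rs201906224:16051722:TA:T 22 16051722 TA T: "
    'Source "TAX_REN_aegs_combo_1kGp3GoNL5_RAW_chr22.vcf.gz" is malformed on line 55.'
)

try:
    quote_for_output(message)
    fired = False
except AssertionError:
    fired = True
print("assertion fired:", fired)
```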

By the way, I also changed the sample names in the .sample file via R (ID_1, ID_2, or both), but this does not seem to be the issue: it still works with the unchanged vcf.gz files (whether I changed ID_1 or ID_2).
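Since the renamed IDs have to line up between the .sample file and the VCF header, a quick comparison can rule that out. This Python sketch assumes the usual Oxford .sample layout (a row of column names, a second row of column types such as `0 0 0`, then one whitespace-delimited row per sample). How qctool actually matches samples (by ID_1, ID_2, or row order) is not assumed, so the sketch reports both set differences and whether the order agrees. `sample_file_ids` and `compare_ids` are hypothetical helpers.

```python
def sample_file_ids(text, column="ID_1"):
    """Return the values of one ID column from an Oxford-format .sample file."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    header = rows[0]
    col = header.index(column)  # raises ValueError if the column is absent
    return [row[col] for row in rows[2:]]  # rows[1] is the column-type row


def compare_ids(sample_ids, vcf_names):
    """Compare .sample IDs with VCF sample names (e.g. from bcftools query -l)."""
    in_vcf = set(vcf_names)
    in_sample = set(sample_ids)
    return {
        "only_in_sample_file": [s for s in sample_ids if s not in in_vcf],
        "only_in_vcf": [v for v in vcf_names if v not in in_sample],
        "same_order": list(sample_ids) == list(vcf_names),
    }


# Hypothetical example: R's write.table quotes strings unless quote=FALSE.
sample_text = 'ID_1 ID_2 missing\n0 0 0\n"s1" "s1" 0\ns2 s2 0\n'
print(compare_ids(sample_file_ids(sample_text), ["s1", "s2"]))
```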

Might this be an issue with qctool?
Is the VCF file written by bcftools (the renaming step) different from the VCF file written by qctool (the original creation)?
VCF version issues, perhaps?

Comments (1)

  1. Gavin Band (repo owner)

    Hi Arjan,

    A few questions:

    1: Are the sample names on line 55 of the file, or is that line something else?

    2: Is there anything odd about that line, e.g. trailing whitespace? (Hidden Windows-style line endings could cause this too, I think.)

    3: The message seems to suggest something about quote marks, though I don't understand why. Are there by any chance quotation marks in the sample names?
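The first two questions can be answered by looking at the raw bytes of line 55. The Python sketch below reads the gzipped VCF in binary mode (so carriage returns and trailing whitespace are not normalised away), assumes qctool counts lines from 1, and reports whether the line is the #CHROM header, whether it contains a carriage return, trailing whitespace or quote marks, and how its field count compares with the header's. `inspect_line` is a hypothetical helper and the demo file is synthetic.

```python
import gzip
import os
import tempfile


def inspect_line(path, line_number):
    """Describe one 1-based line of a gzipped VCF without normalising its bytes."""
    n_header_fields = None
    with gzip.open(path, "rb") as handle:
        for n, raw in enumerate(handle, start=1):
            line = raw.rstrip(b"\n")  # remove only the LF terminator, keep any CR
            if line.startswith(b"#CHROM"):
                n_header_fields = len(line.split(b"\t"))
            if n == line_number:
                break
        else:
            raise ValueError(f"{path} has fewer than {line_number} lines")
    return {
        "preview": line[:80],
        "is_chrom_header": line.startswith(b"#CHROM"),
        "has_carriage_return": b"\r" in line,
        "has_trailing_whitespace": line != line.rstrip(),
        "has_quote": b'"' in line,  # normal in ## meta lines, suspicious elsewhere
        "n_fields": len(line.split(b"\t")),
        "n_header_fields": n_header_fields,
    }


fixed = [b"#CHROM", b"POS", b"ID", b"REF", b"ALT", b"QUAL", b"FILTER", b"INFO", b"FORMAT"]
demo_lines = [
    b"##fileformat=VCFv4.2",
    b"\t".join(fixed + [b"sampleA", b"sampleB\r"]),  # synthetic CRLF-style ending
    b"22\t16051722\trs201906224\tTA\tT\t.\tPASS\t.\tGT\t0/1\t0/0",
]
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.vcf.gz")
    with gzip.open(demo_path, "wb") as out:
        out.write(b"\n".join(demo_lines) + b"\n")
    demo_report = inspect_line(demo_path, 2)
    data_report = inspect_line(demo_path, 3)
print(demo_report)
```

On the real file this would be `inspect_line("TAX_REN_aegs_combo_1kGp3GoNL5_RAW_chr22.vcf.gz", 55)`. A shell alternative is `zcat TAX_REN_aegs_combo_1kGp3GoNL5_RAW_chr22.vcf.gz | sed -n '55p' | cat -A`, where GNU `cat -A` shows a carriage return as `^M`.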

    If unsure, could you send me the first few lines of the VCF file, including the metadata, please? (Send them to my email if you don't want to post them here.)

    Best wishes,

