VCFTools Incompatible with VCF4.1

#184 Merged at dcd9809
Repository
saketkc
Branch
default
Repository
galaxy
Branch
default
Author
  1. Saket Choudhary
Reviewers
Description

Trello: https://trello.com/card/vcftools-incompatible-with-vcf4-1/506338ce32ae458f6d15e4b3/633

Adding a small check to skip the assertion of tagNumber being an integer for VCF4.1 files.

I am not sure whether there are better ways or better checks that could be implemented to do this, so this needs discussion.

The description of the ##INFO and ##FORMAT tags, as given at [http://www.1000genomes.org/wiki/Analysis/Variant%20Call%20Format/vcf-variant-call-format-version-41], is:

##INFO=<ID=ID,Number=number,Type=type,Description="description">

Possible Types for INFO fields are: Integer, Float, Flag, Character, and String.

The Number entry is an Integer that describes the number of values that can be included with the INFO field. For example, if the INFO field contains a single number, then this value should be 1; if the INFO field describes a pair of numbers, then this value should be 2 and so on. If the field has one value per alternate allele then this value should be 'A'; if the field has one value for each possible genotype (more relevant to the FORMAT tags) then this value should be 'G'. If the number of possible values varies, is unknown, or is unbounded, then this value should be '.'. The 'Flag' type indicates that the INFO field does not contain a Value entry, and hence the Number should be 0 in this case. The Description value must be surrounded by double-quotes. Double-quote character can be escaped with backslash (\") and backslash as \\.
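
To make the rule above concrete, here is a minimal sketch of how a Number entry could be validated so that the VCF4.1 values 'A', 'G' and '.' are accepted alongside plain integers. It is not the code in this pull request and not part of vcfpytools or vcftools; the function name parse_tag_number is hypothetical and only illustrates the idea.

```python
# Hypothetical helper, for illustration only: not taken from vcfpytools or vcftools.
SYMBOLIC_NUMBERS = {"A", "G", "."}  # non-integer Number values allowed by VCF4.1


def parse_tag_number(value):
    """Return an int for a fixed count, or the symbolic value unchanged."""
    if value in SYMBOLIC_NUMBERS:
        return value
    try:
        number = int(value)
    except ValueError:
        raise ValueError("invalid Number entry: %r" % (value,))
    if number < 0:
        raise ValueError("Number entry must not be negative: %r" % (value,))
    return number
```

With a check like this, a header line with Number=A would no longer trip an integer-only assertion, while a malformed value such as Number=x would still be rejected.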

Comments (3)

  1. Jennifer Jackson

    Hi Jeremy -

    It crossed my mind that v4.1 might have been the issue for this one, thanks for working on a fix! bugs abdulhun@gmail.com 6-10

    Could it somehow be related to this ticket (the second issue; the first one I don't know about)? It seems unlikely, since the bcf was created by mpileup in Galaxy. I think the bcftools view tool might just have another problem, or the input data is problematic (I can't figure out how, though), but I am asking to double check. https://trello.com/c/fgppgOeR

    One of the bug reports from the user about the problem in the ticket is here: jyochem@uwyo.edu 5-13

    Anyway, will this go on Main soon? Did it already? I am restarting the job from the first bug report to test, the wrapper is the same version but I don't think that needs a change.

    The second one I'm not sure about - that job eventually hits the wall-time limit and errors out if the problem still exists, so it just eats up resources.

    I am looking through other recent bug reports that are unresolved to see if any others are related - pretty sure this is the only one, but double checking.

    Thanks! jen

  2. Jeremy Goecks

    Thanks for the contribution Saket. IMO, we need to phase out usage of vcfpytools and move to simply wrapping vcftools. vcfpytools is very old and, as you and others have discovered, not compatible with newer versions of the VCF format. vcftools, OTOH, is under active development and driven by the 1000 genomes project.

    Does anyone else feel strongly about either continuing to use vcfpytools or moving to vcftools?