IMPORTANT: The hgvs code and issues moved to https://github.com/biocommons/hgvs on March 9, 2017. This repo contains the content until that date.
hgvs - Python library to parse, format, validate, normalize, and map sequence variants
The hgvs package provides a Python library to facilitate the use of genome, transcript, and protein variants that are represented using the Human Genome Variation Society (varnomen) recommendations.
A preview release of hgvs 0.5.0 is available, with support for GRCh38 and local sequence sources. See installation notes.
- Parsing is based on a formal grammar.
- An easy-to-use object model that represents most variant types (SNVs, indels, dups, inversions, etc.) and concepts (intronic offsets, uncertain positions, intervals)
- A variant normalizer that rewrites variants in canonical forms and substitutes reference sequences (if reference and transcript sequences differ)
- Formatters that generate HGVS strings from internal representations
- Tools to map variants between genome, transcript, and protein sequences
- Reliable handling of regions with reference-transcript discrepancies
- Pluggable data providers support alternative sources of transcript mapping data
- Extensive automated tests, including those for all variant types and "problematic" transcripts
- You are encouraged to browse issues. Please report any issues you find.
- Use a pip package specification to ensure that you stay within a minor release series for API stability. For example, hgvs >=0.4,<0.5.
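To illustrate which versions a range such as >=0.4,<0.5 admits, here is a minimal standalone sketch. It is not how pip evaluates specifiers: the helper names parse_release and in_range are hypothetical, and the sketch assumes plain dotted-integer versions (no pre-release suffixes such as rc1).

```python
def parse_release(version):
    """Turn '0.4.13' into (0, 4, 13); only plain dotted integers are handled."""
    return tuple(int(part) for part in version.split("."))


def in_range(version, lower="0.4", upper="0.5"):
    """Return True if lower <= version < upper.

    Release tuples are zero-padded to a common length first, so that
    '0.4' and '0.4.0' compare as equal.
    """
    parts = [parse_release(v) for v in (lower, version, upper)]
    width = max(len(p) for p in parts)
    lo, ver, hi = (p + (0,) * (width - len(p)) for p in parts)
    return lo <= ver < hi


# Patch releases in the 0.4 series are accepted; 0.3.x and 0.5.x are not.
for candidate in ("0.3.9", "0.4.0", "0.4.13", "0.5.0"):
    print(candidate, in_range(candidate))
```

In practice the range goes straight into a requirements file or an install_requires entry, and pip does the comparison itself.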
These examples are for the upcoming 0.5.0 release.
See Installation instructions if you have installation troubles.
    $ mkvirtualenv hgvs-test
    (hgvs-test)$ pip install --upgrade setuptools
    (hgvs-test)$ pip install hgvs
    (hgvs-test)$ python

    >>> import hgvs.assemblymapper
    >>> import hgvs.dataproviders.uta
    >>> import hgvs.parser
    >>> import hgvs.variantmapper

    # start with these variants as strings
    >>> hgvs_g, hgvs_c = 'NC_000007.13:g.36561662C>T', 'NM_001637.3:c.1582G>A'

    # parse the genomic variant into a Python structure
    >>> hp = hgvs.parser.Parser()
    >>> var_g = hp.parse_hgvs_variant(hgvs_g)
    >>> var_g
    SequenceVariant(ac=NC_000007.13, type=g, posedit=36561662C>T)

    # SequenceVariants are composed of structured objects, e.g.,
    >>> var_g.posedit.pos.start
    SimplePosition(base=36561662, uncertain=False)

    # format by stringification
    >>> str(var_g)
    'NC_000007.13:g.36561662C>T'

    # initialize the mapper for GRCh37 with splign-based alignments
    >>> hdp = hgvs.dataproviders.uta.connect()
    >>> evm = hgvs.assemblymapper.AssemblyMapper(hdp,
    ...     assembly_name='GRCh37', alt_aln_method='splign',
    ...     replace_reference=True)

    # identify transcripts that overlap this genomic variant
    >>> transcripts = evm.relevant_transcripts(var_g)
    >>> sorted(transcripts)
    ['NM_001177506.1', 'NM_001177507.1', 'NM_001637.3']

    # map genomic variant to one of these transcripts
    >>> var_c = evm.g_to_c(var_g, 'NM_001637.3')
    >>> var_c
    SequenceVariant(ac=NM_001637.3, type=c, posedit=1582G>A)
    >>> str(var_c)
    'NM_001637.3:c.1582G>A'

    # CDS coordinates use BaseOffsetPosition to support intronic offsets
    >>> var_c.posedit.pos.start
    BaseOffsetPosition(base=1582, offset=0, datum=1, uncertain=False)

    # VARIANT NORMALIZATION

    # rewrite ins as dup (depends on sequence context)
    >>> import hgvs.normalizer
    >>> hn = hgvs.normalizer.Normalizer(hdp)
    >>> hn.normalize(hp.parse_hgvs_variant('NM_001166478.1:c.35_36insT'))
    SequenceVariant(ac=NM_001166478.1, type=c, posedit=35dupT)

    # during mapping, variants are normalized (by default)
    >>> c1 = hp.parse_hgvs_variant('NM_001166478.1:c.31del')
    >>> c1
    SequenceVariant(ac=NM_001166478.1, type=c, posedit=31del)
    >>> c1n = hn.normalize(c1)
    >>> c1n
    SequenceVariant(ac=NM_001166478.1, type=c, posedit=35delT)
    >>> g = evm.c_to_g(c1)
    >>> g
    SequenceVariant(ac=NC_000006.11, type=g, posedit=49917127delA)
    >>> c2 = evm.g_to_c(g, c1.ac)
    >>> c2
    SequenceVariant(ac=NM_001166478.1, type=c, posedit=35delT)
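The normalization results above (31del rewritten as 35delT, and 35_36insT rewritten as 35dupT) both depend on the surrounding sequence. The following standalone sketch shows the general idea on a made-up toy sequence. It does not use the hgvs package and is not its implementation; the function names and the sequence are hypothetical, and real normalization covers many more cases.

```python
def shift_deletion_3prime(seq, start, end):
    """Shift the deletion of seq[start:end] (0-based, half-open) as far 3' as possible.

    Moving the deleted window one base to the right leaves the resulting
    sequence unchanged whenever the base leaving the window equals the
    base entering it.
    """
    while end < len(seq) and seq[start] == seq[end]:
        start += 1
        end += 1
    return start, end


def insertion_as_dup(seq, left, inserted):
    """Check whether an insertion is a duplication of the preceding bases.

    `left` is the 1-based position immediately 5' of the insertion point.
    Returns the 1-based inclusive (start, end) of the duplicated bases,
    or None if the inserted bases do not copy the bases ending at `left`.
    """
    n = len(inserted)
    if 0 < n <= left and seq[left - n:left] == inserted:
        return left - n + 1, left
    return None


toy = "GTTTTTC"  # made-up sequence: a run of T at 1-based positions 2..6

# Deleting the first T of the run is equivalent to deleting the last one.
start, end = shift_deletion_3prime(toy, 1, 2)
print(start + 1, end)  # 1-based inclusive coordinates after shifting

# Inserting T right after position 6 duplicates the T at position 6.
print(insertion_as_dup(toy, 6, "T"))
```

Shifting to the 3'-most position means that equivalent descriptions of the same change collapse to a single string, which is what makes normalized variants comparable.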
There are more examples in the documentation.
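As a further standalone illustration of the accession, type, and position/edit structure seen in the session above, the sketch below splits a simple substitution string into its parts with a regular expression. This is a toy under stated assumptions: ToyVariant, split_simple_substitution, and format_toy are hypothetical names, only single-base substitutions are recognized, and the hgvs package itself relies on a formal grammar rather than a pattern like this.

```python
import re
from collections import namedtuple

# Hypothetical container; not the SequenceVariant class from hgvs.
ToyVariant = namedtuple("ToyVariant", "ac type pos ref alt")

# Matches only strings of the form <accession>:<type>.<position><ref>><alt>.
_SUBSTITUTION = re.compile(
    r"^(?P<ac>[^:\s]+):(?P<type>[cgmn])\.(?P<pos>\d+)(?P<ref>[ACGT])>(?P<alt>[ACGT])$"
)


def split_simple_substitution(text):
    """Split a simple substitution such as 'NC_000007.13:g.36561662C>T'."""
    match = _SUBSTITUTION.match(text)
    if match is None:
        raise ValueError("not a simple substitution: %r" % text)
    fields = match.groupdict()
    fields["pos"] = int(fields["pos"])
    return ToyVariant(**fields)


def format_toy(variant):
    """Rebuild the string form from the parts."""
    return "{v.ac}:{v.type}.{v.pos}{v.ref}>{v.alt}".format(v=variant)


variant = split_simple_substitution("NC_000007.13:g.36561662C>T")
print(variant)
print(format_toy(variant))
```

Anything beyond this narrow case (intronic offsets, ranges, uncertain positions, other edit types) is rejected with a ValueError here, which is exactly the kind of input a grammar-based parser is designed to handle.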
Citing hgvs (the package)
The hgvs package is intended to be a community project. Please see Contributing to get started in submitting source code, tests, or documentation. Thanks for getting involved!