Mouse GO annotations should reference MGI

Issue #100 new
Chris Mungall created an issue

mouse Reelin: https://www.wikidata.org/wiki/Q14331165

(aside: have you considered affixing the species to the label, at least for non-human, to avoid masses of identically named proteins once you start bringing in more species?)

When I click on one of the biological processes, such as spinal cord patterning (GO:0021511), it says:

Stated in UniProt
UniProt ID Q60841
retrieved 10 August 2016
language of work or name English

This is misleading. The annotation was made by MGI, not UniProt.

Additionally, it would be useful to see other evidence and provenance, as in:

http://amigo.geneontology.org/amigo/gene_product/MGI:MGI:103022

(I believe this is coming as it was in Ben's talk at ICBO)

I would submit a PR, but I find the SPARQL queries on the UniProt server a bit opaque. If you are interested in Python code for pulling directly from source files, I may be able to contribute.
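
A minimal sketch of what pulling from source files could look like, assuming the annotations are read from a GAF 2.x file (17 tab-separated columns, with "Assigned By" in column 15). The names `GafRow`, `parse_gaf` and `provenance` are illustrative and not part of this repo:

```python
from collections import namedtuple

# Column names follow the GAF 2.x layout; the identifiers are illustrative.
GAF_COLUMNS = [
    "db", "db_object_id", "db_object_symbol", "qualifier", "go_id",
    "db_reference", "evidence_code", "with_from", "aspect",
    "db_object_name", "db_object_synonym", "db_object_type",
    "taxon", "date", "assigned_by", "annotation_extension",
    "gene_product_form_id",
]
GafRow = namedtuple("GafRow", GAF_COLUMNS)


def parse_gaf(lines):
    """Yield one GafRow per annotation line, skipping '!' header lines."""
    for line in lines:
        if line.startswith("!") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        # Pad short rows so that the optional trailing columns are present.
        fields += [""] * (len(GAF_COLUMNS) - len(fields))
        yield GafRow(*fields[:len(GAF_COLUMNS)])


def provenance(row):
    """Collect the fields needed to cite the group that made the annotation."""
    return {
        "go_id": row.go_id,
        "assigned_by": row.assigned_by,   # e.g. MGI rather than UniProt
        "evidence": row.evidence_code,
        "reference": row.db_reference,
    }

# Usage (hypothetical local file name):
#   with open("mgi.gaf") as handle:
#       refs = [provenance(row) for row in parse_gaf(handle)]
```

With the assigning group available per row, the "stated in" reference could be chosen from `assigned_by` instead of defaulting to UniProt.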

Comments (4)

  1. Sebastian Burgstaller

    Thanks for your input! Adding species info to the label instead of the description is a good idea.

    Regarding the GO references: Following the discussion here (https://github.com/OBOFoundry/OBOFoundry.github.io/issues/285), I am currently working on how best to represent the complete annotation information available through QuickGO/AmiGO. Currently, I use the QuickGO REST API to retrieve the annotations. I will post updated annotation reference examples in the coming days.

    -sebastian

  2. Chris Mungall reporter

    Thanks!

    Related question: it seems this repo has a variety of different methods for pulling in external info. There are some ontology-specific py files that use rdflib (rdflib is a bit low level and slow for this IMHO), but also a generic one that uses the OLS API. And it sounds like you're switching from UniProt SPARQL to the QuickGO API. Is there a deliberate decision to go with services rather than static exports? There are pros and cons both ways - is there a high-level design document for these decisions? It might make it easier for casual drive-by commenters such as myself to advise.

  3. Sebastian Burgstaller

    For the OBO space, I wrote a generalized OBO importer which queries the OLS REST API directly and generates the ontology structure in Wikidata. Before an ontology can be imported into Wikidata, its properties and xrefs first need to be mapped to Wikidata properties (or the properties need to be proposed for creation in Wikidata). This has worked pretty well for GO. The code still using rdflib is essentially being phased out. Are there any alternatives you would consider which are also easily accessible from Python 3?

    Performance-wise, using OLS is not limiting, as our real bottleneck is the number of write calls per minute the Wikidata API allows us to perform. OLS allows me, with a few lines of code, to walk through the graph JSON and sync an ontology with Wikidata. That seems substantially more convenient than parsing an OWL file, especially in Python.

    Regarding GO annotations: Initially, when I started out with this, the UniProt SPARQL endpoint did not return the full GO annotation set for a UniProt ID (I have not yet checked whether that has changed), nor did it give any details on how the annotations came into existence. This is why I chose to query them separately using QuickGO. Furthermore, QuickGO receives annotation updates faster, and one API call returns the comprehensive annotation for one UniProt ID, including provenance.

    If there are more efficient ways I am not aware of, I would be happy to look into them. Thanks!

    -sebastian
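
A rough sketch of the one-call-per-UniProt-ID pattern described in the comment above. The endpoint path, the `geneProductId` parameter and the response field names (`results`, `goId`, `goEvidence`, `reference`, `assignedBy`) are assumptions based on the present-day QuickGO REST service, not taken from this repo, and should be checked against the QuickGO documentation. The function names are illustrative:

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumed endpoint of the QuickGO annotation search service.
QUICKGO_SEARCH = "https://www.ebi.ac.uk/QuickGO/services/annotation/search"


def build_url(uniprot_id, limit=100):
    """Build the search URL for all annotations of one UniProt accession."""
    params = {"geneProductId": "UniProtKB:" + uniprot_id, "limit": limit}
    return QUICKGO_SEARCH + "?" + urlencode(params)


def fetch_annotations(uniprot_id):
    """Perform the HTTP request and return the decoded JSON payload."""
    request = Request(build_url(uniprot_id),
                      headers={"Accept": "application/json"})
    with urlopen(request) as response:
        return json.load(response)


def summarize(payload):
    """Keep only the fields needed for a Wikidata reference block."""
    return [
        {
            "go_id": record.get("goId"),
            "evidence": record.get("goEvidence"),
            "reference": record.get("reference"),
            "assigned_by": record.get("assignedBy"),  # annotating group
        }
        for record in payload.get("results", [])
    ]

# Usage (requires network access):
#   annotations = summarize(fetch_annotations("Q60841"))
```

Keeping the parsing step separate from the request makes it possible to test the mapping offline and to swap in a static file later.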

  4. Chris Mungall reporter

    For general-purpose Python manipulation of ontologies, a JSON format is the way to go. Sometimes I use Jena to make JSON-LD, but the direct translation is not very programmer-friendly. We tend to use bbop-graph a lot for our stack, but embarrassingly we don't have a direct conversion from an ontology yet. Having said that, the OLS API sounds like it's fine for your purposes.

    Also, for annotation, I see how going one UniProt ID at a time is useful for WD update purposes. Though if you did want to take a static-file approach with Python wrappers, we can talk about that.

    Another tangent here: some groups associate GO terms to the gene, some to the protein (and some to the transcript, but the utility of this is questionable for non-ncRNA). For protein associations, an isoform ID is sometimes used if the function is isoform-specific, but otherwise the protein ID stands in as a kind of superclass of all the isoforms. There are some gotchas you probably don't care about now; I expect you just want to spread annotations from reference proteins to genes as a post-processing step and either ignore isoforms or treat them as a qualifier for now.
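
The post-processing step suggested in the last paragraph could be sketched roughly as follows. It assumes isoform IDs follow the UniProt `ACCESSION-N` convention and that a protein-to-gene mapping is already at hand; `split_isoform`, `propagate_to_genes` and the mapping dict are hypothetical names used for illustration only:

```python
def split_isoform(protein_id):
    """Return (reference accession, isoform ID or None).

    'Q60841-2' -> ('Q60841', 'Q60841-2'); 'Q60841' -> ('Q60841', None).
    """
    base, sep, suffix = protein_id.partition("-")
    if sep and suffix.isdigit():
        return base, protein_id
    return protein_id, None


def propagate_to_genes(annotations, protein_to_gene):
    """Copy (protein_id, go_id) annotations onto genes.

    The isoform, if any, is kept as a qualifier on the gene-level record.
    Proteins without a known gene are skipped.
    """
    gene_annotations = []
    for protein_id, go_id in annotations:
        base, isoform = split_isoform(protein_id)
        gene = protein_to_gene.get(base)
        if gene is None:
            continue
        gene_annotations.append(
            {"gene": gene, "go_id": go_id, "isoform": isoform}
        )
    return gene_annotations
```

Ignoring isoforms entirely would amount to dropping the `isoform` key from the result.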
