EnteroBase Backend Pipeline: MetaParser
MetaParser
- MetaParser
- Overview
- API
- MetaParser URI
- Download metadata with SRA run accession code
- Download metadata with sample accession code
- Downloading sets of metadata
- Downloading metadata associated with assemblies
- Return EnteroBase designation for an NCBI source identifier
- Return EnteroBase designation for a Google geographic category
- Other available methods
Overview
MetaParser implements:
- the automated downloading of all GenInfo Identifiers (GI numbers) in NCBI with the genus designation Salmonella, Escherichia / Shigella, Yersinia or Moraxella, and the corresponding metadata (via ENTREZ utilities)
- the parsing of the metadata into a consistent EnteroBase format
To accomplish these tasks, MetaParser provides a standalone RESTful service that responds to JSON requests for downloading metadata from NCBI associated with short read archives or assemblies. MetaParser also re-formats the metadata into standard EnteroBase formats.
Given an API command, MetaParser:
- Downloads metadata associated with genomic SRAs.
or
- Downloads metadata associated with genomic assemblies.
- Automatically parses downloaded metadata using the Natural Language Toolkit library in Python, and assigns source information to three EnteroBase categories:
- Source Niche
- Source Type
- Source Detail
- Automatically parses geographic information using the Google Geocoding API, and assigns it to five EnteroBase categories:
- Continent
- Country
- First-level administrative division (Province/State)
- Second-level administrative division (County/Municipality)
- City
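The eight target categories listed above can be pictured as two small records. The following is only an illustrative sketch: the class names, the field names and the example values are hypothetical and are not taken from MetaParser itself.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class SourceAssignment:
    """The three EnteroBase source categories (hypothetical field names)."""
    source_niche: Optional[str] = None
    source_type: Optional[str] = None
    source_detail: Optional[str] = None


@dataclass
class GeoAssignment:
    """The five EnteroBase geographic categories (hypothetical field names)."""
    continent: Optional[str] = None
    country: Optional[str] = None
    admin1: Optional[str] = None  # first-level division (Province/State)
    admin2: Optional[str] = None  # second-level division (County/Municipality)
    city: Optional[str] = None


# Illustrative values only; this is not actual MetaParser output.
place = GeoAssignment(continent="Europe", country="United Kingdom", city="London")
print(asdict(place))
```

Categories that cannot be assigned are left as None in this sketch; how MetaParser itself represents missing values is not stated here.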
API
MetaParser URI
In the examples below, the MetaParser URI depends on the configuration of the system on which MetaParser runs. The placeholder <MetaParser Host> stands for the host of that system.
Download metadata with SRA run accession code
Metadata for a given SRA run accession code may be downloaded with the meta query method.
For example, to download the metadata for the SRA run accession codes SRR1664288 and SRR1664287, make an HTTP GET request to the URL:
http://<MetaParser Host>/ET/meta/metadata?run=SRR1664288,SRR1664287
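As a minimal sketch, the URL above could be assembled in Python as follows. The host name metaparser.example.org and the function name run_metadata_url are placeholders invented for this illustration; only the path and the "run" parameter come from the example above.

```python
from urllib.parse import urlencode

# Hypothetical host; substitute the host of your own MetaParser deployment.
METAPARSER_HOST = "metaparser.example.org"


def run_metadata_url(run_accessions, host=METAPARSER_HOST):
    """Build the metadata URL for one or more SRA run accession codes."""
    # Several accessions are sent as a single comma-separated value;
    # safe="," keeps the commas from being percent-encoded.
    query = urlencode({"run": ",".join(run_accessions)}, safe=",")
    return f"http://{host}/ET/meta/metadata?{query}"


print(run_metadata_url(["SRR1664288", "SRR1664287"]))
```

The resulting string can then be passed to any HTTP client, for instance urllib.request.urlopen from the standard library.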
Download metadata with sample accession code
Metadata can also be downloaded for a given sample accession code using the meta query method. For example, to download the metadata for sample accession code SRS753484, make a GET request to the URL:
http://<MetaParser Host>/ET/meta/metadata?sample=SRS753484
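Because run and sample accession codes go to the same path and differ only in the parameter name, a client can choose the parameter from the accession prefix. The sketch below relies on the usual SRA naming convention (SRR/ERR/DRR for runs, SRS/ERS/DRS for samples). Whether MetaParser accepts the ERR/DRR and ERS/DRS variants is an assumption, and the host and function names are hypothetical.

```python
import re
from urllib.parse import urlencode

# Hypothetical host; substitute the host of your own MetaParser deployment.
METAPARSER_HOST = "metaparser.example.org"

# SRA-style accession: archive letter (S, E or D), "R", then R (run) or
# S (sample), followed by digits.
ACCESSION_RE = re.compile(r"^[SED]R([RS])\d+$")
PARAM_BY_KIND = {"R": "run", "S": "sample"}


def metadata_url(accession, host=METAPARSER_HOST):
    """Pick the "run" or "sample" parameter from the accession prefix."""
    match = ACCESSION_RE.match(accession)
    if match is None:
        raise ValueError(f"not a run or sample accession: {accession!r}")
    query = urlencode({PARAM_BY_KIND[match.group(1)]: accession})
    return f"http://{host}/ET/meta/metadata?{query}"


print(metadata_url("SRS753484"))
print(metadata_url("SRR1664288"))
```

Rejecting unrecognised codes on the client side avoids sending a request that cannot match either parameter.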
Downloading sets of metadata
It is possible to download sets of metadata using the dump query method.
It is also possible to download only the most recent metadata for a query. This is useful, for example, for keeping a local archive of some subset of the data up to date. For example, to download all metadata released over the last 2 days for the specified taxon (i.e. organism "Salmonella"), use the URL below in a GET request. (The "reldate" parameter specifies how many days back from today to include.)
http://<MetaParser Host>/meta/dump?organism=Salmonella&reldate=2
The above example also illustrates the "organism" parameter, which is used again in the next example. That example additionally shows how to fetch the data in chunks, using the "start" and "num" parameters, in order to walk through the results of a query. To download the first 10 sets of metadata for the specified taxon (useful for pagination), make a GET request to the URL:
http://<MetaParser Host>/ET/meta/dump?organism=Salmonella&start=0&num=10
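Walking through a result set in chunks can be sketched as a generator of URLs. This assumes that "start" is a zero-based record offset, so that the chunk after start=0&num=10 begins at start=10; the page states only the first-chunk case, so this reading is an assumption. The host and function names are hypothetical.

```python
from itertools import islice
from urllib.parse import urlencode

# Hypothetical host; substitute the host of your own MetaParser deployment.
METAPARSER_HOST = "metaparser.example.org"


def dump_page_urls(organism, page_size=10, host=METAPARSER_HOST):
    """Yield dump URLs for successive chunks of page_size records."""
    start = 0
    while True:
        query = urlencode({"organism": organism, "start": start, "num": page_size})
        yield f"http://{host}/ET/meta/dump?{query}"
        # Assumption: "start" is a record offset, so the next chunk begins
        # page_size records further on.
        start += page_size


# Show the URLs for the first three chunks.
for url in islice(dump_page_urls("Salmonella"), 3):
    print(url)
```

The generator never ends on its own. A caller would request each URL in turn and stop once a chunk comes back empty or short, which again assumes that this is how the service signals the end of the results.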
Downloading metadata associated with assemblies
The metadata associated with assemblies may also be downloaded, using the assembly query method, which accepts some of the same parameters. For example, to download the metadata associated with the first 10 genomic assemblies for the specified taxon (i.e. "Salmonella", passed here as the "term" parameter), make a GET request to
http://<MetaParser Host>/ET/meta/assembly?term=Salmonella&start=0&num=10
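In the example URLs on this page the taxon is passed as "organism" to the dump method but as "term" to the assembly method. A small helper can hide that difference. The mapping below is read off those example URLs only, and the host and function names are hypothetical.

```python
from urllib.parse import urlencode

# Hypothetical host; substitute the host of your own MetaParser deployment.
METAPARSER_HOST = "metaparser.example.org"

# Name of the taxon parameter for each query method, as it appears in the
# example URLs on this page.
TAXON_PARAM = {"dump": "organism", "assembly": "term"}


def listing_url(method, taxon, start=0, num=10, host=METAPARSER_HOST):
    """Build a chunked listing URL for the dump or assembly method."""
    if method not in TAXON_PARAM:
        raise ValueError(f"unsupported method: {method!r}")
    query = urlencode({TAXON_PARAM[method]: taxon, "start": start, "num": num})
    return f"http://{host}/ET/meta/{method}?{query}"


print(listing_url("assembly", "Salmonella"))
print(listing_url("dump", "Salmonella"))
```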
Return EnteroBase designation for an NCBI source identifier
The EnteroBase designation for an NCBI source identifier may be obtained using the host_format query method. For example, to obtain the EnteroBase designation for the NCBI source identifier "tuna", make a GET request to
http://<MetaParser Host>/ET/meta/api/host_format?raw=tuna
Return EnteroBase designation for a Google geographic category
The EnteroBase designation for a Google geographic category may be queried using the geo_format query method. For example, to obtain the EnteroBase designation for "London", make a GET request to the URL
http://<MetaParser Host>/ET/meta/api/geo_format?raw=London
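Raw values may contain spaces or other characters that are not valid in a URL, so it is safer to encode them than to paste them in by hand. The sketch below builds host_format and geo_format lookup URLs. The host name, the function name and the value "New York" are illustrative assumptions and do not come from MetaParser.

```python
from urllib.parse import quote, urlencode

# Hypothetical host; substitute the host of your own MetaParser deployment.
METAPARSER_HOST = "metaparser.example.org"


def format_url(method, raw, host=METAPARSER_HOST):
    """Build a host_format or geo_format lookup URL for a raw value."""
    if method not in ("host_format", "geo_format"):
        raise ValueError(f"unknown method: {method!r}")
    # quote_via=quote encodes a space as %20 rather than "+".
    query = urlencode({"raw": raw}, quote_via=quote)
    return f"http://{host}/ET/meta/api/{method}?{query}"


print(format_url("host_format", "tuna"))
print(format_url("geo_format", "New York"))
```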
Other available methods
Automated assignments can be edited by curators using the api/host_curation and api/batch_curation endpoints.