Commits

Sean Davis committed 84b92ae

Added pubmed query functionality, closes issue #1

  • Parent commits 218c5cd

Files changed (5)

File DESCRIPTION

 Package: Rpressa
 Type: Package
 Title: A collection of useful code for bioinformatics
-Version: 1.0.3
-Date: 2010-06-14
+Version: 1.0.4
+Date: 2012-07-17
 Author: Sean Davis <sdavis2@mail.nih.gov>
 Maintainer: Sean Davis <sdavis2@mail.nih.gov>
 Description: A collection of useful code snippets and classes for sequencing, microarray, and general bioinformatics.
-Depends: methods, Biobase, Biostrings, affy, IRanges, ShortRead
+Depends: methods, Biobase, Biostrings, affy, IRanges, ShortRead, XML
 Suggests: CGHbase
 License: GPL-2
 LazyLoad: yes
+pubmedQuery = function(search) {
+  library(XML)
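+  # Percent-encode the search term (spaces, brackets, &) before placing it in the URL
+  url = sprintf("http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=%s&tool=R",URLencode(search,reserved=TRUE))
+  return(xmlRoot(xmlTreeParse(url)))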
+}
+
+getQueryCounts = function(genes,terms) {
+  retmat = matrix(0,nrow=length(genes),ncol=length(terms))
+  colnames(retmat)=terms
+  rownames(retmat)=genes
+  pb = txtProgressBar(max=length(terms)*length(genes),style=3)
+  i = 0
+  for(gene in genes) {
+    for(term in terms) {
+      i=i+1
+      setTxtProgressBar(pb,i)
+      search = sprintf("%s AND %s",term,gene)
+      doc=pubmedQuery(search)
+      if(length(getNodeSet(doc,'//ErrorList'))>0) {
+        # If we are here, an error or "not found" occurred.
+        retmat[gene,term]=0
+      } else {
+        retmat[gene,term]=as.integer(xmlValue(getNodeSet(doc,"/eSearchResult/Count")[[1]]))
+      }
+      Sys.sleep(0.33)  # throttle every request to stay near three per second
+    }
+  }
+  close(pb)  # finish the progress bar so the console ends on a clean line
+  return(retmat)
+}
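
The nested loop in getQueryCounts can be tried without network access by swapping the PubMed lookup for a stand-in. The following is only a sketch in base R: `count_matrix` and `fake_count` are hypothetical names that are not part of this commit, and the stand-in returns the length of the search string rather than a real record count.

```r
# Sketch only: count_matrix and fake_count are hypothetical, not part of Rpressa.
count_matrix <- function(genes, terms, count_fun) {
  # One row per gene, one column per term, all counts starting at zero.
  counts <- matrix(0, nrow = length(genes), ncol = length(terms),
                   dimnames = list(genes, terms))
  for (gene in genes) {
    for (term in terms) {
      # Same "term AND gene" search string that getQueryCounts builds.
      counts[gene, term] <- count_fun(sprintf("%s AND %s", term, gene))
    }
  }
  counts
}

# Offline stand-in for a PubMed lookup: the length of the search string.
fake_count <- function(search) nchar(search)

res <- count_matrix(c("BRCA1", "AKT"), c("cancer", "cell cycle"), fake_count)
print(res)
```

Passing the lookup as an argument keeps the matrix-filling logic separate from the HTTP call, which may make the loop easier to check before running real queries.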

File man/getQueryCounts.Rd

+\name{getQueryCounts}
+\alias{getQueryCounts}
+\title{
+Return query counts for batch PubMed queries
+}
+\description{
+Discovering the biological meaning of a set of genes can
+be challenging.  This function runs batch PubMed queries
+and returns a matrix of record counts, with genes
+as rows and terms as columns.
+}
+\usage{
+getQueryCounts(genes, terms)
+}
+%- maybe also 'usage' for other objects documented here.
+\arguments{
+  \item{genes}{
+    A character vector of gene names (or miRNAs, etc.).  These will
+    become the rows of the query count matrix returned.
+  }
+  \item{terms}{
+    A character vector of terms to pair with each gene.  These will
+    become the columns of the query count matrix returned.
+  }
+}
+\details{
+%%  ~~ If necessary, more details than the description above ~~
+}
+\value{
+A numeric matrix with column names taken from "terms" and
+row names taken from "genes".  Each entry in the matrix
+is the count of records in PubMed for that gene/term pair.
+}
+\references{
+\url{http://www.ncbi.nlm.nih.gov/pubmed/}
+}
+\author{
+  Sean Davis <sdavis2@mail.nih.gov>
+}
+\seealso{
+\code{\link{pubmedQuery}}
+}
+\examples{
+genes = c('BRCA1','AKT','PIK3CA','CDKN2A')
+terms = c('cancer','cell cycle','tumor suppressor','oncogene')
+qcounts = getQueryCounts(genes,terms)
+qcounts
+}
+\keyword{ misc }

File man/pubmedQuery.Rd

+\name{pubmedQuery}
+\alias{pubmedQuery}
+\title{
+Return results of a PubMed query
+}
+\description{
+Given a search string, perform a PubMed query and return the XML result.
+}
+\usage{
+pubmedQuery(search)
+}
+\arguments{
+  \item{search}{
+  A single search string as entered in PubMed.
+  }
+}
+\value{
+The root node of the parsed XML result from an ESearch call to PubMed.
+}
+\references{
+See \url{http://www.ncbi.nlm.nih.gov/books/NBK25500/} for details.
+}
+\author{
+Sean Davis <sdavis2@mail.nih.gov>
+}
+\seealso{
+\code{\link{getQueryCounts}}
+}
+\examples{
+search = "science[journal] AND breast cancer AND 2008[pdat]"
+res = pubmedQuery(search)
+res
+}
+\keyword{ misc }

File man/targeted.Rd

-\name{targeted}
-\alias{targeted}
-\docType{data}
-\title{
-Sample AlignedRead data set for targeted sequencing application
-}
-\description{
-This is a dataset that was generated using in-solution capture technology to capture some exons of genes of interest.  The regions of capture are described in another related dataset.
-}
-\usage{data(targeted)}
-\format{
-  The format is:
-Formal class 'AlignedRead' [package "ShortRead"] with 8 slots
-  ..@ chromosome  : Factor w/ 66767 levels "0:0:1","0:0:10",..: 66767 66767 22181 7859 66767 66767 66767 66767 66767 66767 ...
-  ..@ position    : int [1:6349344] NA NA NA NA NA NA NA NA NA NA ...
-  ..@ strand      : Factor w/ 3 levels "-","+","*": NA NA NA NA NA NA NA NA NA NA ...
-  ..@ alignQuality:Formal class 'NumericQuality' [package "ShortRead"] with 1 slots
-  .. .. ..@ quality: int [1:6349344] 0 0 0 0 0 0 0 0 0 0 ...
-  ..@ alignData   :Formal class 'AlignedDataFrame' [package "ShortRead"] with 4 slots
-  .. .. ..@ varMetadata      :'data.frame':	7 obs. of  1 variable:
-  .. .. .. ..$ labelDescription: chr [1:7] "Analysis pipeline run" "Flow cell lane" "Flow cell tile" "Cluster x-coordinate" ...
-  .. .. ..@ data             :'data.frame':	6349344 obs. of  7 variables:
-  .. .. .. ..$ run      : Factor w/ 1 level "90814": 1 1 1 1 1 1 1 1 1 1 ...
-  .. .. .. ..$ lane     : int [1:6349344] 5 5 5 5 5 5 5 5 5 5 ...
-  .. .. .. ..$ tile     : int [1:6349344] 1 1 1 1 1 1 1 1 1 1 ...
-  .. .. .. ..$ x        : int [1:6349344] 0 0 0 0 0 0 0 0 0 0 ...
-  .. .. .. ..$ y        : int [1:6349344] 1947 1976 623 330 552 90 133 315 410 447 ...
-  .. .. .. ..$ filtering: Factor w/ 2 levels "Y","N": 2 2 2 2 2 2 2 2 2 2 ...
-  .. .. .. ..$ contig   : Factor w/ 1 level "": 1 1 1 1 1 1 1 1 1 1 ...
-  .. .. ..@ dimLabels        : chr [1:2] "readName" "alignColumn"
-  .. .. ..@ .__classVersion__:Formal class 'Versions' [package "Biobase"] with 1 slots
-  .. .. .. .. ..@ .Data:List of 1
-  .. .. .. .. .. ..$ : int [1:3] 1 1 0
-  ..@ quality     :Formal class 'SFastqQuality' [package "ShortRead"] with 1 slots
-  .. .. ..@ quality:Formal class 'BStringSet' [package "Biostrings"] with 5 slots
-  .. .. .. .. ..@ super          :Formal class 'BString' [package "Biostrings"] with 6 slots
-  .. .. .. .. .. .. ..@ xdata          :Formal class 'RawPtr' [package "IRanges"] with 2 slots
-  .. .. .. .. .. .. .. .. ..@ xp                    :<externalptr> 
-  .. .. .. .. .. .. .. .. ..@ .link_to_cached_object:<environment: 0x566412c> 
-  .. .. .. .. .. .. ..@ offset         : int 0
-  .. .. .. .. .. .. ..@ length         : int 253973760
-  .. .. .. .. .. .. ..@ elementMetadata: NULL
-  .. .. .. .. .. .. ..@ elementType    : chr "ANYTHING"
-  .. .. .. .. .. .. ..@ metadata       : list()
-  .. .. .. .. ..@ ranges         :Formal class 'IRanges' [package "IRanges"] with 6 slots
-  .. .. .. .. .. .. ..@ start          : int [1:6349344] 1 41 81 121 161 201 241 281 321 361 ...
-  .. .. .. .. .. .. ..@ width          : int [1:6349344] 40 40 40 40 40 40 40 40 40 40 ...
-  .. .. .. .. .. .. ..@ NAMES          : NULL
-  .. .. .. .. .. .. ..@ elementMetadata: NULL
-  .. .. .. .. .. .. ..@ elementType    : chr "integer"
-  .. .. .. .. .. .. ..@ metadata       : list()
-  .. .. .. .. ..@ elementMetadata: NULL
-  .. .. .. .. ..@ elementType    : chr "ANYTHING"
-  .. .. .. .. ..@ metadata       : list()
-  ..@ sread       :Formal class 'DNAStringSet' [package "Biostrings"] with 5 slots
-  .. .. ..@ super          :Formal class 'DNAString' [package "Biostrings"] with 6 slots
-  .. .. .. .. ..@ xdata          :Formal class 'RawPtr' [package "IRanges"] with 2 slots
-  .. .. .. .. .. .. ..@ xp                    :<externalptr> 
-  .. .. .. .. .. .. ..@ .link_to_cached_object:<environment: 0x566412c> 
-  .. .. .. .. ..@ offset         : int 0
-  .. .. .. .. ..@ length         : int 253973760
-  .. .. .. .. ..@ elementMetadata: NULL
-  .. .. .. .. ..@ elementType    : chr "ANYTHING"
-  .. .. .. .. ..@ metadata       : list()
-  .. .. ..@ ranges         :Formal class 'IRanges' [package "IRanges"] with 6 slots
-  .. .. .. .. ..@ start          : int [1:6349344] 1 41 81 121 161 201 241 281 321 361 ...
-  .. .. .. .. ..@ width          : int [1:6349344] 40 40 40 40 40 40 40 40 40 40 ...
-  .. .. .. .. ..@ NAMES          : NULL
-  .. .. .. .. ..@ elementMetadata: NULL
-  .. .. .. .. ..@ elementType    : chr "integer"
-  .. .. .. .. ..@ metadata       : list()
-  .. .. ..@ elementMetadata: NULL
-  .. .. ..@ elementType    : chr "ANYTHING"
-  .. .. ..@ metadata       : list()
-  ..@ id          :Formal class 'BStringSet' [package "Biostrings"] with 5 slots
-  .. .. ..@ super          :Formal class 'BString' [package "Biostrings"] with 6 slots
-  .. .. .. .. ..@ xdata          :Formal class 'RawPtr' [package "IRanges"] with 2 slots
-  .. .. .. .. .. .. ..@ xp                    :<externalptr> 
-  .. .. .. .. .. .. ..@ .link_to_cached_object:<environment: 0x566412c> 
-  .. .. .. .. ..@ offset         : int 0
-  .. .. .. .. ..@ length         : int 0
-  .. .. .. .. ..@ elementMetadata: NULL
-  .. .. .. .. ..@ elementType    : chr "ANYTHING"
-  .. .. .. .. ..@ metadata       : list()
-  .. .. ..@ ranges         :Formal class 'IRanges' [package "IRanges"] with 6 slots
-  .. .. .. .. ..@ start          : int [1:6349344] 1 1 1 1 1 1 1 1 1 1 ...
-  .. .. .. .. ..@ width          : int [1:6349344] 0 0 0 0 0 0 0 0 0 0 ...
-  .. .. .. .. ..@ NAMES          : NULL
-  .. .. .. .. ..@ elementMetadata: NULL
-  .. .. .. .. ..@ elementType    : chr "integer"
-  .. .. .. .. ..@ metadata       : list()
-  .. .. ..@ elementMetadata: NULL
-  .. .. ..@ elementType    : chr "ANYTHING"
-  .. .. ..@ metadata       : list()
-}
-\source{
-The data were generated in the Genetics Branch, National Cancer Institute, National Institutes of Health by Ogan Abaan.
-}
-
-\examples{
-data(targeted)
-targeted
-}
-\keyword{datasets}